SlideShare a Scribd company logo
1 of 14
Rethinking
The Storm 2.0 Worker
Roshan Naik
HORTONWORKS
April 2017, Storm & Kafka Meetup
Santa Clara, CA
Present : Storm 1.x
• Has matured into a stable and reliable system
• Widely deployed and holding up well in production
• Lots of new competition
• Differentiating on Features, Performance, Ease of Use etc.
Performance in 2.0
• How do we know if a streaming system is “fast”?
• Faster than another system ?
• What about Hardware potential ?
• See analysis in STORM-2284
• Dimensions
• Throughput
• Latency
• Resource utilization: CPU/Network/Memory/Disk/Power
• STORM-2284
• https://issues.apache.org/jira/browse/STORM-2284
Overview of Proposed Enhancements
• https://issues.apache.org/jira/browse/STORM-2284
Areas critical to Performance
• Messaging System
• Need Bounded Concurrent Queues that operate as fast as hardware allows
• Lock based queues not an option
• Lock free queues or preferably Wait-free queues
• Threading Model
• Fewer Threads. Less synchronization.
• Dedicated threads instead of pooled threads.
• CPU Pinning.
• Memory Model
• Lowering GC Pressure: Recycling Objects
• Reducing CPU cache faults: Controlling Object Layout (contiguous allocation)
Messaging Architecture
Messaging - Current Architecture
ArrayList: Current Batch
CLQ : OVERFLOW
BATCHER
Disruptor Q
Flusher
Thread
Send
Thread
SEND QRECEIVE Q
ArrayList: Current Batch
CLQ : OVERFLOW
BATCHER
Disruptor Q
Bolt
Executor
Thread
(user logic)
publish
Flusher
Thread
ArrayList
ArrayList
Worker’s
Outbound Q
A local Executor’s
RECEIVE Q
S
E
N
D
T
H
R
E
A
D
local
remote
Messaging - New Architecture
ArrayList: Current Batch
CLQ : OVERFLOW
BATCHER
Disruptor Q
Flusher
Thread
Send
Thread
SEND QRECEIVE Q
ArrayList: Current Batch
CLQ : OVERFLOW
BATCHER
Disruptor Q
Bolt
Executor
Thread
(user logic)
publish
Flusher
Thread
ArrayList
ArrayList
Worker’s
Outbound Q
A local Executor’s
RECEIVE Q
S
E
N
D
T
H
R
E
A
D
local
remote
Messaging - New Architecture
RECEIVE Q
ArrayList: Current Batch
BATCHER
JCTools Q
Bolt
Executor
Thread
(user logic)
publish
A local Executor’s
RECEIVE Q
Worker’s
Outbound Q
local
remote
Preliminary Numbers
LATENCY
• 1 spout --> 1 bolt with 1 ACKer (all in same worker)
• v1.0.1 : 3.4 milliseconds
• v2.0 master: 7 milliseconds
• v2.0 redesigned : 60-100 micro seconds (116x improvement)
Preliminary Numbers
THROUGHPUT
• 1 spout --> 1 bolt [w/o ACKing]
• v1.0.1 : ?
• v2.0 master: 3.3 million /sec
• v2.0 redesigned : 5 million /sec (50% improvement)
• 1 spout --> 1 bolt [with ACKing]
• v1.0 : 233 K /sec
• v2.0 master: 900 k/sec
• v2.0 redesigned : 1 million /sec (no change)
Observations
• Latency: Dramatically improved.
• Throughput: Discovered multiple bottlenecks preventing significantly
higher throughput.
• Grouping: Bottlenecks in LocalShuffle & FieldsGrouping if addressed along
with some others, throughput can reach ~7 million/sec.
• TumpleImpl : If inefficiencies here are addressed, throughput can reach 15
mill/sec.
• ACK-ing : ACKer bolt currently maxing out at ~ 2.5 million ACKs / sec.
Limitation with implementation not with concept. I see room for ACKer
specific fixes that can also substantially improve its throughput.
WORKER THD
• Start/Stop/Monitor
Executors
• Manage Metrics
• Topology Reconfig
• Heartbeat
Executor (Thd)
grouper
Task
(Bolt)Q
counters
Executor (Thd)
System Task
(Inter host
Input)
Executor (Thd)
Sys Task
(Outbound
Msgs)
Q
counters
New Threading & Execution Model
(STORM-2307)
Executor (Thd)
System Task
(Intra host
Input)
Executor (Thd)
(grouper)
(Bolt)
Task
(Spout/Bolt)Q
counters
WORKER PROCESS
Questions
• References
• https://issues.apache.org/jira/browse/STORM-2284

More Related Content

What's hot

Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changesconfluent
 
Multi cluster, multitenant and hierarchical kafka messaging service slideshare
Multi cluster, multitenant and hierarchical kafka messaging service   slideshareMulti cluster, multitenant and hierarchical kafka messaging service   slideshare
Multi cluster, multitenant and hierarchical kafka messaging service slideshareAllen (Xiaozhong) Wang
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIALa Cuisine du Web
 
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean FellowsDeploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean Fellowsconfluent
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structuresconfluent
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaJay Kreps
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Openzipkin conf: Zipkin at Yelp
Openzipkin conf: Zipkin at YelpOpenzipkin conf: Zipkin at Yelp
Openzipkin conf: Zipkin at YelpPrateek Agarwal
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-CamusDeep Shah
 
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019confluent
 
Discover Kafka on OpenShift: Processing Real-Time Financial Events at Scale (...
Discover Kafka on OpenShift: Processing Real-Time Financial Events at Scale (...Discover Kafka on OpenShift: Processing Real-Time Financial Events at Scale (...
Discover Kafka on OpenShift: Processing Real-Time Financial Events at Scale (...confluent
 
Introducing Exactly Once Semantics To Apache Kafka
Introducing Exactly Once Semantics To Apache KafkaIntroducing Exactly Once Semantics To Apache Kafka
Introducing Exactly Once Semantics To Apache KafkaApurva Mehta
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelinesSumant Tambe
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...Lucas Jellema
 
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...confluent
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache KafkaJoe Stein
 
Error Resilient Design: Building Scalable & Fault-Tolerant Microservices with...
Error Resilient Design: Building Scalable & Fault-Tolerant Microservices with...Error Resilient Design: Building Scalable & Fault-Tolerant Microservices with...
Error Resilient Design: Building Scalable & Fault-Tolerant Microservices with...HostedbyConfluent
 

What's hot (20)

Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
Capture the Streams of Database Changes
Capture the Streams of Database ChangesCapture the Streams of Database Changes
Capture the Streams of Database Changes
 
Multi cluster, multitenant and hierarchical kafka messaging service slideshare
Multi cluster, multitenant and hierarchical kafka messaging service   slideshareMulti cluster, multitenant and hierarchical kafka messaging service   slideshare
Multi cluster, multitenant and hierarchical kafka messaging service slideshare
 
A la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIAA la rencontre de Kafka, le log distribué par Florian GARCIA
A la rencontre de Kafka, le log distribué par Florian GARCIA
 
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean FellowsDeploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
 
Power of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data StructuresPower of the Log: LSM & Append Only Data Structures
Power of the Log: LSM & Append Only Data Structures
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Openzipkin conf: Zipkin at Yelp
Openzipkin conf: Zipkin at YelpOpenzipkin conf: Zipkin at Yelp
Openzipkin conf: Zipkin at Yelp
 
Copy of Kafka-Camus
Copy of Kafka-CamusCopy of Kafka-Camus
Copy of Kafka-Camus
 
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
 
Kafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backboneKafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backbone
 
Discover Kafka on OpenShift: Processing Real-Time Financial Events at Scale (...
Discover Kafka on OpenShift: Processing Real-Time Financial Events at Scale (...Discover Kafka on OpenShift: Processing Real-Time Financial Events at Scale (...
Discover Kafka on OpenShift: Processing Real-Time Financial Events at Scale (...
 
Introducing Exactly Once Semantics To Apache Kafka
Introducing Exactly Once Semantics To Apache KafkaIntroducing Exactly Once Semantics To Apache Kafka
Introducing Exactly Once Semantics To Apache Kafka
 
Tuning kafka pipelines
Tuning kafka pipelinesTuning kafka pipelines
Tuning kafka pipelines
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
 
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
Achieving a 50% Reduction in Cross-AZ Network Costs from Kafka (Uday Sagar Si...
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Error Resilient Design: Building Scalable & Fault-Tolerant Microservices with...
Error Resilient Design: Building Scalable & Fault-Tolerant Microservices with...Error Resilient Design: Building Scalable & Fault-Tolerant Microservices with...
Error Resilient Design: Building Scalable & Fault-Tolerant Microservices with...
 

Similar to Storm worker redesign

Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorStéphane Maldini
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Lucidworks
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProChester Chen
 
Toward 10,000 Containers on OpenStack
Toward 10,000 Containers on OpenStackToward 10,000 Containers on OpenStack
Toward 10,000 Containers on OpenStackTon Ngo
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging振东 刘
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexApache Apex
 
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundWUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundKarel Zikmund
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitinbloomreacheng
 
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolMail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolLucidworks
 
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...Flink Forward
 
Kubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewKubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewLei (Harry) Zhang
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationNitin Sharma
 
Apache Hadoop YARN State of the Union
Apache Hadoop YARN State of the UnionApache Hadoop YARN State of the Union
Apache Hadoop YARN State of the UnionWeiwei Yang
 

Similar to Storm worker redesign (20)

Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and Reactor
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
 
Building a Router
Building a RouterBuilding a Router
Building a Router
 
Corralling Big Data at TACC
Corralling Big Data at TACCCorralling Big Data at TACC
Corralling Big Data at TACC
 
SFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a ProSFBigAnalytics_20190724: Monitor kafka like a Pro
SFBigAnalytics_20190724: Monitor kafka like a Pro
 
Toward 10,000 Containers on OpenStack
Toward 10,000 Containers on OpenStackToward 10,000 Containers on OpenStack
Toward 10,000 Containers on OpenStack
 
3.2 Streaming and Messaging
3.2 Streaming and Messaging3.2 Streaming and Messaging
3.2 Streaming and Messaging
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundWUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
 
Sharam salamian
Sharam salamianSharam salamian
Sharam salamian
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
 
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, AolMail Search As A Sercive: Presented by Rishi Easwaran, Aol
Mail Search As A Sercive: Presented by Rishi Easwaran, Aol
 
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
 
Kubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical ViewKubernetes Walk Through from Technical View
Kubernetes Walk Through from Technical View
 
Play With Streams
Play With StreamsPlay With Streams
Play With Streams
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin Presentation
 
Apache Hadoop YARN State of the Union
Apache Hadoop YARN State of the UnionApache Hadoop YARN State of the Union
Apache Hadoop YARN State of the Union
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 

Recently uploaded

Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandIES VE
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeCzechDreamin
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIES VE
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...panagenda
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...FIDO Alliance
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024Stephen Perrenod
 

Recently uploaded (20)

Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 

Storm worker redesign

  • 1. Rethinking The Storm 2.0 Worker Roshan Naik HORTONWORKS April 2017, Storm & Kafka Meetup Santa Clara, CA
  • 2. Present : Storm 1.x • Has matured into a stable and reliable system • Widely deployed and holding up well in production • Lots of new competition • Differentiating on Features, Performance, Ease of Use etc.
  • 3. Performance in 2.0 • How do we know if a streaming system is “fast”? • Faster than another system ? • What about Hardware potential ? • See analysis in STORM-2284 • Dimensions • Throughput • Latency • Resource utilization: CPU/Network/Memory/Disk/Power • STORM-2284 • https://issues.apache.org/jira/browse/STORM-2284
  • 4. Overview of Proposed Enhancements • https://issues.apache.org/jira/browse/STORM-2284
  • 5. Areas critical to Performance • Messaging System • Need Bounded Concurrent Queues that operate as fast as hardware allows • Lock based queues not an option • Lock free queues or preferably Wait-free queues • Threading Model • Fewer Threads. Less synchronization. • Dedicated threads instead of pooled threads. • CPU Pinning. • Memory Model • Lowering GC Pressure: Recycling Objects • Reducing CPU cache faults: Controlling Object Layout (contiguous allocation)
  • 7. Messaging - Current Architecture ArrayList: Current Batch CLQ : OVERFLOW BATCHER Disruptor Q Flusher Thread Send Thread SEND QRECEIVE Q ArrayList: Current Batch CLQ : OVERFLOW BATCHER Disruptor Q Bolt Executor Thread (user logic) publish Flusher Thread ArrayList ArrayList Worker’s Outbound Q A local Executor’s RECEIVE Q S E N D T H R E A D local remote
  • 8. Messaging - New Architecture ArrayList: Current Batch CLQ : OVERFLOW BATCHER Disruptor Q Flusher Thread Send Thread SEND QRECEIVE Q ArrayList: Current Batch CLQ : OVERFLOW BATCHER Disruptor Q Bolt Executor Thread (user logic) publish Flusher Thread ArrayList ArrayList Worker’s Outbound Q A local Executor’s RECEIVE Q S E N D T H R E A D local remote
  • 9. Messaging - New Architecture RECEIVE Q ArrayList: Current Batch BATCHER JCTools Q Bolt Executor Thread (user logic) publish A local Executor’s RECEIVE Q Worker’s Outbound Q local remote
  • 10. Preliminary Numbers LATENCY • 1 spout --> 1 bolt with 1 ACKer (all in same worker) • v1.0.1 : 3.4 milliseconds • v2.0 master: 7 milliseconds • v2.0 redesigned : 60-100 micro seconds (116x improvement)
  • 11. Preliminary Numbers THROUGHPUT • 1 spout --> 1 bolt [w/o ACKing] • v1.0.1 : ? • v2.0 master: 3.3 million /sec • v2.0 redesigned : 5 million /sec (50% improvement) • 1 spout --> 1 bolt [with ACKing] • v1.0 : 233 K /sec • v2.0 master: 900 k/sec • v2.0 redesigned : 1 million /sec (no change)
  • 12. Observations • Latency: Dramatically improved. • Throughput: Discovered multiple bottlenecks preventing significantly higher throughput. • Grouping: Bottlenecks in LocalShuffle & FieldsGrouping if addressed along with some others, throughput can reach ~7 million/sec. • TumpleImpl : If inefficiencies here are addressed, throughput can reach 15 mill/sec. • ACK-ing : ACKer bolt currently maxing out at ~ 2.5 million ACKs / sec. Limitation with implementation not with concept. I see room for ACKer specific fixes that can also substantially improve its throughput.
  • 13. WORKER THD • Start/Stop/Monitor Executors • Manage Metrics • Topology Reconfig • Heartbeat Executor (Thd) grouper Task (Bolt)Q counters Executor (Thd) System Task (Inter host Input) Executor (Thd) Sys Task (Outbound Msgs) Q counters New Threading & Execution Model (STORM-2307) Executor (Thd) System Task (Intra host Input) Executor (Thd) (grouper) (Bolt) Task (Spout/Bolt)Q counters WORKER PROCESS