Storm - SpaaS

Slides about the Storm platform, presented at JUG.

SpaaS*
* Stream processing as a service with Apache Storm
Ernestas Vaiciukevičius
Storm - SpaaS

  • 1. SpaaS: Stream processing as a service with Apache Storm. Ernestas Vaiciukevičius
  • 2. Birth of the platform
  • 3. Birth of the platform. Legacy solution issues: delays; resource utilization; storage for temp data; hard to scale; not fault tolerant; licenses; batch based
  • 4. Birth of the platform. Gradually refactoring the old solution
  • 5. Birth of the platform (diagram labels: Storm, Kafka)
  • 6-9. Birth of the platform (diagram labels: Storm, Mesos). Our Storm cluster became generic enough to be offered as a service to other teams. We just needed to address a few points: • Simpler scaling: Storm-Mesos integration • Resource isolation: cgroups
  • 10. Birth of the platform Storm Mesos Providing stream processing platform as a service Storm cluster infrastructure • 600 CPU cores, 3TB RAM • Scala common library with reusable components • Monitoring/alerting/logging for topologies • Normal load - 0.7M messages/s
  • 12. Storm basics • Tuple – a record/message/item that a stream consists of • Spout – source of a stream • Bolt – a step in the processing chain • Topology – graph of connected bolts and spouts describing the data flow • Worker – one of many distributed JVM processes that execute a topology
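To make the vocabulary above concrete, here is a toy single-process model in plain Scala. It deliberately does not use the Storm API: `ToyTuple`, `ToySpout`, `ToyBolt` and `ToyTopology` are hypothetical names, and the topology is simplified to a linear chain, whereas a real topology is a graph executed by many worker JVMs.

```scala
// Toy model of the Storm vocabulary; none of these types come from Storm itself.
object StormBasicsSketch {
  // A tuple is modelled as an ordered list of values.
  type ToyTuple = Vector[Any]

  // A spout is the source of a stream.
  trait ToySpout { def nextTuples(): Iterator[ToyTuple] }

  // A bolt is one processing step: zero or more output tuples per input tuple.
  trait ToyBolt { def execute(t: ToyTuple): Seq[ToyTuple] }

  // A topology wires a spout to bolts; here simplified to a linear chain.
  final case class ToyTopology(spout: ToySpout, bolts: List[ToyBolt]) {
    def run(): List[ToyTuple] =
      spout.nextTuples().toList.flatMap { t =>
        bolts.foldLeft(Seq(t))((tuples, bolt) => tuples.flatMap(bolt.execute))
      }
  }

  // Example components: emit (id, event) pairs, keep clicks, project the id.
  object EventSpout extends ToySpout {
    def nextTuples(): Iterator[ToyTuple] =
      Iterator(Vector(1, "click"), Vector(2, "view"), Vector(3, "click"))
  }
  object ClickFilterBolt extends ToyBolt {
    def execute(t: ToyTuple): Seq[ToyTuple] = if (t(1) == "click") Seq(t) else Seq.empty
  }
  object IdBolt extends ToyBolt {
    def execute(t: ToyTuple): Seq[ToyTuple] = Seq(Vector(t(0)))
  }
}
```

Running `ToyTopology(EventSpout, List(ClickFilterBolt, IdBolt)).run()` pushes every spout tuple through both bolts in order and returns only the ids of the click events.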
  • 15. Bolt
  • 16. Bolt
  • 20. Storm basics – reliable processing Spout types: • Unreliable • Reliable Guarantees: • At most once • At least once
  • 21. Storm basics – reliable processing Bolts may emit tuples anchored to one or more input tuples. Here tuple B is descendant of A
  • 22. Storm basics – reliable processing Multiple anchorings form a tuple tree.
  • 23. Storm basics – reliable processing Bolts can either • “acknowledge” or • “fail” their input tuples.
  • 24. Storm basics – reliable processing A failure in any bolt of the tuple tree fails the original tuple(s). Spouts will then re-emit them.
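The anchoring, ack and fail rules from the reliable-processing slides can be sketched with a small bookkeeping class. This is only an illustration under simplifying assumptions: `TupleTreeTracker` and its methods are hypothetical names, tuples are plain `Long` ids, and Storm itself tracks tuple trees with a more compact scheme than the explicit sets used here.

```scala
import scala.collection.mutable

// Simplified tracker for tuple trees; not Storm's actual acker implementation.
final class TupleTreeTracker {
  // root tuple id -> ids of tuples in its tree that are still unacknowledged
  private val pending = mutable.Map.empty[Long, mutable.Set[Long]]
  val completed = mutable.ListBuffer.empty[Long] // fully processed roots
  val replayed  = mutable.ListBuffer.empty[Long] // roots the spout must re-emit

  // The spout emits a root tuple.
  def emitRoot(root: Long): Unit = pending(root) = mutable.Set(root)

  // A bolt emits `child` anchored to a tuple belonging to `root`'s tree.
  def anchor(root: Long, child: Long): Unit = pending.get(root).foreach(_ += child)

  // A bolt acknowledges a tuple; the root completes once its tree is empty.
  def ack(root: Long, id: Long): Unit = pending.get(root).foreach { ids =>
    ids -= id
    if (ids.isEmpty) { pending -= root; completed += root }
  }

  // A failure anywhere in the tree fails the root.
  def fail(root: Long): Unit =
    if (pending.remove(root).isDefined) replayed += root
}
```

A root counts as processed only after every tuple anchored to it has been acknowledged, which is what gives a reliable spout its at-least-once guarantee: anything that ends up in `replayed` is emitted again.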
  • 25. Our commons library Tiny layer on top of Storm API and ScalaStorm* DSL to make developing in Scala more convenient • Typed messages • Unified exception handling • Reusable components * https://github.com/velvia/ScalaStorm
  • 26. Our commons library – typed messages Standard Storm tuples: {1, 2} {2, "click"} {1, "click", [1, 2, 3]}, read by position: t.getInteger(0) t.getString(1) t.getValue(2)
  • 27. Our commons library – typed messages override def execute(t: Tuple) = { // what if wrong tuple comes here... val click = t.getValue(0).asInstanceOf[Click] // it would crash the worker with an exception val clickId = t.getInteger(0) // or worse - what if that's not clickId... } Standard "execute" method
  • 28. Our commons library – typed messages case class ClickMessage(id: Int, url: String) extends BaseMessage message {1, "http://example.com"}
  • 29. Our commons library – typed messages case class ClickMessage(id: Int, url: String) extends BaseMessage … override def exec(t: Tuple) = { case ClickMessage(id, url) => ... using anchor t emitMsg NextMessage(id) } We started to use typed Scala case classes
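The benefit of typed messages can be shown without Storm at all. In the sketch below, `BaseMessage`, `ClickMessage` and `NextMessage` mirror the names on the slides, but their definitions, the `decode` helper and the `handle` function are assumptions made for illustration, not the library's actual code.

```scala
// Hypothetical re-creation of the idea; not the commons library itself.
object TypedMessagesSketch {
  sealed trait BaseMessage
  final case class ClickMessage(id: Int, url: String) extends BaseMessage
  final case class NextMessage(id: Int) extends BaseMessage

  // Decode untyped tuple values into a typed message; None on any mismatch.
  def decode(values: Seq[Any]): Option[BaseMessage] = values match {
    case Seq(id: Int, url: String) => Some(ClickMessage(id, url))
    case Seq(id: Int)              => Some(NextMessage(id))
    case _                         => None
  }

  // Typed handler: the compiler checks field names and types.
  def handle(msg: BaseMessage): Option[BaseMessage] = msg match {
    case ClickMessage(id, _) => Some(NextMessage(id))
    case NextMessage(_)      => None
  }
}
```

A malformed tuple is rejected once, at the decoding boundary, instead of crashing the worker with a `ClassCastException` somewhere inside a bolt.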
  • 30. Our commons library – typed messages Many fine-grained bolts can lead to a high number of threads in worker processes and huge heartbeat states stored in ZooKeeper. Each bolt adds an overhead of at least two threads. Message transformation as standard functionality in the base bolt helps to avoid “mapper” bolts: override def transformer(): BaseMessage = { case m: BaseMessage => MyNewMessage() }
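One way to picture the built-in transformation step is a base class that applies an optional partial function to every message before processing it. This is a guess at the shape of the idea only: `TransformingBolt`, `HostCollectorBolt` and the message types are hypothetical, and the real base bolt works on Storm tuples rather than on plain objects.

```scala
object TransformerSketch {
  trait BaseMessage
  final case class RawClick(url: String) extends BaseMessage
  final case class NormalizedClick(host: String) extends BaseMessage

  // Hypothetical base bolt: maps each incoming message before processing it,
  // so no separate "mapper" bolt (and its threads) is needed.
  abstract class TransformingBolt {
    // Default: pass messages through unchanged.
    def transformer: PartialFunction[BaseMessage, BaseMessage] = PartialFunction.empty
    def process(msg: BaseMessage): Unit

    final def execute(msg: BaseMessage): Unit =
      process(transformer.applyOrElse(msg, (m: BaseMessage) => m))
  }

  // A bolt that normalizes clicks inline and collects the results.
  final class HostCollectorBolt extends TransformingBolt {
    val seen = scala.collection.mutable.ListBuffer.empty[BaseMessage]
    override def transformer: PartialFunction[BaseMessage, BaseMessage] = {
      case RawClick(url) => NormalizedClick(url.stripPrefix("http://").takeWhile(_ != '/'))
    }
    def process(msg: BaseMessage): Unit = seen += msg
  }
}
```

Messages that the partial function does not cover are passed through untouched, so a bolt only has to describe the mappings it cares about.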
  • 31. Our commons library – exception handling class MyBolt … with FailTupleExceptionHandler … class MyOtherBolt … { override def handleException(t: Tuple, tw: Throwable): Unit = … } • FailTupleExceptionHandler • WorkerRestartExceptionHandler • AckTupleExceptionHandler • DeactivateTopologyExceptionHandler • AckTupleWithLimitExceptionHandler
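The mix-in style of exception handling can be sketched as follows. The trait names `FailTupleExceptionHandler` and `AckTupleExceptionHandler` come from the slide, but their bodies, the `ToyBolt` base class and the `ToyCollector` stand-in are assumptions; tuples are reduced to integer ids to keep the example free of Storm dependencies.

```scala
object ExceptionHandlingSketch {
  import scala.collection.mutable
  import scala.util.control.NonFatal

  // Stand-in for Storm's output collector; records acks and fails by tuple id.
  final class ToyCollector {
    val acked  = mutable.ListBuffer.empty[Int]
    val failed = mutable.ListBuffer.empty[Int]
  }

  abstract class ToyBolt(val collector: ToyCollector) {
    def process(tupleId: Int): Unit
    def handleException(tupleId: Int, tw: Throwable): Unit

    // Every bolt funnels exceptions through the same hook.
    final def execute(tupleId: Int): Unit =
      try { process(tupleId); collector.acked += tupleId }
      catch { case NonFatal(e) => handleException(tupleId, e) }
  }

  // Policy: fail the tuple so that the spout replays it.
  trait FailTupleExceptionHandler { self: ToyBolt =>
    def handleException(tupleId: Int, tw: Throwable): Unit = collector.failed += tupleId
  }

  // Policy: acknowledge the tuple anyway, i.e. skip the bad record.
  trait AckTupleExceptionHandler { self: ToyBolt =>
    def handleException(tupleId: Int, tw: Throwable): Unit = collector.acked += tupleId
  }

  final class ParsingBolt(c: ToyCollector) extends ToyBolt(c) with FailTupleExceptionHandler {
    def process(tupleId: Int): Unit =
      if (tupleId < 0) throw new IllegalArgumentException("negative id")
  }
}
```

Switching `ParsingBolt` from replaying bad tuples to skipping them is then a one-word change in its `with` clause, which is the appeal of keeping the policy out of the processing code.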
  • 32. Our commons library – reusable components • CacheBolt • SyncBolt • KafkaProducerBolt • RestApiBolt • HadoopApiUploaderBolt • InMemoryJoinBolt • DeduplicatorBolt • common helpers for logging, metrics, calling REST APIs, etc.
  • 33. Our commons library – stream join
  • 34. Our commons library – stream join
  • 35. Challenge 1: Data is not perfectly ordered • out-of-order items in both streams might cause unjoined results
  • 36. Challenge 1: Data is not perfectly ordered • increase join window to compensate for out-of-order items in left stream • increase synchronization offset for out-of-order items in right stream
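The effect of the join window on out-of-order data can be reproduced with a very small in-memory join. The sketch below is an assumption about the general technique, not the deck's `InMemoryJoinBolt`: `Item`, `InMemoryJoin`, `onLeft` and `onRight` are hypothetical names, the window is driven only by left-stream timestamps, and the synchronization offset for the right stream is left out.

```scala
object WindowJoinSketch {
  import scala.collection.mutable

  final case class Item(key: String, ts: Long, payload: String)

  // Buffers left-stream items for `windowMs` and joins right-stream items by key.
  final class InMemoryJoin(windowMs: Long) {
    private val left = mutable.Map.empty[String, List[Item]]
    private var maxLeftTs = Long.MinValue

    def onLeft(item: Item): Unit = {
      left(item.key) = item :: left.getOrElse(item.key, Nil)
      maxLeftTs = math.max(maxLeftTs, item.ts)
      expire()
    }

    // Returns (left, right) pairs whose timestamps are at most windowMs apart.
    def onRight(item: Item): List[(Item, Item)] =
      left.getOrElse(item.key, Nil)
        .filter(l => math.abs(l.ts - item.ts) <= windowMs)
        .map(l => (l, item))

    // Drop left items that have fallen out of the window.
    private def expire(): Unit = {
      val cutoff = maxLeftTs - windowMs
      for (k <- left.keys.toList) {
        val kept = left(k).filter(_.ts >= cutoff)
        if (kept.isEmpty) left -= k else left(k) = kept
      }
    }
  }
}
```

If the left stream has already advanced to timestamp 120 when a right item for key "a" at 104 arrives, a 10 ms window has expired the matching left item from timestamp 100 and the result stays unjoined, while a 30 ms window still finds it. The price of the wider window is a larger buffer.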
  • 37. Challenge 2: topic partitions not consumed evenly
  • 38. Challenge 2: topic partitions not consumed evenly • introduced PartitionAwareKafkaSpout – each item knows its source partition: trait PartitionAwareMessage extends BaseMessage { def partition: Int } • use the minimal timestamp across all partitions for window expiration and sync time
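Taking the minimal timestamp across partitions can be sketched as a small clock that remembers the latest timestamp seen per partition. `PartitionAwareMessage` follows the trait shown on the slide; `Event`, `PartitionClock`, `observe` and `streamTime` are hypothetical names, and the set of partitions is assumed to be known up front.

```scala
object PartitionWatermarkSketch {
  import scala.collection.mutable

  trait BaseMessage
  trait PartitionAwareMessage extends BaseMessage { def partition: Int }
  final case class Event(partition: Int, ts: Long) extends PartitionAwareMessage

  // Tracks the latest timestamp per partition; stream time is the minimum of those.
  final class PartitionClock(partitions: Set[Int]) {
    require(partitions.nonEmpty, "at least one partition is required")
    private val latest = mutable.Map.empty[Int, Long]

    def observe(e: Event): Unit =
      latest(e.partition) = math.max(e.ts, latest.getOrElse(e.partition, Long.MinValue))

    // None until every partition has delivered at least one item.
    def streamTime: Option[Long] =
      if (partitions.forall(latest.contains)) Some(partitions.map(latest).min) else None
  }
}
```

Because the slowest partition determines `streamTime`, a partition that is consumed faster than the others cannot expire window items that a lagging partition still needs.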
  • 39. Challenge 2: topic partitions not consumed evenly
  • 40. Challenge 3: joins with huge join windows • there are cases when join windows need to be minutes or even hours rather than seconds – it may be difficult to hold such huge buffers in a Storm worker's RAM • items are not acknowledged until they are joined and fully processed – so a huge number of items stuck in the join buffer would not work with reliable Storm topologies
  • 41. Challenge 3: joins with huge join windows Introduced another flavor of the join using external storage • store join window items in Aerospike in-memory storage via a REST API • allows storing and retrieving arbitrary data by key • the API supports batching for performance
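The external-storage flavor of the join can be outlined as below. The slides do not show the REST API, so `KeyValueStore` with `putBatch` and `getBatch` is a hypothetical interface, and `InMemoryStore` is only a stand-in that lets the sketch run without Aerospike or any network call.

```scala
object ExternalJoinSketch {
  import scala.collection.mutable

  // Hypothetical storage interface with batched operations.
  trait KeyValueStore {
    def putBatch(items: Map[String, String]): Unit
    def getBatch(keys: Seq[String]): Map[String, String]
  }

  // In-memory stand-in so that the sketch runs without external services.
  final class InMemoryStore extends KeyValueStore {
    private val data = mutable.Map.empty[String, String]
    var calls = 0 // number of round trips, to show the effect of batching
    def putBatch(items: Map[String, String]): Unit = { calls += 1; data ++= items }
    def getBatch(keys: Seq[String]): Map[String, String] = {
      calls += 1
      keys.flatMap(k => data.get(k).map(k -> _)).toMap
    }
  }

  final class ExternalJoin(store: KeyValueStore, batchSize: Int) {
    private val buffer = mutable.Map.empty[String, String]

    // Left stream: buffer items and write them to the store one batch at a time.
    def feed(key: String, value: String): Unit = {
      buffer(key) = value
      if (buffer.size >= batchSize) flush()
    }
    def flush(): Unit = if (buffer.nonEmpty) { store.putBatch(buffer.toMap); buffer.clear() }

    // Right stream: look up a batch of keys and pair each hit with its right-side value.
    def join(right: Seq[(String, String)]): Seq[(String, String, String)] = {
      val found = store.getBatch(right.map(_._1))
      right.flatMap { case (k, r) => found.get(k).map(l => (k, l, r)) }
    }
  }
}
```

Once an item has been written to the store, the worker no longer has to hold it in memory, which is what makes very long windows practical. Right-stream items without a match are simply dropped in this sketch.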
  • 42. Challenge 3: joins with huge join windows Feeding the data to join window
  • 43. Challenge 3: joins with huge join windows Doing the join
  • 44. Challenge 3: joins with huge join windows Tracking data delays
  • 45. Challenge 3: joins with huge join windows
  • 46. Challenge 3: joins with huge join windows • fewer nuances than with in-memory join • more external components • supports huge join windows • no handling for unjoined right stream items • supports right stream with no continuous throughput (allows pauses)