SlideShare a Scribd company logo
1 of 34
STORM
Buckle up Dorothy !!!
Distributed real-time computation
ABOUT
By Nathan Marz
Backtype => Twitter => Apache
Real-time analytics
WHAT IS IT GOOD FOR?
Online machine learning
Continuous computation
Distributed RPC
ETL (Extract, Transform, Load)
…
No data loss
Fault-tolerantScalable
PROMISES
Robust
VIEW FROM ABOVE
StorageTopology
Stream
Source
Storm Cluster
Pull
(Kafka,*
MQ, …)
Read/Write
PRIMITIVES
Field 1 /
Value 1
Field 2 /
Value 2
Field 3 /
Value 3
Field 4 /
Value 4
Field 5 /
Value 5
Tuple
Tuple Tuple Tuple Tuple
Stream
Topology
Bolt
PRIMITIVES
Spout
Bolt
Spout
Bolt
Bolt
ABSTRACTION
PRIMITIVES
Tuples
Filters
Transformation
Incremental
Distributed
Scalable
Functions
Joins
Chaining streams
Small components
EFFECTS
Spouts
Bolts
CLUSTER
Nimbus Zookeeper Cluster
Worker Node
Executor
Supervisor
Executor
Executor
Worker Node
Executor
Supervisor
Executor
Executor
Worker Node
Executor
Supervisor
Executor
Executor
NIMBUS / NODES
CLUSTER
Small
No state
Communication
State
RobustKill / Restart easy
ZOOKEEPER
No data loss
Fault-tolerantScalable
AS PROMISED?
Robust
GUARANTEES
Message transforms into a tuple tree
Storm tracks tuple tree
Fully processed when tree exhausted
FAILURES
Task died – failed tuples replayed
Acker task died – related tuples
timeout and are replayed
Spout task died – source replays, e.g.
pending messages are placed back on
the queue
WHAT DO I HAVE TO DO?
Inform about new links in tree
Inform when finished with a tuple
Every tuple must be acked or failed
TRIDENT
ANYTHING SIMPLER?
High level abstraction
Stateful persistence primitives
Exactly-once semantics
AS PROMISED?
YES
USER DASHBOARD
PROBLEM
Bad performance
Uses core storage
Pre-compute
Customize
Fast
IDEA
Isolate
Quarterly agg.
ARCHITECTURE
Core
Events
Queue
Kafka
4 Partitions
2 Replicas
Storm
4 Workers
MS SQL
4 Staging
Dashboard
Push
Pull Write
Read
State in source
KAFKA
9
8
7
6
5
4
3
2
1
New
Client
Topic Stacked
Flushed
Client offset
Replicated
Old
Partitioned
Fast
TRANSFORMATION
ORIGINAL
{
id: df45er87c78df,
sender: “Info”,
destination: “39345123456”,
parts: 2,
price: 100,
client: “Demo”,
time: “2014-06-02 14:47:58”,
country: “IT”,
network: “Wind”,
type: “SMS”,
…
}
{
client: “Demo”,
type: “SMS”,
country: “IT”,
network: “Wind”,
bucket: “2014-06-02 14:45:00”,
traffic: 2,
expenses: 200
}
COMPUTED
CODE
TridentState tridentState = topology
.newStream("CoreEvents", buildKafkaSpout())
.parallelismHint(4)
.each(
new Fields("bytes"),
new CoreEventMessageParser(),
new Fields("time", "client", "network", "country", "type", "parts", "price"))
.each(
new Fields("time"),
new QuarterTimeBucket(),
new Fields("bucket"))
.project(new Fields("bucket", "client", "network", "country", "type", "traffic", "expenses“))
.groupBy(new Fields("bucket", "client", "network", "country", "type"))
.persistentAggregate(getStateFactory(),
new Fields("traffic", "expenses"),
new Sum(),
new Fields("trafficExpenses"))
.parallelismHint(8);
PERFORMANCE
1.500
PEAKREGULAR
KAFKA 60.000
4.500 160.000
STORAGE 2.000 10.000
DASHBOARD 1 1
TUNING STORAGE
1st Issue - Storage
Random access – 1.500 w/s limit
Staged approach – 30.000 w/s limit
No locks – isolated
Scalable – each worker it’s stage
Main table indexing nicely
Doesn’t affect reading
STAGED WRITES
Worker 1
Main
Table
Merge
Worker 2
Stage
Table 1
Stage
Table 2
MergeWrite
Write
TUNING TOPOLOGY
2nd Issue - Serialization
0
20,000
40,000
60,000
80,000
100,000
120,000
140,000
160,000
Raw/s Expanded/s Writes/s
200 KB
1 MB
4 MB
8 MB
16 MB
24 MB
Plateauing
SERIALIZATION
0
200
400
600
800
1,000
1,200
S [s] S [byte] S [% CPU] D [s] D [% CPU]
CSV (Plain)
CSV (Deflate)
CSV (GZip)
Jackson (Plain)
Jackson (GZip)
Jackson Smile
Java Object
Kryo
MEASURE
AXIS
Max spout pending
SQL workers
Kafka fetch speed
DB write speed
Kafka / DB ratio
Capacity
DB batch size
Kafka fetch size
Latency
METRICS
Serialization
…
MONITOR
STORM UI TOPOLOGY
METRICS
GRAPHITE
GOTCHAS
Version 0.9.1
Partially in flux
Kafka integration
Message & topology versioning
Performance tuning
Lambda Architecture
NEXT?
Master
Dataset
Real-time Views
Serving LayerBatch Layer
Speed Layer
New
Data
Query
Query
Batch Views
http://storm.incubator.apache.org
RESOURCES
http://lambda-architecture.net
http://kafka.apache.org
http://www.gimp.org
PRESENTATION TOOLS
http://www.pictaculous.com
http://www.colourlovers.com
http://www.easycalculation.com
http://paletton.com
QUESTIONS?

More Related Content

What's hot

SCaml compiler
SCaml compilerSCaml compiler
SCaml compilerJun Furuse
 
Concurrency in Distributed Systems : Leslie Lamport papers
Concurrency in Distributed Systems : Leslie Lamport papersConcurrency in Distributed Systems : Leslie Lamport papers
Concurrency in Distributed Systems : Leslie Lamport papersSubhajit Sahu
 
R workshop xx -- Parallel Computing with R
R workshop xx -- Parallel Computing with R R workshop xx -- Parallel Computing with R
R workshop xx -- Parallel Computing with R Vivian S. Zhang
 
Optimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleOptimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleStefan Marr
 

What's hot (7)

SCaml compiler
SCaml compilerSCaml compiler
SCaml compiler
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Concurrency in Distributed Systems : Leslie Lamport papers
Concurrency in Distributed Systems : Leslie Lamport papersConcurrency in Distributed Systems : Leslie Lamport papers
Concurrency in Distributed Systems : Leslie Lamport papers
 
Clojure
ClojureClojure
Clojure
 
R workshop xx -- Parallel Computing with R
R workshop xx -- Parallel Computing with R R workshop xx -- Parallel Computing with R
R workshop xx -- Parallel Computing with R
 
Ns2
Ns2Ns2
Ns2
 
Optimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with TruffleOptimizing Communicating Event-Loop Languages with Truffle
Optimizing Communicating Event-Loop Languages with Truffle
 

Similar to Storm overview & integration

Server side JavaScript: going all the way
Server side JavaScript: going all the wayServer side JavaScript: going all the way
Server side JavaScript: going all the wayOleg Podsechin
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustEvan Chan
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEventsNeil Avery
 
Capacity Planning for Linux Systems
Capacity Planning for Linux SystemsCapacity Planning for Linux Systems
Capacity Planning for Linux SystemsRodrigo Campos
 
NET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptxNET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptxpetabridge
 
Workshop apache camel
Workshop apache camelWorkshop apache camel
Workshop apache camelMarko Seifert
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesOleksii Diagiliev
 
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Spark Summit
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingYaroslav Tkachenko
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
Design and Performance Characteristics of Tap-as-a-Service
Design and Performance Characteristics of Tap-as-a-ServiceDesign and Performance Characteristics of Tap-as-a-Service
Design and Performance Characteristics of Tap-as-a-Servicesoichi shigeta
 
Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0Anyscale
 
Overview Of Parallel Development - Ericnel
Overview Of Parallel Development -  EricnelOverview Of Parallel Development -  Ericnel
Overview Of Parallel Development - Ericnelukdpe
 
Kafka Streams: Revisiting the decisions of the past (How I could have made it...
Kafka Streams: Revisiting the decisions of the past (How I could have made it...Kafka Streams: Revisiting the decisions of the past (How I could have made it...
Kafka Streams: Revisiting the decisions of the past (How I could have made it...confluent
 

Similar to Storm overview & integration (20)

Server side JavaScript: going all the way
Server side JavaScript: going all the wayServer side JavaScript: going all the way
Server side JavaScript: going all the way
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Venkat ns2
Venkat ns2Venkat ns2
Venkat ns2
 
Serverless London 2019 FaaS composition using Kafka and CloudEvents
Serverless London 2019   FaaS composition using Kafka and CloudEventsServerless London 2019   FaaS composition using Kafka and CloudEvents
Serverless London 2019 FaaS composition using Kafka and CloudEvents
 
Capacity Planning for Linux Systems
Capacity Planning for Linux SystemsCapacity Planning for Linux Systems
Capacity Planning for Linux Systems
 
Ns network simulator
Ns network simulatorNs network simulator
Ns network simulator
 
NET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptxNET Systems Programming Learned the Hard Way.pptx
NET Systems Programming Learned the Hard Way.pptx
 
Ns2
Ns2Ns2
Ns2
 
Workshop apache camel
Workshop apache camelWorkshop apache camel
Workshop apache camel
 
Real-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpacesReal-Time Big Data with Storm, Kafka and GigaSpaces
Real-Time Big Data with Storm, Kafka and GigaSpaces
 
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
Drizzle—Low Latency Execution for Apache Spark: Spark Summit East talk by Shi...
 
Ns2
Ns2Ns2
Ns2
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 
dfl
dfldfl
dfl
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Design and Performance Characteristics of Tap-as-a-Service
Design and Performance Characteristics of Tap-as-a-ServiceDesign and Performance Characteristics of Tap-as-a-Service
Design and Performance Characteristics of Tap-as-a-Service
 
Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0Continuous Application with Structured Streaming 2.0
Continuous Application with Structured Streaming 2.0
 
Storm is coming
Storm is comingStorm is coming
Storm is coming
 
Overview Of Parallel Development - Ericnel
Overview Of Parallel Development -  EricnelOverview Of Parallel Development -  Ericnel
Overview Of Parallel Development - Ericnel
 
Kafka Streams: Revisiting the decisions of the past (How I could have made it...
Kafka Streams: Revisiting the decisions of the past (How I could have made it...Kafka Streams: Revisiting the decisions of the past (How I could have made it...
Kafka Streams: Revisiting the decisions of the past (How I could have made it...
 

Recently uploaded

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Storm overview & integration