SlideShare a Scribd company logo
1 of 39
Download to read offline
TUGA IT 2017
LISBON, PORTUGAL
THANK YOU TO OUR SPONSORS
PLATINUM
GOLD SILVER
PARTICIPATING COMMUNITIES
CLOUD
PRO
PT
Event processing with
Apache Storm
Nuno Caneco - Tuga IT - 20/May/2017
Nuno Caneco
Senior Software Engineer @ Talkdesk
/nunocaneco
nuno.caneco@gmail.com
/@nuno.caneco
Who am I
Stream Processing - Why?
WHY
● Data is crucial for business
● New data is always being generated
● Companies want to extract value from
data in “real-time”
USE CASES
● Fraud detection
● Sensor data aggregation
● Live monitoring
What is Storm?
Apache Storm is a free and open source distributed realtime computation system.
Storm makes it easy to reliably process unbounded streams of data, doing for realtime
processing what Hadoop did for batch processing. Storm is simple, can be used with
any programming language, and is a lot of fun to use!
Storm has many use cases: real time analytics, online machine learning, continuous
computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at
over a million tuples processed per second per node. It is scalable, fault-tolerant,
guarantees your data will be processed, and is easy to set up and operate.
http://storm.apache.org/
Under the hood
(A bit of)
Architecture
Nimbus
Master node
Zookeeper
Zookeeper
Cluster
coordination
Supervisor
Supervisor
Supervisor
Supervisor
Cluster
Worker
process
Worker
process
Worker
process
Worker
process
Worker
process
Worker
process
JVM
instances
Concepts
Topology
Topologies combine individual work units to be applied to input data
Spout
Bolt A
Bolt C
Bolt B
data
[Tuple]
[Tuple]
[Tuple]
[Tuple]
[Data out]
Bolt D
[Tuple]
Topology
Spout
● First node of every topology
○ Collects data from the outside world
○ Injects the data on the topology in order to be processed
● Must implement ISpout interface
○ BaseRichSpout is a more convenient abstract class
Spouts
Bolt
● Middle or terminating nodes of a topology
● Implements a Unit of Data Processing
● Each Bolt has an output stream
● Can emit one or more Tuples to other Bolts subscribing the output
stream
● Must implement IBolt
○ BaseRichBolt is a more convenient abstract class
Bolt
Tuple
Hash-alike data structure containing the data that flows between Spouts and Bolts
Data can be accessed by:
● Field index: [{0, “foo”}, {1, “bar”}]
● Key name: [{“foo_key”, “foo”}, {“bar_key”, “bar”}]
Values can be:
● Java primitive types
● String
● byte[]
● Any Serializable object
Example: Alert on monitored words
Monitored
Words Bolt
Collector
Spout
{Message} Split
Sentence
Bolt
{Word, MessageId}
Message
queue
[Message]
Notify User
Bolt
Store event
on DB Bolt
{MonitoredWord,
MessageId}
{MonitoredWord,
MessageId}
Demo
Message Processing Guarantees
Message
is lost
Error
Acknowledging Tuples
● ack(): Tuple was processed successfully
● fail(): Tuple failed to process
Tuples with no ack() nor fail() are automatically replayed
Tuples with fail() will fail up the dependency tree
Acknowledge done right
Ack: Anchoring
public class SplitSentence extends BaseRichBolt {
public void execute(Tuple tuple) {
String sentence = tuple.getString(0);
for(String word: sentence.split(" ")) {
_collector.emit(tuple, new Values(word));
}
_collector.ack(tuple);
}
}
}
Anchors the
output tuple to
the input tupleAcknowledges
the input tuple
Dealing with fail()
Spout
Bolt Bolt
BoltBolt Bolt✅ ✅
✅
✅
✅
✅ Spout
Bolt Bolt
BoltBolt Bolt✅ ✅
✅
❌
❌
❌
Beware!
Storm is designed to scale to process millions of messages per second.
It's design deliberately assumes that some Tuples might be lost.
If your application needs Exactly Once semantics, you should consider using
Trident (will talk about that in a while)
Storm does not ensure
exactly once processing
Demo
Parallelism
Cluster node
Worker Process Worker Process
Cluster Node → 1+ JVM instances
JVM Instance → 1+ Threads
Thread → 1+ Task
Each instance of a Bolt or Spout is a
Task
Thread Thread
Thread Thread
Task Task
Task
Task Task
Task Task Task
Parallelism Example
Config conf = new Config();
conf.setNumWorkers(2); // use two worker processes
topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2);
topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)
.setNumTasks(4)
.shuffleGrouping("blue-spout");
topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6)
.shuffleGrouping("green-bolt");
Stream grouping
● Shuffle grouping: randomly distributed across all downstream Bolts
● Fields grouping: GROUP BY values - Same values of the grouped fields will be
delivered to the same Bolt
● All grouping: The stream is replicated across all the bolt's tasks. Use this grouping
with care
● Direct grouping: The producer of the Tuple must indicate which consumer will
receive the Tuple
● Custom Grouping: When you go NIH
Storm UI - Cluster
Storm UI - Cluster
Storm UI - Topology
Storm UI - Topology
Other features: Trident
Trident is an abstraction layer to manage state across the topology
The state can be kept:
● Internally in the topology - in memory or backed by HDFS
● Externally on a Database - such as Memcached or Cassandra
Other features: Storm SQL
The Storm SQL integration allows users to run SQL queries over streaming data
in Storm.
Cool feature, but still experimental
Q&A
Questions ?
Thank you
/nunocaneco
nuno.caneco@gmail.com
/@nuno.caneco
PLEASE FILL IN EVALUATION FORMS
FRIDAY, MAY 19th SATURDAY, MAY 20th
https://survs.com/survey/cprwce7pi8 https://survs.com/survey/l9kksmlzd8
YOUR OPINION IS IMPORTANT!
THANK YOU TO OUR SPONSORS
PLATINUM
GOLD SILVER
Trident: How it works
1. Tuples are processed as small batches
2. Each batch of tuples is given a unique id called the "transaction id" (txid).
a. If the batch is replayed, it is given the exact same txid.
3. State updates are ordered among batches. That is, the state updates for
batch 3 won't be applied until the state updates for batch 2 have succeeded.
Trident: Transactional Spout
Trident
Store
java => [count=5, txid=1]
kotlin => [count=8, txid=2]
csharp => [count=10, txid=3]
["kotlin"]
["kotlin"]
["csharp"]
Batch txid=3
Trident
Store
java => [count=5, txid=1]
kotlin => [count=10, txid=3]
csharp => [count=10, txid=3]
"kotlin" += 2
"csharp" += 0

More Related Content

What's hot

Bsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdBsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdScott Tsai
 
Short introduction to Storm
Short introduction to StormShort introduction to Storm
Short introduction to StormJimmyZoger
 
Building Conclave: a decentralized, real-time collaborative text editor
Building Conclave: a decentralized, real-time collaborative text editorBuilding Conclave: a decentralized, real-time collaborative text editor
Building Conclave: a decentralized, real-time collaborative text editorSun-Li Beatteay
 
EKAW - Linked Data Publishing
EKAW - Linked Data PublishingEKAW - Linked Data Publishing
EKAW - Linked Data PublishingRuben Taelman
 
Be a Zen monk, the Python way
Be a Zen monk, the Python wayBe a Zen monk, the Python way
Be a Zen monk, the Python waySriram Murali
 
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Taiwan User Group
 
BWB Meetup: Storm - distributed realtime computation system
BWB Meetup: Storm - distributed realtime computation systemBWB Meetup: Storm - distributed realtime computation system
BWB Meetup: Storm - distributed realtime computation systemAndrii Gakhov
 
Clustering_Algorithm_DR
Clustering_Algorithm_DRClustering_Algorithm_DR
Clustering_Algorithm_DRNguyen Tran
 

What's hot (9)

Bsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdBsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsd
 
Short introduction to Storm
Short introduction to StormShort introduction to Storm
Short introduction to Storm
 
Building Conclave: a decentralized, real-time collaborative text editor
Building Conclave: a decentralized, real-time collaborative text editorBuilding Conclave: a decentralized, real-time collaborative text editor
Building Conclave: a decentralized, real-time collaborative text editor
 
Data structure
Data structureData structure
Data structure
 
EKAW - Linked Data Publishing
EKAW - Linked Data PublishingEKAW - Linked Data Publishing
EKAW - Linked Data Publishing
 
Be a Zen monk, the Python way
Be a Zen monk, the Python wayBe a Zen monk, the Python way
Be a Zen monk, the Python way
 
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-OnApache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
Apache Flink Training Workshop @ HadoopCon2016 - #2 DataSet API Hands-On
 
BWB Meetup: Storm - distributed realtime computation system
BWB Meetup: Storm - distributed realtime computation systemBWB Meetup: Storm - distributed realtime computation system
BWB Meetup: Storm - distributed realtime computation system
 
Clustering_Algorithm_DR
Clustering_Algorithm_DRClustering_Algorithm_DR
Clustering_Algorithm_DR
 

Similar to Tuga it 2017 - Event processing with Apache Storm

Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and howPetr Zapletal
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
Fletcher Framework for Programming FPGA
Fletcher Framework for Programming FPGAFletcher Framework for Programming FPGA
Fletcher Framework for Programming FPGAGanesan Narayanasamy
 
Interconnection Automation For All - Extended - MPS 2023
Interconnection Automation For All - Extended - MPS 2023Interconnection Automation For All - Extended - MPS 2023
Interconnection Automation For All - Extended - MPS 2023Chris Grundemann
 
JConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaJConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaTimothy Spann
 
IoT Story: From Edge to HDP
IoT Story: From Edge to HDPIoT Story: From Edge to HDP
IoT Story: From Edge to HDPDataWorks Summit
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
 
Log Management Systems
Log Management SystemsLog Management Systems
Log Management SystemsMehdi Hamidi
 
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...HostedbyConfluent
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user groupAdam Doyle
 
Being HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on PurposeBeing HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on PurposeAman Kohli
 
Season 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientistsaspyker
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Miguel Pérez Colino
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteStreamNative
 
MACHBASE_NEO
MACHBASE_NEOMACHBASE_NEO
MACHBASE_NEOMACHBASE
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the Worldjhugg
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
 
Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]LivePerson
 

Similar to Tuga it 2017 - Event processing with Apache Storm (20)

Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
FLiP Into Trino
FLiP Into TrinoFLiP Into Trino
FLiP Into Trino
 
Fletcher Framework for Programming FPGA
Fletcher Framework for Programming FPGAFletcher Framework for Programming FPGA
Fletcher Framework for Programming FPGA
 
Interconnection Automation For All - Extended - MPS 2023
Interconnection Automation For All - Extended - MPS 2023Interconnection Automation For All - Extended - MPS 2023
Interconnection Automation For All - Extended - MPS 2023
 
TAU on Power 9
TAU on Power 9TAU on Power 9
TAU on Power 9
 
JConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaJConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with Java
 
IoT Story: From Edge to HDP
IoT Story: From Edge to HDPIoT Story: From Edge to HDP
IoT Story: From Edge to HDP
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
 
Log Management Systems
Log Management SystemsLog Management Systems
Log Management Systems
 
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
Next Gen Data Modeling in the Open Data Platform With Doron Porat and Liran Y...
 
Data engineering Stl Big Data IDEA user group
Data engineering   Stl Big Data IDEA user groupData engineering   Stl Big Data IDEA user group
Data engineering Stl Big Data IDEA user group
 
Being HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on PurposeBeing HAPI! Reverse Proxying on Purpose
Being HAPI! Reverse Proxying on Purpose
 
Season 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data ScientistsSeason 7 Episode 1 - Tools for Data Scientists
Season 7 Episode 1 - Tools for Data Scientists
 
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
Red Hat Summit 2017 - LT107508 - Better Managing your Red Hat footprint with ...
 
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 KeynoteAdvanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
Advanced Stream Processing with Flink and Pulsar - Pulsar Summit NA 2021 Keynote
 
MACHBASE_NEO
MACHBASE_NEOMACHBASE_NEO
MACHBASE_NEO
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data Lakes
 
Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]Apache Avro in LivePerson [Hebrew]
Apache Avro in LivePerson [Hebrew]
 

More from Nuno Caneco

Building resilient applications
Building resilient applicationsBuilding resilient applications
Building resilient applicationsNuno Caneco
 
Stateful mock servers to the rescue on REST ecosystems
Stateful mock servers to the rescue on REST ecosystemsStateful mock servers to the rescue on REST ecosystems
Stateful mock servers to the rescue on REST ecosystemsNuno Caneco
 
Git from the trenches
Git from the trenchesGit from the trenches
Git from the trenchesNuno Caneco
 
Tuga IT 2017 - Redis
Tuga IT 2017 - RedisTuga IT 2017 - Redis
Tuga IT 2017 - RedisNuno Caneco
 
Fullstack LX - Improving your application performance
Fullstack LX - Improving your application performanceFullstack LX - Improving your application performance
Fullstack LX - Improving your application performanceNuno Caneco
 
Running agile on a non-agile environment
Running agile on a non-agile environmentRunning agile on a non-agile environment
Running agile on a non-agile environmentNuno Caneco
 
Introducing redis
Introducing redisIntroducing redis
Introducing redisNuno Caneco
 
Tuga it 2016 improving your application performance
Tuga it 2016   improving your application performanceTuga it 2016   improving your application performance
Tuga it 2016 improving your application performanceNuno Caneco
 

More from Nuno Caneco (8)

Building resilient applications
Building resilient applicationsBuilding resilient applications
Building resilient applications
 
Stateful mock servers to the rescue on REST ecosystems
Stateful mock servers to the rescue on REST ecosystemsStateful mock servers to the rescue on REST ecosystems
Stateful mock servers to the rescue on REST ecosystems
 
Git from the trenches
Git from the trenchesGit from the trenches
Git from the trenches
 
Tuga IT 2017 - Redis
Tuga IT 2017 - RedisTuga IT 2017 - Redis
Tuga IT 2017 - Redis
 
Fullstack LX - Improving your application performance
Fullstack LX - Improving your application performanceFullstack LX - Improving your application performance
Fullstack LX - Improving your application performance
 
Running agile on a non-agile environment
Running agile on a non-agile environmentRunning agile on a non-agile environment
Running agile on a non-agile environment
 
Introducing redis
Introducing redisIntroducing redis
Introducing redis
 
Tuga it 2016 improving your application performance
Tuga it 2016   improving your application performanceTuga it 2016   improving your application performance
Tuga it 2016 improving your application performance
 

Recently uploaded

UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewDianaGray10
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...FIDO Alliance
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024Stephen Perrenod
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfFIDO Alliance
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxFIDO Alliance
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Hiroshi SHIBATA
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...ScyllaDB
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...FIDO Alliance
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!Memoori
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfalexjohnson7307
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...FIDO Alliance
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 

Recently uploaded (20)

UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdfWhere to Learn More About FDO _ Richard at FIDO Alliance.pdf
Where to Learn More About FDO _ Richard at FIDO Alliance.pdf
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!State of the Smart Building Startup Landscape 2024!
State of the Smart Building Startup Landscape 2024!
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 

Tuga it 2017 - Event processing with Apache Storm

  • 2. THANK YOU TO OUR SPONSORS PLATINUM GOLD SILVER
  • 4. Event processing with Apache Storm Nuno Caneco - Tuga IT - 20/May/2017
  • 5. Nuno Caneco Senior Software Engineer @ Talkdesk /nunocaneco nuno.caneco@gmail.com /@nuno.caneco Who am I
  • 6. Stream Processing - Why? WHY ● Data is crucial for business ● New data is always being generated ● Companies want to extract value from data in “real-time” USE CASES ● Fraud detection ● Sensor data aggregation ● Live monitoring
  • 7. What is Storm? Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use! Storm has many use cases: real time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. http://storm.apache.org/
  • 11. Topology Topologies combine individual work units to be applied to input data Spout Bolt A Bolt C Bolt B data [Tuple] [Tuple] [Tuple] [Tuple] [Data out] Bolt D [Tuple] Topology
  • 12. Spout ● First node of every topology ○ Collects data from the outside world ○ Injects the data on the topology in order to be processed ● Must implement ISpout interface ○ BaseRichSpout is a more convenient abstract class Spouts
  • 13. Bolt ● Middle or terminating nodes of a topology ● Implements a Unit of Data Processing ● Each Bolt has an output stream ● Can emit one or more Tuples to other Bolts subscribing the output stream ● Must implement IBolt ○ BaseRichBolt is a more convenient abstract class Bolt
  • 14. Tuple Hash-alike data structure containing the data that flows between Spouts and Bolts Data can be accessed by: ● Field index: [{0, “foo”}, {1, “bar”}] ● Key name: [{“foo_key”, “foo”}, {“bar_key”, “bar”}] Values can be: ● Java primitive types ● String ● byte[] ● Any Serializable object
  • 15. Example: Alert on monitored words Monitored Words Bolt Collector Spout {Message} Split Sentence Bolt {Word, MessageId} Message queue [Message] Notify User Bolt Store event on DB Bolt {MonitoredWord, MessageId} {MonitoredWord, MessageId}
  • 16. Demo
  • 18. Acknowledging Tuples ● ack(): Tuple was processed successfully ● fail(): Tuple failed to process Tuples with no ack() nor fail() are automatically replayed Tuples with fail() will fail up the dependency tree
  • 20. Ack: Anchoring public class SplitSentence extends BaseRichBolt { public void execute(Tuple tuple) { String sentence = tuple.getString(0); for(String word: sentence.split(" ")) { _collector.emit(tuple, new Values(word)); } _collector.ack(tuple); } } } Anchors the output tuple to the input tupleAcknowledges the input tuple
  • 21. Dealing with fail() Spout Bolt Bolt BoltBolt Bolt✅ ✅ ✅ ✅ ✅ ✅ Spout Bolt Bolt BoltBolt Bolt✅ ✅ ✅ ❌ ❌ ❌
  • 22. Beware! Storm is designed to scale to process millions of messages per second. It's design deliberately assumes that some Tuples might be lost. If your application needs Exactly Once semantics, you should consider using Trident (will talk about that in a while) Storm does not ensure exactly once processing
  • 23. Demo
  • 24. Parallelism Cluster node Worker Process Worker Process Cluster Node → 1+ JVM instances JVM Instance → 1+ Threads Thread → 1+ Task Each instance of a Bolt or Spout is a Task Thread Thread Thread Thread Task Task Task Task Task Task Task Task
  • 25. Parallelism Example Config conf = new Config(); conf.setNumWorkers(2); // use two worker processes topologyBuilder.setSpout("blue-spout", new BlueSpout(), 2); topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2) .setNumTasks(4) .shuffleGrouping("blue-spout"); topologyBuilder.setBolt("yellow-bolt", new YellowBolt(), 6) .shuffleGrouping("green-bolt");
  • 26. Stream grouping ● Shuffle grouping: randomly distributed across all downstream Bolts ● Fields grouping: GROUP BY values - Same values of the grouped fields will be delivered to the same Bolt ● All grouping: The stream is replicated across all the bolt's tasks. Use this grouping with care ● Direct grouping: The producer of the Tuple must indicate which consumer will receive the Tuple ● Custom Grouping: When you go NIH
  • 27. Storm UI - Cluster
  • 28. Storm UI - Cluster
  • 29. Storm UI - Topology
  • 30. Storm UI - Topology
  • 31. Other features: Trident Trident is an abstraction layer to manage state across the topology The state can be kept: ● Internally in the topology - in memory or backed by HDFS ● Externally on a Database - such as Memcached or Cassandra
  • 32. Other features: Storm SQL The Storm SQL integration allows users to run SQL queries over streaming data in Storm. Cool feature, but still experimental
  • 35. PLEASE FILL IN EVALUATION FORMS FRIDAY, MAY 19th SATURDAY, MAY 20th https://survs.com/survey/cprwce7pi8 https://survs.com/survey/l9kksmlzd8 YOUR OPINION IS IMPORTANT!
  • 36. THANK YOU TO OUR SPONSORS PLATINUM GOLD SILVER
  • 37.
  • 38. Trident: How it works 1. Tuples are processed as small batches 2. Each batch of tuples is given a unique id called the "transaction id" (txid). a. If the batch is replayed, it is given the exact same txid. 3. State updates are ordered among batches. That is, the state updates for batch 3 won't be applied until the state updates for batch 2 have succeeded.
  • 39. Trident: Transactional Spout Trident Store java => [count=5, txid=1] kotlin => [count=8, txid=2] csharp => [count=10, txid=3] ["kotlin"] ["kotlin"] ["csharp"] Batch txid=3 Trident Store java => [count=5, txid=1] kotlin => [count=10, txid=3] csharp => [count=10, txid=3] "kotlin" += 2 "csharp" += 0