Dancing with Stream Processing
Y.S. Horawalavithana
sameera1@mail.usf.edu
07/10/16 MSc. Distributed Systems 2
● Motivation
● Event Stream Processing
– Pub/sub, CEP, “Buzzwords”
– Stream processing engines
● Spark Streaming, Storm, etc.
● Graph Stream Processing
– Theory... {“sketching”, “spanners”, “sparsifiers”}
– Challenges
● Discussion !!
Lightning Talk
Motivation
The Streaming Era
● Today, most data is produced continuously
- User activity logs, web logs, sensors, database transactions, …
● The common approach to analyzing such data so far
- Record the data stream to stable storage (DBMS, HDFS, …)
- Periodically analyze the data with a batch processing engine (DBMS, MapReduce, …)
● Stream processing engines analyze data as it arrives
The Streaming Era (Contd.)
● Decreases the overall latency to obtain results
- No need to persist data in stable storage
- No periodic batch analysis jobs
● Simplifies the data infrastructure
- Fewer moving parts to be maintained and coordinated
● Makes time dimension of data explicit
- Each event has a timestamp
- Data can be processed based on timestamps
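As a rough illustration of processing by timestamp, the sketch below counts events per fixed-size (tumbling) window, using each event's own timestamp rather than its arrival order. The function name, the event layout and the 10-second window are assumptions made for this example; they are not taken from any particular engine.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Count events per tumbling window, keyed by window start time.

    `events` is an iterable of (timestamp_seconds, payload) pairs. Each event
    is assigned to a window by its own timestamp, not by its arrival order.
    """
    counts = defaultdict(int)
    for timestamp, _payload in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)

# Events arrive out of order, yet each lands in the window of its timestamp.
events = [(3, "click"), (12, "view"), (7, "click"), (11, "click"), (25, "view")]
window_counts = tumbling_window_counts(events, window_seconds=10)
print(sorted(window_counts.items()))
```

A real engine would also have to decide when a window is complete (for example with watermarks); this sketch ignores that and simply counts everything it is given.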
Event Streams
[Immutable]
Web Page Event
Wikipedia Page Update Event
LinkedIn User Update Event
Middleware
Point-to-point communication
● Direct coupling
● Strict identity
● Time coupling
● Not good for volatile environments
● Not a good way to communicate with several participants
Indirect communication
● Space uncoupling
● Anonymity
● Time uncoupling
● Independent lifetimes between parties
● Through a persistent communication channel
Taxonomy
Indirect communication
● Communication based
– Group communication
– Message queues
– Publish/subscribe
● State based
– Tuple spaces
– Distributed shared memory
Pub/Sub Messaging Pattern
Topic-based
- Each event belongs to a number of topics (e.g. “music”, “sport”)
- Users subscribe to topics and receive all relevant events
Content-based
- Users subscribe to the actual content of the events, or to a structured summary of it
- More expressive
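The difference between the two subscription styles can be sketched in a few lines. The event layout (a dict with a `topics` list plus arbitrary fields) and both function names are assumptions made for this illustration.

```python
def matches_topic(subscribed_topics, event):
    """Topic-based: deliver if the event carries any subscribed topic."""
    return bool(set(subscribed_topics) & set(event["topics"]))

def matches_content(predicate, event):
    """Content-based: deliver if a predicate over the event's fields holds."""
    return predicate(event)

event = {"topics": ["music"], "artist": "Miles Davis", "price": 9.5}

# A topic subscriber only names topics.
topic_hit = matches_topic(["music", "sport"], event)

# A content subscriber can constrain any field, which is why it is more expressive.
content_hit = matches_content(
    lambda e: e["artist"] == "Miles Davis" and e["price"] < 10, event
)
content_miss = matches_content(lambda e: e["price"] < 5, event)
```

Here the content-based subscriptions are arbitrary Python predicates; practical systems usually restrict them to a structured form (such as attribute/operator/value constraints) so that they can be indexed.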
Pub/Sub Activities
● Subscription processing
– Indexing and storing subscriptions
● Event Stream Processing (ESP)
– Pub/sub approach: upon arrival of an event, access the subscription index and identify all matching subscriptions
● Event delivery
– Deliver the event to clients with matching subscriptions
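The three activities above can be sketched as a tiny in-memory, topic-based broker. `TopicBroker` and its method names are hypothetical, chosen for this example only; a real broker would add persistence, networking and content-based indexes.

```python
from collections import defaultdict

class TopicBroker:
    """Toy broker: subscription index, event matching, and event delivery."""

    def __init__(self):
        # Subscription processing: topic -> list of (subscriber_id, callback).
        self._index = defaultdict(list)

    def subscribe(self, subscriber_id, topic, callback):
        self._index[topic].append((subscriber_id, callback))

    def publish(self, topic, event):
        # Event stream processing: look up the matching subscriptions.
        matched = self._index.get(topic, [])
        # Event delivery: hand the event to every matched subscriber.
        for _subscriber_id, callback in matched:
            callback(event)
        return [subscriber_id for subscriber_id, _ in matched]

broker = TopicBroker()
inbox_alice, inbox_bob = [], []
broker.subscribe("alice", "music", inbox_alice.append)
broker.subscribe("bob", "sport", inbox_bob.append)
broker.subscribe("bob", "music", inbox_bob.append)

delivered_to = broker.publish("music", {"title": "New album"})
nobody = broker.publish("weather", {"title": "Rain"})
```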
Event Stream Processing (ESP)
Wikipedia
Today's world...
Pub/sub ≈ ESP ≈
“Buzzwords”
https://martin.kleppmann.com/2015/01/29/stream-processing-event-sourcing-reactive-cep.html
Complex Event Processing (CEP)
● A set of event processing principles
● Match patterns of events
– Comparable to SQL queries
– High-level query language
● Cloud of causally related events
– POSET (Partially Ordered Set of Events)
Complex Event Processing (CEP)
● Some CEP Examples:
– When 2 transactions happen on an account from
radically different geographic locations within a
certain time window then report as potential fraud.
– When a gold customer's trouble ticket is not
resolved within 1 hour, then escalate.
– When a team meeting request overlaps with my
lunch break, then deny the team meeting and
demote the meeting organizer.
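The first example (potential fraud) can be written as a small pattern matcher over an ordered transaction stream. This is a sketch under simplifying assumptions: "radically different geographic locations" is reduced to "different location labels", events arrive in timestamp order, and the function name, tuple layout and 600-second window are made up for the illustration. A CEP engine would express the same rule in its own query language.

```python
from collections import defaultdict, deque

def detect_fraud(transactions, window_seconds):
    """Yield (account, earlier, later) when two transactions on one account
    come from different locations within `window_seconds` of each other.

    `transactions` is an iterable of (timestamp, account, location) tuples,
    assumed to be in timestamp order.
    """
    recent = defaultdict(deque)  # account -> recent (timestamp, location) pairs
    for timestamp, account, location in transactions:
        history = recent[account]
        # Evict transactions that have fallen out of the time window.
        while history and timestamp - history[0][0] > window_seconds:
            history.popleft()
        for earlier_ts, earlier_location in history:
            if earlier_location != location:
                yield (account, (earlier_ts, earlier_location), (timestamp, location))
        history.append((timestamp, location))

transactions = [
    (0, "acct-1", "Colombo"),
    (100, "acct-2", "Tampa"),
    (200, "acct-1", "Tampa"),     # 200 s after Colombo: inside the window
    (5000, "acct-1", "Colombo"),  # too late to pair with the earlier ones
]
alerts = list(detect_fraud(transactions, window_seconds=600))
```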
ESP and CEP
[Timeline, from before 2000 to 2016]
● 1989 - 1995: Rapide
● 2002: Aurora
● 2003: Medusa
● 2005: Borealis
● Also shown: STREAM, TelegraphCQ, Esper, Apama, StreamBase, SQLStream, WSO2 CEP
ESP vs. CEP
http://www.slideshare.net/TimBassCEP/mythbusters-event-stream-processing-v-complex-event-processing-presentation
Today's world...
ESP ≈ CEP ≈
Laundry of “Buzzwords”
● Actor Frameworks
– Better mechanism to handle concurrency
– E.g. Akka, Orleans and Erlang OTP
● “Reactive”
– Language semantics for bringing event streams to the user interface
– Responsive, Resilient, Elastic and Message Driven
– E.g. Data flow languages, Functional reactive programming
● Event Sourcing
● Change Data Capture (CDC)
Analytics ≈ Stream Transformations
https://martin.kleppmann.com/2015/01/29/stream-processing-event-sourcing-reactive-cep.html
Target
● Better Scalability
● High Throughput
● Low latency
● Powerful semantics
● Easy integration
via low-level stream processing frameworks !!
Spark Streaming
● General-purpose computing engine to run batch, interactive and streaming jobs
● Based on Resilient Distributed Datasets (RDD)
– Restricted form of distributed shared memory
– Immutable
– Can only be built through deterministic transformations
● Efficient fault recovery using the lineage graph
– Recompute lost partitions on failure
– No cost if nothing fails
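To make lineage-based recovery concrete, here is a toy model in plain Python. It is not Spark's API: `MiniRDD`, `compute` and `lose_partition` are hypothetical names for this sketch. Each dataset remembers its parent and the deterministic transformation that produced it, so a lost partition can be rebuilt on demand instead of being replicated up front.

```python
class MiniRDD:
    """A toy immutable dataset that remembers how it was derived (its lineage)."""

    def __init__(self, partitions=None, parent=None, transform=None):
        self._source = partitions    # only set for base datasets
        self._parent = parent        # lineage: the dataset this one derives from
        self._transform = transform  # deterministic per-partition function
        self._cache = {}             # partition index -> materialized data

    def map(self, fn):
        """Return a new dataset; the original is never modified."""
        return MiniRDD(parent=self, transform=lambda part: [fn(x) for x in part])

    def compute(self, index):
        """Return one partition, recomputing it from lineage if it is missing."""
        if index not in self._cache:
            if self._parent is None:
                self._cache[index] = list(self._source[index])
            else:
                self._cache[index] = self._transform(self._parent.compute(index))
        return self._cache[index]

    def lose_partition(self, index):
        """Simulate a node failure that drops one materialized partition."""
        self._cache.pop(index, None)

base = MiniRDD(partitions=[[1, 2], [3, 4]])
squared = base.map(lambda x: x * x)
before = squared.compute(1)
squared.lose_partition(1)
after = squared.compute(1)  # rebuilt from the parent via the recorded transform
```

Only the lost partition is recomputed, and nothing extra happens when no failure occurs, which mirrors the "no cost if nothing fails" point.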
Spark Streaming (Contd.)
[Key concepts]
● DStream – sequence of RDDs representing a stream of data
– Input sources: HDFS, Kafka, Flume, ZeroMQ, Akka actors, TCP sockets
● Transformations – modify data from one DStream to another
– Standard RDD operations – map, countByValue, reduce, join, …
– Stateful operations – window, countByValueAndWindow, …
● Output operations – send data to an external entity
– saveAsHadoopFiles – saves to HDFS
– foreach – do anything with each batch of results
Spark Streaming (Contd.)
● Run a streaming computation as a series of very small, deterministic batch jobs
– Chop up the live stream into batches of X seconds
– Spark treats each batch of data as an RDD and processes it using RDD operations
– Finally, the processed results of the RDD operations are returned in batches
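The micro-batching idea can be sketched without Spark. The helper below is an assumption-laden illustration, not Spark Streaming code: it cuts a time-ordered stream into consecutive batches of `batch_seconds` and then applies an ordinary batch computation (a per-batch count) to each one.

```python
from collections import Counter

def micro_batches(events, batch_seconds):
    """Group time-ordered (timestamp, value) events into consecutive batches."""
    batch, batch_end = [], None
    for timestamp, value in events:
        if batch_end is None:
            # Align the first batch boundary to a multiple of batch_seconds.
            batch_end = (timestamp // batch_seconds + 1) * batch_seconds
        while timestamp >= batch_end:
            yield batch  # may be empty if no events fell into this interval
            batch, batch_end = [], batch_end + batch_seconds
        batch.append(value)
    if batch:
        yield batch

events = [(0.2, "a"), (0.7, "b"), (1.1, "a"), (3.4, "c")]
batches = list(micro_batches(events, batch_seconds=1))

# Each batch is processed with a plain batch operation, one result per batch.
results = [dict(Counter(batch)) for batch in batches]
```

The batch interval is the latency floor of this model: a result for an event is available only once its batch has closed.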
Berkeley Data Stack
Spark 2.0 is coming !!
Apache Storm
[Key concepts]
● Tuple
– Core Unit of Data
– Immutable Set of Key/Value Pairs
● Spouts
– Source of Streams
– Wraps a streaming data source and emits Tuples
● Bolts
– Core functions of a streaming computation
– Receive tuples and do stuff
– Optionally emit additional tuples
Apache Storm
[Key concepts]
● Topology
– DAG of spouts and bolts
– Data flow representation
– Streaming computation
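The spout/bolt/topology vocabulary can be mimicked with Python generators. This is only an analogy and does not use Storm's API: the function names and the word-count task are assumptions for the sketch, and a real topology runs its components in parallel across a cluster.

```python
def sentence_spout():
    """Spout: wraps a data source and emits tuples (here, dicts)."""
    for sentence in ["storm emits tuples", "bolts process tuples"]:
        yield {"sentence": sentence}

def split_bolt(tuples):
    """Bolt: receives sentence tuples and emits one tuple per word."""
    for t in tuples:
        for word in t["sentence"].split():
            yield {"word": word}

def count_bolt(tuples):
    """Bolt: keeps a running count and emits the updated count per word."""
    counts = {}
    for t in tuples:
        counts[t["word"]] = counts.get(t["word"], 0) + 1
        yield {"word": t["word"], "count": counts[t["word"]]}

# Topology: spout -> split bolt -> count bolt (a linear DAG).
topology_output = list(count_bolt(split_bolt(sentence_spout())))
final_counts = {t["word"]: t["count"] for t in topology_output}
```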
Apache Storm
[Physical View]
Twitter introduces Heron !!
[Storm's successor]
Stream Processing Engines
Many More !!!
Hidden computation paradigm via pipelining !!
Pipelining ≈ Task Execution
https://martin.kleppmann.com/unix
Let's build the concept again...
Linux pipelining in modern middleware...
https://martin.kleppmann.com/unix
Spark, Storm, Samza, Flink Etc.
https://martin.kleppmann.com/unix
Pub/sub pitch
https://martin.kleppmann.com/unix
Streaming Machine Learning
● By using a programming abstraction for distributed streaming
– Apache SAMOA
Graph Stream Processing
Referred Author: Vasia Kalavri, KTH
https://berlinbuzzwords.de/sites/berlinbuzzwords.de/files/media/documents/buzzwords-kalavri.pdf
Static Graph Processing
● Load: read the graph from disk and partition it in memory
● Compute: read and mutate the graph state
● Store: write the final graph state back to disk
Static Graph Processing
[Drawbacks]
● It is slow
– Wait until the computation is over before you see any result
– Pre-processing and partitioning
● It is expensive
– Lots of memory and CPU required in order to scale
● It requires re-computation for graph changes
– No efficient way to deal with updates
Streaming Graph Processing
We consume events in real-time
● Get results faster
– No need to wait for the job to finish
– Sometimes, early approximations are better than late exact answers
● Get results continuously
– Process unbounded number of events
Real-world scenarios
● Targeted Advertisement
– Finding strongly connected components in a social network graph
– Targeted chain of advertisements on detected communities
[Figure: example social graph. Jane knows Joe; Joe posts #Tesla; Jane likes #Tesla; ad shown: "Self driving cars". Peter checks in at Taphouse; John subscribes; ad shown: "Dinner Offer".]
Streaming Graph Processing
[Challenges]
● Maintain the graph structure
– How to apply state updates efficiently?
● Result updates
– Re-run the analysis for each event?
– Design an incremental algorithm?
– Run separate instances on multiple snapshots?
● How to preserve graph properties?
– Natural behavior?
Streaming Graph Processing
[Current Research]
Each event is an edge addition
● Jane knows Joe
● Jane likes #Tesla
● Joe posts #Tesla
● Peter checks-in TapHouse
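Treating each event as an edge addition means the graph state is updated one edge at a time. A minimal sketch follows; the `StreamingGraph` class and its methods are hypothetical names for this example, and the edge direction and label are kept only in the raw edge log.

```python
from collections import defaultdict

class StreamingGraph:
    """Graph state that is updated incrementally, one edge event at a time."""

    def __init__(self):
        self.edges = []                    # raw (source, label, target) events
        self.neighbors = defaultdict(set)  # undirected adjacency

    def add_edge(self, source, label, target):
        """Apply a single edge-addition event to the state."""
        self.edges.append((source, label, target))
        self.neighbors[source].add(target)
        self.neighbors[target].add(source)

    def degree(self, vertex):
        return len(self.neighbors.get(vertex, ()))

graph = StreamingGraph()
edge_stream = [
    ("Jane", "knows", "Joe"),
    ("Jane", "likes", "#Tesla"),
    ("Joe", "posts", "#Tesla"),
    ("Peter", "checks-in", "TapHouse"),
]
for source, label, target in edge_stream:
    graph.add_edge(source, label, target)  # results are queryable after every event
```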
Dynamic Graph Processing
● Instead of analyzing the whole graph
– Analyze its properties by preserving them continuously
● Connectivity or Distance (spanners)
● Graph cut estimation (sparsifiers)
● Neighborhood or homomorphic properties (sketches)
Dynamic Graph Processing (Contd.)
[Figure: the social graph from the targeted advertisement scenario (Jane, Joe, #Tesla, Peter, Taphouse, John, and the "Self driving cars" and "Dinner Offer" ads), extended with new "loves" edges involving Peter and Jane and a second "Self driving cars" ad.]
Stream Connected Components
● State: a disjoint-set data structure for the components
● Computation: for each edge
– if both endpoints are seen for the first time, create a component whose ID is the minimum of the two vertex IDs
– if the endpoints are in different components, merge them and set the component ID to the minimum of the two component IDs
– if only one of the endpoints belongs to a component, add the other one to the same component
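The per-edge rules above map naturally onto a union-find structure. The following is a single-machine sketch, not the Gelly-Streams implementation; the class and method names are assumptions. An unseen vertex starts as its own component, so all three cases reduce to "find both roots and link the larger ID under the smaller one", which keeps every component identified by its minimum vertex ID.

```python
class StreamingComponents:
    """Disjoint-set state for connected components over an edge stream.

    Each component is identified by the smallest vertex ID it contains.
    """

    def __init__(self):
        self._parent = {}

    def _find(self, v):
        # First sighting: the vertex starts as its own component.
        self._parent.setdefault(v, v)
        root = v
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while self._parent[v] != root:
            self._parent[v], v = root, self._parent[v]
        return root

    def add_edge(self, u, v):
        """Process one edge event, merging components if needed."""
        root_u, root_v = self._find(u), self._find(v)
        if root_u != root_v:
            low, high = min(root_u, root_v), max(root_u, root_v)
            self._parent[high] = low  # the smaller ID becomes the component ID

    def component_of(self, v):
        return self._find(v)

components = StreamingComponents()
for u, v in [(6, 7), (3, 4), (4, 6), (1, 9)]:
    components.add_edge(u, v)

component_ids = {v: components.component_of(v) for v in (1, 3, 4, 6, 7, 9)}
```

Because the state is just the parent map, the current components can be read off after any prefix of the stream.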
Distributed Stream Connected Components
Streaming Graph Processing
[Current Work]
● We're working with Gelly-Streams on
– Preserving natural properties in large-scale, real-world evolving graphs
– Joining multiple streams to detect graph causality/bipartiteness
– Efficient graph partitioning mechanisms to on-board with popular data stores like Cassandra and HDFS
– Producing a platform to benchmark NP-complete (NPC) problems on real-world graphs
Discussion !!
Thank you !!
sameera1@mail.usf.edu

More Related Content

Similar to Dancing with Stream Processing

Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureGabriele Modena
 
Architecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationArchitecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationGeorge Long
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analyticsinoshg
 
CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingPalani Kumar
 
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...Facultad de Informática UCM
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...StampedeCon
 
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...Ian Lumb
 
[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQLWSO2
 
Streaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara PrathapStreaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara PrathapWithTheBest
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...Reynold Xin
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Ganesh Raju
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
 
Analyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache SparkAnalyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache SparkNicola Ferraro
 
Putting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at NetflixPutting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at NetflixJeff Magnusson
 
Netflix - Pig with Lipstick by Jeff Magnusson
Netflix - Pig with Lipstick by Jeff Magnusson Netflix - Pig with Lipstick by Jeff Magnusson
Netflix - Pig with Lipstick by Jeff Magnusson Hakka Labs
 

Similar to Dancing with Stream Processing (20)

Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming Architecture
 
Architecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & ManipulationArchitecting Big Data Ingest & Manipulation
Architecting Big Data Ingest & Manipulation
 
Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
 
CS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_ComputingCS8091_BDA_Unit_IV_Stream_Computing
CS8091_BDA_Unit_IV_Stream_Computing
 
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
A Glass Half Full: Using Programmable Hardware Accelerators in Analytical Dat...
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...Resilience: the key requirement of a [big] [data] architecture  - StampedeCon...
Resilience: the key requirement of a [big] [data] architecture - StampedeCon...
 
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
Univa and SUSE at SC17: Scaling Machine Learning for SUSE Linux Containers, S...
 
The Rise of Streaming SQL
The Rise of Streaming SQLThe Rise of Streaming SQL
The Rise of Streaming SQL
 
[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL[WSO2Con USA 2018] The Rise of Streaming SQL
[WSO2Con USA 2018] The Rise of Streaming SQL
 
Streaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara PrathapStreaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara Prathap
 
Japan's post K Computer
Japan's post K ComputerJapan's post K Computer
Japan's post K Computer
 
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
 
Analyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache SparkAnalyzing Data at Scale with Apache Spark
Analyzing Data at Scale with Apache Spark
 
Lipstick On Pig
Lipstick On Pig Lipstick On Pig
Lipstick On Pig
 
Putting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at NetflixPutting Lipstick on Apache Pig at Netflix
Putting Lipstick on Apache Pig at Netflix
 
Netflix - Pig with Lipstick by Jeff Magnusson
Netflix - Pig with Lipstick by Jeff Magnusson Netflix - Pig with Lipstick by Jeff Magnusson
Netflix - Pig with Lipstick by Jeff Magnusson
 

More from Sameera Horawalavithana

Data-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and SimulationData-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and SimulationSameera Horawalavithana
 
Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
 Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
Drivers of Polarized Discussions on Twitter during Venezuela Political CrisisSameera Horawalavithana
 
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
Twitter Is the Megaphone of Cross-platform Messaging on the White HelmetsSameera Horawalavithana
 
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Sameera Horawalavithana
 
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHubMentions of Security Vulnerabilities on Reddit, Twitter and GitHub
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHubSameera Horawalavithana
 
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...Sameera Horawalavithana
 
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...Sameera Horawalavithana
 
[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation
[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation [ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation
[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation Sameera Horawalavithana
 
Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015Sameera Horawalavithana
 
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...Sameera Horawalavithana
 
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...Sameera Horawalavithana
 
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingTalk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingSameera Horawalavithana
 

More from Sameera Horawalavithana (17)

Data-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and SimulationData-driven Studies on Social Networks: Privacy and Simulation
Data-driven Studies on Social Networks: Privacy and Simulation
 
Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
 Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis
 
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
Behind the Mask: Understanding the Structural Forces That Make Social Graphs ...
 
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHubMentions of Security Vulnerabilities on Reddit, Twitter and GitHub
Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub
 
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
[MLNS | NetSci] A Generative/ Discriminative Approach to De-construct Cascadi...
 
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
[Compex Network 18] Diversity, Homophily, and the Risk of Node Re-identificat...
 
Duplicate Detection on Hoaxy Dataset
Duplicate Detection on Hoaxy DatasetDuplicate Detection on Hoaxy Dataset
Duplicate Detection on Hoaxy Dataset
 
[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation
[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation [ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation
[ARM 15 | ACM/IFIP/USENIX Middleware 2015] Research Paper Presentation
 
Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015Be Elastic: Leapset Innovation session 06-08-2015
Be Elastic: Leapset Innovation session 06-08-2015
 
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
[Undergraduate Thesis] Final Defense presentation on Cloud Publish/Subscribe ...
 
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
[Undergraduate Thesis] Interim presentation on A Publish/Subscribe Model for ...
 
Locality sensitive hashing
Locality sensitive hashingLocality sensitive hashing
Locality sensitive hashing
 
Zipf distribution
Zipf distributionZipf distribution
Zipf distribution
 
Query personalization
Query personalizationQuery personalization
Query personalization
 
Dancing with publish/subscribe
Dancing with publish/subscribeDancing with publish/subscribe
Dancing with publish/subscribe
 
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand StreamingTalk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
Talk on Spotify: Large Scale, Low Latency, P2P Music-on-Demand Streaming
 

Recently uploaded

RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Dancing with Stream Processing

  • 2. 07/10/16 MSc. Distributed Systems 2 ● Motivation ● Event Stream Processing – Pub/sub, CEP, “Buzzwords” – Stream processing engines ● Spark Streaming, Storm, etc. ● Graph Stream Processing – Theory... {“sketching”, “spanners”, “sparsifiers”} – Challenges ● Discussion !! Lightning Talk
  • 3. 07/10/16 MSc. Distributed Systems 3 Motivation
  • 4. 07/10/16 MSc. Distributed Systems 4 The Streaming Era ● Today, most data is continuously produced - user activity logs, web logs, sensors, database transactions, … ● The common approach to analyzing such data so far - Record the data stream to stable storage (DBMS, HDFS, …) - Periodically analyze the data with a batch processing engine (DBMS, MapReduce, …) ● Stream processing engines analyze data while it arrives
  • 5. 07/10/16 MSc. Distributed Systems 5 The Streaming Era (Contd.) ● Decreases the overall latency to obtain results - No need to persist data in stable storage - No periodic batch analysis jobs ● Simplifies the data infrastructure - Fewer moving parts to be maintained and coordinated ● Makes time dimension of data explicit - Each event has a timestamp - Data can be processed based on timestamps
  • 6. 07/10/16 MSc. Distributed Systems 6 Event Streams [Immutable] Web Page Event Wikipedia Page Update Event LinkedIn User Update Event
  • 7. 07/10/16 MSc. Distributed Systems 7 Middleware
  • 8. 07/10/16 MSc. Distributed Systems 8 Point-to-point communication ● Direct coupling ● Strict identity ● Time coupling ● Not good for volatile environments ● Not a good way to communicate with several participants Indirect communication ● Space uncoupling ● Anonymity ● Time uncoupling ● Independent lifetimes between parties ● Through a persistent communication channel
  • 9. 07/10/16 MSc. Distributed Systems 9 Taxonomy: Indirect Communication ● Communication based – Group communication – Message queues – Publish/subscribe ● State based – Tuple spaces – Distributed shared memory
  • 10. 07/10/16 MSc. Distributed Systems 10 Pub/Sub Messaging Pattern Topic-based - Each event belongs to a number of topics (e.g. “music”, “sport”) - Users subscribe to topics and receive all relevant events Content-based - Users subscribe to the actual content of the events/ a structured summary of it - More expressive
  • 11. 07/10/16 MSc. Distributed Systems 11 Pub/Sub Activities ● Subscription processing – Indexing and storing subscriptions. ● Event Stream Processing (ESP) – Pub/sub approach: upon arrival of events, access the subscription index and identify all matched subscriptions. ● Event delivery – Deliver events to clients with matched subscriptions.
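The two subscription styles and the three pub/sub activities above (storing subscriptions, matching incoming events, delivering them) can be sketched in a few lines. This is a minimal in-memory illustration, not the API of any real broker: the `Broker` class, its method names, and the event fields `topics` and `price` are all hypothetical.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker (hypothetical; for illustration only)."""

    def __init__(self):
        # Subscription index for topic-based matching: topic -> subscriber ids.
        self.topic_subs = defaultdict(set)
        # Content-based subscriptions: (subscriber id, predicate over the event).
        self.content_subs = []

    def subscribe_topic(self, subscriber, topic):
        self.topic_subs[topic].add(subscriber)

    def subscribe_content(self, subscriber, predicate):
        self.content_subs.append((subscriber, predicate))

    def publish(self, event):
        """Match the event against all subscriptions and return who receives it."""
        matched = set()
        for topic in event.get("topics", []):
            matched |= self.topic_subs.get(topic, set())
        for subscriber, predicate in self.content_subs:
            if predicate(event):
                matched.add(subscriber)
        return matched


broker = Broker()
broker.subscribe_topic("alice", "music")
broker.subscribe_content("bob", lambda e: e.get("price", 0) > 100)

to_music = broker.publish({"topics": ["music"], "price": 20})
to_pricey = broker.publish({"topics": ["sport"], "price": 250})
```

The topic index gives a direct lookup per topic, while content-based matching here scans every predicate. That linear scan is the price of the extra expressiveness in this sketch; real systems index predicates to avoid it.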
  • 12. 07/10/16 MSc. Distributed Systems 12 Event Stream Processing (ESP) Wikipedia
  • 13. 07/10/16 MSc. Distributed Systems 13 Today's world... Pub/sub ≈ ESP ≈
  • 14. 07/10/16 MSc. Distributed Systems 14 “Buzzwords” https://martin.kleppmann.com/2015/01/29/stream-processing-event-sourcing-reactive-cep.html
  • 15. 07/10/16 MSc. Distributed Systems 15 Complex Event Processing (CEP) ● A set of event processing principles ● Match patterns of events – Comparable to SQL queries – High-level query language ● Cloud of causally related events – POSET (Partially Ordered Set of Events)
  • 16. 07/10/16 MSc. Distributed Systems 16 Complex Event Processing (CEP) ● Some CEP Examples: – When 2 transactions happen on an account from radically different geographic locations within a certain time window then report as potential fraud. – When a gold customer's trouble ticket is not resolved within 1 hour, then escalate. – When a team meeting request overlaps with my lunch break, then deny the team meeting and demote the meeting organizer.
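The first CEP example (two transactions on one account from different locations within a time window) could be expressed roughly as follows. This is a sketch under stated assumptions: "radically different locations" is reduced to a plain inequality of location strings, the 600-second window is an arbitrary choice, and `detect_fraud` is a hypothetical helper rather than the query language of any CEP engine.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 600  # assumed window length; the slides do not give one


def detect_fraud(transactions, window=WINDOW_SECONDS):
    """Return (account, earlier_ts, later_ts) for suspicious transaction pairs.

    `transactions` is an iterable of (timestamp, account, location) tuples,
    assumed to arrive ordered by timestamp.
    """
    recent = defaultdict(deque)  # account -> recent (timestamp, location) pairs
    alerts = []
    for ts, account, location in transactions:
        history = recent[account]
        # Evict transactions that have fallen out of the time window.
        while history and ts - history[0][0] > window:
            history.popleft()
        # Pattern match: same account, different location, within the window.
        for prev_ts, prev_location in history:
            if prev_location != location:
                alerts.append((account, prev_ts, ts))
        history.append((ts, location))
    return alerts


events = [
    (0, "acct-1", "Colombo"),
    (120, "acct-1", "Tampa"),
    (200, "acct-2", "Berlin"),
    (5000, "acct-1", "Colombo"),
]
alerts = detect_fraud(events)
```

A CEP engine would let the same rule be declared in a high-level query language; the hand-written loop only shows the state (a per-account window) that such a query implies.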
  • 18. 07/10/16 MSc. Distributed Systems 18 ESP and CEP [Timeline] 2002 Aurora 2003 Medusa 2005 Borealis STREAM TelegraphCQ <2000 1989 - 1995 Rapide Esper Apama StreamBase SQLStream WSO2 CEP 2016
  • 19. 07/10/16 MSc. Distributed Systems 19 ESP vs. CEP http://www.slideshare.net/TimBassCEP/mythbusters-event-stream-processing-v-complex-event-processing-presentation
  • 20. 07/10/16 MSc. Distributed Systems 20 Today's world... ESP ≈ CEP ≈
  • 21. 07/10/16 MSc. Distributed Systems 21 Laundry of “Buzzwords” ● Actor Frameworks – Better mechanism to handle concurrency – E.g. Akka, Orleans and Erlang OTP ● “Reactive” – Language semantics for bringing event streams to the user interface – Responsive, Resilient, Elastic and Message Driven – E.g. Data flow languages, Functional reactive programming ● Event Sourcing ● Change Data Capture (CDC)
  • 22. 07/10/16 MSc. Distributed Systems 22 Analytics ≈ Stream Transformations https://martin.kleppmann.com/2015/01/29/stream-processing-event-sourcing-reactive-cep.html
  • 23. 07/10/16 MSc. Distributed Systems 23 https://martin.kleppmann.com/2015/01/29/stream-processing-event-sourcing-reactive-cep.html
  • 24. 07/10/16 MSc. Distributed Systems 24 Target ● Better Scalability ● High Throughput ● Low latency ● Powerful semantics ● Easy integration via Low Level Stream Processing Frameworks !!
  • 25. 07/10/16 MSc. Distributed Systems 25 Spark Streaming ● General purpose computing engine to run batch, interactive and streaming jobs ● Based on Resilient Distributed Datasets (RDD) – Restricted form of distributed shared memory – Immutable – Can only be built through deterministic transformations ● Efficient fault recovery using lineage graph – Recompute lost partitions on failure – No cost if nothing fails
  • 26. 07/10/16 MSc. Distributed Systems 26 Spark Streaming (Contd.) [Key concepts] ● DStream – sequence of RDDs representing a stream of data – HDFS, Kafka, Flume, ZeroMQ, Akka Actor, TCP sockets ● Transformations – modify data from one DStream to another – Standard RDD operations – map, countByValue, reduce, join, … – Stateful operations – window, countByValueAndWindow, … ● Output Operations – send data to external entity – saveAsHadoopFiles – saves to HDFS – foreach – do anything with each batch of results
  • 27. 07/10/16 MSc. Distributed Systems 27 Spark Streaming (Contd.) ● Run a streaming computation as a series of very small, deterministic batch jobs – Chop up the live stream into batches of X seconds – Spark treats each batch of data as RDDs and processes them using RDD operations – Finally, the processed results of the RDD operations are returned in batches
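The micro-batch model above can be imitated without Spark. The sketch below is plain Python, not the Spark Streaming API: `micro_batches` and `count_by_value` are hypothetical helpers, events are assumed to arrive ordered by timestamp, and, unlike a real engine, intervals with no events are simply skipped.

```python
from collections import Counter


def micro_batches(events, batch_seconds):
    """Chop a stream of (timestamp, value) events into batches of `batch_seconds`.

    Assumes events are ordered by timestamp. Intervals without events
    produce no batch in this simplified version.
    """
    batch, batch_index = [], None
    for ts, value in events:
        index = int(ts // batch_seconds)
        if batch_index is not None and index != batch_index:
            yield batch
            batch = []
        batch_index = index
        batch.append(value)
    if batch:
        yield batch


def count_by_value(batch):
    """A deterministic per-batch job, standing in for an RDD operation."""
    return dict(Counter(batch))


stream = [(0.2, "a"), (0.9, "b"), (1.1, "a"), (1.5, "a"), (3.0, "c")]
results = [count_by_value(b) for b in micro_batches(stream, batch_seconds=1)]
```

Each batch is processed independently by the same deterministic function, which is what makes recomputation after a failure straightforward in this model.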
  • 28. 07/10/16 MSc. Distributed Systems 28 Berkeley Data Stack
  • 29. 07/10/16 MSc. Distributed Systems 29 Spark 2.0 is coming !!
  • 30. 07/10/16 MSc. Distributed Systems 30 Apache Storm [Key concepts] ● Tuple – Core Unit of Data – Immutable Set of Key/Value Pairs ● Spouts – Source of Streams – Wraps a streaming data source and emits Tuples ● Bolts – Core functions of a streaming computation – Receive tuples and do stuff – Optionally emit additional tuples
  • 31. 07/10/16 MSc. Distributed Systems 31 Apache Storm [Key concepts] ● Topology – DAG of Spouts and Bolts – Data Flow Representation – Streaming Computation
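The spout, bolt, and topology concepts can be mimicked with chained Python generators. This is only an analogy, assuming a single-process, linear topology; it does not use Storm's API, and the function names and the word-count task are chosen here for illustration.

```python
def sentence_spout(sentences):
    """Spout: wraps a data source and emits tuples (modelled as dicts)."""
    for sentence in sentences:
        yield {"sentence": sentence}


def split_bolt(tuples):
    """Bolt: receives sentence tuples and emits one tuple per word."""
    for tup in tuples:
        for word in tup["sentence"].split():
            yield {"word": word}


def count_bolt(tuples):
    """Bolt: keeps a running count and emits the updated count per word."""
    counts = {}
    for tup in tuples:
        word = tup["word"]
        counts[word] = counts.get(word, 0) + 1
        yield {"word": word, "count": counts[word]}


# Topology: spout -> split bolt -> count bolt (a linear DAG).
topology = count_bolt(split_bolt(sentence_spout(["to be", "or not to be"])))
emitted = list(topology)
final_counts = {t["word"]: t["count"] for t in emitted}
```

In Storm the same graph would be distributed across workers, with each spout and bolt running as parallel tasks; the generator chain only captures the data-flow shape.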
  • 32. 07/10/16 MSc. Distributed Systems 32 Apache Storm [Physical View]
  • 33. 07/10/16 MSc. Distributed Systems 33 Twitter introduces Heron !! [Storm's successor]
  • 34. 07/10/16 MSc. Distributed Systems 34 Stream Processing Engines Many More !!!
  • 36. 07/10/16 MSc. Distributed Systems 36 Hidden computation paradigm via pipelining !!
  • 37. 07/10/16 MSc. Distributed Systems 37 Pipelining ≈ Task Execution https://martin.kleppmann.com/unix
  • 38. 07/10/16 MSc. Distributed Systems 38 Let's build the concept again...
  • 39. 07/10/16 MSc. Distributed Systems 39 Linux pipelining in modern middleware... https://martin.kleppmann.com/unix
  • 40. 07/10/16 MSc. Distributed Systems 40 Spark, Storm, Samza, Flink Etc. https://martin.kleppmann.com/unix
  • 41. 07/10/16 MSc. Distributed Systems 41 Spark, Storm, Samza, Flink Etc. https://martin.kleppmann.com/unix
  • 42. 07/10/16 MSc. Distributed Systems 42 Pub/sub pitch https://martin.kleppmann.com/unix
  • 43. 07/10/16 MSc. Distributed Systems 43 Streaming Machine Learning ● By using a programming abstraction for distributed streaming – Apache SAMOA
  • 44. 07/10/16 MSc. Distributed Systems 44 Graph Stream Processing Referred Author: Vasia Kalavri, KTH https://berlinbuzzwords.de/sites/berlinbuzzwords.de/files/media/documents/buzzwords-kalavri.pdf
  • 45. 07/10/16 MSc. Distributed Systems 45 Static Graph Processing ● Load: read the graph from disk and partition it in memory ● Compute: read and mutate the graph state ● Store: write the final graph state back to disk
  • 46. 07/10/16 MSc. Distributed Systems 46 Static Graph Processing [Drawbacks] ● It is slow – wait until the computation is over before you see any result – pre-processing and partitioning ● It is expensive – lots of memory and CPU required in order to scale ● It requires re-computation for graph changes – no efficient way to deal with updates
  • 47. 07/10/16 MSc. Distributed Systems 47 Streaming Graph Processing We consume events in real-time ● Get results faster – No need to wait for the job to finish – Sometimes, early approximations are better than late exact answers ● Get results continuously – Process unbounded number of events
  • 48. 07/10/16 MSc. Distributed Systems 48 Real-world scenarios ● Targeted Advertisement – Finding Strongly Connected Components in a social network graph – Targeted chain of advertisement on detected communities Jane Joe knows #Tesla posts likes Self driving cars Ads Peter Taphouse checks-in John subscribes Dinner Offer Ads
  • 49. 07/10/16 MSc. Distributed Systems 49 Streaming Graph Processing [Challenges] ● Maintain the graph structure – How to apply state updates efficiently? ● Result updates – Re-run the analysis for each event? – Design an incremental algorithm? – Run separate instances on multiple snapshots? ● How to preserve graph properties? – Natural behavior?
  • 50. 07/10/16 MSc. Distributed Systems 50 Streaming Graph Processing [Current Research] Each event is an edge addition Jane Joe knows Jane #Tesla likes Joe #Tesla posts Peter TapHouse checks-in
  • 52. 07/10/16 MSc. Distributed Systems 52 Dynamic Graph Processing ● Instead of analyzing the whole graph – Analyze its properties by preserving them continuously ● Connectivity or Distance (spanners) ● Graph cut estimation (sparsifiers) ● Neighborhood or homomorphic properties (sketches)
  • 53. 07/10/16 MSc. Distributed Systems 53 Dynamic Graph Processing (Contd.) Jane Joe knows #Tesla posts likes Self driving cars Ads Peter Taphouse checks-in John subscribes Dinner Offer Ads Peter Jane loves loves Self driving cars Ads
  • 54. 07/10/16 MSc. Distributed Systems 54 Stream Connected Components ● State: a disjoint set data structure for the components ● Computation: For each edge – if seen for the 1st time, create a component with ID the min of the vertex IDs – if in different components, merge them and update the component ID to the min of the component IDs – if only one of the endpoints belongs to a component, add the other one to the same component
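The per-edge rules above map naturally onto a union-find (disjoint-set) structure. The following is a minimal single-machine sketch, assuming vertex IDs are comparable values such as integers; the class and method names are hypothetical. Always linking the larger root under the smaller one keeps each component's ID equal to its minimum vertex ID, which covers all three cases on the slide.

```python
class StreamingComponents:
    """Connected components over a stream of edge additions (illustrative)."""

    def __init__(self):
        self.parent = {}  # vertex -> parent vertex; a root is its own parent

    def find(self, v):
        """Return the component ID (root) of v, compressing the path."""
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def add_edge(self, u, v):
        """Process one edge event."""
        # Vertices seen for the first time start as singleton components.
        for x in (u, v):
            self.parent.setdefault(x, x)
        ru, rv = self.find(u), self.find(v)
        if ru != rv:
            # Merge and keep the smaller ID as the component ID.
            low, high = min(ru, rv), max(ru, rv)
            self.parent[high] = low

    def components(self):
        """Return a mapping of component ID -> set of member vertices."""
        groups = {}
        for v in self.parent:
            groups.setdefault(self.find(v), set()).add(v)
        return groups


cc = StreamingComponents()
for edge in [(1, 2), (3, 4), (2, 3), (5, 6)]:
    cc.add_edge(*edge)
result = cc.components()
```

The state grows with the number of vertices rather than the number of edges, so the edge stream itself never has to be stored.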
  • 55-65. Stream Connected Components [figure slides]
  • 66. 07/10/16 MSc. Distributed Systems 66 Distributed Stream Connected Components
  • 67. 07/10/16 MSc. Distributed Systems 67 Streaming Graph Processing [Current Work] ● We're working with Gelly-Streams on – Preserving natural properties in large-scale real-world evolving graphs – Joining multiple streams to detect graph causality/bipartiteness – Efficient graph partitioning mechanisms to on-board with popular data stores like Cassandra, HDFS – Producing a platform to benchmark NPC problems in real-world graphs
  • 68. 07/10/16 MSc. Distributed Systems 68 Discussion !!
  • 69. 07/10/16 MSc. Distributed Systems 69 Thank you !! sameera1@mail.usf.edu