ACM DEBS 2015: Realtime Streaming Analytics Patterns
Srinath Perera
Sriskandarajah Suhothayan
WSO2 Inc.
Data Analytics (Big Data)
o Scientists have been doing this for 25 years with MPI (1991), using special hardware
o It took off with Google’s MapReduce paper (2004); Apache Hadoop, Hive, and a whole ecosystem followed.
o Later Spark emerged, and it is faster.
o But processing still takes time.
Value of Some Insights Degrades Fast!
o For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring) the value of insights degrades very quickly with time.
o E.g. stock markets and the speed of light
o We need technology that can produce outputs fast
o Static queries, but need very fast output (alerts, realtime control)
o Dynamic and interactive queries (data exploration)
History
▪ Realtime analytics is not new either!
- Active Databases (2000+)
- Stream processing (Aurora, Borealis (2005+), and later Storm)
- Distributed streaming operators (e.g. a database research topic around 2005)
- CEP Vendor Roadmap (from http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/)
Data Analytics Landscape
Realtime Interactive Analytics
o Usually done to support
interactive queries
o Index data to make it readily accessible so you can respond to queries fast (e.g. Apache Drill)
o Tools like Druid, VoltDB and
SAP Hana can do this with all
data in memory to make
things really fast.
Realtime Streaming Analytics
o Process data without storing it (as data comes in)
o Queries are fixed ( Static)
o Triggers when given conditions are met.
o Technologies
o Stream Processing ( Apache Storm, Apache Samza)
o Complex Event Processing/CEP (WSO2 CEP, Esper,
StreamBase)
o MicroBatches ( Spark Streaming)
Realtime Football Analytics
● Video: https://www.youtube.com/watch?v=nRI6buQ0NOM
● More Info: http://www.slideshare.net/hemapani/strata-2014-talktracking-a-soccer-game-with-big-data
Why Realtime Streaming Analytics
Patterns?
o Reason 1: Usual advantages
o Give us better understanding
o Give us better vocabulary to teach and
communicate
o Tools can implement them
o ..
o Reason 2: Under the theme of realtime analytics, a lot of people get carried away with the word count example. Patterns show that word count is just the tip of the iceberg.
Earlier Work on Patterns
o Patterns from SQL ( project, join, filter etc)
o Event Processing Technical Society’s (EPTS)
reference architecture
o higher-level patterns such as tracking, prediction and learning, in addition to low-level operators that come from SQL-like languages.
o Esper’s Solution Patterns Document (50 patterns)
o Coral8 White Paper
Basic Patterns
o Pattern 1: Preprocessing ( filter, transform, enrich,
project .. )
o Pattern 2: Alerts and Thresholds
o Pattern 3: Simple Counting and Counting with
Windows
o Pattern 4: Joining Event Streams
o Pattern 5: Data Correlation, Missing Events, and
Erroneous Data
Patterns for Handling Trends
o Pattern 7: Detecting Temporal Event Sequence
Patterns
o Pattern 8: Tracking ( track something over space or
time)
o Pattern 9: Detecting Trends (rise, fall, turn, triple bottom)
o Pattern 13: Online Control
Mixed Patterns
o Pattern 6: Interacting with Databases
o Pattern 10: Running the same Query in Batch and
Realtime Pipelines
o Pattern 11: Detecting and switching to Detailed
Analysis
o Pattern 12: Using a Machine Learning Model
Realtime Streaming
Analytics Tools
Implementing Realtime Analytics
o It is tempting to write custom code. A filter looks very easy, but the rest gets too complex. Don’t!
o Option 1: Stream Processing (e.g. Storm). Kind of works. It is like MapReduce: you have to write code.
o Option 2: Spark Streaming - more compact than Storm, but cannot do some stateful operations.
o Option 3: Complex Event Processing - compact, SQL-like language, fast
Stream Processing
o Program a set of processors and wire them up; data flows through the graph.
o A middleware framework handles data flow,
distribution, and fault tolerance (e.g. Apache Storm,
Samza)
o Processors may be in the same machine or multiple
machines
Writing a Storm Program
o Write Spout(s)
o Write Bolt(s)
o Wire them up
o Run
Write Bolts
We will use a shorthand
like on the left to explain
public static class WordCount extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // ... do something ...
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
Wire up and Run
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 5);
builder.setBolt("split", new SplitSentence(), 8)
       .shuffleGrouping("spout");
builder.setBolt("count", new WordCount(), 12)
       .fieldsGrouping("split", new Fields("word"));

Config conf = new Config();
if (args != null && args.length > 0) {
    // Cluster mode: the first argument is the topology name.
    conf.setNumWorkers(3);
    StormSubmitter.submitTopologyWithProgressBar(
        args[0], conf, builder.createTopology());
} else {
    // Local mode for testing.
    conf.setMaxTaskParallelism(3);
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("word-count", conf,
        builder.createTopology());
    ...
}
}
Complex Event Processing
Micro Batches (e.g. Spark Streaming)
o Process data in small batches, and then combine the results for final results (e.g. Spark)
o Works for simple aggregates, but tricky to do for complex operations (e.g. event sequences)
o Can do it with MapReduce as well if the deadlines are not too tight.
SQL Like Query Languages
o A SQL-like data processing language (e.g. Apache Hive)
o Since many understand SQL, Hive made large-scale data processing accessible to many
o Easy to follow from SQL
o Expressive, short, and sweet.
o Define core operations that cover 90% of problems
o Let experts dig in when they like!
CEP = SQL for Realtime Analytics
Pattern
Implementations
Code and other details
o Sample code - https://github.com/suhothayan/DEBS-2015-Realtime-Analytics-Patterns
o WSO2 CEP
o pack - http://svn.wso2.org/repos/wso2/people/suho/packs/cep/4.0.0/debs2015/wso2cep-4.0.0-SNAPSHOT.zip
o docs - https://docs.wso2.com/display/CEP400/WSO2+Complex+Event+Processor+Documentation
o Apache Storm - https://storm.apache.org/
o We have packs on a pen drive
Pattern 1: Preprocessing
o What? Clean up and prepare data via operations like filter, project, enrich, split, and transformations
o Use cases?
o From a Twitter data stream, we extract the author, timestamp, and location fields and then filter them based on the location of the author.
o From a temperature stream, we extract the temperature and room number of the sensor and filter by them.
Filter
from TempStream [ roomNo > 245 and roomNo <= 365]
select roomNo, temp
insert into ServerRoomTempStream ;
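For readers more familiar with general-purpose code, the filter-and-project query above can be sketched in plain Java. This is only an illustration of the semantics, not how Siddhi or Storm implement it; the class name `FilterSketch`, the method `serverRoomTemps`, and the `double[]` event layout are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterSketch {
    // Each TempStream event is {deviceID, roomNo, temp}; a double[] keeps the sketch short.
    static List<double[]> serverRoomTemps(List<double[]> tempStream) {
        List<double[]> out = new ArrayList<>();
        for (double[] e : tempStream) {
            double roomNo = e[1];
            if (roomNo > 245 && roomNo <= 365) {       // filter: [roomNo > 245 and roomNo <= 365]
                out.add(new double[] {roomNo, e[2]});  // project: select roomNo, temp
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<double[]> in = Arrays.asList(
            new double[] {1, 245, 20.0},
            new double[] {2, 300, 31.5});
        for (double[] e : serverRoomTemps(in)) {
            System.out.println(Arrays.toString(e));
        }
    }
}
```

In a real engine the loop body would run once per arriving event instead of over a finished list.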
In Storm
In CEP ( Siddhi)
Architecture of WSO2 CEP
CEP Event Adapters
Support for several transports (network access)
● SOAP
● HTTP
● JMS
● SMTP
● SMS
● Thrift
● Kafka
● Websocket
● MQTT
Supports database writes using Map messages
● Cassandra
● RDBMS
Supports custom event adapters via its pluggable architecture!
Stream Definition (Data Model)
{
'name':'soft.drink.coop.sales', 'version':'1.0.0',
'nickName': 'Soft_Drink_Sales', 'description': 'Soft drink sales',
'metaData':[
{'name':'region','type':'STRING'}
],
'correlationData':[
{'name':'transactionID','type':'STRING'}
],
'payloadData':[
{'name':'brand','type':'STRING'},
{'name':'quantity','type':'INT'},
{'name':'total','type':'INT'},
{'name':'user','type':'STRING'}
]
}
Projection
define stream TempStream
(deviceID long, roomNo int, temp double);
from TempStream
select roomNo, temp
insert into OutputStream ;
Inferred Streams
from TempStream
select roomNo, temp
insert into OutputStream ;
define stream OutputStream
(roomNo int, temp double);
Enrich
from TempStream
select roomNo, temp, 'C' as scale
insert into OutputStream ;
define stream OutputStream
(roomNo int, temp double, scale string);
from TempStream
select deviceID, roomNo, avg(temp) as avgTemp
insert into OutputStream ;
Transformation
from TempStream
select concat(deviceID, '-', roomNo) as uid,
toFahrenheit(temp) as tempInF,
'F' as scale
insert into OutputStream ;
Split
from TempStream
select roomNo, temp
insert into RoomTempStream ;
from TempStream
select deviceID, temp
insert into DeviceTempStream ;
Pattern 2: Alerts and Thresholds
o What? Detect a condition and generate alerts based on it (e.g. alarm on high temperature).
o These alerts can be based on a simple value or on more complex conditions such as rate of increase.
o Use cases?
o Raise an alert when a vehicle is going too fast
o Alert when a room is too hot
Filter Alert
from TempStream [ roomNo > 245 and roomNo <= 365
and temp > 40 ]
select roomNo, temp
insert into AlertServerRoomTempStream ;
Pattern 3: Simple Counting and
Counting with Windows
o What? Aggregate functions like Min, Max, Percentiles, etc.
o Often they can be computed without storing any data
o Most useful when used with a window
o Use cases?
o Most metrics need a time bound so we can compare (errors per day, transactions per second)
o Linux load average gives us an idea of the overall trend by reporting the averages over the last 1, 5, and 15 minutes.
Types of windows
o Sliding windows vs. Batch (tumbling) windows
o Time vs. Length windows
Also supports
o Unique window
o First unique window
o External time window
Window
In Storm
Aggregation
In CEP (Siddhi)
from TempStream
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream ;
Sliding Time Window
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
insert all events into AvgRoomTempStream ;
Group By
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert all events into HotRoomsStream ;
Batch Time Window
from TempStream#window.timeBatch(5 min)
select roomNo, avg(temp) as avgTemp
group by roomNo
insert all events into HotRoomsStream ;
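To make the sliding time window concrete, here is a minimal plain-Java sketch of a running average over the last minute of readings. It assumes timestamps arrive in non-decreasing order and that an event expires once it is a full window old; the class `SlidingAvg` and its method names are hypothetical and are not part of Siddhi or Storm.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SlidingAvg {
    private final long windowMillis;
    private final Deque<double[]> events = new ArrayDeque<>(); // {timestampMillis, temp}
    private double sum = 0.0;

    SlidingAvg(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Adds one reading and returns the average of the readings still inside the window.
    double add(long timestampMillis, double temp) {
        events.addLast(new double[] {timestampMillis, temp});
        sum += temp;
        // Expire readings that are a full window older than the newest one.
        // The newest reading itself never expires, so the deque is never empty here.
        while (events.peekFirst()[0] <= timestampMillis - windowMillis) {
            sum -= events.removeFirst()[1];
        }
        return sum / events.size();
    }

    public static void main(String[] args) {
        SlidingAvg oneMinute = new SlidingAvg(60_000L);
        System.out.println(oneMinute.add(0L, 20.0));
        System.out.println(oneMinute.add(30_000L, 30.0));
        System.out.println(oneMinute.add(60_000L, 40.0));
    }
}
```

A "group by roomNo" version would keep one `SlidingAvg` instance per room, for example in a map keyed by room number. A batch (tumbling) window would instead emit one result and clear its state at each window boundary.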
Pattern 4: Joining Event Streams
o What? Create a new event stream by joining multiple streams
o The complication comes with time: events arrive at different moments, so at least one stream needs a window
o Use cases?
o To detect when a player has kicked the ball in a football game.
o To correlate TempStream and the state of the regulator and trigger control commands
Join with Storm
Join
In CEP (Siddhi)
define stream TempStream
(deviceID long, roomNo int, temp double);
define stream RegulatorStream
(deviceID long, roomNo int, isOn bool);
from TempStream[temp > 30.0]#window.time(1 min) as T
join RegulatorStream[isOn == false]#window.length(1) as R
on T.roomNo == R.roomNo
select T.roomNo, R.deviceID, 'start' as action
insert into RegulatorActionStream ;
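The windowed join can be sketched in plain Java as follows. This is an interpretation of the query's semantics under stated assumptions, not the engine's implementation: hot readings are kept for one minute, only the single most recent "off" regulator event is kept (the length-1 window), and an arrival on either side is matched against the other side's window. All class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TempRegulatorJoin {
    private static final long WINDOW_MILLIS = 60_000L;          // window.time(1 min)
    private final Deque<long[]> hotTemps = new ArrayDeque<>();  // {timestampMillis, roomNo}
    private long[] lastOffRegulator = null;                     // window.length(1): {deviceID, roomNo}

    private void expire(long now) {
        while (!hotTemps.isEmpty() && hotTemps.peekFirst()[0] <= now - WINDOW_MILLIS) {
            hotTemps.removeFirst();
        }
    }

    // TempStream side: returns the deviceIDs of regulators that should be started.
    List<Long> onTemp(long ts, int roomNo, double temp) {
        List<Long> start = new ArrayList<>();
        expire(ts);
        if (temp > 30.0) {                                      // filter applied before the window
            hotTemps.addLast(new long[] {ts, roomNo});
            if (lastOffRegulator != null && lastOffRegulator[1] == roomNo) {
                start.add(lastOffRegulator[0]);
            }
        }
        return start;
    }

    // RegulatorStream side: one output per hot reading still in the window for that room.
    List<Long> onRegulator(long ts, long deviceID, int roomNo, boolean isOn) {
        List<Long> start = new ArrayList<>();
        expire(ts);
        if (!isOn) {                                            // filter: isOn == false
            lastOffRegulator = new long[] {deviceID, roomNo};
            for (long[] t : hotTemps) {
                if (t[1] == roomNo) {
                    start.add(deviceID);
                }
            }
        }
        return start;
    }

    public static void main(String[] args) {
        TempRegulatorJoin join = new TempRegulatorJoin();
        join.onTemp(0L, 1, 35.0);
        System.out.println(join.onRegulator(1_000L, 7L, 1, false));
    }
}
```

The state that has to be kept (a deque plus one remembered event) is exactly what the two window clauses declare in the query.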
Pattern 5: Data Correlation, Missing
Events, and Erroneous Data
o What? find correlations and use that to detect and
handle missing and erroneous Data
o Use Cases?
o Detecting a missing event (e.g., detect a customer request that has not been responded to within 1 hour of its reception)
o Detecting erroneous data (e.g., Detecting failed
sensors using a set of sensors that monitor
overlapping regions. We can use those
redundant data to find erroneous sensors and
remove those data from further processing)
Missing Event in Storm
Missing Event in CEP
In CEP (Siddhi)
from RequestStream#window.time(1 hour)
insert expired events into ExpiryStream ;
from r1=RequestStream -> r2=Response[id == r1.id] or
r3=ExpiryStream[id == r1.id]
select r1.id as id ...
having r2.id == null
insert into AlertStream ;
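The same missing-event idea can be sketched in plain Java: remember each request, forget it when the response arrives, and report whatever is still pending once the timeout has passed. The class `MissingResponseDetector`, its methods, and the explicit clock-tick call are assumptions made for this sketch; request ids are assumed to be unique.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MissingResponseDetector {
    private final long timeoutMillis;
    private final Map<String, Long> pending = new LinkedHashMap<>(); // request id -> request time

    MissingResponseDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void onRequest(String id, long timestampMillis) {
        pending.put(id, timestampMillis);
    }

    void onResponse(String id) {
        pending.remove(id);
    }

    // Called on a clock tick: returns the ids whose response is overdue and stops tracking them.
    List<String> overdue(long nowMillis) {
        List<String> alerts = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() >= timeoutMillis) {
                alerts.add(e.getKey());
                it.remove();
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        MissingResponseDetector d = new MissingResponseDetector(3_600_000L);
        d.onRequest("a", 0L);
        d.onRequest("b", 1_000L);
        d.onResponse("b");
        System.out.println(d.overdue(3_600_000L));
    }
}
```

Hand-written code needs a timer to drive `overdue`; in the CEP version the expiring window plays that role.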
Pattern 6: Interacting with Databases
o What? Combine realtime data with historical data
o Use Cases?
o On a transaction, looking up the customer age
using ID from customer database to detect fraud
(enrichment)
o Checking a transaction against blacklists and
whitelists in the database
o Receive an input from the user (e.g., Daily
discount amount may be updated in the
database, and then the query will pick it
automatically without human intervention).
In Storm
Querying Databases
In CEP (Siddhi)
Event Table
define table CardUserTable (name string, cardNum long) ;
@from(eventtable = 'rdbms', datasource.name = 'CardDataSource',
table.name = 'UserTable', caching.algorithm = 'LRU')
define table CardUserTable (name string, cardNum long) ;
Cache types supported
● Basic: A size-based algorithm based on FIFO.
● LRU (Least Recently Used): The least recently used event is dropped
when cache is full.
● LFU (Least Frequently Used): The least frequently used event is dropped
when cache is full.
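As an illustration of the LRU policy listed above, a least-recently-used cache can be sketched in a few lines of Java on top of `java.util.LinkedHashMap` in access order. This shows the eviction rule only; it is not how the event table cache is implemented, and the class name `LruCache` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<Long, String> cache = new LruCache<>(2);
        cache.put(1L, "alice");
        cache.put(2L, "bob");
        cache.get(1L);            // touch card 1, so card 2 becomes the eviction candidate
        cache.put(3L, "carol");
        System.out.println(cache.keySet());
    }
}
```

A FIFO ("Basic") cache is the same structure with `accessOrder` set to false, since reads then no longer reorder entries.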
Join : Event Table
define stream Purchase (price double, cardNo long, place string);
define table CardUserTable (name string, cardNum long) ;
from Purchase#window.length(1) join CardUserTable
on Purchase.cardNo == CardUserTable.cardNum
select Purchase.cardNo as cardNo,
CardUserTable.name as name,
Purchase.price as price
insert into PurchaseUserStream ;
Insert : Event Table
define stream FraudStream (price double, cardNo long, userName
string);
define table BlacklistedUserTable (name string, cardNum long) ;
from FraudStream
select userName as name, cardNo as cardNum
insert into BlacklistedUserTable ;
Update : Event Table
define stream LoginStream (userID string,
islogin bool, loginTime long);
define table LastLoginTable (userID string, time long) ;
from LoginStream
select userID, loginTime as time
update LastLoginTable
on LoginStream.userID == LastLoginTable.userID ;
Pattern 7: Detecting Temporal
Event Sequence Patterns
o What? detect a temporal sequence of events or
condition arranged in time
o Use Cases?
o Detect suspicious activities like small transaction
immediately followed by a large transaction
o Detect ball possession in a football game
o Detect suspicious financial patterns like large buy
and sell behaviour within a small time period
In Storm
Pattern
In CEP (Siddhi)
Pattern
define stream Purchase (price double, cardNo long,place string);
from every (a1 = Purchase[price < 100] -> a3= ..) ->
a2 = Purchase[price >10000 and a1.cardNo == a2.cardNo]
within 1 day
select a1.cardNo as cardNo, a2.price as price, a2.place as place
insert into PotentialFraud ;
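A simplified plain-Java sketch of the "small purchase followed by a large purchase on the same card within a day" sequence is shown below. It keeps only the latest small purchase per card, which is a simplification of the `every` semantics in the query, and it ignores the part of the pattern that is elided on the slide. The class `SmallThenLarge` and its method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class SmallThenLarge {
    private static final long DAY_MILLIS = 24L * 60 * 60 * 1000;
    private final Map<Long, Long> lastSmall = new HashMap<>(); // cardNo -> time of latest small purchase

    // Returns true when this purchase completes the "small, then large" sequence for its card.
    boolean onPurchase(long timestampMillis, long cardNo, double price) {
        if (price < 100) {
            lastSmall.put(cardNo, timestampMillis);
            return false;
        }
        Long smallTs = lastSmall.get(cardNo);
        return price > 10000 && smallTs != null && timestampMillis - smallTs <= DAY_MILLIS;
    }

    public static void main(String[] args) {
        SmallThenLarge detector = new SmallThenLarge();
        detector.onPurchase(1_000L, 42L, 50.0);
        System.out.println(detector.onPurchase(3_000L, 42L, 20_000.0));
    }
}
```

Each extra step in a sequence adds more hand-written state like `lastSmall`, which is the bookkeeping the pattern syntax hides.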
Pattern 8: Tracking
o What? Tracking something over space and time and detecting given conditions.
o Use Cases?
o Tracking a fleet of vehicles, making sure that
they adhere to speed limits, routes, and Geo-
fences.
o Tracking wildlife, making sure they are alive (they
will not move if they are dead) and making sure
they will not go out of the reservation.
o Tracking airline luggage and making sure they
have not been sent to wrong destinations
o Tracking a logistic network and figuring out
bottlenecks and unexpected conditions.
TFL: Traffic Analytics
Built using TFL ( Transport for London) open data feeds.
http://goo.gl/9xNiCm http://goo.gl/04tX6k
Pattern 9: Detecting Trends
o What? Detecting an overall trend over time.
o Useful in stock markets, SLA enforcement, auto
scaling, predictive maintenance
o Use Cases?
o Rise, Fall of values and Turn (switch from rise to
a fall)
o Outliers - deviate from the current trend by a
large value
o Complex trends like “Triple Bottom” and “Cup
and Handle” [17].
Trend in Storm
Build and apply a state machine
In CEP (Siddhi)
Sequence
from t1=TempStream,
t2=TempStream [(isNull(t2[last].temp) and t1.temp < temp) or
(t2[last].temp < temp and not(isNull(t2[last].temp)))]+
within 5 min
select t1.temp as initialTemp,
t2[last].temp as finalTemp,
t1.deviceID,
t1.roomNo
insert into IncreasingHotRoomsStream ;
In CEP (Siddhi)
Partition
partition with (roomNo of TempStream)
begin
from t1=TempStream,
t2=TempStream [(isNull(t2[last].temp) and t1.temp < temp)
or (t2[last].temp < temp and not(isNull(t2[last].temp)))]+
within 5 min
select t1.temp as initialTemp,
t2[last].temp as finalTemp,
t1.deviceID,
t1.roomNo
insert into IncreasingHotRoomsStream ;
end;
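The "state machine" mentioned for the Storm version can be sketched in plain Java as a per-room counter of consecutive rising readings. This sketch omits the 5-minute bound of the query and is only meant to show the kind of state involved; the class `RisingRun` and its method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class RisingRun {
    // roomNo -> {lastTemp, runLength}; one small state machine per partition key.
    private final Map<Integer, double[]> state = new HashMap<>();

    // Returns the length of the current strictly increasing run for the room.
    int onTemp(int roomNo, double temp) {
        double[] s = state.get(roomNo);
        if (s == null || temp <= s[0]) {
            s = new double[] {temp, 1};   // first reading, or the rise was broken: restart
        } else {
            s[0] = temp;                  // the rise continues
            s[1] += 1;
        }
        state.put(roomNo, s);
        return (int) s[1];
    }

    public static void main(String[] args) {
        RisingRun trend = new RisingRun();
        trend.onTemp(1, 20.0);
        trend.onTemp(1, 21.0);
        System.out.println(trend.onTemp(1, 22.5));
    }
}
```

Keying the map by room number corresponds to the partition clause: each room's readings are evaluated independently.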
Detecting Trends in Real Life
o The paper “A Complex Event Processing Toolkit for Detecting Technical Chart Patterns” (HPBC 2015) used this idea to identify stock chart patterns
o It used kernel regression for smoothing and detected maxima and minima.
o Then any pattern can be written as a temporal event sequence.
Pattern 10: Lambda Architecture
o What? Run the same query in both realtime and batch pipelines. This uses realtime analytics to fill the lag in batch analytics results.
o Also called “Lambda Architecture”, a term coined by Nathan Marz. For a critique, see Jay Kreps’s “Questioning the Lambda Architecture”.
o Use Cases?
o For example, if batch processing takes 15 minutes, results would always lag 15 minutes behind the current data. Here realtime processing fills the gap.
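One way to picture how the two pipelines are combined at query time is the merge step below, sketched in plain Java. It assumes the batch view holds counts up to the last completed batch run and the realtime view holds counts only for events since then; the class `LambdaView` and the metric names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class LambdaView {
    // Merges the (lagging) batch view with the realtime view that covers the gap.
    static Map<String, Long> merge(Map<String, Long> batch, Map<String, Long> realtime) {
        Map<String, Long> merged = new HashMap<>(batch);
        for (Map.Entry<String, Long> e : realtime.entrySet()) {
            merged.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> batch = new HashMap<>();
        batch.put("errors", 100L);
        Map<String, Long> realtime = new HashMap<>();
        realtime.put("errors", 3L);
        System.out.println(merge(batch, realtime));
    }
}
```

This only works directly for aggregates that can be added together, such as counts and sums.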
Lambda Architecture. How?
Pattern 11: Detecting and switching
to Detailed Analysis
o What? detect a condition that suggests some
anomaly, and further analyze it using historical data.
o Use Cases?
o Use basic rules to detect Fraud (e.g., large transaction),
then pull out all transactions done against that credit
card for a larger time period (e.g., 3 months data) from
batch pipeline and run a detailed analysis
o While monitoring weather, detect conditions like high
temperature or low pressure in a given region, and then
start a high resolution localized forecast for that region.
o Detect good customers (e.g., through expenditure of more than $1000 within a month), and then run a detailed model to decide the potential of offering a deal.
Pattern 11: How?
Pattern 12: Using a Machine
Learning Model
o What? The idea is to train a model (often a
Machine Learning model), and then use it with the
Realtime pipeline to make decisions
o For example, you can build a model using R, export it as
PMML (Predictive Model Markup Language) and use it
within your realtime pipeline.
o Use Cases?
o Fraud Detection
o Segmentation
o Predict Churn
Predictive Analytics
o Build models and use
them with WSO2 CEP,
BAM and ESB using
upcoming WSO2
Machine Learner Product
( 2015 Q2)
o Build model using R,
export them as PMML,
and use within WSO2 CEP
o Call R Scripts from CEP
queries
In CEP (Siddhi)
PMML Model
from TransactionStream
#ml:applyModel('/path/logisticRegressionModel1.xml',
timestamp, amount, ip)
insert into PotentialFraudsStream;
Pattern 13: Online Control
o What? Control something Online. These would
involve problems like current situation awareness,
predicting next value(s), and deciding on corrective
actions.
o Use Cases?
o Autopilot
o Self-driving
o Robotics
Fraud Demo
Scaling & HA for Pattern
Implementations
So how do we scale a system?
o Vertical Scaling
o Horizontal Scaling
Vertical Scaling
Horizontal Scaling
E.g. Calculate Mean
How about scaling median ?
If & only if we can partition !
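The difference between the mean and the median can be made concrete with a small Java sketch. A mean can be computed on separate nodes as (sum, count) pairs and merged afterwards, whereas a median has no such small mergeable summary, which is why it only scales when the data can be partitioned so that each median is computed entirely on one node. The class `PartialMean` is a hypothetical name used for this sketch.

```java
class PartialMean {
    double sum;
    long count;

    // Each node adds the values it sees locally.
    void add(double value) {
        sum += value;
        count++;
    }

    // Partial results from two nodes combine without revisiting the raw events.
    PartialMean merge(PartialMean other) {
        PartialMean merged = new PartialMean();
        merged.sum = this.sum + other.sum;
        merged.count = this.count + other.count;
        return merged;
    }

    double mean() {
        return sum / count;
    }

    public static void main(String[] args) {
        PartialMean nodeA = new PartialMean();
        nodeA.add(1);
        nodeA.add(2);
        nodeA.add(3);
        PartialMean nodeB = new PartialMean();
        nodeB.add(10);
        System.out.println(nodeA.merge(nodeB).mean());
    }
}
```

Note that averaging the two node means directly would give a different and wrong answer, because the nodes saw different numbers of events.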
Scalable Realtime solutions ...
Spark Streaming
o Supports distributed processing
o Runs micro batches
o Does not support pattern & sequence detection
Apache Storm
o Supports distributed processing
o Stream processing engine
Why not use Apache Storm ?
Advantages
o Supports distributed processing
o Supports Partitioning
o Extendable
o Opensource
Disadvantages
o Need to write Java code
o Need to start from basic principles ( & data structures )
o Adapting to change is slow
o No support to govern artifacts
WSO2 CEP += Apache Storm
Advantages
o Supports distributed processing
o Supports Partitioning
o Extendable
o Opensource
Disadvantages (addressed)
o No need to write Java code (supports a SQL-like query language)
o No need to start from basic principles (supports a high-level language)
o Adapting to change is fast
o Govern artifacts using Toolboxes
o etc ...
How do we scale?
Scaling with Storm
Siddhi QL
define stream StockStream
(symbol string, volume int, price double);
@name('Filter Query')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;
@name('Window Query')
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream ;
Siddhi QL - with partition
define stream StockStream
(symbol string, volume int, price double);
@name('Filter Query')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;
@name('Window Query')
partition with (symbol of HighPriceStockStream)
begin
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream ;
end;
Siddhi QL - distributed
define stream StockStream
(symbol string, volume int, price double);
@name('Filter Query')
@dist(parallel='3')
from StockStream[price > 75]
select *
insert into HighPriceStockStream ;
@name('Window Query')
@dist(parallel='2')
partition with (symbol of HighPriceStockStream)
begin
from HighPriceStockStream#window.time(10 min)
select symbol, sum(volume) as sumVolume
insert into ResultStockStream ;
end;
On Storm UI
High Availability
HA / Persistence
o Option 1: Side by side
o Recommended
o Takes 2X hardware
o Gives zero down time
o Option 2: Snapshot and restore
o Uses less HW
o Will lose events between snapshots
o Downtime during recovery
o ** In some scenarios you can use event tables to keep intermediate state
Siddhi Extensions
● Function extension
● Aggregator extension
● Window extension
● Transform extension
Siddhi Query : Function Extension
from TempStream
select deviceID, roomNo,
custom:toKelvin(temp) as tempInKelvin,
'K' as scale
insert into OutputStream ;
Siddhi Query : Aggregator Extension
from TempStream
select deviceID, roomNo, temp,
custom:stdev(temp) as stdevTemp,
'C' as scale
insert into OutputStream ;
Siddhi Query : Window Extension
from TempStream
#window.custom:lastUnique(roomNo,2 min)
select *
insert into OutputStream ;
Siddhi Query : Transform Extension
from XYZSpeedStream
#transform.custom:getVelocityVector(v,vx,vy,vz)
select velocity, direction
insert into SpeedStream ;
Contact us !

Splunk Conf 2014 - Getting the messageDamien Dallimore
 
Complex Event Processing - A brief overview
Complex Event Processing - A brief overviewComplex Event Processing - A brief overview
Complex Event Processing - A brief overviewIstván Dávid
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsthelabdude
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the artStavros Kontopoulos
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Streaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara PrathapStreaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara PrathapWithTheBest
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptxAndrew Lamb
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureGabriele Modena
 
OpenTelemetry 101 FTW
OpenTelemetry 101 FTWOpenTelemetry 101 FTW
OpenTelemetry 101 FTWNGINX, Inc.
 
Overview of QP Frameworks and QM Modeling Tools (Notes)
Overview of QP Frameworks and QM Modeling Tools (Notes)Overview of QP Frameworks and QM Modeling Tools (Notes)
Overview of QP Frameworks and QM Modeling Tools (Notes)Quantum Leaps, LLC
 
Time Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming PlatformTime Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming Platformconfluent
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming PlatformDr. Mirko Kämpf
 
Real time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyReal time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyVarun Vijayaraghavan
 

Similar to ACM DEBS 2015: Realtime Streaming Analytics Patterns (20)

Is this normal?
Is this normal?Is this normal?
Is this normal?
 
Big Data and Machine Learning with FIWARE
Big Data and Machine Learning with FIWAREBig Data and Machine Learning with FIWARE
Big Data and Machine Learning with FIWARE
 
Observability: Beyond the Three Pillars with Spring
Observability: Beyond the Three Pillars with SpringObservability: Beyond the Three Pillars with Spring
Observability: Beyond the Three Pillars with Spring
 
Splunk Conf 2014 - Getting the message
Splunk Conf 2014 - Getting the messageSplunk Conf 2014 - Getting the message
Splunk Conf 2014 - Getting the message
 
1230 Rtf Final
1230 Rtf Final1230 Rtf Final
1230 Rtf Final
 
Complex Event Processing - A brief overview
Complex Event Processing - A brief overviewComplex Event Processing - A brief overview
Complex Event Processing - A brief overview
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the art
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Stream Processing Overview
Stream Processing OverviewStream Processing Overview
Stream Processing Overview
 
Streaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara PrathapStreaming Analytics and Internet of Things - Geesara Prathap
Streaming Analytics and Internet of Things - Geesara Prathap
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming Architecture
 
OpenTelemetry 101 FTW
OpenTelemetry 101 FTWOpenTelemetry 101 FTW
OpenTelemetry 101 FTW
 
Overview of QP Frameworks and QM Modeling Tools (Notes)
Overview of QP Frameworks and QM Modeling Tools (Notes)Overview of QP Frameworks and QM Modeling Tools (Notes)
Overview of QP Frameworks and QM Modeling Tools (Notes)
 
Time Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming PlatformTime Series Analysis… using an Event Streaming Platform
Time Series Analysis… using an Event Streaming Platform
 
Time Series Analysis Using an Event Streaming Platform
 Time Series Analysis Using an Event Streaming Platform Time Series Analysis Using an Event Streaming Platform
Time Series Analysis Using an Event Streaming Platform
 
bakalarska_praca
bakalarska_pracabakalarska_praca
bakalarska_praca
 
Real time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyReal time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.ly
 

More from Srinath Perera

Book: Software Architecture and Decision-Making
Book: Software Architecture and Decision-MakingBook: Software Architecture and Decision-Making
Book: Software Architecture and Decision-MakingSrinath Perera
 
Data science Applications in the Enterprise
Data science Applications in the EnterpriseData science Applications in the Enterprise
Data science Applications in the EnterpriseSrinath Perera
 
An Introduction to APIs
An Introduction to APIs An Introduction to APIs
An Introduction to APIs Srinath Perera
 
An Introduction to Blockchain for Finance Professionals
An Introduction to Blockchain for Finance ProfessionalsAn Introduction to Blockchain for Finance Professionals
An Introduction to Blockchain for Finance ProfessionalsSrinath Perera
 
AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?Srinath Perera
 
Healthcare + AI: Use cases & Challenges
Healthcare + AI: Use cases & ChallengesHealthcare + AI: Use cases & Challenges
Healthcare + AI: Use cases & ChallengesSrinath Perera
 
How would AI shape Future Integrations?
How would AI shape Future Integrations?How would AI shape Future Integrations?
How would AI shape Future Integrations?Srinath Perera
 
The Role of Blockchain in Future Integrations
The Role of Blockchain in Future IntegrationsThe Role of Blockchain in Future Integrations
The Role of Blockchain in Future IntegrationsSrinath Perera
 
Blockchain: Where are we? Where are we going?
Blockchain: Where are we? Where are we going? Blockchain: Where are we? Where are we going?
Blockchain: Where are we? Where are we going? Srinath Perera
 
Few thoughts about Future of Blockchain
Few thoughts about Future of BlockchainFew thoughts about Future of Blockchain
Few thoughts about Future of BlockchainSrinath Perera
 
A Visual Canvas for Judging New Technologies
A Visual Canvas for Judging New TechnologiesA Visual Canvas for Judging New Technologies
A Visual Canvas for Judging New TechnologiesSrinath Perera
 
Privacy in Bigdata Era
Privacy in Bigdata  EraPrivacy in Bigdata  Era
Privacy in Bigdata EraSrinath Perera
 
Blockchain, Impact, Challenges, and Risks
Blockchain, Impact, Challenges, and RisksBlockchain, Impact, Challenges, and Risks
Blockchain, Impact, Challenges, and RisksSrinath Perera
 
Today's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology LandscapeToday's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology LandscapeSrinath Perera
 
An Emerging Technologies Timeline
An Emerging Technologies TimelineAn Emerging Technologies Timeline
An Emerging Technologies TimelineSrinath Perera
 
The Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming ApplicationsThe Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming ApplicationsSrinath Perera
 
Analytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the UglyAnalytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the UglySrinath Perera
 
Transforming a Business Through Analytics
Transforming a Business Through AnalyticsTransforming a Business Through Analytics
Transforming a Business Through AnalyticsSrinath Perera
 
SoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration TechnologySoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration TechnologySrinath Perera
 

More from Srinath Perera (20)

Book: Software Architecture and Decision-Making
Book: Software Architecture and Decision-MakingBook: Software Architecture and Decision-Making
Book: Software Architecture and Decision-Making
 
Data science Applications in the Enterprise
Data science Applications in the EnterpriseData science Applications in the Enterprise
Data science Applications in the Enterprise
 
An Introduction to APIs
An Introduction to APIs An Introduction to APIs
An Introduction to APIs
 
An Introduction to Blockchain for Finance Professionals
An Introduction to Blockchain for Finance ProfessionalsAn Introduction to Blockchain for Finance Professionals
An Introduction to Blockchain for Finance Professionals
 
AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?AI in the Real World: Challenges, and Risks and how to handle them?
AI in the Real World: Challenges, and Risks and how to handle them?
 
Healthcare + AI: Use cases & Challenges
Healthcare + AI: Use cases & ChallengesHealthcare + AI: Use cases & Challenges
Healthcare + AI: Use cases & Challenges
 
How would AI shape Future Integrations?
How would AI shape Future Integrations?How would AI shape Future Integrations?
How would AI shape Future Integrations?
 
The Role of Blockchain in Future Integrations
The Role of Blockchain in Future IntegrationsThe Role of Blockchain in Future Integrations
The Role of Blockchain in Future Integrations
 
Future of Serverless
Future of ServerlessFuture of Serverless
Future of Serverless
 
Blockchain: Where are we? Where are we going?
Blockchain: Where are we? Where are we going? Blockchain: Where are we? Where are we going?
Blockchain: Where are we? Where are we going?
 
Few thoughts about Future of Blockchain
Few thoughts about Future of BlockchainFew thoughts about Future of Blockchain
Few thoughts about Future of Blockchain
 
A Visual Canvas for Judging New Technologies
A Visual Canvas for Judging New TechnologiesA Visual Canvas for Judging New Technologies
A Visual Canvas for Judging New Technologies
 
Privacy in Bigdata Era
Privacy in Bigdata  EraPrivacy in Bigdata  Era
Privacy in Bigdata Era
 
Blockchain, Impact, Challenges, and Risks
Blockchain, Impact, Challenges, and RisksBlockchain, Impact, Challenges, and Risks
Blockchain, Impact, Challenges, and Risks
 
Today's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology LandscapeToday's Technology and Emerging Technology Landscape
Today's Technology and Emerging Technology Landscape
 
An Emerging Technologies Timeline
An Emerging Technologies TimelineAn Emerging Technologies Timeline
An Emerging Technologies Timeline
 
The Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming ApplicationsThe Rise of Streaming SQL and Evolution of Streaming Applications
The Rise of Streaming SQL and Evolution of Streaming Applications
 
Analytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the UglyAnalytics and AI: The Good, the Bad and the Ugly
Analytics and AI: The Good, the Bad and the Ugly
 
Transforming a Business Through Analytics
Transforming a Business Through AnalyticsTransforming a Business Through Analytics
Transforming a Business Through Analytics
 
SoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration TechnologySoC Keynote:The State of the Art in Integration Technology
SoC Keynote:The State of the Art in Integration Technology
 

Recently uploaded

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 

Recently uploaded (20)

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 

ACM DEBS 2015: Realtime Streaming Analytics Patterns

  • 1. ACM DEBS 2015: Realtime Streaming Analytics Patterns Srinath Perera Sriskandarajah Suhothayan WSO2 Inc.
  • 2. Data Analytics (Big Data) o Scientists have been doing this for 25 years with MPI (1991) using special hardware o It took off with Google’s MapReduce paper (2004), followed by Apache Hadoop, Hive, and the whole ecosystem built around them. o Later Spark emerged, and it is faster. o But processing takes time.
  • 3. Value of Some Insights Degrades Fast! o For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring) the value of insights degrades very quickly with time. o E.g. stock markets and the speed of light o We need technology that can produce outputs fast o Static queries, but need very fast output (alerts, realtime control) o Dynamic and interactive queries (data exploration)
  • 4. History ▪ Realtime analytics is not new either! - Active Databases (2000+) - Stream processing (Aurora, Borealis (2005+) and later Storm) - Distributed Streaming Operators (e.g. a database research topic around 2005) - CEP Vendor Roadmap (from http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/)
  • 6. Realtime Interactive Analytics o Usually done to support interactive queries o Index data to make it readily accessible so you can respond to queries fast (e.g. Apache Drill) o Tools like Druid, VoltDB and SAP HANA can do this with all data in memory to make things really fast.
  • 7. Realtime Streaming Analytics o Process data without storing it first (as data comes in) o Queries are fixed (static) o Triggers when given conditions are met. o Technologies o Stream Processing (Apache Storm, Apache Samza) o Complex Event Processing/CEP (WSO2 CEP, Esper, StreamBase) o MicroBatches (Spark Streaming)
  • 8. Realtime Football Analytics ● Video: https://www.youtube.com/watch?v=nRI6buQ0NOM ● More Info: http://www.slideshare.net/hemapani/strata-2014-talktracking-a-soccer-game-with-big-data
  • 9. Why Realtime Streaming Analytics Patterns? o Reason 1: Usual advantages o Give us better understanding o Give us better vocabulary to teach and communicate o Tools can implement them o .. o Reason 2: Under the theme of realtime analytics, a lot of people get carried away with the word count example. Patterns show that word count is just the tip of the iceberg.
  • 10. Earlier Work on Patterns o Patterns from SQL (project, join, filter, etc.) o Event Processing Technical Society’s (EPTS) reference architecture o Higher-level patterns such as tracking, prediction and learning, in addition to low-level operators that come from SQL-like languages. o Esper’s Solution Patterns Document (50 patterns) o Coral8 White Paper
  • 11. Basic Patterns o Pattern 1: Preprocessing ( filter, transform, enrich, project .. ) o Pattern 2: Alerts and Thresholds o Pattern 3: Simple Counting and Counting with Windows o Pattern 4: Joining Event Streams o Pattern 5: Data Correlation, Missing Events, and Erroneous Data
  • 12. Patterns for Handling Trends o Pattern 7: Detecting Temporal Event Sequence Patterns o Pattern 8: Tracking (track something over space or time) o Pattern 9: Detecting Trends (rise, fall, turn, triple bottom) o Pattern 13: Online Control
  • 13. Mixed Patterns o Pattern 6: Interacting with Databases o Pattern 10: Running the same Query in Batch and Realtime Pipelines o Pattern 11: Detecting and switching to Detailed Analysis o Pattern 12: Using a Machine Learning Model
  • 14. Earlier Work on Patterns
  • 16. Implementing Realtime Analytics o It is tempting to write custom code. Filters look very easy, but it quickly gets too complex. Don’t! o Option 1: Stream Processing (e.g. Storm). Kind of works. It is like MapReduce: you have to write code. o Option 2: Spark Streaming - more compact than Storm, but cannot do some stateful operations. o Option 3: Complex Event Processing - compact, SQL-like language, fast
  • 17. Stream Processing o Program a set of processors and wire them up; data flows through the graph. o A middleware framework handles data flow, distribution, and fault tolerance (e.g. Apache Storm, Samza) o Processors may be on the same machine or on multiple machines
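The "program a set of processors and wire them up" idea on slide 17 can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use the Storm or Samza APIs, it runs in a single thread with no distribution or fault tolerance, and the names `ProcessorGraphSketch`, `splitter`, and `lowerCaser` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Toy processor graph: each processor handles an event and forwards results downstream. */
final class ProcessorGraphSketch {

    /** Hypothetical processor: splits a sentence into words and forwards each word. */
    static Consumer<String> splitter(Consumer<String> downstream) {
        return sentence -> {
            for (String word : sentence.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    downstream.accept(word);
                }
            }
        };
    }

    /** Hypothetical processor: lower-cases each word before forwarding it. */
    static Consumer<String> lowerCaser(Consumer<String> downstream) {
        return word -> downstream.accept(word.toLowerCase());
    }

    /** Wires source -> splitter -> lowerCaser -> sink and pushes the sentences through. */
    static List<String> run(List<String> sentences) {
        List<String> sink = new ArrayList<>();
        Consumer<String> graph = splitter(lowerCaser(sink::add));
        for (String sentence : sentences) {
            graph.accept(sentence);
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("The quick fox", "jumps over")));
    }
}
```

In a real framework the wiring step is declarative, so the middleware can place each processor on a different machine; here the wiring is just function composition.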
  • 18. Writing a Storm Program o Write Spout(s) o Write Bolt(s) o Wire them up o Run
  • 19. Write Bolts
      We will use a shorthand like the one on the left to explain.

      public static class WordCount extends BaseBasicBolt {
          @Override
          public void execute(Tuple tuple, BasicOutputCollector collector) {
              .. do something …
              collector.emit(new Values(word, count));
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("word", "count"));
          }
      }
  • 20. Wire up and Run
      TopologyBuilder builder = new TopologyBuilder();
      builder.setSpout("spout", new RandomSentenceSpout(), 5);
      builder.setBolt("split", new SplitSentence(), 8)
             .shuffleGrouping("spout");
      builder.setBolt("count", new WordCount(), 12)
             .fieldsGrouping("split", new Fields("word"));

      Config conf = new Config();
      if (args != null && args.length > 0) {
          conf.setNumWorkers(3);
          StormSubmitter.submitTopologyWithProgressBar(
                  args[0], conf, builder.createTopology());
      } else {
          conf.setMaxTaskParallelism(3);
          LocalCluster cluster = new LocalCluster();
          cluster.submitTopology("word-count", conf, builder.createTopology());
          ...
      }
      }
  • 22. Micro Batches ( e.g. Spark Streaming) o Process data in small batches, and then combine results for final results (e.g. Spark) o Works for simple aggregates, but tricky to do this for complex operations (e.g. Event Sequences) o Can do it with MapReduce as well if the deadlines are not too tight.
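The micro-batch idea on slide 22 (process small batches, then combine the partial results) can be sketched in plain Java as follows. This is a hedged illustration, not Spark Streaming code: batches are cut by a fixed event count rather than by a time interval, and `MicroBatchSketch`, `countBatch`, `combine`, and `process` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Word counting done batch by batch, with the partial results merged into running totals. */
final class MicroBatchSketch {

    /** Counts the words in one small batch. */
    static Map<String, Integer> countBatch(List<String> batch) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : batch) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    /** Folds one batch's partial counts into the running totals. */
    static void combine(Map<String, Integer> totals, Map<String, Integer> partial) {
        partial.forEach((word, n) -> totals.merge(word, n, Integer::sum));
    }

    /** Cuts the stream into batches of batchSize events and combines the per-batch results. */
    static Map<String, Integer> process(List<String> stream, int batchSize) {
        Map<String, Integer> totals = new HashMap<>();
        for (int start = 0; start < stream.size(); start += batchSize) {
            int end = Math.min(start + batchSize, stream.size());
            combine(totals, countBatch(stream.subList(start, end)));
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("a", "b", "a", "c", "a"), 2));
    }
}
```

Counting combines cleanly because addition is associative. An event sequence that spans a batch boundary has no such simple merge step, which is why the slide calls complex operations tricky.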
  • 23. SQL Like Query Languages o SQL-like data processing languages (e.g. Apache Hive) o Since many people understand SQL, Hive made large-scale (Big Data) processing accessible to many o Expressive, short, and sweet. o Define core operations that cover 90% of problems o Let experts dig in when they like!
  • 24. CEP = SQL for Realtime Analytics o Easy to follow from SQL o Expressive, short, and sweet. o Define core operations that cover 90% of problems o Let experts dig in when they like!
  • 26. Code and other details o Sample code - https://github.com/suhothayan/DEBS-2015-Realtime-Analytics-Patterns o WSO2 CEP o pack - http://svn.wso2.org/repos/wso2/people/suho/packs/cep/4.0.0/debs2015/wso2cep-4.0.0-SNAPSHOT.zip o docs - https://docs.wso2.com/display/CEP400/WSO2+Complex+Event+Processor+Documentation o Apache Storm - https://storm.apache.org/ o We have packs on a pen drive
  • 27. Pattern 1: Preprocessing
    o What? Clean up and prepare data via operations like filter, project, enrich, split, and transformations
    o Usecases?
      o From a Twitter data stream: extract the author, timestamp and location fields, then filter on the location of the author.
      o From a temperature stream: extract the temperature and the room number of the sensor, then filter on them.
  • 28. Filter
    In Storm: [diagram]
    In CEP (Siddhi):
    from TempStream[roomNo > 245 and roomNo <= 365]
    select roomNo, temp
    insert into ServerRoomTempStream ;
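For comparison with the Siddhi filter, the per-event logic that a hand-written filter step (for example inside a Storm bolt) would apply can be sketched in plain Java. The class `ServerRoomFilter` and the `TempEvent` type are hypothetical names for this illustration and do not depend on Storm; only the predicate is taken from the query.

```java
import java.util.ArrayList;
import java.util.List;

final class ServerRoomFilter {
    /** Mirrors TempStream (deviceID long, roomNo int, temp double). */
    static final class TempEvent {
        final long deviceID;
        final int roomNo;
        final double temp;

        TempEvent(long deviceID, int roomNo, double temp) {
            this.deviceID = deviceID;
            this.roomNo = roomNo;
            this.temp = temp;
        }
    }

    /** Same predicate as the Siddhi filter: roomNo > 245 and roomNo <= 365. */
    static boolean isServerRoom(TempEvent e) {
        return e.roomNo > 245 && e.roomNo <= 365;
    }

    /** Keep only the events that pass the predicate, preserving arrival order. */
    static List<TempEvent> filter(List<TempEvent> events) {
        List<TempEvent> out = new ArrayList<>();
        for (TempEvent e : events) {
            if (isServerRoom(e)) {
                out.add(e);
            }
        }
        return out;
    }
}
```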
  • 30. CEP Event Adapters
    Support for several transports (network access)
    ● SOAP ● HTTP ● JMS ● SMTP ● SMS ● Thrift ● Kafka ● Websocket ● MQTT
    Supports database writes using Map messages
    ● Cassandra ● RDBMS
    Supports custom event adapters via its pluggable architecture!
  • 31. Stream Definition (Data Model)
    {
      'name':'soft.drink.coop.sales',
      'version':'1.0.0',
      'nickName': 'Soft_Drink_Sales',
      'description': 'Soft drink sales',
      'metaData':[
        {'name':'region','type':'STRING'}
      ],
      'correlationData':[
        {'name':'transactionID','type':'STRING'}
      ],
      'payloadData':[
        {'name':'brand','type':'STRING'},
        {'name':'quantity','type':'INT'},
        {'name':'total','type':'INT'},
        {'name':'user','type':'STRING'}
      ]
    }
  • 32. Projection
    define stream TempStream (deviceID long, roomNo int, temp double);

    from TempStream
    select roomNo, temp
    insert into OutputStream ;
  • 33. Inferred Streams
    from TempStream
    select roomNo, temp
    insert into OutputStream ;

    The output stream definition is inferred from the query:
    define stream OutputStream (roomNo int, temp double);
  • 34. Enrich
    from TempStream
    select roomNo, temp, 'C' as scale
    insert into OutputStream ;

    define stream OutputStream (roomNo int, temp double, scale string);

    from TempStream
    select deviceID, roomNo, avg(temp) as avgTemp
    insert into OutputStream ;
  • 35. Transformation
    from TempStream
    select concat(deviceID, '-', roomNo) as uid,
           toFahrenheit(temp) as tempInF,
           'F' as scale
    insert into OutputStream ;
  • 36. Split
    from TempStream
    select roomNo, temp
    insert into RoomTempStream ;

    from TempStream
    select deviceID, temp
    insert into DeviceTempStream ;
  • 37. Pattern 2: Alerts and Thresholds
    o What? Detect a condition and generate an alert when it holds (e.g. an alarm on high temperature).
    o Alerts can be based on a simple value or on more complex conditions such as the rate of increase.
    o Usecases?
      o Raise an alert when a vehicle is going too fast
      o Alert when a room is too hot
  • 38. Filter Alert
    from TempStream[roomNo > 245 and roomNo <= 365 and temp > 40]
    select roomNo, temp
    insert into AlertServerRoomTempStream ;
  • 39. Pattern 3: Simple Counting and Counting with Windows
    o What? Aggregate functions like Min, Max, Percentiles, etc.
    o Often they can be computed without storing any data
    o Most useful when used with a window
    o Usecases?
      o Most metrics need a time bound so we can compare them (errors per day, transactions per second)
      o Linux Load Average gives us an idea of the overall trend by reporting the averages over the last 1, 5, and 15 minutes.
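The point that many aggregates need no stored events can be illustrated with a small running-statistics sketch. `RunningStats` is a hypothetical class written for this example; it keeps only four numbers no matter how many events arrive. Exact percentiles are a different case and generally need more state than this.

```java
/** Count, min, max and mean maintained incrementally, with O(1) state. */
final class RunningStats {
    private long count;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double mean;

    /** Fold one new value into the aggregates; the value itself is not kept. */
    void add(double value) {
        count++;
        min = Math.min(min, value);
        max = Math.max(max, value);
        // Incremental mean: move the old mean towards the new value by 1/count.
        mean += (value - mean) / count;
    }

    long count() { return count; }
    double min() { return min; }
    double max() { return max; }
    double mean() { return mean; }
}
```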
  • 40. Types of windows
    o Sliding windows vs. Batch (tumbling) windows
    o Time vs. Length windows
    Also supports
    o Unique window
    o First unique window
    o External time window
  • 42. Aggregation
    In CEP (Siddhi):
    from TempStream
    select roomNo, avg(temp) as avgTemp
    insert into HotRoomsStream ;
  • 43. Sliding Time Window
    from TempStream#window.time(1 min)
    select roomNo, avg(temp) as avgTemp
    insert all events into AvgRoomTempStream ;
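What a sliding time window has to maintain can be sketched as a queue of recent events plus a running sum. This is only an illustration of the idea, not how Siddhi implements `#window.time`: the class `SlidingTimeAverage` is hypothetical, timestamps are assumed to arrive in non-decreasing order, and expiry happens only when a new event arrives (a real engine would also expire events on a timer).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Average over the events seen in the last windowMillis milliseconds. */
final class SlidingTimeAverage {
    private static final class Entry {
        final long timestamp;
        final double value;

        Entry(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final long windowMillis;
    private final Deque<Entry> window = new ArrayDeque<>();
    private double sum;

    SlidingTimeAverage(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Add an event and return the average of everything still inside the window. */
    double add(long timestamp, double value) {
        // Expire events that are windowMillis or more older than the new event.
        while (!window.isEmpty() && window.peekFirst().timestamp <= timestamp - windowMillis) {
            sum -= window.removeFirst().value;
        }
        window.addLast(new Entry(timestamp, value));
        sum += value;
        return sum / window.size();
    }
}
```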
  • 44. Group By
    from TempStream#window.time(1 min)
    select roomNo, avg(temp) as avgTemp
    group by roomNo
    insert all events into HotRoomsStream ;
  • 45. Batch Time Window
    from TempStream#window.timeBatch(5 min)
    select roomNo, avg(temp) as avgTemp
    group by roomNo
    insert all events into HotRoomsStream ;
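A batch (tumbling) window with a group-by can be sketched as per-room accumulators that are emitted and reset whenever an event falls into a new window. `TumblingRoomAverage` is a hypothetical class for illustration; it assumes ordered timestamps, aligns windows to multiples of the window size, and closes a window only when the next event arrives, which is simpler than what a timer-driven engine does.

```java
import java.util.Map;
import java.util.TreeMap;

/** Per-room average over consecutive, non-overlapping time windows. */
final class TumblingRoomAverage {
    private final long sizeMillis;
    private long currentWindow = -1;                                  // index of the window being filled
    private final Map<Integer, double[]> acc = new TreeMap<>();      // roomNo -> {sum, count}

    TumblingRoomAverage(long sizeMillis) {
        this.sizeMillis = sizeMillis;
    }

    /** Returns the per-room averages of the window that just closed, or an empty map. */
    Map<Integer, Double> add(long timestamp, int roomNo, double temp) {
        long window = timestamp / sizeMillis;
        Map<Integer, Double> closed = new TreeMap<>();
        if (currentWindow != -1 && window != currentWindow) {
            for (Map.Entry<Integer, double[]> e : acc.entrySet()) {
                closed.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
            }
            acc.clear();   // the batch is emitted as a whole, then state is reset
        }
        currentWindow = window;
        double[] a = acc.computeIfAbsent(roomNo, k -> new double[2]);
        a[0] += temp;
        a[1] += 1;
        return closed;
    }
}
```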
  • 46. Pattern 4: Joining Event Streams
    o What? Create a new event stream by joining multiple streams
    o The complication is time: events from different streams arrive at different moments, so at least one stream needs a window
    o Often used with a window
    o Usecases?
      o Detecting when a player has kicked the ball in a football game.
      o Correlating TempStream with the state of the regulator to trigger control commands
  • 49. Join
    In CEP (Siddhi):
    define stream TempStream (deviceID long, roomNo int, temp double);
    define stream RegulatorStream (deviceID long, roomNo int, isOn bool);

    from TempStream[temp > 30.0]#window.time(1 min) as T
      join RegulatorStream[isOn == false]#window.length(1) as R
      on T.roomNo == R.roomNo
    select T.roomNo, R.deviceID, 'start' as action
    insert into RegulatorActionStream ;
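The state such a join needs can be sketched by hand: a one-minute buffer of hot temperature readings on one side and the single latest "off" regulator event on the other, with a match attempted whenever either side receives an event. `TempRegulatorJoin` and its string output format are assumptions for this illustration, and the sketch only approximates the query (ordered timestamps, expiry on arrival only).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hand-rolled join of hot temperature readings with the latest switched-off regulator. */
final class TempRegulatorJoin {
    private static final long WINDOW_MILLIS = 60_000L;
    private final Deque<long[]> hotTemps = new ArrayDeque<>();   // {timestamp, roomNo}
    private long[] lastOffRegulator;                             // {deviceID, roomNo}, or null

    /** A temperature event arrives; emits an action if it joins with the stored regulator. */
    List<String> onTemp(long timestamp, int roomNo, double temp) {
        List<String> out = new ArrayList<>();
        if (temp <= 30.0) {
            return out;                                          // filtered out before the window
        }
        evict(timestamp);
        hotTemps.addLast(new long[] {timestamp, roomNo});
        if (lastOffRegulator != null && lastOffRegulator[1] == roomNo) {
            out.add(action(roomNo, lastOffRegulator[0]));
        }
        return out;
    }

    /** A regulator event arrives; emits one action per buffered hot reading for that room. */
    List<String> onRegulator(long timestamp, long deviceID, int roomNo, boolean isOn) {
        List<String> out = new ArrayList<>();
        if (isOn) {
            return out;                                          // filtered out before the window
        }
        evict(timestamp);
        lastOffRegulator = new long[] {deviceID, roomNo};        // length(1): keep only the latest
        for (long[] t : hotTemps) {
            if (t[1] == roomNo) {
                out.add(action(roomNo, deviceID));
            }
        }
        return out;
    }

    private void evict(long now) {
        while (!hotTemps.isEmpty() && hotTemps.peekFirst()[0] <= now - WINDOW_MILLIS) {
            hotTemps.removeFirst();
        }
    }

    private static String action(long roomNo, long deviceID) {
        return roomNo + "," + deviceID + ",start";
    }
}
```

Even this small case needs two buffers, two entry points and explicit expiry, which is the bookkeeping the declarative join hides.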
  • 50. Pattern 5: Data Correlation, Missing Events, and Erroneous Data
    o What? Find correlations and use them to detect and handle missing and erroneous data
    o Use Cases?
      o Detecting a missing event (e.g., detect a customer request that has not been responded to within 1 hour of its reception)
      o Detecting erroneous data (e.g., detecting failed sensors using a set of sensors that monitor overlapping regions. We can use the redundant data to find erroneous sensors and remove their data from further processing)
  • 52. Missing Event in CEP
    In CEP (Siddhi):
    from RequestStream#window.time(1h)
    insert expired events into ExpiryStream ;

    from r1=RequestStream -> r2=Response[id == r1.id] or r3=ExpiryStream[id == r1.id]
    select r1.id as id ...
    having r2.id == null
    insert into AlertStream ;
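The same missing-event logic can be sketched imperatively: remember every open request, forget it when its response arrives, and report whatever is still open after the timeout. `MissingResponseDetector` and its explicit `expire` call are assumptions for this illustration; in the Siddhi version the time window's expired events play the role of `expire`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Reports requests that received no response within the timeout. */
final class MissingResponseDetector {
    private final long timeoutMillis;
    private final Map<String, Long> pending = new LinkedHashMap<>();   // request id -> arrival time

    MissingResponseDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void onRequest(String id, long timestamp) {
        pending.put(id, timestamp);
    }

    void onResponse(String id) {
        pending.remove(id);
    }

    /** Call periodically; returns (and forgets) the ids whose timeout has passed. */
    List<String> expire(long now) {
        List<String> alerts = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() >= timeoutMillis) {
                alerts.add(e.getKey());
                it.remove();
            }
        }
        return alerts;
    }
}
```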
  • 53. Pattern 6: Interacting with Databases
    o What? Combine realtime data with historical data
    o Use Cases?
      o On a transaction, look up the customer's age by ID in the customer database to detect fraud (enrichment)
      o Check a transaction against blacklists and whitelists in the database
      o Receive input from a user (e.g., the daily discount amount may be updated in the database, and the query will then pick it up automatically without further human intervention).
  • 55. Event Table
    In CEP (Siddhi):
    define table CardUserTable (name string, cardNum long) ;

    @from(eventtable = 'rdbms', datasource.name = 'CardDataSource', table.name = 'UserTable', caching.algorithm = 'LRU')
    define table CardUserTable (name string, cardNum long)

    Cache types supported
    ● Basic: A size-based algorithm based on FIFO.
    ● LRU (Least Recently Used): The least recently used event is dropped when the cache is full.
    ● LFU (Least Frequently Used): The least frequently used event is dropped when the cache is full.
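The LRU policy in the list above can be illustrated with the standard `java.util.LinkedHashMap` in access-order mode. `LruCache` is a hypothetical class for this example and says nothing about how WSO2 CEP implements its event table cache; it only shows the eviction rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Fixed-capacity cache that drops the least recently used entry when full. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);   // accessOrder = true: get() moves an entry to the "recent" end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > capacity;
    }
}
```

With capacity 2, inserting card numbers 1 and 2, reading 1, and then inserting 3 evicts 2, because 1 was touched more recently.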
  • 56. Join : Event Table
    define stream Purchase (price double, cardNo long, place string);
    define table CardUserTable (name string, cardNum long) ;

    from Purchase#window.length(1) join CardUserTable
      on Purchase.cardNo == CardUserTable.cardNum
    select Purchase.cardNo as cardNo, CardUserTable.name as name, Purchase.price as price
    insert into PurchaseUserStream ;
  • 57. Insert : Event Table
    define stream FraudStream (price double, cardNo long, userName string);
    define table BlacklistedUserTable (name string, cardNum long) ;

    from FraudStream
    select userName as name, cardNo as cardNum
    insert into BlacklistedUserTable ;
  • 58. Update : Event Table
    define stream LoginStream (userID string, islogin bool, loginTime long);
    define table LastLoginTable (userID string, time long) ;

    from LoginStream
    select userID, loginTime as time
    update LastLoginTable
      on LoginStream.userID == LastLoginTable.userID ;
  • 59. Pattern 7: Detecting Temporal Event Sequence Patterns
    o What? Detect a sequence of events or conditions arranged in time
    o Use Cases?
      o Detect suspicious activities like a small transaction immediately followed by a large transaction
      o Detect ball possession in a football game
      o Detect suspicious financial patterns like large buy and sell behaviour within a small time period
  • 61. Pattern
    In CEP (Siddhi):
    define stream Purchase (price double, cardNo long, place string);

    from every (a1 = Purchase[price < 100] -> a3= ..)
      -> a2 = Purchase[price > 10000 and a1.cardNo == a2.cardNo]
      within 1 day
    select a1.cardNo as cardNo, a2.price as price, a2.place as place
    insert into PotentialFraud ;
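A simplified version of this kind of detection can be sketched in Java. The sketch covers only two steps (a purchase under 100 followed by one over 10000 on the same card within a day); it does not reproduce the elided `a3` step of the query, and `SmallThenLargeDetector` is a hypothetical class that assumes ordered timestamps in milliseconds.

```java
import java.util.HashMap;
import java.util.Map;

/** Flags a large purchase that follows a small one on the same card within one day. */
final class SmallThenLargeDetector {
    private static final long DAY_MILLIS = 24L * 60 * 60 * 1000;
    private final Map<Long, Long> lastSmallPurchase = new HashMap<>();   // cardNo -> timestamp

    /** Returns true if this purchase completes the small-then-large sequence. */
    boolean onPurchase(long timestamp, long cardNo, double price) {
        if (price < 100) {
            lastSmallPurchase.put(cardNo, timestamp);   // remember the start of a possible sequence
            return false;
        }
        if (price > 10000) {
            Long small = lastSmallPurchase.get(cardNo);
            return small != null && timestamp - small <= DAY_MILLIS;
        }
        return false;
    }
}
```

Keeping only the most recent small purchase per card is enough here, since it is the one most likely to fall inside the one-day limit.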
  • 62. Pattern 8: Tracking
    o What? Track something over space and time and detect given conditions
    o Use Cases?
      o Tracking a fleet of vehicles, making sure that they adhere to speed limits, routes, and geo-fences.
      o Tracking wildlife, making sure the animals are alive (they will not move if they are dead) and that they do not leave the reservation.
      o Tracking airline luggage and making sure it has not been sent to the wrong destination
      o Tracking a logistics network and figuring out bottlenecks and unexpected conditions.
  • 63. TFL: Traffic Analytics Built using TFL ( Transport for London) open data feeds. http://goo.gl/9xNiCm http://goo.gl/04tX6k
  • 64. Pattern 9: Detecting Trends
    o What? Detect an overall trend over time in a stream of values
    o Useful in stock markets, SLA enforcement, auto scaling, predictive maintenance
    o Use Cases?
      o Rise and fall of values, and turns (a switch from a rise to a fall)
      o Outliers: values that deviate from the current trend by a large amount
      o Complex trends like “Triple Bottom” and “Cup and Handle” [17].
  • 65. Trend in Storm
    Build and apply a state machine
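A very small state machine for a rising trend might look like the following. `RisingTrendDetector` is a hypothetical class, independent of Storm; its state is just the previous value and the length of the current run of increases, and it would typically be called from a bolt's `execute` once per event.

```java
/** Detects a run of at least minRises consecutive increases. */
final class RisingTrendDetector {
    private final int minRises;
    private boolean hasPrevious;
    private double previous;
    private int rises;   // length of the current run of consecutive increases

    RisingTrendDetector(int minRises) {
        this.minRises = minRises;
    }

    /** Feed the next value; returns true while the current rising run is long enough. */
    boolean onValue(double value) {
        if (hasPrevious && value > previous) {
            rises++;      // stay in, or enter, the rising state
        } else {
            rises = 0;    // a fall or a flat value resets the machine
        }
        previous = value;
        hasPrevious = true;
        return rises >= minRises;
    }
}
```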
  • 66. Sequence
    In CEP (Siddhi):
    from t1=TempStream,
         t2=TempStream[(isNull(t2[last].temp) and t1.temp < temp) or
                       (t2[last].temp < temp and not(isNull(t2[last].temp)))]+
         within 5 min
    select t1.temp as initialTemp, t2[last].temp as finalTemp, t1.deviceID, t1.roomNo
    insert into IncreasingHotRoomsStream ;
  • 67. Partition
    In CEP (Siddhi):
    partition with (roomNo of TempStream)
    begin
      from t1=TempStream,
           t2=TempStream[(isNull(t2[last].temp) and t1.temp < temp) or
                         (t2[last].temp < temp and not(isNull(t2[last].temp)))]+
           within 5 min
      select t1.temp as initialTemp, t2[last].temp as finalTemp, t1.deviceID, t1.roomNo
      insert into IncreasingHotRoomsStream ;
    end;
  • 68. Detecting Trends in Real Life
    o The paper “A Complex Event Processing Toolkit for Detecting Technical Chart Patterns” (HPBC 2015) used this idea to identify stock chart patterns
    o It used kernel regression for smoothing and then detected maxima and minima.
    o Any pattern can then be written as a temporal event sequence.
  • 69. Pattern 10: Lambda Architecture
    o What? Run the same query in both the realtime and batch pipelines, using realtime analytics to fill the lag in batch analytics results.
    o The name “Lambda Architecture” comes from Nathan Marz. See also Jay Kreps’s “Questioning the Lambda Architecture”
    o Use Cases?
      o For example, if batch processing takes 15 minutes, results always lag 15 minutes behind the current data. Here realtime processing fills the gap.
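How the two pipelines combine at query time can be sketched with a simple per-key counter. `LambdaCounts` is a hypothetical class for illustration, and it makes a strong simplifying assumption: when a batch result is installed it covers every event seen so far, so the realtime view can simply be cleared. A real deployment has to align the two views on a timestamp instead.

```java
import java.util.HashMap;
import java.util.Map;

/** Serves counts as batch view plus realtime view. */
final class LambdaCounts {
    private final Map<String, Long> batchView = new HashMap<>();
    private final Map<String, Long> realtimeView = new HashMap<>();

    /** Realtime pipeline: count the event immediately. */
    void onEvent(String key) {
        realtimeView.merge(key, 1L, Long::sum);
    }

    /** Batch pipeline finished: install its counts (assumed to cover all events so far). */
    void installBatchView(Map<String, Long> counts) {
        batchView.clear();
        batchView.putAll(counts);
        realtimeView.clear();
    }

    /** Query: the batch result plus whatever arrived since the last batch run. */
    long query(String key) {
        return batchView.getOrDefault(key, 0L) + realtimeView.getOrDefault(key, 0L);
    }
}
```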
  • 71. Pattern 11: Detecting and switching to Detailed Analysis
    o What? Detect a condition that suggests some anomaly, and analyze it further using historical data.
    o Use Cases?
      o Use basic rules to detect fraud (e.g., a large transaction), then pull all transactions made with that credit card over a longer period (e.g., 3 months of data) from the batch pipeline and run a detailed analysis
      o While monitoring weather, detect conditions like high temperature or low pressure in a given region, and then start a high-resolution localized forecast for that region.
      o Detect good customers (e.g., through expenditure of more than $1000 within a month), and then run a detailed model to decide whether to offer a deal.
  • 73. Pattern 12: Using a Machine Learning Model
    o What? Train a model (often a Machine Learning model), and then use it within the realtime pipeline to make decisions
    o For example, you can build a model using R, export it as PMML (Predictive Model Markup Language) and use it within your realtime pipeline.
    o Use Cases?
      o Fraud Detection
      o Segmentation
      o Predict Churn
  • 74. Predictive Analytics
    o Build models and use them with WSO2 CEP, BAM and ESB using the upcoming WSO2 Machine Learner product (2015 Q2)
    o Build a model using R, export it as PMML, and use it within WSO2 CEP
    o Call R scripts from CEP queries
  • 75. PMML Model
    In CEP (Siddhi):
    from TransactionStream#ml:applyModel('/path/logisticRegressionModel1.xml', timestamp, amount, ip)
    insert into PotentialFraudsStream ;
  • 76. Pattern 13: Online Control
    o What? Control something online. This involves problems like awareness of the current situation, predicting the next value(s), and deciding on corrective actions.
    o Use Cases?
      o Autopilot
      o Self-driving
      o Robotics
  • 78. Scaling & HA for Pattern Implementations
  • 79. So how do we scale a system?
    o Vertical Scaling
    o Horizontal Scaling
  • 82. Horizontal Scaling ... E.g. Calculate Mean
  • 85. Horizontal Scaling ...
    How about scaling the median? If and only if we can partition the data!
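One way to read "if and only if we can partition": a mean can be merged from partial sums and counts, but medians of arbitrary slices cannot be merged into the global median. A median per key does scale, as long as every event of a key is routed to the same node. The sketch below simulates that routing in one process; `PartitionedMedian`, the hash-based `partitionFor`, and the in-memory "nodes" are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Median per key, computed after routing each key to exactly one partition. */
final class PartitionedMedian {
    /** All events with the same key land on the same partition. */
    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Exact median of a non-empty list. */
    static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        if (n % 2 == 1) {
            return sorted.get(n / 2);
        }
        return (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    /** keys.get(i) is the partition key of values.get(i). */
    static Map<String, Double> medianPerKey(List<String> keys, List<Double> values, int partitions) {
        // Each map stands in for the local state of one node.
        List<Map<String, List<Double>>> nodes = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            nodes.add(new HashMap<>());
        }
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            nodes.get(partitionFor(key, partitions))
                 .computeIfAbsent(key, k -> new ArrayList<>())
                 .add(values.get(i));
        }
        // Every node computes medians for its own keys; no cross-node merge is needed.
        Map<String, Double> result = new TreeMap<>();
        for (Map<String, List<Double>> node : nodes) {
            for (Map.Entry<String, List<Double>> e : node.entrySet()) {
                result.put(e.getKey(), median(e.getValue()));
            }
        }
        return result;
    }
}
```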
  • 87. Scalable Realtime solutions ...
    Spark Streaming
    o Supports distributed processing
    o Runs micro batches
    o Does not support pattern & sequence detection
    Apache Storm
    o Supports distributed processing
    o Stream processing engine
  • 88. Why not use Apache Storm?
    Advantages
    o Supports distributed processing
    o Supports partitioning
    o Extendable
    o Open source
    Disadvantages
    o Need to write Java code
    o Need to start from basic principles (& data structures)
    o Adapting to change is slow
    o No support for governing artifacts
  • 89. WSO2 CEP += Apache Storm
    Advantages
    o Supports distributed processing
    o Supports partitioning
    o Extendable
    o Open source
    Storm's disadvantages, addressed
    o No need to write Java code (supports a SQL-like query language)
    o No need to start from basic principles (supports a high-level language)
    o Adapting to change is fast
    o Govern artifacts using Toolboxes
    o etc ...
  • 93. Siddhi QL
    define stream StockStream (symbol string, volume int, price double);

    @name('Filter Query')
    from StockStream[price > 75]
    select *
    insert into HighPriceStockStream ;

    @name('Window Query')
    from HighPriceStockStream#window.time(10 min)
    select symbol, sum(volume) as sumVolume
    insert into ResultStockStream ;
  • 94. Siddhi QL - with partition
    define stream StockStream (symbol string, volume int, price double);

    @name('Filter Query')
    from StockStream[price > 75]
    select *
    insert into HighPriceStockStream ;

    @name('Window Query')
    partition with (symbol of HighPriceStockStream)
    begin
      from HighPriceStockStream#window.time(10 min)
      select symbol, sum(volume) as sumVolume
      insert into ResultStockStream ;
    end;
  • 95. Siddhi QL - distributed
    define stream StockStream (symbol string, volume int, price double);

    @name('Filter Query')
    @dist(parallel='3')
    from StockStream[price > 75]
    select *
    insert into HighPriceStockStream ;

    @name('Window Query')
    @dist(parallel='2')
    partition with (symbol of HighPriceStockStream)
    begin
      from HighPriceStockStream#window.time(10 min)
      select symbol, sum(volume) as sumVolume
      insert into ResultStockStream ;
    end;
  • 99. HA / Persistence
    o Option 1: Side by side
      o Recommended
      o Takes 2X hardware
      o Gives zero downtime
    o Option 2: Snapshot and restore
      o Uses less hardware
      o Will lose events between snapshots
      o Downtime during recovery
      o In some scenarios you can use event tables to keep intermediate state
  • 100. Siddhi Extensions ● Function extension ● Aggregator extension ● Window extension ● Transform extension
  • 101. Siddhi Query : Function Extension
    from TempStream
    select deviceID, roomNo,
           custom:toKelvin(temp) as tempInKelvin,
           'K' as scale
    insert into OutputStream ;
  • 102. Siddhi Query : Aggregator Extension
    from TempStream
    select deviceID, roomNo, temp,
           custom:stdev(temp) as stdevTemp,
           'C' as scale
    insert into OutputStream ;
  • 103. Siddhi Query : Window Extension
    from TempStream#window.custom:lastUnique(roomNo, 2 min)
    select *
    insert into OutputStream ;
  • 104. Siddhi Query : Transform Extension
    from XYZSpeedStream#transform.custom:getVelocityVector(v, vx, vy, vz)
    select velocity, direction
    insert into SpeedStream ;