SlideShare a Scribd company logo
Stratio is the only Big Data
platform able to combine, in
one query, stored data with
streaming data in real-time
(in less than 30 seconds).
We are polyglots as well: We
use Spark over two noSQL
databases, Cassandra &
Mongo DB.
•
•
•
•
•
Spark Streaming provides a high-level abstraction called discretized stream
or DStream, which represents a continuous stream of data, and in fact is
represented as a sequence of RDDs, which is Spark’s abstraction of an
immutable, distributed dataset.
Shark
(SQL)
Spark
Streaming
Mllib
(machine
learning)
GraphX
(graph)
• map(func), flatMap(func),
filter(func), count()
• repartition(numPartitions)
• union(otherStream)
• reduce(func),countByValue
(), reduceByKey(func,
[numTasks])
• join(otherStream,
[numTasks]), cogroup
(otherStream, [numTasks])
• transform(func)
• updateStateByKey(func)
• window(windowLength,
slideInterval)
• countByWindow(windowLength
, slideInterval)
• reduceByWindow(func,
windowLength, slideInterval)
• reduceByKeyAndWindow(func,
windowLength, slideInterval,
[numTasks])
• countByValueAndWindow(wind
owLength, slideInterval,
[numTasks])
• print()
• foreachRDD(func)
• saveAsObjectFiles(prefix,
[suffix])
• saveAsTextFiles(prefix,
[suffix])
• saveAsHadoopFiles(prefix,
[suffix])
•
Complex event
processing, or CEP,
is event
processing that
combines data
from multiple
sources to infer
events or patterns
that suggest more
complicated
circumstances
CEP as a
technique helps
discover complex
events by
analyzing and
correlating other
events
•
A CEP engine should provide operators over
streams, keeping in mind that events and streams
in a CEP are first-class citizens. In CEP, we think in
terms of event streams: event stream is a sequence
of events that arrives over time.
Users provide queries to the CEP engine whose
main mission is matching those queries against
events coming through event streams.
A CEP engine thus has notion of time and it allows
working with temporal queries that reason in terms
of temporal concepts, such as “time windows” or
“before and after” event relationships… among
others
• Filter
• Join
• Aggregation (Avg, Sum , Min, Max, Custom)
• Group by
• Having
• Conditions and Expressions (and, or, not, true/false,
==,!=, >=, >, <=, <),
• Data types (boolean, string, int, long, float, double)
• Pattern processing
• Sequence processing (zero to many, one to many, and
zero to one)
• You still have to integrate it in your code
• There is nothing like an interactive console
• If you want to do something with the streams, you guessed it, you have to code it!
• There is no way to remotely listen to a stream
• There are no solution patterns ready-to-use with the engine
• No statistics, no auditing
• Hard to integrate with other tools (dashboarding, log stream, batch processing)
With this solution you can use our API in
order to request commands to Stratio
Streaming engine in your code.
And you can also work with the interactive
shell in order to test your queries or interact
with the engine on demand.
Both tools, in fact, hide that you are sending
messages to a complex engine, built with
Zookeeper, Kafka, Spark Streaming and Siddhi
CEP Engine.
kafka
zookeeper
requests
events
CASSANDRA
Kafka
CASSANDRA
CASSANDRA
Kafka
CASSANDRA
•
•
•
•
•
•
•



• create --stream testStream --definition
• "name.string,data.double“
• insert --stream testStream --values
• "name.Temperature, field.testValue,data.33“
• save cassandra start --stream testStream
• alter --stream testStream --definition
"field.string"



CREATE --stream testStream –definition
(name.string,
data.double,
data2.int,
data3.float,
data4.double,
trueorfalse.boolean
)
 Filtering
 Projection
 In-built functions
 Windows (time and length)
 Join
 Event Sequences
There are a lot of CEP operators that you can use in your queries:
 Event Patterns
 Output rate limiting
 Custom windows, custom
functions
from sensor_grid #window.length(10) select name, ind,
avg(data) as data group by name insert into
sensor_grid_avg for current-events
1. >, <, ==, >=, <=, !=
2. contains, instanceof
3. and, or, not
1. sum, avg, max, min, count: when aggregated (group by, having)
2. Field Type Conversion
3. Coalesce: if field null then take
another field
4. IsMatch: true or false if match regex
from orders[price >= 20 and price < 100]…
from orders select * insert into ordersB…
from orders select client, price insert into ordersB…
1. Length window - a sliding window that keeps the last N events.
2. Time window - a sliding window that keeps events that have arrived within the last T time period.
3. Time and Length batch window : same concept but outputs events only at the end of the given window
4. Unique window - keeps only the latest events that are unique according to the given unique attribute.
5. First unique window - keeps the first events that are unique according to the given unique attribute.
6. External Time Window - a sliding window that processes according to timestamps defined externally
from payments[channel == ‘Paypal']#window.time( 1 min )
• With “on <condition>” joins only the events that matches the condition
• With “within <time>”, joins only the events that are within said time of each other
from errorStream#window.length(1) as errorStream join
allStream#window.length(1) as allStream
on errorStream.numberOfErrors > allStream.totalNumberOfEvents*0.05
select * insert into alarmByThreshold;
from every (a1 = infoStock[action == "buy"]
-> a2 = confirmOrder[command == "OK"] )
-> b1 = StockExchangeStream [price > infoStock.price]
within 3000
select a1.action as action, b1.price as price
insert into StockQuote
from every a1 = infoStock[action == "buy"]+,
b1 = StockExchangeStream[price > 70]?,
b2 = StockExchangeStream[price >= 75]
select a1[0].action as action, b1.price as priceA, b2.price as priceBJoin
insert into StockQuote
Apache Flume is a distributed,
reliable, and available system for
efficiently collecting, aggregating
and moving large amounts of log
data from many different sources
to a centralized data store.
Stratio Ingestion is an ETL for Big
Data product, based on Flume.
Design your workflows (wysiwyg)
with useful and improved sources
and sinks, transform your data on
the fly
• Create the stream if it
doesn’t exist
• It is possible to send
filtered event-flows
only to streaming
engine
• Built on the Stratio
Streaming API.
• Call-center Real-time monitoring
Real-time detection of client churn risk
Natural Language Processing Analysis to detect incidents in real-time
Anomaly detection in the service based on patterns
• IT services monitoring
DoS attack detection, hotlinking, etc in real-time
Warnings in monitoring of heterogeneous services
Preventive detection of downtime based on patterns
• Sensor grid monitoring
Alarms when thresholds are reached
Complex alarms involving several sensors
Real-time monitoring (landing support devices in an airport, for example)
Data Machine
Intelligence
SELECT sum(order.quantity), company_data.country
FROM streaming.order WITH WINDOW 15 minutes
INNER JOIN batch.company_data
ON order.company = company_data.company_name;
.
• With an powerful query planner
• Able to perform mixed queries with
streaming and batch data
SQL query example, mixing real-time data
(coming from Stratio Streaming Engine) and
batch data (stored in a noSQL database)
We are first going to use
the Shell to create
streams and queries.
Strtio Spark Streaming + Siddhi CEP Engine
Strtio Spark Streaming + Siddhi CEP Engine
Strtio Spark Streaming + Siddhi CEP Engine
Strtio Spark Streaming + Siddhi CEP Engine

More Related Content

What's hot

Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics Platform
Srinath Perera
 
Introducing Kafka Connect and Implementing Custom Connectors
Introducing Kafka Connect and Implementing Custom ConnectorsIntroducing Kafka Connect and Implementing Custom Connectors
Introducing Kafka Connect and Implementing Custom Connectors
Itai Yaffe
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Brian O'Neill
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Databricks
 
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming AnalyticsDEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
Sriskandarajah Suhothayan
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
Robbie Strickland
 
[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview
Stratio
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015
Databricks
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
Robbie Strickland
 
Mhug apache storm
Mhug apache stormMhug apache storm
Mhug apache storm
Joseph Niemiec
 
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Qingwen zhao
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friends
Natalino Busa
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
Matthias Niehoff
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexes
Daniel Lemire
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Spark Summit
 
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Stratio
 
Big data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeBig data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real time
Itai Yaffe
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
Petr Zapletal
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
Helena Edelson
 
Stratio CrossData: an efficient distributed datahub with batch and streaming ...
Stratio CrossData: an efficient distributed datahub with batch and streaming ...Stratio CrossData: an efficient distributed datahub with batch and streaming ...
Stratio CrossData: an efficient distributed datahub with batch and streaming ...
Stratio
 

What's hot (20)

Introduction to WSO2 Data Analytics Platform
Introduction to  WSO2 Data Analytics PlatformIntroduction to  WSO2 Data Analytics Platform
Introduction to WSO2 Data Analytics Platform
 
Introducing Kafka Connect and Implementing Custom Connectors
Introducing Kafka Connect and Implementing Custom ConnectorsIntroducing Kafka Connect and Implementing Custom Connectors
Introducing Kafka Connect and Implementing Custom Connectors
 
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
Data Pipelines & Integrating Real-time Web Services w/ Storm : Improving on t...
 
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
Real-time Machine Learning Analytics Using Structured Streaming and Kinesis F...
 
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming AnalyticsDEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
DEBS 2015 Tutorial : Patterns for Realtime Streaming Analytics
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview[Spark meetup] Spark Streaming Overview
[Spark meetup] Spark Streaming Overview
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Mhug apache storm
Mhug apache stormMhug apache storm
Mhug apache storm
 
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
Apache Eagle: 来自eBay的分布式实时Hadoop数据安全引擎
 
Strata London 16: sightseeing, venues, and friends
Strata  London 16: sightseeing, venues, and friendsStrata  London 16: sightseeing, venues, and friends
Strata London 16: sightseeing, venues, and friends
 
Cassandra & Spark for IoT
Cassandra & Spark for IoTCassandra & Spark for IoT
Cassandra & Spark for IoT
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexes
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
 
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
 
Big data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real timeBig data serving: Processing and inference at scale in real time
Big data serving: Processing and inference at scale in real time
 
Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017Distributed Stream Processing - Spark Summit East 2017
Distributed Stream Processing - Spark Summit East 2017
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
Stratio CrossData: an efficient distributed datahub with batch and streaming ...
Stratio CrossData: an efficient distributed datahub with batch and streaming ...Stratio CrossData: an efficient distributed datahub with batch and streaming ...
Stratio CrossData: an efficient distributed datahub with batch and streaming ...
 

Viewers also liked

Google Cloud Dataflow and lightweight Lambda Architecture for Big Data App
Google Cloud Dataflow and lightweight Lambda Architecture  for Big Data AppGoogle Cloud Dataflow and lightweight Lambda Architecture  for Big Data App
Google Cloud Dataflow and lightweight Lambda Architecture for Big Data App
Trieu Nguyen
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better One
DataWorks Summit
 
Stratio platform overview v4.1
Stratio platform overview v4.1Stratio platform overview v4.1
Stratio platform overview v4.1
Stratio
 
[Strata] Sparkta
[Strata] Sparkta[Strata] Sparkta
[Strata] Sparkta
Stratio
 
Continuous delivery with Jenkins, Docker and Mesos/Marathon - jbcnconf
Continuous delivery with Jenkins, Docker and Mesos/Marathon - jbcnconfContinuous delivery with Jenkins, Docker and Mesos/Marathon - jbcnconf
Continuous delivery with Jenkins, Docker and Mesos/Marathon - jbcnconf
Julia Mateo
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
Oh Chan Kwon
 
Deploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on MesosDeploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on Mesos
Imesh Gunaratne
 

Viewers also liked (7)

Google Cloud Dataflow and lightweight Lambda Architecture for Big Data App
Google Cloud Dataflow and lightweight Lambda Architecture  for Big Data AppGoogle Cloud Dataflow and lightweight Lambda Architecture  for Big Data App
Google Cloud Dataflow and lightweight Lambda Architecture for Big Data App
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better One
 
Stratio platform overview v4.1
Stratio platform overview v4.1Stratio platform overview v4.1
Stratio platform overview v4.1
 
[Strata] Sparkta
[Strata] Sparkta[Strata] Sparkta
[Strata] Sparkta
 
Continuous delivery with Jenkins, Docker and Mesos/Marathon - jbcnconf
Continuous delivery with Jenkins, Docker and Mesos/Marathon - jbcnconfContinuous delivery with Jenkins, Docker and Mesos/Marathon - jbcnconf
Continuous delivery with Jenkins, Docker and Mesos/Marathon - jbcnconf
 
Extending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event ProcessingExtending Spark Streaming to Support Complex Event Processing
Extending Spark Streaming to Support Complex Event Processing
 
Deploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on MesosDeploying WSO2 Middleware on Mesos
Deploying WSO2 Middleware on Mesos
 

Similar to Strtio Spark Streaming + Siddhi CEP Engine

WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2
 
WSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needsWSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needs
Sriskandarajah Suhothayan
 
Meet the squirrel @ #CSHUG
Meet the squirrel @ #CSHUGMeet the squirrel @ #CSHUG
Meet the squirrel @ #CSHUG
Márton Balassi
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Gyula Fóra
 
Flink Streaming Berlin Meetup
Flink Streaming Berlin MeetupFlink Streaming Berlin Meetup
Flink Streaming Berlin Meetup
Márton Balassi
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
Kostas Tzoumas
 
Streaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talkStreaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talk
Amrit Sarkar
 
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Lucidworks
 
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
confluent
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
Yaroslav Tkachenko
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Sriram Krishnan
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
DataWorks Summit
 
Timeseries - data visualization in Grafana
Timeseries - data visualization in GrafanaTimeseries - data visualization in Grafana
Timeseries - data visualization in Grafana
OCoderFest
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
Jags Ramnarayan
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
Prakash Chockalingam
 
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
Codit
 
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
Julian Hyde
 

Similar to Strtio Spark Streaming + Siddhi CEP Engine (20)

WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
WSO2Con USA 2015: WSO2 Analytics Platform - The One Stop Shop for All Your Da...
 
WSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needsWSO2 Analytics Platform: The one stop shop for all your data needs
WSO2 Analytics Platform: The one stop shop for all your data needs
 
Meet the squirrel @ #CSHUG
Meet the squirrel @ #CSHUGMeet the squirrel @ #CSHUG
Meet the squirrel @ #CSHUG
 
Real-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop SummitReal-time Stream Processing with Apache Flink @ Hadoop Summit
Real-time Stream Processing with Apache Flink @ Hadoop Summit
 
Flink Streaming Berlin Meetup
Flink Streaming Berlin MeetupFlink Streaming Berlin Meetup
Flink Streaming Berlin Meetup
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 
Streaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talkStreaming Solr - Activate 2018 talk
Streaming Solr - Activate 2018 talk
 
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...Building Analytics Applications with Streaming Expressions in Apache Solr - A...
Building Analytics Applications with Streaming Expressions in Apache Solr - A...
 
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
Streamsheets and Apache Kafka – Interactively build real-time Dashboards and ...
 
Kafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processingKafka Streams: the easiest way to start with stream processing
Kafka Streams: the easiest way to start with stream processing
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at ScaleData Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
Data Platform at Twitter: Enabling Real-time & Batch Analytics at Scale
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
Timeseries - data visualization in Grafana
Timeseries - data visualization in GrafanaTimeseries - data visualization in Grafana
Timeseries - data visualization in Grafana
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
 
So you think you can stream.pptx
So you think you can stream.pptxSo you think you can stream.pptx
So you think you can stream.pptx
 
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
Azure event hubs, Stream Analytics & Power BI (by Sam Vanhoutte)
 
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
WSO2Con ASIA 2016: WSO2 Analytics Platform: The One Stop Shop for All Your Da...
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 

Recently uploaded

Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 

Recently uploaded (20)

Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 

Strtio Spark Streaming + Siddhi CEP Engine

  • 1.
  • 2. Stratio is the only Big Data platform able to combine, in one query, stored data with streaming data in real-time (in less than 30 seconds). We are polyglots as well: We use Spark over two noSQL databases, Cassandra & Mongo DB.
  • 3.
  • 4. • • • • • Spark Streaming provides a high-level abstraction called discretized stream or DStream, which represents a continuous stream of data, and in fact is represented as a sequence of RDDs, which is Spark’s abstraction of an immutable, distributed dataset. Shark (SQL) Spark Streaming Mllib (machine learning) GraphX (graph)
  • 5. • map(func), flatMap(func), filter(func), count() • repartition(numPartitions) • union(otherStream) • reduce(func),countByValue (), reduceByKey(func, [numTasks]) • join(otherStream, [numTasks]), cogroup (otherStream, [numTasks]) • transform(func) • updateStateByKey(func) • window(windowLength, slideInterval) • countByWindow(windowLength , slideInterval) • reduceByWindow(func, windowLength, slideInterval) • reduceByKeyAndWindow(func, windowLength, slideInterval, [numTasks]) • countByValueAndWindow(wind owLength, slideInterval, [numTasks]) • print() • foreachRDD(func) • saveAsObjectFiles(prefix, [suffix]) • saveAsTextFiles(prefix, [suffix]) • saveAsHadoopFiles(prefix, [suffix])
  • 6. • Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances CEP as a technique helps discover complex events by analyzing and correlating other events •
  • 7. A CEP engine should provide operators over streams, keeping in mind that events and streams in a CEP are first-class citizens. In CEP, we think in terms of event streams: event stream is a sequence of events that arrives over time. Users provide queries to the CEP engine whose main mission is matching those queries against events coming through event streams. A CEP engine thus has notion of time and it allows working with temporal queries that reason in terms of temporal concepts, such as “time windows” or “before and after” event relationships… among others • Filter • Join • Aggregation (Avg, Sum , Min, Max, Custom) • Group by • Having • Conditions and Expressions (and, or, not, true/false, ==,!=, >=, >, <=, <), • Data types (boolean, string, int, long, float, double) • Pattern processing • Sequence processing (zero to many, one to many, and zero to one)
  • 8. • You still have to integrate it in your code • There is nothing like an interactive console • If you want to do something with the streams, you guessed it, you have to code it! • There is no way to remotely listen to a stream • There are no solution patterns ready-to-use with the engine • No statistics, no auditing • Hard to integrate with other tools (dashboarding, log stream, batch processing)
  • 9.
  • 10. With this solution you can use our API in order to request commands to Stratio Streaming engine in your code. And you can also work with the interactive shell in order to test your queries or interact with the engine on demand. Both tools, in fact, hide that you are sending messages to a complex engine, built with Zookeeper, Kafka, Spark Streaming and Siddhi CEP Engine.
  • 14. • create --stream testStream --definition • "name.string,data.double“ • insert --stream testStream --values • "name.Temperature, field.testValue,data.33“ • save cassandra start --stream testStream • alter --stream testStream --definition "field.string"   
  • 15.
  • 16. CREATE --stream testStream –definition (name.string, data.double, data2.int, data3.float, data4.double, trueorfalse.boolean )
  • 17.  Filtering  Projection  In-built functions  Windows (time and length)  Join  Event Sequences There are a lot of CEP operators that you can use in your queries:  Event Patterns  Output rate limiting  Custom windows, custom functions from sensor_grid #window.length(10) select name, ind, avg(data) as data group by name insert into sensor_grid_avg for current-events
  • 18. 1. >, <, ==, >=, <=, != 2. contains, instanceof 3. and, or, not 1. sum, avg, max, min, count: when aggregated (group by, having) 2. Field Type Conversion 3. Coalesce: if field null then take another field 4. IsMatch: true or false if match regex from orders[price >= 20 and price < 100]… from orders select * insert into ordersB… from orders select client, price insert into ordersB…
  • 19. 1. Length window - a sliding window that keeps the last N events. 2. Time window - a sliding window that keeps events that have arrived within the last T time period. 3. Time and Length batch window : same concept but outputs events only at the end of the given window 4. Unique window - keeps only the latest events that are unique according to the given unique attribute. 5. First unique window - keeps the first events that are unique according to the given unique attribute. 6. External Time Window - a sliding window that processes according to timestamps defined externally from payments[channel == ‘Paypal']#window.time( 1 min )
  • 20. • With “on <condition>” joins only the events that matches the condition • With “within <time>”, joins only the events that are within said time of each other from errorStream#window.length(1) as errorStream join allStream#window.length(1) as allStream on errorStream.numberOfErrors > allStream.totalNumberOfEvents*0.05 select * insert into alarmByThreshold;
  • 21. from every (a1 = infoStock[action == "buy"] -> a2 = confirmOrder[command == "OK"] ) -> b1 = StockExchangeStream [price > infoStock.price] within 3000 select a1.action as action, b1.price as price insert into StockQuote from every a1 = infoStock[action == "buy"]+, b1 = StockExchangeStream[price > 70]?, b2 = StockExchangeStream[price >= 75] select a1[0].action as action, b1.price as priceA, b2.price as priceBJoin insert into StockQuote
  • 22.
  • 23. Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. Stratio Ingestion is an ETL for Big Data product, based on Flume. Design your workflows (wysiwyg) with useful and improved sources and sinks, transform your data on the fly • Create the stream if it doesn’t exist • It is possible to send filtered event-flows only to streaming engine • Built on the Stratio Streaming API.
  • 24. • Call-center Real-time monitoring Real-time detection of client churn risk Natural Language Processing Analysis to detect incidents in real-time Anomaly detection in the service based on patterns • IT services monitoring DoS attack detection, hotlinking, etc in real-time Warnings in monitoring of heterogeneous services Preventive detection of downtime based on patterns • Sensor grid monitoring Alarms when thresholds are reached Complex alarms involving several sensors Real-time monitoring (landing support devices in an airport, for example) Data Machine Intelligence
  • 25. SELECT sum(order.quantity), company_data.country FROM streaming.order WITH WINDOW 15 minutes INNER JOIN batch.company_data ON order.company = company_data.company_name; . • With an powerful query planner • Able to perform mixed queries with streaming and batch data SQL query example, mixing real-time data (coming from Stratio Streaming Engine) and batch data (stored in a noSQL database)
  • 26.
  • 27. We are first going to use the Shell to create streams and queries.