© Copyright 2015 Hitachi Consulting
Real-Time Event and Stream Processing
with Microsoft Azure
Khalid M. Salama
Microsoft Business Intelligence
Hitachi Consulting UK
We Make it Happen. Better.
Outline
• What is Event & Stream Processing?
• Stream Processing Architecture
• Message Queuing
• Introducing Apache Storm
• Introducing Azure Stream Analytics
• Apache Storm vs Azure Stream Analytics
• Useful Resources
Fundamentals
What is Event & Stream Processing?
Terms
Stream Processing: Real-time processing of a continuous sequence of data points (a stream), by applying a series of operations (kernel functions) to each data point.
Event Processing: Real-time detection of events in a data stream, by aggregating data points within a time frame, in order to trigger subsequent actions.
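The two terms above can be contrasted with a minimal Python sketch. It is illustrative only: the function names, the sample readings, and the threshold are assumptions, and a fixed number of points stands in for the "time frame".

```python
from typing import Callable, Iterable, Iterator, List


def process_stream(points: Iterable[float],
                   operations: List[Callable[[float], float]]) -> Iterator[float]:
    """Stream processing: apply every operation, in order, to each data point."""
    for point in points:
        for operation in operations:
            point = operation(point)
        yield point


def detect_events(points: Iterable[float], frame_size: int,
                  threshold: float) -> Iterator[List[float]]:
    """Event processing: aggregate points per frame and report the frames
    whose average exceeds the threshold (assumed event rule)."""
    frame: List[float] = []
    for point in points:
        frame.append(point)
        if len(frame) == frame_size:
            if sum(frame) / frame_size > threshold:
                yield frame
            frame = []


# Hypothetical sensor readings.
readings = [1.0, 2.0, 3.0, 10.0, 12.0, 11.0]
scaled = list(process_stream(readings, [lambda x: x * 2, lambda x: x + 1]))
events = list(detect_events(readings, frame_size=3, threshold=5.0))
```

Here `scaled` holds one output per input point, while `events` holds only the frames that qualified as an event.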
What is Event & Stream Processing?
Tell me more…
[Figure: Stream Processing: queued data points (P5, P6, P7, … P∞) pass one at a time through Operation 1, Operation 2, and Operation 3 to yield a final product. Event Processing: queued data points are grouped into a time frame (e.g. P2, P3, P4) and checked for an event, which triggers notifications / actions.]
What is Event & Stream Processing?
Data at rest vs. Data in motion
Traditional (working with data at rest): data is bulk-loaded and batch-processed into a data store; you submit a query and get results.
Real-time (working with data in motion): a continuous data stream, combined with static reference data, feeds continuous processing and querying, which produces real-time continuous results, actions, and data archiving.
Lambda Architecture
The speed layer and stream processing
Hot Path
Cold Path
Scenarios for Stream Processing
The hot path…
IoT & Device Telemetry
• Predictive Maintenance
• Energy Efficiency & Smart Cities
Social Media Analytics
• Real-time Sentiment Analysis
• Crisis Management
Fraud Detection
• Identity theft and stolen credit card details
• Identify a fraudulent transaction
Inventory Management
• Maintain a continual level of stock to support unpredictable purchasing habits
Clickstream Analytics
• User Experience Improvements
• Targeted Recommendations
System Architecture
Events & Stream Processing Architecture
The Canonical System
[Figure: the canonical pipeline. Event triggers (applications, web and social, devices, sensors) feed Data Stream Collection (ingress), then Message Queuing (producer/consumer mediator), then Stream Processing (processing and event detection, machine learning, Web API calls, reference data), which feeds both Storage and Batch Analysis and Presentation and Action (live dashboards & analytics; apps and devices to take actions).]
Events & Stream Processing Architecture
Data Sources
• Devices, websites, and apps that continuously produce data streams
Data Collection
• Listen to, collect, and transfer inbound events
Message Queuing
• Decouples data consumers from data producers
• Reliable, distributed, fault-tolerant, high-throughput short-term storage
Stream Processing
• Aggregate / filter / join incoming event streams
• Temporal engine for analysing data across time-series windows
Reference Data
• High-throughput, random-access data store to support processing
• Usually NoSQL data stores
Storage
• Store processed / aggregated / filtered data (SQL/NoSQL)
• Consolidate and store raw data into files for batch analysis (DFS)
Presentation
• Rich interactive visualizations for real-time data analysis
• Application integration for process automation
Events & Stream Processing Architecture
Tools & Technologies
[Figure: the canonical pipeline annotated with tools.]
• Data Stream Collection and Message Queuing: Apache Kafka, Azure Event Hub, Azure IoT Hub, Azure Service Bus
• Stream Processing: Storm on HDInsight, Spark Streaming on HDInsight, Azure Stream Analytics, Azure ML
• Storage and Batch Analysis: HDFS, Azure SQL DB/DW
• Presentation and Action: PowerBI live dashboards; apps and devices to take actions
Message Queuing
Message Queuing
Queue-Centric Solutions (in my own words!)
A message is a data object to be processed (purchase orders, sensor readings, tweets, etc.)
Message queuing systems are useful for:
• Decoupling message producers from consumers
• Increasing reliability (guaranteed delivery)
• Reducing latency (fire and forget)
• Load throttling (rate-levelling)
Message Queuing
Queue-Centric Solutions (in my own words!)
Decouple producers and consumers
[Figure: several originators (Originator, Originator 2) send messages to a Queueing Service, from which several processors (Processor, Processor 2, Processor 3) consume.]
Message Queuing
Queue-Centric Solutions (in my own words!)
Increase Reliability
• Without a queue, processor available: message delivered.
• Without a queue, processor not available: message lost.
• With a queueing service, processor not available: the message is queued and processed when the processor is available again (guaranteed delivery).
Message Queuing
Queue-Centric Solutions (in my own words!)
Reduce Latency
• Without a queue: (1) send a message, (2) wait for processing to finish, (3) send a new message.
• With a queueing service: the originator keeps on queuing messages (no need to wait); messages are processed later, then a confirmation is sent.
Message Queuing
Queue-Centric Solutions (in my own words!)
Load levelling
• Normal request load: the originator sends requests directly to the processor.
• Sudden increase in requests from many originators: may bring the service down.
• With a queueing service: the processor handles requests at the desired pace.
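The queue-centric pattern can be sketched with Python's standard `queue` and `threading` modules. This is an in-process illustration under assumed names (`run_demo`, `burst_size`, `capacity`), not how a cloud queueing service is implemented: the originator enqueues a burst without waiting for each message to be processed, and the processor drains the buffer at its own pace.

```python
import queue
import threading


def run_demo(burst_size: int = 20, capacity: int = 100) -> list:
    """Send a burst of messages through a bounded buffer to one processor."""
    buffer: "queue.Queue" = queue.Queue(maxsize=capacity)
    processed = []

    def processor() -> None:
        # Consumes messages one by one, independently of the originator.
        while True:
            message = buffer.get()
            if message is None:  # sentinel: no more messages
                break
            processed.append(message * 2)

    worker = threading.Thread(target=processor)
    worker.start()

    # Originator: fire and forget. put() only blocks if the buffer is full,
    # which is what levels the load on the processor.
    for i in range(burst_size):
        buffer.put(i)
    buffer.put(None)

    worker.join()
    return processed
```

With a single processor the messages are handled in the order they were queued; adding more processor threads would spread the load at the cost of that ordering.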
Message Queuing
Microsoft Azure
Azure Service Bus
Relay (Sender/Receiver to Sender/Receiver)
• NAT and Firewall Traversal Service
• Request/Response Services
• Unbuffered with TCP Throttling
Queue (Sender to Receiver)
• Transactional Cloud AMQP/HTTP Broker
• High-Scale, High-Reliability Messaging
• Sessions, Scheduled Delivery, etc.
Topic (Publisher to Subscriber)
• Transactional Message Distribution
• Up to 2000 subscriptions per Topic
• Up to 2K/100K filter rules per subscription
Event Hubs (Producer to Consumer)
• Hyper Scale.
• A Million Clients.
• Concurrent.
Notification Hubs (Producer to Consumer)
• High-scale notification distribution
• Most mobile push notification services
• Millions of notification targets
Azure Event Hubs
Highly scalable data ingress service that can ingest millions of events per second
[Figure: Producer, Event Hub, Consumer]
EventData
• EnqueuedTime
• PartitionKey
• Offset
• SequenceNumber
• Body
• UserProperties
• SystemProperties
Messages (EventData) are retained in the hub for a configurable period of time.
IEventProcessor
• OpenAsync()
• ProcessEventsAsync()
• CloseAsync()
Azure Event Hubs
Highly scalable data ingress service that can ingest millions of events per second
Partition to scale and improve computation distribution
[Figure: Producers 1 to N write to an Event Hub with Partitions 1 to 32; Consumer Group 1 (Readers 1, 2, … N) and Consumer Group 2 (Readers 1, 3, … N) read from the partitions.]
Consumer Group 1
• Readers in the same group share the same partition pointer (read offset)
• E.g. if reader 1 consumed message 9, then reader 3 will consume message 10
• Only one reader in a consumer group can access a partition at a time
Consumer Group 2
• Each consumer group has its own partition read offset
• E.g. if reader 1 in group 1 consumed message 9 and group 2 has just started, then reader 1 in group 2 will consume message 1
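The partition and consumer-group behaviour described above can be modelled with a small in-memory simulation. This is not the Event Hubs SDK: `MiniEventHub`, its methods, and the CRC32 partitioning rule are assumptions made only to illustrate per-group read offsets over retained messages.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class MiniEventHub:
    """Toy model: append-only partitions plus one read offset per
    (consumer group, partition) pair."""

    def __init__(self, partition_count: int = 4) -> None:
        self.partitions: List[List[str]] = [[] for _ in range(partition_count)]
        self.offsets: Dict[Tuple[str, int], int] = defaultdict(int)

    def send(self, partition_key: str, body: str) -> int:
        # Assumed rule: the same key always maps to the same partition.
        index = zlib.crc32(partition_key.encode()) % len(self.partitions)
        self.partitions[index].append(body)
        return index

    def receive(self, group: str, partition: int) -> Optional[str]:
        # Reading advances the group's pointer; the message itself is retained.
        offset = self.offsets[(group, partition)]
        log = self.partitions[partition]
        if offset >= len(log):
            return None
        self.offsets[(group, partition)] = offset + 1
        return log[offset]


hub = MiniEventHub()
p = hub.send("device-1", "msg1")
hub.send("device-1", "msg2")

first_g1 = hub.receive("group1", p)   # group 1 reads the first message
second_g1 = hub.receive("group1", p)  # same group continues from its pointer
first_g2 = hub.receive("group2", p)   # group 2 starts from its own pointer
```

Because messages are retained rather than removed on read, a second consumer group can replay the partition from the beginning while the first group is already further along.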
Getting Started with Azure Event Hubs
This is how we do it…
Apache Storm
Introducing Apache Storm
Overview
• A distributed, scalable, high-performance, reliable, fault-tolerant, open source real-time stream processing and continuous computation system.
• A widely used stream processing solution in the Big Data world (along with Spark Streaming).
• Provided by Microsoft Azure on HDInsight (IaaS+); you pay for the cluster, rather than the jobs, while Microsoft manages the cluster for you.
• Integrates with message queuing solutions, such as Apache Kafka and Azure Event Hubs.
• Flexible custom development using Java (and C# on HDInsight).
• Originally used by Twitter to process massive streams of data from the Twitter firehose.
Introducing Apache Storm
Storm & Hadoop Big Data Ecosystem
[Figure: the Hadoop stack. Hadoop Distributed File System (HDFS) spans a NameNode and DataNodes 1 to N; Yet Another Resource Negotiator (YARN) sits on top; application categories above YARN: Batch, In-Memory, Stream, SQL (Spark SQL), NoSQL, Machine Learning, Search, Orchestration, Management, Acquisition.]
Introducing Apache Storm
Storm & Hadoop Big Data Ecosystem
[Figure: a Storm cluster on the Hadoop stack (HDFS, YARN, NameNode, DataNodes 1 to N): a Master Node running Nimbus, Worker Nodes 1 to N each running a Supervisor, and Zookeeper services.]
Introducing Apache Storm
Storm & Hadoop Big Data Ecosystem (On a Hadoop Cluster)
Master Node
• Runs a daemon called "Nimbus"
• Responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.
Worker Node
• Runs a daemon called the "Supervisor"
• Listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it.
Zookeeper
• Coordinates between Nimbus and the Supervisors.
• All state is kept in Zookeeper or on local disk.
• Nimbus or the Supervisors can go down and they'll start back up like nothing happened.
Introducing Apache Storm
Basics
Tuple: unit of data (set of key/value pairs)
Stream: unbounded sequence of tuples
Spout: stream source wrapper; emits tuples
Bolt: receives tuples, writes to a data store, reads from a data store, computes, and emits additional tuples
Topology: graph of stream transformations; each node is a spout or bolt
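The Storm vocabulary above maps naturally onto Python generators. The sketch below is an analogy only, not the Storm or SCP.NET API: the spout is finite so that the example terminates (a real stream is unbounded), and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def sentence_spout() -> Iterator[Dict[str, str]]:
    """Spout: wraps a source and emits tuples (dicts of key/value pairs)."""
    for sentence in ["storm processes streams", "streams of tuples"]:
        yield {"sentence": sentence}


def split_bolt(stream: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Bolt: receives tuples and emits additional tuples (one per word)."""
    for tup in stream:
        for word in tup["sentence"].split():
            yield {"word": word}


def count_bolt(stream: Iterable[Dict[str, str]]) -> Counter:
    """Bolt: computes a running word count (here returned at the end)."""
    counts: Counter = Counter()
    for tup in stream:
        counts[tup["word"]] += 1
    return counts


# Topology: spout -> split bolt -> count bolt.
word_counts = count_bolt(split_bolt(sentence_spout()))
```

Chaining the generators plays the role of the topology graph; in Storm each node would additionally run as many parallel tasks across the worker nodes.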
Getting Started with Storm
on HDInsight
Introducing Apache Storm
Getting Started – Creating HDInsight Cluster
Introducing Apache Storm
Creating Storm App in Visual Studio
• Install Azure SDK for Visual Studio: https://azure.microsoft.com/en-gb/downloads/
• Create Storm Project
Introducing Apache Storm
SCP.NET
[Figure: example topology with one Spout and three bolts (Bolt1, Bolt2, Bolt3).]
Introducing Apache Storm
Stream groupings
Stream groupings determine how Storm routes tuples between tasks in a topology.
Grouping: Description
Shuffle: Sends tuples to bolt instances at random, so that the load is spread evenly
Fields: Sends tuples to a bolt instance based on one or more fields in the tuple (equal field values go to the same instance)
All: Sends a copy of each tuple to all instances of a receiving bolt
Global: Sends tuples from all instances of a source to a single target instance
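The four groupings can be sketched as routing functions that distribute tuples over a number of bolt tasks. These are simplified stand-ins, not Storm code: round-robin replaces Storm's randomized shuffle, CRC32 replaces its internal field hashing, and all names are assumptions.

```python
import itertools
import zlib
from typing import Dict, List

Tup = Dict[str, str]


def shuffle_grouping(tuples: List[Tup], task_count: int) -> List[List[Tup]]:
    """Spread tuples evenly (round-robin stand-in for random distribution)."""
    tasks: List[List[Tup]] = [[] for _ in range(task_count)]
    for tup, index in zip(tuples, itertools.cycle(range(task_count))):
        tasks[index].append(tup)
    return tasks


def fields_grouping(tuples: List[Tup], task_count: int, field: str) -> List[List[Tup]]:
    """Route by a field value, so equal values always reach the same task."""
    tasks: List[List[Tup]] = [[] for _ in range(task_count)]
    for tup in tuples:
        index = zlib.crc32(tup[field].encode()) % task_count
        tasks[index].append(tup)
    return tasks


def all_grouping(tuples: List[Tup], task_count: int) -> List[List[Tup]]:
    """Give every task its own copy of every tuple."""
    return [list(tuples) for _ in range(task_count)]


def global_grouping(tuples: List[Tup], task_count: int) -> List[List[Tup]]:
    """Send everything to a single task (task 0 here)."""
    tasks: List[List[Tup]] = [[] for _ in range(task_count)]
    tasks[0] = list(tuples)
    return tasks


words = [{"word": w} for w in ["a", "b", "a", "c", "a", "b"]]
by_field = fields_grouping(words, 3, "word")
```

Fields grouping is the one to reach for when a bolt keeps per-key state, such as a word count, because every tuple for a given key lands on the same task.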
Introducing Azure Stream Analytics
Overview
A PaaS real-time complex event processing (CEP) service on Microsoft Azure
Fully-managed real-time processing
• Ingests millions of events per second
• Processing on continuous streams of data
• Reference data lookup
• Output to live dashboards and data stores
Mission Critical Reliability
• Guaranteed event delivery
• Preserves event order on a per-device basis
• Guaranteed business continuity
• Auto-recovery from failures
No challenges with Scale
• Elasticity for scale up or scale down
• Distributed, scale-out architecture
• Pay only for the resources you use
Rapid Development & Deployment
• SQL-like language
• Built-in temporal semantics
• Up and running in a few clicks
• Scheduling and monitoring
Introducing Azure Stream Analytics
Overview
Pipeline stages: Data Source, Ingest/Queue, Process, Deliver, Consume
Event Inputs
- Event Hub
- Azure Blob
- DocumentDB (coming soon)
Reference Data
- Azure Blob
- HBase (coming soon)
Transform
- Temporal joins
- Filter
- Aggregates
- Projections
- Windows
- REST APIs (coming soon)
Enrich
- Azure ML
Outputs
- SQL Azure
- Azure Blobs
- Event Hub
- Service Bus Queue
- Service Bus Topics
- Table storage
- DocumentDB
- PowerBI
Azure Storage
• Distributed
• Low latency
• High throughput
• Scalable, reliable
• Low cost
Azure Stream Analytics
Power BI Dashboard
Getting Started with
Azure Stream Analytics
Introducing Azure Stream Analytics
Getting Started
• Everything is done on the Azure Portal
• Create a Stream Analytics Job
• Add Inputs
• Add Outputs
• Define Processing Query
• Scale and Configure
Introducing Azure Stream Analytics
Getting Started – Create a Stream Analytics job
Introducing Azure Stream Analytics
Getting Started – Scale
Introducing Azure Stream Analytics
Getting Started – Configure
Introducing Azure Stream Analytics
Getting Started – Add inputs to your job
• Currently supported input data streams are Azure Event Hub, Azure IoT Hub, and Azure Blob Storage.
• Advanced options let you configure how the job will read data from the input.
• Reference data is usually static or changes very slowly over time (e.g. product catalog, customer info). It is currently supported from Azure Blob Storage only, and is cached for performance.
Introducing Azure Stream Analytics
Getting Started – Define input schema
• The serialization format and the encoding of the input data sources must be specified.
• Currently three formats are supported: CSV, JSON, and Avro, with an optional schema for the CSV and Avro formats.
• After the input is created, its configuration can be changed, the connection can be tested, and sample (synthetic) data can be generated (based on the supplied structure).
Introducing Azure Stream Analytics
Getting Started – Add an output to your job
Data stores currently supported as outputs:
• Azure Blob storage: creates log files with temporal query results for batch processing and archiving.
• Azure Table storage: NoSQL storage that is more flexible than a SQL database, and durable (in contrast to an event hub).
• Azure SQL Database: stores results in an Azure SQL Database table. Ideal as a source for traditional reporting and analysis.
• Event hub: sends an event to an event hub. Ideal for generating actionable events such as alerts or notifications.
• Service Bus Queue/Topics: sends an event to a queue or topic. Ideal for process integration.
• PowerBI: live dashboards and real-time reporting.
• DocumentDB: NoSQL data store that works with JSON documents.
Introducing Azure Stream Analytics
Getting Started – Query
Stream Analytics Query Language
SA Query Language
Data Types: bigint, float, nvarchar(max), datetime
DML: SELECT, FROM, WHERE, GROUP BY, HAVING, CASE WHEN THEN ELSE, INNER/LEFT OUTER JOIN, UNION, CROSS/OUTER APPLY, CAST, INTO, ORDER BY ASC, DESC
Scaling Extensions: WITH, PARTITION BY, OVER
Windowing Extensions: TumblingWindow, HoppingWindow, SlidingWindow, Duration
Aggregate Functions: Sum, Count, Avg, Min, Max, StDev, StDevP, Var, VarP
Date and Time Functions: DateName, DatePart, Day, Month, Year, DateTimeFromParts, DateDiff, DateAdd
String Functions: Len, Concat, CharIndex, Substring, PatIndex
Temporal Functions: Lag, IsFirst, CollectTop
Stream Analytics Query Language
Important clauses
INTO clause
• Pipelines the data from input to output
• Can have multiple outputs
    SELECT <columns, derived columns> INTO <output A> FROM <input x>
    WHERE <condition 1>
    SELECT <columns, derived columns> INTO <output B> FROM <input x>
    WHERE <condition 2>
JOIN clause
• Combine multiple event streams
• Combine event streams with reference data
    SELECT <columns, derived columns> INTO <output A>
    FROM <stream1>
    JOIN <stream2>
      ON DATEDIFF(minute, stream1.time, stream2.time) BETWEEN 0 AND 1
      AND <stream1.Key> = <stream2.Key>
    JOIN <ReferenceData> ON <stream1.Key> = <ReferenceData.Key>
CTEs
• To implement more complex logic and support multiple steps
    WITH
    Step1 AS ( SELECT Count(*) AS CountTweets, Topic FROM TwitterStream PARTITION BY PartitionId
               GROUP BY TumblingWindow(second, 3), Topic, PartitionId ),
    Step2 AS ( SELECT Avg(CountTweets) FROM Step1 GROUP BY TumblingWindow(minute, 3) )
    SELECT * INTO Output1 FROM Step2
Time stamping
• Application time
• System time
    SELECT <columns, derived columns>, OrderDate FROM <input>
    TIMESTAMP BY EventTime -- app time
    SELECT <columns, derived columns>, System.Timestamp AS EventTime FROM <input>
    -- sys time (arrival time) from event hub or azure blob storage
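The temporal JOIN above can be approximated in plain Python to show what the time bound does. This is a sketch under assumptions: events are `(key, timestamp_seconds, value)` triples, the one-minute bound is expressed as 60 seconds, and the nested loop ignores the buffering a real engine would use.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int, str]  # (key, timestamp in seconds, value)


def temporal_join(stream1: List[Event], stream2: List[Event],
                  reference: Dict[str, str],
                  max_lag: int = 60) -> List[Tuple[str, str, str, str]]:
    """Pair events with equal keys when the stream2 event occurs 0..max_lag
    seconds after the stream1 event, then enrich with reference data."""
    results = []
    for key1, t1, value1 in stream1:
        for key2, t2, value2 in stream2:
            in_time_bound = 0 <= t2 - t1 <= max_lag
            # Inner join on reference data: keys without a match are dropped.
            if key1 == key2 and in_time_bound and key1 in reference:
                results.append((key1, value1, value2, reference[key1]))
    return results


# Hypothetical toll-booth data.
entries = [("car1", 0, "entry"), ("car2", 10, "entry")]
exits = [("car1", 45, "exit"), ("car2", 200, "exit")]
owners = {"car1": "Alice", "car2": "Bob"}
joined = temporal_join(entries, exits, owners)
```

Without the time bound, a join between two unbounded streams would have to keep every event forever; the bound is what lets the engine discard old events.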
Stream Analytics Query Language
Windowing Functions
• In data streams, a common requirement is to aggregate (max, min, sum, count, etc.) over messages that arrive within a specified period of time (a window) in order to detect events.
• Each GROUP BY requires a windowing function
• Each window operation outputs a single event at the end of the window
• All windows have a fixed length
Tumbling window: aggregate per time interval
Hopping window: schedule overlapping windows
Sliding window: windows constantly re-evaluated
Stream Analytics Query Language
Windowing Functions – Tumbling Window
[Figure: events on a timeline (secs) grouped into consecutive 20-second tumbling windows.]
Tumbling windows:
• Repeat
• Do not overlap
• An event can belong to only one tumbling window
TumblingWindow(<time unit>, <window size>)
Query: Count the total number of vehicles entering each toll booth in every 20-second interval.
    SELECT TollId, COUNT(*)
    FROM EntryStream TIMESTAMP BY EntryTime
    GROUP BY TollId, TumblingWindow(second, 20)
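The tumbling-window count can be mimicked in a few lines of Python. This is a rough sketch, not the Stream Analytics engine: it assumes integer timestamps in seconds, treats each window as the half-open interval [start, start + size), and uses hypothetical toll data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_counts(events: Iterable[Tuple[int, str]],
                    size: int) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, toll id).

    Each timestamp falls into exactly one window, found by rounding it
    down to the nearest multiple of the window size.
    """
    counts: Counter = Counter()
    for timestamp, toll_id in events:
        window_start = (timestamp // size) * size
        counts[(window_start, toll_id)] += 1
    return dict(counts)


# (entry time in seconds, toll booth id)
entries = [(1, "A"), (5, "A"), (12, "B"), (19, "A"), (21, "A"), (38, "B")]
per_window = tumbling_counts(entries, size=20)
```

Since the windows do not overlap, the counts across all windows add up to the number of events.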
Stream Analytics Query Language
Windowing Functions – Hopping Window
[Figure: events on a timeline grouped into overlapping 20-second hopping windows with a 10-second "hop".]
Hopping windows:
• Repeat
• Can overlap
• Hop forward in time by a fixed period
• Events can belong to more than one hopping window
HoppingWindow(<time unit>, <window size>, <hop size>)
Query: Count the number of vehicles entering each toll booth over a 20-second window; update the results every 10 seconds.
    SELECT COUNT(*), TollId
    FROM EntryStream TIMESTAMP BY EntryTime
    GROUP BY TollId, HoppingWindow(second, 20, 10)
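A hopping window differs from a tumbling one only in that a new window starts every hop, so windows overlap. The sketch below makes that visible under simplifying assumptions: integer timestamps in seconds, windows of the form [start, start + size), and only windows starting at or after time 0.

```python
from typing import Dict, Iterable


def hopping_counts(timestamps: Iterable[int], size: int, hop: int) -> Dict[int, int]:
    """Count events per window start; a window opens at every multiple of hop."""
    counts: Dict[int, int] = {}
    for timestamp in timestamps:
        # Latest window that starts at or before this event.
        start = (timestamp // hop) * hop
        # Walk back through every earlier window that still covers the event.
        while start >= 0 and start > timestamp - size:
            counts[start] = counts.get(start, 0) + 1
            start -= hop
    return counts


arrivals = [1, 5, 12, 25]
windows = hopping_counts(arrivals, size=20, hop=10)
```

The event at second 12 is counted in both the window starting at 0 and the one starting at 10, which is exactly the overlap a tumbling window rules out. Setting the hop equal to the size gives back tumbling behaviour.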
Stream Analytics Query Language
Windowing Functions – Sliding Window
A 20-second Sliding Window
Sliding window:
 Continuously moves forward by an ε (epsilon)
 Produces an output only when an event occurs
 Every window has at least one event
 Events can belong to more than one sliding window
SELECT TollId, Count(*)
FROM EntryStream ES
GROUP BY TollId, SlidingWindow (second, 20)
HAVING Count(*) > 10
Query: Find all the toll booths which have served more than 10 vehicles in the last 20 seconds
SlidingWindow (<time interval>, <window size>)
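The sliding-window query can be imitated with a short Python sketch. This is a simplification, not the service's implementation: it evaluates the window only when an event arrives (the real service also reacts when events leave the window), and the name `sliding_window_alerts`, the sample data, and the (t - size, t] window bounds are assumptions for illustration.

```python
from collections import defaultdict, deque


def sliding_window_alerts(events, window_size=20, threshold=10):
    """Return (timestamp, toll_id, count) whenever a toll booth has served
    more than `threshold` vehicles in the last `window_size` seconds.

    events: (timestamp_seconds, toll_id) pairs, sorted by timestamp.
    Assumption: the window at time t covers (t - window_size, t].
    """
    recent = defaultdict(deque)  # toll_id -> timestamps still in the window
    alerts = []
    for ts, toll_id in events:
        window = recent[toll_id]
        window.append(ts)
        # Drop timestamps that have slid out of the window.
        while window[0] <= ts - window_size:
            window.popleft()
        if len(window) > threshold:
            alerts.append((ts, toll_id, len(window)))
    return alerts


# Hypothetical sample with a low threshold so the output stays short.
sample_events = [(0, "A"), (5, "A"), (9, "A"), (15, "A"), (21, "A"), (50, "A")]
for ts, toll_id, n in sliding_window_alerts(sample_events, threshold=3):
    print(f"t={ts}s: toll {toll_id} served {n} vehicles in the last 20s")
```

Because the window is re-evaluated per event rather than on a fixed schedule, no output is produced during quiet periods, which matches the "at least one event per window" property listed above.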
Stream Analytics Query Language
Useful SA Query Patterns
 Aggregation and filter: Compute an aggregate (sum, max, min, avg) over a time window.
E.g. What are the maximum temperature and average pressure read by the sensor in a 60-second window?
 Counting unique values: Count the number of unique field values that appear in the stream within a time window.
E.g. How many unique makes of cars passed through the toll booth in a 2-second window?
 Determine if a value has changed: Look at a previous value to determine if it is different from the current value.
E.g. Is the previous car on the toll road the same make as the current car?
 Find first/last event in a window: Find the first/last car in every 10-minute interval.
 Detect the absence of events: Check that a stream has no value that matches certain criteria.
E.g. Have 2 consecutive cars of the same make entered the toll road within 90 seconds?
 Detect duration between events: Find the duration of a given event.
E.g. Given a web clickstream, determine the time spent on a feature.
 Detect duration of a condition: Find out how long a condition lasted.
E.g. Suppose a bug resulted in all cars having an incorrect weight (above 20,000 pounds); compute the duration of the bug.
 Fill missing values: For a stream of events that has missing values, produce a stream of events at regular intervals.
E.g. Generate an event every 5 seconds that reports the most recently seen data point.
https://azure.microsoft.com/en-gb/documentation/articles/stream-analytics-stream-analytics-query-patterns
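As an illustration of the "fill missing values" pattern, the Python sketch below emits one reading every 5 seconds, carrying the most recently seen value forward. It only mimics the idea and is not the query from the linked article. The name `fill_missing`, the `end` parameter, and the sample readings are hypothetical.

```python
def fill_missing(events, interval=5, end=None):
    """Emit (tick, value) at regular intervals, reporting the most recently
    seen value at or before each tick.

    events: (timestamp_seconds, value) pairs, sorted by timestamp.
    end: last tick to emit (defaults to the last event's timestamp).
    """
    if not events:
        return []
    stop = events[-1][0] if end is None else end
    output = []
    last_value = None
    i = 0
    tick = events[0][0]
    while tick <= stop:
        # Consume every event that happened up to this tick.
        while i < len(events) and events[i][0] <= tick:
            last_value = events[i][1]
            i += 1
        output.append((tick, last_value))
        tick += interval
    return output


# Hypothetical irregular sensor readings: (seconds, temperature)
readings = [(0, 10.0), (12, 11.5), (14, 12.0)]
for tick, value in fill_missing(readings, interval=5, end=20):
    print(f"t={tick}s -> {value}")
```

Note that the reading at 12 seconds never appears in the output: by the 15-second tick a newer reading (14 seconds) has replaced it, which is the expected behaviour when only the latest data point is reported.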
Apache Storm vs Azure Stream Analytics
The face-off…
Microsoft Azure Stream Analytics Documentation
Feature | Azure Stream Analytics | Apache Storm on HDInsight
Geared for | Event detection | Stream processing
Open source | No – it is a Microsoft Azure service | Yes – it is Apache
Service type | PaaS – deploy, execute and monitor jobs | SaaS+ – provision an HDInsight Storm cluster
Pricing | You pay for the data/jobs | You pay for the cluster
Scalability | Number of Streaming Units | Number of nodes in the cluster
Processing | SQL-like query + temporal operations + Azure Machine Learning (API calls to published models) | Java or C# (custom extensibility)
Dev. experience | Azure Portal – easy – limited | Visual Studio – more involved – flexible
Limitations | No UDF, no Web API calls (coming soon) | You need to implement aggregations and temporal operations yourself
Input data source | Azure Event Hubs and Azure Blobs | Connectors (Event Hub, Service Bus, Kafka, custom)
Input data format | CSV, JSON | Anything – custom code is needed to parse
Output data sink | Azure Event Hubs, Azure Blob Storage, Azure Tables, Azure SQL DB, DocumentDB, and Power BI | Power BI, Azure Event Hubs, Azure Blob Store, Azure DocumentDB, SQL DB, HBase, custom
Reference data | Azure Blobs, with a maximum size of 100 MB for the in-memory lookup cache | No limits on data size; connectors available for HBase, DocumentDB, SQL, custom
How to Get Started with Stream Processing?
 Read the slides!
 MVA – Big Data Analytics with HDInsight: Hadoop on Azure
https://mva.microsoft.com/en-US/training-courses/big-data-analytics-with-hdinsight-hadoop-on-azure-10551
 MVA – Implementing Big Data Analysis
https://mva.microsoft.com/en-US/training-courses/implementing-big-data-analysis-8311?l=44REr2Yy_5404984382
 Azure Documentation – Storm on HDInsight
https://azure.microsoft.com/en-gb/documentation/services/hdinsight/
 Azure Documentation – Event Hubs
https://azure.microsoft.com/en-gb/documentation/articles/event-hubs-overview/
 Azure Documentation – Stream Analytics
https://azure.microsoft.com/en-gb/documentation/services/stream-analytics/
 Apache Storm
https://storm.apache.org/
 O’Reilly Books – Getting Started with Storm
DEMO
[Diagram: demo architecture. Labels: Images stream; Temperature/Pressure stream; Image Emotion; Consume Emotion Events; Consume Sensor Data; Output to real-time dashboard (shown for both streams).]
My Background
Applying Computational Intelligence in Data Mining
• Honorary Research Fellow, School of Computing, University of Kent.
• Ph.D. Computer Science, University of Kent, Canterbury, UK.
• M.Sc. Computer Science, The American University in Cairo, Egypt.
• 25+ published journal and conference papers, focusing on:
– classification rules induction,
– decision trees construction,
– Bayesian classification modelling,
– data reduction,
– instance-based learning,
– evolving neural networks, and
– data clustering
• Journals: Swarm Intelligence, Swarm & Evolutionary Computation,
Applied Soft Computing, and Memetic Computing.
• Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio,
ECTA, IEEE WCCI and INNS-BigData.
ResearchGate.org
Thank you!

More Related Content

What's hot

What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningDatabricks
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lakeMykola Zerniuk
 
Building Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksBuilding Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksLace Lofranco
 
Azure Data Fundamentals DP 900 Full Course
Azure Data Fundamentals DP 900 Full CourseAzure Data Fundamentals DP 900 Full Course
Azure Data Fundamentals DP 900 Full CoursePiyush sachdeva
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Databricks
 
Getting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics servicesGetting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics servicesEastBanc Tachnologies
 
Azure Data Factory Introduction.pdf
Azure Data Factory Introduction.pdfAzure Data Factory Introduction.pdf
Azure Data Factory Introduction.pdfMaheshPandit16
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
Data warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail businessData warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail businessArsalan Qadri
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data FactoryHARIHARAN R
 
Lessons Learned: Understanding Azure Data Factory Pricing (Microsoft Ignite 2...
Lessons Learned: Understanding Azure Data Factory Pricing (Microsoft Ignite 2...Lessons Learned: Understanding Azure Data Factory Pricing (Microsoft Ignite 2...
Lessons Learned: Understanding Azure Data Factory Pricing (Microsoft Ignite 2...Cathrine Wilhelmsen
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data FlowMark Kromer
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Kent Graziano
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingDatabricks
 

What's hot (20)

What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
 
Building Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure DatabricksBuilding Advanced Analytics Pipelines with Azure Databricks
Building Advanced Analytics Pipelines with Azure Databricks
 
Azure Data Fundamentals DP 900 Full Course
Azure Data Fundamentals DP 900 Full CourseAzure Data Fundamentals DP 900 Full Course
Azure Data Fundamentals DP 900 Full Course
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
Presto: Fast SQL-on-Anything (including Delta Lake, Snowflake, Elasticsearch ...
 
Getting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics servicesGetting started with azure event hubs and stream analytics services
Getting started with azure event hubs and stream analytics services
 
Azure Data Factory Introduction.pdf
Azure Data Factory Introduction.pdfAzure Data Factory Introduction.pdf
Azure Data Factory Introduction.pdf
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Azure purview
Azure purviewAzure purview
Azure purview
 
Data warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail businessData warehouse implementation design for a Retail business
Data warehouse implementation design for a Retail business
 
Azure Data Factory
Azure Data FactoryAzure Data Factory
Azure Data Factory
 
Lessons Learned: Understanding Azure Data Factory Pricing (Microsoft Ignite 2...
Lessons Learned: Understanding Azure Data Factory Pricing (Microsoft Ignite 2...Lessons Learned: Understanding Azure Data Factory Pricing (Microsoft Ignite 2...
Lessons Learned: Understanding Azure Data Factory Pricing (Microsoft Ignite 2...
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data Flow
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 

Viewers also liked

Comparison of various streaming technologies
Comparison of various streaming technologiesComparison of various streaming technologies
Comparison of various streaming technologiesSachin Aggarwal
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream AnalyticsDavide Mauri
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsVincenzo Gulisano
 
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...Data Con LA
 
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...Kay Lerch
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream AnalyticsMarco Parenzan
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseAldrin Piri
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Developing Connected Applications with AWS IoT - Technical 301
Developing Connected Applications with AWS IoT - Technical 301Developing Connected Applications with AWS IoT - Technical 301
Developing Connected Applications with AWS IoT - Technical 301Amazon Web Services
 
Lightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsCloudera, Inc.
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!Khalid Salama
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?MapR Technologies
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsLinkedIn
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedGuido Schmutz
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 
Getting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics servicesGetting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics servicesVladimir Bychkov
 

Viewers also liked (20)

Comparison of various streaming technologies
Comparison of various streaming technologiesComparison of various streaming technologies
Comparison of various streaming technologies
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream Analytics
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
Big Data Day LA 2015 - Always-on Ingestion for Data at Scale by Arvind Prabha...
 
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
IoT Innovation Lab Berlin @relayr - Kay Lerch on Getting basics right for you...
 
Azure Stream Analytics
Azure Stream AnalyticsAzure Stream Analytics
Azure Stream Analytics
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Developing Connected Applications with AWS IoT - Technical 301
Developing Connected Applications with AWS IoT - Technical 301Developing Connected Applications with AWS IoT - Technical 301
Developing Connected Applications with AWS IoT - Technical 301
 
Lightbend Fast Data Platform
Lightbend Fast Data PlatformLightbend Fast Data Platform
Lightbend Fast Data Platform
 
How to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of ThingsHow to Build Continuous Ingestion for the Internet of Things
How to Build Continuous Ingestion for the Internet of Things
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 
Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?Stream Processing Everywhere - What to use?
Stream Processing Everywhere - What to use?
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
Blr hadoop meetup
Blr hadoop meetupBlr hadoop meetup
Blr hadoop meetup
 
Getting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics servicesGetting started with Azure Event Hubs and Stream Analytics services
Getting started with Azure Event Hubs and Stream Analytics services
 
Storm over gearpump
Storm over gearpumpStorm over gearpump
Storm over gearpump
 

Similar to Real-Time Event & Stream Processing on MS Azure

Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointconfluent
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...Khalid Salama
 
„Enterprise Event Bus“ Unified Log (Event) Processing Architecture
„Enterprise Event Bus“ Unified Log (Event) Processing Architecture„Enterprise Event Bus“ Unified Log (Event) Processing Architecture
„Enterprise Event Bus“ Unified Log (Event) Processing ArchitectureGuido Schmutz
 
Actionable Insights - Thompson
Actionable Insights - ThompsonActionable Insights - Thompson
Actionable Insights - ThompsonProlifics
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureKhalid Salama
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Kai Wähner
 
Going Beyond the Device Heart Beat
Going Beyond the Device Heart BeatGoing Beyond the Device Heart Beat
Going Beyond the Device Heart BeatBalwinder Kaur
 
Distributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time ApplicationsDistributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time ApplicationsScyllaDB
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
Enterprise Cloud Data Platforms - with Microsoft Azure
Enterprise Cloud Data Platforms - with Microsoft AzureEnterprise Cloud Data Platforms - with Microsoft Azure
Enterprise Cloud Data Platforms - with Microsoft AzureKhalid Salama
 
Kochi mulesoft meetup 02
Kochi mulesoft meetup 02Kochi mulesoft meetup 02
Kochi mulesoft meetup 02sumitahuja94
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaKai Wähner
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...confluent
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®confluent
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsconfluent
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of dataconfluent
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Guido Schmutz
 

Similar to Real-Time Event & Stream Processing on MS Azure (20)

Confluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPointConfluent Partner Tech Talk with BearingPoint
Confluent Partner Tech Talk with BearingPoint
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
Operational Machine Learning: Using Microsoft Technologies for Applied Data S...
 
„Enterprise Event Bus“ Unified Log (Event) Processing Architecture
„Enterprise Event Bus“ Unified Log (Event) Processing Architecture„Enterprise Event Bus“ Unified Log (Event) Processing Architecture
„Enterprise Event Bus“ Unified Log (Event) Processing Architecture
 
Actionable Insights - Thompson
Actionable Insights - ThompsonActionable Insights - Thompson
Actionable Insights - Thompson
 
Intorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft AzureIntorducing Big Data and Microsoft Azure
Intorducing Big Data and Microsoft Azure
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA Framework and Product Comparison for Big Data Log Analytics and ITOA
Framework and Product Comparison for Big Data Log Analytics and ITOA
 
Going Beyond the Device Heart Beat
Going Beyond the Device Heart BeatGoing Beyond the Device Heart Beat
Going Beyond the Device Heart Beat
 
Distributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time ApplicationsDistributed Data Processing for Real-time Applications
Distributed Data Processing for Real-time Applications
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
Enterprise Cloud Data Platforms - with Microsoft Azure
Enterprise Cloud Data Platforms - with Microsoft AzureEnterprise Cloud Data Platforms - with Microsoft Azure
Enterprise Cloud Data Platforms - with Microsoft Azure
 
Kochi mulesoft meetup 02
Kochi mulesoft meetup 02Kochi mulesoft meetup 02
Kochi mulesoft meetup 02
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
 
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
Viele Autos, noch mehr Daten: IoT-Daten-Streaming mit MQTT & Kafka (Kai Waehn...
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Real-time processing of large amounts of data
Real-time processing of large amounts of dataReal-time processing of large amounts of data
Real-time processing of large amounts of data
 
Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016Big Data Architectures @ JAX / BigDataCon 2016
Big Data Architectures @ JAX / BigDataCon 2016
 

Real-Time Event & Stream Processing on MS Azure

  • 1. Real-Time Event and Stream Processing with Microsoft Azure. Khalid M. Salama, Microsoft Business Intelligence, Hitachi Consulting UK. We Make it Happen. Better. © Copyright 2015 Hitachi Consulting
  • 2. Outline: What is Event & Stream Processing?; Stream Processing Architecture; Message Queuing; Introducing Apache Storm; Introducing Azure Stream Analytics; Apache Storm vs Azure Stream Analytics; Useful Resources
  • 3. Fundamentals
  • 4-5. What is Event & Stream Processing? Terms
      - Stream Processing: real-time processing of a continuous sequence of data points (a stream) by applying a series of operations (kernel functions) to each data point.
      - Event Processing: real-time detection of events in a data stream by aggregating data points within a time frame, in order to trigger subsequent actions.
  • 6-7. What is Event & Stream Processing? Tell me more… [Diagram]
      - Stream Processing: queued data points (P5, P6, P7, …) wait while earlier points (P1 to P4) pass through Operation 1, Operation 2 and Operation 3 to yield the final product.
      - Event Processing: a group of queued data points (P2, P3, P4) is examined together to decide whether an event has occurred; an event triggers notifications or actions.
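The difference between the two terms can be illustrated with a minimal Python sketch. It is not taken from the deck: the function names, the fixed-size time window and the mean-above-threshold rule are illustrative assumptions, and readings are assumed to arrive in timestamp order.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def stream_process(points: Iterable[float],
                   operations: List[Callable[[float], float]]) -> Iterator[float]:
    """Stream processing: apply every operation, in order, to each data point."""
    for point in points:
        for operation in operations:
            point = operation(point)
        yield point


def detect_events(readings: Iterable[Tuple[int, float]],
                  window_seconds: int,
                  threshold: float) -> Iterator[Tuple[int, float]]:
    """Event processing: aggregate readings per time window and emit
    (window_start, mean) whenever the window mean exceeds the threshold.
    Assumes readings are (timestamp_seconds, value) in timestamp order."""
    current_window = None
    values: List[float] = []
    for timestamp, value in readings:
        window = timestamp - timestamp % window_seconds
        if current_window is not None and window != current_window:
            mean = sum(values) / len(values)
            if mean > threshold:
                yield current_window, mean
            values = []
        current_window = window
        values.append(value)
    if values:  # flush the last, still-open window
        mean = sum(values) / len(values)
        if mean > threshold:
            yield current_window, mean


processed = list(stream_process([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))
events = list(detect_events(
    [(0, 10.0), (5, 20.0), (10, 50.0), (15, 70.0), (20, 5.0)],
    window_seconds=10, threshold=30.0))
```

Stream processing produces one output per input point, whereas event processing produces output only when the aggregate over a window satisfies a condition.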
  • 8. What is Event & Stream Processing? Data at rest vs. data in motion
      - Traditional (working with data at rest): data is bulk-loaded and batch-processed into a data store; you submit a query and get results.
      - Real-time (working with data in motion): a continuous data stream, combined with static reference data, feeds continuous processing and querying, which yields real-time continuous results, actions and data archiving.
  • 9. Lambda Architecture: the speed layer and stream processing. [Diagram: hot path and cold path]
  • 10. Scenarios for Stream Processing: the hot path…
      - IoT & Device Telemetry: predictive maintenance; energy efficiency and smart cities
      - Social Media Analytics: real-time sentiment analysis; crisis management
      - Fraud Detection: identity theft and stolen credit card details; identifying fraudulent transactions
      - Inventory Management: maintaining a continual level of stock to support unpredictable purchasing habits
      - Clickstream Analytics: user experience improvements; targeted recommendations
  • 11. System Architecture
  • 12-18. Events & Stream Processing Architecture: The Canonical System. [Diagram, built up over seven slides] Event triggers (applications, web and social, devices, sensors) → data stream collection (ingress) → message queuing (producer/consumer mediator) → stream processing (processing and event detection, machine learning, Web API calls, reference data) → storage and batch analysis, plus presentation and action (live dashboards and analytics; apps and devices to take actions).
  • 19. Events & Stream Processing Architecture
      - Data Sources: devices, websites and apps that continuously produce data streams
      - Data Collection: listens to, collects and transfers inbound events
      - Message Queuing: decouples data consumers from data producers; reliable, distributed, fault-tolerant, high-throughput short-term storage
      - Stream Processing: aggregates / filters / joins incoming event streams; temporal engine for analysing data across time-series windows
      - Reference Data: high-throughput, random-access data store to support processing; usually a NoSQL data store
      - Storage: stores processed / aggregated / filtered data (SQL/NoSQL); consolidates and stores raw data in files for batch analysis (DFS)
      - Presentation: rich interactive visualizations for real-time data analysis; application integration for process automation
  • 20. Events & Stream Processing Architecture: Tools & Technologies. [The canonical diagram annotated with tools] Azure IoT Hub, Apache Kafka, Azure Event Hub, Azure Service Bus; Storm on HDInsight, Spark Streaming on HDInsight, Azure Stream Analytics, Azure ML; HDFS, Azure SQL DB/DW; Power BI live dashboards; apps and devices to take actions.
  • 21. Message Queuing
  • 22. Message Queuing: Queue-Centric Solutions (in my own words!). A message is a data object to be processed (a purchase order, a sensor reading, a tweet, etc.). Message queuing systems are useful for: decoupling message producers from consumers; increasing reliability (guaranteed delivery); reducing latency (fire and forget); load throttling (rate-levelling).
  • 23-26. Message Queuing: Decouple producers and consumers. [Diagram, built up over four slides] Starting from one originator and one processor, further processors (Processor 2, Processor 3) and originators (Originator 2) are added; in the last slide a queueing service sits between originators and processors.
  • 27. Message Queuing: Increase reliability. Processor available: message delivered.
  • 28. Message Queuing: Increase reliability. Processor not available: message lost.
  • 29. Message Queuing: Increase reliability. With a queueing service, when the processor is not available the message is queued (guaranteed delivery) and processed once the processor is available again.
  • 30-32. Message Queuing: Reduce latency. Without a queue, the originator must (1) send a message, (2) wait for processing to finish, and only then (3) send a new message.
  • 33. Message Queuing: Reduce latency. With a queueing service, the originator keeps on queuing messages (no need to wait); messages are processed later, then a confirmation is sent.
  • 34. Message Queuing: Load levelling. Normal request load between originator and processor.
  • 35. Message Queuing: Load levelling. A sudden increase in requests from many originators may bring the service down.
  • 36. Message Queuing: Load levelling. With a queueing service, a sudden increase in requests is buffered and the processor handles requests at the desired pace.
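The decoupling, fire-and-forget and load-levelling ideas above can be sketched with Python's standard `queue` and `threading` modules. This is an in-process illustration only, not a distributed queueing service; `run_demo` and its doubling "work" are hypothetical names chosen for the example.

```python
import queue
import threading
from typing import List


def run_demo(burst_size: int = 20) -> List[int]:
    """An originator enqueues a burst without waiting; a single processor
    drains the queue at its own pace."""
    buffer: "queue.Queue" = queue.Queue()
    processed: List[int] = []

    def processor() -> None:
        while True:
            message = buffer.get()
            if message is None:          # sentinel: no more messages
                buffer.task_done()
                break
            processed.append(message * 2)  # stand-in for real work
            buffer.task_done()

    worker = threading.Thread(target=processor)
    worker.start()

    # Fire and forget: put() returns immediately, so the originator is not
    # blocked by how fast the processor works.
    for i in range(burst_size):
        buffer.put(i)
    buffer.put(None)

    buffer.join()    # wait until every queued message has been handled
    worker.join()
    return processed


result = run_demo(5)
```

The originator never calls the processor directly, which is the decoupling the slides describe; a real deployment would replace the in-memory queue with a durable service so that messages survive a processor outage.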
  • 37. Message Queuing: Microsoft Azure. Azure Service Bus components (the diagram labels the two sides as sender/receiver, producer/consumer and publisher/subscriber):
      - Relay: NAT and firewall traversal service; request/response services; unbuffered with TCP throttling
      - Queue: transactional cloud AMQP/HTTP broker; high-scale, high-reliability messaging; sessions, scheduled delivery, etc.
      - Topic: transactional message distribution; up to 2000 subscriptions per topic; up to 2K/100K filter rules per subscription
      - Event Hubs: hyper scale; a million clients; concurrent
      - Notification Hubs: high-scale notification distribution; most mobile push notification services; millions of notification targets
  • 38-40. Azure Event Hubs: a highly scalable data ingress service that can ingest millions of events per second. Producers send events to the Event Hub and consumers read them.
      - Messages (EventData) are retained in the hub for a configurable period of time.
      - EventData: EnqueuedTime, PartitionKey, Offset, SequenceNumber, Body, UserProperties, SystemProperties
      - IEventProcessor: OpenAsync(), ProcessEventsAsync(), CloseAsync()
  • 41-43. Azure Event Hubs: partitions and consumer groups. Producers 1 to N write to an Event Hub with partitions 1 to 32; partitioning is used to scale and to improve the distribution of computation.
      - Each consumer group has its own read offset per partition. E.g. if reader 1 in group 1 has consumed message 9 and group 2 has just started, reader 1 in group 2 will consume message 1.
      - Readers in the same group share the same partition pointer (read offset). E.g. if reader 1 consumed message 9, then reader 3 will consume message 10.
      - Only one reader in a consumer group can access a partition at a time.
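The partition and consumer-group behaviour described above can be modelled in a few lines of Python. This is a toy in-memory model, not the Azure SDK: `ToyEventHub`, its methods and the CRC32-based partition choice are assumptions made for illustration (the real service uses its own internal hashing of the partition key).

```python
import zlib
from collections import defaultdict
from typing import Any, List, Optional


class ToyEventHub:
    """Append-only partitions plus one read offset per (consumer group, partition)."""

    def __init__(self, partition_count: int = 4) -> None:
        self.partitions: List[List[Any]] = [[] for _ in range(partition_count)]
        self.offsets = defaultdict(int)  # (group, partition index) -> next offset

    def partition_for(self, partition_key: str) -> int:
        # Assumption: any stable hash works for the sketch.
        return zlib.crc32(partition_key.encode("utf-8")) % len(self.partitions)

    def send(self, partition_key: str, body: Any) -> int:
        """Events with the same key always land in the same partition."""
        index = self.partition_for(partition_key)
        self.partitions[index].append(body)
        return index

    def receive(self, consumer_group: str, partition_index: int) -> Optional[Any]:
        """Return the next unread event for this group, or None if caught up."""
        key = (consumer_group, partition_index)
        offset = self.offsets[key]
        log = self.partitions[partition_index]
        if offset >= len(log):
            return None
        self.offsets[key] = offset + 1
        return log[offset]


hub = ToyEventHub(partition_count=4)
p = hub.send("device-1", "message 1")
same_partition = hub.send("device-1", "message 2") == p
first_for_group1 = hub.receive("group1", p)
second_for_group1 = hub.receive("group1", p)
first_for_group2 = hub.receive("group2", p)
```

Because offsets are stored per group, `group2` starts from the first message even though `group1` has already read both; events are retained in the log rather than removed on read.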
  • 44. Getting Started with Azure Event Hubs
  • 45-53. Getting Started with Azure Event Hubs: This is how we do it… [nine slides with no extracted text]
  • 54. Apache Storm
  • 55-60. Introducing Apache Storm: Overview
      - Originally used by Twitter to process massive streams of data from the Twitter firehose.
      - A distributed, scalable, high-performance, reliable, fault-tolerant, open-source system for real-time stream processing and continuous computation.
      - A widely used stream processing solution in the Big Data world (along with Spark Streaming).
      - Flexible custom development using Java (and C# on HDInsight).
      - Provided by Microsoft Azure on HDInsight (IaaS+): you pay for the cluster rather than for the jobs, while Microsoft manages the cluster for you.
      - Integrates with message queuing solutions such as Apache Kafka and Azure Event Hubs.
  • 61. Introducing Apache Storm: Storm & Hadoop. [Diagram: Big Data ecosystem] The Hadoop Distributed File System (HDFS), with a NameNode and DataNodes 1 to N, sits under Yet Another Resource Negotiator (YARN); applications on top include batch, in-memory, stream, SQL (Spark SQL), NoSQL, machine learning, search, orchestration, management and acquisition.
  • 62. Introducing Apache Storm: Storm & Hadoop. [Diagram] A Storm cluster consists of a master node (Nimbus), worker nodes 1 to N (each running a Supervisor) and ZooKeeper services, shown alongside HDFS and YARN.
  • 63. Introducing Apache Storm: Storm & Hadoop (on a Hadoop cluster)
      - Master Node: runs a daemon called "Nimbus", which is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.
      - Worker Node: runs a daemon called the "Supervisor", which listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it.
      - ZooKeeper: coordinates between Nimbus and the Supervisors. All state is kept in ZooKeeper or on local disk, so Nimbus or the Supervisors can go down and start back up as if nothing had happened.
  • 64-68. Introducing Apache Storm: Basics
      - Tuple: unit of data (set of key/value pairs)
      - Stream: unbounded sequence of tuples
      - Spout: stream source wrapper; emits tuples
      - Bolt: receives tuples; writes to a data store; reads from a data store; computes; emits additional tuples
      - Topology: graph of stream transformations; each node is a spout or a bolt
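The tuple, stream, spout, bolt and topology vocabulary can be made concrete with a small Python sketch. It deliberately avoids the Storm API: plain generators stand in for spouts and bolts, dictionaries stand in for tuples, and all names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) are hypothetical. A real topology runs distributed and unbounded, which this single-process example does not attempt to show.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def sentence_spout(sentences: Iterable[str]) -> Iterator[dict]:
    """Spout: wraps a source and emits one tuple per sentence."""
    for sentence in sentences:
        yield {"sentence": sentence}


def split_bolt(stream: Iterable[dict]) -> Iterator[dict]:
    """Bolt: receives sentence tuples and emits one tuple per word."""
    for tup in stream:
        for word in tup["sentence"].split():
            yield {"word": word.lower()}


def count_bolt(stream: Iterable[dict]) -> Iterator[dict]:
    """Bolt: keeps a running count per word and emits the updated count."""
    counts: Counter = Counter()
    for tup in stream:
        counts[tup["word"]] += 1
        yield {"word": tup["word"], "count": counts[tup["word"]]}


def run_topology(sentences: Iterable[str]) -> Dict[str, int]:
    """Topology: spout -> split bolt -> count bolt, wired as a linear graph."""
    latest: Dict[str, int] = {}
    for tup in count_bolt(split_bolt(sentence_spout(sentences))):
        latest[tup["word"]] = tup["count"]
    return latest


word_counts = run_topology(["the cat", "The dog"])
```

Each stage only knows the shape of the tuples it receives, which is what lets a topology be rewired or scaled stage by stage.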
  • 69. Getting Started with Storm on HDInsight
  • 70-71. Introducing Apache Storm: Getting Started – Creating HDInsight Cluster [two slides with no extracted text]
  • 72. Introducing Apache Storm: Creating a Storm app in Visual Studio. Install the Azure SDK for Visual Studio (https://azure.microsoft.com/en-gb/downloads/), then create a Storm project.
  • 73. Introducing Apache Storm: SCP.NET. [Diagram: a topology with a spout and Bolt1, Bolt2, Bolt3]
  • 74. Introducing Apache Storm: Stream groupings. Stream groupings determine how Storm routes tuples between tasks in a topology.
      - Shuffle: sends tuples to bolts in random, round-robin sequence
      - Fields: sends tuples to a bolt based on one or more fields in the tuple
      - All: sends a single copy of each tuple to all instances of a receiving bolt
      - Global: sends tuples from all instances of a source to a single target instance
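The four groupings can be sketched as routing functions that return the indices of the bolt tasks that should receive a tuple. This is an illustration rather than Storm's implementation: the `Router` class is hypothetical, round-robin stands in for Storm's randomized but even shuffle, CRC32 stands in for its field hashing, and task 0 is assumed to be the single global target.

```python
import itertools
import zlib
from typing import Dict, List, Sequence


class Router:
    """Chooses target task indices for a tuple under each grouping."""

    def __init__(self, task_count: int) -> None:
        self.task_count = task_count
        self._round_robin = itertools.cycle(range(task_count))

    def shuffle(self, tup: Dict) -> List[int]:
        # Even distribution across tasks (round-robin as a simple stand-in).
        return [next(self._round_robin)]

    def fields(self, tup: Dict, field_names: Sequence[str]) -> List[int]:
        # Tuples with equal values in the chosen fields go to the same task.
        key = "|".join(str(tup[name]) for name in field_names)
        return [zlib.crc32(key.encode("utf-8")) % self.task_count]

    def all_(self, tup: Dict) -> List[int]:
        # Every task receives its own copy of the tuple.
        return list(range(self.task_count))

    def global_(self, tup: Dict) -> List[int]:
        # The whole stream converges on one task.
        return [0]


router = Router(task_count=3)
shuffle_targets = [router.shuffle({})[0] for _ in range(4)]
fields_a = router.fields({"word": "storm"}, ["word"])
fields_b = router.fields({"word": "storm"}, ["word"])
```

Fields grouping is what makes stateful bolts such as a word counter correct when run in parallel, because every occurrence of a given key reaches the same task.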
  • 75. Introducing Azure Stream Analytics: Overview. A PaaS real-time complex event processing (CEP) service on Microsoft Azure.
      - Fully managed real-time processing: intake of millions of events per second; processing on continuous streams of data; reference data lookup; output to live dashboards and data stores
      - Mission-critical reliability: guaranteed event delivery; preserves event order on a per-device basis; guaranteed business continuity; auto-recovery from failures
      - No challenges with scale: elasticity for scale up or scale down; distributed, scale-out architecture; pay only for the resources you use
      - Rapid development & deployment: SQL-like language; built-in temporal semantics; up and running in a few clicks; scheduling and monitoring
  • 76. Introducing Azure Stream Analytics: Overview. [Diagram: Data Source → Ingest/Queue → Process → Deliver → Consume]
      - Event inputs: Event Hub; Azure Blob; DocumentDB (coming soon)
      - Transform: temporal joins; filter; aggregates; projections; windows; REST APIs (coming soon)
      - Enrich: Azure ML
      - Reference data: Azure Blob; HBase (coming soon)
      - Outputs: SQL Azure; Azure Blobs; Event Hub; Service Bus Queue; Service Bus Topics; Table storage; DocumentDB; Power BI
      - Properties listed: distributed; low latency; high throughput; scalable and reliable; low cost
      - Also shown: Azure Storage; Power BI dashboard
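The transform steps listed above (filter, reference data lookup, windowed aggregate) can be mimicked in plain Python to show what such a query computes. This is not Stream Analytics code: `tumbling_counts`, the event layout and the sample device names are hypothetical, and a tumbling (fixed, non-overlapping) window is assumed as the window type.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_counts(events: Iterable[Tuple[int, str, float]],
                    window_seconds: int,
                    reference: Dict[str, str],
                    min_temperature: float) -> Dict[Tuple[int, str], int]:
    """Count hot readings per (window start, site).

    events: (timestamp_seconds, device_id, temperature) triples.
    reference: static lookup from device_id to site name.
    """
    counts: Counter = Counter()
    for timestamp, device_id, temperature in events:
        if temperature <= min_temperature:              # filter
            continue
        site = reference.get(device_id, "unknown")      # reference data lookup
        window_start = timestamp - timestamp % window_seconds  # tumbling window
        counts[(window_start, site)] += 1               # aggregate
    return dict(counts)


sample_events = [(1, "d1", 35.0), (4, "d2", 20.0), (9, "d1", 40.0), (12, "d2", 31.0)]
sample_reference = {"d1": "London", "d2": "Leeds"}
hot_counts = tumbling_counts(sample_events, 10, sample_reference, 30.0)
```

In the managed service the same logic would be expressed declaratively in the SQL-like query language, with the service handling windowing, scaling and delivery to the configured outputs.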
  • 77. Getting Started with Azure Stream Analytics
  • 78. Introducing Azure Stream Analytics: Getting Started
    - Everything is done on the Azure Portal
    - Create a Stream Analytics job
    - Add inputs
    - Add outputs
    - Define the processing query
    - Scale and configure
  • 79. Introducing Azure Stream Analytics: Getting Started – Create a Stream Analytics job
  • 80. Introducing Azure Stream Analytics: Getting Started – Create a Stream Analytics job
  • 81. Introducing Azure Stream Analytics: Getting Started – Scale
  • 82. Introducing Azure Stream Analytics: Getting Started – Configure
  • 83. Introducing Azure Stream Analytics: Getting Started – Add inputs to your job
    Data streams
    - Currently supported input data streams are Azure Event Hub, Azure IoT Hub and Azure Blob Storage.
    - Advanced options let you configure how the job reads data from the input.
    Reference data
    - Usually static, or changes very slowly over time (e.g. product catalog, customer info).
    - Currently supported from Azure Blob Storage only.
    - Cached for performance.
  • 84. Introducing Azure Stream Analytics: Getting Started – Define input schema
    - The serialization format and the encoding of the input data sources must be specified.
    - Three formats are currently supported: CSV, JSON and Avro, with an optional schema for the CSV and Avro formats.
    - After the input is created, its configuration can be changed, the connection can be tested, and sample (synthetic) data can be generated based on the supplied structure.
  • 85. Introducing Azure Stream Analytics: Getting Started – Add an output to your job
    Data stores currently supported as outputs:
    - Azure Blob storage: creates log files with temporal query results, for batch processing and archiving.
    - Azure Table storage: NoSQL storage that is more flexible than SQL Database, and durable (in contrast to Event Hub).
    - Azure SQL Database: stores results in an Azure SQL Database table. Ideal as a source for traditional reporting and analysis.
    - Event Hub: sends an event to an event hub. Ideal for generating actionable events such as alerts or notifications.
    - Service Bus Queue/Topics: sends an event to a queue or topic. Ideal for process integration.
    - PowerBI: live dashboards and real-time reporting.
    - DocumentDB: NoSQL data store that works with JSON documents.
  • 86. Introducing Azure Stream Analytics: Getting Started – Query
  • 87. Stream Analytics Query Language
  • 88. Stream Analytics Query Language: SA Query Language
    Category | Elements
    Data Types | bigint, float, nvarchar(max), datetime
    DML | SELECT, FROM, WHERE, GROUP BY, HAVING, CASE WHEN THEN ELSE, INNER/LEFT OUTER JOIN, UNION, CROSS/OUTER APPLY, CAST, INTO, ORDER BY ASC, DESC
    Aggregate Functions | Sum, Count, Avg, Min, Max, StDev, StDevP, Var, VarP
    Date and Time Functions | DateName, DatePart, Day, Month, Year, DateTimeFromParts, DateDiff, DateAdd
    String Functions | Len, Concat, CharIndex, Substring, PatIndex
    Temporal Functions | Lag, IsFirst, CollectTop
    Windowing Extensions | TumblingWindow, HoppingWindow, SlidingWindow, Duration
    Scaling Extensions | WITH, PARTITION BY, OVER
  • 89. Stream Analytics Query Language: Important clauses
    INTO clause
    - Pipelines the data from input to output
    - Can have multiple outputs
        SELECT <columns, derived columns> INTO <output A> FROM <input x> WHERE <condition 1>
        SELECT <columns, derived columns> INTO <output B> FROM <input x> WHERE <condition 2>
    JOIN clause
    - Combines multiple event streams
    - Combines event streams with reference data
        SELECT <columns, derived columns>
        INTO <output A>
        FROM <stream1>
        JOIN <stream2>
          ON DATEDIFF(minute, stream1.time, stream2.time) BETWEEN 0 AND 1
          AND <stream1.Key> = <stream2.Key>
        JOIN <ReferenceData>
          ON <stream1.Key> = <ReferenceData.Key>
    CTEs
    - Implement more complex logic and support multiple steps
        WITH Step1 AS (
          SELECT Count(*) AS CountTweets, Topic
          FROM TwitterStream PARTITION BY PartitionId
          GROUP BY TumblingWindow(second, 3), Topic, PartitionId),
        Step2 AS (
          SELECT Avg(CountTweets)
          FROM Step1
          GROUP BY TumblingWindow(minute, 3))
        SELECT * INTO Output1 FROM Step2
    Time stamping
    - Application time
    - System time
        SELECT <columns, derived columns>, OrderDate FROM <input> TIMESTAMP BY EventTime -- app time
        SELECT <columns, derived columns>, System.Time AS EventTime FROM <input> TIMESTAMP BY EventTime -- sys time from event hub or azure blob storage
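To make the time-bounded stream JOIN on this slide concrete, here is a minimal Python sketch of the matching rule. It is not how Stream Analytics executes the join: the function name `temporal_join`, the `key`/`time` field names and the use of plain seconds for timestamps are all assumptions for illustration, and the 0 to 1 minute bound is approximated as 0 to 60 seconds.

```python
def temporal_join(stream1, stream2, max_lag_seconds=60):
    """Pair events that share a key when the stream2 event happens
    between 0 and max_lag_seconds after the stream1 event."""
    matches = []
    for e1 in stream1:
        for e2 in stream2:
            lag = e2["time"] - e1["time"]
            if e1["key"] == e2["key"] and 0 <= lag <= max_lag_seconds:
                matches.append((e1, e2))
    return matches


entries = [{"key": "A", "time": 0}, {"key": "B", "time": 10}]
exits = [{"key": "A", "time": 45}, {"key": "B", "time": 100}]
pairs = temporal_join(entries, exits)  # only the "A" events are within 60 seconds
```

The time bound is what keeps a join over unbounded streams tractable: without it, every past event would have to be retained in case a match arrives later.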
  • 90. Stream Analytics Query Language: Windowing Functions
    - In data streams, a common requirement is to aggregate (max, min, sum, count, etc.) over the messages that arrive within a specified period of time (a window), in order to detect events.
    - Each GROUP BY requires a windowing function.
    - Each window operation outputs a single event at the end of the window.
    - All windows have a fixed length.
    Window | Purpose
    Tumbling window | Aggregate per time interval
    Hopping window | Schedule overlapping windows
    Sliding window | Windows are constantly re-evaluated
  • 91-94. Stream Analytics Query Language: Windowing Functions – Tumbling Window
    [Figure, built up over four slides: events on a timeline (seconds) grouped into consecutive 20-second tumbling windows]
    Tumbling windows:
    - Repeat
    - Do not overlap
    - An event can belong to only one tumbling window
    Query: count the total number of vehicles entering each toll booth in every 20-second interval.
        SELECT TollId, COUNT(*)
        FROM EntryStream TIMESTAMP BY EntryTime
        GROUP BY TollId, TumblingWindow(second, 20)
    Syntax: TumblingWindow(<time unit>, <window size>)
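The tumbling-window count can be simulated in a few lines of Python to show why each event lands in exactly one window. This is a sketch of the semantics only, not the Stream Analytics engine; the function name `tumbling_counts` and the half-open window convention `[start, start + size)` are assumptions, and the service's own boundary handling may differ.

```python
from collections import Counter


def tumbling_counts(events, size):
    """events: iterable of (timestamp_seconds, key).
    Returns {(window_start, key): count} for back-to-back windows of `size` seconds."""
    counts = Counter()
    for ts, key in events:
        # Integer division maps every timestamp to exactly one window.
        window_start = (ts // size) * size
        counts[(window_start, key)] += 1
    return dict(counts)


entries = [(1, "T1"), (5, "T1"), (19, "T2"), (20, "T1"), (39, "T1"), (40, "T2")]
per_window = tumbling_counts(entries, size=20)
```

Because the windows partition the timeline, the counts always add up to the number of input events.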
  • 95-100. Stream Analytics Query Language: Windowing Functions – Hopping Window
    [Figure, built up over six slides: events on a timeline grouped into 20-second hopping windows with a 10-second hop]
    Hopping windows:
    - Repeat
    - Can overlap
    - Hop forward in time by a fixed period
    - Events can belong to more than one hopping window
    Query: count the number of vehicles entering each toll booth over a 20-second interval, updating the results every 10 seconds.
        SELECT COUNT(*), TollId
        FROM EntryStream TIMESTAMP BY EntryTime
        GROUP BY TollId, HoppingWindow(second, 20, 10)
    Syntax: HoppingWindow(<time unit>, <window size>, <hop size>)
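A small Python simulation shows how one event is counted in several overlapping hopping windows. As a sketch under stated assumptions: windows are taken to start at multiples of the hop and to cover `[start, start + size)`, and the name `hopping_counts` is made up for this example; it does not reproduce the exact alignment rules of Stream Analytics.

```python
from collections import Counter


def hopping_counts(events, size, hop):
    """events: iterable of (timestamp_seconds, key).
    Returns {(window_start, key): count}; each event is counted once
    in every window that contains it."""
    counts = Counter()
    for ts, key in events:
        # Start from the latest window that begins at or before ts,
        # then walk back one hop at a time while the window still covers ts.
        start = (ts // hop) * hop
        while start > ts - size:
            counts[(start, key)] += 1
            start -= hop
    return dict(counts)


overlapping = hopping_counts([(25, "T1")], size=20, hop=10)
```

With a 20-second window and a 10-second hop, the event at second 25 is counted in the windows starting at 10 and at 20. Setting the hop equal to the window size reduces this to a tumbling window.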
  • 101-105. Stream Analytics Query Language: Windowing Functions – Sliding Window
    [Figure, built up over five slides: a 20-second sliding window moving along a timeline of events]
    Sliding windows:
    - Continuously move forward by an ε (epsilon)
    - Produce an output only when a message occurs
    - Every window has at least one event
    - Events can belong to more than one sliding window
    Query: find all the toll booths that have served more than 10 vehicles in the last 20 seconds.
        SELECT TollId, Count(*)
        FROM EntryStream ES
        GROUP BY TollId, SlidingWindow(second, 20)
        HAVING Count(*) > 10
    Syntax: SlidingWindow(<time unit>, <window size>)
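The sliding-window query can be approximated in Python by re-evaluating the window each time an event arrives. This is a simplified sketch: it only evaluates on arrivals (not when events leave the window), it assumes events are already sorted by timestamp, and the name `sliding_alerts` is hypothetical.

```python
from collections import defaultdict, deque


def sliding_alerts(events, size, threshold):
    """events: (timestamp_seconds, key) pairs sorted by timestamp.
    Returns (timestamp, key, count) each time a key has more than
    `threshold` events in the `size` seconds ending at that timestamp."""
    recent = defaultdict(deque)
    alerts = []
    for ts, key in events:
        window = recent[key]
        window.append(ts)
        # Drop events that have fallen out of the trailing window.
        while window[0] <= ts - size:
            window.popleft()
        if len(window) > threshold:
            alerts.append((ts, key, len(window)))
    return alerts


busy = sliding_alerts([(0, "T1"), (5, "T1"), (10, "T1"), (30, "T1")], size=20, threshold=2)
```

Unlike the tumbling and hopping cases, nothing is emitted on a fixed schedule here; output is driven entirely by the events themselves.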
  • 106. Stream Analytics Query Language: Useful SA Query Patterns
    - Aggregation and filter: compute a value (sum, max, min, avg) over a time window. E.g. what are the maximum temperature and average pressure read by the sensor in a 60-second window?
    - Counting unique values: count the number of unique field values that appear in the stream within a time window. E.g. how many unique makes of car passed through the toll booth in a 2-second window?
    - Determine if a value has changed: look at a previous value to determine whether it differs from the current value. E.g. is the previous car on the toll road the same make as the current car?
    - Find the first/last event in a window: e.g. find the first/last car in every 10-minute interval.
    - Detect the absence of events: check that a stream has no value that matches certain criteria. E.g. have 2 consecutive cars of the same make entered the toll road within 90 seconds?
    - Detect duration between events: find the duration of a given event. E.g. given a web clickstream, determine the time spent on a feature.
    - Detect duration of a condition: find out how long a condition lasted. E.g. suppose a bug resulted in all cars having an incorrect weight (above 20,000 pounds); compute the duration of the bug.
    - Fill missing values: for a stream of events that has missing values, produce a stream of events at regular intervals. E.g. generate an event every 5 seconds that reports the most recently seen data point.
    Source: https://azure.microsoft.com/en-gb/documentation/articles/stream-analytics-stream-analytics-query-patterns
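As one worked instance of these patterns, the "determine if a value has changed" case (the job the `Lag` function does in the query language) can be sketched in Python as a comparison of each event with its predecessor. The function name `changed_from_previous` and the `make` field are assumptions for illustration only.

```python
def changed_from_previous(events, field):
    """Return the events whose `field` differs from the immediately preceding event."""
    changed = []
    previous = None
    for event in events:
        if previous is not None and event[field] != previous[field]:
            changed.append(event)
        previous = event
    return changed


cars = [
    {"id": 1, "make": "Honda"},
    {"id": 2, "make": "Honda"},
    {"id": 3, "make": "Toyota"},
    {"id": 4, "make": "Ford"},
]
switches = changed_from_previous(cars, "make")  # cars 3 and 4 differ from their predecessor
```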
  • 107. Apache Storm vs Azure Stream Analytics: The face-off… (source: Microsoft Azure Stream Analytics Documentation)
    Feature | Azure Stream Analytics | Apache Storm on HDInsight
    Geared for | Event detection | Stream processing
    Open source | No; it is a Microsoft Azure service | Yes; it is Apache
    Service type | PaaS: deploy, execute and monitor jobs | SaaS+: provision an HDInsight Storm cluster
    Pricing | You pay for the data/jobs | You pay for the cluster
    Scalability | Number of Streaming Units | Number of nodes in the cluster
    Processing | SQL-like query + temporal operations + Azure Machine Learning (API calls to published models) | Java or C# (custom extensibility)
    Dev. experience | Azure Portal: easy, limited | Visual Studio: more involved, flexible
    Limitations | No UDFs, no Web API calls (coming soon) | You need to implement aggregations and temporal operations yourself
    Input data source | Azure Event Hubs and Azure Blobs | Connectors (Event Hub, Service Bus, Kafka, custom)
    Input data format | CSV, JSON | Anything; custom code is needed to parse it
    Output data sink | Azure Event Hubs, Azure Blob Storage, Azure Tables, Azure SQL DB, DocumentDB and PowerBI | PowerBI, Azure Event Hubs, Azure Blob Store, Azure DocumentDB, SQL DB, HBase, custom
    Reference data | Azure Blobs, with a maximum size of 100 MB for the in-memory lookup cache | No limits on data size; connectors available for HBase, DocumentDB, SQL, custom
  • 108. How to Get Started with Stream Processing?
    - Read the slides!
    - MVA – Big Data Analytics with HDInsight: Hadoop on Azure https://mva.microsoft.com/en-US/training-courses/big-data-analytics-with-hdinsight-hadoop-on-azure-10551
    - MVA – Implementing Big Data Analysis https://mva.microsoft.com/en-US/training-courses/implementing-big-data-analysis-8311?l=44REr2Yy_5404984382
    - Azure Documentation – Storm on HDInsight https://azure.microsoft.com/en-gb/documentation/services/hdinsight/
    - Azure Documentation – EventHub https://azure.microsoft.com/en-gb/documentation/articles/event-hubs-overview/
    - Azure Documentation – Stream Analytics https://azure.microsoft.com/en-gb/documentation/services/stream-analytics/
    - Apache Storm https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html
    - O'Reilly Books – Getting Started with Storm
  • 109. DEMO
  • 110. [Demo architecture diagram] Two pipelines: an image stream is scored for emotion and the resulting emotion events are consumed; temperature/pressure sensor events are consumed as sensor data. Both pipelines output to a real-time dashboard.
  • 111. My Background: Applying Computational Intelligence in Data Mining
    - Honorary Research Fellow, School of Computing, University of Kent.
    - Ph.D. Computer Science, University of Kent, Canterbury, UK.
    - M.Sc. Computer Science, The American University in Cairo, Egypt.
    - 25+ published journal and conference papers, focusing on: classification rules induction, decision trees construction, Bayesian classification modelling, data reduction, instance-based learning, evolving neural networks, and data clustering.
    - Journals: Swarm Intelligence, Swarm & Evolutionary Computation, Applied Soft Computing, and Memetic Computing.
    - Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio, ECTA, IEEE WCCI and INNS-BigData.
    - ResearchGate.org
  • 112. Thank you!