
Real-Time Event & Stream Processing on MS Azure

These slides discuss the main concepts of event & stream processing, as well as the related technologies on Microsoft Azure. We start by giving an overview of what event & stream processing is. Then we describe the canonical architecture of a stream processing solution and delve into the message queuing part of the solution. After that, we introduce Apache Storm on HDInsight, as well as Azure Stream Analytics. We compare Apache Storm to Azure Stream Analytics, and finally conclude with useful resources.

Published in: Data & Analytics

Real-Time Event & Stream Processing on MS Azure

  1. Real-Time Event and Stream Processing with Microsoft Azure. Khalid M. Salama, Microsoft Business Intelligence, Hitachi Consulting UK. We Make it Happen. Better. © Copyright 2015 Hitachi Consulting
  2. Outline
     • What is Event & Stream Processing?
     • Stream Processing Architecture
     • Message Queuing
     • Introducing Apache Storm
     • Introducing Azure Stream Analytics
     • Apache Storm vs Azure Stream Analytics
     • Useful Resources
  3. Fundamentals
  5. What is Event & Stream Processing? Terms
     • Stream Processing: real-time processing of a continuous sequence of data points (a stream), by applying a series of operations (kernel functions) to each data point.
     • Event Processing: real-time detection of events from a data stream, by aggregating data points within a time frame, in order to perform subsequent actions.
  7. What is Event & Stream Processing? Tell me more…
     • Stream Processing [diagram]: queued data points (P1, P2, P3, P4, P5, … P∞) pass one by one through Operation 1, Operation 2 and Operation 3 to yield a final product.
     • Event Processing [diagram]: queued data points are grouped (e.g. P2, P3, P4) and tested for an event, which triggers notifications / actions.
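The two definitions above can be contrasted with a minimal Python sketch. It is only an illustration: the function names, the use of a per-frame sum as the aggregate, and the threshold rule for what counts as an "event" are assumptions made for this example and do not come from the slides.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def stream_process(points: Iterable[float],
                   operations: List[Callable[[float], float]]) -> Iterator[float]:
    """Stream processing: apply every operation, in order, to each data point."""
    for point in points:
        for operation in operations:
            point = operation(point)
        yield point


def detect_events(timed_points: Iterable[Tuple[float, float]],
                  frame_seconds: float,
                  threshold: float) -> List[int]:
    """Event processing: aggregate (timestamp, value) points per time frame
    and report the frames whose total exceeds the threshold (assumed rule)."""
    totals: Dict[int, float] = {}
    for timestamp, value in timed_points:
        frame = int(timestamp // frame_seconds)
        totals[frame] = totals.get(frame, 0.0) + value
    return [frame for frame, total in sorted(totals.items()) if total > threshold]
```

Stream processing emits one result per input point, whereas event processing emits a result only when the aggregate over a time frame satisfies a condition.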
  8. What is Event & Stream Processing? Data at rest vs. data in motion
     • Traditional (working with data at rest): bulk-load data into a data store and batch-process it; submit a query, get results.
     • Real-time (working with data in motion): continuous processing and querying of a continuous data stream, combined with static reference data; produces real-time continuous results, actions, and data archiving.
  9. Lambda Architecture: the speed layer and stream processing. [Diagram: hot path and cold path.]
  10. Scenarios for Stream Processing: the hot path…
     • IoT & Device Telemetry: predictive maintenance; energy efficiency and smart cities
     • Social Media Analytics: real-time sentiment analysis; crisis management
     • Fraud Detection: identity theft and stolen credit card details; identifying a fraudulent transaction
     • Inventory Management: maintaining a continual level of stock to support unpredictable purchasing habits
     • Clickstream Analytics: user experience improvements; targeted recommendations
  11. System Architecture
  18. Events & Stream Processing Architecture: The Canonical System. [Diagram, built up over slides 12 to 18.]
     • Event Triggers: applications, web and social, devices, sensors
     • Data Stream Collection (ingress)
     • Message Queuing: producer/consumer mediator
     • Stream Processing: processing and event detection, machine learning, web API calls, reference data
     • Storage and Batch Analysis
     • Presentation and Action: live dashboards & analytics; apps and devices to take actions
  19. Events & Stream Processing Architecture
     • Data Sources: devices, websites, and apps that continuously produce data streams
     • Data Collection: listen to, collect, and transfer inbound events
     • Message Queuing: decouples data consumers from data producers; reliable, distributed, fault-tolerant, high-throughput short-term storage
     • Stream Processing: aggregate / filter / join incoming event streams; temporal engine for analysing data across time-series windows
     • Reference Data: high-throughput, random-access data store to support processing; usually NoSQL data stores
     • Storage: store processed / aggregated / filtered data (SQL/NoSQL); consolidate and store raw data into files for batch analysis (DFS)
     • Presentation: rich interactive visualizations for real-time data analysis; application integration for process automation
  20. Events & Stream Processing Architecture: Tools & Technologies. [Diagram: the canonical system annotated with products.]
     • Event Triggers: applications, web and social, devices, sensors
     • Data Stream Collection / Message Queuing: Apache Kafka, Azure Event Hub, Azure Service Bus, Azure IoT Hub
     • Stream Processing: Azure Stream Analytics, Storm on HDInsight, Spark Streaming on HDInsight, Azure ML, reference data
     • Storage and Batch Analysis: HDFS, Azure SQL DB/DW
     • Presentation and Action: PowerBI live dashboards; apps and devices to take actions
  21. Message Queuing
  22. Message Queuing: Queue-Centric Solutions (in my own words!). A message is a data object to be processed (purchase order, sensor readings, tweets, etc.). Message queuing systems are useful for:
     • Decoupling message producers from consumers
     • Increasing reliability (guaranteed delivery)
     • Reducing latency (fire and forget)
     • Load throttling (rate-levelling)
  26. Message Queuing: Decouple producers and consumers. [Diagram, built up over slides 23 to 26: without a queue, each originator is wired directly to each processor (Processor, Processor 2, Processor 3); with a queueing service in the middle, originators and processors only talk to the queue.]
  29. Message Queuing: Increase reliability. [Diagram, slides 27 to 29.]
     • Processor available: the message is delivered.
     • Processor not available, no queue: the message is lost.
     • Processor not available, with a queueing service: the message is queued and processed when the processor is available again (guaranteed delivery).
  33. Message Queuing: Reduce latency. [Diagram, slides 30 to 33.]
     • Without a queue: (1) send a message, (2) wait for processing to finish, (3) send a new message.
     • With a queueing service: the originator keeps on queuing messages (no need to wait); messages are processed later, then a confirmation is sent.
  36. Message Queuing: Load levelling. [Diagram, slides 34 to 36.]
     • Normal request load: one originator, one processor.
     • A sudden increase in requests from many originators may bring the service down.
     • With a queueing service, the processor handles requests at the desired pace.
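The fire-and-forget and load-levelling ideas above can be sketched with Python's standard `queue` and `threading` modules. This is an in-process stand-in, not an Azure API; `run_load_levelling`, the sentinel convention, and the doubling "work" step are assumptions made for the illustration.

```python
import queue
import threading
from typing import List


def run_load_levelling(burst_size: int, queue_capacity: int = 1000) -> List[int]:
    """The originator enqueues a burst without waiting for results;
    a single processor drains the queue at its own pace."""
    requests: "queue.Queue" = queue.Queue(maxsize=queue_capacity)
    processed: List[int] = []

    def processor() -> None:
        while True:
            message = requests.get()
            if message is None:  # sentinel (assumed convention): no more work
                break
            processed.append(message * 2)  # stand-in for real processing

    worker = threading.Thread(target=processor)
    worker.start()
    for i in range(burst_size):
        requests.put(i)  # fire and forget: no wait for processing to finish
    requests.put(None)
    worker.join()
    return processed
```

Because the queue sits between the two sides, the originator's send rate and the processor's work rate no longer need to match; the queue absorbs the burst.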
  37. Message Queuing: Microsoft Azure, Azure Service Bus. [Diagram: sender/producer/publisher on one side, receiver/consumer/subscriber on the other.]
     • Relay: NAT and firewall traversal service; request/response services; unbuffered with TCP throttling
     • Queue: transactional cloud AMQP/HTTP broker; high-scale, high-reliability messaging; sessions, scheduled delivery, etc.
     • Topic: transactional message distribution; up to 2000 subscriptions per topic; up to 2K/100K filter rules per subscription
     • Notification Hubs: high-scale notification distribution; most mobile push notification services; millions of notification targets
     • Event Hubs: hyper scale; a million clients; concurrent
  40. Azure Event Hubs: a highly scalable data ingress service that can ingest millions of events per second. [Diagram: producer, event hub, consumer.]
     • Messages (EventData) are retained in the hub for a configurable period of time.
     • EventData: EnqueuedTime, PartitionKey, Offset, SequenceNumber, Body, UserProperties, SystemProperties
     • IEventProcessor: OpenAsync(), ProcessEventsAsync(), CloseAsync()
  43. Azure Event Hubs: partitions and consumer groups. [Diagram: Producer 1 … Producer N write to Partition 1 … Partition 32; Consumer Group 1 and Consumer Group 2, each with Reader 1 … Reader N, read from the partitions.]
     • Partitioning is used to scale and to improve the distribution of computation.
     • Readers in the same consumer group share the same partition pointer (read offset). E.g. if reader 1 consumed message 9, then reader 3 will consume message 10.
     • Only one reader in a consumer group can access a partition at a time.
     • Each consumer group has its own partition read offset. E.g. if reader 1 of group 1 consumed message 9 and group 2 has just started, then reader 1 of group 2 will consume message 1.
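The offset behaviour described above can be modelled with a small in-memory Python sketch. This is not the Event Hubs SDK: the `Partition` class, `pick_partition`, and the CRC32-based key hashing are hypothetical names and choices made only to illustrate per-consumer-group read offsets over a retained log.

```python
import zlib
from typing import Dict, List, Optional


class Partition:
    """An append-only log: events stay in the log after being read."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.offsets: Dict[str, int] = {}  # consumer group -> next index to read

    def append(self, event: str) -> None:
        self.events.append(event)

    def read(self, group: str) -> Optional[str]:
        """Return the next unread event for this consumer group, or None."""
        position = self.offsets.get(group, 0)
        if position >= len(self.events):
            return None
        self.offsets[group] = position + 1  # shared by all readers in the group
        return self.events[position]


def pick_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index (hashing scheme is assumed)."""
    return zlib.crc32(partition_key.encode("utf-8")) % partition_count
```

Each group advances its own pointer, so a second consumer group that starts later still begins at the first retained message, while readers inside one group continue where the previous reader stopped.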
  44. Getting Started with Azure Event Hubs
  45. Getting Started with Azure Event Hubs: This is how we do it… [Slides 45 to 53 are screenshots with no further text.]
  54. Apache Storm
  60. Introducing Apache Storm: Overview
     • Originally used by Twitter to process massive streams of data from the Twitter firehose.
     • A distributed, scalable, high-performance, reliable, fault-tolerant, open source real-time stream processing and continuous computation system.
     • A widely used stream processing solution in the Big Data world (along with Spark Streaming).
     • Flexible custom development using Java (and C# on HDInsight).
     • Provided by Microsoft Azure on HDInsight (IaaS+); you pay for the cluster, rather than the jobs, while Microsoft manages the cluster for you.
     • Integrates with message queuing solutions, such as Apache Kafka and Azure Event Hubs.
  61. Introducing Apache Storm: Storm & Hadoop, the Big Data ecosystem. [Diagram: applications (batch, in-memory, stream, SQL / Spark SQL, NoSQL, machine learning, search, orchestration, management, acquisition, …) run on YARN (Yet Another Resource Negotiator) over the Hadoop Distributed File System (HDFS): NameNode, DataNode 1 … DataNode N.]
  62. Introducing Apache Storm: Storm & Hadoop. [Diagram: a Storm cluster on top of YARN and HDFS, with a Master Node <Nimbus>, Worker Node 1 … Worker Node N <Supervisor>, and Zookeeper services.]
  63. Introducing Apache Storm: Storm & Hadoop
     • Master Node: runs a daemon called "Nimbus", which is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.
     • Worker Node: runs a daemon called the "Supervisor", which listens for work assigned to its machine and starts and stops worker processes as necessary, based on what Nimbus has assigned to it.
     • Zookeeper (on a Hadoop cluster): coordinates between Nimbus and the Supervisors. All state is kept in Zookeeper or on local disk, so Nimbus or the Supervisors can go down and they will start back up as if nothing happened.
  68. Introducing Apache Storm: Basics. [Diagram, built up over slides 64 to 68.]
     • Tuple: a unit of data (a set of key/value pairs)
     • Stream: an unbounded sequence of tuples
     • Spout: a stream source wrapper; emits tuples
     • Bolt: receives tuples; writes to a data store; reads from a data store; computes; emits additional tuples
     • Topology: a graph of stream transformations; each node is a spout or a bolt
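The spout, bolt and topology roles above can be mimicked with plain Python generators. This is a conceptual sketch only and uses none of Storm's Java or SCP.NET APIs; the word-count scenario and the names `sentence_spout`, `split_bolt`, `count_bolt` and `run_topology` are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def sentence_spout(sentences: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Spout: wraps a source and emits one tuple per sentence."""
    for sentence in sentences:
        yield {"sentence": sentence}


def split_bolt(stream: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Bolt: receives sentence tuples and emits one tuple per word."""
    for tup in stream:
        for word in tup["sentence"].split():
            yield {"word": word.lower()}


def count_bolt(stream: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Bolt: keeps a running count per word and emits the updated count."""
    counts: Counter = Counter()
    for tup in stream:
        counts[tup["word"]] += 1
        yield {"word": tup["word"], "count": counts[tup["word"]]}


def run_topology(sentences: Iterable[str]) -> List[Dict[str, object]]:
    """Topology: spout -> split bolt -> count bolt, wired as a linear graph."""
    return list(count_bolt(split_bolt(sentence_spout(sentences))))
```

In a real Storm cluster each node of this graph would run as many parallel tasks spread across worker nodes; here everything runs in one process to keep the data flow visible.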
  69. Getting Started with Storm on HDInsight
  70. Introducing Apache Storm: Getting Started, Creating an HDInsight Cluster. [Slides 70 and 71 are screenshots.]
  72. Introducing Apache Storm: Creating a Storm App in Visual Studio
     • Install the Azure SDK for Visual Studio: https://azure.microsoft.com/en-gb/downloads/
     • Create a Storm project
  73. Introducing Apache Storm: SCP.NET. [Diagram: a topology with one Spout feeding Bolt1, Bolt2 and Bolt3.]
  74. Introducing Apache Storm: Stream groupings. Stream groupings determine how Storm routes tuples between tasks in a topology.
     • Shuffle: sends tuples to bolts in random, round-robin sequence
     • Fields: sends tuples to a bolt based on one or more fields in the tuple
     • All: sends a single copy of each tuple to all instances of a receiving bolt
     • Global: sends tuples from all instances of a source to a single target instance
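The four groupings above can be expressed as routing functions that map a tuple to the indices of the receiving bolt instances. This is a simplified Python model, not Storm's API: the function names, the strict round-robin used for shuffle, the CRC32 hash used for fields, and the choice of instance 0 for global are all assumptions for illustration.

```python
import itertools
import zlib
from typing import Callable, Dict, List

Router = Callable[[Dict[str, object]], List[int]]


def shuffle_grouping(num_tasks: int) -> Router:
    """Spread tuples evenly; modelled here as strict round-robin."""
    counter = itertools.count()
    return lambda tup: [next(counter) % num_tasks]


def fields_grouping(num_tasks: int, *fields: str) -> Router:
    """Tuples with equal values in the chosen fields reach the same instance."""
    def route(tup: Dict[str, object]) -> List[int]:
        key = "|".join(str(tup[f]) for f in fields)
        return [zlib.crc32(key.encode("utf-8")) % num_tasks]
    return route


def all_grouping(num_tasks: int) -> Router:
    """Every instance of the receiving bolt gets a copy of each tuple."""
    return lambda tup: list(range(num_tasks))


def global_grouping(num_tasks: int) -> Router:
    """Every tuple goes to one single instance (instance 0 in this sketch)."""
    return lambda tup: [0]
```

Fields grouping is the one to reach for when a bolt keeps per-key state (for example a running count per user), since it guarantees that all tuples for a key land on the same instance.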
  75. Introducing Azure Stream Analytics: Overview. A PaaS real-time complex event processing (CEP) service on Microsoft Azure.
     • Fully managed real-time processing: intake of millions of events per second; processing on continuous streams of data; reference data lookup; output to live dashboards and data stores
     • Mission-critical reliability: guaranteed event delivery; preserves event order on a per-device basis; guaranteed business continuity; auto-recovery from failures
     • No challenges with scale: elasticity for scale up or scale down; distributed, scale-out architecture; pay only for the resources you use
     • Rapid development & deployment: SQL-like language; built-in temporal semantics; up and running in a few clicks; scheduling and monitoring
  76. Introducing Azure Stream Analytics: Overview. [Diagram: Data Source → Ingest/Queue → Process → Deliver → Consume.]
     • Event Inputs: Event Hub, Azure Blob, DocumentDB (coming soon)
     • Reference Data: Azure Blob, HBase (coming soon)
     • Transform: temporal joins, filter, aggregates, projections, windows, REST APIs (coming soon)
     • Enrich: Azure ML
     • Outputs: SQL Azure, Azure Blobs, Event Hub, Service Bus Queue, Service Bus Topics, Table storage, DocumentDB, PowerBI
     • Consume: Azure Storage, Power BI dashboard
     • Characteristics: distributed, low latency, high throughput, scalable, reliable, low cost
  77. Getting Started with Azure Stream Analytics
  78. Introducing Azure Stream Analytics: Getting Started. Everything is done on the Azure Portal:
     • Create a Stream Analytics job
     • Add inputs
     • Add outputs
     • Define the processing query
     • Scale and configure
  79. Introducing Azure Stream Analytics: Getting Started, Create a Stream Analytics job. [Slides 79 and 80 are screenshots.]
  81. Introducing Azure Stream Analytics: Getting Started, Scale. [Screenshot.]
  82. Introducing Azure Stream Analytics: Getting Started, Configure. [Screenshot.]
  83. Introducing Azure Stream Analytics: Getting Started, Add inputs to your job
     • Data streams: currently supported inputs are Azure Event Hub, Azure IoT Hub and Azure Blob Storage. Advanced options let you configure how the job reads data from the input.
     • Reference data: usually static or changing very slowly over time (e.g. product catalog, customer info). Currently Azure Blob Storage only. Cached for performance.
  84. Introducing Azure Stream Analytics: Getting Started, Define input schema
     • The serialization format and the encoding of the input data sources must be specified.
     • Currently three formats are supported: CSV, JSON and Avro, with an optional schema for the CSV and Avro formats.
     • After the input is created, its configuration can be changed, the connection can be tested, and sample (synthetic) data can be generated, based on the supplied structure.
  85. Introducing Azure Stream Analytics: Getting Started, Add an output to your job. Data stores currently supported as outputs:
     • Azure Blob storage: creates log files with temporal query results, for batch processing and archiving.
     • Azure Table storage: NoSQL storage that is more flexible than a SQL database, and durable (in contrast to an event hub).
     • Azure SQL Database: stores results in an Azure SQL Database table. Ideal as a source for traditional reporting and analysis.
     • Event hub: sends an event to an event hub. Ideal for generating actionable events such as alerts or notifications.
     • Service Bus Queue/Topics: sends an event to a queue or topic. Ideal for process integration.
     • PowerBI: live dashboards and real-time reporting.
     • DocumentDB: a NoSQL data store that works with JSON documents.
  86. Introducing Azure Stream Analytics: Getting Started, Query. [Screenshot.]
  87. Stream Analytics Query Language
  88. Stream Analytics Query Language: SA Query Language
     • Data Types: bigint, float, nvarchar(max), datetime
     • DML: SELECT, FROM, WHERE, GROUP BY, HAVING, CASE WHEN THEN ELSE, INNER/LEFT OUTER JOIN, UNION, CROSS/OUTER APPLY, CAST, INTO, ORDER BY ASC, DESC
     • Scaling Extensions: WITH, PARTITION BY, OVER
     • Windowing Extensions: TumblingWindow, HoppingWindow, SlidingWindow, Duration
     • Aggregate Functions: Sum, Count, Avg, Min, Max, StDev, StDevP, Var, VarP
     • Date and Time Functions: DateName, DatePart, Day, Month, Year, DateTimeFromParts, DateDiff, DateAdd
     • String Functions: Len, Concat, CharIndex, Substring, PatIndex
     • Temporal Functions: Lag, IsFirst, CollectTop
  89. Stream Analytics Query Language: Important clauses
     • INTO clause: pipelines the data from input to output; a query can have multiple outputs.
         SELECT <columns, derived columns> INTO <output A> FROM <input x> WHERE <condition 1>
         SELECT <columns, derived columns> INTO <output B> FROM <input x> WHERE <condition 2>
     • JOIN clause: combines multiple event streams; combines event streams with reference data.
         SELECT <columns, derived columns> INTO <output A>
         FROM <stream1>
         JOIN <stream2> ON DATEDIFF(Minutes, stream1.time, stream2.time) BETWEEN 0 AND 1 AND <stream1.Key> = <stream2.Key>
         JOIN <ReferenceData> ON <stream1.Key> = <ReferenceData.Key>
     • CTEs: implement more complex logic and support multiple steps.
         WITH Step1 AS (
             SELECT Count(*) AS CountTweets, Topic FROM TwitterStream PARTITION BY PartitionId
             GROUP BY TumblingWindow(second, 3), Topic, PartitionId),
         Step2 AS (
             SELECT Avg(CountTweets) FROM Step1 GROUP BY TumblingWindow(minute, 3))
         SELECT * INTO Output1 FROM Step2
     • Time stamping: application time or system time.
         SELECT <columns, derived columns>, OrderDate FROM <input> TIMESTAMP BY EventTime -- app time
         SELECT <columns, derived columns>, System.Time AS EventTime FROM <input> TIMESTAMP BY EventTime -- sys time from event hub or azure blob storage
  90. Stream Analytics Query Language: Windowing Functions
     • In data streams, a common requirement is to perform an aggregation (max, min, sum, count, etc.) over the messages that arrive within a specified period of time (a window), in order to detect events.
     • Each GROUP BY requires a windowing function.
     • Each window operation outputs a single event at the end of the window.
     • All windows have a fixed length.
     • Tumbling window: aggregate per time interval.
     • Hopping window: schedule overlapping windows.
     • Sliding window: windows constantly re-evaluated.
  94. Stream Analytics Query Language: Windowing Functions, Tumbling Window. [Diagram, built up over slides 91 to 94: events on a timeline (secs) grouped into consecutive 20-second tumbling windows.]
     • Tumbling windows repeat and do not overlap; an event can belong to only one tumbling window.
     • Syntax: TumblingWindow(<time unit>, <window size>)
     • Query: count the total number of vehicles entering each toll booth in every 20-second interval.
         SELECT TollId, COUNT(*) FROM EntryStream TIMESTAMP BY EntryTime GROUP BY TollId, TumblingWindow(second, 20)
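To make the tumbling-window semantics concrete, here is a minimal Python sketch of the toll-booth count computed offline over a list of events. It is not how Stream Analytics is implemented; `tumbling_window_counts` is a hypothetical helper, and it assumes half-open windows `[start, end)` aligned to time zero, which may differ from the service's exact boundary rules.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(events: Iterable[Tuple[str, float]],
                           window_seconds: float) -> Dict[Tuple[float, str], int]:
    """Count events per (window_end, toll_id).

    events: (toll_id, entry_time_in_seconds) pairs.
    Each event falls into exactly one window [start, end) (assumed boundaries).
    """
    counts: Counter = Counter()
    for toll_id, entry_time in events:
        window_index = int(entry_time // window_seconds)
        window_end = (window_index + 1) * window_seconds
        counts[(window_end, toll_id)] += 1
    return dict(counts)
```

Because the windows never overlap, the counts across all windows for a toll booth add up to the total number of vehicles seen at that booth.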
  100. Stream Analytics Query Language: Windowing Functions, Hopping Window. [Diagram, built up over slides 95 to 100: events on a timeline grouped into overlapping 20-second hopping windows with a 10-second "hop".]
     • Hopping windows repeat, can overlap, and hop forward in time by a fixed period; events can belong to more than one hopping window.
     • Syntax: HoppingWindow(<time unit>, <window size>, <hop size>)
     • Query: count the number of vehicles entering each toll booth over every 20-second interval; update the results every 10 seconds.
         SELECT COUNT(*), TollId FROM EntryStream TIMESTAMP BY EntryTime GROUP BY TollId, HoppingWindow(second, 20, 10)
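The overlap between hopping windows can be seen in a short Python sketch that assigns each event to every window containing it. As with any such sketch, the helper name `hopping_window_counts` and the half-open `[end - size, end)` boundaries aligned to multiples of the hop are assumptions for illustration, not the service's documented behaviour.

```python
from typing import Dict, Iterable


def hopping_window_counts(entry_times: Iterable[float],
                          window_seconds: float,
                          hop_seconds: float) -> Dict[float, int]:
    """Return {window_end: count}; window ends are multiples of hop_seconds
    and each window covers [end - window_seconds, end) (assumed boundaries)."""
    counts: Dict[float, int] = {}
    for t in entry_times:
        # First window end that lies strictly after the event time.
        end = (int(t // hop_seconds) + 1) * hop_seconds
        # Walk forward while the window ending at `end` still contains t.
        while end - window_seconds <= t:
            counts[end] = counts.get(end, 0) + 1
            end += hop_seconds
    return counts
```

With a 20-second window and a 10-second hop, every event is counted in two windows, which is why the per-window counts sum to twice the number of events.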
  101. Stream Analytics Query Language: Windowing Functions, Sliding Window. A 20-second sliding window. Sliding window: continuously moves forward by an ε (epsilon); produces output only when an event occurs; every window has at least one event; events can belong to more than one sliding window. [Figure: event timeline with a 20-second window sliding over it]
  105. Stream Analytics Query Language: Windowing Functions, Sliding Window. A 20-second sliding window. Sliding window: continuously moves forward by an ε (epsilon); produces output only when an event occurs; every window has at least one event; events can belong to more than one sliding window. Syntax: SlidingWindow (<time unit>, <window size>). Query: find all the toll booths that have served more than 10 vehicles in the last 20 seconds: SELECT TollId, COUNT(*) FROM EntryStream ES GROUP BY TollId, SlidingWindow (second, 20) HAVING COUNT(*) > 10
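The sliding-window query above can be approximated in plain Python to see when it would fire. This is a simplified sketch, not the Stream Analytics engine: the name `sliding_window_alerts` and the input format are hypothetical, and the window is evaluated only when an event arrives, which is enough for a "count greater than N" condition because only an arrival can push a count above the threshold.

```python
from collections import defaultdict, deque

def sliding_window_alerts(events, window_size, threshold):
    """Yield (timestamp, key, count) each time a key has more than
    `threshold` events in the trailing window (timestamp - window_size, timestamp].

    events: iterable of (timestamp_seconds, key) pairs, sorted by timestamp
    (hypothetical format chosen for this example).
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    for ts, key in events:
        window = recent[key]
        window.append(ts)
        # Drop timestamps that have fallen out of the trailing window.
        while window[0] <= ts - window_size:
            window.popleft()
        if len(window) > threshold:
            yield ts, key, len(window)
```

For example, `list(sliding_window_alerts(entries, 20, 10))` would list every moment at which a toll booth had served more than 10 vehicles in the preceding 20 seconds, given `entries` as time-ordered `(seconds, toll_id)` pairs.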
  106. Stream Analytics Query Language: Useful SA Query Patterns
    - Aggregation and filter: compute a value (sum, max, min, avg) over a time window. E.g. what are the maximum temperature and the average pressure read by the sensor in a 60-second window?
    - Counting unique values: count the number of unique field values that appear in the stream within a time window. E.g. how many unique makes of car passed through the toll booth in a 2-second window?
    - Determine if a value has changed: look at a previous value to determine whether it differs from the current value. E.g. is the previous car on the toll road the same make as the current car?
    - Find first/last event in a window: e.g. find the first/last car in every 10-minute interval.
    - Detect the absence of events: check that a stream has no value that matches a certain criterion. E.g. have 2 consecutive cars of the same make entered the toll road within 90 seconds?
    - Detect duration between events: find the duration of a given event. E.g. given a web clickstream, determine the time spent on a feature.
    - Detect duration of a condition: find out how long a condition lasted. E.g. suppose a bug resulted in all cars having an incorrect weight (above 20,000 pounds); compute the duration of the bug.
    - Fill missing values: for a stream of events with missing values, produce a stream of events at regular intervals. E.g. generate an event every 5 seconds that reports the most recently seen data point.
    Source: https://azure.microsoft.com/en-gb/documentation/articles/stream-analytics-stream-analytics-query-patterns
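Two of the patterns listed above, "determine if a value has changed" and "counting unique values", are easy to try out on a small list in Python before writing the real query. The sketch below only mimics the logic; the function names, the `(seconds, make)` input format, and the choice of non-overlapping windows `[start, start + window_size)` are assumptions for illustration, not Stream Analytics behaviour.

```python
def makes_that_changed(events):
    """Return the events whose make differs from the previous event's make.

    events: iterable of (timestamp_seconds, make) pairs in arrival order.
    """
    changed = []
    previous_make = None
    for ts, make in events:
        if previous_make is not None and make != previous_make:
            changed.append((ts, make))
        previous_make = make
    return changed

def unique_makes_per_window(events, window_size):
    """Count distinct makes in each non-overlapping window.

    Assumption: a window starting at `start` covers [start, start + window_size).
    Returns {window_start: number_of_distinct_makes}.
    """
    seen = {}
    for ts, make in events:
        window_index = ts // window_size
        seen.setdefault(window_index, set()).add(make)
    return {index * window_size: len(makes) for index, makes in sorted(seen.items())}
```

Keeping each pattern as a small pure function makes it simple to check the expected answer on hand-written events and then compare it with what the deployed query returns.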
  107. Apache Storm vs Azure Stream Analytics: The face-off… (Source: Microsoft Azure Stream Analytics Documentation)
    Feature | Azure Stream Analytics | Apache Storm on HDInsight
    Geared for | Event detection | Stream processing
    Open source | No; it is a Microsoft Azure service | Yes; it is an Apache project
    Service type | PaaS: deploy, execute and monitor jobs | SaaS +: provision an HDInsight Storm cluster
    Pricing | You pay for the data/jobs | You pay for the cluster
    Scalability | Number of Streaming Units | Number of nodes in the cluster
    Processing | SQL-like query + temporal operations + Azure Machine Learning (API calls to published models) | Java or C# (custom extensibility)
    Dev. experience | Azure Portal: easy, limited | Visual Studio: more involved, flexible
    Limitations | No UDF, no Web API calls (coming soon) | You need to implement aggregations and temporal operations yourself
    Input data source | Azure Event Hubs and Azure Blobs | Connectors (Event Hub, Service Bus, Kafka, custom)
    Input data format | CSV, JSON | Anything; custom code is needed to parse
    Output data sink | Azure Event Hubs, Azure Blob Storage, Azure Tables, Azure SQL DB, DocumentDB, and PowerBI | PowerBI, Azure Event Hubs, Azure Blob Store, Azure DocumentDB, SQL DB, HBase, custom
    Reference data | Azure Blobs, with a maximum of 100 MB of in-memory lookup cache | No limits on data size; connectors available for HBase, DocumentDB, SQL, custom
  108. How to Get Started with Stream Processing?
    - Read the slides!
    - MVA: Big Data Analytics with HDInsight: Hadoop on Azure https://mva.microsoft.com/en-US/training-courses/big-data-analytics-with-hdinsight-hadoop-on-azure-10551
    - MVA: Implementing Big Data Analysis https://mva.microsoft.com/en-US/training-courses/implementing-big-data-analysis-8311?l=44REr2Yy_5404984382
    - Azure Documentation: Storm on HDInsight https://azure.microsoft.com/en-gb/documentation/services/hdinsight/
    - Azure Documentation: Event Hubs https://azure.microsoft.com/en-gb/documentation/articles/event-hubs-overview/
    - Azure Documentation: Stream Analytics https://azure.microsoft.com/en-gb/documentation/services/stream-analytics/
    - Apache Storm https://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html
    - O’Reilly Books: Getting Started with Storm
  109. DEMO
  110. [Demo architecture diagram. Labels: Images Stream; Temperature/Pressure; Consume Events; Image, Emotion; Consume Emotion Events; Consume Sensor Data; Output to real-time dashboard]
  111. My Background: Applying Computational Intelligence in Data Mining
    - Honorary Research Fellow, School of Computing, University of Kent.
    - Ph.D. Computer Science, University of Kent, Canterbury, UK.
    - M.Sc. Computer Science, The American University in Cairo, Egypt.
    - 25+ published journal and conference papers, focusing on: classification rules induction, decision trees construction, Bayesian classification modelling, data reduction, instance-based learning, evolving neural networks, and data clustering.
    - Journals: Swarm Intelligence, Swarm & Evolutionary Computation, Applied Soft Computing, and Memetic Computing.
    - Conferences: ANTS, IEEE CEC, IEEE SIS, EvoBio, ECTA, IEEE WCCI and INNS-BigData.
    ResearchGate.org
  112. Thank you!
