AZURE STREAM
ANALYTICS
DAVIDE MAURI
@mauridb
dmauri@solidq.com
• Microsoft SQL Server MVP
• Has worked with SQL Server since 6.5 and on BI since 2003
• Specialized in Data Solution Architecture, Database Design,
Performance Tuning, High-Performance Data Warehousing, BI, Big
Data
• President of UGISS (Italian SQL Server UG)
• Regular Speaker @ SQL Server events
• Consulting & Training, Mentor @ SolidQ
• E-mail: dmauri@solidq.com
• Twitter: @mauridb
• Blog: http://sqlblog.com/blogs/davide_mauri
Davide Mauri
• COMPLEX EVENT PROCESSING
• LAMBDA ARCHITECTURE
• AZURE STREAM ANALYTICS
• DATA INGESTION
• AZURE STREAM ANALYTICS QUERY LANGUAGE
• ADVANCED FEATURES
• ADDITIONAL RESOURCES
COMPLEX EVENT PROCESSING
•Event processing is a method of tracking and analyzing
(processing) streams of information (data) about things that
happen (events)
•Complex event processing, or CEP, is event processing that
combines data from multiple sources to infer events or
patterns that suggest more complicated circumstances.
• Started to appear in the 1990s
• Goal: identify meaningful events (such as opportunities or
threats) and respond to them as quickly as possible
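
A minimal sketch of the idea above, assuming two hypothetical feeds ("auth" and "payments") and an invented rule: a withdrawal that follows a failed login for the same user within 60 seconds is flagged. The event names, fields and threshold are illustrative assumptions, not part of any product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    source: str   # which feed produced the event (hypothetical names)
    user: str
    kind: str
    ts: float     # event time, in seconds

def detect_suspicious(events, max_gap=60.0):
    """Return (user, ts) for withdrawals that follow a failed login
    by the same user within max_gap seconds."""
    last_failed_login = {}  # user -> time of the most recent failed login
    alerts = []
    # Process the merged feeds in event-time order.
    for ev in sorted(events, key=lambda e: e.ts):
        if ev.kind == "login_failed":
            last_failed_login[ev.user] = ev.ts
        elif ev.kind == "withdrawal":
            failed_at = last_failed_login.get(ev.user)
            if failed_at is not None and ev.ts - failed_at <= max_gap:
                alerts.append((ev.user, ev.ts))
    return alerts

events = [
    Event("auth", "alice", "login_failed", 10.0),
    Event("payments", "alice", "withdrawal", 40.0),
    Event("payments", "bob", "withdrawal", 45.0),
    Event("auth", "carol", "login_failed", 50.0),
    Event("payments", "carol", "withdrawal", 200.0),
]
print(detect_suspicious(events))
```

Neither feed alone reveals the pattern; the "complex" event only exists once the two sources are combined over time.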
EVENT PROCESSING USE CASES
• Network monitoring
• Intelligence and surveillance
• Risk management
• E-commerce
• Fraud detection
• Smart order routing
• Transaction cost analysis
• Pricing and analytics
• Market data management
• Algorithmic trading
• Data warehouse augmentation
REAL TIME USE CASES
http://www.digital4.biz
LAMBDA ARCHITECTURE
Generic, scalable and fault-tolerant data processing architecture […]
in which low-latency reads and updates are required.
http://lambda-architecture.net/
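
A rough sketch of how the Lambda Architecture is usually described: a batch layer recomputes views from the full dataset, a speed layer covers recent events, and a serving layer merges the two at query time. The page-view counting example and all function names are assumptions made for illustration.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute counts from the full, immutable dataset."""
    return Counter(e["page"] for e in master_dataset)

def speed_view(recent_events):
    """Speed layer: counts only for events the batch has not absorbed yet."""
    return Counter(e["page"] for e in recent_events)

def query(page, batch, speed):
    """Serving layer: merge both views to answer with low latency."""
    return batch[page] + speed[page]

master = [{"page": "/home"}, {"page": "/home"}, {"page": "/cart"}]
recent = [{"page": "/home"}, {"page": "/checkout"}]

b, s = batch_view(master), speed_view(recent)
print(query("/home", b, s))
```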
HADOOP BUT NOT ONLY THAT!
•The Apache Hadoop ecosystem is the typical solution nowadays
• “Mature” Option
• Flume (optional collector and streaming data movement system)
• Kafka (distributed messaging system)
• Storm (distributed real-time computation system)
• “Innovative” Option
• Spark + Spark Streaming
•Very powerful, but very complex
WHY AZURE?
•Due to the high scalability and computing power that a
streaming solution may require, the cloud is a perfect
environment for it
•Very cheap and very simple to start a project
•Very well integrated with all other Azure offerings
• From Monitoring to Power BI
STREAM ANALYTICS
•Real-time (to some extent) complex event processing engine
•Enables real-time event processing in a very simple and
cheap way
• SQL-Like language
• Temporal Semantic Support
• (but different from SQL Server 2016)
•Azure only at present
STREAM ANALYTICS
•Platform-as-a-Service
• Can handle millions of events per second
• Based on the REEF project (now in the Apache Incubator)
•Main objects: Jobs, Queries, Functions, Inputs and Outputs
• Totally manageable from a REST interface
•“Streaming Units” is the base concept to manage
performance, scalability and costs
• Roughly 1 Streaming Unit = 1 MB/sec of throughput
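
A back-of-the-envelope sketch of the rule of thumb above. It assumes 1 MB = 10**6 bytes and takes the "1 Streaming Unit per MB/sec" figure at face value; real capacity depends on query complexity, so this is only a starting estimate. The function name and parameters are hypothetical.

```python
import math

MB = 1_000_000  # assumption: 1 MB taken as 10**6 bytes for a rough estimate

def estimate_streaming_units(events_per_sec, avg_event_bytes, mb_per_su=1.0):
    """Estimate Streaming Units from expected input throughput."""
    throughput_mb = events_per_sec * avg_event_bytes / MB
    # At least one unit is always needed, even for tiny workloads.
    return max(1, math.ceil(throughput_mb / mb_per_su))

print(estimate_streaming_units(20_000, 250))   # 20,000 events/sec of 250 bytes each
print(estimate_streaming_units(3_000, 500))    # 3,000 events/sec of 500 bytes each
```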
DATA INGESTION
•Inputs for Stream Analytics
• Streaming Sources (“Data in motion”)
• JSON, CSV or AVRO
• Reference Data (“Data at rest”)
• JSON or CSV
• Blob Storage (max 50 MB)
•Streaming Sources
• Event Hubs
• IoT Hub
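
A small sketch of the two kinds of input listed above: streaming events arriving as JSON or CSV, enriched with reference data that changes rarely. It assumes one JSON object per line and a CSV header row; the device fields and the room lookup are invented for illustration.

```python
import csv
import io
import json

def parse_json_events(text):
    """Parse one JSON object per line (an assumption of this sketch)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def parse_csv_events(text):
    """Parse CSV with a header row; every value arrives as a string."""
    return list(csv.DictReader(io.StringIO(text)))

def enrich(events, reference, key):
    """Attach reference fields ("data at rest") to each streaming event."""
    return [{**ev, **reference.get(ev[key], {})} for ev in events]

json_feed = '{"deviceId": "d1", "temp": 21.5}\n{"deviceId": "d2", "temp": 30.1}\n'
csv_feed = "deviceId,temp\nd3,18.0\n"
reference = {"d1": {"room": "lab"}, "d3": {"room": "office"}}

events = parse_json_events(json_feed) + parse_csv_events(csv_feed)
enriched = enrich(events, reference, "deviceId")
for row in enriched:
    print(row)
```

Note that the CSV value stays a string ("18.0") while the JSON value is a number, which is one reason typed formats are easier to query downstream.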
DEMO
STREAM ANALYTICS QUERY ENGINE
•Takes data from one or more inputs
•Sends resulting data to one or more outputs
•Supports the most common data types:
• bigint, float, Unicode strings, datetime
• key-value pairs
• arrays
STREAM ANALYTICS QUERY LANGUAGE
•Stream Analytics Query Language Reference
• https://msdn.microsoft.com/library/azure/dn834998.aspx
•Subset of T-SQL
•With specific temporal extension
• The time value to use for each event can be set with the TIMESTAMP BY clause
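
A sketch of what choosing the event time means. It assumes that, without an explicit choice, the engine falls back to the arrival time, and that the payload carries its own ISO 8601 timestamp in a field named readAt (a hypothetical name). This is plain Python illustrating the concept, not the query language itself.

```python
from datetime import datetime, timezone

def assign_timestamp(event, arrival_time, timestamp_by=None):
    """Return the time the engine should use for this event.

    With no field selected, fall back to the arrival time; otherwise
    read the timestamp from the chosen payload field.
    """
    if timestamp_by is None:
        return arrival_time
    return datetime.fromisoformat(event[timestamp_by])

event = {"sensor": "s1", "value": 7, "readAt": "2016-03-01T10:00:05+00:00"}
arrival = datetime(2016, 3, 1, 10, 0, 9, tzinfo=timezone.utc)

by_arrival = assign_timestamp(event, arrival)
by_payload = assign_timestamp(event, arrival, timestamp_by="readAt")
print((by_arrival - by_payload).total_seconds())  # delay between reading and arrival
```

The difference matters for windowing: an event read at 10:00:05 but delivered at 10:00:09 can land in a different window depending on which time is used.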
STREAM ANALYTICS QUERY LANGUAGE
DML Statements
• SELECT
• FROM
• WHERE
• GROUP BY
• HAVING
• CASE
• JOIN
• UNION
Windowing Extensions
• Tumbling Window
• Hopping Window
• Sliding Window
• Duration
Aggregate Functions
• SUM
• COUNT
• AVG
• MIN
• MAX
Scaling Functions
• WITH
• PARTITION BY
Date and Time Functions
• DATENAME
• DATEPART
• DAY
• MONTH
• YEAR
• DATETIMEFROMPARTS
• DATEDIFF
• DATEADD
String Functions
• LEN
• CONCAT
• CHARINDEX
• SUBSTRING
• PATINDEX
Statistical Functions
• VAR
• VARP
• STDEV
• STDEVP
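
The windowing extensions listed above can be illustrated with a short simulation. This sketch assumes integer timestamps in seconds, windows whose ends fall on multiples of the hop, and windows that cover the interval (end - size, end], that is, start exclusive and end inclusive. A tumbling window is treated as a hopping window whose hop equals its size. The function names are hypothetical.

```python
from collections import defaultdict

def hopping_window_counts(timestamps, size, hop):
    """Count events per window, keyed by window end time.

    Each window covers (end - size, end]; only non-empty windows
    are reported.
    """
    counts = defaultdict(int)
    for ts in timestamps:
        # First window end on the hop grid that is >= ts (integer ceiling).
        end = -(-ts // hop) * hop
        # The event belongs to every window whose start is before ts.
        while end - size < ts:
            counts[end] += 1
            end += hop
    return dict(sorted(counts.items()))

def tumbling_window_counts(timestamps, size):
    """Tumbling windows do not overlap: hop equals size."""
    return hopping_window_counts(timestamps, size, hop=size)

timestamps = [1, 4, 5, 6, 12]
print(tumbling_window_counts(timestamps, size=5))
print(hopping_window_counts(timestamps, size=10, hop=5))
```

With tumbling windows each event is counted exactly once; with hopping windows of size 10 and hop 5 each event is counted twice, once in each overlapping window.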
ADVANCED FEATURES
•Partitioning Support
• Especially useful for high scalability
•CTE-like constructs that also help scaling out
•Temporal aggregations
• Tumbling, Hopping and Sliding Windows
•Join between input streams
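
Joining two streams only makes sense within a time bound, since both inputs are unbounded. A minimal sketch, assuming a hypothetical toll-booth scenario where entry and exit events are matched on a plate field when the exit follows the entry by at most a given number of seconds; all names and values are illustrative.

```python
def temporal_join(left, right, key, max_seconds):
    """Pair events that share `key`, where the right event occurs
    between 0 and max_seconds after the left event."""
    by_key = {}
    for r in right:
        by_key.setdefault(r[key], []).append(r)
    pairs = []
    for l in left:
        for r in by_key.get(l[key], []):
            if 0 <= r["ts"] - l["ts"] <= max_seconds:
                pairs.append((l, r))
    return pairs

entries = [{"plate": "AB123", "ts": 0}, {"plate": "CD456", "ts": 10}]
exits = [
    {"plate": "AB123", "ts": 50},
    {"plate": "CD456", "ts": 500},
    {"plate": "ZZ999", "ts": 20},
]

for entry, exit_ in temporal_join(entries, exits, "plate", max_seconds=120):
    print(entry["plate"], exit_["ts"] - entry["ts"])
```

The time bound is also what keeps memory use finite: events older than the bound can never match and may be discarded.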
DEMO
STREAM ANALYTICS AND MACHINE LEARNING
•Apply AzureML model to streaming data
•Sample use-cases
• Fraud Detection
• Product Recommendation
• Customer Sentiment Analysis
•Right now in preview and available only through the “old”
portal
• https://manage.windowsazure.com/
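
A sketch of the general pattern of scoring streaming data with a model. The keyword-based function below is a hypothetical stand-in for a deployed model and does not call any Azure service; field names and word lists are assumptions.

```python
def toy_sentiment_model(text):
    """Hypothetical stand-in for a deployed model: simple keyword lookup."""
    words = set(text.lower().split())
    if words & {"great", "love", "thanks"}:
        return "positive"
    if words & {"broken", "angry", "refund"}:
        return "negative"
    return "neutral"

def score_stream(events, model):
    """Yield each event with the model's prediction attached."""
    for ev in events:
        yield {**ev, "sentiment": model(ev["text"])}

tweets = [
    {"user": "a", "text": "Love the new release"},
    {"user": "b", "text": "My order arrived broken"},
    {"user": "c", "text": "When do you open"},
]

scored = list(score_stream(tweets, toy_sentiment_model))
for row in scored:
    print(row["user"], row["sentiment"])
```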
DEMO
STREAM ANALYTICS ALTERNATIVE (ON AZURE)
•Apache Storm
•IaaS and not PaaS
•Much more complex to manage and develop…but much
more powerful
• https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-comparison-storm/
STREAM ANALYTICS ON-PREMISES?
•Apache Hadoop Ecosystem
• Flume / Kafka / Storm
•StreamInsight
• CEP solution that is part of the SQL Server platform
•EventStore
• Open-source, JavaScript-based CEP
•None of them has native temporal extensions
ADDITIONAL RESOURCES
•Online Documentation
•Stream Analytics Reference Architecture
•Lambda Architecture
•GitHub Repository
QUESTIONS & ANSWERS
TO DO LIST
Give your feedback: http://aka.ms/deveval

Follow www.azurecommunity.it
Rewatch the videos on Channel 9

Editor's Notes

  • #7 Ref: http://www.infoq.com/articles/stream-processing-hadoop
  • #13 https://azure.microsoft.com/en-us/documentation/articles/stream-analytics-scale-jobs/
  • #15 Simple Setup of Event Hubs, Source and Destination
  • #20 Full Demo: Stream + Reference Data Windows Functions Tumbling Hopping Sliding
  • #21 Customer Sentiment Analysis: now that companies are offering support also via Twitter this becomes more and more important