Microsoft StreamInsight, part of the recent SQL Server 2008 R2 release, is a new platform for building rich applications that can process high volumes of event stream data with near-zero latency.
Mark Simms of Microsoft's SQLCAT will demonstrate the core skill sets and technologies needed to deliver StreamInsight-enabled solutions, and discuss some of the core scenarios.
Mark will provide a detailed walkthrough of the three major components of StreamInsight: input and output adapters, the StreamInsight engine runtime, and the semantics of the continuous standing queries hosted in the StreamInsight engine.
This presentation includes hands-on demos, among them building out a real-time data processing solution that interacts with SQL Server and SharePoint.
You will learn:
• The new capabilities StreamInsight brings to data processing and analytics, unlocking the ability to extract real-time business intelligence from streaming data.
• How StreamInsight interacts with and complements other components of SQL Server and the rest of the Microsoft technology stack.
• How to ramp up on the skills and technology necessary to build out end-to-end solutions leveraging streaming data sources.
Microsoft StreamInsight
1. SQL Server 2008 R2
StreamInsight
Speaker: Mark Simms
Microsoft SQLCAT
Silicon Valley SQL Server User Group
May, 2010
Mark Ginnebaugh, User Group Leader,
mark@designmind.com
4. Load barrier is dictated by current choices of the solution, e.g., loading into databases, persisting into files. This is intrinsic because in current approaches no processing can be done till the data is loaded.
[Figure: Facts/sec. (100 to 100000) plotted against Time of interest (years, months, days, hrs, min, sec, Present). Labeled regions: Traditional DW Analytics; Active DW analytics; Custom-built solutions that carry huge development and customization costs. Annotations: ET time in ETL; Load time in ETL.]
5. Analytical results need to reflect important changes in business reality immediately and enable responses to them with minimal latency.

                   Database Applications              Event-driven Applications
Query Paradigm     Ad-hoc queries or requests         Continuous standing queries
Latency            Seconds, hours, days               Milliseconds or less
Data Rate          Hundreds of events/sec             Tens of thousands of events/sec or more
Query Semantics    Declarative relational analytics   Declarative relational and temporal analytics

[Diagram labels: request, response (database applications); input stream, Event, output stream (event-driven applications).]
6. [Figure: Latency (Months, Days, Hours, Minutes, Seconds, 100 ms, < 1 ms) plotted against Aggregate Data Rate (Events/sec.; 0, 10, 100, 1000, 10000, 100000, ~1 million). Labeled regions: Relational Database Applications; Data Warehousing Applications; Operational Analytics Applications, e.g., Logistics, etc.; Web Analytics Applications; Monitoring Applications; Manufacturing Applications; Financial trading Applications; StreamInsight Target Scenarios.]
7. Manufacturing:
• Sensor on plant floor
• React through device controllers
• Aggregated data
• 10,000 events/sec
Web Analytics:
• Click-stream data
• Online customer behavior
• Page layout
• 100,000 events/sec
Financial Services:
• Stock & news feeds
• Algorithmic trading
• Patterns over time
• Super-low latency
• 100,000 events/sec
Power Utilities:
• Energy consumption
• Outages
• Smart grids
• 100,000 events/sec
[Diagram: Asset Instrumentation for Data Acquisition and Subscriptions to Data Feeds each supply a Data Stream to the StreamInsight Engine. Engine capabilities: Threshold queries; Event correlation from multiple sources; Pattern queries. Supporting inputs: Asset Specs & Parameters; Stream Data Store & Archive; Lookup. Outputs: Visual trend-line and KPI monitoring; Batch & product management; Automated anomaly detection; Real-time customer segmentation; Algorithmic trading; Proactive condition-based maintenance.]
8. Industry trends
• Data acquisition costs are negligible
• Raw storage costs are small and continue to decrease
• Processing costs are non-negligible
• Data loading costs continue to be significant
StreamInsight advantage
• Process data incrementally, i.e., while it is in flight
• Avoid loading while still doing the processing you want
• Seamless querying for monitoring, managing and mining
[Diagram labels: Monitor KPIs; Manage business via KPI-triggered actions; Record raw data (history); Mine historical data; Devise new KPIs.]
21. Tell me just the color of each car that passes.
var result = from car in carStream
             select new
             {
                 car.Color
             };
22. Give me only trucks.
var result = from car in carStream
             where car.Type == "Truck"
             select car;
23. Tell me the number of cars passed every 10 seconds.
var result = from win in carStream.TumblingWindow(
                 TimeSpan.FromSeconds(10))
             select new
             {
                 count = win.Count()
             };
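The 10-second tumbling window above can be mimicked over an in-memory list to make its semantics concrete. This is a minimal sketch in plain LINQ to Objects, not the StreamInsight API: the class name `TumblingWindowSketch`, the method `CountPerWindow`, and the choice to align windows to `DateTime.MinValue` are all assumptions made for illustration, and unlike a standing query it runs once over data that has already arrived.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TumblingWindowSketch
{
    // Returns (window start, event count) pairs for fixed, non-overlapping windows.
    // Assumption: windows are aligned to DateTime.MinValue; empty windows yield no row.
    public static List<KeyValuePair<DateTime, int>> CountPerWindow(
        IEnumerable<DateTime> eventTimes, TimeSpan windowSize)
    {
        return eventTimes
            .GroupBy(t => t.Ticks / windowSize.Ticks)   // integer division gives the window index
            .OrderBy(g => g.Key)
            .Select(g => new KeyValuePair<DateTime, int>(
                new DateTime(g.Key * windowSize.Ticks), g.Count()))
            .ToList();
    }

    public static void Main()
    {
        var start = new DateTime(2010, 5, 1, 12, 0, 0);
        // Hypothetical arrival offsets, in seconds after 12:00:00.
        var arrivals = new[] { 1, 4, 9, 12, 25 }.Select(s => start.AddSeconds(s));

        foreach (var w in CountPerWindow(arrivals, TimeSpan.FromSeconds(10)))
            Console.WriteLine("{0:HH:mm:ss} -> {1} car(s)", w.Key, w.Value);
    }
}
```

With these sample arrivals, three cars fall in the first window, one in the second, and one in the third. Each event belongs to exactly one window, which is what distinguishes a tumbling window from overlapping window types.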
26. Count the number of cars for each make separately every 10 seconds.
var result = from car in carStream
             group car by car.make into eachGroup
             from win in eachGroup.TumblingWindow(
                 TimeSpan.FromSeconds(10))
             select new
             {
                 make = eachGroup.Key,
                 count = win.Count()
             };
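The per-make count above combines grouping with windowing: each make gets its own sequence of 10-second windows. A rough in-memory equivalent, again in plain LINQ to Objects rather than the StreamInsight API, groups on the pair (window start, make). The `CarEvent` type, its `Make` and `Time` properties, and the `GroupedWindowSketch` class are hypothetical names introduced for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical payload type for the sketch.
public class CarEvent
{
    public string Make { get; set; }
    public DateTime Time { get; set; }
}

public static class GroupedWindowSketch
{
    // Counts events per (window start, make).
    // Assumption: windows are aligned to DateTime.MinValue.
    public static Dictionary<Tuple<DateTime, string>, int> CountPerMakePerWindow(
        IEnumerable<CarEvent> cars, TimeSpan windowSize)
    {
        return cars
            .GroupBy(c => Tuple.Create(
                // Round the timestamp down to the start of its window.
                new DateTime(c.Time.Ticks / windowSize.Ticks * windowSize.Ticks),
                c.Make))
            .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        var start = new DateTime(2010, 5, 1, 12, 0, 0);
        var cars = new[]
        {
            new CarEvent { Make = "Ford", Time = start.AddSeconds(2) },
            new CarEvent { Make = "Ford", Time = start.AddSeconds(7) },
            new CarEvent { Make = "Audi", Time = start.AddSeconds(8) },
            new CarEvent { Make = "Ford", Time = start.AddSeconds(13) },
        };

        foreach (var entry in CountPerMakePerWindow(cars, TimeSpan.FromSeconds(10)))
            Console.WriteLine("{0:HH:mm:ss} {1}: {2}",
                entry.Key.Item1, entry.Key.Item2, entry.Value);
    }
}
```

The count is taken inside each group, so a window only reports the cars of one make. Windowing the whole stream instead would report the same total for every make.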
30. public void EnqueueEvent(SourceData d)
{
    var ev = CreateInsertEvent();
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    Enqueue(ref ev);
}
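31. public void EnqueueEvent(SourceData d)
{
    // Assumed check; the slide abbreviates this condition to "if AdapterState".
    if (AdapterState != AdapterState.Running)
        return;
    var ev = CreateInsertEvent();
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    Enqueue(ref ev);
}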
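32. public void EnqueueEvent(SourceData d)
{
    // Assumed check; the slide abbreviates this condition to "if AdapterState".
    if (AdapterState != AdapterState.Running)
        return;
    var ev = CreateInsertEvent();
    if (ev == null) return;   // no event could be allocated
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    Enqueue(ref ev);
}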
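33. public void EnqueueEvent(SourceData d)
{
    // Assumed check; the slide abbreviates this condition to "if AdapterState".
    if (AdapterState != AdapterState.Running)
        return;
    var ev = CreateInsertEvent();
    if (ev == null) return;   // no event could be allocated
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    if (Enqueue(ref ev) == EnqueueOperationResult.Full)
    {
        // The engine cannot take more events right now; signal and back off.
        Ready();
        return;
    }
}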
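The last build of this slide stops pushing when the queue reports that it is full. The same pause-and-resume idea can be sketched without StreamInsight at all. Everything below is hypothetical and exists only for illustration (`EnqueueResult`, `BoundedBuffer<T>`, `Producer`, `Offer`, `Resume`, `Drain`); it models back-pressure between a producer and a bounded buffer and is not the adapter API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type mirroring the "full" outcome checked on the slide.
public enum EnqueueResult { Success, Full }

public class BoundedBuffer<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly int capacity;

    public BoundedBuffer(int capacity) { this.capacity = capacity; }

    // Rejects the item instead of blocking when the buffer is at capacity.
    public EnqueueResult Enqueue(T item)
    {
        if (items.Count >= capacity) return EnqueueResult.Full;
        items.Enqueue(item);
        return EnqueueResult.Success;
    }

    // Stands in for the consumer: empties the buffer, returns how many items it took.
    public int Drain()
    {
        int taken = items.Count;
        items.Clear();
        return taken;
    }
}

public class Producer
{
    private readonly BoundedBuffer<int> buffer;
    private readonly Queue<int> pending = new Queue<int>();

    public Producer(BoundedBuffer<int> buffer) { this.buffer = buffer; }

    public bool Paused { get; private set; }
    public int PendingCount { get { return pending.Count; } }

    // Called by the data source; holds items locally while paused.
    public void Offer(int value)
    {
        pending.Enqueue(value);
        if (!Paused) Pump();
    }

    // Called once the consumer has made room again.
    public void Resume()
    {
        Paused = false;
        Pump();
    }

    private void Pump()
    {
        while (pending.Count > 0)
        {
            if (buffer.Enqueue(pending.Peek()) == EnqueueResult.Full)
            {
                Paused = true;   // stop pushing until Resume() is called
                return;
            }
            pending.Dequeue();   // remove the item only after it was accepted
        }
    }
}

public static class BackPressureDemo
{
    public static void Main()
    {
        var buffer = new BoundedBuffer<int>(2);
        var producer = new Producer(buffer);
        producer.Offer(1);
        producer.Offer(2);
        producer.Offer(3);   // buffer is full: the producer pauses and keeps 3 pending
        Console.WriteLine("paused={0}, pending={1}", producer.Paused, producer.PendingCount);
        buffer.Drain();
        producer.Resume();   // the held item is pushed now
        Console.WriteLine("paused={0}, pending={1}", producer.Paused, producer.PendingCount);
    }
}
```

The design choice worth noting is that a rejected item stays with the producer until it has been accepted, so nothing is dropped while the consumer is busy.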