SQL Server 2008 R2 StreamInsight

In this presentation we review the basic architecture behind SQL Server StreamInsight.

Regards,

Ing. Eduardo Castro Martínez, PhD – Microsoft SQL Server MVP
http://mswindowscr.org
http://comunidadwindows.org
Costa Rica

http://ecastrom.blogspot.com
http://ecastrom.wordpress.com
http://ecastrom.spaces.live.com
http://universosql.blogspot.com
http://todosobresql.blogspot.com
http://todosobresqlserver.wordpress.com
http://mswindowscr.org/blogs/sql/default.aspx
http://citicr.org/blogs/noticias/default.aspx
http://sqlserverpedia.blogspot.com/

Published in: Technology

  • Data volumes are exploding with event data streaming from sources such as RFID, sensors and web logs across industries including manufacturing, financial services and utilities.  The size and frequency of the data make it challenging to store for data mining and analysis.  The ability to monitor, analyze and act on the data in motion provides significant opportunity to make more informed business decisions in near real-time
  • NBC Sunday Night Football: live streaming through Silverlight. Rich client experience, multiple camera angles. Needed: track, monitor, and analyze user behavior, based on Silverlight media analytics.
  • SQL Server 2008 R2 StreamInsight

    1. Petabytes for Peanuts! Making sense of “Ambient Data”: SQL Server StreamInsight
       Ing. Eduardo Castro, PhD
       Comunidad Windows
       ecastro@grupoasesor.net
       http://ecastrom.blogspot.com
    2. Key Takeaways…
       Massive shift in how we process data
       Incredible data volumes
       Remaking how we discover
       Changing the Scientific Method
       Reducing latency & impedance
       Extreme Scale Data Processing
       Stream Processing (Several Views)
       From “programs” to “queries”
       What’s up with this “anti-SQL” stuff anyhow?
    3. “Free” Storage Power
       1982: Storage Cost ~$2000, Transfer Time 1 day
       1997: Storage Cost ~$1.00, Transfer Time ½ hour
       2009: Storage Cost ~0.1¢, Transfer Time 8 sec.
    4. Ambient Data?
       Over 84 percent of Americans have cell phones, according to Steve Largent, president and CEO of CTIA. Two trillion minutes were used in 2007, an 18 percent increase over 2006 talk time.
       More than 48 billion text messages were sent in the month of December 2007, an average of 1.6 billion messages per day. That rate represented a 157 percent increase over December 2006.
       http://www.clickz.com/3628985
       Text message traffic in US: 160 GB / day, or 58 TB / year
       Voice traffic in US (GSM encoding): 200 PB / year
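
The traffic totals on slide 4 can be reproduced with a few lines of arithmetic. This is a rough sketch, not the author's calculation: the 100 bytes per message is an assumption implied by dividing 160 GB by 1.6 billion messages, and 13 kbit/s is the GSM full-rate codec bit rate, which the slide does not state explicitly.

```csharp
using System;

// Figures quoted on the slide.
const double messagesPerDay = 1.6e9;       // average text messages per day, December 2007
const double voiceMinutesPerYear = 2e12;   // minutes of talk time in 2007

// Assumptions (not stated on the slide).
const double bytesPerMessage = 100;        // implied by 160 GB / 1.6 billion messages
const double gsmBitsPerSecond = 13_000;    // GSM full-rate codec bit rate

double textBytesPerDay = messagesPerDay * bytesPerMessage;
double textBytesPerYear = textBytesPerDay * 365;
double voiceBytesPerYear = voiceMinutesPerYear * 60 * gsmBitsPerSecond / 8;

Console.WriteLine($"Text:  {textBytesPerDay / 1e9:F0} GB/day, {textBytesPerYear / 1e12:F1} TB/year");
Console.WriteLine($"Voice: {voiceBytesPerYear / 1e15:F0} PB/year");
```

Under these assumptions the text volume comes to 160 GB per day (about 58.4 TB per year) and the voice volume to about 195 PB per year, in line with the rounded figures on the slide.
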
    5. The Old World
       Data volumes constrained by human typing speed
       App & Data formed closed system
       Assume 200M people in US typing 8 hr / day @ 10K keystrokes / hour: 2 TB/hr, or ~6 PB / year
       Diagram labels: App; DB
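
The typing-speed ceiling on slide 5 follows from simple multiplication. The sketch below assumes one byte per keystroke and typing on all 365 days of the year; neither assumption is stated on the slide.

```csharp
using System;

// Figures from the slide.
const double typists = 200e6;            // people typing
const double keystrokesPerHour = 10e3;   // per person
const double hoursPerDay = 8;

// Assumptions (not stated on the slide).
const double bytesPerKeystroke = 1;
const double daysPerYear = 365;

double bytesPerHour = typists * keystrokesPerHour * bytesPerKeystroke;
double bytesPerYear = bytesPerHour * hoursPerDay * daysPerYear;

Console.WriteLine($"{bytesPerHour / 1e12:F0} TB/hour, {bytesPerYear / 1e15:F2} PB/year");
```

This gives 2 TB per hour and 5.84 PB per year, which the slide rounds to ~6 PB.
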
    6. The Old New World
       Available data exploded
       What data should we throw out?
       What if we have a new question?
       Diagram labels: Available Data; Questions to Answer; Design Schema; Design ETL; DW Nirvana!
    7. The New World of Abundant Data
       Save All Available Data
       Hypothesize → Theorize → Test
       New Question to Answer
       Algorithmic Processing
       Run “query” over data…
       Exploit Correlation…
       Correlation is Enough!
       Analyze reduced data
       The CMS front end of the Large Hadron Collider records 1 TB/sec!
       http://blogs.discovermagazine.com/cosmicvariance/2006/09/27/lhc-factoids/
       Interesting Read: The Petabyte Age: Because More Isn't Just More — More Is Different
       http://www.wired.com/science/discoveries/magazine/16-07/pb_intro
    8. Analyze → Model → Monitor
       1. Event stream both stored and processed
       2. Analysis produces event correlation models
       3. Models installed in event processing engine
       4. Produce real-time alerts and action
       Diagram labels: Event Stream; Event Processing Engine; Alerts & Action; Correlation Model; Analysis
    9. Extreme Scale Data Processing
       Panels: Traditional Data Warehouse; Extreme Scale Data Processing
       1. Majority of data filtered or discarded
       2. All data retained and reprocessed
       Diagram labels: Source (×7); Non-traditional Sources; ETL; DW (×2); Analysis / Reporting (×2); Analysis
    10. SQL Server 2008 R2 – StreamInsight Technology
        Data volumes are exploding with event data streaming from sources such as RFID, sensors and web logs.
        The size and frequency of the data make it challenging to store for data mining and analysis.
        The ability to monitor, analyze and act on data in motion supports business decisions in near real time.
    11. SQL Server StreamInsight’s
        SQL Server StreamInsight’s ability to derive insights from data streams and act in near real time provides significant business benefits. Some of the possible scenarios include:
          Algorithmic trading and fraud detection for financial services
          Industrial process control (chemicals, oil and gas) for manufacturing
          Electric grid monitoring and advanced metering for utilities
          Click stream web analytics
          Network and data center system monitoring
    12. StreamInsight Platform
        StreamInsight Application Development: .NET; C#; LINQ
        StreamInsight Application at Runtime: Input Adapters; StreamInsight Engine (Standing Queries, Query Logic); Output Adapters
        Event sources: Devices, Sensors; Web servers; Event stores & Databases; Stock ticker, news feeds
        Event targets: Pagers & Monitoring devices; KPI Dashboards, SharePoint UI; Trading stations; Event stores & Databases
    13. (no text on this slide)
    14. Key Concepts
        Events
          Represent the user payload along with temporal characteristics
        Streams
          Sequence of events
          Flows into (one or more) standing queries in StreamInsight engine
        Queries
          Operate on event streams
          Apply desired semantics on events
        Adapters
          Convert custom data from event sources to / from StreamInsight events
    15. What is CEP?
        Complex Event Processing (CEP) is the continuous and incremental processing of event streams from multiple sources based on declarative query and pattern specifications with near-zero latency.
        Diagram labels: Event; request; response; input stream; output stream
    16. Event Processing Scenarios
        Chart axes: Latency; Aggregate Data Rate (Events/sec)
        Chart labels: Relational Database Applications; CEP Target Scenarios; Operational Analytics Applications, Logistics, etc.; Data Warehousing Applications; Web Analytics Applications; Manufacturing Applications; Financial Trading Applications; Monitoring Applications
    17. Use Case: Customer Segmentation
        Analysis of Click Streams on MSN.com
        Web Server log streamed into StreamInsight
        Categorizing user behavior based on URL:
          Click targets
          Search keywords
        Segmentation of user IDs into markets
        Adapting navigational structure and ad placement in real time
        Patterns over time windows: user first clicks PageA, then PageB, then PageC within X seconds
        High performance requirements
          Millions of online users
          Low latency (seconds)
          Possible late events
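
The "PageA, then PageB, then PageC within X seconds" pattern from slide 17 can be illustrated with plain C# over an in-memory click log. This is only a sketch: it is a batch check using LINQ to Objects, not a StreamInsight standing query, and the click data, the `MatchesPattern` helper, and the 10-second limit are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// True if the clicks contain the pages in `pattern`, in order,
// with the whole sequence completed within `within` of its first click.
static bool MatchesPattern(
    IEnumerable<(string Page, DateTime At)> userClicks, string[] pattern, TimeSpan within)
{
    // Sorting by timestamp tolerates clicks that arrived out of order.
    var ordered = userClicks.OrderBy(c => c.At).ToList();
    for (int start = 0; start < ordered.Count; start++)
    {
        if (ordered[start].Page != pattern[0]) continue;
        int next = 1; // index of the next page we are waiting for
        for (int i = start + 1; i < ordered.Count && next < pattern.Length; i++)
        {
            if (ordered[i].At - ordered[start].At > within) break;
            if (ordered[i].Page == pattern[next]) next++;
        }
        if (next == pattern.Length) return true;
    }
    return false;
}

// Hypothetical click log: (user, page, timestamp).
var t0 = new DateTime(2010, 1, 1, 12, 0, 0);
var clicks = new List<(string User, string Page, DateTime At)>
{
    ("u1", "PageA", t0),
    ("u2", "PageA", t0.AddSeconds(1)),
    ("u1", "PageB", t0.AddSeconds(3)),
    ("u1", "PageC", t0.AddSeconds(7)),
    ("u2", "PageB", t0.AddSeconds(20)),
    ("u2", "PageC", t0.AddSeconds(45)),
};

var pattern = new[] { "PageA", "PageB", "PageC" };
var matchedUsers = clicks
    .GroupBy(c => c.User)
    .Where(g => MatchesPattern(g.Select(c => (c.Page, c.At)), pattern, TimeSpan.FromSeconds(10)))
    .Select(g => g.Key)
    .ToList();

Console.WriteLine(string.Join(", ", matchedUsers)); // prints: u1
```

Only `u1` completes the sequence within the limit; `u2` reaches PageB 19 seconds after PageA. A CEP engine would evaluate the same condition incrementally as events arrive instead of re-scanning stored clicks.
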
    18. (no text on this slide)
    19. Use Case: NBC Sunday Night Football
        Diagram labels: Telemetry Receiver; StreamInsight; Listener Adapter; SQL Adapter; PerfCounter Adapter
        Query steps: GeoTag and group by region; Count total events; Count session starts; Count active sessions
        Diagram callouts: 1; 2; 3; 4
    20. Use Case: Data Center
        Power Consumption
        Diagram labels: Power Meter Input Adapter; ETW Input Adapter; Query (×3); Complex Aggregations / Correlations; Central time series archive; Process Information; Visualize
        Diagram callouts: 1; 2; 3
    21. Challenges
        How do I …
          detect interesting patterns?
          reason about temporal semantics?
          correlate data?
          aggregate data?
          avoid writing custom imperative code?
          create a runtime environment for continuous and event-driven processing?
        As a developer, I need a platform!
    22. Query Expressiveness
        Selection of events (filter)
        Calculations on the payload (project)
        Correlation of streams (join)
        Stream partitioning (group and apply)
        Aggregation (sum, count, …) over event windows
        Ranking over event windows (topK)
    23. Query Expressiveness
        Projection
        Filter
        Correlation (Join)
        Aggregation over windows
        Group and Aggregate

        var result = from e in inputStream
                     group e by e.id into eachGroup
                     from win in eachGroup.TumblingWindow(TimeSpan.FromSeconds(10))
                     select new { eachGroup.Key, avg = win.Avg(e => e.W) };
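
To see what the group-and-aggregate query on slide 23 computes, the same idea can be emulated with ordinary LINQ to Objects over a small in-memory array. This is a sketch under stated assumptions, not the StreamInsight API: the `readings` data is hypothetical, `Id` and `W` mirror `e.id` and `e.W` from the slide, and each 10-second tumbling window is approximated by integer division of the timestamp's ticks. StreamInsight's own window alignment may differ.

```csharp
using System;
using System.Linq;

// Hypothetical readings; Id and W mirror e.id and e.W in the slide's query.
var t0 = new DateTime(2010, 1, 1, 0, 0, 0, DateTimeKind.Utc);
var readings = new[]
{
    (Id: 1, W: 10.0, At: t0.AddSeconds(1)),
    (Id: 1, W: 20.0, At: t0.AddSeconds(9)),
    (Id: 2, W: 5.0,  At: t0.AddSeconds(4)),
    (Id: 1, W: 40.0, At: t0.AddSeconds(12)),
};

var windowSize = TimeSpan.FromSeconds(10);

var result =
    from e in readings
    // Integer division assigns each event to one fixed, non-overlapping window.
    group e by (e.Id, Window: e.At.Ticks / windowSize.Ticks) into g
    orderby g.Key.Window, g.Key.Id
    select (g.Key.Id,
            WindowStart: new DateTime(g.Key.Window * windowSize.Ticks, DateTimeKind.Utc),
            Avg: g.Average(x => x.W));

foreach (var r in result)
    Console.WriteLine($"id={r.Id} window={r.WindowStart:HH:mm:ss} avg={r.Avg}");
```

With this data, id 1 averages 15 and id 2 averages 5 in the first window, and id 1 averages 40 in the second. The engine produces such results continuously as windows close, whereas this version needs all the data up front.
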
    24. Conclusion
        CEP Platform & API
        Event-triggered, fast Computation
        API for Adapters, Queries, Applications
        Declarative LINQ
        Flexible Adapter API
        Extensible
        Supportability
    25. Q&A
    26. Links
        http://comunidadwindows.org
        http://ecastrom.blogspot.com
        http://www.microsoft.com/sql