Cloud Connect 03-08-2011

I gave this presentation at Cloud Connect this past year. It covers some big data, low-latency use cases and highlights a distributed, streaming map/reduce architecture as well as the SAX algorithm. Enjoy, and comments are always welcome!
Transcript

  • 1. Cloud Event Processing: Analyze ∙ Sense ∙ Respond. Cloud Connect, March 8, 2011
  • 2. Welcome • High Velocity Big Data • What is Complex Event Processing? • Analyzing Time Series with SAX • What is Map/Reduce? • Correlating with Historical Data • Using the Cloud • Questions
  • 3. Data Growth* [bar chart of projected data growth across four categories] *It would appear that things will actually get worse, not better
  • 4. High Velocity Big Data • What is Big Data? – You’ve got Big Data issues when you can’t turn the data into information fast enough to act on it: • Earthquake • Brownout • Market Crash • Terrorist Event – You’ve got Big Data when you have to consider its actual physicality • What is High Velocity Big Data? – Big Data in flight… • You don’t get to store it before you analyze it
  • 5. What is Complex Event Processing? • Complex Event Processing (CEP) delivers high-speed processing of many events across all the layers of an organization, identifying only the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time. – From Wikipedia
  • 6. What? What is CEP? • Domain Specific Language – Makes it easier to deal with events • Continuous Query – select symbol, side, price from tradeStream • Time/Length Windows – select symbol, side, avg(price) from tradeStream.win:time(10 minutes) group by symbol, side • Pattern Matching – select a.* from pattern [every a=FIXNewOrderSingle -> (timer:interval(30 seconds) and not FIXNewOrderSingle(a.Side!=Side and a.OrderQty = OrderQty and a.Symbol = Symbol))] (a plain-Python sketch of the time-window idea follows below)
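The time-window query on this slide can be approximated in a few lines of ordinary code. Below is a minimal sketch, assuming plain Python rather than an actual CEP engine; the TimeWindowAvgPrice class, its methods, and the sample trades are made up for illustration.

```python
# Plain-Python sketch of what a win:time(10 minutes) continuous query does:
# keep only the trades inside a sliding time window and recompute avg(price)
# per (symbol, side) on every new event. Not a real CEP engine.
import time
from collections import deque, defaultdict

WINDOW_SECONDS = 10 * 60  # the 10-minute window from the slide


class TimeWindowAvgPrice:
    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = deque()  # (timestamp, symbol, side, price), oldest first

    def on_trade(self, symbol, side, price, ts=None):
        """Feed one trade into the window and return the current averages."""
        ts = time.time() if ts is None else ts
        self.events.append((ts, symbol, side, price))
        # Expire events that have fallen out of the window.
        while self.events and self.events[0][0] < ts - self.window:
            self.events.popleft()
        return self.averages()

    def averages(self):
        """avg(price) grouped by (symbol, side), like the group by clause above."""
        sums, counts = defaultdict(float), defaultdict(int)
        for _, symbol, side, price in self.events:
            sums[(symbol, side)] += price
            counts[(symbol, side)] += 1
        return {key: sums[key] / counts[key] for key in sums}


if __name__ == "__main__":
    win = TimeWindowAvgPrice()
    print(win.on_trade("IBM", "BUY", 161.50, ts=0))    # {('IBM', 'BUY'): 161.5}
    print(win.on_trade("IBM", "BUY", 162.50, ts=30))   # {('IBM', 'BUY'): 162.0}
    print(win.on_trade("IBM", "BUY", 163.00, ts=700))  # first two expired: {('IBM', 'BUY'): 163.0}
```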
  • 7. Wouldn’t It Be Cool • Select * from everything where itsInteresting = toMe in last 10 minutes; • Select * from everything where earthQuake > .8; • Select * from everything where terroristsWillStrike > .9;
  • 8. CEP – Current Benefits* • Really Fast! • Low Latency! • Provides a ‘ready made’ framework to build real-time pattern matching applications • Think at a higher level – Productivity *Your mileage may vary, widely
  • 9. CEP – Current Limitations • Memory Bound – If you have a lot of events and windows, you risk running out of memory on a single machine • Compute Bound – To ensure high throughput and low latency, most CEP engines are actually doing simplistic things • e.g. Filtering events • Black Box – What’s going on in there?
  • 10. Checkpoint • OK, so by using Complex Event Processing – You can analyze data in flight – But • You’re constrained by: – Available compute – Memory • Because there’s still too much data to process on one machine…
  • 11. The Problem With Time Series • Dimensionality – How can I recognize something? • Distance Measures – How do I find similar occurrences? • Time – By the time I process the data, the information has little value…
  • 12. Symbolic Aggregate Approximation: SAX Encoding • SAX reduces numerical data to a short string, or SAX word – Thousands of data points of numerical, continuous data become ‘ABCEDEFGH’ • The SAX approximation of the data fits in main memory, yet retains features of interest • Creating SAX words from historical and streaming data allows us to perform all kinds of magic… • SAX advantages: – Patterns identified and described using SAX actually look like the underlying data – Other algorithms sometimes don’t actually describe the underlying patterns, or take way too much work to be useful in real time [figure: a time series discretized into the SAX word ‘baabccbc’] (a minimal encoding sketch in Python follows below)
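Since the slide describes the encoding only at a high level, here is a minimal sketch of the standard SAX pipeline in plain Python: z-normalize, apply Piecewise Aggregate Approximation (PAA), then map each segment to a letter using equal-probability breakpoints of the standard normal distribution. The function names, alphabet size, and word length are illustrative choices, not taken from the presentation.

```python
# Minimal SAX (Symbolic Aggregate approXimation) sketch in plain Python.
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit variance so N(0,1) breakpoints apply."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series] if sigma else [0.0] * len(series)


def paa(series, segments):
    """Piecewise Aggregate Approximation: average the series down to `segments` values."""
    n = len(series)
    return [mean(series[i * n // segments:(i + 1) * n // segments])
            for i in range(segments)]


def sax_word(series, segments=8, alphabet="abc"):
    """Encode a numeric series as a short SAX word such as 'baabccbc'."""
    a = len(alphabet)
    # Breakpoints that split the standard normal curve into `a` equal-probability bins.
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    word = ""
    for value in paa(znormalize(series), segments):
        symbol = sum(value > b for b in breakpoints)  # index of the bin this segment falls in
        word += alphabet[symbol]
    return word


if __name__ == "__main__":
    import math, random
    prices = [math.sin(i / 10) + random.gauss(0, 0.05) for i in range(120)]
    print(sax_word(prices))  # prints an 8-letter SAX word, e.g. 'bcbaabcb'
```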
  • 13. SAX – 5 Use Cases • Indexing – Given a time series, find similar time series in the database • Clustering – Find natural groupings in the time series • Classification – Automagically sort patterns found in time series into categories • Summarization – Condense verbose data into meaningful information • Anomaly Detection – Find surprising, interesting, or unexpected behavior
  • 14. Why SAX is Cool • Lower Bounding – The patterns identified and described using SAX actually look like the underlying data • Dimensionality Reduction – Previously intractable problems become possible in real time • Other algorithms sometimes don’t describe the underlying patterns, or take way too much work to be useful in real time
  • 15. A Day’s Worth of IBM [figure]
  • 16. Normalized & PAA Applied [figure]
  • 17. And Finally, SAX [figure: the series encoded as the SAX word EDDCCBC]
  • 18. Checkpoint • We’ve reduced dimensionality • We know where we are – The current pattern is AABASDGF • We’re calculating it in ‘real time’* – Using Complex Event Processing • But – There’s still too much data to process on one machine… • How can we process more data in the same amount of time? *I much prefer the term event-driven
  • 19. What is Map/Reduce? • Framework for processing ginormous datasets using a large number of computers (nodes) in a cluster • "Map" – The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. The worker node processes that smaller problem and passes the answer back to its master node • "Reduce" – Takes the answers to all the sub-problems and combines them in a way to get the output: the answer to the problem it was originally trying to solve – From Wikipedia
  • 20. What? What is Map/Reduce? • WordCount Example (classic) – Map scans text for words and emits {word,1} – Combine collapses key values on the same node: {word,1,1,1} -> {word,3} – Shuffle/Sort merges results from different nodes • {node A,”NoSQL”,50} {node B,”Oracle”,50} {node B,”NoSQL”,50} – becomes • {node A,”NoSQL”,50} {node B,”NoSQL”,50} {node B,”Oracle”,50} – Reduce • Outputs {“NoSQL”,100} {“Oracle”,50} (a toy word-count sketch follows below)
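To make the WordCount walkthrough concrete, here is a toy map / shuffle-sort / reduce pipeline in plain Python (no Hadoop, single process); the function names and the two-document input are purely illustrative.

```python
# Toy word count in the map / shuffle-sort / reduce style described on the slide.
from collections import defaultdict
from itertools import groupby


def map_phase(text):
    """Map: scan text for words and emit (word, 1) pairs."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle_sort(pairs):
    """Shuffle/sort: merge pairs from all mappers, grouped by word."""
    pairs = sorted(pairs)
    return {word: [count for _, count in group]
            for word, group in groupby(pairs, key=lambda kv: kv[0])}


def reduce_phase(grouped):
    """Reduce: combine the counts for each word into the final totals."""
    return {word: sum(counts) for word, counts in grouped.items()}


if __name__ == "__main__":
    docs = ["NoSQL Oracle NoSQL", "NoSQL"]
    mapped = [pair for doc in docs for pair in map_phase(doc)]  # one mapper per document
    print(reduce_phase(shuffle_sort(mapped)))  # {'nosql': 3, 'oracle': 1}
```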
  • 21. SAX and Map/Reduce • SAX is an ‘embarrassingly parallel’ problem • Using parallel processing allows SAX words to be computed more quickly • Using streaming Map/Reduce provides results even faster, increasing the value of the data even more – Partition by symbol and sort by timestamp – Calculate SAX words for each symbol, in parallel (see the sketch after this slide) • CEP Time Windows to the Rescue!
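Continuing in the same vein, here is a rough sketch of the partition-by-symbol, compute-in-parallel idea using Python's multiprocessing. It assumes the SAX sketch above was saved as a module named sax_sketch.py (a hypothetical file name); the tick data and function names are likewise illustrative.

```python
# Sketch of the embarrassingly parallel step: partition ticks by symbol, keep each
# partition in timestamp order, and compute one SAX word per symbol in parallel
# worker processes. Assumes the earlier SAX sketch was saved as sax_sketch.py.
from collections import defaultdict
from multiprocessing import Pool

from sax_sketch import sax_word  # the sax_word() helper sketched a few slides back


def partition_by_symbol(ticks):
    """ticks: iterable of (symbol, timestamp, price) -> {symbol: [prices in time order]}."""
    partitions = defaultdict(list)
    for symbol, ts, price in sorted(ticks, key=lambda t: (t[0], t[1])):
        partitions[symbol].append(price)
    return partitions


def sax_for_symbol(item):
    """Map step: emit one (symbol, SAX word) pair per partition."""
    symbol, prices = item
    return symbol, sax_word(prices)


if __name__ == "__main__":
    ticks = [("IBM", i, 100 + (i % 7)) for i in range(120)] + \
            [("ORCL", i, 30 + (i % 5)) for i in range(120)]
    with Pool() as pool:
        words = dict(pool.map(sax_for_symbol, partition_by_symbol(ticks).items()))
    print(words)  # one 8-letter SAX word per symbol
```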
  • 22. Checkpoint • CEP is great, but I still have to tell it what I’m looking for, right? • SAX can help us reduce dimensionality; what else can it do for us? • How do I relate streaming data to historical data? • How do I do this while the information still has value?
  • 23. High Velocity Big Data Pattern [architecture diagram with components labeled OnRamp, Events, Map, SAX, Reduce, Historical, and Context]
  • 24. So What Do We Need? • Complex Event Processing • The Algorithm (SAX) • Processing Model – Streaming Map/Reduce • Context – The Historical Aspect • What Do We Call This?
  • 25. What is DarkStar? – Platform as a Service (PaaS) • Provides Distributed – Complex Event Processing – Streaming Map/Reduce – Messaging – Web Services – Monitoring/Management – Applications are built on top, or inside • SAX runs inside of DarkStar – SAX is not a component of DarkStar, but an add-in library – And deployed in a cluster • Virtualized Resources
  • 26. DarkStar • What patterns are occurring in my data, right now? – CEP-based streaming Map/Reduce • Use a cluster of machines • When did this pattern happen before? – Database with embedded Map/Reduce • No need to move data outside the database for processing
  • 27. The Cloud • Elastic Resource – Grows/shrinks according to demand • Virtualization – Efficient utilization of compute • The Previously Unthinkable – Is now possible, if not already commonplace • Peering can provide access to Big Pipes and Secure Data
  • 28. Thank You! • Questions? • Contact Me – Colin Clark – @EventCloudPro – cpclark@cloudeventprocessing.com
