Cloud connect 03 08-2011

Published in: Technology, Business
  • I gave this presentation at CloudConnect this past year. It talks about some big data, low latency use cases and highlights a distributed, streaming map/reduce architecture and also the SAX algorithm. Enjoy and comments always welcome!

  1. Cloud Event Processing: Analyze ∙ Sense ∙ Respond. CloudConnect, March 8, 2011
  2. Welcome • High Velocity Big Data • What is Complex Event Processing? • Analyzing Time Series with SAX • What is Map/Reduce? • Correlating with Historical Data • Using the Cloud • Questions CLOUD EVENT PROCESSING
  3. Data Growth* [Bar chart: vertical axis from 0 to 18; horizontal axis Category 1 through Category 4] *It would appear that things will actually get worse, not better
  4. High Velocity Big Data • What is Big Data? – You’ve got Big Data issues when you can’t turn the data into information fast enough to act on: • Earthquake • Brownout • Market Crash • Terrorist Event – You’ve got Big Data when you have to consider its actual physicality • What is High Velocity Big Data? – Big Data In Flight… • You don’t get to store it before you analyze it
  5. What is Complex Event Processing? • Complex Event Processing (CEP) delivers high-speed processing of many events across all the layers of an organization, identifying only the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time. – From Wikipedia
  6. What? What is CEP? • Domain Specific Language – Makes it easier to deal with events • Continuous Query – Select symbol, side, price from tradeStream • Time/Length Windows – Select symbol, side, avg(price) from minutes) group by symbol, side • Pattern Matching – select a.* from pattern [every a=FIXNewOrderSingle -> (timer:interval(30 seconds) and not FIXNewOrderSingle(a.Side!=Side and a.OrderQty = OrderQty and a.Symbol = Symbol))]
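The time-window query on the slide above (average price grouped by symbol and side) can be approximated in a few lines of plain Python. This is only a minimal sketch of the idea, not how any particular CEP engine implements it: the class name `TimeWindowAverage`, its `on_event` method, and the 60-second window in the usage lines are all hypothetical, and it assumes events for each key arrive in timestamp order.

```python
from collections import defaultdict, deque


class TimeWindowAverage:
    """Average price per (symbol, side) over a trailing time window.

    Hypothetical helper for illustration; assumes non-decreasing timestamps.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = defaultdict(deque)  # (symbol, side) -> deque of (timestamp, price)
        self.sums = defaultdict(float)    # (symbol, side) -> running sum of prices

    def on_event(self, timestamp, symbol, side, price):
        """Add one event and return the current windowed average for its key."""
        key = (symbol, side)
        window = self.events[key]
        window.append((timestamp, price))
        self.sums[key] += price
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= timestamp - self.window_seconds:
            _, old_price = window.popleft()
            self.sums[key] -= old_price
        return self.sums[key] / len(window)


if __name__ == "__main__":
    averages = TimeWindowAverage(window_seconds=60)
    for event in [(0, "IBM", "BUY", 100.0), (30, "IBM", "BUY", 110.0), (61, "IBM", "BUY", 120.0)]:
        print(event, averages.on_event(*event))
```

Every open window holds its events in memory, which is one way to see why the memory-bound limitation discussed later in the deck appears as the number of keys and windows grows.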
  7. Wouldn’t It Be Cool • Select * from everything where itsInteresting = toMe in last 10 minutes; • Select * from everything where earthQuake > .8; • Select * from everything where terroristsWillStrike > .9;
  8. CEP – Current Benefits* • Really Fast! • Low Latency! • Provides a ‘ready made’ framework to build real-time pattern matching applications • Think at a higher level – Productivity *your mileage may vary, widely
  9. CEP – Current Limitations • Memory Bound – If you have a lot of events and windows, you risk running out of memory on a single machine • Compute Bound – To ensure high throughput and low latency, most CEP engines are actually doing simplistic things • e.g. Filtering events • Black Box – What’s going on in there?
  10. Checkpoint • Ok, so by using Complex Event Processing – You can analyze data in flight – But • You’re constrained by: – Available compute – Memory • Because there’s still too much data to process on one machine…
  11. The Problem With Time Series • Dimensionality – How can I recognize something? • Distance Measures – How do I find similar occurrences? • Time – By the time I process the data, the information has little value…
  12. Symbolic Aggregate Approximation: SAX Encoding • SAX reduces numerical data to a short string, or SAX word. • Thousands of data points of numerical, continuous data become ‘ABCEDEFGH’ • The SAX approximation of the data fits in main memory, yet retains features of interest • Creating SAX words from historical and streaming data allows us to perform all kinds of magic… [Figure: a time series plotted over 0 to 120, divided into regions labeled a, b, and c, yielding the SAX word baabccbc] SAX Advantages: • Patterns identified and described using SAX actually look like the underlying data • Other algorithms sometimes don’t actually describe the underlying patterns or take way too much work to be useful in real time
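A compact way to see how a series becomes a SAX word is the usual three-step recipe from the SAX literature: z-normalize, average over equal-width frames (PAA), then map each frame mean to a letter using breakpoints that cut the standard normal distribution into equal-probability regions. The sketch below follows that recipe; the function names are my own, and it assumes for simplicity that the series length is an exact multiple of the number of segments.

```python
import bisect
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit standard deviation (all zeros if constant)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame."""
    if len(series) % segments != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of segments")
    frame = len(series) // segments
    return [mean(series[i * frame:(i + 1) * frame]) for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points splitting the standard normal into equal-probability regions."""
    return [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(series, segments, alphabet_size):
    """Encode a numeric series as a SAX word over the letters 'a', 'b', 'c', ..."""
    cuts = breakpoints(alphabet_size)
    letters = []
    for value in paa(znormalize(series), segments):
        # Count how many breakpoints lie at or below the value to pick its letter.
        letters.append(chr(ord("a") + bisect.bisect_right(cuts, value)))
    return "".join(letters)


if __name__ == "__main__":
    print(sax_word(list(range(12)), segments=3, alphabet_size=3))
```

The word length (`segments`) and the alphabet size are the two knobs: fewer segments and fewer letters give a coarser but smaller summary of the same data.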
  13. SAX – 5 Use Cases • Indexing – Given a time series, find similar time series in the database • Clustering – Find natural groupings in the time series • Classification – Automagically sort patterns found in time series into categories • Summarization – Condense verbose data into meaningful information • Anomaly Detection – Find surprising, interesting, or unexpected behavior
  14. Why SAX is Cool • Lower Bounding – The patterns identified and described using SAX actually look like the underlying data • Dimensionality Reduction – Previously intractable problems become possible in real time • Other algorithms sometimes don’t describe underlying patterns, or take way too much work to be useful in real time
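In the SAX literature, "lower bounding" refers to a distance between two SAX words, usually called MINDIST, that never exceeds the Euclidean distance between the z-normalized series they were built from. The slide does not give the formula, so the following is a sketch based on the published definition rather than on the presentation itself; the function names are hypothetical, and both words are assumed to have the same length and to come from series of `series_length` points.

```python
import math
from statistics import NormalDist


def breakpoints(alphabet_size):
    """Cut points splitting the standard normal into equal-probability regions."""
    return [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def symbol_distance(a, b, cuts):
    """Smallest possible gap between values encoded as letters a and b."""
    low, high = sorted((ord(a) - ord("a"), ord(b) - ord("a")))
    if high - low <= 1:
        return 0.0  # identical or adjacent regions can be arbitrarily close
    return cuts[high - 1] - cuts[low]


def mindist(word_a, word_b, series_length, alphabet_size):
    """Lower bound on the Euclidean distance between the original series."""
    if len(word_a) != len(word_b):
        raise ValueError("SAX words must have the same length")
    cuts = breakpoints(alphabet_size)
    total = sum(symbol_distance(a, b, cuts) ** 2 for a, b in zip(word_a, word_b))
    return math.sqrt(series_length / len(word_a)) * math.sqrt(total)


if __name__ == "__main__":
    print(mindist("aac", "cca", series_length=12, alphabet_size=3))
```

Because the bound never overestimates, a search can discard any candidate whose MINDIST already exceeds the best distance found so far without touching the raw data.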
  16. Normalized & PAA Applied
  18. Checkpoint • We’ve reduced dimensionality • We know where we are – The current pattern is AABASDGF • We’re calculating it in ‘real-time’* – Using Complex Event Processing • But – There’s still too much data to process on one machine… • How can we process more data in the same amount of time? *I much prefer the term event-driven
  19. What is Map/Reduce? • Framework for processing ginormous datasets using a large number of computers (nodes) in a cluster. • "Map" Master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. The worker node processes that smaller problem, and passes the answer back to its master node. • "Reduce" Takes the answers to all the sub-problems and combines them in a way to get the output - the answer to the problem it was originally trying to solve. – From Wikipedia
  20. What? What is Map/Reduce? • WordCount Example (classic) – Map scans text for words and emits {word,1} – Combine collapses key values on the same node: {word,1,1,1} -> {word,3} – Shuffle/Sort merges results from different nodes • {node A,”NoSQL”,50} {node B,”Oracle”,50} {node B,”NoSQL”,50} – becomes • {node A,”NoSQL”,50} {node B,”NoSQL”,50} {node B,”Oracle”,50} – Reduce • Outputs {“NoSQL”,100} {“Oracle”,50}
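The four WordCount stages on the slide above can be walked through in a single process. The sketch below is illustrative only: the function names are hypothetical, the "nodes" are just dictionary entries rather than machines, and the input mirrors the slide's numbers (node A sees "NoSQL" 50 times, node B sees "Oracle" and "NoSQL" 50 times each).

```python
from collections import Counter, defaultdict


def map_phase(text):
    """Map: emit a (word, 1) pair for every word in the text."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Combine: collapse the pairs emitted on one node into per-word counts."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def shuffle(node_outputs):
    """Shuffle/Sort: gather the counts for each word across nodes, sorted by word."""
    grouped = defaultdict(list)
    for node_counts in node_outputs:
        for word, n in node_counts.items():
            grouped[word].append(n)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Reduce: sum the per-node counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(text_by_node):
    """Run map and combine per node, then shuffle and reduce across nodes."""
    node_outputs = [combine(map_phase(text)) for text in text_by_node.values()]
    return reduce_phase(shuffle(node_outputs))


if __name__ == "__main__":
    texts = {"node A": "NoSQL " * 50, "node B": "Oracle " * 50 + "NoSQL " * 50}
    print(word_count(texts))
```

The combine step is what keeps network traffic down in a real cluster: each node ships one count per word instead of one pair per occurrence.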
  21. SAX and Map/Reduce • SAX is an ‘embarrassingly parallel’ problem • Using parallel processing allows SAX words to be computed more quickly • Using Streaming Map/Reduce provides results even faster, increasing the value of data even more – Partition by symbol and sort by timestamp – Calculate SAX words for each symbol, in parallel • CEP Time Windows to the Rescue!
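The two steps named on the slide above (partition by symbol and sort by timestamp, then compute a SAX word per symbol in parallel) might look like this for a single window of ticks. This is a sketch under several assumptions, not the presenter's implementation: the tick layout `(timestamp, symbol, price)` and all function names are hypothetical, a thread pool stands in for the worker nodes of a cluster, and each symbol's price count is assumed to be a multiple of `segments`.

```python
import bisect
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist, mean, pstdev


def sax_word(values, segments, alphabet_size):
    """Compact SAX encoding: z-normalize, frame means (PAA), then letters."""
    mu, sigma = mean(values), pstdev(values)
    z = [(v - mu) / sigma if sigma else 0.0 for v in values]
    frame = len(z) // segments  # assumes len(values) is a multiple of segments
    cuts = [NormalDist().inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    frame_means = [mean(z[i * frame:(i + 1) * frame]) for i in range(segments)]
    return "".join(chr(ord("a") + bisect.bisect_right(cuts, m)) for m in frame_means)


def partition_by_symbol(ticks):
    """Group (timestamp, symbol, price) ticks by symbol, ordered by timestamp."""
    partitions = defaultdict(list)
    for timestamp, symbol, price in ticks:
        partitions[symbol].append((timestamp, price))
    return {
        symbol: [price for _, price in sorted(rows, key=lambda row: row[0])]
        for symbol, rows in partitions.items()
    }


def sax_words_by_symbol(ticks, segments, alphabet_size, workers=4):
    """Encode each symbol's window independently, one task per symbol."""
    partitions = partition_by_symbol(ticks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            symbol: pool.submit(sax_word, prices, segments, alphabet_size)
            for symbol, prices in partitions.items()
        }
        return {symbol: future.result() for symbol, future in futures.items()}


if __name__ == "__main__":
    ticks = [(t, "UP", float(t)) for t in reversed(range(8))]
    ticks += [(t, "DOWN", float(7 - t)) for t in range(8)]
    print(sax_words_by_symbol(ticks, segments=2, alphabet_size=2))
```

No task needs data from another symbol, which is what makes the problem embarrassingly parallel; in a streaming setting the same function would be re-run each time a time window closes.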
  22. Checkpoint • CEP is great, but I still have to tell it what I’m looking for, right? • SAX can help us reduce dimensionality; what else can it do for us? • How do I relate Streaming Data to Historical Data? • How do I do this while the Information still has value?
  23. High Velocity Big Data Pattern [Architecture diagram; labels: Events, OnRamp, Map, Reduce, SAX, Historical Context]
  24. So What Do We Need? • Complex Event Processing • The Algorithm (SAX) • Processing Model – Streaming Map/Reduce • Context – The Historical Aspect • What Do We Call This?
  25. What is DarkStar? – Platform as a Service (PaaS) • Provides Distributed – Complex Event Processing – Streaming Map/Reduce – Messaging – Web Services – Monitoring/Management – Applications are built on top, or inside • SAX runs inside of DarkStar – SAX is not a component of DarkStar, but an add-in library – And deployed in a cluster • Virtualized Resources
  26. DarkStar • What patterns are occurring in my data, right now? – CEP based streaming Map/Reduce • Use a cluster of machines • When did this pattern happen before? – Database with embedded Map/Reduce • No need to move data outside the database for processing
  27. The Cloud • Elastic Resource – Grows/Shrinks according to demand • Virtualization – Efficient utilization of compute • The Previously Unthinkable – Is now possible, if not already commonplace • Peering can provide access to Big Pipes and Secure Data
  28. Thank You! • Questions? • Contact Me – Colin Clark – @EventCloudPro – cpclark@cloudeventprocessing.com