Cloud Event Processing
  Analyze ∙ Sense ∙ Respond

       CloudConnect
           March 8, 2011
Welcome
    •   High Velocity Big Data
    •   What is Complex Event Processing?
    •   Analyzing Time Series with SAX
    •   What is Map/Reduce?
    •   Correlating with Historical Data
    •   Using the Cloud
    •   Questions
CLOUD
EVENT
PROCESSING
Data Growth*
       [Bar chart: Category 1 through Category 4, vertical axis from 0 to 18]

       *It would appear that things will actually get worse, not better
High Velocity Big Data
    • What is Big Data?
             – You’ve got Big Data issues when you can’t turn the
               data into information fast enough to act on:
                •   Earthquake
                •   Brownout
                •   Market Crash
                •   Terrorist Event
             – You’ve got Big Data when you have to consider its
               actual physicality
    • What is High Velocity Big Data?
             – Big Data In Flight…
                • You don’t get to store it before you analyze it
What is Complex Event Processing?
    • Complex Event Processing (CEP) delivers high-
      speed processing of many events across all the
      layers of an organization, identifying only the
      most meaningful events within the event
      cloud, analyzing their impact, and taking
      subsequent action in real time.
             – From Wikipedia


What? What is CEP?
    • Domain Specific Language
             – Makes it easier to deal with events
    • Continuous Query
             – Select symbol, side, price from tradeStream
    • Time/Length Windows
             – Select symbol, side, avg(price) from
               tradeStream.win:time(10 minutes) group by symbol, side
    • Pattern Matching
             – select a.* from pattern [every a=FIXNewOrderSingle ->
               (timer:interval(30 seconds) and not
               FIXNewOrderSingle(a.Side!=Side and a.OrderQty =
               OrderQty and a.Symbol = Symbol))]
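
The time window query above can be approximated outside a CEP engine. The following is a minimal Python sketch, not how any particular engine implements windows: the class name TimeWindowAverage and its methods are hypothetical, timestamps are plain seconds, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class TimeWindowAverage:
    """Average price per (symbol, side) over a sliding time window (illustrative only)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()            # (timestamp, key, price), oldest first
        self.sums = defaultdict(float)   # running sum of prices per key
        self.counts = defaultdict(int)   # number of live events per key

    def on_event(self, timestamp, symbol, side, price):
        """Add one trade, expire old trades, return the current average for its key."""
        key = (symbol, side)
        self.events.append((timestamp, key, price))
        self.sums[key] += price
        self.counts[key] += 1
        self._expire(timestamp)
        return self.sums[key] / self.counts[key]

    def _expire(self, now):
        # Drop events that are window_seconds old or older.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key, old_price = self.events.popleft()
            self.sums[old_key] -= old_price
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.sums[old_key]
                del self.counts[old_key]
```

Every live event stays in memory until it expires, which is the memory pressure that large windows put on a single machine.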
Wouldn’t It Be Cool
    • Select * from everything where itsInteresting
      = toMe in last 10 minutes;

    • Select * from everything where earthQuake >
      .8;

    • Select * from everything where
      terroristsWillStrike > .9;
CEP – Current Benefits*
    • Really Fast!
    • Low Latency!
    • Provides a ‘ready made’ framework to build
      real-time pattern matching applications
    • Think at a higher level
             – Productivity

                          *your mileage may vary, widely
CEP – Current Limitations
    • Memory Bound
             – If you have a lot of events and windows, you risk
               running out of memory on a single machine
    • Compute Bound
             – To ensure high throughput and low latency, most
               CEP engines are actually doing simplistic things
                • e.g. Filtering events
    • Black Box
             – What’s going on in there?
Checkpoint
    • Ok, so by using Complex Event Processing
             – You can analyze data in flight
             – But
                • You’re constrained by:
                   – Available compute
                   – Memory

    • Because, there’s still too much data to process
      on one machine…
The Problem With Time Series
    • Dimensionality
             – How can I recognize something?
    • Distance Measures
             – How do I find similar occurrences?
    • Time
             – By the time I process the data, the information
               has little value…


Symbolic Aggregate Approximation
    • SAX reduces numerical data to a short string, or SAX word.
    • Thousands of data points of numerical, continuous data
      become ‘ABCEDEFGH’
    • SAX approximation of the data fits in main memory, yet retains
      features of interest
    • Creating SAX words from historical and streaming data allows us
      to perform all kinds of magic…

    SAX Encoding
             [Figure: time series plotted over 0 to 120, segments labeled a, b, c;
              resulting SAX word: baabccbc]

    SAX Advantages:
    • Patterns identified and described using SAX actually look like the
      underlying data
    • Other algorithms sometimes don’t actually describe the underlying
      patterns or take way too much work to be useful in real time
SAX – 5 Use Cases
    • Indexing
             – Given a time series, find similar time series in the database
    • Clustering
             – Find natural grouping in the time series
    • Classification
             – Automagically sort patterns found in time series into
               categories
    • Summarization
             – Condense verbose data into meaningful information
    • Anomaly Detection
             – Find surprising, interesting, or unexpected behavior
Why SAX is Cool
    • Lower Bounding
             – The patterns identified and described using SAX
               actually look like the underlying data
    • Dimensionality Reduction
             – Previously intractable problems become possible in
               real time
    • Other algorithms sometimes don’t describe
      underlying patterns, or take way too much work to be
      useful in real time
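
In the published SAX work, lower bounding means that a distance computed on two SAX words (usually called MINDIST) never exceeds the Euclidean distance between the original z-normalized series. That lets you discard non-matches cheaply without missing a true match. The sketch below is one way to compute it. The function names are hypothetical, words use lowercase letters starting at 'a', and the breakpoints are assumed to be equiprobable cuts of a standard normal distribution.

```python
import math
from statistics import NormalDist


def breakpoints(alphabet_size):
    """Cut points that split a standard normal curve into equiprobable regions."""
    return [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def symbol_distance(a, b, cuts):
    """Distance between two letters: zero for equal or adjacent letters."""
    i, j = ord(a) - ord("a"), ord(b) - ord("a")
    if abs(i - j) <= 1:
        return 0.0
    # Gap between the nearest edges of the two letters' regions.
    return cuts[max(i, j) - 1] - cuts[min(i, j)]


def mindist(word_a, word_b, series_length, alphabet_size):
    """Lower-bounding distance between two SAX words of equal length."""
    if len(word_a) != len(word_b):
        raise ValueError("SAX words must have the same length")
    cuts = breakpoints(alphabet_size)
    total = sum(symbol_distance(a, b, cuts) ** 2 for a, b in zip(word_a, word_b))
    # series_length is the number of points in each original series.
    return math.sqrt(series_length / len(word_a)) * math.sqrt(total)
```

A result of zero does not prove that two series are similar. It only means the words cannot rule the match out, so candidates still need a check against the raw data.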
A Day’s Worth of IBM




Normalized & PAA Applied




And Finally, SAX
        [Figure: PAA segments mapped to letters on an A to G scale;
         resulting SAX word: EDDCCBC]
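
The three steps shown on the last slides (normalize, apply PAA, assign letters) can be sketched in a few lines of Python. This is an illustration under assumptions, not the code behind the slides. The function names are hypothetical, the series length must divide evenly into segments, letters are lowercase starting at 'a', and breakpoints are equiprobable cuts of a standard normal distribution.

```python
import math
from bisect import bisect_right
from statistics import NormalDist


def z_normalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # flat series: no shape to encode
    return [(x - mean) / std for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width chunk."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("this sketch needs len(series) divisible by segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


def breakpoints(alphabet_size):
    """Cut points that split a standard normal curve into equiprobable regions."""
    return [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(series, segments, alphabet_size):
    """Encode a numeric series as a SAX word of `segments` letters."""
    cuts = breakpoints(alphabet_size)
    letters = []
    for value in paa(z_normalize(series), segments):
        index = bisect_right(cuts, value)  # number of cut points at or below value
        letters.append(chr(ord("a") + index))
    return "".join(letters)
```

For example, sax_word([1, 2, 3, 4, 5, 6, 7, 8], segments=4, alphabet_size=4) gives "abcd": eight points reduced to four letters that still show a steady rise.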
Checkpoint
    • We’ve reduced dimensionality
    • We know where we are
             – The current pattern is AABASDGF
    • We’re calculating it in ‘real-time’*
             – Using Complex Event Processing
    • But
             – There’s still too much data to process on one
               machine…
    • How can we process more data in the same
      amount of time?
                        *I much prefer the term event-driven
What is Map/Reduce?
    • Framework for processing ginormous datasets using a large number
      of computers (nodes) in a cluster.

    • "Map"
      Master node takes the input, chops it up into smaller sub-
      problems, and distributes those to worker nodes. The worker node
      processes that smaller problem, and passes the answer back to its
      master node.

    • "Reduce"
      Takes the answers to all the sub-problems and combines them in a
      way to get the output - the answer to the problem it was originally
      trying to solve.
             – From Wikipedia
What? What is Map/Reduce?
    • WordCount Example (classic)
             – Map scans text for words and emits - {word,1}
             – Combine collapses key values on the same node -
               {word,1,1,1} -> {word,3}
             – Shuffle/Sort merges results from different nodes
                • {node A,“NoSQL”,50} {node B,“Oracle”,50} {node B,“NoSQL”,50}
                    – becomes
                • {node A,“NoSQL”,50} {node B,“NoSQL”,50} {node B,“Oracle”,50}
             – Reduce
                • Outputs {“NoSQL”,100} {“Oracle”,50}
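
The WordCount steps above can be simulated in one Python process. This is a sketch only. The function names are hypothetical and each input string stands in for the text held on one node, whereas a real framework would run the phases on separate machines.

```python
from collections import Counter, defaultdict


def map_phase(text):
    """Map: emit a (word, 1) pair for every word in the text."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Combine: collapse pairs with the same word on one node."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def shuffle(per_node_pairs):
    """Shuffle/Sort: bring together the counts for each word from all nodes."""
    grouped = defaultdict(list)
    for pairs in per_node_pairs:
        for word, n in pairs:
            grouped[word].append(n)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Reduce: sum the per-node counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(node_inputs):
    """Run map and combine per node, then shuffle and reduce across nodes."""
    combined = [combine(map_phase(text)) for text in node_inputs]
    return reduce_phase(shuffle(combined))
```

The combine step is optional. It only reduces how much data crosses the network before the shuffle.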

SAX and Map/Reduce
    • SAX is an ‘embarrassingly parallel’ problem
    • Using parallel processing allows SAX words to
      be computed more quickly
    • Using Streaming Map/Reduce provides results
      even faster, increasing the value of data even
      more
             – Partition by symbol and sort by timestamp
             – Calculate SAX words for each symbol, in parallel
    • CEP Time Windows to the Rescue!
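
The partition, sort, and encode steps can be sketched in Python as follows. This is a single-machine illustration under assumptions, not the streaming implementation the slides refer to. The function names are hypothetical, events are (timestamp, symbol, price) tuples, and sax_word is a compact stand-in encoder that ignores any trailing points that do not fill a segment.

```python
from bisect import bisect_right
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist, fmean, pstdev


def sax_word(values, segments, alphabet_size):
    """Compact SAX encoder: z-normalize, average per segment, map to letters."""
    if len(values) < segments:
        raise ValueError("need at least one value per segment")
    mean, std = fmean(values), pstdev(values)
    normalized = [(v - mean) / std if std else 0.0 for v in values]
    width = len(normalized) // segments
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    return "".join(
        chr(ord("a") + bisect_right(cuts, fmean(normalized[i * width:(i + 1) * width])))
        for i in range(segments)
    )


def partition_by_symbol(events):
    """Group events by symbol and order each group's prices by timestamp."""
    partitions = defaultdict(list)
    for timestamp, symbol, price in events:
        partitions[symbol].append((timestamp, price))
    return {
        symbol: [price for _, price in sorted(rows, key=lambda row: row[0])]
        for symbol, rows in partitions.items()
    }


def sax_by_symbol(events, segments=4, alphabet_size=4, workers=4):
    """Compute one SAX word per symbol, submitting each partition as its own task."""
    partitions = partition_by_symbol(events)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            symbol: pool.submit(sax_word, prices, segments, alphabet_size)
            for symbol, prices in partitions.items()
        }
        return {symbol: future.result() for symbol, future in futures.items()}
```

Threads keep the sketch short, but in CPython they will not speed up CPU-bound encoding. The point is that each symbol's partition is independent, so the same tasks could be handed to separate processes or cluster nodes.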
Checkpoint
    • CEP is great, but I still have to tell it what I’m
      looking for, right?
    • SAX can help us reduce dimensionality, what
      else can it do for us?
    • How do I relate Streaming Data to Historical
      Data?
    • How do I do this while the Information still has
      value?
High Velocity Big Data Pattern
    [Diagram: two pipelines, laid out left to right.
     Upper, labeled Historical: Events, three parallel Map steps, Events, Reduce.
     Lower: Events, OnRamp, Events, three parallel Map steps, SAX, Reduce, Context.]
So What Do We Need?
    •   Complex Event Processing
    •   The Algorithm (SAX)
    •   Processing Model – Streaming Map/Reduce
    •   Context – The Historical Aspect
    •   What Do We Call This?



What is DarkStar?
             – Platform as a Service (PaaS)
                • Provides Distributed
                    –   Complex Event Processing
                    –   Streaming Map/Reduce
                    –   Messaging
                    –   Web Services
                    –   Monitoring/Management
             – Applications are built on top, or inside
                • SAX runs inside of DarkStar
                    – SAX is not a component of DarkStar, but an add-in library
             – And deployed in a cluster
                • Virtualized Resources
DarkStar
    • What patterns are occurring in my data, right
      now?
             – CEP based streaming Map/Reduce
               • Use a cluster of machines
    • When did this pattern happen before?
             – Database with embedded Map/Reduce
               • No need to move data outside the database for
                 processing

The Cloud
    • Elastic Resource
             – Grows/Shrinks according to demand
    • Virtualization
             – Efficient utilization of compute
    • The Previously Unthinkable
             – Is now possible, if not already commonplace
    • Peering can provide access to Big Pipes and
      Secure Data
Thank You!
    • Questions?

    • Contact Me
             – Colin Clark
             – @EventCloudPro
             – cpclark@cloudeventprocessing.com


Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 


Cloud connect 03 08-2011

  • 1. Cloud Event Processing Analyze ∙ Sense ∙ Respond CloudConnect March 8, 2011
  • 2. Welcome • High Velocity Big Data • What is Complex Event Processing? • Analyzing Time Series with SAX • What is Map/Reduce? • Correlating with Historical Data • Using the Cloud • Questions CLOUD EVENT PROCESSING
  • 3. Data Growth* [Bar chart: Category 1 to Category 4, y-axis 0 to 18] *It would appear that things will actually get worse, not better
  • 4. High Velocity Big Data • What is Big Data? – You’ve got Big Data issues when you can’t turn the data into information fast enough to act on: • Earthquake • Brownout • Market Crash • Terrorist Event – You’ve got Big Data when you have to consider its actual physicality • What is High Velocity Big Data? – Big Data In Flight… • You don’t get to store it before you analyze it
  • 5. What is Complex Event Processing? • Complex Event Processing (CEP) delivers high-speed processing of many events across all the layers of an organization, identifying only the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time. – From Wikipedia
  • 6. What? What is CEP? • Domain Specific Language – Makes it easier to deal with events • Continuous Query – Select symbol, side, price from tradeStream • Time/Length Windows – Select symbol, side, avg(price) from tradeStream.win:time(10 minutes) group by symbol, side • Pattern Matching – select a.* from pattern [every a=FIXNewOrderSingle -> (timer:interval(30 seconds) and not FIXNewOrderSingle(a.Side!=Side and a.OrderQty = OrderQty and a.Symbol = Symbol))]
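
The time-window query on slide 6 (average price per symbol and side over the last 10 minutes) can be pictured with a small plain-Python sketch. This is only an illustration of the sliding-window idea, not how any CEP engine implements it; the class name `TimeWindowAverage`, the method `on_event`, and the use of integer seconds for timestamps are all assumptions made for the example.

```python
from collections import defaultdict, deque


class TimeWindowAverage:
    """Running average of price per (symbol, side) over a trailing time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = defaultdict(deque)  # (symbol, side) -> deque of (timestamp, price)
        self.sums = defaultdict(float)    # (symbol, side) -> sum of prices still in the window

    def on_event(self, timestamp, symbol, side, price):
        """Add one trade and return the current windowed average for its group."""
        key = (symbol, side)
        window = self.events[key]
        window.append((timestamp, price))
        self.sums[key] += price
        # Evict trades that are at least window_seconds older than the newest one.
        while window[0][0] <= timestamp - self.window_seconds:
            _, old_price = window.popleft()
            self.sums[key] -= old_price
        return self.sums[key] / len(window)


# Usage: a 10-minute (600 second) window, as in the slide's query.
avg = TimeWindowAverage(window_seconds=600)
avg.on_event(0, "IBM", "BUY", 100.0)
avg.on_event(300, "IBM", "BUY", 110.0)
latest = avg.on_event(700, "IBM", "BUY", 120.0)  # the trade at t=0 has expired
```

The sketch keeps only the events inside the window, which is also why slide 9 calls CEP memory bound: many groups and long windows mean many retained events.
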
  • 7. Wouldn’t It Be Cool • Select * from everything where itsInteresting = toMe in last 10 minutes; • Select * from everything where earthQuake > .8; • Select * from everything where terroristsWillStrike > .9;
  • 8. CEP – Current Benefits* • Really Fast! • Low Latency! • Provides a ‘ready made’ framework to build real-time pattern matching applications • Think at a higher level – Productivity *your mileage may vary, widely
  • 9. CEP – Current Limitations • Memory Bound – If you have a lot of events and windows, you risk running out of memory on a single machine • Compute Bound – To ensure high throughput and low latency, most CEP engines are actually doing simplistic things • e.g. Filtering events • Black Box – What’s going on in there?
  • 10. Checkpoint • Ok, so by using Complex Event Processing – You can analyze data in flight – But • You’re constrained by: – Available compute – Memory • Because, there’s still too much data to process on one machine…
  • 11. The Problem With Time Series • Dimensionality – How can I recognize something? • Distance Measures – How do I find similar occurrences? • Time – By the time I process the data, the information has little value…
  • 12. Symbolic Aggregate Approximation: SAX Encoding • SAX reduces numerical data to a short string, or SAX word. • Thousands of data points of numerical, continuous data become ‘ABCEDEFGH’ • SAX approximation of the data fits in main memory, yet retains features of interest • Creating SAX words from historical and streaming data allows us to perform all kinds of magic… [Figure: time series over 0 to 120 discretized into symbols a, b, c, giving the SAX word baabccbc] SAX Advantages: • Patterns identified and described using SAX actually look like the underlying data • Other algorithms sometimes don’t actually describe the underlying patterns or take way too much work to be useful in real time
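
The encoding described on slide 12 can be sketched in three steps: z-normalize the series, average it into a fixed number of frames (Piecewise Aggregate Approximation, PAA), and map each frame mean to a letter using breakpoints that split a standard normal distribution into equally likely regions. The sketch below follows the commonly published SAX recipe; the function names, the default three-letter alphabet, and the requirement that the series has at least as many points as frames are assumptions of this example, not details taken from the slides.

```python
from bisect import bisect_left
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale to zero mean and unit variance (a flat series maps to all zeros)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each of `segments` frames."""
    n = len(series)  # assumes n >= segments so that no frame is empty
    return [mean(series[i * n // segments:(i + 1) * n // segments])
            for i in range(segments)]


def breakpoints(alphabet_size):
    """Cut points dividing N(0, 1) into `alphabet_size` equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax_word(series, segments, alphabet="abc"):
    """Encode a numeric series as a SAX word of length `segments`."""
    cuts = breakpoints(len(alphabet))
    frames = paa(znormalize(series), segments)
    return "".join(alphabet[bisect_left(cuts, value)] for value in frames)


# Usage: a steadily rising series reads low, middle, high.
rising = sax_word([1, 2, 3, 4, 5, 6, 7, 8, 9], segments=3)
```

However long the input, the output is a string of `segments` letters, which is the dimensionality reduction the slide refers to.
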
  • 13. SAX – 5 Use Cases • Indexing – Given a time series, find similar time series in the database • Clustering – Find natural grouping in the time series • Classification – Automagically sort patterns found in time series into categories • Summarization – Condense verbose data into meaningful information • Anomaly Detection – Find surprising, interesting, or unexpected behavior
  • 14. Why SAX is Cool • Lower Bounding – The patterns identified and described using SAX actually look like the underlying data • Dimensionality Reduction – Previously intractable problems become possible in real time • Other algorithms sometimes don’t describe underlying patterns, or take way too much work to be useful in real time
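
In the SAX literature, "lower bounding" usually refers to the MINDIST measure: a distance computed on two SAX words that never exceeds the Euclidean distance between the z-normalized series they were built from. A minimal sketch of that measure follows, assuming both words were produced with the same alphabet and from series of the same original length; the function names and the four-letter default alphabet are choices made for this example only.

```python
from math import sqrt
from statistics import NormalDist


def breakpoints(alphabet_size):
    """Cut points dividing N(0, 1) into `alphabet_size` equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def symbol_distance(i, j, cuts):
    """Distance between two symbol indices; equal or adjacent symbols cost nothing."""
    lo, hi = min(i, j), max(i, j)
    if hi - lo <= 1:
        return 0.0
    return cuts[hi - 1] - cuts[lo]


def mindist(word_a, word_b, series_length, alphabet="abcd"):
    """MINDIST between two equal-length SAX words built from series of `series_length` points."""
    if len(word_a) != len(word_b):
        raise ValueError("SAX words must have the same length")
    cuts = breakpoints(len(alphabet))
    total = sum(
        symbol_distance(alphabet.index(x), alphabet.index(y), cuts) ** 2
        for x, y in zip(word_a, word_b)
    )
    return sqrt(series_length / len(word_a)) * sqrt(total)


# Usage: adjacent letters are treated as indistinguishable, distant ones are not.
near = mindist("ab", "ba", series_length=8)
far = mindist("ad", "da", series_length=8)
```

Because the measure is a lower bound, a search over SAX words can discard candidates whose MINDIST is already too large without ever touching the raw data.
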
  • 15. A Day’s Worth of IBM
  • 16. Normalized & PAA Applied
  • 17. And Finally, SAX [Figure: PAA segments mapped to symbols A through G; resulting SAX word: EDDCCBC]
  • 18. Checkpoint • We’ve reduced dimensionality • We know where we are – The current pattern is AABASDGF • We’re calculating it in ‘real-time’* – Using Complex Event Processing • But – There’s still too much data to process on one machine… • How can we process more data in the same amount of time? *I much prefer the term event-driven
  • 19. What is Map/Reduce? • Framework for processing ginormous datasets using a large number of computers (nodes) in a cluster. • "Map" Master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. The worker node processes that smaller problem, and passes the answer back to its master node. • "Reduce" Takes the answers to all the sub-problems and combines them in a way to get the output - the answer to the problem it was originally trying to solve. – From Wikipedia
  • 20. What? What is Map/Reduce? • WordCount Example (classic) – Map scans text for words and emits {word,1} – Combine collapses key values on the same node: {word,1,1,1} -> {word,3} – Shuffle/Sort merges results from different nodes • {node A,“NoSQL”,50} {node B,“Oracle”,50} {node B,“NoSQL”,50} – becomes • {node A,“NoSQL”,50} {node B,“NoSQL”,50} {node B,“Oracle”,50} – Reduce • Outputs {“NoSQL”,100} {“Oracle”,50}
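
The four WordCount stages on slide 20 can be traced in a single process. This is a teaching sketch that simulates two nodes with plain Python lists rather than a real Map/Reduce framework; the function names and the idea of passing one text per node are assumptions of the example.

```python
from collections import Counter, defaultdict


def map_words(text):
    """Map: emit (word, 1) for every word in the text."""
    return [(word, 1) for word in text.split()]


def combine(pairs):
    """Combine: collapse repeated keys on the same node before anything is sent."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def shuffle(per_node_pairs):
    """Shuffle/Sort: bring the partial counts for each word together, ordered by key."""
    grouped = defaultdict(list)
    for pairs in per_node_pairs:
        for word, n in pairs:
            grouped[word].append(n)
    return dict(sorted(grouped.items()))


def reduce_counts(grouped):
    """Reduce: sum the partial counts for each word."""
    return {word: sum(partials) for word, partials in grouped.items()}


def word_count(node_texts):
    """Run map and combine per simulated node, then shuffle and reduce."""
    combined = [combine(map_words(text)) for text in node_texts]
    return reduce_counts(shuffle(combined))


# Usage: the slide's numbers, with node A holding 50 "NoSQL" and node B 50 of each word.
node_a = "NoSQL " * 50
node_b = "Oracle NoSQL " * 50
totals = word_count([node_a, node_b])
```

The combine step is what keeps network traffic small: each node sends one pair per distinct word instead of one pair per occurrence.
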
  • 21. SAX and Map/Reduce • SAX is an ‘embarrassingly parallel’ problem • Using parallel processing allows SAX words to be computed more quickly • Using Streaming Map/Reduce provides results even faster, increasing the value of data even more – Partition by symbol and sort by timestamp – Calculate SAX words for each symbol, in parallel • CEP Time Windows to the Rescue!
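
The two steps named on slide 21 (partition by symbol and sort by timestamp, then compute a SAX word per symbol in parallel) can be sketched as follows. A thread pool stands in for the worker nodes of a cluster, and the tick layout `(timestamp, symbol, price)`, the function names, and the compact inline SAX encoder are all assumptions made for illustration; this is not DarkStar's implementation.

```python
from bisect import bisect_left
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist, mean, pstdev


def sax_word(series, segments, alphabet="abc"):
    """Compact SAX encoder: z-normalize, average into frames, map frames to letters."""
    mu, sigma = mean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]
    n = len(z)  # assumes n >= segments so that no frame is empty
    frames = [mean(z[i * n // segments:(i + 1) * n // segments])
              for i in range(segments)]
    cuts = [NormalDist().inv_cdf(k / len(alphabet)) for k in range(1, len(alphabet))]
    return "".join(alphabet[bisect_left(cuts, v)] for v in frames)


def partition_by_symbol(ticks):
    """Group (timestamp, symbol, price) ticks by symbol, each ordered by timestamp."""
    parts = defaultdict(list)
    for timestamp, symbol, price in ticks:
        parts[symbol].append((timestamp, price))
    return {symbol: [price for _, price in sorted(rows, key=lambda row: row[0])]
            for symbol, rows in parts.items()}


def sax_by_symbol(ticks, segments=3, workers=4):
    """Compute one SAX word per symbol, with each symbol handled by a pool worker."""
    parts = partition_by_symbol(ticks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {symbol: pool.submit(sax_word, prices, segments)
                   for symbol, prices in parts.items()}
        return {symbol: future.result() for symbol, future in futures.items()}


# Usage: ticks arrive out of order; IBM rises while ORCL falls.
ticks = [(t, "IBM", float(t)) for t in range(6)]
ticks += [(t, "ORCL", float(10 - t)) for t in range(6)]
words = sax_by_symbol(list(reversed(ticks)), segments=3)
```

No symbol's word depends on any other symbol's data, which is what makes the problem embarrassingly parallel.
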
  • 22. Checkpoint • CEP is great, but I still have to tell it what I’m looking for, right? • SAX can help us reduce dimensionality, what else can it do for us? • How do I relate Streaming Data to Historical Data? • How do I do this while the Information still has value?
  • 23. High Velocity Big Data Pattern [Diagram; labels: Events, OnRamp, Map, SAX, Reduce, Historical Context]
  • 24. So What Do We Need? • Complex Event Processing • The Algorithm (SAX) • Processing Model – Streaming Map/Reduce • Context – The Historical Aspect • What Do We Call This?
  • 25. What is DarkStar? – Platform as a Service (PaaS) • Provides Distributed – Complex Event Processing – Streaming Map/Reduce – Messaging – Web Services – Monitoring/Management – Applications are built on top, or inside • SAX runs inside of DarkStar – SAX is not a component of DarkStar, but an add-in library – And deployed in a cluster • Virtualized Resources
  • 26. DarkStar • What patterns are occurring in my data, right now? – CEP based streaming Map/Reduce • Use a cluster of machines • When did this pattern happen before? – Database with embedded Map/Reduce • No need to move data outside the database for processing
  • 27. The Cloud • Elastic Resource – Grows/Shrinks according to demand • Virtualization – Efficient utilization of compute • The Previously Unthinkable – Is now possible, if not already commonplace • Peering can provide access to Big Pipes and Secure Data
  • 28. Thank You! • Questions? • Contact Me – Colin Clark – @EventCloudPro – cpclark@cloudeventprocessing.com