Speeding up big data with event processing

Alexandre de Castro Alves
Thursday, July 18, 2013

Copyright © 2013, Oracle and/or its affiliates. All rights reserved.
Disclaimers
• The following is intended to outline our general product direction. It is intended
for information purposes only, and may not be incorporated into any contract. It is
not a commitment to deliver any material, code, or functionality, and should not
be relied upon in making purchasing decisions. The development, release, and
timing of any features or functionality described for Oracle’s products remains at
the sole discretion of Oracle.

Agenda
• CEP
• Drivers
• Formal description
• Big Data
• Scenarios
• Architecture
• Integration with CEP
• Fast Data
• Architecture
• Predictive Analytics
• Data Mining
• Online data mining
• Scenarios

Event-Driven Applications
Sectors:
• Financial Services
• Transportation & Logistics
• Public Sector & Military
• Manufacturing
• Utilities & Insurance
• Telecommunications & Services
Example applications:
• Algorithmic trading
• Asset management
• Distributed order orchestration
• ‘Negative Working Capital’ inventory management
• Grid infrastructure management
• Responses to calamities (earthquake, flooding)
• Proximity/location tracking
• Intrusion detection systems
• Military asset allocation

Business Drivers & Enablers
• Exploding volume of digital event data:
• The cost of sensors and computing power has dropped, network
capacity has increased
• Accelerating business process:
• “the pace of business has increased, the world is changing faster,
and competition is getting tougher”
• Roy Schulte - VP Gartner Analyst
• "Event-driven systems are intrinsically smart because they
are context-aware and run when they detect changes in
the business world rather than occurring on a simple
schedule or requiring someone to tell them when to run."
• K. Mani Chandy, Simon Ramo Professor at the
California Institute of Technology in Pasadena

Event processing
Taxonomy
• Event passing
• Events are exchanged, but not processed
• Simple pub-sub applications
• Example: JMS
• Event mediation (brokering)
• Events are filtered, routed, and enriched
• However, not stateful
• Example: ESB
• Complex Event Processing
• Events are aggregated and new complex events are created
• Extremely stateful
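The three levels of the taxonomy can be contrasted with a small Python sketch. The function names, the event dictionaries, and the "exchange" enrichment are illustrative assumptions; none of this is the JMS, ESB, or Oracle CEP API.

```python
from collections import defaultdict

def pass_event(event, subscribers):
    """Event passing: deliver the event unchanged to every subscriber."""
    for deliver in subscribers:
        deliver(event)

def mediate(event, reference_data):
    """Event mediation: filter and enrich one event at a time, keeping no state."""
    if event["price"] <= 0:          # filter out invalid events
        return None
    return {**event, "exchange": reference_data.get(event["symbol"], "unknown")}

class RunningAverage:
    """Complex event processing: state is kept across events to derive new events."""
    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def on_event(self, event):
        symbol = event["symbol"]
        self.total[symbol] += event["price"]
        self.count[symbol] += 1
        # The derived (complex) event summarises everything seen so far.
        return {"symbol": symbol, "avgPrice": self.total[symbol] / self.count[symbol]}
```

Only the last class remembers anything between calls, which is the sense in which CEP is stateful and the other two levels are not.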

Inverted Database
RDBMS (queries run against data):
• Data is ‘static’
• Queries are ‘dynamic’
CEP (events run against queries):
• Data (event) is ‘dynamic’
• Queries are ‘static’
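The inversion can be sketched as follows: queries are registered once and stay put, while events stream past them. The class and method names are hypothetical and chosen only for this illustration.

```python
class ContinuousQueryEngine:
    """Holds standing queries; each arriving event is pushed through all of them."""
    def __init__(self):
        self.queries = []            # the 'static' part: registered once

    def register(self, predicate, on_match):
        self.queries.append((predicate, on_match))

    def push(self, event):           # the 'dynamic' part: data flows past the queries
        for predicate, on_match in self.queries:
            if predicate(event):
                on_match(event)

alerts = []
engine = ContinuousQueryEngine()
engine.register(lambda e: e["price"] > 10.0, alerts.append)
for price in (9.0, 12.0, 9.5):
    engine.push({"symbol": "ORCL", "price": price})
print(alerts)   # [{'symbol': 'ORCL', 'price': 12.0}]
```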

EPTS and Standards
• Event Processing Technical Society (EPTS)
• Defines glossary
• http://www.ep-ts.com/component/option,com_docman/task,cat_view/gid,16/Itemid,84/
• Steering committee:
• Opher Etzion (IBM), Louis Lovas (Apama), David Luckham (Stanford), Alan Lundberg (TIBCO), John Morrell (SAP Coral8), Roy Schulte (Gartner), Richard Tibbetts (Streambase), Alexandre Alves (Oracle)
• Participation at DEBS
• ANSI SQL Standards Proposal for CQL Pattern Matching
• Oracle, IBM, Stanford University
• OpenSource Adoption of CQL (Swiss University)

CEP Models

CEP Languages
• Language styles: inference rules, ECA, state-oriented, script-oriented, agent-oriented, SQL idioms
• Products shown: TIBCO, Apama, RuleCore, AgentLogic, Streambase, IBM (AptSoft), Oracle CEP
Source: EPTS/DEBS Tutorial 2009

Application Model
• Not JEE!
[Figure labels: event sources, event sinks, contextual data, STREAM, RELATION]

Application Model
• Not JEE!
• Event Processing Network (EPN)
• Non-rooted directed graph describing event flow from event sources to event sinks
• References to contextual static data (e.g. table, cache, HDFS)
• Intermediate nodes:
• Process events (CQL processor, Java Event-Beans)
• Stage or route processing (channels)
• Edge nodes:
• Adapters (e.g. JMS, HTTP pub/sub JSON)
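An EPN can be pictured as a directed graph whose nodes each handle an event and forward the result downstream. The sketch below is a toy model under that assumption; the `EPN` class, node names, and handlers are invented for illustration and are not the Oracle Event Processing assembly format. It assumes the graph has no cycles.

```python
from collections import defaultdict

class EPN:
    """Directed graph of named nodes; each node transforms or drops an event."""
    def __init__(self):
        self.handlers = {}
        self.edges = defaultdict(list)

    def add_node(self, name, handler):
        self.handlers[name] = handler

    def connect(self, upstream, downstream):
        self.edges[upstream].append(downstream)

    def send(self, name, event):
        result = self.handlers[name](event)
        if result is None:           # the node filtered the event out (or is a sink)
            return
        for downstream in self.edges[name]:
            self.send(downstream, result)

delivered = []
epn = EPN()
epn.add_node("adapter", lambda raw: {"symbol": raw[0], "price": raw[1]})  # edge node
epn.add_node("channel", lambda e: e)                                      # staging
epn.add_node("processor", lambda e: e if e["price"] > 10.0 else None)     # processing
epn.add_node("sink", delivered.append)                                    # event sink
for upstream, downstream in [("adapter", "channel"), ("channel", "processor"),
                             ("processor", "sink")]:
    epn.connect(upstream, downstream)

epn.send("adapter", ("ORCL", 12.0))
epn.send("adapter", ("ORCL", 9.0))
print(delivered)   # [{'symbol': 'ORCL', 'price': 12.0}]
```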

Application Model
• Event models:
• STREAM (append-only, unbounded)
• RELATION (insert/delete, bounded)
• Event formats:
• Java Class
• Map (key-value pairs)
• XML
• Timing models:
• system timestamped
• application timestamped
[Figure: example EPN with adapters, an event source, a data source, a cache, processors holding queries and rules, and listeners (POJO, ALSB)]

Application Model
• EVENT
• Defined by a schema: event-type
• Tuple of event properties
• Example: StockEventType event properties
symbol | string
lastBid | float
lastAsk | float

Application Model
• STREAM
• Time-ordered sequence of events
• APPEND-only
• One cannot remove events, just add them to the sequence
• Unbounded
• There is no end to the sequence
{event1, event2, event3, event4, …, eventN}
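As a rough sketch of these definitions, an event can be modelled as an immutable tuple of typed properties and a STREAM as a container that only supports appending in time order. The class names are assumptions; the property names follow the StockEventType example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StockEvent:
    """Event type: a fixed tuple of named, typed properties."""
    symbol: str
    lastBid: float
    lastAsk: float

class Stream:
    """Append-only, time-ordered sequence of (timestamp, event) pairs."""
    def __init__(self):
        self._items = []

    def append(self, timestamp, event):
        if self._items and timestamp < self._items[-1][0]:
            raise ValueError("events must arrive in time order")
        self._items.append((timestamp, event))

    def __iter__(self):
        return iter(self._items)     # reading only; there is no remove()
```

A real stream is unbounded, so an implementation would not keep every event in memory as this list does; windows are what make the stream finite enough to query.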

Application Model
• STREAM
• Examples:
• {{1s, event1}, {2s, event2}, {4s, event3}}
[Figure labels: STREAM, EVENT CLOUD]

Application Model
• RELATION
• Bag of events at some instantaneous time T
• Allows INSERT, DELETE, and UPDATE
• Example:
• At T=1: {{event1}, {event2}, {event3}}
• At T=2: {{event1}, {event3}, {event4}}
• No changes to event1 and event3
• Event2 was deleted
• Event4 was inserted
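The T=1 to T=2 example can be reproduced by treating each snapshot of the relation as a bag and diffing two consecutive snapshots. This is a minimal sketch; the function name `relation_diff` is made up for the illustration.

```python
from collections import Counter

def relation_diff(before, after):
    """Return (inserted, deleted) events between two snapshots of a relation (bags)."""
    before_bag, after_bag = Counter(before), Counter(after)
    inserted = sorted((after_bag - before_bag).elements())
    deleted = sorted((before_bag - after_bag).elements())
    return inserted, deleted

at_t1 = ["event1", "event2", "event3"]
at_t2 = ["event1", "event3", "event4"]
print(relation_diff(at_t1, at_t2))   # (['event4'], ['event2'])
```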

Event Processing Language: CQL
• High-level descriptive language for EP, dynamically changeable
• Continuous and incremental
• Driven by time and events, incremental calculations
• Leverages SQL principles/implementation, and extends them with a formal STREAM calculus
• Based on the STREAM project at Stanford
[Figure labels: continuous; Stream-Relational Algebra; Define Window of Events; Control Rate of Event Output]

Stream-relation Window Operator
SELECT AVG(price)
FROM marketFeed [RANGE 1 MINUTE]
Time (in secs) | Input event | Output event
00 | ∅ | {AVG(price) = 0.0}
01 | {symbol = “aaa”, price = 4.0} | {AVG(price) = 4.0}
10 | {symbol = “bbb”, price = 2.0} | {AVG(price) = 3.0}
61 | ∅ | {AVG(price) = 3.5}
70 | ∅ | {AVG(price) = 5.0}
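A time-based sliding window can be sketched in Python as a queue from which old events expire as time advances. This is an illustration only, not how the CQL engine is implemented. Two assumptions are made here: an event expires exactly `range_secs` after it arrived, and an empty window reports 0.0.

```python
from collections import deque

class RangeWindowAvg:
    """Average of prices whose timestamp lies in (now - range_secs, now]."""
    def __init__(self, range_secs=60):
        self.range_secs = range_secs
        self.window = deque()        # (timestamp, price), oldest first
        self.total = 0.0

    def advance(self, now, price=None):
        """Move time forward to `now`, optionally adding an event that arrives then."""
        if price is not None:
            self.window.append((now, price))
            self.total += price
        # Expire events that have slid out of the window.
        while self.window and self.window[0][0] <= now - self.range_secs:
            _, old = self.window.popleft()
            self.total -= old
        # Assumption: an empty window reports 0.0.
        return self.total / len(self.window) if self.window else 0.0

avg = RangeWindowAvg(range_secs=60)
print(avg.advance(0))          # 0.0
print(avg.advance(1, 4.0))     # 4.0
print(avg.advance(10, 2.0))    # 3.0
```

Keeping a running total makes each update incremental: work is proportional to the number of events entering or leaving the window, not to the window size.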

Stream-relation Window Operator
• Window variations:
• Sliding
• Jumping (batching)
• Partitioned
• User-defined windows
• Time-based
• Tuple-based
• Value windows
• CurrentHour (left edge is fixed, and right edge moves)
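Two of the variations are easy to sketch: a partitioned, tuple-based window that keeps the last N rows per key, and a jumping window that emits non-overlapping batches. The names below are hypothetical, and the behaviour shown is one plausible reading of the terms rather than the exact CQL semantics.

```python
from collections import defaultdict, deque

class PartitionedRowsWindow:
    """Keeps the last `rows` events per partition key (tuple-based, partitioned)."""
    def __init__(self, rows, key):
        self.key = key
        self.partitions = defaultdict(lambda: deque(maxlen=rows))

    def insert(self, event):
        self.partitions[self.key(event)].append(event)   # oldest row drops out

    def contents(self, partition):
        return list(self.partitions[partition])

def jumping_batches(events, size):
    """Jumping (batching) window: non-overlapping batches of `size` events;
    a trailing incomplete batch is not emitted."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, size)]

print(jumping_batches([1, 2, 3, 4, 5], 2))   # [[1, 2], [3, 4]]
```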

Relation-stream operators
ISTREAM (SELECT AVG(price)
FROM marketFeed [RANGE 1 MINUTE])
Time | Input event | WINDOW output | ISTREAM output
00 | ∅ | +{AVG(price) = 0.0} | +{AVG(price) = 0.0}
01 | +{price = 4.0} | -{AVG(price) = 0.0}, +{AVG(price) = 4.0} | +{AVG(price) = 4.0}
61 | ∅ | -{AVG(price) = 3.6}, +{AVG(price) = 3.5} | +{AVG(price) = 3.5}
Further ISTREAM outputs shown without their time and input cells: +{AVG(price) = 3.0}, +{AVG(price) = 3.6}, +{AVG(price) = 5.0}, +{AVG(price) = 5.5}
DSTREAM (SELECT AVG(price)
FROM marketFeed [RANGE 1 MINUTE])
Time | Input event | WINDOW output | DSTREAM output
00 | ∅ | +{AVG(price) = 0.0} | ∅
Further values shown without their time, input, and column cells: +{AVG(price) = 4.0}, +{AVG(price) = 3.0}, +{AVG(price) = 3.6}, +{AVG(price) = 3.5}, +{AVG(price) = 5.0}, +{AVG(price) = 5.5}
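One way to read the two operators: ISTREAM emits what was inserted into the relation between two consecutive states, DSTREAM emits what was deleted. The sketch below assumes that reading and represents each relation state as a plain list; the helper names mirror the operators but are not an API.

```python
from collections import Counter

def istream(previous, current):
    """Events present in the current relation state but not in the previous one."""
    return sorted((Counter(current) - Counter(previous)).elements())

def dstream(previous, current):
    """Events present in the previous relation state but not in the current one."""
    return sorted((Counter(previous) - Counter(current)).elements())

# Successive states of the one-row relation produced by SELECT AVG(price) ...
states = [[], [0.0], [4.0], [3.0]]
for previous, current in zip(states, states[1:]):
    print("ISTREAM", istream(previous, current), "DSTREAM", dstream(previous, current))
```

For the first transition this prints an inserted 0.0 and nothing deleted, matching the time-00 rows of both tables.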

Pattern Matching
• Detect complex relationships amongst events
• State-machine model
• ANSI standards proposal
• IBM, Oracle, Streambase
• Starting to see adoption by other vendors/users (e.g.
MySQL) [1]

Pattern Matching
SELECT M.up
FROM ticker
MATCH_RECOGNIZE (
MEASURES
B.price as up,
A.price as down
PATTERN (A B)
DEFINE
A as price < 10.0,
B as price >= 10.0
) as M
Input event | Output event
+{symbol = ‘ORCL’, price = 9.0} | ∅
+{symbol = ‘ORCL’, price = 9.5} | ∅
+{symbol = ‘ORCL’, price = 12.0} | +{M.up = 12.0}
[Figure: state machine with states A and B; price=9.0 and price=9.5 are matched as A, price=12.0 as B (up=12.0)]

Event Processing Ecosystem
[Figure: events enter and leave via JMS and HTTP pub/sub; an IDE deploys to the OEP Server, which is managed through the Visualizer web console / BAM; contextual data comes from RDBMS, cache, Hadoop, and NoSqlDb]

Summary
• Event Processing Network defines the assembly
• CQL defines the processing
• STREAM vs RELATION
• RELATION can be any relational source:
• tables, caches, Hadoop HDFS files, etc.

Agenda
• CEP
• Drivers
• Big Data
• Scenarios
• Architecture
• Fast Data
• Architecture
• Data Mining
• Scenarios

Big Data Scenarios
• MEDIA/ENTERTAINMENT: Viewers / advertising effectiveness; Cross Sell
• COMMUNICATIONS: Location-based advertising
• EDUCATION & RESEARCH: Experiment sensor analysis
• Retail / CPG: Sentiment analysis; Hot products; Optimized Marketing
• HEALTH CARE: Patient sensors, monitoring, EHRs; Quality of care
• LIFE SCIENCES: Clinical trials; Genomics
• HIGH TECHNOLOGY / INDUSTRIAL MFG.: Mfg quality; Warranty analysis
• OIL & GAS: Drilling exploration sensor analysis
• FINANCIAL SERVICES: Risk & portfolio analysis; New products
• AUTOMOTIVE: Auto sensors reporting location, problems
• Games: Adjust to player behavior; In-Game Ads
• LAW ENFORCEMENT & DEFENSE: Threat analysis - social media monitoring, photo analysis
• TRAVEL & TRANSPORTATION: Sensor analysis for optimal traffic flows; Customer sentiment
• UTILITIES: Smart Meter analysis for network capacity,
• ON-LINE SERVICES / SOCIAL MEDIA: People & career matching; Web-site optimization

What’s Big Data?
• VOLUME
• VELOCITY
• VARIETY
• VALUE
[Figure labels: binary data, Web, SMS]

Big Data Architecture (Map-Reduce)
• Data: big, immutable (append-only, avoids corruption)
• Batch-Layer (e.g. Hadoop): batch views, query = function(data)
• map: batch input to (key1, value1), (key2, value2), (key1, value3), (key2, value4), (key1, value5)
• reduce: key1, {value1, value3, value5}; key2, {value2, value4}
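The map and reduce steps in the diagram can be imitated in memory with a few lines of Python. This is a single-process sketch of the idea only; it does not use Hadoop, and the `map_reduce` helper and sample records are assumptions.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)          # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}

records = [("key1", 1), ("key2", 2), ("key1", 3), ("key2", 4), ("key1", 5)]
grouped = map_reduce(records, lambda r: [r], lambda k, vs: vs)
totals = map_reduce(records, lambda r: [r], lambda k, vs: sum(vs))
print(grouped)   # {'key1': [1, 3, 5], 'key2': [2, 4]}
print(totals)    # {'key1': 9, 'key2': 6}
```

The `grouped` result has the same shape as the reduce input in the diagram: key1 with three values, key2 with two.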

When is CEP needed?
• If Big Data is about VVV (volume, variety, velocity),
then Stream Processing is needed when at least 2 of
the 3 V’s are present.
• If there is high volume and low-latency is needed (velocity),
then stream processing must be done.
• If there is NOT a lot of volume, but the data is semi-structured
(variety), such as the case of social feeds, and low-latency is
needed, then stream processing must still be applied.
• If volume is low and there is no need to process it fast, then use regular
messaging technology, such as JMS.
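The rule of thumb can be written down as a small decision function. The function name and return strings are invented for this sketch, and the combinations the slide does not address (a single V other than variety) are reported as such rather than guessed.

```python
def recommend(volume, variety, velocity):
    """Apply the 'at least 2 of the 3 V's' rule of thumb to boolean inputs."""
    if sum([volume, variety, velocity]) >= 2:
        return "stream processing"
    if not volume and not velocity:
        return "regular messaging (e.g. JMS)"
    return "not covered by the rule of thumb"

print(recommend(volume=True, variety=False, velocity=True))    # stream processing
print(recommend(volume=False, variety=True, velocity=True))    # stream processing
print(recommend(volume=False, variety=False, velocity=False))  # regular messaging (e.g. JMS)
```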

CEP with Big Data

Agenda
• CEP
• Drivers
• Big Data
• Scenarios
• Architecture
• Fast Data
• Architecture
• Data Mining
• Scenarios

Big Data Architecture Limitations
• Data: big, immutable (append-only, avoids corruption)
• Batch-Layer (e.g. Hadoop): batch views
• map: batch input to (key1, value1), (key2, value2), (key1, value3), (key2, value4), (key1, value5)
• reduce: key1, {value1, value3, value5}; key2, {value2, value4}
• Batch output: deep, but not real-time

Fast Data Architecture
• Data: big, immutable (append-only, avoids corruption)
• Batch-Layer (e.g. Hadoop): batch views
• Indexing-Layer (e.g. ElephantDB, Cassandra, NoSqlDB): indexed batch views
• Fast-Layer (e.g. CEP, Storm): real-time views + inc-update
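A query in this architecture combines the indexed batch view with the real-time view maintained by the fast layer. The sketch below models both views as counters; the class name and the choice to reset the real-time view on every batch load are simplifying assumptions.

```python
from collections import Counter

class FastDataView:
    """Answers queries by merging an indexed batch view with a real-time view."""
    def __init__(self):
        self.batch_view = Counter()      # rebuilt periodically by the batch layer
        self.realtime_view = Counter()   # incrementally updated by the fast layer

    def on_event(self, key, amount=1):
        self.realtime_view[key] += amount            # inc-update

    def load_batch_view(self, counts):
        self.batch_view = Counter(counts)
        # Assumption: the batch run covers every event seen so far.
        self.realtime_view.clear()

    def query(self, key):
        return self.batch_view[key] + self.realtime_view[key]

view = FastDataView()
view.load_batch_view({"ORCL": 100})
view.on_event("ORCL")
view.on_event("ORCL", 2)
print(view.query("ORCL"))   # 103
```

In practice the batch run lags behind, so only the events it actually covered would be dropped from the real-time view; the sketch ignores that bookkeeping.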

Fast Data with CEP
• Integration with other Big Data technologies:
• HBase
• Hive
• Avro (Flume)
• Incremental merge of Hadoop jobs with OEP queries
• Saves developers from having to write their own Hadoop jobs

Agenda
• CEP
• Drivers
• Big Data
• Scenarios
• Architecture
• Fast Data
• Architecture
• Data Mining
• Scenarios

Data Mining
• Identify patterns and relationships in the real world
• Develop descriptive models of datasets
• Use models to evaluate future options, risks, and decisions

Data Mining
• World (population) to Data-Set (sample) to Model
• Model building: statistical summaries, regressions, machine-learning
• Data + Model to Prediction
• Steps: (1) TRAIN, (2) SCORE, (3) RE-TRAIN

Online Data Mining
• Events are scored continuously against a model
• Example: predict if price of next event will be above 0.8 using model
• Model Repository: export model, rebuild model
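The train, score, re-train loop can be sketched with a deliberately trivial model: the mean of recent prices. Everything here (class name, the mean as "model", the re-training schedule) is an assumption chosen to keep the example short; a real deployment would export a proper model from a data-mining tool.

```python
from collections import deque
from statistics import mean

class OnlinePricePredictor:
    """Scores each event with the current model and rebuilds the model periodically."""
    def __init__(self, history=100, retrain_every=10):
        self.recent = deque(maxlen=history)
        self.retrain_every = retrain_every
        self.seen = 0
        self.model = None                # here the 'model' is just a mean price

    def train(self):
        self.model = mean(self.recent)   # (1) TRAIN and (3) RE-TRAIN

    def score(self, threshold=0.8):
        """(2) SCORE: predict whether the next price will be above threshold."""
        return self.model is not None and self.model > threshold

    def on_event(self, price):
        prediction = self.score()        # score with the model built so far
        self.recent.append(price)
        self.seen += 1
        if self.seen % self.retrain_every == 0:
            self.train()
        return prediction
```

Scoring happens on every event and is cheap; rebuilding the model is the expensive step, so it runs only every `retrain_every` events.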

Challenges (Right Model, Right Cost)
• Induction: Data to Model
• Deduction: Model to Data
• Models shown: k-Nearest-Neighbors, Decision trees, Neural nets/SVM
[Figure axes: Increased Compression, Computational Cost]

Challenges
• “All models are wrong, but some are useful” (George Box)
• Central Limit Theorem
• Means of sufficiently large random samples of the same population will be
approximately normally distributed (even if the data is not normally
distributed)
• However, all bets are off if the samples are not from the same population
• Consider a regression function of weight -> height
• Will not work if the model is built using samples from a city bus
and scored on a bus containing only basketball players
• What confidence level to use?
• Scientific papers demand a 95% confidence level. What
about streaming use-cases? 95% seems too high...
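The bus example can be made concrete with a least-squares fit. The numbers below are made up purely for illustration (weight in kg, height in cm); the point is only that a model trained on one population scores poorly on another.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

def mean_abs_error(model, xs, ys):
    """Average absolute difference between predicted and actual y."""
    intercept, slope = model
    return sum(abs(intercept + slope * x - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical samples: city-bus passengers vs. a bus of basketball players.
bus_weights, bus_heights = [55, 65, 75, 85, 95], [160, 168, 174, 181, 187]
player_weights, player_heights = [90, 95, 100, 105], [198, 203, 206, 211]

model = fit_line(bus_weights, bus_heights)
in_sample = mean_abs_error(model, bus_weights, bus_heights)
out_of_population = mean_abs_error(model, player_weights, player_heights)
print(in_sample, out_of_population)   # the second error is far larger
```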

Material
• alex.alves@oracle.com
• http://www.oracle.com/technetwork/middleware/complex-event-processing/overview/index.html
• http://adcalves.wordpress.com
• http://www.packtpub.com/getting-started-with-oracle-event-processing-11g/book
