Speeding up big data with event processing

Published in: Technology

Transcript

  • 1. Speeding-up Big Data with Event Processing • Alexandre de Castro Alves • Thursday, July 18, 13
  • 2. Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Disclaimers • The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  • 3. Agenda • CEP (Drivers, Formal description) • Big Data (Scenarios, Architecture, Integration with CEP) • Fast Data (Architecture, Integration with CEP) • Predictive Analytics (Data Mining, Online data mining, Scenarios)
  • 4. Event-Driven Applications • Industries: Financial Services; Transportation & Logistics; Public Sector & Military; Manufacturing; Utilities & Insurance; Telecommunications & Services • Example applications: algorithmic trading; asset management; distributed order orchestration; ‘Negative Working Capital’ inventory management; grid infrastructure management; responses to calamities (earthquake, flooding); proximity/location tracking; intrusion detection systems; military asset allocation
  • 5. Business Drivers & Enablers • Exploding volume of digital event data: the cost of sensors and computing power has dropped, and network capacity has increased • Accelerating business processes: “the pace of business has increased, the world is changing faster, and competition is getting tougher” (Roy Schulte, VP Gartner Analyst) • "Event-driven systems are intrinsically smart because they are context-aware and run when they detect changes in the business world rather than occurring on a simple schedule or requiring someone to tell them when to run." (K. Mani Chandy, Simon Ramo Professor at the California Institute of Technology in Pasadena)
  • 6. Event Processing Taxonomy • Event passing: events are exchanged, but not processed; simple pub-sub applications; example: JMS • Event mediation (brokering): events are filtered, routed, and enriched, but processing is not stateful; example: ESB • Complex Event Processing: events are aggregated and new complex events are created; extremely stateful
  • 7. Inverted Database • RDBMS: data is ‘static’, queries are ‘dynamic’ • CEP: data (events) is ‘dynamic’, queries are ‘static’
  • 8. EPTS and Standards • Event Processing Technical Society (EPTS) • Defines glossary: http://www.ep-ts.com/component/option,com_docman/task,cat_view/gid,16/Itemid,84/ • Steering committee: Opher Etzion (IBM), Louis Lovas (Apama), David Luckham (Stanford), Alan Lundberg (TIBCO), John Morrell (SAP Coral8), Roy Schulte (Gartner), Richard Tibbetts (Streambase), Alexandre Alves (Oracle) • Participation at DEBS • ANSI SQL standards proposal for CQL pattern matching: Oracle, IBM, Stanford University • Open-source adoption of CQL (Swiss university)
  • 9. CEP Models
  • 10. CEP Languages • Language styles: inference rules; ECA; state-oriented; script-oriented; agent-oriented; SQL idioms • Products shown: TIBCO, Apama, RuleCore, AgentLogic, Streambase, IBM (AptSoft), Oracle CEP (shown twice) • Source: EPTS/DEBS Tutorial 2009
  • 11. Application Model (diagram labels: event sources, event sinks, STREAM, RELATION, contextual data, “NOT JEE!”)
  • 12. Application Model (NOT JEE!) • Event Processing Network (EPN): non-rooted directed graph describing event flow from event sources to event sinks • References to contextual static data (e.g. table, cache, HDFS) • Intermediate nodes: process events (CQL processor, Java Event-Beans); stage or route processing (channels) • Edge nodes: adapters (e.g. JMS, HTTP pub/sub JSON)
  • 13. Application Model • Event models: STREAM (append-only, unbounded); RELATION (insert/delete, bounded) • Event formats: Java class; Map (key-value pairs); XML • Timing models: system timestamped; application timestamped • (Diagram labels: event source, data source, adapters, processors with queries and rules, cache, listeners: POJO and ALSB)
  • 14. Application Model • EVENT • Defined by a schema: event-type • Tuple of event properties • Example: StockEventType with event properties symbol (string), lastBid (float), lastAsk (float)
  • 15. Application Model • STREAM • Time-ordered sequence of events • APPEND-only: one cannot remove events, just add them to the sequence • Unbounded: there is no end to the sequence {event1, event2, event3, event4, …, eventN}
  • 18. Application Model • STREAM examples: • {{1s, event1}, {2s, event2}, {4s, event3}} is a STREAM (events are in time order) • {{1s, event1}, {4s, event2}, {2s, event3}} is an EVENT CLOUD (events are not in time order)
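The difference between the two example sequences on this slide can be checked mechanically. The following is a minimal sketch, assuming events are represented as (timestamp in seconds, payload) tuples; `is_time_ordered` and the `Stream` class are hypothetical names for illustration, not part of any Oracle API.

```python
from typing import List, Tuple

# Assumed representation: (timestamp in seconds, payload).
Event = Tuple[int, str]


def is_time_ordered(events: List[Event]) -> bool:
    """Return True when timestamps never decrease along the sequence."""
    return all(a[0] <= b[0] for a, b in zip(events, events[1:]))


class Stream:
    """Append-only, time-ordered sequence of events (hypothetical helper)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # Appending is the only mutation; out-of-order events are rejected.
        if self._events and event[0] < self._events[-1][0]:
            raise ValueError("event is older than the last event in the stream")
        self._events.append(event)

    def events(self) -> List[Event]:
        # Return a copy so callers cannot remove events from the stream.
        return list(self._events)


first = [(1, "event1"), (2, "event2"), (4, "event3")]
second = [(1, "event1"), (4, "event2"), (2, "event3")]
print(is_time_ordered(first))   # True
print(is_time_ordered(second))  # False
```

The `Stream` class has no delete or update method on purpose, mirroring the append-only property; there is also no notion of "the end" of the sequence.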
  • 19. Application Model • RELATION • Bag of events at some instantaneous time T • Allows for INSERT, DELETE, and UPDATE • Example: • At T=1: {{event1}, {event2}, {event3}} • At T=2: {{event1}, {event3}, {event4}} • No changes to event1 and event3 • event2 was deleted • event4 was inserted
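The T=1 to T=2 example can be reproduced by comparing the two bags. This is a small sketch under the assumption that events are hashable values and a relation snapshot is a plain list; `relation_delta` is a hypothetical helper name.

```python
from collections import Counter
from typing import Hashable, List, Tuple


def relation_delta(before: List[Hashable],
                   after: List[Hashable]) -> Tuple[list, list]:
    """Compare two snapshots of a relation (a bag) and return (inserted, deleted)."""
    old, new = Counter(before), Counter(after)
    inserted = sorted((new - old).elements())  # present at T=2 but not at T=1
    deleted = sorted((old - new).elements())   # present at T=1 but not at T=2
    return inserted, deleted


at_t1 = ["event1", "event2", "event3"]
at_t2 = ["event1", "event3", "event4"]
inserted, deleted = relation_delta(at_t1, at_t2)
print(inserted)  # ['event4']
print(deleted)   # ['event2']
```

`Counter` is used rather than `set` because a relation is a bag, so the same event may appear more than once.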
  • 20. Event Processing Language: CQL • High-level descriptive language for EP, dynamically changeable • Continuous and incremental: driven by time and events, incremental calculations • Leverages SQL principles/implementation, and extends it with formal STREAM calculus • Based on the STREAM project at Stanford • (Diagram labels: continuous input, continuous output, stream-relational algebra, define window of events, control rate of event output)
  • 21. Stream-relation Window Operator • Query: SELECT AVG(price) FROM marketFeed [RANGE 1 MINUTE]
      Time (in secs) | Input event                   | Output event
      00             | ∅                             | {AVG(price) = 0.0}
      01             | {symbol = "aaa", price = 4.0} | {AVG(price) = 4.0}
      10             | {symbol = "bbb", price = 2.0} | {AVG(price) = 3.0}
      59             | {symbol = "aaa", price = 5.0} | {AVG(price) = 3.67}
      61             | ∅                             | {AVG(price) = 3.5}
      70             | ∅                             | {AVG(price) = 5.0}
      80             | {symbol = "aaa", price = 6.0} | {AVG(price) = 5.5}
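The table on this slide can be replayed with a few lines of code. The sketch below is an illustration of a time-based range window, not Oracle's CQL engine: `RangeWindowAvg` is a hypothetical class, an event inserted at time t is assumed to expire at t + range (which is what the rows at 61 s and 70 s imply), and an empty window is reported as 0.0 as on the slide. The value at 59 s is 11/3, about 3.67.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class RangeWindowAvg:
    """Average of prices over a sliding time range (hypothetical sketch)."""

    def __init__(self, range_secs: int) -> None:
        self.range_secs = range_secs
        self.window: Deque[Tuple[int, float]] = deque()  # (timestamp, price)

    def advance(self, now: int) -> None:
        # Expire events that have been in the window for the full range.
        while self.window and self.window[0][0] <= now - self.range_secs:
            self.window.popleft()

    def insert(self, now: int, price: float) -> None:
        self.advance(now)
        self.window.append((now, price))

    def avg(self) -> float:
        # The slide reports 0.0 for an empty window; this sketch follows it.
        if not self.window:
            return 0.0
        return sum(price for _, price in self.window) / len(self.window)


def replay(timeline: List[Tuple[int, Optional[float]]],
           range_secs: int = 60) -> List[Tuple[int, float]]:
    """Feed (time, price or None) pairs and collect the average at each time."""
    window = RangeWindowAvg(range_secs)
    out = []
    for now, price in timeline:
        if price is None:
            window.advance(now)   # time passes, no input event
        else:
            window.insert(now, price)
        out.append((now, window.avg()))
    return out


timeline = [(0, None), (1, 4.0), (10, 2.0), (59, 5.0),
            (61, None), (70, None), (80, 6.0)]
for now, average in replay(timeline):
    print(f"{now:02d}s AVG(price) = {average:.2f}")
```

Note that the output changes at 61 s and 70 s even though no event arrives: the passage of time alone expires old events, which is why the query is described as driven by both time and events.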
  • 22. Stream-relation Window Operator • Window variations: • Sliding • Jumping (batching) • Partitioned • User-defined windows • Time-based • Tuple-based • Value windows • CurrentHour (left edge is fixed, and right edge moves)
  • 23. Relation-stream operators
  • 24. Relation-stream operators
      ISTREAM (SELECT AVG(price) FROM marketFeed [RANGE 1 MINUTE])
      Time | Input event    | WINDOW output                               | ISTREAM output
      00   | ∅              | +{AVG(price) = 0.0}                         | +{AVG(price) = 0.0}
      01   | +{price = 4.0} | -{AVG(price) = 0.0}, +{AVG(price) = 4.0}    | +{AVG(price) = 4.0}
      10   | +{price = 2.0} | -{AVG(price) = 4.0}, +{AVG(price) = 3.0}    | +{AVG(price) = 3.0}
      59   | +{price = 5.0} | -{AVG(price) = 3.0}, +{AVG(price) = 3.67}   | +{AVG(price) = 3.67}
      61   | ∅              | -{AVG(price) = 3.67}, +{AVG(price) = 3.5}   | +{AVG(price) = 3.5}
      70   | ∅              | -{AVG(price) = 3.5}, +{AVG(price) = 5.0}    | +{AVG(price) = 5.0}
      80   | +{price = 6.0} | -{AVG(price) = 5.0}, +{AVG(price) = 5.5}    | +{AVG(price) = 5.5}
      DSTREAM (SELECT AVG(price) FROM marketFeed [RANGE 1 MINUTE])
      Time | Input event    | WINDOW output                               | DSTREAM output
      00   | ∅              | +{AVG(price) = 0.0}                         | ∅
      01   | +{price = 4.0} | -{AVG(price) = 0.0}, +{AVG(price) = 4.0}    | +{AVG(price) = 0.0}
      10   | +{price = 2.0} | -{AVG(price) = 4.0}, +{AVG(price) = 3.0}    | +{AVG(price) = 4.0}
      59   | +{price = 5.0} | -{AVG(price) = 3.0}, +{AVG(price) = 3.67}   | +{AVG(price) = 3.0}
      61   | ∅              | -{AVG(price) = 3.67}, +{AVG(price) = 3.5}   | +{AVG(price) = 3.67}
      70   | ∅              | -{AVG(price) = 3.5}, +{AVG(price) = 5.0}    | +{AVG(price) = 3.5}
      80   | +{price = 6.0} | -{AVG(price) = 5.0}, +{AVG(price) = 5.5}    | +{AVG(price) = 5.0}
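Both operators turn changes of a relation back into a stream: ISTREAM emits what was inserted between two consecutive states of the relation, DSTREAM emits what was deleted. A minimal sketch, assuming each relation state is a list of hashable values; `istream` and `dstream` here are hypothetical Python helpers named after the operators, not the CQL implementation.

```python
from collections import Counter
from typing import Hashable, List


def istream(previous: List[Hashable], current: List[Hashable]) -> list:
    """Elements inserted into the relation since the previous state."""
    return list((Counter(current) - Counter(previous)).elements())


def dstream(previous: List[Hashable], current: List[Hashable]) -> list:
    """Elements deleted from the relation since the previous state."""
    return list((Counter(previous) - Counter(current)).elements())


# First three states of the AVG(price) relation from the slide (times 00, 01, 10).
states = [[0.0], [4.0], [3.0]]

previous: list = []  # the relation is empty before time 00
for state in states:
    print("ISTREAM:", istream(previous, state), "DSTREAM:", dstream(previous, state))
    previous = state
# ISTREAM: [0.0] DSTREAM: []
# ISTREAM: [4.0] DSTREAM: [0.0]
# ISTREAM: [3.0] DSTREAM: [4.0]
```

Because the aggregate relation always holds exactly one row, every change shows up as one deletion (the old average) and one insertion (the new average), which is why the DSTREAM output lags the ISTREAM output by one step.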
  • 25. Pattern Matching • Detect complex relationships amongst events • State-machine model • ANSI standards proposal: IBM, Oracle, Streambase • Starting to see adoption by other vendors/users (e.g. MySQL) [1]
  • 26. Pattern Matching
      SELECT M.up FROM ticker MATCH_RECOGNIZE ( MEASURES B.price as up, A.price as down PATTERN (A B) DEFINE A as price < 10.0, B as price >= 10.0 ) as M
      Input event                      | Output event
      +{symbol = 'ORCL', price = 9.0}  | ∅
      +{symbol = 'ORCL', price = 9.5}  | ∅
      +{symbol = 'ORCL', price = 12.0} | +{M.up = 12.0}
      (State diagram labels: A, A, B; price=9.0, price=9.5, price=12.0; up=12.0)
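The behaviour of PATTERN (A B) on the three ORCL events can be imitated with a scan over adjacent events. This is a simplified sketch, not the MATCH_RECOGNIZE implementation: it assumes A means a price below the threshold and B a price of at least the threshold, and that A and B must be consecutive events; `match_a_then_b` is a hypothetical name.

```python
from typing import Dict, List, Optional


def match_a_then_b(prices: List[float],
                   threshold: float = 10.0) -> List[Dict[str, float]]:
    """Find every adjacent pair (A, B) with A.price < threshold <= B.price."""
    matches = []
    previous: Optional[float] = None
    for price in prices:
        # A is the previous event, B is the current one.
        if previous is not None and previous < threshold <= price:
            matches.append({"down": previous, "up": price})
        previous = price
    return matches


print(match_a_then_b([9.0, 9.5, 12.0]))  # [{'down': 9.5, 'up': 12.0}]
```

The first event (9.0) also satisfies A, but it is followed by another A rather than a B, so only the pair (9.5, 12.0) matches and `up` is 12.0, as in the table.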
  • 27. Pattern Matching
  • 28. Event Processing Ecosystem (diagram labels: events in and out over JMS and HTTP pub/sub; OEP Server; contextual data from RDBMS, Cache, Hadoop, NoSqlDb; IDE; Visualizer web console / BAM; deploy; manage)
  • 29. Summary • Event Processing Network defines the assembly • CQL defines the processing • STREAM vs RELATION • RELATION can be any relational source: tables, caches, Hadoop HDFS files, etc.
  • 30. Agenda • CEP (Drivers, Formal description) • Big Data (Scenarios, Architecture, Integration with CEP) • Fast Data (Architecture, Integration with CEP) • Predictive Analytics (Data Mining, Online data mining, Scenarios)
  • 31. Big Data Scenarios • MEDIA/ENTERTAINMENT: viewers / advertising effectiveness, cross sell • COMMUNICATIONS: location-based advertising • EDUCATION & RESEARCH: experiment sensor analysis • RETAIL / CPG: sentiment analysis, hot products, optimized marketing • HEALTH CARE: patient sensors, monitoring, EHRs, quality of care • LIFE SCIENCES: clinical trials, genomics • HIGH TECHNOLOGY / INDUSTRIAL MFG.: mfg quality, warranty analysis • OIL & GAS: drilling exploration sensor analysis • FINANCIAL SERVICES: risk & portfolio analysis, new products • AUTOMOTIVE: auto sensors reporting location, problems • GAMES: adjust to player behavior, in-game ads • LAW ENFORCEMENT & DEFENSE: threat analysis (social media monitoring, photo analysis) • TRAVEL & TRANSPORTATION: sensor analysis for optimal traffic flows, customer sentiment • UTILITIES: smart meter analysis for network capacity • ON-LINE SERVICES / SOCIAL MEDIA: people & career matching, web-site optimization
  • 32. What’s Big Data? • VOLUME • VELOCITY • VARIETY • VALUE • (Figure labels: binary data, Web, SMS)
  • 33. Big Data Architecture (Map-Reduce) • Data: big, immutable (append-only, avoids corruption) • Batch layer (e.g. Hadoop): takes batch input and produces batch views; query = function(data) • map output: (key1, value1), (key2, value2), (key1, value3), (key2, value4), (key1, value5) • reduce input: key1, {value1, value3, value5}; key2, {value2, value4}
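The step between map and reduce on this slide is a grouping of values by key. The sketch below shows that grouping in plain Python as an illustration of the idea only; it does not use Hadoop, and `group_by_key` and `reduce_groups` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[Hashable, object]]) -> Dict[Hashable, List[object]]:
    """Collect the values emitted by map under their key (the shuffle step)."""
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_groups(groups: Dict[Hashable, List[object]],
                  reducer: Callable[[Hashable, List[object]], object]) -> Dict[Hashable, object]:
    """Apply the reducer once per key to the grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


# The map output from the slide.
map_output = [("key1", "value1"), ("key2", "value2"), ("key1", "value3"),
              ("key2", "value4"), ("key1", "value5")]

groups = group_by_key(map_output)
print(groups)
# {'key1': ['value1', 'value3', 'value5'], 'key2': ['value2', 'value4']}

# Example reducer (an assumption for illustration): count values per key.
counts = reduce_groups(groups, lambda key, values: len(values))
print(counts)  # {'key1': 3, 'key2': 2}
```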
  • 34. When is CEP needed? • If Big Data is about VVV (volume, variety, velocity), then stream processing is needed when at least 2 of the 3 V’s are present. • If there is high volume and low latency is needed (velocity), then stream processing must be done. • If there is NOT a lot of volume, but the data is semi-structured (variety), as in the case of social feeds, and low latency is needed, then stream processing must still be applied. • If volume is low, and there is no need to process it fast, then use regular message-processing technology, such as JMS.
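The rule of thumb on this slide can be written down as a small decision function. This is only a sketch of the slide's wording: `recommend` and its boolean parameters are hypothetical, and combinations the slide does not discuss (for example high volume alone) are reported as not covered rather than guessed.

```python
def recommend(high_volume: bool, high_variety: bool, low_latency: bool) -> str:
    """Apply the '2 of the 3 V's' rule of thumb from the slide."""
    present = sum([high_volume, high_variety, low_latency])
    if present >= 2:
        return "stream processing"
    if not high_volume and not low_latency:
        return "regular messaging (e.g. JMS)"
    return "not covered by the rule of thumb"


print(recommend(high_volume=True, high_variety=False, low_latency=True))
print(recommend(high_volume=False, high_variety=True, low_latency=True))
print(recommend(high_volume=False, high_variety=False, low_latency=False))
```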
  • 35. CEP with Big Data
  • 36. Agenda • CEP (Drivers, Formal description) • Big Data (Scenarios, Architecture, Integration with CEP) • Fast Data (Architecture, Integration with CEP) • Predictive Analytics (Data Mining, Online data mining, Scenarios)
  • 37. Big Data Architecture Limitations • Data: big, immutable (append-only, avoids corruption) • Batch layer (e.g. Hadoop): takes batch input and produces batch views; query = function(data) • map output: (key1, value1), (key2, value2), (key1, value3), (key2, value4), (key1, value5) • reduce input: key1, {value1, value3, value5}; key2, {value2, value4} • Batch output: deep, but not real-time
  • 38. Fast Data Architecture • Data: big, immutable (append-only, avoids corruption) • Batch layer (e.g. Hadoop): batch views; query = function(data) • Indexing layer (e.g. ElephantDB, Cassandra, NoSqlDB): indexed batch views; query = function(data) • Fast layer (e.g. CEP, Storm): real-time views; query = function(data) + inc-update
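The relationship between the batch view and the real-time view can be illustrated with a counting query. The sketch below is an assumption-laden toy, not the architecture's actual code: `batch_view`, `RealTimeView`, and `query` are hypothetical names, and the batch view is simply recomputed from all the data while the real-time view is updated incrementally per event.

```python
from collections import Counter
from typing import Iterable


def batch_view(master_data: Iterable[str]) -> Counter:
    """Batch layer: query = function(data), recomputed over the full data set."""
    return Counter(master_data)


class RealTimeView:
    """Fast layer: covers only events that arrived after the last batch run."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, event: str) -> None:
        # Incremental update: touch one counter instead of rescanning the data.
        self.counts[event] += 1


def query(batch: Counter, realtime: RealTimeView, key: str) -> int:
    """Answer a query by merging the batch view with the real-time view."""
    return batch[key] + realtime.counts[key]


batch = batch_view(["login", "purchase", "login"])  # data seen by the last batch run
fast = RealTimeView()
for event in ["login", "logout"]:                   # events since the batch run
    fast.update(event)

print(query(batch, fast, "login"))     # 3
print(query(batch, fast, "logout"))    # 1
print(query(batch, fast, "purchase"))  # 1
```

When the next batch run absorbs the recent events, the real-time view can be reset, so it only ever has to hold a small, recent slice of the data.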
  • 39. Fast Data with CEP • Integration with other Big Data technologies: HBase, Hive, Avro (Flume) • Incremental merge of Hadoop jobs with OEP queries • Spares the developer from having to create their own Hadoop job
  • 40. Agenda • CEP (Drivers, Formal description) • Big Data (Scenarios, Architecture, Integration with CEP) • Fast Data (Architecture, Integration with CEP) • Predictive Analytics (Data Mining, Online data mining, Scenarios)
  • 41. Data Mining • Identify patterns and relationships in the real world • Develop descriptive models of datasets • Use models to evaluate future options, risks and decisions
  • 42. Data Mining • World (population) → Data set (sample) → Model (statistical summaries, regressions, machine learning) • Data → Model → Prediction: (1) TRAIN, (2) SCORE, (3) RE-TRAIN
  • 43. Online Data Mining • (Diagram labels: continuous event input, continuous output, Model, Model Repository, export model, rebuild model, score events) • Example: predict if the price of the next event will be above 0.8 using the model
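The train, score, and re-train cycle applied to a stream of events might look like the following sketch. The model is deliberately trivial (the mean of the prices seen so far) and stands in for whatever model a real deployment would export to the model repository; `MeanModel`, `run_online`, and the `retrain_every` parameter are all hypothetical.

```python
from typing import List


class MeanModel:
    """Toy model: predicts the next price as the mean of the training prices."""

    def __init__(self, prices: List[float]) -> None:
        if not prices:
            raise ValueError("cannot train on an empty data set")
        self.mean = sum(prices) / len(prices)

    def predicts_above(self, threshold: float) -> bool:
        return self.mean > threshold


def run_online(prices: List[float], threshold: float = 0.8,
               retrain_every: int = 3) -> List[bool]:
    """Score each incoming event before seeing it, and rebuild the model periodically."""
    history = list(prices[:retrain_every])
    model = MeanModel(history)                                # (1) TRAIN
    predictions = []
    for count, price in enumerate(prices[retrain_every:], start=1):
        predictions.append(model.predicts_above(threshold))   # (2) SCORE
        history.append(price)
        if count % retrain_every == 0:
            model = MeanModel(history)                        # (3) RE-TRAIN
    return predictions


print(run_online([0.5, 0.5, 0.5, 1.0, 1.0, 1.5, 1.0]))
# [False, False, False, True]
```

The prediction only flips to True after the rebuild, which illustrates the trade-off in choosing how often to re-train: a stale model keeps scoring with an outdated picture of the stream.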
  • 44. Challenges (Right Model, Right Cost) • (Diagram labels: Data, Model, Induction, Deduction; k-Nearest-Neighbors, decision trees, neural nets/SVM; increased compression; computational cost)
  • 45. Challenges • All models are wrong, some are useful (George Box) • Central Limit Theorem: means of random samples of the same population will be normally distributed (even if the data is not normally distributed) • However, all bets are off if the samples are not from the same population • Consider a regression function of weight -> height: it will not work if the model is built using samples from a city bus and scored on a bus containing only basketball players • What confidence level to use? Scientific papers demand a 95% confidence level. What about streaming use-cases? 95% seems too high...
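The Central Limit Theorem claim can be checked with a small simulation. The sketch below draws samples from an exponential distribution (clearly not normal, with mean 1 and standard deviation 1) and looks at the distribution of the sample means; the choice of distribution, sample size, and seed are assumptions made for illustration.

```python
import random
import statistics
from typing import List


def sample_means(rng: random.Random, sample_size: int, num_samples: int) -> List[float]:
    """Draw num_samples samples from Exp(1) and return the mean of each sample."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(num_samples)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
means = sample_means(rng, sample_size=50, num_samples=2000)

center = statistics.fmean(means)
spread = statistics.stdev(means)
# For a normal distribution about 95% of values lie within 1.96 standard deviations.
within = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)

print(f"mean of sample means:  {center:.3f}  (population mean is 1)")
print(f"stdev of sample means: {spread:.3f}  (theory: 1/sqrt(50) = {1 / 50 ** 0.5:.3f})")
print(f"share within 1.96 stdev: {within:.3f}")
```

The simulation only works because every sample comes from the same population; mixing in samples from a different population (the bus of basketball players) would break the assumption, which is the slide's warning.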
  • 46. Material • alex.alves@oracle.com • http://www.oracle.com/technetwork/middleware/complex-event-processing/overview/index.html • http://adcalves.wordpress.com • http://www.packtpub.com/getting-started-with-oracle-event-processing-11g/book