Big data and its impact on SOA
 


Slides from a presentation I gave at the 5th SOA, Cloud + Service Technology Symposium (September 2012, Imperial College, London). The goal of this presentation was to explore with the audience use cases at the intersection of SOA, Big Data and Fast Data. If you are working with both SOA and Big Data, I would be very interested to hear about your projects.

  • All kinds of data. Large volumes. Valuable insight, but difficult to extract (structured and unstructured data). Often extremely time sensitive. Most of the data types portrayed here are consumer data. While the business will want to leverage Oracle Event Processing for business and application data, it is also impacted by this consumer data and by information from the vast array of sensors, where stream events showing temperatures in a container mid-Pacific may mean that high-cost food goods are destroyed unless immediate action is taken, or … For Starbucks: immediately analyzing tweets after launching a new coffee, seeing spikes of negative comments, and very quickly figuring out that the negative reactions came from stores that were serving a particular warmed cheese sandwich, whose aroma did not go with the new coffee smell … Huge ROI due to quick analysis and a specific, targeted response. And as you can see from the Spanish bank (La Caixa) solution, a customer’s tweets are also being analyzed by Oracle Event Processing and stored in Big Data to augment his/her preferences and influence his/her real-time targeted campaigns.
  • Scripting languages supported via Hadoop Streaming, equivalent to Unix streaming
  • Facebook, Google, Netflix, etc.; Hadron Collider, NSF, etc.
  • Being able to preserve info over long term (without copy/filtering) could be very interesting for historical analysis, shipping & process optimization
  • SmartMeter example: want all data to do in-depth energy usage analysis but also want real-time analysis for things like leak detection.
  • Technologist & citizen

Big data and its impact on SOA: Presentation Transcript

  • Big Data & its impact on SOA. Demed L’Her, Sr Director, Product Management, Oracle. demed.lher@oracle.com (twitter: @demed). Copyright © 2012, Oracle and/or its affiliates. All rights reserved.
  • Demed L’Her
    • Senior Director, Product Management at Oracle (Engineering team)
    • Based in Redwood Shores, California
    • Team in charge of Oracle SOA Suite: Adapters, Service Bus, BPEL, Event Processing, SOA Suite for Healthcare (Java CAPS and WebLogic Integration)
    • Responsible for product roadmap, execution
    • With Oracle since 2006
    • Co-author: http://snipurl.com/soa11gbook
    • Twitter: @demed | email: demed.lher@oracle.com
  • Program Agenda
    1. Big Data Trends
    2. Big Data and SOA
    3. Integration Patterns for Big Data
    4. Fast Data
  • Introduction to Big Data: Problems, Trends & Technology
  • Data Explosion: Web & social networks experienced it first… (Infographic by Go-gulf.com)
  • … but enterprises are now facing it too
    • Retail and web transaction data (to refine recommendations, detect trends, etc.)
    • “Sensor” data: GPS in mobile phones, RFIDs, NFC, SmartMeters, etc.
    • Log file monitoring and analysis
    • Security monitoring
    • Utilities deploying smart meters? → 200x information flowing to data center!
  • 4 V’s of Big Data: Defining Big Data
    • Volume: large
    • Velocity: high
    • Variety: complex (txn, files, media, machine data)
    • Value: variable signal-noise ratio
  • Storage was the obvious problem but Analysis is the important one
    • Storage is the first obvious problem. Analysis is next.
    • “Big Data Is Not the Created Content, nor Is It Even Its Consumption; It Is the Analysis of All the Data Surrounding or Swirling Around It”
    • Source: IDC’s Digital Universe Study, sponsored by EMC, June 2011, http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
  • Companies have realized that there is competitive advantage in this information and that now is the time to put this data to work. (An Architect’s Guide to Big Data, an Oracle White Paper in Enterprise Architecture: http://www.oracle.com/technetwork/topics/entarch/articles/oea-big-data-guide-1522052.pdf)
  • Emergence of Hadoop: to address Big Data challenges (storage and processing)
    • Licensed under the Apache v2 license
    • Created by Doug Cutting and Michael J. Cafarella
    • Based on papers by Google (GFS, 2003; MapReduce, 2004)
    • Key advances around distributed processing and distributed storage
    • First Apache release: 2007
    • Yahoo! contributed all its code in 2009
    • Current release (May 2012): 1.0.3
  • Hadoop: commercial offering rapidly ramping up to respond to demand
    • Vendors: Hortonworks, Cloudera, Oracle, IBM, MapR, Datameer, Platfora, etc.
    • Market growth: “New research from International Data Corporation (IDC) shows that revenues for the worldwide Hadoop-MapReduce ecosystem software market are considered to be $77 million in 2011 and are expected to grow to $812.8 million in 2016 for a compound annual growth rate (CAGR) of 60.2%.”
    • Source: IDC Releases First Worldwide Hadoop-MapReduce Ecosystem Software Forecast, Strong Growth Will Continue to Accelerate as Talent and Tools Develop, 07 May 2012, http://www.idc.com/getdoc.jsp?containerId=prUS23471212
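The growth rate in the IDC quote can be checked with the standard compound-annual-growth formula. The sketch below is only an arithmetic illustration; the function name `cagr` is an assumption of this example, and the dollar figures are the ones quoted on the slide.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Figures quoted from the IDC forecast: $77M in 2011, $812.8M in 2016.
growth = cagr(77.0, 812.8, 2016 - 2011)
print(f"CAGR: {growth:.1%}")
```

The result should land very close to the 60.2% stated in the forecast, which confirms that the quoted start value, end value and five-year span are mutually consistent.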
  • Kernel of Hadoop. Storage: HDFS
    • Hadoop Distributed File System
    • Runs on clusters of commodity hardware (cheap, readily available, direct attached storage)
    • Fault tolerant, easy to expand
    • Designed for very large files (default block size = 64MB)
    • Write-once/read-many-times, simple semantics
    • Flat file model accommodates both structured and unstructured data
    • [Diagram: a client and a name node in front of data nodes arranged in three racks]
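To make the "very large files" point concrete, the following sketch computes how many blocks a file occupies at the 64MB default block size mentioned on the slide. The function name `block_count` is hypothetical, and the sketch ignores replication and any cluster-specific block-size configuration.

```python
import math

BLOCK_SIZE_MB = 64  # default block size quoted on the slide


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks a file occupies; the last block may be only partly filled."""
    if file_size_mb <= 0:
        return 0
    return math.ceil(file_size_mb / block_size_mb)


print(block_count(1024))  # a 1 GB file
print(block_count(65))    # just over one block
```

A 1 GB file maps to 16 blocks, while a 65MB file already needs 2, which is one reason the design favors a small number of very large files over many small ones.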
  • Kernel of Hadoop. Analysis: MapReduce
    • Defined by Google in 2004
    • Break problem up into smaller sub-problems
    • Able to distribute data workloads across thousands of nodes
    • Programmed via Java/scripting/C++ or higher-level languages such as Pig or Hive
    • [Diagram: input data → map → shuffle/sort → reduce → output data]
  • Map/Reduce Example: compute re-tweet counts on Twitter data, a simple measure of social influence
    • Map: execute parallel copies of user-provided “Map” function, transform segments of input into key/value pairs
    • Shuffle/Sort: system groups all mapped key/value pairs with the same key together
    • Reduce: execute parallel copies of the user-provided “Reduce” function to distill groups of data to output
    • Input data:
      RT @oracle: #CIOs: How are you going to act on all that data you have? Turn it into insight w/our #BigData Guide
      RT @oracle_biee: Register to access the OBIEE Live Mobile Demo server
      RT @oracle - 10 Amazing Scenes From Oracles @AmericasCup World Series courtesy of Sarah Kimmel
      RT @oracleretail: Oracle Upgrades Analytics in Oracle Retail Data Model (News Release)
      RT @oracle_biee: The Oracle Exalytics v1 Patch Set 1 is now generally available (GA)
      RT @oracle: Transform your data, Transform your business! Live Q&A to learn Oracle GoldenGate 11gs new features!
    • Map output: (@oracle, 1), (@oracle_biee, 1), (@oracle, 1), (@oracleretail, 1), (@oracle_biee, 1), (@oracle, 1)
    • Shuffle/Sort output: @oracle: 1, 1, 1; @oracle_biee: 1, 1; @oracleretail: 1
    • Reduce output: (@oracle, 3), (@oracle_biee, 2), (@oracleretail, 1)
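The three phases of the re-tweet example can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle/sort and reduce steps, not Hadoop code; the function names and the regular expression are assumptions of this sketch, and the tweets are abbreviated versions of the ones on the slide.

```python
import re
from collections import defaultdict

# Assumption: a retweet starts with "RT @handle".
RETWEET = re.compile(r"^RT (@\w+)")


def map_tweet(tweet):
    """Map: emit (handle, 1) for a tweet that retweets a handle."""
    match = RETWEET.match(tweet)
    if match:
        yield match.group(1), 1


def shuffle(pairs):
    """Shuffle/sort: group all mapped values under their key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key, values):
    """Reduce: distill one group of values into a single count."""
    return key, sum(values)


def retweet_counts(tweets):
    mapped = [pair for tweet in tweets for pair in map_tweet(tweet)]
    return dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())


TWEETS = [
    "RT @oracle: #CIOs: How are you going to act on all that data you have?",
    "RT @oracle_biee: Register to access the OBIEE Live Mobile Demo server",
    "RT @oracle - 10 Amazing Scenes From Oracles @AmericasCup World Series",
    "RT @oracleretail: Oracle Upgrades Analytics in Oracle Retail Data Model",
    "RT @oracle_biee: The Oracle Exalytics v1 Patch Set 1 is now generally available (GA)",
    "RT @oracle: Transform your data, Transform your business!",
]

print(retweet_counts(TWEETS))
```

On a real cluster the map and reduce functions run as parallel copies over segments of the input, and the framework performs the grouping; only the two user-provided functions would carry over from this sketch.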
  • Hadoop Ecosystem: rich and evolving
    • HDFS / MapReduce: storage & analysis
    • PIG: scripting for exploring large datasets
    • SQL-like query language (HiveQL)
    • Bulk data transfers between Hadoop and structured datastores
    • ZOOKEEPER: configuration management & coordination
    • OOZIE: workflow & coordination
    • Data serialization
    • Column-oriented database
    • Machine-learning, data mining
    • Collect, aggregate, stream log data into HDFS
    • Text search engine
    • CASSANDRA
  • What does SOA have to do with Big Data?
  • SOA Deployments Generate Big Data
    • Big Data is not just in social networks or science projects
    • SOA infrastructures are (quietly) handling increasingly massive amounts of transactions
    • Transactions contain key business information: purchases, inventory levels, package tracking information, profile updates, etc.
    • Multi-tenancy, private and public clouds are accelerating data growth
  • SOA Big Data Example: Logistics Company
    • Oracle SOA Suite customer
    • Millions of BPEL processes/day
    • Transaction systems involved
    • 5 terabytes of database
    • Purge job every 4 hours
    • Specific process data captured in star schema for analytics
      • Analytics limited by a-priori decisions
      • Duplication of data
  • Typical Usage of Datastores by SOA Platforms: Today
    • [Diagram: process state, metadata (headers, timestamps, etc.), full payloads and user data, arranged from structured to unstructured, from smaller to larger size, and from many read/write to write once, read-many; formats shown include XA, XML, CSV, JSON, MTOM and BLOB]
  • Typical Usage of Datastores by SOA Platforms: Tomorrow
    • [Diagram: process state, metadata (headers, timestamps, etc.), full payloads and user data (formats shown include XA, XML, CSV, JSON, MTOM and BLOB), split between an RDBMS and an offload to Hadoop or NoSQL]
  • “Finding answers where there are yet to be questions” *
    • [Diagram 1: SOA infra runtime → SOA infra database → (pre-determined) copy → OLAP → analytics. Intelligence constrained by available dataset]
    • [Diagram 2: SOA infra runtime → SOA audit big data store → analytics. Universe is the limit!]
  • Impact of Big Data: New Integration Patterns
  • Pattern 1: Usage of MapReduce data (data query from an async BPEL process)
    • Synchronous interaction not an option due to typical Hadoop latencies (minutes to hours)
    • Getting data is not as simple as a sync “select” SQL statement
    • Split query: start job, wait for notification, get data
    • Complex to implement for process developer
    • Steps: 1. Start MapReduce job; 2. Wait for Job_done notification; 3. Get data
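The split query (start job, wait for notification, get data) can be illustrated with a future standing in for the job-done notification. This is a local simulation only: `run_mapreduce_job`, `fetch_result`, the output path and the result rows are hypothetical stand-ins, not Hadoop or BPEL APIs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_mapreduce_job(job_name):
    """Hypothetical stand-in for a long-running MapReduce job."""
    time.sleep(0.05)  # real jobs take minutes to hours
    return f"/output/{job_name}/part-00000"  # illustrative output location


def fetch_result(path):
    """Hypothetical stand-in for reading the job output once it exists."""
    return {"path": path, "rows": [("@oracle", 3)]}


def split_query(job_name, timeout_s=5):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(run_mapreduce_job, job_name)  # 1. start job
        output_path = future.result(timeout=timeout_s)     # 2. wait for job-done notification
    return fetch_result(output_path)                        # 3. get data


print(split_query("retweet-counts"))
```

In a process engine the wait in step 2 would typically be a correlated callback rather than a blocked thread, which is where the implementation complexity noted on the slide comes from.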
  • Pattern 2: Query data (NoSQL or HBase)
    • Synchronous query against NoSQL or HBase
    • Getting data from batch-processed Hadoop output
    • Not operating on absolute latest dataset
    • Familiar pattern, easy to implement for process designer
    • Steps: 1. Scheduled job initiates; 2. Result set loaded into NoSQL; 3. Sync query of NoSQL
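A minimal sketch of this pattern follows, with an in-memory class standing in for the NoSQL store. `KeyValueStore`, `scheduled_batch_job` and the batch label are assumptions made for illustration; a real deployment would use the client API of the chosen store.

```python
class KeyValueStore:
    """In-memory stand-in for a NoSQL or HBase table (illustration only)."""

    def __init__(self):
        self._data = {}
        self.batch_label = None

    def bulk_load(self, result_set, batch_label):
        """2. Replace the contents with the latest batch result set."""
        self._data = dict(result_set)
        self.batch_label = batch_label

    def get(self, key, default=None):
        """3. Synchronous, low-latency lookup."""
        return self._data.get(key, default)


def scheduled_batch_job():
    """1. Hypothetical scheduled job returning batch-processed output."""
    return [("@oracle", 3), ("@oracle_biee", 2), ("@oracleretail", 1)]


store = KeyValueStore()
store.bulk_load(scheduled_batch_job(), batch_label="batch-1")
print(store.get("@oracle"), "as of", store.batch_label)
```

Keeping the batch label next to the data makes the trade-off visible to callers: the lookup is fast and familiar, but the answer is only as fresh as the last completed batch.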
  • Pattern 3: Initiate process on data availability
    • MapReduce job creates dataset and drops it on filesystem (ex: in JSON format)
    • BPEL process + file adapter watches directory for new data
    • BPEL process kicks in, parses JSON and executes
    • Steps: 1. Scheduled job initiates; 2. Result set appears as file in given location; 3. File adapter detects result set and initiates new process
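The file-drop pattern above can be approximated with a simple directory poll. This sketch plays the role of the file adapter in plain Python; `poll_once`, `start_process` and the file name are hypothetical, and a production adapter would also handle partially written files and restarts.

```python
import json
import tempfile
from pathlib import Path


def poll_once(directory, seen, handler):
    """Hand each not-yet-seen *.json file in the directory to the handler once."""
    results = []
    for path in sorted(Path(directory).glob("*.json")):
        if path.name in seen:
            continue
        seen.add(path.name)
        with path.open() as fh:
            results.append(handler(json.load(fh)))
    return results


def start_process(result_set):
    """Hypothetical stand-in for initiating a new process instance."""
    return f"process started with {len(result_set)} records"


if __name__ == "__main__":
    drop_dir = tempfile.mkdtemp()
    # Simulates the MapReduce job dropping its result set as a JSON file.
    Path(drop_dir, "result-0001.json").write_text(json.dumps([["@oracle", 3]]))
    seen_files = set()
    print(poll_once(drop_dir, seen_files, start_process))
    print(poll_once(drop_dir, seen_files, start_process))  # nothing new on the second poll
```

Tracking the names already processed is what keeps a repeated poll from starting the same process twice for one result set.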
  • Fast Data: Get Ahead of the Curve
  • Working with Big Data: some challenges
    1. Big data ≠ infinite storage. Yes, storage is cheap, but it helps to have clean data, with context and less redundancy
    2. Hadoop is batch-oriented and there is inherent latency (minutes): "With the paths that go through Hadoop [at Yahoo!], the latency is about fifteen minutes […] it will never be true real-time." * Raymie Stata, Yahoo! CTO (June 2011)
    *: http://www.theregister.co.uk/2011/06/30/yahoo_hadoop_and_realtime/
  • Get ahead of the curve: use Event Processing techniques
    1. Filter out, correlate: filter out noise (ex: data ticks with no change), add context (by correlating multiple sources), increase relevance
    2. Move time-critical analysis to front of process: identify critical conditions as you insert data in warehouse (not after)
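Both techniques can be sketched as small stream stages: one that drops ticks with no change, and one that flags critical readings while the data is on its way to storage. The sensor names, readings and threshold below are made-up illustration values, and the generators are plain Python, not Oracle Event Processing queries.

```python
def filter_unchanged(ticks):
    """Drop ticks whose value equals the previous value from the same source."""
    last = {}
    for source, value in ticks:
        if last.get(source) != value:
            last[source] = value
            yield source, value


def flag_critical(ticks, threshold):
    """Tag each tick with whether it crosses the threshold, at insert time."""
    for source, value in ticks:
        yield source, value, value > threshold


# Hypothetical container temperature readings (sensor id, degrees Celsius).
TICKS = [("c1", 4.0), ("c1", 4.0), ("c1", 9.5), ("c2", 3.0), ("c1", 9.5)]

for event in flag_critical(filter_unchanged(TICKS), threshold=8.0):
    print(event)
```

Of the five incoming ticks only three survive the filter, and the one critical reading is identified before anything is written to the warehouse.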
  • Fast Data: Get Ahead of the Curve
    • Fast Data (ms); historical depth: shallow. Example: monitoring of traffic cameras to ensure given license plate not in use on multiple vehicles
    • Big Data (minutes); historical depth: deep. Example: analysis of traffic patterns and congestion times for urban planning
    • Add “depth” to your fast data by merging output of MapReduce to stream processing
  • How Fast is Fast? Fast enough to support explosion of smartphones in largest markets
    • Mobile provider
    • Billing smartphone data based on usage
    • Using OEP to correlate users to packets through dynamically allocated IP addresses
    • Coherence as fast in-memory grid of user <-> IP addresses
    • Processes over 800,000 records/s
    • [Diagram: DPI equipment supplies usage <-> IP address; IP allocation servers supply IP address <-> user; the correlated usage <-> user feeds billing]
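The correlation step can be illustrated with a dictionary standing in for the in-memory grid of IP address to user mappings. The event shapes, user names, addresses and byte counts below are hypothetical; the sketch only shows why the mapping must be updated in stream order when addresses are reallocated.

```python
from collections import defaultdict


def bill_usage(events):
    """Attribute usage records to users via the current IP address allocation.

    Events are assumed to arrive in time order, as either
    ("alloc", ip, user) or ("usage", ip, byte_count).
    """
    ip_to_user = {}  # stand-in for the in-memory grid of IP <-> user
    bytes_per_user = defaultdict(int)
    for event in events:
        if event[0] == "alloc":
            _, ip, user = event
            ip_to_user[ip] = user
        else:
            _, ip, byte_count = event
            user = ip_to_user.get(ip)
            if user is not None:  # usage on an unallocated address is skipped
                bytes_per_user[user] += byte_count
    return dict(bytes_per_user)


EVENTS = [
    ("alloc", "10.0.0.1", "alice"),
    ("usage", "10.0.0.1", 500),
    ("alloc", "10.0.0.1", "bob"),  # same address reallocated to another user
    ("usage", "10.0.0.1", 200),
    ("usage", "10.0.0.2", 50),     # no known allocation
]

print(bill_usage(EVENTS))
```

Because the same address is billed to different users before and after reallocation, a join against a static table would misattribute traffic; the lookup has to reflect the allocation in force when each packet record arrives.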
  • Putting it all together: Big Data, Fast Data & SOA
  • Oracle’s solution: Big Data, Fast Data & SOA
    • Stages: Acquire, Organize, Analyze, Decide; then Act, orchestrate response
    • [Diagram: Oracle Event Processing, Oracle Big Data Appliance, Oracle Big Data Connectors, Oracle Exadata, Oracle Exalytics, Endeca Information Discovery and Oracle Real-Time Decisions, with InfiniBand interconnects; Oracle SOA Suite acts and orchestrates the response]
  • Oracle’s solution: Big Data, Fast Data & SOA (solution diagram annotated with examples)
    • Example: monitoring of traffic cameras to ensure given license plate not in use on multiple vehicles
    • Example: search for last sighting of specific vehicles
    • Example: analysis of traffic patterns and congestion times for urban planning
    • Example: traffic rerouting suggestions
    • Example: display real-time situation using BAM
    • Example: coordinate Police and Emergency response using BPEL & Human Workflow
  • Conclusion
    • Big Data has reached the enterprise
    • SOA platforms are evolving to leverage Big Data technology
    • Service developers need to understand how to insert and access data in Hadoop
    • Time-critical conditions can be detected as data is inserted in Hadoop using event processing techniques: Fast Data
    • Expect Big Data, Fast Data to become ubiquitous in SOA environments, much like RDBMS are already