Big Data and Fast Data – Big and Fast Combined, is it Possible?

  • WELCOME: Big Data and Fast Data – Big and Fast Combined, is it Possible?
    Guido Schmutz, UKOUG Tech 2013, 2.12.2013
    © 2013 Trivadis (Basel, Bern, Lausanne, Zürich, Düsseldorf, Frankfurt a.M., Freiburg i.Br., Hamburg, München, Stuttgart, Wien)

  • Guido Schmutz
    • Working for Trivadis for more than 16 years
    • Oracle ACE Director for Fusion Middleware and SOA
    • Co-author of several books
    • Consultant, trainer, software architect for Java, Oracle, SOA and EDA
    • Member of the Trivadis Architecture Board
    • Technology Manager @ Trivadis
    • More than 20 years of software development experience
    • Contact: guido.schmutz@trivadis.com
    • Blog: http://guidoschmutz.wordpress.com
    • Twitter: gschmutz
  • Our company
    Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services focusing on … and … technologies in Switzerland, Germany and Austria. We offer our services in the following strategic business fields: … OPERATION: Trivadis Services takes over the interacting operation of your IT systems.
  • With over 600 specialists and IT experts in your region (Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Wien, München, Basel, Brugg, Bern, Zurich, Lausanne)
    • 12 Trivadis branches and more than 600 employees
    • 200 Service Level Agreements
    • Over 4,000 training participants
    • Research and development budget: CHF 5.0 / EUR 4 million
    • Financially self-supporting and sustainably profitable
    • Experience from more than 1,900 projects per year at over 800 customers
  • Agenda
    1. Big Data, what is it?
    2. Motivation
    3. The Lambda Architecture
    4. Implementing the Lambda Architecture
    5. Summary
  • Big Data Definition (Gartner et al.)
    • Characteristics of Big Data: its Volume, Velocity and Variety in combination
    • Volume: tera-, peta-, exa-, zetta-, yottabytes, and constantly growing
    • Velocity: "traditional" computing in an RDBMS is not scalable enough; we search for "linear scalability"
    • Variety: "Only … structured information is not enough"; "95% of produced data is unstructured"
    • + Veracity (IBM): information uncertainty
    • + Time to action? Big Data + Event Processing = Fast Data
  • Big Data Definition (4 Vs)
    • Characteristics of Big Data: its Volume, Velocity and Variety in combination
    • + Time to action? Big Data + Event Processing = Fast Data
  • Volume Development
    [Chart: global data volume in exabytes and aggregate uncertainty in %, 2005 to 2015, split into Sensors ("internet of things"), Social Media (video, audio, text), VoIP (Skype, MSN, ICQ, ...) and Enterprise Data (data dictionary, ERD, ...)]
  • Internet of Things
    • There are more devices tapping into the internet than people on earth
    • How do we prepare our systems/architecture for the future?
    (Sources: The Economist, Cisco)
  • Big Data in Context
    • NoSQL databases: the storage for Big Data → Polyglot Persistence
    • Complex Event Processing (CEP): an architectural style for Fast Data
    • Lots of new terms: HDFS, Hive, Hadoop, MapReduce, HBase, Pig, Cascading, Flume, Oozie
    • Not only Open Source: Oracle Big Data Appliance & Microsoft HDInsight
    • No longer a clear distinction between Software Development and Business Intelligence!?
      • Java, Python, Clojure, R, … know-how needed
      • Data Scientists: Natural Language Processing, Statistics, Network Analysis
  • Big Data Use Cases / Scenarios
    • General: analyzing social media data for service optimization, sentiment analysis, ...
    • Retail: personalized travel and shopping guidance depending on location detection (mobile, tablets, previous purchases)
    • Automotive: analyzing telemetry data (e.g. for insurance: "Pay how you drive", warranty, recalls, warnings etc.)
    • Finance: fraud detection for payments (real time)
    • Telco: mobile user location analytics for "behavior mining"
  • Velocity
    • Velocity requirement examples: Recommendation Engine, Predictive Analytics, Marketing Campaign Analysis, Customer Retention and Churn Analysis, Social Graph Analysis, Capital Markets Analysis, Risk Management, Rogue Trading, Fraud Detection, Retail Banking, Network Monitoring, Research and Development
  • What is a data system?
    • A system that manages the storage and querying of data with a lifetime measured in years, encompassing every version of the application ever to exist, every hardware failure and every human mistake ever made.
    • A data system answers questions based on information that was acquired in the past.
  • Desired Properties of a (Big) Data System
    • Robust and fault-tolerant
    • Low-latency reads and updates
    • Scalable
    • General
    • Extensible
    • Allows ad hoc queries
    • Minimal maintenance
    • Debuggable
  • Complexity in today's architectures/systems
    • Lack of human fault tolerance
    • Same structure for writes and queries
    • Schemas done wrong
  • Typical problem in today's architectures/systems: Lack of Human Fault Tolerance
    • Bugs will be deployed to production over the lifetime of a data system
    • Operational mistakes will be made
    • Humans are part of the overall system, just like hard disks, CPUs, memory and software
    • Design for human error like you design for any other fault
    • Examples of human error:
      • Deploying a bug that increments counters by two instead of by one
      • Accidentally deleting data from a database
      • An accidental DoS on an important internal service
    • The two worst consequences: data loss or data corruption
    • As long as an error doesn't lose or corrupt good data, you can fix what went wrong
  • Lack of Human Fault Tolerance: Mutability
    • The U and D in CRUD
    • A mutable system updates the current state of the world
    • Mutable systems inherently lack human fault tolerance
    • Easy to corrupt or lose data
    • Capturing change traditionally (before → after the update):
        Before              After
        Name    City        Name    City
        Guido   Berne       Guido   Basel
        Albert  Zurich      Albert  Zurich
  • Lack of Human Fault Tolerance: Immutability
    • An immutable system captures historical records of events
    • Each event happens at a particular time and is always true
    • Capturing change by storing events (before → after):
        Before                          After
        Name    City    Timestamp       Name    City    Timestamp
        Guido   Berne   1.8.1999        Guido   Berne   1.8.1999
        Albert  Zurich  10.5.1988       Albert  Zurich  10.5.1988
                                        Guido   Basel   1.4.2013
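A minimal Python sketch of the two ways of capturing change shown in the tables above, using the slide's Guido/Albert records (dates read as day.month.year). The dictionary, the tuple layout and the helper names `city_at` and `current_city` are illustrative assumptions, not part of the talk.

```python
from datetime import date

# Mutable capture: the update overwrites the current state; "Berne" is lost.
people = {"Guido": "Berne", "Albert": "Zurich"}
people["Guido"] = "Basel"

# Immutable capture: each change is appended as a timestamped fact (name, city, timestamp).
events = [
    ("Guido", "Berne", date(1999, 8, 1)),
    ("Albert", "Zurich", date(1988, 5, 10)),
]
events.append(("Guido", "Basel", date(2013, 4, 1)))  # append only, nothing is overwritten


def city_at(events, name, when):
    """Return the city recorded for `name` that was valid at date `when`, or None."""
    facts = [e for e in events if e[0] == name and e[2] <= when]
    if not facts:
        return None
    return max(facts, key=lambda e: e[2])[1]  # the most recent fact wins


def current_city(events, name):
    """The current state is just a query over all facts, evaluated at the latest date."""
    return city_at(events, name, date.max)
```

Both approaches agree on the current answer (`"Basel"`), but only the event list can still answer a historical question such as `city_at(events, "Guido", date(2010, 1, 1))`, which returns `"Berne"`.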
  • Lack of Human Fault Tolerance: Immutability
    • Immutability greatly restricts the range of errors that can cause data loss or data corruption
    • Vastly more human-fault-tolerant
    • Much easier to reason about systems based on immutability
    • Conclusion: your source of truth should always be immutable
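To make the conclusion concrete, the sketch below replays the "counter incremented by two" bug from the earlier examples of human error. It assumes hypothetical page-view events stored immutably; because only the derived view is wrong, fixing the code and recomputing repairs the damage. The function names are illustrative.

```python
# Hypothetical immutable page-view events: each entry records one view of a URL.
events = ["/home", "/about", "/home", "/home"]


def count_views_buggy(events):
    """A deployed bug: every view increments its counter by two instead of one."""
    counts = {}
    for url in events:
        counts[url] = counts.get(url, 0) + 2
    return counts


def count_views_fixed(events):
    """The corrected function: one increment per view."""
    counts = {}
    for url in events:
        counts[url] = counts.get(url, 0) + 1
    return counts


# The buggy view is wrong, but the raw events themselves are untouched ...
bad_view = count_views_buggy(events)
# ... so the fix is to recompute the view from the immutable source of truth.
good_view = count_views_fixed(events)
```

In a design that stored only mutable counters, the correct values would have to be reconstructed by hand, if that were possible at all.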
  • What about traditional/today's architectures?
    Rather than building systems like this …
    Clients (Mobile, Web, RIA, Rich Client) → Application (Query) → Mutable Database (RDBMS, NoSQL, NewSQL) = Source of Truth
    The source of truth is mutable!
  • A different kind of architecture with an immutable source of truth
    … why not build them like this:
    Clients (Mobile, Web, RIA, Rich Client) → Application (Query) → Views on Data (e.g. NoSQL, NewSQL, RDBMS) → Immutable data = Source of Truth (e.g. HDFS)
  • How to create the views on the immutable data?
    • On the fly? Query → View → Immutable data
    • Materialized, i.e. precomputed? Query → Precomputed Views → Immutable data
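Both options can be sketched in a few lines of Python. The purchase records and function names below are hypothetical; `collections.Counter` stands in for a precomputed view that, in a real system, a batch job would write to a database.

```python
from collections import Counter

# Hypothetical immutable data: (customer, product) purchase facts.
purchases = [("anna", "camera"), ("ben", "laptop"), ("anna", "laptop"), ("carl", "camera")]


def query_on_the_fly(purchases, product):
    """Option 1: scan all raw data at query time (simple, but slow on large data)."""
    return sum(1 for _, p in purchases if p == product)


def precompute_view(purchases):
    """Option 2: compute the view once, ahead of query time (e.g. in a batch job)."""
    return Counter(p for _, p in purchases)


view = precompute_view(purchases)


def query_precomputed(view, product):
    """Queries against the precomputed view are cheap lookups (missing keys count as 0)."""
    return view[product]
```

Both return the same answer; the difference is when the work is done. The on-the-fly query scans all data on every call, while the precomputed view moves that cost to an earlier step and turns the query into a lookup.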
  • Data = the most raw information
    • Data is information that is not derived from anywhere else: the rawest form of information, from which everything else is derived
    • Questions on data can be answered by running functions that take data as input
    • The most general-purpose data system can answer questions by running functions that take the entire dataset as input:
      query = function(all data)
    • The Lambda Architecture provides a general-purpose approach for implementing arbitrary functions on arbitrary datasets
  • Data = the most raw information
    Raw information => data: Favorite Product List Changes
        Date       Action   Product
        1.2.13     Add      iPAD 64GB
        10.3.13    Add      Sony RX-100
        11.3.13    Add      Canon GX-10
        11.3.13    Remove   Sony RX-100
        12.3.13    Add      Nikon S-100
        14.4.13    Add      BoseQC-15
        15.4.13    Add      MacBook Pro 15
        20.4.13    Remove   Canon GX-10
    Information => derived:
      • Current Favorite Product List: iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15
      • Current Product Count: 4
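The derivation on this slide is an instance of query = function(all data). A minimal Python sketch, assuming the dates are day.month.year and that the Add and Remove entries for the Canon camera (written "Canon GX-10" here) refer to the same product; the function names are illustrative.

```python
from datetime import date

# The raw "Favorite Product List Changes" from the slide: (date, action, product).
changes = [
    (date(2013, 2, 1), "Add", "iPAD 64GB"),
    (date(2013, 3, 10), "Add", "Sony RX-100"),
    (date(2013, 3, 11), "Add", "Canon GX-10"),
    (date(2013, 3, 11), "Remove", "Sony RX-100"),
    (date(2013, 3, 12), "Add", "Nikon S-100"),
    (date(2013, 4, 14), "Add", "BoseQC-15"),
    (date(2013, 4, 15), "Add", "MacBook Pro 15"),
    (date(2013, 4, 20), "Remove", "Canon GX-10"),
]


def current_favorites(changes):
    """Derive the current favorite list by replaying all changes in date order."""
    favorites = []
    # sorted() is stable, so changes on the same date keep their recorded order.
    for _, action, product in sorted(changes, key=lambda c: c[0]):
        if action == "Add" and product not in favorites:
            favorites.append(product)
        elif action == "Remove" and product in favorites:
            favorites.remove(product)
    return favorites


def current_product_count(changes):
    """A second derived value, computed from the same raw data."""
    return len(current_favorites(changes))
```

Neither derived value is stored as a source of truth; both can be recomputed at any time from the raw changes.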
  • Big Data and Batch Processing
    Incoming Data → Immutable data → Batch View → Query
    • How to compute the batch views?
    • How to compute queries from the views?
  • Big Data and Batch Processing
    But we are not done yet …
    [Diagram: timeline up to "now" with the labels batch-processed data, non-processed data, last full batch period and time for batch job]
    Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20
    • Using only batch processing always leaves you with a portion of non-processed data.
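How large the non-processed portion can get depends on the batch schedule. The sketch below assumes a simple hypothetical schedule: a batch job starts every `period_h` hours, runs for `job_h` hours (no longer than the period) and processes all data that had arrived when it started. Under these assumptions, data can stay invisible to the batch views for almost one batch period plus one job duration. The function names are illustrative.

```python
import math


def batch_visible_until(now_h, period_h, job_h):
    """Return the time (in hours) up to which data is reflected in the newest
    finished batch view at time `now_h`, or None if no job has finished yet.

    Assumed schedule: jobs start at 0, period_h, 2 * period_h, ...; the job
    started at k * period_h finishes at k * period_h + job_h and covers all
    data that arrived before k * period_h.
    """
    if not 0 < job_h <= period_h:
        raise ValueError("expected 0 < job_h <= period_h")
    k = math.floor((now_h - job_h) / period_h)  # index of the last finished job
    return k * period_h if k >= 0 else None


def unprocessed_hours(now_h, period_h, job_h):
    """Age of the oldest data not yet reflected in the batch views."""
    visible = batch_visible_until(now_h, period_h, job_h)
    return now_h if visible is None else now_h - visible
```

For example, with a 24-hour period and a 3-hour job, `unprocessed_hours(26.5, 24, 3)` returns 26.5: the view finished at hour 3 covers only data from before hour 0, and the next view does not appear until hour 27.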
  • Big Data and Batch Processing
    Stream 1, Stream 2 (events) → HDFS (Hadoop Distributed File System) → Hadoop cluster (Map/Reduce in Pig) → data store optimized for appending large results → Queries
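The Map/Reduce step on this slide runs as Pig on a Hadoop cluster. The following single-process Python sketch only imitates the map, shuffle and reduce phases to show how a batch view is recomputed from scratch over the whole master dataset; it is not Pig or Hadoop code, and the dataset and function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical master dataset: immutable (user, action, product) events from all streams.
master_dataset = [
    ("u1", "Add", "iPAD 64GB"),
    ("u2", "Add", "iPAD 64GB"),
    ("u1", "Add", "Nikon S-100"),
    ("u3", "Add", "Nikon S-100"),
    ("u2", "Add", "BoseQC-15"),
]


def map_phase(record):
    """Emit (key, value) pairs; here one (product, 1) pair per 'Add' event."""
    _user, action, product = record
    if action == "Add":
        yield product, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def compute_batch_view(dataset):
    """Recompute the whole batch view from scratch over the full master dataset."""
    pairs = chain.from_iterable(map_phase(r) for r in dataset)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
```

`compute_batch_view(master_dataset)` yields `{"iPAD 64GB": 2, "Nikon S-100": 2, "BoseQC-15": 1}`. Because the batch view is always recomputed from all data, there is no incremental state that a bug could permanently corrupt.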
  • Adding Real-Time Processing
    Incoming Data → Immutable data → Batch Views → Query
    Incoming Data → Data Stream → Realtime Views → Query
    • How to compute real-time views?
    • How to compute queries from the views?
  • Adding Real-Time Processing
    Immutable data: the Favorite Product List Changes up to 20.4.13 (as above) → compute → Views: Current Favorite Product List (iPAD 64GB, Nikon S-100, BoseQC-15, MacBook Pro 15)
    Data Stream: Stream of Favorite Product List Changes, now incoming: Add Canon Scanner → compute → Realtime View: Canon Scanner
    Query: Current Product Count = 5
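A minimal sketch of how the query on this slide could combine both views, assuming the batch view holds the favorite list as of the last batch run and the speed layer holds only the changes that arrived since. The names are hypothetical, and the merge simply replays the recent changes on a copy of the batch result.

```python
# Batch view: the favorite list derived from all changes up to 20.4.13.
batch_view = ["iPAD 64GB", "Nikon S-100", "BoseQC-15", "MacBook Pro 15"]

# Real-time increments: changes that arrived after the last batch run.
realtime_changes = [("Add", "Canon Scanner")]


def query_current_favorites(batch_view, realtime_changes):
    """Merge at query time: start from the batch view, apply the recent changes."""
    favorites = list(batch_view)  # copy, so the batch view itself is never modified
    for action, product in realtime_changes:
        if action == "Add" and product not in favorites:
            favorites.append(product)
        elif action == "Remove" and product in favorites:
            favorites.remove(product)
    return favorites


def query_current_product_count(batch_view, realtime_changes):
    """The Current Product Count, answered from both views."""
    return len(query_current_favorites(batch_view, realtime_changes))
```

The result matches the slide's Current Product Count of 5. Once the next batch run includes the Canon Scanner change, the real-time increments for that period can be discarded.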
  • Big Data and Real Time Processing
    [Diagram: blended view for the end user on a timeline up to "now". Batch processing (e.g. Hadoop) works fine for the fully processed data up to the last full batch period; real-time processing covers the data that has arrived since, including the time for the batch job.]
    Adapted from Ted Dunning (March 2012): http://www.youtube.com/watch?v=7PcmbI5aC20
  • Lambda Architecture
    [Diagram: Incoming Data (A) flows into the Batch Layer (Immutable data (B) → Batch View (C)) and into the Speed Layer (Data Stream (E) → Realtime View (F)); the Serving Layer (D) exposes the batch views; a Query (G) combines both.]
  • Lambda Architecture
    A. All data is sent to both the batch layer and the speed layer
    B. The master data set is an immutable, append-only set of data
    C. The batch layer precomputes query functions from scratch; the results are called batch views. The batch layer constantly recomputes the batch views.
    D. Batch views are indexed and stored in a scalable database so that particular values can be retrieved very quickly; new batch views are swapped in when they become available
    E. The speed layer compensates for the high latency of updates to the batch views
    F. It uses fast incremental algorithms and read/write databases to produce real-time views
    G. Queries are resolved by getting results from both the batch and the real-time views
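Steps A to G can be traced in a toy, single-process Python model. It assumes the query function is a simple count per key and uses `collections.Counter` in place of the batch and real-time databases; the class and method names are hypothetical. A real deployment would use distributed components such as those listed later (e.g. Hadoop for the batch layer, Storm for the speed layer).

```python
from collections import Counter


class LambdaSketch:
    """Toy model of the Lambda Architecture for a 'count per key' query."""

    def __init__(self):
        self.master_dataset = []        # B: immutable, append-only master data set
        self.batch_view = Counter()     # C/D: last batch view, as served to queries
        self.batch_covered = 0          # number of master records in batch_view
        self.realtime_view = Counter()  # F: incremental, transient real-time view

    def ingest(self, key):
        # A: every new record goes to both layers.
        self.master_dataset.append(key)  # batch layer: append only, never update
        self.realtime_view[key] += 1     # E/F: speed layer updates incrementally

    def run_batch(self):
        # C: recompute the view from scratch over the whole master data set.
        covered = len(self.master_dataset)
        new_view = Counter(self.master_dataset[:covered])
        # D: swap in the new batch view once it is complete.
        self.batch_view, self.batch_covered = new_view, covered
        # The real-time view now only needs records the batch view does not cover
        # (always none in this single-threaded sketch).
        self.realtime_view = Counter(self.master_dataset[covered:])

    def query(self, key):
        # G: resolve the query from both the batch view and the real-time view.
        return self.batch_view[key] + self.realtime_view[key]
```

After `run_batch()`, the real-time counts are rebuilt from the records the new batch view does not cover, which prevents double counting. In this sketch that slice is always empty; in a real system it would contain the records that arrived while the batch job was running.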
  • Layered Architecture
    • Batch Layer
      • Stores the immutable, constantly growing dataset
      • Computes arbitrary views from this dataset using Big Data technologies (can take hours)
      • The views can always be recreated
    • Speed Layer
      • Computes the views from the constant stream of data it receives
      • Needed to compensate for the high latency of the batch layer
      • Incremental model; views are transient
    • Serving Layer
      • Responsible for indexing and exposing the precomputed batch views so that they can be queried
      • Exposes the incremented real-time views
      • Merges the batch and the real-time views into a consistent result
  • Lambda Architecture
    [Diagram: Incoming Data feeds the Batch Layer (all data → batch recompute → precomputed batch views) and the Speed Layer (process stream → real-time increment → incremented real-time views); the Serving Layer merges batch views and real-time views to answer a query.]
    Source: Marz, N. & Warren, J. (2013) Big Data. Manning.
  • Lambda Architecture in Action
    Implementation in an ongoing proof of concept (after completion of phase 1)
    [Diagram: the same structure as above: Incoming Data → Batch Layer (all data, batch recompute, precomputed batch views) and Speed Layer (process stream, real-time increment, real-time views) → Serving Layer merges → query]
  • Lambda Architecture in Action
    • Twitter Horsebird Client (hbc): Twitter Java API over the Streaming API
    • Spring Framework: popular Java framework used to modularize part of the logic (sensor and serving layer)
    • Apache Kafka: simple messaging framework based on the file system, used to distribute information to both the batch and the speed layer
    • Apache Avro: serialization system for efficient cross-language RPC and persistent data storage
    • JSON: open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs
    • Cloudera Distribution: distribution of Apache Hadoop: HDFS, MapReduce, Hive, Flume, Pig, Impala
    • Cloudera Impala: distributed query execution engine that runs against data stored in HDFS and HBase
    • Apache ZooKeeper: distributed, highly available coordination service; provides primitives such as distributed locks
    • Apache Storm & Trident: distributed, fault-tolerant realtime computation system
    • Apache Cassandra: distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
  • Lambda Architecture with Oracle Product Stack
    Possible implementation with the Oracle product stack:
    [Diagram: the Lambda Architecture (incoming data, batch layer, speed layer, serving layer with batch views and real-time views) annotated with Oracle GoldenGate, Oracle Service Bus, Oracle Data Integrator, Oracle Big Data Appliance, Oracle Big Data Connectors, Oracle Event Processing (speed layer), Oracle NoSQL, Oracle RDBMS, Oracle Coherence, Oracle WebLogic Server, Oracle ADF, Oracle Endeca and OBIEE]
  • Summary: The Lambda Architecture
    • The Lambda Architecture
      • Can discard batch views and real-time views and recreate everything from scratch
      • Mistakes are corrected via recomputation
      • Data storage layer optimized independently from the query resolution layer
      • Still at a very early stage … but a very interesting idea!
      • Today a zoo of technologies is needed => Operations won't like it
    • The technology/implementation
      • Different query languages for batch and real time
      • An abstraction over the batch and speed layers is needed; Cascading and Trident are already similar
      • Not everything works out of the box and together
      • Industry standards needed!
  • THANK YOU.
    Trivadis AG, Guido Schmutz
    Europa-Strasse 5, CH-8095 Glattbrugg
    info@trivadis.com, www.trivadis.com