Big Data = Big Decisions




Presented on April 17th for InnoTech Dallas.




Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

    Big Data = Big Decisions Presentation Transcript

    • BIG DATA = BIG DECISIONS Bob Zurek | SVP Products | Epsilon |
    • Consider the following:
        • New model for data
        • Accessible over TCP/IP and a variety of languages
        • Initially difficult to understand
        • Capable of processing thousands of ops/sec
        • Very different from the old model
        • Threatening, as much was invested in the old model
        • Changing course seems ridiculous
        • Source: Eben Hewitt
    • What are we talking about?
    • IBM IMS
        • “IMS is IBM's premier transaction and hierarchical database management system, virtually unsurpassed in database and transaction processing availability and speed.” (IBM, 2013)
        • “Mission-critical processing that requires unparalleled performance is best served by a hierarchical model. Analytics and business intelligence are best served by a relational model. Most Fortune 100 companies use both.”
        • Source: IBM
    • Data evolution: a new model is invented; a disruptive model; a threatening model; a competitive model. Source: Eben Hewitt
    • The relational model & SQL: a HUGE industry success
    • So now what?
    • We have a problem
    • Innovation, complexity, confusion, a new model, disruption, fierce competition. Sound familiar?
    • Big data: a growing torrent
        • $600 to buy a disk drive that can store all of the world’s music
        • 5 billion mobile phones in use in 2010
        • 30 billion pieces of content shared on Facebook every month
        • 40% projected growth in global data generated per year vs. 5% growth in global IT spending
        • 235 terabytes of data collected by the U.S. Library of Congress by April 2011
        • 15 out of 17 sectors in the United States have more data stored per company than the U.S. Library of Congress
        • Source: McKinsey
    • Industry buzz: what is big data, exactly?
    • Big data confusion? What do business executives think “big data” is?
        • A greater scope of information: 18%
        • New kinds of data and analysis: 16%
        • Real-time information: 15%
        • Data influx from new technologies: 13%
        • Non-traditional forms of media: 13%
        • Large volumes of data: 10%
        • The latest buzzword: 8%
        • Social media data: 7%
        • Source: IBM
    • Big data is… large pools of data that can be captured, communicated, aggregated, stored, and analyzed. Source: McKinsey
    • Another way of looking at it. Source: TDWI
    • Is it time to look for an alternative?
    • It’s not that simple, is it?
    • How have we been solving it (historically)?
        • Vertical scaling = throw hardware at it
        • Optimize the application = SQL, indexes, access
        • Employ caching layers = Memcached, Coherence
        • Denormalization = reduce joins
        • Sharding/Shared Nothing = split the data up
        • Innovation = columnar
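The "split the data up" idea behind sharding can be illustrated with a minimal sketch. It is not taken from the presentation: the `customer_id` key, the shard count of 4, and the helper names `shard_for` and `partition` are all assumptions made for illustration. Each row is routed to a shard by a stable hash of its key, so the same key always lands on the same shard.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and would not be stable across runs.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(rows, num_shards, key_field="customer_id"):
    """Split rows into num_shards lists according to the shard key."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


# Hypothetical data: 1,000 order rows keyed by customer_id.
rows = [{"customer_id": i, "total": i * 10} for i in range(1000)]
shards = partition(rows, 4)
```

A simple modulo scheme like this moves most keys when the shard count changes, which is one reason production systems often use consistent hashing or range-based partitioning instead.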
    • What’s driving change and innovation?
    • Big data innovation incubated
        • A search engine project at Yahoo
        • Doug Cutting = Nutch
        • Google = GFS and GMR
    • eBay erected a Hadoop cluster spanning 530 servers, now five times the size! “Hadoop is an amazing technology stack. We now depend on it to run eBay.” Bob Page, Vice President of Analytics, eBay Source:
    • It can get complex and confusing
        • “It replaced our need for ETL”
        • “It is great for batch processing in parallel”
        • “A beautiful platform for all of problems”
    • What it’s not good for
        • High volume transactional data
        • Structured data with low latency
        • “Note that Hadoop is not an Extract-Transform-Load (ETL) tool. It is a platform that supports running ETL processes in parallel. The data integration vendors do not compete with Hadoop; rather, Hadoop is another channel for use of their data transformation modules.” Source: Teradata/Cloudera Presentation
    • What it’s really good for
        • Index building
        • Pattern recognition
        • Sentiment analysis
        • Machine generated data
        • Log processing
        • Web scale = Google, Twitter, YouTube
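Log processing is a natural fit for the map/reduce style of computation. The following is a single-process sketch of that pattern and does not use Hadoop itself. The three-field log format (`client path status`) and the function names are assumptions for illustration. The map step emits a `(status, 1)` pair per line, the shuffle step groups pairs by key, and the reduce step sums each group.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (status_code, 1) for one log line; skip malformed lines."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs):
    """Group values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


# Hypothetical access-log lines in the assumed "client path status" format.
log_lines = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /missing 404",
    "10.0.0.1 /cart 200",
    "garbage",
]
status_counts = run_job(log_lines)
```

On a real cluster the map and reduce functions run in parallel across many machines and the shuffle moves data over the network; the structure of the job stays the same.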
    • Use Cases
        • Fraud Detection: spot fraud anomalies
        • Mobile Data: process mobile data
        • Online Travel Reservations: travel booking
        • IT Security: analyze machine generated data
        • Image Processing: detecting patterns in sat imagery
        • E-Commerce: large marketplaces
        • HealthCare: semantic analysis for relevance
        • Energy Discovery: sort and process seismic data
        • Energy Savings: suggest ways customers save money
        • Infrastructure Management: collecting device logs
    • Source: Teradata/Cloudera
    • Many shades of grey and lots of great innovations
    • Relational is still in play. Some innovations worth a look: Dynamically Scaling OLTP = “No Need To Shard”
    • The NoSQL generation
        • Document storage model
            • Allows MTV to store hierarchical data
            • Flexible schema to model structure/data by brand
            • Needed to have ability to query nested content
            • No need for a shared disk storage
        • Apache Accumulo
            • Released by NSA to open source
            • Based on Google Big Table
            • Built on top of Hadoop
            • Fine-grained access control
            • Cell level security
            • Server side programming
    • Why NoSQL?
        • Schemaless model = easy to add fields
        • Document oriented = JSON format (think objects)
        • Built from the ground up to be distributed
        • Auto sharding
        • Distributed querying capabilities
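To make the "schemaless" and "think objects" points concrete, here is a small sketch using plain Python dictionaries and the standard `json` module rather than any particular document database. The sample profiles and the `get_path` helper are hypothetical. The second document carries a `tags` field the first one lacks, and both can still be stored and queried side by side, including by a nested field.

```python
import json

# Two documents in the same collection; only the second has "tags".
profiles = [
    {"id": 1, "name": "Ada", "address": {"city": "Dallas"}},
    {"id": 2, "name": "Bo", "address": {"city": "Austin"}, "tags": ["vip"]},
]


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'address.city' through nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


# Query on nested content, and on a field that only some documents have.
dallas = [d["name"] for d in profiles if get_path(d, "address.city") == "Dallas"]
tagged = [d["id"] for d in profiles if "vip" in d.get("tags", [])]

# Documents serialize to JSON and back without a declared schema.
round_trip = json.loads(json.dumps(profiles))
```

Adding a field required no migration step here; the trade-off is that every reader has to tolerate documents where the field is absent.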
    • NoSQL Use Case
        1. Click/event data flows into Hadoop
        2. Data is analyzed via MapReduce jobs, which generate 100M profiles based on the campaigns running
        3. Selected profiles are loaded into Couch
        4. Ad targeting logic queries Couch with sub-second latency to optimize decisions and real-time ad placement
        Source: Couchbase
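The four steps of this use case can be sketched end to end in a few lines. This is only an illustration under stated assumptions: an in-memory dictionary stands in for the document store, a plain function stands in for the batch job, and the event fields, the `min_clicks` threshold, and all function names are hypothetical rather than anything from Couchbase or the presentation.

```python
from collections import Counter, defaultdict

# Step 1: click events as they might arrive for batch processing.
events = [
    {"user": "u1", "campaign": "shoes"},
    {"user": "u1", "campaign": "shoes"},
    {"user": "u1", "campaign": "travel"},
    {"user": "u2", "campaign": "travel"},
]


def build_profiles(events):
    """Step 2: batch analysis that aggregates clicks into one profile per user."""
    counts = defaultdict(Counter)
    for event in events:
        counts[event["user"]][event["campaign"]] += 1
    return {
        user: {
            "top_campaign": campaigns.most_common(1)[0][0],
            "clicks": sum(campaigns.values()),
        }
        for user, campaigns in counts.items()
    }


def select_profiles(profiles, min_clicks):
    """Step 3: keep only the profiles worth loading into the serving store."""
    return {u: p for u, p in profiles.items() if p["clicks"] >= min_clicks}


def choose_ad(store, user, default="generic"):
    """Step 4: a single key lookup at request time decides the ad."""
    profile = store.get(user)
    return profile["top_campaign"] if profile else default


# The dict plays the role of the low-latency key-value/document store.
store = select_profiles(build_profiles(events), min_clicks=2)
```

The design point is the split of work: the expensive aggregation happens offline in batch, while the request path does nothing more than a lookup by key.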
    • Hadoop Augmentation
        • Side-by-side will be commonplace
        • ETL solutions support Hadoop
        • Relational databases
            • Provide ETL interfaces to Hadoop
            • Execute map/reduce jobs inside the DBMS
        • NoSQL supports ETL
    • Example Hybrid DBMS Systems: Oracle Endeca Server
        • Hybrid search/analytic database
        • Supports structured, unstructured, semi-structured
        • No schema required. Records stacked.
        • Columnar
    • Trends
        • SQL on Hadoop: Hadapt, Cloudera Impala, EMC
        • Unified support of structured, unstructured, semi-structured
        • Embedding search
        • Expanded ETL/ELT support
        • Big data in motion takes hold
        • Added data mining and analytic functions in NoSQL
        • Embedding R language = gain in popularity
        • Data scientists instrumental in business success
    • Bob Zurek |