Big Data = Big Decisions

Presented on April 17th for InnoTech Dallas.

Published in: Technology

  1. 1. BIG DATA = BIG DECISIONS Bob Zurek | SVP Products | Epsilon | www.epsilon.com
  2. 2. BIG DATA APPROACHING
  3. 3. Consider the following:
      • New model for data
      • Accessible over TCP/IP and a variety of languages
      • Initially difficult to understand
      • Capable of processing thousands of ops/sec
      • Very different from old model
      • Threatening, as much was invested in old model
      • Changing course seems ridiculous
      Source: Eben Hewitt
  4. 4. What are we talking about?
  5. 5. IBM IMS
      “IMS is IBM’s premier transaction and hierarchical database management system, virtually unsurpassed in database and transaction processing availability and speed” – IBM 2013
      “Mission-critical processing that requires unparalleled performance is best served by a hierarchical model. Analytics and business intelligence are best served by a relational model. Most Fortune 100 companies use both.”
      Source: IBM
  6. 6. Data evolution
      • A New Model Is Invented
      • A Disruptive Model
      • A Threatening Model
      • A Competitive Model
      Source: Eben Hewitt
  7. 7. The relational model & SQL: A HUGE industry success
  8. 8. So now what?
  9. 9. We have a problem
  10. 10. innovation, complexity, confusion, a new model, disruption, fierce competition. Sound familiar?
  11. 11. Big data – a growing torrent
      • $600 to buy a disk drive that can store all of the world’s music
      • 5 billion mobile phones in use in 2010
      • 30 billion pieces of content shared on Facebook every month
      • 40% projected growth in global data generated per year vs. 5% growth in global IT spending
      • 235 terabytes of data collected by the U.S. Library of Congress by April 2011
      • 15 out of 17 sectors in the United States have more data stored per company than the U.S. Library of Congress
      Source: McKinsey
  12. 12. Industry buzz: What is big data, exactly?
  13. 13. Big data confusion? What do business executives think “big data” is?
      • A greater scope of information: 18%
      • New kinds of data and analysis: 16%
      • Real-time information: 15%
      • Data influx from new technologies: 13%
      • Non-traditional forms of media: 13%
      • Large volumes of data: 10%
      • The latest buzzword: 8%
      • Social media data: 7%
      Source: IBM
  14. 14. Big data is… large pools of data that can be captured, communicated, aggregated, stored, and analyzed. Source: McKinsey
  15. 15. Another way of looking at it. Source: TDWI
  16. 16. Is it time to look for an alternative?
  17. 17. It’s not that simple, is it?
  18. 18. How are we solving (historically)?
      • Vertical scaling = throw hardware at it
      • Optimize the application = SQL, indexes, access
      • Employ caching layers = Memcached, Coherence
      • Denormalization = reduce joins
      • Sharding/Shared Nothing = split the data up
      • Innovation = columnar
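
To make "Sharding/Shared Nothing = split the data up" concrete, here is a minimal Python sketch, not taken from the deck. It assumes a simple hash-modulo placement scheme; `shard_for` and `ShardedStore` are hypothetical names, and each shard is just an in-memory dict standing in for an independent database node.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Return a stable shard index for a key (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy shared-nothing store: each shard is an independent dict."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        # Route the write to exactly one shard, chosen by the key's hash.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str, default=None):
        # Reads use the same routing, so no other shard is consulted.
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
for user_id in ("alice", "bob", "carol"):
    store.put(user_id, {"id": user_id})
print(store.get("bob"))
```

With plain modulo placement, changing the shard count moves most keys to a different shard, which is one reason real systems often prefer consistent hashing or range-based partitioning.
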
  19. 19. What’s driving change and innovation?
  20. 20. [Image-only slide]
  21. 21. Big data innovation incubated
      • A search engine project at Yahoo
      • Doug Cutting = Nutch
      • Google = GFS and GMR
  22. 22. eBay erected a Hadoop cluster spanning 530 servers – now five times the size!
      “Hadoop is an amazing technology stack. We now depend on it to run eBay.” Bob Page, Vice President of Analytics, eBay
      Source: http://www.wired.com/wiredenterprise/2011/10/how-yahoo-spawned-hadoop/
  23. 23. It can get complex and confusing
      • “It replaced our need for ETL”
      • “It is great for batch processing in parallel”
      • “A beautiful platform for all of problems”
  24. 24. What it’s not good for
      • High volume transactional data
      • Structured data with low latency
      “Note that Hadoop is not an Extract-Transform-Load (ETL) tool. It is a platform that supports running ETL processes in parallel. The data integration vendors do not compete with Hadoop; rather, Hadoop is another channel for use of their data transformation modules.” Teradata/Cloudera Presentation
  25. 25. What it’s really good for
      • Index building
      • Pattern recognition
      • Sentiment analysis
      • Machine-generated data
      • Log processing
      • Web scale = Google, Twitter, YouTube
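
As an illustration of why log processing suits this model, the following Python sketch walks through the map, shuffle, and reduce steps in a single process. It is not the Hadoop API; the log format (`<method> <path> <status>`) and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_line(line):
    """Map step: emit (status, 1) for each well-formed log line."""
    parts = line.split()
    if len(parts) == 3:  # assumed format: "<method> <path> <status>"
        yield parts[2], 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


log_lines = ["GET /a 200", "GET /b 404", "POST /c 200", "malformed"]
print(run_job(log_lines))
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread across many machines, which is the property a cluster exploits.
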
  26. 26. Use Cases
      • Fraud Detection: Spot fraud anomalies
      • Mobile Data: Process mobile data
      • Online Travel Reservations: Travel booking
      • IT Security: Analyze machine-generated data
      • Image Processing: Detecting patterns in satellite imagery
      • E-Commerce: Large marketplaces
      • HealthCare: Semantic analysis for relevance
      • Energy Discovery: Sort and process seismic data
      • Energy Savings: Suggest ways customers save money
      • Infrastructure Management: Collecting device logs
  27. 27. Source: Teradata/Cloudera
  28. 28. Source: Teradata/Cloudera
  29. 29. Many shades of grey and lots of great innovations
  30. 30. Relational is still in play. Some innovations worth a look: Dynamically Scaling OLTP = “No Need To Shard”
  31. 31. The NoSQL generation
      Document Storage Model
      • Allows MTV to store hierarchical data
      • Flexible schema to model structure/data by brand
      • Needed to have ability to query nested content
      • No need for a shared disk storage
      Apache Accumulo
      • Released by NSA to open source
      • Based on Google Big Table
      • Built on top of Hadoop
      • Fine-grained access control
      • Cell level security
      • Server side programming
  32. 32. Why NoSQL?
      • Schemaless model = easy to add fields
      • Document oriented = JSON format (think objects)
      • Built from the ground up to be distributed
      • Auto sharding
      • Distributed querying capabilities
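
The "schemaless" and "document oriented" points can be sketched in a few lines of Python. This is an illustration only and uses no real document database: the collection is a plain list of parsed JSON objects, and `get_path`, `find`, and the sample documents are hypothetical.

```python
import json


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'brand.name' into nested dicts."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, dotted_path, value):
    """Return documents whose nested field equals value."""
    return [doc for doc in collection if get_path(doc, dotted_path) == value]


# Documents in one collection need not share the same fields.
collection = [
    json.loads('{"title": "Show A", "brand": {"name": "BrandX"}}'),
    json.loads('{"title": "Show B", "brand": {"name": "BrandY", "region": "US"}}'),
    json.loads('{"title": "Clip C", "tags": ["music"]}'),
]
matches = find(collection, "brand.name", "BrandY")
print([doc["title"] for doc in matches])
```

Adding the `region` field to one document required no schema change, and the query still reaches into nested content.
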
  33. 33. NoSQL Use Case
      1. Click/event data flows into Hadoop
      2. Data is analyzed via MapReduce jobs, generating 100M profiles based on the campaigns running
      3. Selected profiles are loaded into Couch
      4. Ad targeting logic queries Couch with sub-second latency to optimize decisions and real-time ad placement
      Source: Couchbase
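
The flow in this use case (batch analysis, then loading selected profiles into a fast key-value store, then a lookup at ad time) can be sketched in Python. This is a toy under stated assumptions: an in-memory dict stands in for Couch, the batch step runs in one process rather than as MapReduce jobs, and every name, threshold, and sample event is hypothetical.

```python
from collections import Counter, defaultdict


def build_profiles(events):
    """Batch step: count clicks per user and campaign."""
    profiles = defaultdict(Counter)
    for user_id, campaign in events:
        profiles[user_id][campaign] += 1
    return profiles


def select_profiles(profiles, min_clicks):
    """Keep users with enough activity; store their top campaign."""
    selected = {}
    for user_id, counts in profiles.items():
        if sum(counts.values()) >= min_clicks:
            selected[user_id] = counts.most_common(1)[0][0]
    return selected


def choose_ad(serving_store, user_id, default_ad="generic"):
    """Serving step: a single key lookup, falling back to a default ad."""
    return serving_store.get(user_id, default_ad)


events = [("u1", "shoes"), ("u1", "shoes"), ("u1", "hats"), ("u2", "hats")]
serving_store = select_profiles(build_profiles(events), min_clicks=2)
print(choose_ad(serving_store, "u1"), choose_ad(serving_store, "u2"))
```

The design point is the split: the expensive aggregation runs offline, while the serving path does only a key lookup, which is what makes sub-second decisions feasible.
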
  34. 34. Hadoop Augmentation
      • Side-by-side will be commonplace
      • ETL solutions support Hadoop
      • Relational databases
        • Provide ETL interfaces to Hadoop
        • Execute map/reduce jobs inside DBMS
      • NoSQL supports ETL
  35. 35. Example Hybrid DBMS Systems: Oracle Endeca Server
      • Hybrid Search/Analytic Database
      • Supports structured, unstructured, semi-structured
      • No schema required. Records stacked.
      • Columnar
  36. 36. Trends
      • SQL On Hadoop – Hadapt, Cloudera Impala, EMC
      • Unified Support of Structured, Unstructured, Semi-structured
      • Embedding Search
      • Expanded ETL/ELT Support
      • Big Data In Motion Takes Hold
      • Added Data Mining and Analytic Functions In NoSQL
      • Embedding R Language = gain in popularity
      • Data Scientists instrumental in business success
  37. 37. Bob Zurek | bzurek@epsilon.com
