Create your Big Data vision and Hadoop-ify your data warehouse

How to get your data warehouse and Hadoop to play nice together.

Published in: Technology

  1. Create Your Big Data Vision And Hadoop-ify Your Data Warehouse
     © Copyright 2012 EMC Corporation. All rights reserved.
     • Jeff Kelly, Principal Research Contributor, The Wikibon Project
     • Bill Schmarzo, CTO, EIMA Practice, EMC Professional Services
  2. Agenda
     • Current Market Observations
     • The Big Data Business Maturity Index and How to Identify Your Best Use Case
     • Get Started With Hadoop and Other New Technologies
     • What Should You Look For in a Vendor?
     • Q&A
  3. Current Market Observations
     Jeff Kelly
  4. Big Data Market Size
     • 2012: $11.4b
     • 2013: $18.2b
     • 2014: $28b
     • 2015: $37.9b
     • 2016: $43.7b
     • 2017: $48b
     • 59% growth year-over-year, 2011 to 2012
     • Forecast 60%+ growth in 2013
     • 31% CAGR forecast, 2012 through 2017
     Source: Wikibon Big Data Vendor Revenue and Market Forecast, 2012-2017
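The growth percentages on the market-size slide can be sanity-checked from the dollar figures listed there. The following is a minimal sketch, not Wikibon's method: the function names are illustrative, and because the inputs are the rounded values from the slide, the results may differ by a point or two from the published percentages.

```python
# Market size in $ billions, as listed on the slide (rounded figures).
market_size = {2012: 11.4, 2013: 18.2, 2014: 28.0, 2015: 37.9, 2016: 43.7, 2017: 48.0}

def yoy_growth(sizes, year):
    """Year-over-year growth for `year`, as a fraction of the prior year."""
    return sizes[year] / sizes[year - 1] - 1

def cagr(sizes, start, end):
    """Compound annual growth rate between two years, as a fraction."""
    return (sizes[end] / sizes[start]) ** (1 / (end - start)) - 1

if __name__ == "__main__":
    print(f"2013 growth: {yoy_growth(market_size, 2013):.1%}")
    print(f"2012-2017 CAGR: {cagr(market_size, 2012, 2017):.1%}")
```

The CAGR here uses only the two endpoint years, which is the usual textbook definition; a forecast built from unrounded data can land on a slightly different figure.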
  5. Big Data Market Segmentation, 2012: Services Leading the Way
     • Professional Services: $3,784m (34%)
     • Cloud and SaaS: $608m (5%)
     • Chart segments: Pro. Services, Compute, Storage, Networking, Database, Applications, Data mgt., Cloud
     • n = $11,400m
  6. Big Data Growth Drivers
     • Increased Awareness and Investments By Large Enterprises Beyond the Web
       - Retailers like Sears are leveraging Big Data for price optimization.
       - Financial services firms, including JPMC, Morgan Stanley and BoA, conduct fraud analysis, risk profiling and more.
       - Pharmaceutical makers, including Bristol-Myers Squibb, use Big Data to support drug development.
     • Continued Investment by Web Pioneers and Three Letter Agencies
       - Google alone spent $1b+ on infrastructure in Q4 2012.
       - “Everything we do is a Big Data problem.” (Jay Parikh, VP of Engineering, Facebook)
       - CIA CTO Ira Hunt: Our mission is to “collect everything and hang on to it forever.”
  7. Big Data Growth Drivers, Cont.
     • Increasingly Sophisticated Professional Services
       - Professional services firms are building on their experience of assisting early adopters.
       - Some (but not all) are vendor and product agnostic.
       - They focus on identifying use cases, improving communication, and leveraging existing assets.
       - Service areas: Consulting, Training & Education, Integration
     • Technology Maturation
       - The open source community and vendors are making Hadoop enterprise-ready and easier to use.
       - Better integration between Big Data and existing IT infrastructure.
       - Extending Big Data accessibility to business users via BI and data visualization tools.
  8. Big Data Growth Inhibitors
     • Lack of Data Scientists and Big Data Practitioners
     • Big Data Technology Still Complex, Difficult to Manage/Use
     • Organizational Resistance to Data-Driven Decision Making
     • Confusion Due to Vendor Marketing and “Big Data Washing” (graphic: “Big Data [Your Product Name Here]”)
  9. The Big Data Business Maturity Index and How to Identify Your Best Use Case
     Bill Schmarzo
  10. Big Data Business Model Maturation Index
     Measures the degree to which the organization has integrated big data and advanced analytics into its business model. Stages, from least to most mature:
     • Business Monitoring: Monitoring business performance to flag areas of interest
     • Business Insights: Integrate insights & recommendations into existing business processes
     • Business Optimization: Embed analytics to optimize select business processes
     • Data Monetization: Leverage insights to identify new revenue opportunities
     • Business Metamorphosis: Transform customer and product insights to move into new markets
  11. How to Identify Your Best Use Case
     The Big Data strategy document ensures a tight linkage between your organization’s business initiatives and your big data strategy. It covers:
     • Big data business cases, ROI and analytic requirements
     • Key Performance Indicators and leading metrics
     • Business questions with metrics, dimensions, hierarchies
     • Business decisions, decision flow/process and UEX requirements
     • Analytic algorithms and modeling requirements
     • Required data sources
     Example:
     • Business Strategy: Provide Unique Starbucks Customer Experience
     • Business Initiatives:
       - Increase number of “Gold Card” customers
       - Increase “Gold Card” customer revenue & engagement (store visits, spend per visit, advocacy)
     • Mobile App, Social Media, Store Sales, Customer Loyalty
     • Tasks:
       - Collect customer engagement information through multiple channels (store, web, mobile)
       - Profile and micro-segment customers to improve marketing and offers effectiveness
       - Analyze social media data to identify and monitor brand advocates
       - Monitor and adjust customer engagement effectiveness (visits, revenue, margin, advocacy)
     • Outcomes & CSFs:
       - Develop intimate knowledge of “Gold Card” customers’ life stage, behaviors and interests
       - Act upon intimate knowledge of “Gold Card” customers to increase store revenues
       - Expand customer data collection points
       - Leverage “Gold Card” member transactions, feedback (surveys) and social data
       - Integrate customer-specific insights back into operational, management and loyalty systems
  12. Get Started With Hadoop and Other New Technologies
     Bill Schmarzo
  13. A Playbook For Modernizing Your Data Warehouse With New Big Data Technologies And Capabilities
     • #1) Enhance data warehouse with new unstructured data metrics
     • #2) Data virtualization to extend existing data warehouse environment
     • #3) MPP RDBMS to increase data platform scalability and agility
     • #4) In-database analytics to accelerate analytic development
     • #5) Hadoop to create the next generation Operational Data Store
  14. #1) Enhance Data Warehouse With New Unstructured Data Metrics
     Leverage HDFS to provide a single platform that supports your traditional SQL-based BI environment plus your growing unstructured data needs at scale.
     Diagram labels:
     • Apache: HDFS; HBase; Pig, Hive, Mahout; MapReduce; Sqoop; Flume; Resource Management & Workflow (Yarn, Zookeeper)
     • Pivotal HD: Command Center (Configure, Deploy, Monitor, Manage); Hadoop Virtualization (HVE); DataLoader
     • HAWQ, Advanced Database Services: Xtension Framework; Catalog Services; Query Optimizer; Dynamic Pipelining; ANSI SQL + Analytics
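To make "unstructured data metrics" concrete, here is a minimal sketch of turning free-text customer comments into numeric rows that a warehouse table could hold. It is plain Python standing in for what would more likely be a Hive or MapReduce job over HDFS; the keyword lists, sample comments, and function names are all hypothetical, and a real project would use a proper text-analytics model rather than keyword counts.

```python
from collections import Counter
import re

# Hypothetical keyword lists, for illustration only.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"slow", "rude", "broken"}

def sentiment_score(text):
    """Positive-minus-negative keyword count for one piece of free text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return pos - neg

def customer_metrics(comments):
    """Aggregate (customer_id, text) pairs into one row per customer."""
    totals = {}
    for customer_id, text in comments:
        n, score = totals.get(customer_id, (0, 0))
        totals[customer_id] = (n + 1, score + sentiment_score(text))
    # Rows shaped like a fact table: (customer_id, comment_count, net_sentiment)
    return sorted((cid, n, s) for cid, (n, s) in totals.items())

if __name__ == "__main__":
    sample = [
        (101, "Love the new app, checkout is fast"),
        (101, "Store was slow today"),
        (202, "Rude staff and a broken card reader"),
    ]
    for row in customer_metrics(sample):
        print(row)
```

The point of the sketch is the shape of the output: once text is reduced to per-customer numbers, it can sit next to sales and loyalty measures in the existing SQL-based BI environment.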
  15. #2) Extend Existing Data Warehouse Via Data Virtualization
     Leverage data federation tools to speed data discovery and analysis via virtual, on-demand access to data sources outside your EDW.
     Diagram labels: Data Source (x4); ETL; Cached Streaming Data; CEP/Workflow; Data Federation Tool; Semantic Master Data; Discovery / Data Mapping; Unified Data Platform; Real-Time Visualization; Advanced Analytics and Modeling
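The idea behind data virtualization, querying a source outside the EDW on demand instead of first copying it in, can be sketched with Python's built-in `sqlite3` module. This is only an analogy under stated assumptions: two in-memory SQLite databases stand in for the EDW and an external source, and the table names, columns, and sample rows are invented for illustration. A commercial federation tool does far more (query pushdown, caching, semantic mapping).

```python
import sqlite3

def build_demo_connection():
    """One connection with two databases standing in for separate systems."""
    conn = sqlite3.connect(":memory:")                  # "main" plays the EDW
    conn.execute("ATTACH DATABASE ':memory:' AS ext")   # "ext" plays an outside source
    conn.execute("CREATE TABLE main.sales (customer_id INTEGER, revenue REAL)")
    conn.execute("CREATE TABLE ext.web_visits (customer_id INTEGER, visits INTEGER)")
    conn.executemany("INSERT INTO main.sales VALUES (?, ?)",
                     [(1, 120.0), (2, 45.5), (1, 30.0)])
    conn.executemany("INSERT INTO ext.web_visits VALUES (?, ?)",
                     [(1, 14), (2, 3)])
    return conn

def revenue_and_visits(conn):
    """Join across both sources in a single query, without loading ext into the EDW."""
    query = """
        SELECT s.customer_id, SUM(s.revenue) AS revenue, v.visits
        FROM main.sales AS s
        JOIN ext.web_visits AS v ON v.customer_id = s.customer_id
        GROUP BY s.customer_id, v.visits
        ORDER BY s.customer_id
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    for row in revenue_and_visits(build_demo_connection()):
        print(row)
```

The analyst writes one query against a unified view; where each table physically lives is hidden behind the connection.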
  16. #3) Massively Parallel Processing (MPP) Relational Databases
     • Massively Parallel Processing (MPP), scale-out architectures provide cost-effective options for managing and analyzing massive data volumes.
     • MPP data warehouses provide linear scalability on general-purpose, commodity systems (e.g., fault-tolerant scale-out environment; automatic parallelization; I/O optimized).
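The "automatic parallelization" an MPP database performs follows a scatter/gather pattern: rows are distributed across segments by a hash of a key, each segment aggregates its own rows independently, and a coordinator merges the partial results. The sketch below imitates that pattern in a single Python process with threads. It is an illustration under simplifying assumptions, not how any particular MPP product is implemented; real systems run segments on separate machines, and the function names here are made up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_segments):
    """Distribute (key, value) rows across segments by hashing the key."""
    segments = [[] for _ in range(n_segments)]
    for key, value in rows:
        segments[hash(key) % n_segments].append((key, value))
    return segments

def local_sum(segment):
    """Work done independently on one segment: a partial SUM ... GROUP BY key."""
    partial = defaultdict(float)
    for key, value in segment:
        partial[key] += value
    return partial

def parallel_group_sum(rows, n_segments=4):
    """Scatter rows to segments, aggregate in parallel, then merge the partials."""
    segments = partition(rows, n_segments)
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(local_sum, segments))
    merged = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)

if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 7.0)]
    print(parallel_group_sum(sales))
```

Because no segment needs another segment's rows to compute its partial sums, adding segments adds capacity, which is the intuition behind the scalability claim on the slide.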
  17. #4) In-Database Processing And Analytics
     • Conventional: A data scientist needs to move 1 TB of data from a 5-processor database server to the analytical server at 1 gigabit per second (Gbps).
       - Data movement time = (1 TB x 8 bits per byte) / 1 Gbps / 60 seconds per minute = 133.3 minutes
       - Processing time = 60 minutes
       - Total time = 193.3 minutes
     • In-Database: A data scientist leaves the 1 TB of data in the 5-processor database server and runs the same algorithm directly in the database.
       - Time: 12 minutes
     Chart: time in minutes (0 to 200), Conventional vs. In-Database
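The timing comparison on the in-database slide can be reproduced in a few lines. This sketch assumes decimal units (1 TB = 1,000 GB = 8,000 gigabits), which is what the slide's 133.3-minute figure implies; the function and variable names are illustrative. The 12-minute in-database figure is taken from the slide as given. It happens to equal the 60-minute job divided across 5 processors, though the slide does not state that this is how it was derived.

```python
def transfer_minutes(data_terabytes, link_gigabits_per_second):
    """Minutes to move the data, using 1 TB = 8,000 gigabits (decimal units)."""
    gigabits = data_terabytes * 1000 * 8
    seconds = gigabits / link_gigabits_per_second
    return seconds / 60

# Conventional approach: move 1 TB over a 1 Gbps link, then process for 60 minutes.
movement = transfer_minutes(1, 1)
conventional_total = movement + 60

# In-database approach: no data movement; 12 minutes as quoted on the slide.
in_database_total = 12

if __name__ == "__main__":
    print(f"Data movement:      {movement:.1f} min")
    print(f"Conventional total: {conventional_total:.1f} min")
    print(f"In-database total:  {in_database_total} min")
```

Most of the conventional total is spent moving data rather than analyzing it, which is the argument for running the algorithm where the data already lives.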
  18. #5) Next Gen Operational Data Store/Data Prep With Hadoop
     The Hadoop data store feeds the production BI and Enterprise Data Warehouse environment and the high-velocity Analytics Sandbox.
     • Hadoop Data Store: ALL data fed into the Hadoop Data Store; Data Preparation and Enrichment
     • BI Environment (fed via ETL into the EDW):
       - Production
       - Predictable load
       - SLA-driven
       - Standard tools
     • Analytics Environment (Analytic Sandbox):
       - Exploratory, ad hoc
       - Unpredictable load
       - Experimentation
       - Best tool for the job
  19. How To Get Started
  20. EMC Big Data Analytics Strategy And Implementation Services
     • Vision Workshop: Identify big data analytics business use cases
     • Analytics Lab: Deploy analytics sandbox to quantify the business case
     • Analytics Operationalization: Identify current state, determine required state and conduct gap analysis to develop analytics implementation roadmap
     • Repeat the process for identified business cases
  21. What Should You Look For in a Vendor?
     Jeff Kelly
  22. Advice for Selecting Big Data Vendors
     • Balance short-term goals with long-term vision.
     • Objectives are:
       - Quick, demonstrable ROI.
       - Sustainable Big Data practice.
     • Don’t get hung up on “speeds and feeds” or feature-by-feature comparisons.
     • Focus on substance, flexibility, commitment and experience.
  23. Selecting Big Data Vendors, Cont.
     • Evaluate product portfolios based on:
       - Ability to monetize existing and future data assets.
       - Ability to integrate with and complement existing data management technology.
       - Accessibility to power users and business users alike (depending on use case).
       - Ability to apply information governance and security best practices.
     • Select service providers with track records of helping enterprises adopt a data-driven culture as well as technology.
  24. Questions and Answers
     • To type a question via WebEx, click on the Q&A tab.
     • Please select “Ask: All Panelists” to ensure your questions reach us.
     • Thank you!
  25. Learn More…
     • See us at…
       - EMC World, May 5-9, www.emc.world.com
     • Contact Jeff Kelly
       - Email: jeff.kelly@wikibon.org
       - LinkedIn: http://www.linkedin.com/in/jeffreyfkelly/
       - Twitter: @jeffreyfkelly
       - Research: http://www.wikibon.org/bigdata
     • Contact Bill Schmarzo
       - Email: william.schmarzo@emc.com
       - LinkedIn: http://www.linkedin.com/in/schmarzo
       - Twitter: @schmarzo
       - Blog: http://infocus.emc.com/author/william_schmarzo/
