© Copyright 2012 EMC Corporation. All rights reserved.
Create Your Big Data Vision And Hadoop-ify Your Data Warehouse
Jeff Kelly, Principal Research Contributor, The Wikibon Project
Bill Schmarzo, CTO, EIMA Practice, EMC Professional Services
Agenda
•  Current Market Observations
•  The Big Data Business Maturity Index and How to Identify Your Best Use Case
•  Get Started With Hadoop and Other New Technologies
•  What Should You Look For in a Vendor?
•  Q&A
Current Market Observations
Jeff Kelly
Big Data Market Size
2012: $11.4b
2013: $18.2b
2014: $28b
2015: $37.9b
2016: $43.7b
2017: $48b
•  59% growth year-over-year, 2011 to 2012
•  Forecast 60%+ growth in 2013
•  31% CAGR forecast, 2012 through 2017
Source: Wikibon Big Data Vendor Revenue and Market Forecast, 2012-2017
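The growth rates quoted on this slide can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1/years) - 1. The minimal Python sketch below applies it to the rounded yearly values listed on the slide; the function and variable names are illustrative. Computed from the rounded 2012 and 2017 endpoints ($11.4b and $48b), it gives roughly 33% rather than the quoted 31%, and 2012 to 2013 growth comes out just under 60% (about 59.6%). The CAGR gap may reflect unrounded figures or a different base in the underlying Wikibon model, so treat this as an arithmetic check, not a restatement of the forecast.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


def yoy_growth(previous: float, current: float) -> float:
    """Simple year-over-year growth rate."""
    return current / previous - 1


# Big Data market size in $ billions, as listed on the slide (rounded values).
MARKET_SIZE_B = {2012: 11.4, 2013: 18.2, 2014: 28.0, 2015: 37.9, 2016: 43.7, 2017: 48.0}

if __name__ == "__main__":
    five_year = cagr(MARKET_SIZE_B[2012], MARKET_SIZE_B[2017], 2017 - 2012)
    first_year = yoy_growth(MARKET_SIZE_B[2012], MARKET_SIZE_B[2013])
    print(f"CAGR 2012-2017: {five_year:.1%}")
    print(f"Growth 2012-2013: {first_year:.1%}")
```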
Big Data Market Segmentation, 2012: Services Leading the Way
Pie chart of 2012 revenue by segment (n = $11,400m):
•  Professional Services: $3,784m (34%)
•  Cloud and SaaS: $608m (5%)
•  Other segments shown: Compute, Storage, Networking, Database, Applications, Data management
Big Data Growth Drivers
•  Increased Awareness and Investments By Large Enterprises Beyond the Web
  –  Retailers such as Sears leverage Big Data for price optimization.
  –  Financial services firms, including JPMC, Morgan Stanley and BoA, conduct fraud analysis, risk profiling and more.
  –  Pharmaceutical makers, including Bristol-Myers Squibb, use Big Data to support drug development.
•  Continued Investment by Web Pioneers and Three-Letter Agencies
  –  Google alone spent $1b+ on infrastructure in Q4 2012.
  –  “Everything we do is a Big Data problem.” (Jay Parikh, VP of Engineering, Facebook)
  –  CIA CTO Ira Hunt: Our mission is to “collect everything and hang on to it forever.”
Big Data Growth Drivers, Cont.
•  Increasingly Sophisticated Professional Services
  –  Professional services firms are building on their experience assisting early adopters.
  –  Some (but not all) are vendor- and product-agnostic.
  –  They focus on identifying use cases, improving communication, and leveraging existing assets.
•  Technology Maturation
  –  The open source community and vendors are making Hadoop enterprise-ready and easier to use.
  –  Better integration between Big Data and existing IT infrastructure.
  –  Big Data accessibility extended to business users via BI and data visualization tools.
Graphic: Consulting; Training & Education; Integration
Big Data Growth Inhibitors
•  Lack of Data Scientists and Big Data Practitioners
•  Big Data Technology Still Complex, Difficult to Manage/Use
•  Organizational Resistance to Data-Driven Decision Making
•  Confusion Due to Vendor Marketing and “Big Data Washing”
Graphic: “Big Data [Your Product Name Here]”
The Big Data Business Maturity Index and How to Identify Your Best Use Case
Bill Schmarzo
Big Data Business Model Maturation Index
Measures the degree to which the organization has integrated big data and advanced analytics into its business model. Stages, from least to most mature:
•  Business Monitoring: Monitor business performance to flag areas of interest
•  Business Insights: Integrate insights & recommendations into existing business processes
•  Business Optimization: Embed analytics to optimize select business processes
•  Data Monetization: Leverage insights to identify new revenue opportunities
•  Business Metamorphosis: Transform customer and product insights to move into new markets
How to Identify Your Best Use Case
The Big Data strategy document ensures a tight linkage between your
organization’s business initiatives and your big data strategy
•  Big data business cases, ROI and analytic requirements
•  Key Performance Indicators and leading metrics
•  Business questions with metrics, dimensions, hierarchies
•  Business decisions, decision flow/process and UEX requirements
•  Analytic algorithms and modeling requirements
•  Required data sources
Business Strategy: Provide a Unique Starbucks Customer Experience
Business Initiatives:
•  Increase the number of “Gold Card” customers
•  Increase “Gold Card” customer revenue & engagement (store visits, spend per visit, advocacy)
Outcomes & CSFs:
•  Develop intimate knowledge of “Gold Card” customers’ life stages, behaviors and interests
•  Act upon intimate knowledge of “Gold Card” customers to increase store revenues
•  Expand customer data collection points
•  Leverage “Gold Card” member transactions, feedback (surveys) and social data
•  Integrate customer-specific insights back into operational, management and loyalty systems
Tasks:
•  Collect customer engagement information through multiple channels (store, web, mobile)
•  Profile and micro-segment customers to improve the effectiveness of marketing and offers
•  Analyze social media data to identify and monitor brand advocates
•  Monitor and adjust customer engagement effectiveness (visits, revenue, margin, advocacy)
Data Sources: Mobile App, Social Media, Store Sales, Customer Loyalty
Get Started With Hadoop and Other New Technologies
Bill Schmarzo
A Playbook For Modernizing Your Data Warehouse With New Big Data Technologies And Capabilities
#1) Enhance data warehouse with new unstructured data metrics
#2) Data virtualization to extend existing data warehouse environment
#3) MPP RDBMS to increase data platform scalability and agility
#4) In-database analytics to accelerate analytic development
#5) Hadoop to create the next generation Operational Data Store
#1) Enhance Data Warehouse With New Unstructured Data Metrics
Leverage HDFS to provide a single platform that supports your traditional SQL-based BI environment plus your growing unstructured data needs at scale
Diagram: Pivotal HD architecture
•  Apache Hadoop: HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume; resource management & workflow: YARN, ZooKeeper
•  Pivotal HD: Command Center (configure, deploy, monitor, manage), Hadoop Virtualization (HVE), DataLoader
•  HAWQ Advanced Database Services: ANSI SQL + Analytics, Query Optimizer, Dynamic Pipelining, Catalog Services, Xtension Framework
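One way HDFS-resident unstructured data becomes a warehouse metric is a MapReduce-style job that turns free text into rows. The plain-Python sketch below only imitates the map, shuffle and reduce phases in a single process; it does not use the Hadoop, Pig or Hive APIs, and the keyword list, post format and function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical product keywords to track; not taken from the slides.
KEYWORDS = {"latte", "mocha", "espresso"}

Post = tuple[str, str]  # (day, free-text post)
Key = tuple[str, str]   # (day, keyword)


def map_post(post: Post) -> Iterator[tuple[Key, int]]:
    """Map phase: emit ((day, keyword), 1) for each tracked keyword in a post."""
    day, text = post
    for token in text.lower().split():
        word = token.strip(".,!?#@")  # drop punctuation and hashtag/mention marks
        if word in KEYWORDS:
            yield (day, word), 1


def shuffle(pairs: Iterable[tuple[Key, int]]) -> dict[Key, list[int]]:
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups: dict[Key, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups: dict[Key, list[int]]) -> list[tuple[str, str, int]]:
    """Reduce phase: sum the counts per key into (day, keyword, mentions) rows."""
    return sorted((day, kw, sum(values)) for (day, kw), values in groups.items())


def keyword_metrics(posts: Iterable[Post]) -> list[tuple[str, str, int]]:
    """Turn unstructured posts into structured rows an EDW table could load."""
    pairs = (pair for post in posts for pair in map_post(post))
    return reduce_counts(shuffle(pairs))
```

For example, keyword_metrics([("2013-04-01", "Loving my #latte today!"), ("2013-04-01", "Mocha or latte?")]) returns [("2013-04-01", "latte", 2), ("2013-04-01", "mocha", 1)], the kind of structured table a SQL-based BI tool can query alongside existing warehouse data.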
#2) Extend Existing Data Warehouse Via Data Virtualization
Leverage data federation tools to speed data discovery and analysis via virtual, on-demand access to data sources outside your EDW
Diagram components: data sources; ETL; streaming data; cached data; Data Federation Tool; semantic master data; data discovery / data mapping; Unified Data Platform; CEP/workflow; real-time visualization; advanced analytics and modeling
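Data virtualization lets one query reach data that stays in its own source. As a small stand-in for a federation tool, the sketch below uses SQLite's ATTACH DATABASE through Python's standard sqlite3 module to join a warehouse table with a table in a separate database without copying rows between them. The table names, columns and sample rows are hypothetical, and real federation products span heterogeneous systems rather than two SQLite databases.

```python
import sqlite3


def build_sources() -> sqlite3.Connection:
    """Create a 'warehouse' database plus a separately attached 'loyalty' source."""
    conn = sqlite3.connect(":memory:")
    # Attach the second, independent database up front; it stands in for a
    # source that lives outside the EDW.
    conn.execute("ATTACH DATABASE ':memory:' AS loyalty")
    conn.execute("CREATE TABLE main.sales (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO main.sales VALUES (?, ?)", [(1, 4.5), (1, 3.0), (2, 5.25)]
    )
    conn.execute("CREATE TABLE loyalty.members (customer_id INTEGER, tier TEXT)")
    conn.executemany(
        "INSERT INTO loyalty.members VALUES (?, ?)", [(1, "gold"), (2, "green")]
    )
    conn.commit()
    return conn


def revenue_by_tier(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """One query spans both databases; no loyalty rows are copied into 'main'."""
    return conn.execute(
        """
        SELECT m.tier, SUM(s.amount) AS revenue
        FROM main.sales AS s
        JOIN loyalty.members AS m ON m.customer_id = s.customer_id
        GROUP BY m.tier
        ORDER BY m.tier
        """
    ).fetchall()
```

With the sample rows, revenue_by_tier(build_sources()) returns [("gold", 7.5), ("green", 5.25)].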
#3) Massively Parallel Processing (MPP) Relational Databases
•  Massively Parallel Processing (MPP), scale-out architectures provide cost-effective options for managing and analyzing massive data volumes
•  MPP data warehouses provide linear scalability on general-purpose, commodity systems (e.g., fault-tolerant scale-out environment; automatic parallelization; I/O optimized)
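The scale-out idea behind MPP databases is to hash-distribute rows across segments, let each segment aggregate its own slice, and combine the partial results. The sketch below imitates that pattern in a single Python process: threads stand in for segment servers, so it shows the data flow rather than real parallel speed-up, and the segment count, row format and function names are assumptions for illustration.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical number of segment servers


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Pick a segment from the distribution key.

    zlib.crc32 is used instead of hash() so placement is stable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Spread (key, amount) rows across segments; all rows for a key share one segment."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def local_aggregate(segment) -> Counter:
    """Work each segment does independently on its own slice of the data."""
    totals = Counter()
    for key, amount in segment:
        totals[key] += amount
    return totals


def mpp_sum_by_key(rows, num_segments: int = NUM_SEGMENTS) -> dict:
    """Scatter rows, aggregate per segment concurrently, then gather the results."""
    segments = distribute(rows, num_segments)
    # Threads stand in for separate segment servers in this sketch.
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_aggregate, segments))
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # adds values for any overlapping keys
    return dict(combined)
```

Because rows are distributed on the same key they are grouped by, each key's rows land on a single segment and the final gather is a cheap merge; grouping on a different column would require redistributing partial results, which is why the choice of distribution key matters in MPP designs.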
#4) In-Database Processing And Analytics
Conventional: A Data Scientist needs to move 1 TB of data from a 5-processor database server to the analytical server at 1 gigabit per second (Gbps).
In-Database: A Data Scientist leaves the 1 TB of data in the 5-processor database server and runs the same algorithm directly in the database.
Chart: time in minutes for each approach
•  Conventional: Data Movement Time = (1 TB × 8 bits/byte) / 1 Gbps / 60 s/min = 8,000 s / 60 ≈ 133.3 minutes; Processing Time = 60 minutes; Total Time = 193.3 minutes
•  In-Database: no data movement; Processing Time = 12 minutes
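The slide's arithmetic can be reproduced directly. Because the formula multiplies 1 TB by 8, the link speed is in gigabits per second, and 1 TB is taken as 1,000 GB (decimal units). The sketch also assumes, since the slide does not say so, that the 12-minute in-database time comes from spreading the 60-minute job evenly across the database server's five processors; the function names are illustrative.

```python
BITS_PER_BYTE = 8
GB_PER_TB = 1000  # decimal units, matching the slide's (1 TB x 8) / 1 Gbps arithmetic


def transfer_minutes(data_tb: float, link_gbps: float) -> float:
    """Minutes to move data_tb terabytes over a link_gbps gigabit-per-second link."""
    gigabits = data_tb * GB_PER_TB * BITS_PER_BYTE
    return gigabits / link_gbps / 60


def conventional_minutes(data_tb: float, link_gbps: float, processing_min: float) -> float:
    """Move the data to the analytical server, then run the algorithm there."""
    return transfer_minutes(data_tb, link_gbps) + processing_min


def in_database_minutes(processing_min: float, processors: int) -> float:
    """Run in place; assumes the work splits evenly across the database's processors."""
    return processing_min / processors


if __name__ == "__main__":
    print(f"Conventional: {conventional_minutes(1, 1, 60):.1f} min")
    print(f"In-database:  {in_database_minutes(60, 5):.1f} min")
```

Raising the link to 10 Gbps cuts the transfer from about 133 minutes to about 13, which still leaves the conventional path slower than the in-database run in this example.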
#5) Next Gen Operational Data Store/Data Prep With Hadoop
Feeds the production BI and Enterprise Data Warehouse environment and the high-velocity Analytics Sandbox
Diagram: ALL data is fed into the Hadoop Data Store for data preparation and enrichment; the Hadoop Data Store then feeds the EDW (via ETL) and the Analytic Sandbox.
•  BI Environment (EDW): Production; predictable load; SLA-driven; standard tools
•  Analytics Environment (Analytic Sandbox): Exploratory, ad hoc; unpredictable load; experimentation; best tool for the job
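The Hadoop-based data prep pattern lands all raw data once, enriches it, and then serves two consumers with different needs. The sketch below shows that split in plain Python: a strict, fixed-schema feed for the production EDW and a complete, loosely structured feed for the analytic sandbox. The record fields, the segment lookup, the completeness rule and the column list are all hypothetical; in practice these steps would run as Hadoop jobs (for example in Hive or Pig) rather than in-process Python.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical conformed schema for the EDW feed; not taken from the slides.
EDW_COLUMNS = ("customer_id", "channel", "amount", "segment")


def enrich(record: Record, segments: dict[int, str]) -> Record:
    """Data preparation and enrichment: add a customer segment, keep all fields."""
    enriched = dict(record)  # copy, so the raw landed record is left untouched
    enriched["segment"] = segments.get(record.get("customer_id"), "unknown")
    return enriched


def is_complete(record: Record) -> bool:
    """Assumed production rule: EDW rows need a customer and an amount."""
    return record.get("customer_id") is not None and record.get("amount") is not None


def prepare(raw_records: list[Record], segments: dict[int, str]):
    """Return (edw_rows, sandbox_rows) from one pass over the landed raw data."""
    enriched = [enrich(r, segments) for r in raw_records]
    # Predictable, SLA-driven feed: complete rows projected onto a fixed schema.
    edw_rows = [tuple(r.get(col) for col in EDW_COLUMNS) for r in enriched if is_complete(r)]
    # Exploratory feed: everything, including incomplete rows and extra fields.
    sandbox_rows = enriched
    return edw_rows, sandbox_rows
```

Keeping the incomplete social-media row in the sandbox feed, while excluding it from the EDW feed, mirrors the slide's contrast between the production BI environment and the experimentation-oriented analytics environment.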
How To Get Started
EMC Big Data Analytics Strategy And Implementation Services
•  Vision Workshop: Identify big data analytics business use cases
•  Analytics Lab: Deploy an analytics sandbox to quantify the business case
•  Analytics Operationalization: Identify the current state, determine the required state and conduct a gap analysis to develop an analytics implementation roadmap
•  Repeat the process for the identified business cases
What Should You Look For in a Vendor?
Jeff Kelly
Advice for Selecting Big Data Vendors
•  Balance short-term goals with long-term vision.
•  Objectives are:
  –  Quick, demonstrable ROI.
  –  A sustainable Big Data practice.
•  Don’t get hung up on “speeds and feeds” or feature-by-feature comparisons.
•  Focus on substance, flexibility, commitment and experience.
Selecting Big Data Vendors, Cont.
•  Evaluate product portfolios based on:
  –  Ability to monetize existing and future data assets.
  –  Ability to integrate with and complement existing data management technology.
  –  Accessibility to power users and business users alike (depending on use case).
  –  Ability to apply information governance and security best practices.
•  Select service providers with track records of helping enterprises adopt a data-driven culture as well as technology.
Questions and Answers
To type a question via WebEx, click on the Q&A tab and select “Ask: All Panelists” to ensure your questions reach us. Thank you!
Learn More…
Ÿ  See us at…
–  EMC World, May 5-9 www.emc.world.com
Ÿ  Contact Jeff Kelly
–  Email: jeff.kelly@wikibon.org
–  LinkedIn: http://www.linkedin.com/in/jeffreyfkelly/
–  Twitter: @jeffreyfkelly
–  Research: http://www.wikibon.org/bigdata
Ÿ  Contact Bill Schmarzo
–  Email: william.schmarzo@emc.com
–  LinkedIn: http://www.linkedin.com/in/schmarzo
–  Twitter: @schmarzo
–  Blog: http://infocus.emc.com/author/william_schmarzo/
