HYBRID DATA ARCHITECTURE
Integrating Hadoop with a Data Warehouse
Chuck Currin
Principal Consultant @ Key2 Consulting
Arvid Tchivzhel
Director @ Mather Economics
INTRODUCTIONS
Key2 Consulting – Who are we?
Mather Economics – Who are we?
PROJECT BACKGROUND
• Project Team Turnover
• Requirements Refactoring
• Existing Code Review
• Speeding up the SDLC
SO, WHAT DID MATHER WANT?
• Data Agnostic
• Technology Agnostic
• Responsive End User Experience
• On-the-fly data slicing and aggregating
• 100% sample sizing for Modeling
PROCESS REQUIREMENTS PYRAMID
• Customer Experience Needs
• Responsive Data Architecture
The architecture must support:
• Ease of Data Extensibility and Scalability
• Statistical Modeling (100% Sample Size)
• Capability to capture any data format
TWO SEPARATE APPROACHES
DW: Cube, Drill, ETL, Facts, Slice, Star Schema, Dice, Multi Dimensional, Aggregate, Dimensions, CDC, Slowly changing
Hadoop: HUE, Beeline, HBase, Cluster, HDFS, Node, Python, SQOOP, PIG, Hive
HYBRID APPROACH
SQOOP, Python, Hive, PIG, Aggregate, Dimensions, HUE, Beeline, HBase, Cluster, HDFS, Node, Cube, Drill, ETL, Facts, Slice, Star Schema, Dice, Multi Dimensional, CDC, Slowly changing
IMPLEMENTATION PYRAMID
• Data Sourcing
• Data Lake
• Dashboard
• Data Warehouse
[Diagram labels: HDFS; Customer Interaction; Reducing & Linking; Mapping & Storage; Segmenting & Staging; Ingestion; Transformation & Integration; Dimensional Data Warehouse; Analyst Interaction; Relational]
DATA SOURCING
• Ingestion
DATA LAKE
• Mapping & Storage
• Segmenting & Staging
• Reducing & Linking
DATA WAREHOUSE
• Transformation & Integration
• Dimensional Data Warehouse
USER INTERACTION
• Analyst Interaction
• Customer Interaction
DATA IS OIL… BUT ORGANIZATIONS NEED GASOLINE
“…Data is the new oil — it’s very valuable to the companies that have it, but
only after it has been mined and processed.
The analogy makes some sense, but it ignores the fact that people and
companies don’t have the means to collect the data they need or the ability
to process it once they have it. A lot of us just need gasoline.”
Derrick Harris, Gigaom.com, March 4, 2015
Copyright 2015 Mather Economics LLC. All rights reserved.
THE GASOLINE: ANALYTICS AND OPTIMIZATION ENABLED BY BIG DATA
• Digital Customer Lifetime Value (CLV)
• e-Commerce flow
• Advertising targeting and pricing
• Editorial programming decisions
• Product offerings, bundles, and pricing
• Acquisition targeting
• Retention strategy
• Content recommendation
INTEGRATING FORMERLY DISPARATE SYSTEMS TO CREATE 360-DEGREE CUSTOMER INTERACTION VIEW
Centralized Big Data & Analytics:
• Customer-Level
• Online Data
• Offline Data
[Diagram labels: Data Analytics Customer Transactions Demographics Online Behavior Marketing Analytics Dashboard]
ENTITY RESOLUTION AND BIG DATA
• Big Data and traditional warehousing are critical for successful entity resolution
• Clients understand the number of customers that interact through different channels
[Diagram labels: Transactions Demographics Registrations Email Campaigns Newsletter]
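The idea of counting customers who interact through more than one channel can be sketched as follows. This is only an illustration, not the method used in the project: it assumes each source record carries an email address and links records by exact match on a normalized email, whereas real entity resolution typically needs fuzzy matching on several attributes. The function names and the sample records are hypothetical.

```python
from collections import defaultdict


def normalize_email(email):
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def resolve_entities(records):
    """Group (source, email) records by normalized email.

    Returns a dict mapping each resolved customer key to the set of
    source systems (channels) in which that customer appears.
    """
    channels_by_customer = defaultdict(set)
    for source, email in records:
        channels_by_customer[normalize_email(email)].add(source)
    return dict(channels_by_customer)


def multi_channel_count(resolved):
    """Count resolved customers that appear in more than one source."""
    return sum(1 for channels in resolved.values() if len(channels) > 1)


# Hypothetical sample records; the source names echo the slide's labels.
sample_records = [
    ("Transactions", "Ann@Example.com"),
    ("Newsletter", " ann@example.com "),
    ("Registrations", "bob@example.com"),
]
resolved_customers = resolve_entities(sample_records)
```

Here the two spellings of Ann's address collapse into one customer seen in two channels, while Bob appears in only one.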
TRADITIONAL OFFLINE METRICS CAN BE MEASURED BY ONLINE ENGAGEMENT
CUSTOMER LIFETIME VALUE
[Chart: y-axis 0% to 100%; x-axis 0 to 720 in steps of 60. Expected Value (Area Under Curve)]
CLV = [(Revenues – Costs)*(Expected Lifetime)]*PV – Acquisition Cost
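A minimal sketch of the CLV formula above, under stated assumptions. It treats the expected lifetime as the area under a curve of values between 0 and 1 sampled at evenly spaced points (approximated with the trapezoid rule), and treats PV as a single multiplicative present-value factor. The slides do not state the x-axis units, how PV is computed, or how the curve is estimated, so the function names, parameters, and sample numbers here are hypothetical.

```python
def expected_lifetime(curve, step):
    """Approximate the area under a sampled curve with the trapezoid rule.

    curve: values between 0.0 and 1.0 at evenly spaced points (assumed).
    step:  spacing between points, in the same units as the x-axis.
    """
    area = 0.0
    for left, right in zip(curve, curve[1:]):
        area += (left + right) / 2.0 * step
    return area


def customer_lifetime_value(revenue, cost, lifetime, pv_factor, acquisition_cost):
    """CLV = [(Revenues - Costs) * (Expected Lifetime)] * PV - Acquisition Cost.

    revenue and cost are per unit of lifetime; pv_factor is assumed to be
    a single discount multiplier.
    """
    return (revenue - cost) * lifetime * pv_factor - acquisition_cost


# Hypothetical inputs: a curve falling from 100% to 0% over two 60-unit steps.
lifetime = expected_lifetime([1.0, 0.5, 0.0], step=60)
clv = customer_lifetime_value(
    revenue=10.0, cost=4.0, lifetime=lifetime, pv_factor=0.5, acquisition_cost=100.0
)
```

With these inputs the area under the curve is 60 units, so the margin of 6 per unit gives 360 before discounting, 180 after applying the 0.5 factor, and 80 after subtracting the acquisition cost.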
OPERATIONALIZING CLV
• Find high/low value segments
• Integrate with campaign targeting
• Apply to pricing and product offerings
• Available to customer service reps
• Apply to online audience segmentation
• Overlay against content, psychographics, etc.
CUSTOMER LIFETIME VALUE HELPS MANAGE PROFITABILITY ON A CUSTOMER LEVEL
CONCLUSION
• DW and Hadoop not mutually exclusive
• Mather requirements met by a hybrid approach
• Hybrid key to satisfying both internal/external users
• Speed of dimensional data critical for analytics
