3. Step 1: Leverage Horizontal Scalability
• DW appliances require significant capital investment
– The system must be sized to meet anticipated needs
– This leaves unused capacity at the beginning
– Requires increased "step-up" investments at regular intervals
• Hadoop finesses this challenge
– Relies on commodity components
– Start with what you need; grow with increased demand
– Introduce newer hardware seamlessly
– Exploit innovations that speed performance (e.g., Stinger.next, Low Latency Analytical Processing)
© 2017 Knowledge Integrity, Inc loshin@knowledge-integrity.com (301) 754-6350 3
[Diagram: Hadoop cluster spanning four racks; each rack shows a rack switch, a NameNode, and four DataNode & TaskTracker servers]
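The incremental scale-out described above can be illustrated with a small, self-contained Python sketch. This is not real HDFS code: the node names and the round-robin placement policy are simplified assumptions. The point is that blocks are replicated across DataNodes, and adding commodity nodes immediately lowers the load carried by each existing node.

```python
REPLICATION = 3  # HDFS-style replication factor

def place_blocks(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes (simplified round-robin)."""
    placement = {}
    for block in range(num_blocks):
        start = block % len(datanodes)  # rotate the starting node to spread load
        placement[block] = [datanodes[(start + i) % len(datanodes)]
                            for i in range(replication)]
    return placement

def load(placement, node):
    """Number of block replicas a given node holds."""
    return sum(node in replicas for replicas in placement.values())

# Start with what you need: 4 commodity nodes hold all replicas.
nodes = ["datanode-%d" % i for i in range(4)]
before = place_blocks(100, nodes)

# Grow with increased demand: add 4 more nodes; per-node load drops sharply.
nodes += ["datanode-%d" % i for i in range(4, 8)]
after = place_blocks(100, nodes)

print(load(before, "datanode-0"), load(after, "datanode-0"))  # 75 37
```

Doubling the node count roughly halves each node's replica load, which is the "grow with increased demand" property the slide contrasts with fixed-size DW appliances.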
5. Step 3: Increase Data Flexibility
• Conventional data warehouse architectures are organized using a dimensional model
– Facts represent events
– Dimensions characterize the facts
• The dimensional model is suited to typical DW operations
– Aggregation and rolled-up reporting
– “Slice and dice”
• However, this model forces all data into a predetermined schema ("schema-on-write")
– Introduces bias, creates constraints, and limits data flexibility
• Alternative: schema-on-read
– Data sets are captured in their source formats
– Frees data consumers to apply their own organization
– Allows logical structure to be layered on top of data in source format
– Enables use of creative algorithms for analytics, text mining, and machine learning
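The schema-on-read idea above can be sketched in a few lines of plain Python (the event fields and field names are invented for illustration; a real deployment would use Hive or Spark over HDFS). Raw records are landed exactly as received, and each consumer layers its own projection on top at read time.

```python
import json

# Raw events landed as-is in their source format -- no upfront schema (schema-on-read).
raw_events = [
    '{"ts": "2017-03-01T10:00:00", "user": "u1", "action": "view", "sku": "A100"}',
    '{"ts": "2017-03-01T10:00:05", "user": "u2", "action": "buy", "sku": "A100", "price": 19.99}',
    '{"ts": "2017-03-01T10:00:09", "user": "u1", "action": "buy", "sku": "B200", "price": 5.00}',
]

def read_with_schema(lines, fields, default=None):
    """Apply a consumer-chosen projection at read time; absent fields get a default."""
    for line in lines:
        record = json.loads(line)
        yield {f: record.get(f, default) for f in fields}

# One consumer cares only about revenue...
revenue = sum(r["price"] for r in read_with_schema(raw_events, ["price"], default=0))

# ...another layers a different logical structure on the same raw data.
views = [r for r in read_with_schema(raw_events, ["user", "action"])
         if r["action"] == "view"]

print(revenue, views)
```

Because no schema was imposed when the data was written, both consumers read the same files, and a future consumer with different questions needs no reload or migration.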
24. © Hortonworks Inc. 2011–2016. All Rights Reserved
Offload ETL to Hadoop
→ The Problem:
– EDWs can consume between 50% and 90% of their resources on ETL/ELT tasks alone.
– These jobs interfere with more business-critical tasks like BI and advanced analytics.
→ The Solution:
– Hive and HDP deliver ETL that scales to petabytes.
– Economical scale-out processing on commodity servers.
→ The Result:
– Better SLAs for mission-critical analytics.
– Limit EDW expansion or retire old systems.
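One common way to offload ETL is Hadoop Streaming, which runs ordinary scripts as the map and reduce stages. Below is a minimal Python sketch of that pattern; the tab-separated field layout and the per-customer aggregation are invented for illustration. In a real streaming job each stage would read from stdin; here both stages run locally on sample rows, with a sort standing in for Hadoop's shuffle phase.

```python
from itertools import groupby

def mapper(lines):
    """Map stage: parse raw tab-separated log lines, emit (customer_id, amount)."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 3:  # skip malformed rows instead of failing the whole job
            _, customer_id, amount = parts[:3]
            yield customer_id, float(amount)

def reducer(pairs):
    """Reduce stage: Hadoop delivers pairs sorted by key; sum amounts per customer."""
    for customer_id, group in groupby(pairs, key=lambda kv: kv[0]):
        yield customer_id, sum(amount for _, amount in group)

# Sorting the mapper output locally stands in for Hadoop's shuffle/sort phase.
rows = ["2017-01-01\tc42\t10.00", "2017-01-01\tc7\t3.50", "2017-01-02\tc42\t2.25"]
totals = dict(reducer(sorted(mapper(rows))))
print(totals)  # {'c42': 12.25, 'c7': 3.5}
```

In a real deployment the same two functions would be submitted with the Hadoop Streaming jar (`-mapper` / `-reducer` options), letting the cluster apply them to petabyte-scale input on commodity servers instead of consuming EDW cycles.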
[Diagram: ETL/ELT offload architecture — data landing & deep archive in Hadoop feeding data marts and cube marts, which serve end users and applications]
26. Use Case 1: Multi-Channel Behavioral Analysis
→ Industry: Mass Media
– Largest broadcasting and cable company in the world by revenue
– Multiple channels: cable (set-top box), wireless devices, streaming programming
– 22 million+ subscribers (internet & video)
→ Results:
– Scalability: 480B rows, 500 nodes
– 60x query performance improvement
– Insights: new information improves negotiations
– Loyalty: outreach to customers viewing competitive streams; ▼ churn, ▲ revenue
Before (Leading Media Company): Channel Feeds → Hortonworks HDP → Netezza Data Mart → Tableau + MS Excel
After: Channel Feeds → Hortonworks HDP → AtScale Intelligence Server → Tableau + MS Excel + R
27. Use Case 2: Campaign Paid-Search Effectiveness
→ Industry: Retail / eCommerce
– Top US department store (by revenue)
– Online sales $4B+ and growing (11%+ of total)
– 800+ department stores nationwide
→ Results
– Scale: millions of paid keywords analyzed
– Speed: eliminates the extract step
– Insight: operationalized closed-loop analysis → insight → decision → action
– Impact: makes and saves millions of dollars with instant bid decisions over the 6-week season that drives 60% of annual revenue
Before (Leading Retailer): Ad & Paid Keywords → Hortonworks HDP → Vertica Data Marts → Cognos + Tableau + Excel
After: Ad & Paid Keywords → Hortonworks HDP → AtScale Intelligence Server → Tableau + Excel
28. Use Case 3: Client and Patient Analysis
→ Industry: Managed Health Care
– Member of the Fortune 100
– Health, life, and other insurance products
– ~52 million members; medical/dental/pharmacy
→ Results
– Scalable: BI directly on 264+ nodes of data
– Time: eliminates the data-movement step
– 62x query performance improvement
– Speed: <2.2-second average query time
– Insight: Tableau on Hadoop for 1000+
– Security: access control by user; HIPAA compliance
Before (Leading Managed Healthcare Provider): Client / Patient Details → Hortonworks HDP → Netezza Data Mart → Tableau + MS Excel
After: Client / Patient Details → Hortonworks HDP → AtScale Intelligence Server → Tableau + MS Excel