© Hortonworks Inc. 2011 – 2017. All Rights Reserved
Scott Gnau, CTO
@Scott_Gnau

The Next Gen EDW is the Big Data Warehouse
• In Forrester’s 2016 global survey, 59% of respondents stated that leveraging big data and analytics was a critical or high priority.

Companies Are Looking to Big Data for EDW Optimization
• 82% of 2,550+ respondents are looking to Big Data for EDW Optimization rather than a straight replacement. (2016 Big Data Maturity Survey)

Hortonworks Connected Data Platforms and Solutions
• Hortonworks Solutions
– Enterprise Data Warehouse Optimization
– Cyber Security and Threat Management
– Internet of Things and Streaming Analytics
• Hortonworks Connection
– Subscription Support
– SmartSense
– Premier Support
– Educational Services
– Professional Services
– Community Connection
• Cloud
– Hortonworks Data Cloud
– AWS, HDInsight
• Data Center
– Hortonworks Data Suite
– HDF, HDP

Drivers of a Modern BI Infrastructure
• Deeper and Broader Data Sets
• Complete Data ‘Provenance’
• Leading Analytics and Tools
• Integrate non-EDW data and EDW data
• Total Cost of Ownership

Open Source Transformational Impact to EDW
• Unmatched Economics: supports low-cost data-center and cloud architectures for Enterprise Apache Hadoop
• Eliminates Risk and Ensures Integration: prevents vendor lock-in and speeds ecosystem adoption of ODPi-compliant core
[Chart axes: cost efficiency, data variety. Labels: EDW, proprietary, Hadoop, Hortonworks, open source, RDBMS]

But why aren’t more companies running to this solution?
• Risky
• Hadoop requires a bunch of new skill sets
• It’ll take a long time
• There’s too much manual coding required
• It’s hard to integrate with my BI tool stack

Legacy EDW vs. EDW Optimization Solution with Connected Data Platforms

EDW Optimization: Fast BI on Hadoop
• The Problem:
– Legacy EDW systems were adopted for Fast BI and deep slice-and-dice analytics, but EDW costs can limit the breadth and depth of these analytics.
• The Solution:
– Interactive SQL is a reality on Hadoop today.
– AtScale Intelligence Platform adds OLAP capabilities for deep drilldown at scale.
• The Result:
– Query terabytes of data in seconds.
– Connect your favorite BI tools like Tableau and Excel through SQL and MDX interfaces.
– The EDW Optimization Solution is tailor-made to deliver Fast BI on Hadoop.
[Diagram: EDW Optimization Solution, showing ETL/ELT, data landing and deep archive, data mart, and cube mart, serving end users and applications]

EDW Optimization: ETL Offload
• The Problem:
– EDWs can consume between 50% and 90% of resources just on ETL/ELT tasks.
– These jobs interfere with more business-critical tasks like BI and advanced analytics.
• The Solution:
– Hive and HDP deliver ETL that scales to petabytes.
– Syncsort DMX-h for simple drag-and-drop ETL workflows.
– Economical scale-out processing on commodity servers.
• The Result:
– Better SLAs for mission-critical analytics.
– Limit EDW expansion or retire old systems.
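To make the 50% to 90% figure concrete, here is a minimal back-of-the-envelope sketch in Python. The function name `bi_capacity_gain` and the simple linear capacity model are assumptions made for illustration, not part of the Hortonworks solution: it treats EDW capacity as a single pool and asks how much more of it is left for BI once some share of the ETL/ELT work moves to Hadoop.

```python
def bi_capacity_gain(etl_share, offloaded_fraction=1.0):
    """Factor by which EDW capacity left for BI grows after ETL offload.

    etl_share: fraction of EDW resources consumed by ETL/ELT (0 <= x < 1).
    offloaded_fraction: fraction of that ETL/ELT work moved off the EDW (0..1).
    Both parameters are hypothetical inputs for this illustrative model.
    """
    if not 0.0 <= etl_share < 1.0:
        raise ValueError("etl_share must be in [0, 1)")
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be in [0, 1]")
    before = 1.0 - etl_share                          # capacity available to BI today
    after = before + etl_share * offloaded_fraction   # capacity after offload
    return after / before


# Low and high ends of the range quoted on the slide, with full offload.
low_end = bi_capacity_gain(0.5)
high_end = bi_capacity_gain(0.9)
```

Under this model, fully offloading ETL doubles the capacity left for BI at the 50% end of the range and gives roughly ten times as much at the 90% end. Real systems will not scale this cleanly, so treat the numbers as an upper bound on the effect.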

EDW Optimization: Active Archive
• The Problem:
– Increasing data volumes and cost pressure force data to be archived to tape.
– Archived data is not available for analytics, or must be retrieved at great expense.
• The Solution:
– Adopting Hadoop delivers cost per terabyte on par with tape backup solutions.
– Data in Hadoop can be analyzed by all major BI tools, allowing analytics on archive data.
• The Result:
– Data always available for analytics.
– Store years of data rather than months.

Multi-Channel Behavioral Analysis
• Industry: Mass Media
– Largest broadcasting and cable company in the world by revenue
– Multiple channels: cable (set-top box), wireless devices, streaming programming
– 22 million+ subscribers (internet & video)
• Results:
– Scalability: 480B rows, 500 nodes
– 60x query performance improvement
– Insights: New information improves negotiations
– Loyalty: Outreach to customers viewing competitive streams; ▼ churn, ▲ revenue
[Diagram: Leading Media Company, before and after. Before: channel feeds into Hortonworks HDP with a Netezza data mart. After: channel feeds into Hortonworks HDP with AtScale Intelligence Server. BI tools shown: Tableau + MS Excel + R; Tableau + MS Excel]

Campaign Paid-Search Effectiveness: Retail
• Industry: Retail / eCommerce
– Top US department store (by revenue)
– Online sales $4B+ and growing (11%+ of total)
– 800+ department stores nationwide
• Results:
– Scale: Millions of paid keywords analyzed
– Speed: Eliminate extract step
– Insight: Operationalized closed-loop analysis → insight → decision → action
– Impact: Make and save $ millions with instant bid decisions over the 6-week season that drives 60% of annual revenue
[Diagram: Leading Retailer, before and after. Before: ad and paid keyword data into Hortonworks HDP with Vertica data marts. After: ad and paid keyword data into Hortonworks HDP with AtScale Intelligence Server. BI tools shown: Cognos + Tableau + Excel; Tableau + Excel]

Client and Patient Analysis
• Industry: Managed Health Care
– Member of Fortune 100
– Health, life + other insurance products
– ~52 million members; medical/dental/pharmacy
• Results:
– Scalable: BI directly on data across 264+ nodes
– Time: Eliminate data movement step
– 62x query performance improvement
– Speed: <2.2 second average query time
– Insight: Tableau on Hadoop for 1000+
– Security: Access control by user; HIPAA
[Diagram: Leading Managed Healthcare Provider, before and after. Before: client / patient details into Hortonworks HDP with a Netezza data mart. After: client / patient details into Hortonworks HDP with AtScale Intelligence Server. BI tools shown in both: Tableau + MS Excel]

Solution Architecture
[Diagram: EDW/legacy sources feed the DMX Data Funnel and DMX-h Engine inbound to the Hortonworks Data Platform (HDP): HDFS (base data and aggregates stored in ORC), Hive (batch and interactive SQL), and multitenant processing on YARN (Syncsort, LLAP, Spark, Tez), with the AtScale virtual cube on top]
High Level Flow
1. Install HDP, Syncsort and AtScale
2. Install EDW/Hive drivers on the edge node
3. Bring all tables involved in the use case into Hive using the Syncsort data funnel
4. Build a virtual cube using AtScale
5. Build aggregates in AtScale for optimization
6. Query data using a BI tool like Tableau/Excel through an ODBC/JDBC connection

Hortonworks EDW Optimization Solution Components
• Source Data Systems
• Syncsort: High-Performance Data Movement
– High performance data import from all major EDW platforms
• Hadoop: Scalable Storage and Compute
• Hive LLAP: High Performance SQL Data Mart
– Fast, scalable SQL analytics
– Intelligent in-memory caching
• AtScale Intelligence Platform: OLAP Cubes for Higher Performance
– Define OLAP cubes for 10x faster queries
– Unified semantic layer for all BI tools
• Pre-aggregated data ... or full-fidelity data
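The trade-off between pre-aggregated and full-fidelity data can be sketched in a few lines of Python. This is an illustration of the general idea only and does not use or describe AtScale's actual implementation; the fact rows, column names, and function names are all hypothetical. A query for sales by region can be answered either by scanning every fact row or by rolling up a much smaller aggregate that was built ahead of time.

```python
from collections import defaultdict

# Hypothetical full-fidelity fact rows: (region, product, day, sales).
FACT_ROWS = [
    ("east", "tv", "2016-01-01", 120.0),
    ("east", "tv", "2016-01-02", 80.0),
    ("east", "radio", "2016-01-01", 40.0),
    ("west", "tv", "2016-01-01", 200.0),
    ("west", "radio", "2016-01-02", 60.0),
]
COLUMN_INDEX = {"region": 0, "product": 1, "day": 2}


def build_aggregate(rows, dims):
    """Pre-aggregate the sales measure over the chosen dimension columns."""
    agg = defaultdict(float)
    for row in rows:
        key = tuple(row[COLUMN_INDEX[d]] for d in dims)
        agg[key] += row[3]
    return dict(agg)


def sales_by_region_full(rows):
    """Answer the query by scanning every full-fidelity row."""
    totals = defaultdict(float)
    for region, _product, _day, sales in rows:
        totals[region] += sales
    return dict(totals)


def sales_by_region_from_aggregate(agg):
    """Answer the same query by rolling up the (region, product) aggregate."""
    totals = defaultdict(float)
    for (region, _product), sales in agg.items():
        totals[region] += sales
    return dict(totals)


# Built once, then reused by every query that groups by region and/or product.
region_product_agg = build_aggregate(FACT_ROWS, ["region", "product"])
```

Both query paths return the same totals, but the aggregate path touches fewer rows, and the gap widens as the day-level detail grows. The cost is that a question about a dimension dropped from the aggregate (here, `day`) still has to go back to the full-fidelity data.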

ETL Workflow Onboarding: Syncsort DMX-h

Hybrid Query Service
❑ Choice of BI Tool
❑ Zero Client Install
❑ Secure Data Access
❑ Optimized Queries

Enterprise Data Optimization Solution Components
12 month license and support offering
• Hortonworks: 24 nodes of Enterprise Plus Support
• Syncsort: 24 nodes of DMX-h
• AtScale: 24 nodes of AtScale Intelligence Platform
Pre-packaged Professional Services
• Single legacy data source
• 1 fact table with 5 dimensions
• Load up to 15 tables
• One-time data dump
• Up to 1 cube with 10 measures
• 1 BI connection
• 5TB total cube limit

Future Proof
• Hive Optimizations
– Hive, Tez, ORC, LLAP
– Additional SQL coverage
• ACID MERGE (upsert) for SQL:2011 compliance
• Business Continuity Options
– Replication
– Backup/Restore
• Additional Hive options in tech preview in 2.6
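For readers unfamiliar with the term, the upsert behavior behind a MERGE can be sketched in Python as follows. This shows the semantics only and is not Hive code; the function `merge_upsert`, the `id` key, and the sample rows are hypothetical. Rows from the source that match a target row on the key update it, and rows with no match are inserted.

```python
def merge_upsert(target, source, key="id"):
    """Return a new table: update rows whose key matches, insert the rest."""
    merged = {row[key]: dict(row) for row in target}  # copy, so target is untouched
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)   # matched: update the existing row
        else:
            merged[row[key]] = dict(row)   # not matched: insert a new row
    return sorted(merged.values(), key=lambda r: r[key])


# Hypothetical target table and a batch of incoming changes.
customers = [{"id": 1, "city": "Austin"}, {"id": 2, "city": "Boston"}]
changes = [{"id": 2, "city": "Chicago"}, {"id": 3, "city": "Denver"}]
result = merge_upsert(customers, changes)
```

Here customer 2 is updated in place and customer 3 is added, in a single pass over the change set. In a warehouse, doing both in one ACID operation is what keeps readers from seeing a half-applied batch.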

EDW Package: Professional Services ‘Proof of Value’
1. Install HDP, AtScale and Syncsort
2. Configure drivers for the appropriate EDW and Hive on the edge node
3. Enable and configure Interactive Hive (LLAP)
4. Ingest data from 1 legacy system
5. Create up to 3 BI cubes
6. Support connection to BI tool
7. Demo of capabilities (functionality and performance), with under 10 second response time
8. Solution Architecture Document and schema definition

EDW Optimization Solution - Try It Now!
• Tool-based approach means we can leverage existing skillsets
• Proof points in 60 days
• Integrated into my BI tool stack
• Hive supports scaled queries and fast queries
• It works!

To Learn More
• Everyone will receive a free copy of the Forrester white paper titled “The Next-Generation EDW Is The Big Data Warehouse”
• EDW Optimization with HDP
– http://hortonworks.com/solutions/edw-optimization/
– EDW Optimization 7 min video
• AtScale Intelligence Platform
– http://hortonworks.com/partner/atscale/
• Syncsort DMX-h
– http://hortonworks.com/partner/syncsort/

Thank You
