1 © Hortonworks Inc. 2011–2018. All rights reserved
Running Enterprise Workloads with an Open
Source Hybrid Cloud Data Architecture
Sanjay Radia, Chief Architect and Co-founder Hortonworks
Alan Gates, Co-founder Hortonworks
Next Generation Data Problems
My data is spread across multiple clusters and data sources
I store and analyze data from ERP/CRM systems, IoT and mobile devices, social media, geolocation, etc.
Some of my data is on-premises and some is in the cloud. I move data between cloud and on-premises, and between different clouds
Data Is Your Business
Focus on Your Data Strategy
●Consider how you store, manage and protect your data
●Data must be made known, discoverable, available, trusted and compliant
●Security and Governance of all data is paramount
●Stewardship, discovery, delivery and use of data is a key concern
Treat Your Data as a Strategic Asset
●Turn data into predictive and prescriptive analytics
●Enable self-service analytics to accelerate delivery of new business insights
●Build a solid foundation for higher value Data Science, ML and AI
●Data explosion is uncovering new possibilities – if you can seize them
The Next Generation of Data Problems Requires a Data Strategy
Balancing Enterprise Requirements for Hybrid Cloud Data Strategy
Line of Business practitioners vs Enterprise IT stakeholders
Data Analysts, Data Engineers and Data Scientists: Time to Insight
●Access a Broad Set of Analytics Tools
●On-demand, Self-service Access
●Data Discovery, Provisioning and Deployment
●Global Data Access Transparent of Location
Single Pane of Glass
Big Data Platform Owners: Reduce Risk
●Consistent Security and Governance
●Manage Cloud and Shadow Spend
●Retain Data Context, Lineage and Visibility
●Operational Reliability, Portability
●Remain Cloud Agnostic
You Have Data Everywhere
[Diagram: structured and unstructured clusters spread across multiple sites, including Data Center Dublin, Data Center Las Vegas and Data Center Bangkok]
Data Plane Service is the Global Data Fabric
[Diagram: structured and unstructured clusters across sites including Data Center Dublin, Data Center Las Vegas and Data Center Bangkok, connected through Shared Services, Connectivity and Application Portability]
Hortonworks DataPlane Service Enables a Hybrid Architecture for
Global Data Management
From the edge, through movement, to rest
Hortonworks DataPlane Service is a foundational platform for the delivery of data solutions that will:
• Support an enterprise hybrid deployment strategy and adoption of cloud
• Provide common metadata, security and governance across all deployments
• Simplify enterprise data asset management
• Support a variety of workloads
• Extend to new services: a services enablement layer for rapidly bringing new solutions to market
[Diagram: Hortonworks DataPlane Service sits above multiple clusters and sources (labels: Multi, Hybrid; Manage, Secure, Govern), spanning data at rest (Hortonworks Data Platform) and data in motion (Hortonworks DataFlow)]
The DPS Ecosystem
DPS Platform apps: Data Lifecycle Manager, Data Steward Studio*, Data Analytics Studio*, Streams Messaging Manager
DataPlane Services: authentication, role-based access, service lifecycle management, cluster registration, cluster service discovery and access
HDP/HDF cluster components: DLM Engine, Profiler Service, DAS Agent, SMM Agent
Data Lifecycle Manager (DLM)
⬢ Manage the data lifecycle:
– Replication and failback to another cloud or on-prem site for disaster recovery
– Auto-tiering of hot/warm/cold data to cloud object storage or on-prem storage for TCO reduction
– Backup and recovery of critical business data
⬢ Maintain common security and governance policies across multiple data sources and environments
[Diagram: three DLM capabilities. Replication & Disaster Recovery: replication from a Prod Cluster to a Backup Cluster, with failback. Auto Tiering: data moves from cluster to cluster as probability of use falls (P(use) high, cost $$$; P(use) medium, cost $$; P(use) low, cost $). Backup & Restore: a full backup followed by cumulative incremental backups on day 1, day 2 and day 3, with a restore after an accidental delete. Status labels, in slide order: Generally Available, Coming Soon, Coming Soon]
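The backup and restore picture above (one full backup, then cumulative incremental backups) can be illustrated with a small sketch. This is not DLM's implementation or API; the `Incremental` type, the function names and the in-memory dictionaries standing in for a file system are all assumptions made for illustration. The point it shows is that a cumulative incremental records every change since the full backup, so a restore needs only the full backup plus one incremental.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incremental:
    """Hypothetical cumulative incremental: all changes since the full backup."""
    changed: dict  # path -> content for files added or modified since the full backup
    deleted: set   # paths that were in the full backup but no longer exist


def take_incremental(full: dict, current: dict) -> Incremental:
    # Always compare against the full backup, never against the previous incremental.
    changed = {p: c for p, c in current.items() if full.get(p) != c}
    deleted = set(full) - set(current)
    return Incremental(changed, deleted)


def restore(full: dict, incremental: Optional[Incremental] = None) -> dict:
    # Restore = full backup + at most one cumulative incremental.
    state = dict(full)
    if incremental is not None:
        for path in incremental.deleted:
            state.pop(path, None)
        state.update(incremental.changed)
    return state


# Illustrative data only.
full_backup = {"/data/a": "v1", "/data/b": "v1"}
day1 = {"/data/a": "v2", "/data/b": "v1"}
day2 = {"/data/a": "v2", "/data/b": "v1", "/data/c": "v1"}

inc_day1 = take_incremental(full_backup, day1)
inc_day2 = take_incremental(full_backup, day2)

# An accidental delete happens after day 2; recover the day 2 state.
recovered = restore(full_backup, inc_day2)
```

The trade-off of the cumulative scheme is that each incremental grows over time, in exchange for a restore that never has to replay a chain of daily deltas.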
DLM: Pair clusters and manage data replication flows
DLM 1.0 (GA product)
DLM: Replicate between on-prem and cloud
DPS Platform: Data Lifecycle Manager (DLM)
DLM: Replication policies and instances
Data Lifecycle Manager (DLM)
Data Analytics Studio (DAS)
Enhance productivity with full-featured auto-complete, direct download of results, and quick data preview
Data Analytics Studio (DAS)
Self-optimize queries and storage using a heuristic recommendation engine
Data Analytics Studio (DAS)
Built-in batch operations
No more scripting needed for day-to-day operations
Hortonworks Streams Messaging Manager (SMM)
What is SMM?
• Kafka management and monitoring tool
• Single monitoring dashboard for all your Kafka clusters across 4 entities:
– Broker
– Producer
– Topic
– Consumer
• Supports multiple HDP and/or HDF Kafka clusters
• REST as a first-class citizen
• Delivered as a DataPlane service
SMM: Full visibility into all details of Kafka clusters
DPS Platform: Streams Messaging Manager
SMM: Detailed views of specific topics
DPS Platform: Streams Messaging Manager
SMM: All producers and consumers associated with a topic
DPS Platform: Streams Messaging Manager
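As a rough illustration of the per-topic view, the sketch below groups producer and consumer activity by topic. It does not use SMM's REST API or any Kafka client library; the `TopicView` type, the `build_topic_views` function and the `(client, topic)` pairs are hypothetical stand-ins for whatever metrics a monitoring agent would collect.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TopicView:
    """Hypothetical per-topic summary: who writes to it and who reads from it."""
    producers: set = field(default_factory=set)
    consumers: set = field(default_factory=set)


def build_topic_views(produce_events, consume_events):
    # produce_events: iterable of (producer_client_id, topic)
    # consume_events: iterable of (consumer_group_id, topic)
    views = defaultdict(TopicView)
    for client_id, topic in produce_events:
        views[topic].producers.add(client_id)
    for group_id, topic in consume_events:
        views[topic].consumers.add(group_id)
    return dict(views)


# Illustrative data only.
produced = [("web-1", "orders"), ("web-2", "orders"), ("web-1", "clicks")]
consumed = [("billing", "orders"), ("analytics", "orders"), ("analytics", "clicks")]

views = build_topic_views(produced, consumed)
orders = views["orders"]
```

Indexing by topic first is what makes a topic-centric screen cheap to render; the same events could equally be indexed by producer or consumer group for the other entity views.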
Goals
Know Your Sensitive Data
• Automatically detect and profile sensitive and personal data
• Attach classification annotations for sensitivity
• Manually approve and curate sensitive data classifications
• Leverage classification-based data protection
• View the sensitive data dashboard on Asset 360
Sensitive Data Profiling
Track your Sensitive Data
• IBAN (27 EU Countries)
• Credit Card Numbers
• Email
• Telephone (AMER, EU)
• IP Address
• URL
• Passport (12 EU Countries)
• National ID (19 EU Countries)
• Australian Driver's License
• Australian Passport
• Australian National ID
Sensitive Data Types
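A few of the types listed above have well-known structural checks, which gives a feel for how a profiler might classify a column value. The sketch below is an assumption-laden illustration, not the Data Steward Studio profiler: the regular expressions are deliberately simple, and the `classify` function and its tag names are hypothetical. The Luhn checksum for card numbers and the mod-97 check for IBANs are the standard public algorithms.

```python
import re

# Simplified patterns, chosen for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"(?:\d[ -]?){13,19}")
IBAN_RE = re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}")


def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """IBAN mod-97 check: move the first four characters to the end, map A-Z to 10-35."""
    s = iban.replace(" ", "").upper()
    if not IBAN_RE.fullmatch(s):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


def classify(value: str) -> set:
    """Return hypothetical sensitivity tags for a single column value."""
    text = value.strip()
    tags = set()
    if EMAIL_RE.fullmatch(text):
        tags.add("email")
    if CARD_RE.fullmatch(text) and luhn_valid(text):
        tags.add("credit_card")
    if iban_valid(text):
        tags.add("iban")
    return tags
```

Pairing a pattern with a checksum keeps false positives down: a random 16-digit number matches the card pattern but usually fails the Luhn check. A real profiler would also need country-specific length rules for IBANs and sampling across many rows before tagging a whole column.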
Track Your Data Asset – Lineage and Impact
• Consolidated upstream lineage and downstream impact
• Detailed click-through to asset properties
Data Lineage and Impact
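Upstream lineage and downstream impact are both reachability questions on a directed graph of data assets. The sketch below shows that idea with a plain breadth-first search; the asset names, the edge list and the `build_graph` and `reachable` helpers are hypothetical, and this is not how the product computes or stores lineage.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """edges: iterable of (source_asset, derived_asset) pairs."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def reachable(start, neighbors):
    """Breadth-first search: every asset reachable from start via neighbors."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)  # guard against cycles leading back to the start
    return seen


# Illustrative lineage only.
edges = [
    ("raw_orders", "clean_orders"),
    ("clean_orders", "daily_revenue"),
    ("customers", "daily_revenue"),
    ("daily_revenue", "exec_dashboard"),
]
upstream, downstream = build_graph(edges)

lineage_of_revenue = reachable("daily_revenue", upstream)    # where it came from
impact_of_clean_orders = reachable("clean_orders", downstream)  # what it affects
```

Keeping both edge directions indexed means lineage and impact use the same traversal, just over different adjacency maps.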
View Security Policies for your Data Assets
• View security policies on data assets
• View classification-based policies on assets
Security Policies
Thank you!

Running Enterprise Workloads with an open source Hybrid Cloud Data Architecture- Tokyo
