1 © Hortonworks Inc. 2011–2018. All rights reserved
Data in the Cloud
DataWorks Summit - Singapore
Oct 2018
Who Am I?
Santhosh B Gowda
Engineer @ Hortonworks
Agenda
• Introduction
• Primary Use Cases
• Cloudbreak Architecture
• Cloudbreak Core Concepts
• Credentials
• Blueprints
• Recipes
• Images
• CLI
• Streaming in Cloudbreak
• Shared Services Data Lake
• Hands-On Lab
Why Data on Cloud?
• No Upfront HW Costs ($0)
• Unlimited Elastic Scale
• Ephemeral & Long-Running
• IT & Business Agility
What Is Cloudbreak?
Cloudbreak is a tool for provisioning Hadoop clusters on any cloud infrastructure.
Simplified Cluster Provisioning: prescriptive setup, simple automation
Cloudbreak: Harness the Agility of Cloud with Ease
• Declarative workload provisioning across multiple cloud providers
• Flexible topologies and security configuration options
• DevOps friendly, easy setup and simple to automate
• Built-in elasticity and auto-scaling
• Prescriptive integration with cloud services
• Dev / Test (all HDP services)
• BI / Analytics (Hive)
• Data Science (Spark)
• IoT Apps (Storm, HBase, Hive)
Cloudbreak
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP/HDF!
Example Ambari Blueprints: IoT Apps, BI / Analytics, Data Science, Dev / Test
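The "Pick a Blueprint" step refers to an Ambari blueprint, a JSON document that names a stack and lists the components placed on each host group. The following is a minimal sketch in Python of how such a document is shaped. The blueprint name, stack version, host group names, and component selection are illustrative assumptions, not taken from the slides, and the result is not a complete, deployable blueprint.

```python
import json


def make_blueprint(name, stack_version, host_groups):
    """Build a dict shaped like an Ambari blueprint.

    host_groups maps a host group name to a tuple of
    (cardinality, list of component names).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": component} for component in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


# Hypothetical "Data Science" style layout (assumed for illustration only).
blueprint = make_blueprint(
    "data-science-sketch",
    "2.6",
    {
        "master": (1, ["NAMENODE", "RESOURCEMANAGER", "SPARK2_JOBHISTORYSERVER"]),
        "worker": (3, ["DATANODE", "NODEMANAGER", "SPARK2_CLIENT"]),
    },
)

print(json.dumps(blueprint, indent=2))
```

Keeping the host groups in a plain mapping makes it easy to derive variants (for example a smaller Dev / Test layout) from the same helper.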
Create Cluster
Cloud Provider Specific Configurations
• Manage Credentials: Configure Credentials for Cloudbreak for connecting to a Cloud Provider
• Manage Templates: Configure size of VM images
• Manage Network: Define network in which HDP cluster will run
• Manage Security Groups: Manage who can access HDP clusters
HDP Specific Configurations
• Manage Blueprints: Define HDP Services
• Advanced Configuration: Configure HDP Repository, Define Failure Actions
Cloudbreak UI
Manage Cluster
Cloudbreak CLI
DevOps
Cloudbreak CLI
“SHOW CLI COMMAND”
Recipes
DevOps
Background: Recipes
• Cloudbreak lets you provision a cluster using an Ambari Blueprint; however, a blueprint cannot address every use case, such as:
  • Installing additional software
  • Making system configuration changes
• A recipe is a script that runs on all nodes of a selected node group at a specific time.
• Bash and Python scripts are supported.
• Available hooks:
  • Pre-ambari-start
  • Post-ambari-start
  • Post-cluster-install
  • Pre-termination
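To make the idea of a recipe concrete, here is a minimal sketch of a Python script that performs a system configuration change of the kind a blueprint cannot express. The setting (`vm.swappiness`), the file name, and the idempotent design are assumptions chosen for illustration; the slides do not prescribe them. The demonstration writes to a temporary file so it can run without root privileges.

```python
#!/usr/bin/env python
"""Sketch of a recipe-style script that applies a system config change."""
import os
import tempfile


def ensure_setting(path, key, value):
    """Idempotently set `key = value` in a sysctl-style file.

    Returns True if the file was changed, False if it was already correct.
    """
    wanted = "{} = {}".format(key, value)
    lines = []
    if os.path.exists(path):
        with open(path) as handle:
            lines = [line.rstrip("\n") for line in handle]
    if wanted in lines:
        return False
    # Drop any older value for the same key, then append the new one.
    kept = [line for line in lines if line.split("=")[0].strip() != key]
    kept.append(wanted)
    with open(path, "w") as handle:
        handle.write("\n".join(kept) + "\n")
    return True


# Demonstration against a temporary file. A real recipe would target a
# system path on the node instead (assumption, not from the slides).
demo_path = os.path.join(tempfile.mkdtemp(), "99-recipe.conf")
first = ensure_setting(demo_path, "vm.swappiness", "1")
second = ensure_setting(demo_path, "vm.swappiness", "1")
print(first, second)  # True False
```

Because a recipe runs on every node of the selected group, writing it so that a second run changes nothing keeps repeated executions safe.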
Cloudbreak: Add Recipes
• Cluster Extensions > Recipes > Create
• Add recipe as File, URL or Text
Cloudbreak: Add Recipes
• Clusters > Create Cluster > Cluster Extensions
Streaming in the Cloud
HDF in Cloudbreak 2.7
Apache NiFi and Apache NiFi Registry
HDF in Cloudbreak 2.7
Apache Kafka
Auto Scaling
Auto-Scaling
• Alerts: Create metric-based or time-based alerts for cluster scaling
• Policies: Scaling policies adjust cluster size based on activity and workload alerts
• General Configurations: Boundaries and cooldown period
Auto-Scaling Time-Based Alert
Fire at 10:15 am every day
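The schedule above can be sketched in a few lines of Python that compute the next moment a daily 10:15 alert would fire. This only illustrates the scheduling logic and is not Cloudbreak's implementation. The cron string is the Quartz-style equivalent of "10:15 am every day"; whether the alert form accepts exactly this syntax is an assumption.

```python
from datetime import datetime, time, timedelta

# Quartz-style cron for "10:15 am every day" (seconds, minutes, hours,
# day-of-month, month, day-of-week). Assumed format, not from the slides.
DAILY_10_15_CRON = "0 15 10 * * ?"


def next_daily_fire(now, fire_at=time(10, 15)):
    """Return the first datetime strictly after `now` at the given time of day."""
    candidate = datetime.combine(now.date(), fire_at)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


print(next_daily_fire(datetime(2018, 10, 1, 9, 0)))    # 2018-10-01 10:15:00
print(next_daily_fire(datetime(2018, 10, 1, 10, 15)))  # 2018-10-02 10:15:00
```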
Auto-Scaling Metric-Based Alert
Fire after NodeManagers are CRITICAL for 10 minutes
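The "for 10 minutes" condition means the alert state must be sustained, not merely observed once. A minimal sketch of that check follows, assuming the alert state arrives as time-ordered `(timestamp, state)` samples. The function name and the sample format are hypothetical and only illustrate the rule.

```python
from datetime import datetime, timedelta


def should_fire(samples, now, state="CRITICAL", period=timedelta(minutes=10)):
    """Return True if `state` has held continuously for at least `period`.

    samples: list of (timestamp, state) tuples sorted by ascending timestamp.
    """
    since = None
    for timestamp, observed in samples:
        if observed == state:
            # Remember when the current unbroken run of `state` began.
            if since is None:
                since = timestamp
        else:
            # Any other state resets the run.
            since = None
    return since is not None and now - since >= period


start = datetime(2018, 10, 1, 12, 0)
samples = [
    (start, "OK"),
    (start + timedelta(minutes=2), "CRITICAL"),
    (start + timedelta(minutes=7), "CRITICAL"),
]
print(should_fire(samples, start + timedelta(minutes=11)))  # False
print(should_fire(samples, start + timedelta(minutes=12)))  # True
```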
Auto-Scaling Policies
• Define the Scale Adjustment (Node Count, Percentage, Exact)
• Select the Host Group (to Scale)
• Select the Alert (which, when fired, executes the Policy)
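The three adjustment types, together with the boundaries and cooldown period from the general configuration, can be sketched as a small calculation. The type strings, the rounding of percentage adjustments away from zero, and the function names are assumptions made for illustration; the slides do not specify how Cloudbreak rounds or names these values.

```python
import math
from datetime import datetime, timedelta


def target_node_count(current, adjustment_type, value, min_nodes, max_nodes):
    """Compute the node count after a policy fires, clamped to the boundaries."""
    if adjustment_type == "NODE_COUNT":
        desired = current + value          # add or remove a fixed number of nodes
    elif adjustment_type == "PERCENTAGE":
        delta = current * value / 100.0    # change relative to the current size
        # Assumed rounding: away from zero, so a small percentage still
        # changes the cluster by at least one node.
        desired = current + (math.ceil(delta) if delta > 0 else math.floor(delta))
    elif adjustment_type == "EXACT":
        desired = value                    # set the size directly
    else:
        raise ValueError("unknown adjustment type: {}".format(adjustment_type))
    return max(min_nodes, min(max_nodes, desired))


def in_cooldown(last_scaled, now, cooldown):
    """Return True while the cooldown period since the last scaling is active."""
    return last_scaled is not None and now - last_scaled < cooldown


print(target_node_count(10, "NODE_COUNT", 5, 3, 12))   # 12 (clamped to max)
print(target_node_count(10, "PERCENTAGE", 25, 3, 20))  # 13
print(target_node_count(10, "EXACT", 2, 3, 20))        # 3 (clamped to min)
```

Checking `in_cooldown` before applying a policy prevents a second alert from resizing the host group again while the previous change is still settling.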
Data Lake Shared Services
Why Data Lake Shared Services
• Customers have a need to secure ephemeral workload clusters
• Customers need a single metadata repository for Hive schema
• Customers want a single pane of glass to define users, groups and authorization policies
Introducing “Data Lake” in Cloudbreak
• CLOUD STORAGE (Durable): When data resides in cloud object stores (e.g. S3, ADLS, WASB, GCS), Hadoop optimizes reads/writes and acts as an intermediate cache to increase performance and decrease latency.
• WORKLOADS (Ephemeral): Secure access to workload clusters via a Protected Gateway enabled for AuthN and HTTPS.
• SHARED DATA LAKE SERVICES (Long Running): Metastore (SCHEMA), Atlas (CATALOG), Ranger (POLICY). Define your data schema, security policies, and metadata catalog once for your ephemeral and always-on workloads.
Data Lake: Flyover
• LDAP/AD, Hive Database, Ranger Database, Cloud Storage
• Data Lake: Ranger, Hive Metastore
• Workload Cluster(s): Hive, Spark, Zeppelin (attached to the Data Lake)
Learn More
• Try Cloudbreak 2.8 (TP)
• http://docs.hortonworks.com
Thank You
Launch HDP from Cloudbreak UI (Demo #1)
Launch HDF from Cloudbreak UI (Demo #2)
Launch HDP from CLI (Demo #3)

Data in the Cloud Crash Course
