Data in the Cloud Crash Course

© Hortonworks Inc. 2011–2018. All rights reserved
Data in the Cloud
Santhosh B Gowda
Engineer, Cloudera
DataWorks Summit – Melbourne Feb 2019

Who Am I?
Santhosh B Gowda
Engineer @ Cloudera

Agenda
• Introduction
• Primary Use Cases
• Cloudbreak Architecture
• Cloudbreak Core Concepts
• Credentials
• Blueprints
• Recipes
• Images
• CLI
• Streaming in Cloudbreak
• Shared Services Data Lake
• Hands-On Lab

Why Data on Cloud?
• No upfront HW costs ($0)
• Unlimited elastic scale
• Ephemeral & long-running clusters
• IT & business agility

What Is Cloudbreak?
Cloudbreak is a tool for provisioning Hadoop clusters on public and private cloud infrastructure (e.g. AWS, Azure, Google Cloud, OpenStack).
Simplified cluster provisioning: prescriptive setup, simple automation.

Cloudbreak: Harness the Agility of Cloud with Ease
Cloudbreak
• Declarative workload provisioning across multiple cloud providers
• Flexible topologies and security configuration options
• DevOps friendly, easy setup and simple to automate
• Built-in elasticity and auto-scaling
• Prescriptive integration with cloud services

Cloudbreak
Workloads:
• Dev / Test (all HDP services)
• BI / Analytics (Hive)
• Data Science (Spark)
• IoT Apps (Storm, HBase, Hive)
Steps:
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP/HDF!
Example Ambari Blueprints: IoT Apps, BI / Analytics, Data Science, Dev / Test
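To make "Pick a Blueprint" more concrete, here is a minimal sketch of an Ambari blueprint built as a Python dictionary and serialized to JSON. The overall shape (a `Blueprints` section with `stack_name` and `stack_version`, plus `host_groups` that list components and a cardinality) follows Ambari's blueprint format. The blueprint name, host group names, component selection, and stack version are illustrative assumptions: this is not one of the example blueprints shipped with Cloudbreak, and it is not a complete, validated HDP topology. The helper functions are hypothetical.

```python
import json


def make_blueprint(name: str, stack_version: str = "3.0") -> dict:
    """Build a hypothetical, minimal 'Data Science'-style Ambari blueprint.

    Host groups and components are illustrative only; a deployable
    blueprint needs more components (clients, timeline server, etc.).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "configurations": [],
        "host_groups": [
            {
                "name": "master",
                "cardinality": "1",
                "components": [
                    {"name": "NAMENODE"},
                    {"name": "RESOURCEMANAGER"},
                    {"name": "SPARK2_JOBHISTORYSERVER"},
                    {"name": "ZOOKEEPER_SERVER"},
                ],
            },
            {
                "name": "worker",
                "cardinality": "1+",
                "components": [
                    {"name": "DATANODE"},
                    {"name": "NODEMANAGER"},
                    {"name": "SPARK2_CLIENT"},
                ],
            },
        ],
    }


def components_by_host_group(blueprint: dict) -> dict:
    """Map each host group name to the component names it runs."""
    return {
        group["name"]: [c["name"] for c in group["components"]]
        for group in blueprint["host_groups"]
    }


if __name__ == "__main__":
    bp = make_blueprint("data-science-sketch")
    print(json.dumps(bp, indent=2))
    print(components_by_host_group(bp))
```

Keeping the blueprint as plain data makes it easy to version, diff, and review before uploading it through the Cloudbreak UI or CLI.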

Create Cluster
Cloud Provider Specific Configurations
• Manage Credentials: configure the credentials Cloudbreak uses to connect to a cloud provider
• Manage Templates: configure the size of the VM instances
• Manage Network: define the network in which the HDP cluster will run
• Manage Security Groups: manage who can access HDP clusters
HDP Specific Configurations
• Manage Blueprints: define HDP services
• Advanced Configuration: configure the HDP repository and define failure actions

Cloudbreak UI

Manage Cluster

Cloudbreak CLI
DevOps

Cloudbreak CLI

“SHOW CLI COMMAND”

Recipes
DevOps

Background: Recipes
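• Cloudbreak lets you provision clusters using Ambari Blueprints; however, blueprints alone cannot address every use case, for example:
  • Installing additional software
  • Making system configuration changes
• A recipe is a script that runs on all nodes of a selected node group at a specific time.
• Recipes can be written as bash or Python scripts.
• Available hooks:
  • Pre-ambari-start: before the Ambari server starts
  • Post-ambari-start: right after the Ambari server starts
  • Post-cluster-install: after the cluster is installed
  • Pre-termination: before the cluster is terminated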
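The rule "a recipe runs on all nodes of a selected node group at a specific time" can be modeled in a few lines. This is a hypothetical sketch, not Cloudbreak's internal implementation: the `Hook`, `Recipe`, and `executions_for` names, the recipe names, and the host names are all assumptions. The hooks are listed in the lifecycle order given above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Hook(Enum):
    # Declared in the order the hooks fire during a cluster's lifecycle.
    PRE_AMBARI_START = "pre-ambari-start"
    POST_AMBARI_START = "post-ambari-start"
    POST_CLUSTER_INSTALL = "post-cluster-install"
    PRE_TERMINATION = "pre-termination"


@dataclass(frozen=True)
class Recipe:
    name: str
    hook: Hook
    node_group: str  # the recipe runs on every node of this group


def executions_for(hook: Hook, recipes: List[Recipe],
                   nodes_by_group: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (host, recipe name) pairs to run when `hook` fires."""
    return [
        (host, recipe.name)
        for recipe in recipes
        if recipe.hook is hook
        for host in nodes_by_group.get(recipe.node_group, [])
    ]


if __name__ == "__main__":
    recipes = [
        Recipe("install-extra-packages", Hook.PRE_AMBARI_START, "worker"),
        Recipe("tune-system-config", Hook.POST_CLUSTER_INSTALL, "master"),
    ]
    nodes = {"master": ["m1"], "worker": ["w1", "w2"]}
    for hook in Hook:
        print(hook.value, executions_for(hook, recipes, nodes))
```

Iterating over `Hook` walks the hooks in declaration order, which mirrors how each recipe is tied to exactly one point in the lifecycle and one node group.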

Cloudbreak: Add Recipes
• Cluster Extensions > Recipes > Create
• Add the recipe as File, URL, or Text

Cloudbreak: Add Recipes
• Clusters > Create Cluster > Cluster Extensions

Streaming in the Cloud

HDF in Cloudbreak 2.7
Apache NiFi and Apache NiFi Registry

HDF in Cloudbreak 2.7
Apache Kafka

Auto-Scaling

Auto-Scaling
• Alerts: Create metric or time-based alerts for cluster scaling
• Policies: Scaling policies adjust cluster size based on activity and workload alerts
• General Configurations: cluster size boundaries (minimum and maximum) and a cooldown period between scaling events

Auto-Scaling Time-Based Alert
Fires at 10:15 am every day
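A daily time-based alert boils down to "what is the next firing time after now?". The sketch below computes that for a 10:15 am schedule (in standard five-field cron notation the same schedule would be `15 10 * * *`). It is an illustrative assumption, not how Cloudbreak evaluates alerts: it uses naive local times and ignores time zones and daylight-saving changes. `next_fire` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta


def next_fire(now: datetime, at: time = time(10, 15)) -> datetime:
    """Return the next daily firing time strictly after `now`.

    Naive local times are assumed; time zones and DST are ignored.
    """
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now): fire tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_fire(datetime(2019, 2, 20, 9, 0)))
    print(next_fire(datetime(2019, 2, 20, 11, 0)))
```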

Auto-Scaling Metric-Based Alert
Fires after NodeManagers have been CRITICAL for 10 minutes
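The key idea in a metric-based alert is that the state must persist for the whole period, not just appear once. Here is a minimal sketch over a hypothetical list of sampled health states; the sampling model (each state holds until the next sample) and the `should_fire` helper are assumptions, and how Cloudbreak actually polls and evaluates these alerts is not shown.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def should_fire(samples: List[Tuple[datetime, str]], now: datetime,
                state: str = "CRITICAL",
                period: timedelta = timedelta(minutes=10)) -> bool:
    """Fire if `state` has held continuously for at least `period`.

    `samples` are (timestamp, state) pairs sorted by timestamp; each
    observed state is assumed to hold until the next sample.
    """
    since = None  # start of the current unbroken run of `state`
    for ts, observed in samples:
        if observed == state:
            if since is None:
                since = ts
        else:
            since = None  # any other state resets the run
    return since is not None and now - since >= period


if __name__ == "__main__":
    t0 = datetime(2019, 2, 20, 12, 0)
    samples = [(t0, "OK"), (t0 + timedelta(minutes=1), "CRITICAL")]
    print(should_fire(samples, t0 + timedelta(minutes=11)))
```

Resetting the run on any non-CRITICAL sample keeps a brief flap from triggering a scaling action.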

Auto-Scaling Policies
• Define the Scale Adjustment (Node Count, Percentage, Exact)
• Select the Host Group (to Scale)
• Select Alert (which when fired, executes the Policy)
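The three scale adjustment types, together with the boundaries and cooldown from the general configuration, can be sketched as a single sizing function. This is a hypothetical model: the names, the away-from-zero rounding for percentages, and applying the min/max bounds directly to the host group being scaled (rather than the whole cluster) are simplifying assumptions, not Cloudbreak's exact rules.

```python
import math
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Adjustment(Enum):
    NODE_COUNT = "node_count"  # add (or remove, if negative) N nodes
    PERCENTAGE = "percentage"  # add/remove N% of the current host group size
    EXACT = "exact"            # set the host group to exactly N nodes


def target_size(current: int, kind: Adjustment, value: int,
                min_size: int, max_size: int) -> int:
    """Apply one scaling adjustment, then clamp to the configured boundaries."""
    if kind is Adjustment.NODE_COUNT:
        target = current + value
    elif kind is Adjustment.PERCENTAGE:
        delta = current * value / 100
        # Round away from zero so a fractional change still moves one node.
        target = current + (math.ceil(delta) if delta >= 0 else math.floor(delta))
    else:
        target = value
    return max(min_size, min(max_size, target))


def apply_policy(current: int, kind: Adjustment, value: int,
                 min_size: int, max_size: int,
                 last_scaled: Optional[datetime], now: datetime,
                 cooldown: timedelta) -> int:
    """Return the new size, or `current` unchanged while still in cooldown."""
    if last_scaled is not None and now - last_scaled < cooldown:
        return current
    return target_size(current, kind, value, min_size, max_size)


if __name__ == "__main__":
    print(target_size(4, Adjustment.PERCENTAGE, 50, 1, 10))
```

Checking the cooldown before computing the target means a policy whose alert keeps firing cannot resize the cluster again until the previous change has had time to take effect.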

Data Lake
Shared Services

Why Data Lake Shared Services
• Customers have a need to secure ephemeral workload clusters
• Customers need a single metadata repository for Hive schema
• Customers want a single pane of glass to define users, groups and authorization policies

Introducing “Data Lake” in Cloudbreak
SHARED DATA LAKE SERVICES
• CLOUD STORAGE (durable): When data resides in cloud object stores (e.g. S3, ADLS, WASB, GCS), Hadoop optimizes reads/writes and acts as an intermediate cache to increase performance and decrease latency.
• WORKLOADS (ephemeral and long running): Secure access to workload clusters via a Protected Gateway enabled for AuthN and HTTPS.
• SCHEMA (Metastore), CATALOG (Atlas), POLICY (Ranger): Define your data schema, security policies, and metadata catalog once for your ephemeral and always-on workloads.

Data Lake: Flyover
• Data Lake: Ranger and Hive Metastore, backed by LDAP/AD, a Hive database, a Ranger database, and cloud storage
• Workload Cluster(s): Hive, Spark, Zeppelin; attached to the data lake
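The "attach" relationship can be pictured as every workload cluster pointing its services at the same shared endpoints, so schema, users, and policies are defined once. The sketch below is a hypothetical model, not Cloudbreak's API: the classes, host names, bucket, and the `attached_data_lake` key are assumptions. The Hive and Ranger property names are the standard ones as far as we know, but Cloudbreak's real attach logic sets more settings and may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class DataLake:
    """Shared services defined once (hypothetical model)."""
    name: str
    hive_metastore_uri: str  # shared Hive schema (Metastore)
    ranger_policy_url: str   # shared authorization policies (Ranger)
    ldap_url: str            # shared users and groups (LDAP/AD)
    warehouse_dir: str       # durable data location in cloud storage


@dataclass
class WorkloadCluster:
    name: str
    services: Tuple[str, ...]
    config: Dict[str, str] = field(default_factory=dict)

    def attach(self, lake: DataLake) -> None:
        """Point this cluster's services at the data lake's shared services."""
        self.config.update({
            "hive.metastore.uris": lake.hive_metastore_uri,
            "hive.metastore.warehouse.dir": lake.warehouse_dir,
            "hive.server2.authentication.ldap.url": lake.ldap_url,
            "ranger.plugin.hive.policy.rest.url": lake.ranger_policy_url,
        })
        # Hypothetical bookkeeping key recording which lake is attached.
        self.config["attached_data_lake"] = lake.name


if __name__ == "__main__":
    lake = DataLake("shared-lake", "thrift://datalake-master:9083",
                    "http://datalake-master:6080", "ldap://ad.example.com:389",
                    "s3a://example-bucket/warehouse")
    etl = WorkloadCluster("etl", ("Hive", "Spark"))
    notebooks = WorkloadCluster("notebooks", ("Spark", "Zeppelin"))
    for cluster in (etl, notebooks):
        cluster.attach(lake)
    print(etl.config == notebooks.config)
```

Because the ephemeral clusters hold only references to the shared services, they can be terminated and recreated without losing schema, policies, or data.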

Learn More
• Try Cloudbreak 2.8 (Technical Preview)
• http://docs.hortonworks.com

Thank You

Launch HDP from Cloudbreak UI (Demo #1)

Launch HDF from Cloudbreak UI (Demo #2)

Launch HDP from CLI (Demo #3)
