1 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Running Enterprise
Workloads in the Cloud
DataWorks Summit - San Jose
June 2018
2 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Presenters
Jeff Sposetti
Product Manager @ Hortonworks
Attila Kanto
Principal Engineer @ Hortonworks
3 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Agenda
 Introduction
 Cloudbreak
 Demo #1: Flyover
 Advanced Topics
 Demo #2: Deeper Dive
 Lessons Learned in the Cloud
 Wrap Up
4 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Why Big Data Workloads in the Cloud?
No Upfront HW Costs
Unlimited Elastic Scale
Ephemeral & Long-Running
IT & Business Agility
5 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Cloudbreak: Harness the agility of cloud with ease
Cloudbreak
• Declarative workload provisioning
across cloud providers
• Flexible topologies and security
configuration options
• DevOps friendly, easy setup and
simple to automate
• Built-in elasticity and auto-scaling
• Prescriptive integration with cloud
services
AWS: Ambari, HDP + HDF
Azure: Ambari, HDP + HDF
6 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Deploy on Public or
Private Clouds
Dynamically configure and
manage clusters on public or
private clouds (Amazon Web
Services, Microsoft Azure,
Google Cloud Platform and
OpenStack)
Automated Scaling
Seamlessly manage elasticity
requirements as cluster
workloads change
Secured Cluster Access
Supports configuration
defining network boundaries,
configuring security groups,
gateway perimeter security
and enabling Kerberos
7 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Cloudbreak Building Blocks
• Cloud Credentials
• Ambari Blueprints
• Auto Scaling
• Custom Recipes
• Custom Images
• Network
• Gateway
• Kerberos Security
• Dynamic Blueprints
• Cloud Storage
Simple and Flexible Prescriptive Secure
8 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Demo #1
Flyover
9 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Advanced Topics
10 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Cloudbreak Building Blocks: Advanced Topics
• Cloud Credentials
• Ambari Blueprints
• Auto Scaling
• Custom Recipes
• Custom Images
• Network
• Gateway
• Kerberos Security
• Dynamic Blueprints
• Cloud Storage
Simple and Flexible Prescriptive Secure
Bringing it all together: Data Lake Shared Services
11 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Custom Images
12 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Background: Cloudbreak
1. Cloudbreak creates VM instances using a default base image.
2. Cloudbreak installs Ambari on a VM instance.
3. Cloudbreak instructs Ambari to install a cluster on the remaining VM instances.
[Diagram: Cloudbreak provisions a set of Node VMs that together form the Cluster]
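To make step 3 concrete, here is a minimal sketch of the kind of Ambari Blueprint that drives the cluster install; the host group names, cardinalities, and component lists are illustrative, not Cloudbreak defaults:
{
  "Blueprints": {
    "blueprint_name": "example-cluster",
    "stack_name": "HDP",
    "stack_version": "2.6"
  },
  "host_groups": [
    {
      "name": "master",
      "cardinality": "1",
      "components": [ { "name": "NAMENODE" }, { "name": "RESOURCEMANAGER" } ]
    },
    {
      "name": "worker",
      "cardinality": "3",
      "components": [ { "name": "DATANODE" }, { "name": "NODEMANAGER" } ]
    }
  ]
}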
13 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Custom Images Overview
1. Create the Custom Image
2. Register the Custom Image
3. Use the Custom Image when Creating a Cluster
14 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Kerberos Security
15 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Background: Kerberos
 Strongly authenticating and establishing a user’s identity is the basis for secure access in
Hadoop. Users need to be able to reliably “identify” themselves and then have that
identity propagated throughout the Hadoop cluster.
 Once this is done, those users can access resources (such as files or directories) or
interact with the cluster (like running MapReduce jobs).
 Besides users, Hadoop cluster resources themselves (such as Hosts and Services) need
to authenticate with each other to avoid potentially malicious systems or daemons
“posing as” trusted components of the cluster to gain access to data.
16 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Background: Hadoop + Kerberos
[Diagram: a Hadoop Cluster of Service Components (A, B, C, D, X, …), each with its own keytab, alongside the KDC]
Kerberos is used to secure the Components in the cluster. Kerberos identities are
managed via “keytabs” on the Component hosts. Principals for the cluster are
managed in the KDC.
17 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Background: Ambari Kerberos Support
 Ambari provides automated options for working w/ existing MIT KDC or Active Directory
 Can be highly customized to fit many enterprise requirements
– Templating for customizable principals
– Control of Kerberos Client install and krb5.conf configuration
– Highly-configurable service principal identity naming
 These options are available via Ambari UI as well as via Ambari Blueprints
– Blueprints can include “Kerberos Descriptor” for kerberos-env and krb5-conf
https://cwiki.apache.org/confluence/display/AMBARI/Automated+Kerberizaton
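As a hedged illustration of that descriptor support, a Blueprint fragment could carry kerberos-env and krb5-conf configuration roughly like the following; kerberos-env and krb5-conf are standard Ambari config types, but the realm, KDC hosts, and the exact set of required fields shown here are placeholders that depend on your environment and Ambari version:
"Blueprints": {
  "blueprint_name": "kerberized-cluster",
  "stack_name": "HDP",
  "stack_version": "2.6",
  "security": { "type": "KERBEROS" }
},
"configurations": [
  { "kerberos-env": {
      "properties": {
        "realm": "EXAMPLE.COM",
        "kdc_type": "mit-kdc",
        "kdc_hosts": "kdc.example.com",
        "admin_server_host": "kdc.example.com"
      }
  } },
  { "krb5-conf": {
      "properties": {
        "domains": ".example.com",
        "manage_krb5_conf": "true"
      }
  } }
]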
18 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Cloudbreak: Support for Enabling Kerberos
Goal
Provide a way for Cloudbreak users to create clusters that
are Kerberos-enabled
Approach
Ambari exposes a lot of Kerberos options
Leverage Ambari Kerberos options and avoid re-creating the
Ambari Kerberos experience
Be pragmatic about the prescriptive options layered on top
19 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Cloudbreak: Enable Kerberos Security
 Create Cluster > Security > Advanced
 [ ] Enable Kerberos Security
20 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Options: Use Existing KDC or Use Test KDC
Use Test KDC
- Not for production use. For testing and evaluation purposes only.
- Installs and configures an MIT KDC on the master node.
- Configures the cluster to leverage that KDC.
Use Existing KDC (Basic)
- Provide basic information about your existing KDC.
- Ambari Kerberos descriptors are generated automatically.
Use Existing KDC (Advanced)
- Provide basic information about your existing KDC.
- Provide your own Ambari Kerberos descriptors.
21 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Dynamic Blueprints
22 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Dynamic Blueprints: RDBMS and LDAP/AD
 Background:
– Cluster configuration often includes external databases (for Hive, Ranger, etc.) and LDAP/AD configs
– Users often have to create 1+ versions of the same Blueprint to handle different component
configurations for these external systems
– It’s a challenge to know the different Blueprint configuration choices per service across the stack
 Dynamic Blueprints:
– Ability to manage External Sources (e.g. RDBMS and LDAP/AD) outside of your Blueprint
– Cloudbreak will inject the configurations into your Blueprint
– Simplifies reuse of cluster configurations -> for external sources (RDBMS and LDAP/AD)
– Simplifies your Blueprints -> don’t have to know all the configurations for each component
23 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Dynamic Blueprints: RDBMS
Create an External Source -> Select it during Create Cluster
24 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Dynamic Blueprints: RDBMS
 External Sources > Database Configurations
 Built-In Types: Ambari, Druid, Hive, Oozie, Ranger, Superset
 Ability to set “other” type (for variable replacement)
Decision flow: Are JDBC properties already in the Blueprint for the Component?
- Yes: use the Blueprint as-is; no Component configuration property injection.
- No: inject the Component configuration properties.
PROPERTY VARIABLES
rds.[type].connectionString
rds.[type].connectionDriver
rds.[type].connectionUserName
rds.[type].connectionPassword
rds.[type].databaseName
rds.[type].host
rds.[type].hostWithPort
rds.[type].databaseType
where
type=[ambari,druid,hive,oozie,ranger,superset]**
** the “other” type=[other-name]
Then: perform property variable replacement.
25 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Dynamic Blueprints: RDBMS
 PostgreSQL, MySQL or Oracle
26 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Example #1: Injecting type=“hive” configuration properties
Property Variable -> Example Value
rds.hive.connectionString -> jdbc:postgresql://hive.test.eu-west-1:5432/hive
rds.hive.connectionDriver -> org.postgresql.Driver
rds.hive.connectionUserName -> mydatabaseuser
rds.hive.connectionPassword -> Hadoop123!
rds.hive.fancyName -> PostgreSQL, MySQL / MariaDB, Oracle
rds.hive.databaseType -> postgres, mysql, oracle
"hive-site": {
"properties": {
"javax.jdo.option.ConnectionURL": "{{{ rds.hive.connectionString }}}",
"javax.jdo.option.ConnectionDriverName": "{{{ rds.hive.connectionDriver }}}",
"javax.jdo.option.ConnectionUserName": "{{{ rds.hive.connectionUserName }}}",
"javax.jdo.option.ConnectionPassword": "{{{ rds.hive.connectionPassword }}}"
}
},
"hive-env" : {
"properties" : {
"hive_database" : "Existing {{{ rds.hive.fancyName}}} Database",
"hive_database_type" : "{{{ rds.hive.databaseType }}}"
}
}
rds.hive.connectionString, rds.hive.connectionUserName, rds.hive.connectionPassword ([type] = hive)
In this scenario, PROPERTIES WILL BE INJECTED INTO THE BLUEPRINT.
27 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Example #2: Setting type=“other” property variables
Property Variable -> Example Value
rds.test.connectionString -> db.test.eu-west-1:5432/sometest
rds.test.connectionDriver -> org.postgresql.Driver
rds.test.connectionUserName -> mydatabaseuser
rds.test.connectionPassword -> Hadoop123!
rds.test.subprotocol -> postgres
rds.test.databaseEngine -> POSTGRES
rds.test.connectionString, rds.test.connectionUserName, rds.test.connectionPassword ([type] = test)
In this scenario, PROPERTY VARIABLES WILL BE REPLACED IN THE BLUEPRINT (not injected).
• You must include the property variables in your Blueprint
for replacement. Use Mustache template syntax. For
example:
"test-site": {
"properties": {
"javax.jdo.option.ConnectionURL":"{{rds.test.connectionString}}"
}
• Cloudbreak will perform property variable replacement in
your Blueprint.
28 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Dynamic Blueprints: LDAP/AD
 External Sources > Authentication Configurations
 Built-In Components:
– Atlas, Hadoop, Hive LLAP, Ranger Admin, Ranger UserSync
Decision flow: Are LDAP properties already in the Blueprint for the Component?
- Yes: use the Blueprint as-is; no Component configuration property injection.
- No: inject the Component configuration properties.
PROPERTY VARIABLES
ldap.connectionURL
ldap.domain
ldap.bindDn
ldap.bindPassword
ldap.userSearchBase
ldap.userObjectClass
ldap.userNameAttribute
ldap.groupSearchBase
ldap.groupObjectClass
ldap.groupNameAttribute
ldap.groupMemberAttribute
ldap.directoryType
ldap.directoryTypeShort
Then: perform property variable replacement.
29 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
LDAP/AD Property Variable -> Mapping
ldap.connectionURL
ldap.directoryType
ldap.directoryTypeShort
ldap.bindDn
ldap.bindPassword
ldap.userSearchBase
ldap.userNameAttribute
ldap.userObjectClass
ldap.groupSearchBase
ldap.groupNameAttribute
ldap.groupObjectClass
ldap.groupMemberAttribute
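The target properties of this mapping are not captured above, so the following is a hedged sketch only: using the same Mustache syntax as the rds.* examples, these variables could be referenced from a Blueprint, for instance in Ranger UserSync configuration. The ranger-ugsync-site keys below are standard Ranger property names, but the exact variable-to-property pairing Cloudbreak applies is an assumption here:
"ranger-ugsync-site": {
  "properties": {
    "ranger.usersync.ldap.url": "{{{ ldap.connectionURL }}}",
    "ranger.usersync.ldap.binddn": "{{{ ldap.bindDn }}}",
    "ranger.usersync.ldap.ldapbindpassword": "{{{ ldap.bindPassword }}}",
    "ranger.usersync.ldap.user.searchbase": "{{{ ldap.userSearchBase }}}",
    "ranger.usersync.ldap.user.objectclass": "{{{ ldap.userObjectClass }}}",
    "ranger.usersync.ldap.user.nameattribute": "{{{ ldap.userNameAttribute }}}",
    "ranger.usersync.group.searchbase": "{{{ ldap.groupSearchBase }}}",
    "ranger.usersync.group.objectclass": "{{{ ldap.groupObjectClass }}}",
    "ranger.usersync.group.nameattribute": "{{{ ldap.groupNameAttribute }}}",
    "ranger.usersync.group.memberattributename": "{{{ ldap.groupMemberAttribute }}}"
  }
}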
30 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Demo #2
Deeper Dive
31 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Data Lake Shared
Services
Bringing It All Together
32 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Why Data Lake Shared Services
 Customers have a need to secure ephemeral workload clusters
 Customers need a single metadata repository for Hive schema
 Customers want a single pane of glass to define users, groups and authorization policies
TECHNICAL PREVIEW
33 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Ephemeral Workloads: Basic -> Advanced -> Enterprise
Tier comparison: Basic Ephemeral | Advanced Ephemeral | Enterprise Spark & Hive
Tuned and Optimized Infrastructure
Simplified, Automated Operations
Cloud Storage Integration
Protected Gateway
Schema: - | Shared (Hive Metastore) | Shared (Hive Metastore)
Authentication: Single-user | Single-user | Single or Multi-User (LDAP/AD)
Authorization: - | - | Security Policies (Ranger)
Cloud Storage Audit: - | - | Audit (Ranger)
TECHNICAL PREVIEW
34 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Data Lake: The Technical Ingredients
SCHEMA
WHAT: Provides Hive schema (tables, views, etc).
WHY: If you have 2+ workloads accessing the same schema, you need to share it across workloads.
HOW: Externalize the Hive Metastore for schema definition.
POLICY
WHAT: Defines security policies around the Hive schema.
WHY: If you have 2+ users accessing the same data, policies need to be consistently available and applied.
HOW: Externalize and share Ranger across workloads and store policies externally.
AUDIT
WHAT: Audit user access.
WHY: Capture data access activity.
HOW: Externalize and share Ranger across workloads; leverage cloud storage for audit data.
DIRECTORY
WHAT: Users and groups.
WHY: Provide a multi-user authentication source for users and the definition of groups.
HOW: Leverage external LDAP/AD.
GATEWAY
WHAT: Provide a single endpoint that can be protected with SSL and enabled for authentication to access cluster resources.
WHY: Avoid opening many ports, some potentially without authentication or SSL protection.
HOW: Deploy a centralized protected gateway automatically.
TECHNICAL PREVIEW
35 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Data Lake: Flyover
[Diagram: LDAP/AD, a Hive Database, a Ranger Database, and Cloud Storage back the Data Lake (running Ranger and the Hive Metastore); Workload Cluster(s) running Hive, Spark, and Zeppelin attach to it]
TECHNICAL PREVIEW
36 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Demo #2 (again)
Deeper Dive++
37 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Lessons Learned in the Cloud
38 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Lessons Learned Topics
 Performance
 Costs
 Reliability
 Security
39 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Lesson 1: Performance / Cost
 Know your cloud provider
– Cloudbreak offers a uniform API
– Similarities in basic concepts: compute, network, storage volumes, etc.
– Differences: performance, cloud connector, functionality
 Compute
– Instance types for your workload
– Different families: general purpose, compute, memory, storage optimized, GPU
– Network bandwidth
 Storage
– Speed, reliability, cost
– Aggregated: ephemeral (fixed size and number)
– Disaggregated:
• block storage (HDFS)
• cloud object stores (connector architecture)
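As an illustration of the connector architecture, cloud object stores are typically wired in through a handful of core-site properties. A minimal sketch for the S3A connector follows; the endpoint and key values are placeholders, and in practice credentials would come from instance roles or a credential provider rather than plain text:
"core-site": {
  "properties": {
    "fs.s3a.endpoint": "s3.eu-west-1.amazonaws.com",
    "fs.s3a.access.key": "EXAMPLE_ACCESS_KEY",
    "fs.s3a.secret.key": "EXAMPLE_SECRET_KEY"
  }
}
Workloads then address data as s3a://bucket/path instead of an HDFS path.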
40 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Lesson 2: Performance / Costs
 Capacity planning
– Workload type (batch / interactive)
– Allocate / release resources on demand
– Experimenting is cheap
 Flexible cluster shapes and sizes
– No one size fits all: security, HA, cluster topologies
– Cluster size is a variable, not a constant
– Spot/Preemptible VMs
 Automation
– DevOps mentality
– No manual configuration or fine-tuning
41 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Lesson 3: Reliability / Fault tolerance
 Network
– Fault domain / Availability Zones
– Rack awareness (think where your instances are running)
– Topologies for HA scenarios
 Externalize states:
– All your files, notebooks, schema, policies
– Ambari, Ranger, Hive Metastore, etc. databases
42 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Lesson 4: Security
Design your deployment to be secure from the beginning
Data protection
Authorization
Authentication
Perimeter Level Security
43 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Lesson 4: Security
 Perimeter level Security
– Private VPC/VNet deployments
– Inbound connectivity: security groups, ports
– Outbound: proxy / no internet
– Protected Gateway topology (Knox)
 Authentication:
– LDAP / AD
– Kerberos
 Authorization:
– Consistent authorization control across all HDP components (Ranger)
– Cloud provider specific (IAM roles)
 Data protection:
– At rest and in motion (e.g. Ranger KMS, cloud provider-specific disk encryption)
44 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Key takeaways
 Know your cloud provider
 Secure your cluster
45 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Wrap Up
46 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Learn More
 Try Cloudbreak 2.7
– http://docs.hortonworks.com
 Join Birds of a Feather
– Wednesday, June 20 @ 5:40p, Cloud and Operations
– Wednesday, June 20 @ 5:40p, Security and Governance
 Visit Breakout Sessions
– Thursday, June 21 @ 10:20a, Performance Analysis of AWS EC2 Instance Types, Michael Young
47 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Thank You