Streamline Hadoop DevOps with Apache Ambari
Jayush Luniya
Hadoop Summit, Tokyo
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Speaker
Jayush Luniya
Staff Software Engineer @ Hortonworks
Apache Ambari PMC
jluniya@apache.org
Apache Ambari is the open-source platform to
provision, manage and monitor Hadoop clusters
4 years old
Exciting Enterprise Features in Ambari 2.4
• New Services: Log Search, Zeppelin, Hive LLAP
• Role Based Access Control
• Management Packs
• Grafana UI for Ambari Metrics System
• New Views: Zeppelin, Storm
More in Ambari 2.4
Core Features
• Alerts: Customizable props and thresholds (AMBARI-14898)
• Alerts: Retry tolerance (AMBARI-15686)
• Alerts: New HDFS Alerts (AMBARI-14800)
• New Host Page Filtering (AMBARI-15210)
• Remove Service from UI (AMBARI-14759)
• Support for SLES 12 (AMBARI-16007)
• Stability: Database Consistency Checking (AMBARI-16258)
• Customizable Ambari Log + PID Dirs (AMBARI-15300)
• New Version Registration Experience (AMBARI-15724)
Security Features
• Log Search Technical Preview (AMBARI-14927)
• Operational Audit Logging (AMBARI-15241)
• Role-Based Access Control (AMBARI-13977)
• Automated Setup of Ambari Kerberos through Blueprints (AMBARI-15561)
• Automated Setup of Ambari Proxy User (AMBARI-15561)
• Customizable Host Reg. SSH Port (AMBARI-13450)
Views Framework Features
• View URLs for bookmarks (AMBARI-15821), View Refresh (AMBARI-15682)
• Inherit Cluster Permissions (AMBARI-16177)
• Remote Cluster Registration (AMBARI-16274)
Simply Operations - Lifecycle
• Deploy
• Secure/LDAP
• Smart Configs
• Upgrade
• Monitor
• Scale, Extend, Analyze
Ease-of-Use Deploy
Deploy On Premise
Ambari UI wizard handles all of these
combinations and makes recommendations
based on host specs.
Deploy On The Cloud
Certified environments
Sysprepped VMs
Hundreds of similar clusters
Deploy with Blueprints
• Systematic way of defining a cluster
• Export existing cluster into blueprint:
GET /api/v1/clusters/:clusterName?format=blueprint
[Diagram: Configs, Topology, Hosts, Cluster]
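The export call above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference client: the server address, cluster name, and credentials are hypothetical placeholders, and HTTP Basic authentication is assumed. The function only builds the request so that it can be inspected before anything is sent.

```python
import base64
import urllib.request


def blueprint_export_request(base_url, cluster_name, user, password):
    """Build (but do not send) the GET request that exports a cluster as a blueprint.

    base_url, cluster_name, user and password are placeholders supplied by the caller.
    """
    url = f"{base_url}/api/v1/clusters/{cluster_name}?format=blueprint"
    # Assumption: the server accepts HTTP Basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}"},
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the response body is the blueprint JSON, which can be saved and reused to create similar clusters.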
Create a cluster with Blueprints
1. POST /api/v1/blueprints/my-blueprint
{
  "configurations" : [
    {
      "hdfs-site" : {
        "dfs.datanode.data.dir" : "/hadoop/1,/hadoop/2,/hadoop/3"
      }
    }
  ],
  "host_groups" : [
    {
      "name" : "master-host",
      "components" : [
        { "name" : "NAMENODE" },
        { "name" : "RESOURCEMANAGER" },
        …
      ],
      "cardinality" : "1"
    },
    {
      "name" : "worker-host",
      "components" : [
        { "name" : "DATANODE" },
        { "name" : "NODEMANAGER" },
        …
      ],
      "cardinality" : "1+"
    }
  ],
  "Blueprints" : {
    "stack_name" : "HDP",
    "stack_version" : "2.5"
  }
}
2. POST /api/v1/clusters/my-cluster
{
  "blueprint" : "my-blueprint",
  "host_groups" : [
    {
      "name" : "master-host",
      "hosts" : [
        { "fqdn" : "master001.ambari.apache.org" }
      ]
    },
    {
      "name" : "worker-host",
      "hosts" : [
        { "fqdn" : "worker001.ambari.apache.org" },
        { "fqdn" : "worker002.ambari.apache.org" },
        …
        { "fqdn" : "worker099.ambari.apache.org" }
      ]
    }
  ]
}
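The two calls can also be scripted. The following is a sketch under stated assumptions: the helper names (`make_blueprint`, `make_cluster_template`, `post_request`) are hypothetical, authentication is left out, and only the components named on the slide are listed (the slide elides the rest). It generates both request bodies and builds the POST requests without sending them.

```python
import json
import urllib.request


def make_blueprint(stack_name, stack_version, host_groups, configurations):
    """Body for step 1: configs, component layout per host group, stack version."""
    groups = [
        {
            "name": name,
            "components": [{"name": component} for component in spec["components"]],
            "cardinality": spec["cardinality"],
        }
        for name, spec in host_groups.items()
    ]
    return {
        "configurations": configurations,
        "host_groups": groups,
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
    }


def make_cluster_template(blueprint_name, hosts_by_group):
    """Body for step 2: assigns concrete hosts to each host group of the blueprint."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": fqdn} for fqdn in fqdns]}
            for group, fqdns in hosts_by_group.items()
        ],
    }


def post_request(url, body):
    """Build (but do not send) a POST; Ambari expects the X-Requested-By header on writes."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Requested-By": "ambari"},
        method="POST",
    )


# worker001 ... worker099, matching the host names shown on the slide.
WORKERS = [f"worker{i:03d}.ambari.apache.org" for i in range(1, 100)]
```

Generating the host list in code avoids hand-editing a template with 99 entries, which is the main practical benefit when creating many similar clusters.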
Blueprints for Large Scale
• Kerberos, secure out-of-the-box
• High Availability is set up initially for NameNode, YARN, Hive, Oozie, etc.
• Host Discovery allows Ambari to automatically install services for a Host when it comes online
• Stack Advisor recommendations
Blueprint Host Discovery
POST /api/v1/clusters/MyCluster/hosts
[
  {
    "blueprint" : "single-node-hdfs-test2",
    "host_groups" : [
      {
        "host_group" : "slave",
        "host_count" : 3,
        "host_predicate" : "Hosts/cpu_count>1"
      },
      {
        "host_group" : "super-slave",
        "host_count" : 5,
        "host_predicate" : "Hosts/cpu_count>2&Hosts/total_mem>3000000"
      }
    ]
  }
]
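To make the idea of matching registering hosts against host_count and host_predicate concrete, here is a deliberately simplified Python sketch. It is not Ambari's predicate parser or matching algorithm: it assumes predicates are only `>` comparisons joined by `&`, and it assigns hosts greedily in registration order. All function names are hypothetical.

```python
def parse_predicate(predicate):
    """Split 'A>1&B>2' into [('A', 1.0), ('B', 2.0)]. Only '>' and '&' are handled."""
    clauses = []
    for clause in predicate.split("&"):
        prop, _, threshold = clause.strip().partition(">")
        clauses.append((prop.strip(), float(threshold)))
    return clauses


def matches(host, predicate):
    """True when every clause of the predicate holds for the host's properties."""
    return all(host.get(prop, 0) > threshold for prop, threshold in parse_predicate(predicate))


def assign_hosts(hosts, requests):
    """Greedy sketch: each host joins the first group that still has room and matches."""
    assignment = {request["host_group"]: [] for request in requests}
    for host in hosts:
        for request in requests:
            group = request["host_group"]
            has_room = len(assignment[group]) < request["host_count"]
            if has_room and matches(host, request["host_predicate"]):
                assignment[group].append(host["host_name"])
                break
    return assignment
```

Hosts that match no group simply stay unassigned in this sketch, which mirrors the idea that a request can be accepted before all of its hosts exist.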
Comprehensive Security
LDAP/AD
• User auth
• Sync
Kerberos
• MIT KDC
• Keytab management
Atlas
• Governance
• Compliance
• Lineage & history
• Data classification
Ranger
• Security policies
• Audit
• Authorization
Knox
• Perimeter security
• Supports LDAP/AD
• Sec. for REST/HTTP
• SSL
Kerberos
Ambari manages Kerberos principals and keytabs
Works with existing MIT KDC or Active Directory
Once Kerberized, handles
1. Adding hosts
2. Adding components to existing hosts
3. Adding services
4. Moving components to different hosts
Management Packs
• Improved Release Management:
  Decouple Ambari core from stacks releases
• Support Add-ons:
  – Release vehicle for 3rd party services, views
  – Self-contained release artifacts
  – Stack is an overlay of multiple management packs
Overlay of Management Packs
inherits from 2.3
inherits from 2.4
inherits from 2.5
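One way to picture an overlay with version inheritance is a chain of dictionaries where each stack version overrides what it inherits. The Python sketch below is only an illustration of that idea; the data layout (`inherits`, `services`) and the function name are hypothetical and do not describe Ambari's on-disk management pack format.

```python
def resolve_stack(versions, target):
    """Resolve the effective service definitions for one stack version.

    versions maps a version string to {"inherits": parent_or_None, "services": {...}}.
    Definitions from newer versions overlay those inherited from older ones.
    """
    chain = []
    version = target
    while version is not None:
        chain.append(version)
        version = versions[version]["inherits"]
    resolved = {}
    # Apply the oldest ancestor first so that newer versions win.
    for version in reversed(chain):
        resolved.update(versions[version]["services"])
    return resolved
```

With this shape, an add-on pack only needs to contribute the entries it changes or introduces; everything else comes from the versions it inherits from.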
Management Pack++
Short Term Goals (Ambari 2.4)
• Retrofit in Stack Processing Framework
• Enable 3rd party to ship add-on services
Future Goals
• Management Pack Framework
• Deliver Views
Role Based Access Control (RBAC)
As Ambari & organizations grow,
so do security needs
Ambari integrates with external
authentication systems & LDAP
RBAC Terms
Users belong to groups
A group has a role
Users can also have additional roles
Roles are applied to Resources. E.g.,
Ambari, particular Cluster, particular View
Roles have permissions
e.g., add services to cluster
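The relationships among these terms can be sketched as a small data model. This is an illustrative Python sketch, not Ambari's implementation; the class names, the permission string, and the resource naming are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Role:
    """A role carries permissions and is applied to one resource."""
    name: str
    resource: str            # e.g. "cluster:prod" (hypothetical naming)
    permissions: frozenset   # e.g. frozenset({"add_services"})


@dataclass
class Group:
    """A group has a role."""
    name: str
    role: Role


@dataclass
class User:
    """Users belong to groups and can also hold additional roles directly."""
    name: str
    groups: list = field(default_factory=list)
    extra_roles: list = field(default_factory=list)


def is_allowed(user, permission, resource):
    """True if any role reached via a group or held directly grants the permission."""
    roles = [group.role for group in user.groups] + list(user.extra_roles)
    return any(role.resource == resource and permission in role.permissions for role in roles)
```

Checking against the resource as well as the permission is what lets the same user be, for example, an administrator of one cluster and read-only on another.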
New RBAC Roles
Ambari Admin: all
Cluster Admin: ↑, except manage permissions
Cluster Op: ↑, except add services, Kerberos, manage alerts & upgrades
Service Admin: ↑, except alter cluster topology or install components
Service Op: ↑, except change configs
Read-Only: only view
Service Layout
Common Services Stack Override
Stack Advisor
Kerberos, HTTPS, ZooKeeper Servers, Memory Settings, …, High Availability
Example Configurations
atlas.rest.address = http(s)://host:port
# Atlas Servers
atlas.enableTLS = true|false
atlas.server.http.port = 21000
atlas.server.https.port = 21443
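The Atlas example shows the kind of derived setting a recommendation engine computes: the REST address depends on the TLS flag, the matching port, and the set of Atlas servers. A minimal Python sketch of that derivation follows. The function name is hypothetical, and joining multiple servers with commas is an assumption made for illustration.

```python
def atlas_rest_address(hosts, enable_tls, http_port=21000, https_port=21443):
    """Derive atlas.rest.address from the TLS flag, the ports, and the Atlas server hosts."""
    if enable_tls:
        scheme, port = "https", https_port
    else:
        scheme, port = "http", http_port
    # Assumption: several Atlas servers are expressed as a comma-separated list.
    return ",".join(f"{scheme}://{host}:{port}" for host in hosts)
```

The value of computing this centrally is that flipping one input (enabling TLS, adding a server) updates every dependent property consistently instead of relying on an operator to find each one.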
Background: Upgrade Terminology
Manual Upgrade
• The user follows instructions to upgrade the stack
• Incurs downtime
Rolling Upgrade
• Automated
• Upgrades one component per host at a time
• Preserves cluster operation and minimizes service impact
Express Upgrade
• Automated
• Runs in parallel across hosts
• Incurs downtime
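The scheduling difference between the two automated methods can be sketched as batching. The Python function below is a simplified illustration, not Ambari's upgrade orchestration: rolling is modeled as one host per step, and express as parallel batches. The default batch size of 100 comes from the speaker notes; the function name is hypothetical.

```python
def upgrade_batches(hosts, method, express_batch_size=100):
    """Group hosts into the batches that would be upgraded together for one component."""
    if method == "rolling":
        size = 1                    # one host at a time, service stays available
    elif method == "express":
        size = express_batch_size   # many hosts in parallel, service is down
    else:
        raise ValueError(f"unknown upgrade method: {method}")
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]
```

For 250 hosts this yields 250 sequential steps for a rolling upgrade and 3 parallel batches for an express upgrade, which is the trade-off between availability and total upgrade time.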
Automated Upgrade: Rolling or Express
1. Check Prerequisites: Review the prereqs to confirm your cluster configs are ready
2. Prepare: Take backups of critical cluster metadata
3. Register + Install: Register the HDP repository and install the target HDP version on the cluster
4. Perform Upgrade: Perform the HDP upgrade. The steps depend on upgrade method: Rolling or Express
5. Finalize: Finalize the upgrade, making the target version the current version
Process: Rolling Upgrade
ZooKeeper
Ranger
Hive
Oozie
Falcon
Kafka
Knox
Storm
Slider
Flume
Finalize or
Downgrade
Clients: HDFS, YARN, MR, Tez, HBase, Pig, Hive, etc.
Core
Masters
Core Slaves
HDFS
YARN
HBase
Alerting Framework
Alert Type | Description | Thresholds (units)
WEB | Connects to a Web URL. Alert status is based on the HTTP response code | Response Code (n/a); Connection Timeout (seconds)
PORT | Connects to a port. Alert status is based on response time | Response (seconds)
METRIC | Checks the value of a service metric. Units vary, based on the metric being checked | Metric Value (units vary); Connection Timeout (seconds)
AGGREGATE | Aggregates the status for another alert | % Affected (percentage)
SCRIPT | Executes a script to handle the alert check | Varies
SERVER | Executes a server-side runnable class to handle the alert check | Varies
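As an illustration of how a PORT-style check turns a response time into an alert status, here is a small Python sketch. It is not Ambari's alert code: the function names are hypothetical and the warning and critical thresholds are illustrative values chosen for the example.

```python
import socket
import time


def measure_port(host, port, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def port_alert_state(response_seconds, warning=1.5, critical=5.0):
    """Map a measured response time to OK / WARNING / CRITICAL."""
    if response_seconds is None:
        return "CRITICAL"   # could not connect at all
    if response_seconds >= critical:
        return "CRITICAL"
    if response_seconds >= warning:
        return "WARNING"
    return "OK"
```

Keeping the measurement and the classification separate makes the thresholds easy to customize per alert without touching the probing logic.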
Alert Check Counts
• Customize the number of times an alert is checked before dispatching a notification
• Avoid dispatching an alert notification (email, SNMP) in case of transient issues
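The check-count behavior can be sketched as a counter of consecutive non-OK results. This Python class is a simplified model written for illustration, not Ambari's implementation: the class name is hypothetical, and recovery notifications are not modeled.

```python
class CheckCountAlert:
    """Dispatch a notification only after `check_count` consecutive non-OK checks."""

    def __init__(self, check_count):
        self.check_count = check_count
        self.consecutive_failures = 0
        self.notified = False

    def record(self, state):
        """Record one check result; return True if a notification should be sent now."""
        if state == "OK":
            # A single OK result resets the count, so transient blips never notify.
            self.consecutive_failures = 0
            self.notified = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.check_count and not self.notified:
            self.notified = True
            return True
        return False
```

With a check count of 3, a single failed check followed by a successful one produces no email or SNMP trap, while a failure that persists for three checks produces exactly one.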
Alerts - Configuring the Check Count
• Set globally for all alerts, or override for a specific alert
[Screenshots: Global Setting, Alert Override]
Storm Monitoring View
Grafana for Ambari Metrics
FEATURES
• Grafana as a “Native UI” for Ambari Metrics
• Pre-built Dashboards: Host-level, Service-level
• Supports HTTPS
DASHBOARDS
• System Home, Servers
• HDFS Home, NameNodes, DataNodes
• YARN Home, Applications, Job History Server
• HBase Home, Performance
Grafana includes pre-built
dashboards for visualizing the most
important cluster metrics.
The HDFS NameNode
dashboard highlights
file system activity.
Log Search
Search and index HDP logs!
Capabilities
• Rapid Search of all HDP component logs
• Search across time ranges, log levels, and for keywords
[Diagram: Solr, Logsearch, Ambari]
Log Search
[Architecture diagram]
• Worker Node: Log Feeder (Java process, multi-output support, Grok filters)
• Solr (Solr Cloud, local disk storage)
• Log Search UI
• Ambari
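The parsing step performed by a log feeder can be illustrated with a plain regular expression. The sketch below uses Python's `re` module rather than Grok, and the log line layout (timestamp, level, message) is an assumed format chosen for the example. It also shows how continuation lines such as stack traces can be merged into the preceding event.

```python
import re

# Assumed line layout: "2016-10-26 10:00:00,123 LEVEL message"
LINE_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) +(?P<message>.*)$"
)


def parse_log(lines):
    """Turn raw log lines into structured events, merging multi-line entries."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            events.append(match.groupdict())
        elif events:
            # Not a new entry (e.g. a stack trace line): append to the previous event.
            events[-1]["message"] += "\n" + line
    return events
```

Each resulting dictionary has `time`, `level`, and `message` fields, which is the kind of structure that makes searching by time range, log level, and keyword possible once the events are indexed.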
Future of Ambari
• Cloud features
• Service multi-instance (two ZK quorums)
• Service multi-versions (Spark 1.6 & Spark 2.0)
• YARN assemblies
• Patch Upgrades: upgrade individual components in the same stack version, e.g., just DN and RM in HDP 2.5.*.* with zero downtime
• Ambari High Availability


Editor's Notes

  • #5 Single pane of glass. Provision on the cloud Metrics Services Config Security Alerts Host management Views framework
  • #6 0.9 in Sep 2012 1.5 in April 2014 1.6 in July 2014 2.3.0 was not used 2.4.0 is slated with a ton of new features. 2179 Jiras. Cadence is 2-3 major releases per year, with follow up maintenance releases in the months after. http://jsfiddle.net/mp8rqq5x/2/
  • #7 Log Search: Solr, Logfeeder (similar to Logstash), and Grafana UI. Zeppelin for data exploration and visualization that can plug in to multiple data backends. Role Based Access Control.
  • #8 Alerts, Stability EU/RU experience LogSearch Security automation Views Framework ease of use
  • #9 Deploy: Blueprints with Host Discovery. Secure: Kerberos, LDAP sync. Smart Configs: Stack Advisor; it is painful to configure a thousand related knobs. E.g., changing the ZooKeeper quorum affects several services, and changing the log folder affects Log Search. Upgrade: Rolling and Express Upgrade, get patches. Monitor: Ambari Alerts, Ambari Metrics. Analyze, Scale, Extend: Views, Management Packs.
  • #11 Cloudbreak can install on Amazon EC2, MSFT Azure, Cluster install takes 5-10 mins, mostly downloading packages, installing bits, and starting services.
  • #12 Used by HDInsight (Microsoft Azure) and Hortonworks QA Allow cluster creation or scaling to be started via the REST API prior to all/any hosts being available. As hosts register with Ambari server they will be matched to request host groups and provisioned according to the requested topology Allow host predicates to be specified along with host count to provide more flexibility in matching hosts to host groups. This will allow for host flavors where different host groups are matched to different host flavors Break up the current monolithic provisioning request into a request for each host operation. For example, install on host A, start on host A, install on hostB, etc. This will allow hosts to make progress even when another host encounters a failure. Allow a host count to be specified in the cluster creation template instead of host names. This is documented in https://issues.apache.org/jira/browse/AMBARI-6275 Install a cluster with two API calls
  • #13 The blueprint contains the configs, assignment of topology to host group, stack version The creation actually assigns hosts to each host group.
  • #14 The blueprint contains the configs, assignment of topology to host group, stack version The creation actually assigns hosts to each host group.
  • #17 Dynamic availability. Allow host_count to be specified instead of host_names. As hosts register, they will be matched to the request host groups and provisioned according to the requested topology. When specifying a host_count, a predicate can also be specified for finer-grained control.
  • #18 Dynamic availability. Allow host_count to be specified instead of host_names. As hosts register, they will be matched to the request host groups and provisioned according to the requested topology. When specifying a host_count, a predicate can also be specified for finer-grained control. 3 Terabytes since units is in MB
  • #20  Can use existing KDC (key distribution center) or install one for Hadoop Hadoop uses a rule-based system to create mappings between service principals and their related UNIX username
  • #24 As Ambari grows and organizations grow, so do security needs Users have fine-grained roles over the cluster and individual views. Granular authorization checks to distribute the responsibilities and privileges of authenticated users
  • #27 Configuration files Package with python scripts and templates Alert definitions Kerberos configurations, principals, identities, keytabs Meta data Metric details UI controls and widgets
  • #28 Stack Advisor, can now ship the recommendations for a service with the service itself, instead of a monolithic stack advisor for the entire stack. Makes it easier to integrate customer services
  • #29 Express Upgrade: fastest method to upgrade the stack, since it upgrades an entire component in batches of 100 hosts at a time. Rolling Upgrade: one component at a time per host, which can take up to 1 min. For a 100 node cluster with
  • #38 This Grafana instance is specifically for AMS, not meant to be general-purpose If customer is already using Grafana, this is not a replacement. Grafana will support read-only access for anonymous users, and HTTPS Aggregates across entire cluster, filter by host, top/bottom x, functions like avg/sum/min/max, filter by date range
  • #41 This is not HDP Search, it is not something that the customer has to separately license, it is an embedded Solr instance
  • #42 Agent/Collection process running on each host Written in Java Tails all service log files Parses logs using Grok/regex. Can merge multiple line logs, e.g. stack trace On restart, can resume from last read line. Uses checkpoint files to maintain state Extendable design to send logs to multiple destination type. Currently can send logs to Solr and Kafka
  • #43 Major themes. Goal is to make it as good as Australian pies