1 © Hortonworks Inc. 2011–2018. All rights reserved
Hadoop Operations—
Past, Present, and Future
Santhosh B Gowda
Oct 2018
Disclaimer
This document may contain product features and technology directions that are under development, may
be under development in the future or may ultimately not be developed.
Project capabilities are based on information that is publicly available within the Apache Software
Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception
to release through Apache; however, technical feasibility, market demand, user feedback and the
overarching Apache Software Foundation community development process can all affect timing and final
delivery.
This document’s description of these features and technology directions does not represent a contractual
commitment, promise or obligation from Hortonworks to deliver these features in any generally available
product.
Product features and technology directions are subject to change, and must not be included in contracts,
purchase orders, or sales agreements of any kind.
Since this document contains an outline of general product development plans, customers should not rely
upon it when making purchasing decisions.
Agenda
• Hadoop Operations: Ambari
• Hadoop Operations: Data Challenge
• Cloud Key Considerations
• Cloudbreak
  • What is Cloudbreak?
  • Custom Images
  • Kerberos Security
  • Recipes
  • Auto Scaling
What Is Apache Ambari?
A completely open source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. Apache Ambari takes the guesswork out of operating Hadoop.
Hadoop Operations—Ambari
Simplified Installation, Configuration and Management
• Wizard-driven and automated cluster provisioning
• Smart Configurations and Cluster Recommendations
• Automated Rolling and Express cluster upgrades
Centralized Security Setup
• Reduce the complexity of administering security across the platform
• Automate Kerberos setup
• Simplify the configuration of Apache Ranger
Full Visibility into Cluster Health
• Predefined alerts based on operational best practices
• Advanced metrics visualization with Grafana
• Integrated with SmartSense for proactive issue prevention
Highly Extensible and Customizable
• Seamlessly fit into your enterprise environment
• Bring custom services under management via Ambari Stacks
• Customize the UI with Ambari Views
Early Adopters
Figure: Two early-adopter deployment patterns, each an Ambari-managed cluster running Hive and Spark on YARN and HDFS, with shared services (Atlas, Ranger, Metastore, Knox): (1) large shared workloads supported by shared services, on-premise; (2) a long-running cluster on cloud IaaS, using public cloud storage and public cloud compute.
Hadoop Operations: Data Challenge
• Data is becoming more and more distributed…
• Across data center and cloud environments…
• Accessed using multi- and single-workload clusters…
• But must be discoverable and accessible to all who seek it.
Figure: The Virtual Data Lake, many clusters spread across the data center and multiple clouds.
• Business User: very difficult to find data (leading to inefficient use of time)
• Platform Operator: hard to secure and hard to operate (can be time-consuming and prone to error)
Cloud: Key Considerations
• Cloud is infrastructure… you need a data strategy
• Hybrid (on-premise & cloud) requirements are real.
• Multi-cloud (i.e., portability) is a key emerging requirement, driven by:
  • Logistics & Physics
  • Regulatory & Compliance
  • Economic arbitrage
• Consistent and familiar Security & Governance across on-premise & cloud environments
• Free movement of data, regardless of origin or destination
• Global data catalog, regardless of location
Data Management Across On-Prem & Multi-Cloud
Figure: An on-premise cluster running large shared workloads (Hive and Spark on YARN over HDFS, managed by Ambari) supported by shared services (Atlas, Ranger, Metastore, Knox), alongside multiple ephemeral workloads on Public Cloud A and Public Cloud B. In each public cloud, Cloudbreak provisions several Ambari-managed clusters (Hive LLAP, Spark, NiFi, YARN) over public cloud storage and compute, supported by shared services (Atlas, Ranger, Metastore, Knox). Hortonworks DataPlane Service spans these environments.
Hortonworks: Architecting and Optimizing for the Cloud
• Cloud Storage (durable): When data resides in cloud object stores (e.g., Amazon S3), Hadoop optimizes reads and writes and acts as an intermediate cache to increase performance and decrease latency.
• Workloads (ephemeral and long-running): Secure access to workload clusters via a protected gateway enabled for authentication (AuthN) and HTTPS.
• Shared Data Lake Services (Metastore for schema, Atlas for catalog, Ranger for policy): Define your data schema, security policies, and metadata catalog once for your ephemeral and always-on workloads.
Cloudbreak
What Is Cloudbreak?
Cloudbreak is a tool for provisioning Hadoop clusters on cloud infrastructure across multiple providers.
Simplified cluster provisioning: prescriptive setup, simple automation.
Cloudbreak: Harness the Agility of Cloud with Ease
• Declarative workload provisioning across multiple cloud providers
• Flexible topologies and security configuration options
• DevOps-friendly, easy setup and simple to automate
• Built-in elasticity and auto-scaling
• Prescriptive integration with cloud services
Cloudbreak Building Blocks
• Cloud Credentials
• Ambari Blueprints
• Auto Scaling
• Custom Recipes
• Custom Images
• Network
• Gateway
• Kerberos Security
• Dynamic Blueprints
• Cloud Storage
Simple and Flexible. Prescriptive. Secure.
Custom Images
Background: Cloudbreak
1. Cloudbreak creates VM instances using a default base image.
2. Cloudbreak installs Ambari on one of the VM instances.
3. Cloudbreak instructs Ambari to install a cluster on the remaining VM instances.
Figure: Cloudbreak provisioning six node VMs that together form the cluster.
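Step 3 relies on Ambari Blueprints: a blueprint declares host groups and the components each one runs, and a separate cluster creation template maps those host groups onto concrete hosts. The Python sketch below only builds the two JSON payloads. The blueprint name, host group names, component lists, stack version, and FQDNs are hypothetical, and a real HDP blueprint needs many more components. In a manual setup, these payloads would be POSTed to Ambari's REST API at /api/v1/blueprints/<name> and then /api/v1/clusters/<name>; Cloudbreak automates this step.

```python
import json


def build_blueprint(stack_version="3.0"):
    """A deliberately minimal, hypothetical blueprint with two host groups."""
    return {
        "Blueprints": {"stack_name": "HDP", "stack_version": stack_version},
        "configurations": [],
        "host_groups": [
            {
                "name": "master",
                "cardinality": "1",
                "components": [
                    {"name": "NAMENODE"},
                    {"name": "RESOURCEMANAGER"},
                    {"name": "ZOOKEEPER_SERVER"},
                ],
            },
            {
                "name": "worker",
                "cardinality": "1+",
                "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
            },
        ],
    }


def build_cluster_template(blueprint_name, master_fqdn, worker_fqdns):
    """Map each blueprint host group onto concrete hosts (one VM per FQDN)."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": "master", "hosts": [{"fqdn": master_fqdn}]},
            {"name": "worker", "hosts": [{"fqdn": fqdn} for fqdn in worker_fqdns]},
        ],
    }


def unmatched_host_groups(blueprint, template):
    """Host groups named in the template that the blueprint does not define."""
    defined = {group["name"] for group in blueprint["host_groups"]}
    return {group["name"] for group in template["host_groups"]} - defined


if __name__ == "__main__":
    # Hypothetical hosts: one master plus five workers.
    workers = [f"worker{i}.example.internal" for i in range(1, 6)]
    template = build_cluster_template("demo-blueprint", "master1.example.internal", workers)
    assert not unmatched_host_groups(build_blueprint(), template)
    print(json.dumps(template, indent=2))
```

Keeping the blueprint separate from the host mapping is what lets the same blueprint be reused for clusters of different sizes.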
Custom Images Overview
1. Create the custom image.
2. Register the custom image.
3. Use the custom image when creating a cluster.
Recipes
Background: Recipes
• Cloudbreak lets you provision clusters using Ambari blueprints; however, blueprints cannot address every use case, for example:
  • Installing additional software
  • Making system configuration changes
• A recipe is a script that runs on all nodes of a selected node group at a specific time.
• Bash and Python scripts are supported.
• Available hooks:
  • Pre-ambari-start
  • Post-ambari-start
  • Post-cluster-install
  • Pre-termination
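The hook list implies an execution order over a cluster's lifecycle. As a hypothetical model (not Cloudbreak's code), the Python sketch below selects the recipes attached to one node group and orders them by hook. The Recipe type, recipe names, and node group names are illustrative, and keeping recipes that share a hook in their input order is an assumption.

```python
from dataclasses import dataclass

# Hooks from the slide, in the order they occur during a cluster's lifecycle.
HOOK_ORDER = ["pre-ambari-start", "post-ambari-start", "post-cluster-install", "pre-termination"]


@dataclass(frozen=True)
class Recipe:
    name: str        # hypothetical recipe name
    hook: str        # one of HOOK_ORDER
    node_group: str  # node group the recipe is attached to


def recipes_for_node(recipes, node_group):
    """Return the recipes that would run on a node of `node_group`, in hook order."""
    selected = [r for r in recipes if r.node_group == node_group]
    for r in selected:
        if r.hook not in HOOK_ORDER:
            raise ValueError(f"unknown hook {r.hook!r} in recipe {r.name!r}")
    # sorted() is stable, so recipes sharing a hook keep their input order.
    return sorted(selected, key=lambda r: HOOK_ORDER.index(r.hook))
```

For example, a worker-group recipe that installs a package would run at pre-ambari-start, well before a pre-termination cleanup recipe on the same group.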
Cloudbreak: Add Recipes
• Cluster Extensions > Recipes > Create
• Add a recipe as a file, a URL, or text
Cloudbreak: Add Recipes
• Clusters > Create Cluster > Cluster Extensions
Kerberos Security
Background: Kerberos
• Strongly authenticating and establishing a user’s identity is the basis for secure access in
Hadoop. Users need to be able to reliably “identify” themselves and then have that
identity propagated throughout the Hadoop cluster.
• Once this is done, those users can access resources (such as files or directories) or
interact with the cluster (like running MapReduce jobs).
• Besides users, Hadoop cluster resources themselves (such as hosts and services) need
to authenticate with each other to prevent malicious systems or daemons from
“posing as” trusted components of the cluster to gain access to data.
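For service-to-service authentication, each daemon on each host gets its own Kerberos principal, conventionally written service/host@REALM. Hadoop configuration files usually specify such principals with the placeholder _HOST (for example nn/_HOST@EXAMPLE.COM), which each host replaces with its own lower-cased FQDN at runtime. The Python sketch below illustrates that substitution only; Hadoop's real implementation is Java code (SecurityUtil.getServerPrincipal), and the realm and host names here are hypothetical.

```python
import re


def resolve_principal(principal, fqdn):
    """Replace a `_HOST` placeholder in a service/host@REALM principal.

    Principals that are not of the service/host@REALM form (e.g. plain
    user principals such as alice@EXAMPLE.COM) are returned unchanged.
    """
    match = re.fullmatch(r"([^/@]+)/([^/@]+)@([^/@]+)", principal)
    if match is None:
        return principal
    service, host, realm = match.groups()
    if host == "_HOST":
        host = fqdn.lower()  # each host substitutes its own FQDN
    return f"{service}/{host}@{realm}"
```

Because each host resolves the same configured value to a different principal, a single configuration can be shared across the cluster while every daemon still has a distinct identity.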
Cloudbreak: Enable Kerberos Security
• Create Cluster > Security > Advanced
• Select the Enable Kerberos Security checkbox
Options: Use Existing KDC or Use Test KDC
Use Test KDC
• Not for production use. For testing and evaluation purposes only.
• Installs and configures an MIT KDC on the master node.
• Configures the cluster to leverage that KDC.
Use Existing KDC: Basic
• Provide basic information about your existing KDC.
• Ambari Kerberos descriptors are generated automatically.
Use Existing KDC: Advanced
• Provide basic information about your existing KDC.
• Provide your own Ambari Kerberos descriptors.
Auto Scaling
Auto-Scaling
• Alerts: Create metric-based or time-based alerts for cluster scaling
• Policies: Scaling policies adjust cluster size based on activity and workload alerts
• General Configurations: Boundaries (minimum and maximum cluster size) and a cooldown period
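One way to read the general configuration is as a final gate on every scaling decision. Under assumed semantics, the Python sketch below clamps a requested size into [min_size, max_size] and ignores any request that arrives within the cooldown period after the previous scaling action. ScalingConfig and its field names are hypothetical, and Cloudbreak may reject out-of-bounds requests rather than clamp them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScalingConfig:
    min_size: int        # hypothetical lower boundary for the cluster size
    max_size: int        # hypothetical upper boundary
    cooldown: timedelta  # minimum time between two scaling actions

    def __post_init__(self):
        if self.min_size > self.max_size:
            raise ValueError("min_size must not exceed max_size")


def apply_scaling(config, current_size, requested_size, now, last_scaled_at):
    """Return the size to scale to, or current_size if the request is ignored."""
    if last_scaled_at is not None and now - last_scaled_at < config.cooldown:
        return current_size  # still cooling down from the previous action
    # Assumption: out-of-range requests are clamped into the boundaries.
    return max(config.min_size, min(config.max_size, requested_size))
```

The cooldown prevents a burst of alerts from resizing the cluster repeatedly before newly added nodes have had time to affect the metrics.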
Auto-Scaling Time-Based Alert
Fire at 10:15 am every day
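Time-based alerts in Cloudbreak are typically configured with a cron expression and a time zone. Skipping cron parsing, the Python sketch below computes when a daily 10:15 alert next fires. next_daily_fire is a hypothetical helper, and the returned datetime carries the same tzinfo as `now` (naive inputs are treated as local time).

```python
from datetime import datetime, time, timedelta


def next_daily_fire(now, at=time(10, 15)):
    """Return the first datetime >= now at which a daily alert at `at` fires."""
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    if candidate < now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate
```

A real scheduler would also have to handle daylight-saving transitions in the configured time zone, which this sketch ignores.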
Auto-Scaling Metric-Based Alert
Fire after NodeManagers are CRITICAL for 10 minutes
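A metric-based alert of this kind fires only when a state has persisted for a whole period. The Python sketch below assumes the alert state is sampled as time-ordered (timestamp, state) pairs and that each state holds until the next sample. should_fire is hypothetical and is not how Cloudbreak or Ambari actually evaluate alerts.

```python
from datetime import datetime, timedelta


def should_fire(samples, now, period=timedelta(minutes=10), state="CRITICAL"):
    """Fire if the latest uninterrupted run of `state` began at least `period` ago.

    `samples` is a list of (timestamp, state) pairs sorted by timestamp.
    """
    run_start = None
    for ts, s in samples:
        if s == state:
            if run_start is None:
                run_start = ts  # start of a new run of the target state
        else:
            run_start = None  # any other state breaks the run
    return run_start is not None and now - run_start >= period
```

Requiring a sustained state, rather than a single CRITICAL sample, keeps short blips from triggering a scaling action.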
Auto-Scaling Policies
• Define the Scale Adjustment (Node Count, Percentage, Exact)
• Select the Host Group (to Scale)
• Select the Alert (which, when fired, executes the Policy)
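The three adjustment types can be modeled as a function from the current host group size to a target size. In the Python sketch below, the type names mirror the slide. Two rules are assumptions: percentage changes round away from zero, and the result never drops below zero. A real policy would still be subject to the cluster size boundaries and the cooldown period.

```python
import math


def target_size(current, adjustment_type, value):
    """Compute a new host group size for a scaling policy (hypothetical model)."""
    if adjustment_type == "NODE_COUNT":
        new = current + value  # add (or, if negative, remove) a fixed number of nodes
    elif adjustment_type == "PERCENTAGE":
        delta = current * value / 100
        # Assumption: round away from zero so small percentages still change the size.
        new = current + (math.ceil(delta) if delta >= 0 else math.floor(delta))
    elif adjustment_type == "EXACT":
        new = value  # scale to exactly `value` nodes
    else:
        raise ValueError(f"unknown adjustment type: {adjustment_type}")
    return max(0, new)
```

For example, a +25% policy on a 10-node host group yields 13 nodes under these rounding rules.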
Learn More
• Try Ambari
• https://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.7.0.0/index.html
• Try Cloudbreak 2.8 (Technical Preview)
• https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.8.0/index.html
Questions?
Thank you!

Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Hadoop Operations - Past, Present, and Future

  • 1. Hadoop Operations: Past, Present, and Future
    Santhosh B Gowda, Oct 2018
    © Hortonworks Inc. 2011–2018. All rights reserved
  • 2. Disclaimer
    This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed.
    Project capabilities are based on information that is publicly available within the Apache Software Foundation project websites ("Apache"). Progress of the project capabilities can be tracked from inception to release through Apache; however, technical feasibility, market demand, user feedback and the overarching Apache Software Foundation community development process can all affect timing and final delivery.
    This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product.
    Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
    Since this document contains an outline of general product development plans, customers should not rely upon it when making purchasing decisions.
  • 3. Agenda
    • Hadoop Operations: Ambari
    • Hadoop Operations: Data Challenge
    • Cloud: Key Considerations
    • Cloudbreak
      • What Is Cloudbreak?
      • Custom Images
      • Kerberos Security
      • Recipes
      • Auto Scaling
  • 4. What Is Apache Ambari?
    A completely open source management platform for provisioning, managing, monitoring and securing Apache Hadoop clusters. Apache Ambari takes the guesswork out of operating Hadoop.
  • 5. Hadoop Operations: Ambari
    Simplified Installation, Configuration and Management
    • Wizard-driven and automated cluster provisioning
    • Smart configurations and cluster recommendations
    • Automated Rolling and Express cluster upgrades
    Centralized Security Setup
    • Reduce the complexity of administering security across the platform
    • Automate Kerberos setup
    • Simplify the configuration of Apache Ranger
    Full Visibility into Cluster Health
    • Predefined alerts based on operational best practices
    • Advanced metrics visualization with Grafana
    • Integrated with SmartSense for proactive issue prevention
    Highly Extensible and Customizable
    • Seamlessly fit into your enterprise environment
    • Bring custom services under management via Ambari Stacks
    • Customize the UI with Ambari Views
  • 6. Early Adopters
    [Diagram: "Large Shared Workloads, supported by Shared Services, On-Premise" and "Long-Running Cluster on Cloud IaaS" (on Public Cloud Storage and Public Cloud Compute), each running Ambari, HDFS, Atlas, Ranger, Metastore, Knox, Hive, Spark and YARN.]
  • 7. Hadoop Operations: Data Challenge
    • Data is becoming more and more distributed…
    • Across data center and cloud environments…
    • Accessed using multi- and single-workload clusters…
    • But must be discoverable and accessible by all who seek it.
    [Diagram: The Virtual Data Lake, with many clusters spread across the data center and clouds.]
    Business User: very difficult to find data (leading to inefficient use of time)
    Platform Operator: hard to secure and hard to operate (can be time-consuming and prone to error)
  • 8. Cloud: Key Considerations
    • Cloud is infrastructure… you need a data strategy
    • Hybrid (on-premise and cloud) requirements are real
    • Multi-cloud (i.e. portability) is a key emerging requirement:
      • Logistics and physics
      • Regulatory and compliance
      • Economic arbitrage
    • Consistent and familiar security and governance across on-premise and cloud environments
    • Free movement of data, regardless of origin or destination
    • Global data catalog, regardless of location
  • 9. Data Management Across On-Prem & Multi-Cloud
    [Diagram: "Large Shared Workloads, supported by Shared Services, On-Premise" (Ambari, HDFS, Atlas, Ranger, Metastore, Knox, Hive, Spark, YARN) next to two "Multiple Ephemeral Workloads, supported by Shared Services, Multi-Cloud" environments on Public Cloud A and Public Cloud B (cloud storage and compute; Atlas, Ranger, Metastore, Knox; Hive LLAP, NiFi and Spark clusters with Ambari and YARN; Cloudbreak), with Hortonworks DataPlane Service.]
  • 10. Hortonworks: Architecting and Optimizing for the Cloud
    • Cloud storage (durable): When data resides in cloud object stores (e.g. Amazon S3), Hadoop optimizes reads/writes and acts as an intermediate cache to increase performance and decrease latency.
    • Workloads (ephemeral and long running): Secure access to workload clusters via a protected gateway enabled for authentication (AuthN) and HTTPS.
    • Shared data lake services (Metastore for schema, Ranger for policy, Atlas for catalog): Define your data schema, security policies, and metadata catalog once for your ephemeral and always-on workloads.
  • 11. Cloudbreak
  • 12. What Is Cloudbreak?
    Cloudbreak is a tool for provisioning Hadoop clusters on any cloud infrastructure.
    Simplified Cluster Provisioning: prescriptive setup, simple automation
  • 13. Cloudbreak: Harness the Agility of Cloud with Ease
    • Declarative workload provisioning across multiple cloud providers
    • Flexible topologies and security configuration options
    • DevOps-friendly: easy to set up and simple to automate
    • Built-in elasticity and auto-scaling
    • Prescriptive integration with cloud services
  • 14. Cloudbreak Building Blocks
    • Cloud Credentials
    • Ambari Blueprints
    • Auto Scaling
    • Custom Recipes
    • Custom Images
    • Network
    • Gateway
    • Kerberos Security
    • Dynamic Blueprints
    • Cloud Storage
    Simple and Flexible. Prescriptive. Secure.
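An Ambari Blueprint is a JSON document that declares the stack and the host groups, each with a cardinality and a list of components. The Python sketch below assembles a minimal one. The blueprint name, the HDP 3.0 stack version, the host group names and the component placement are illustrative assumptions. The result has not been validated as deployable, since a real cluster needs more components and configuration than this.

```python
import json


def build_blueprint(name, stack_name, stack_version, host_groups):
    """Assemble a minimal Ambari Blueprint document.

    host_groups maps a host group name to (cardinality, [component names]).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": stack_name,
            "stack_version": stack_version,
        },
        "configurations": [],
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),  # blueprints store cardinality as a string
                "configurations": [],
                "components": [{"name": component} for component in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


# Hypothetical topology for illustration only: one master, three workers.
blueprint = build_blueprint(
    "demo-blueprint",
    "HDP",
    "3.0",
    {
        "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
        "worker": (3, ["DATANODE", "NODEMANAGER"]),
    },
)
print(json.dumps(blueprint, indent=2))
```

Building the document in code rather than by hand makes it easy to generate variants, such as different worker counts, from a single template.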
  • 15. Custom Images
  • 16. Background: Cloudbreak
    1. Cloudbreak creates VM instances using a default base image.
    2. Cloudbreak installs Ambari on a VM instance.
    3. Cloudbreak instructs Ambari to install a cluster on the remaining VM instances.
    [Diagram: Cloudbreak provisioning a cluster of node VMs.]
  • 17. Custom Images Overview
    1. Create the custom image.
    2. Register the custom image.
    3. Use the custom image when creating a cluster.
  • 18. Recipes
  • 19. Background: Recipes
    • Cloudbreak lets you provision clusters using Ambari Blueprints; however, blueprints cannot address every use case, for example:
      • Installing additional software
      • Making system configuration changes
    • A recipe is a script that runs on all nodes of a selected node group at a specific time.
    • Bash and Python scripts are supported.
    • Available hooks:
      • Pre-ambari-start
      • Post-ambari-start
      • Post-cluster-install
      • Pre-termination
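The Python model below makes the hook timing concrete. It orders recipes by the four hooks listed above and selects the recipes attached to a given node group. The `Recipe` type, the `recipes_for` helper and the example recipe names are all hypothetical, and the model does not reproduce Cloudbreak's internal scheduler. The only assumption about the hooks themselves is that they fire in the listed lifecycle order.

```python
from dataclasses import dataclass

# Hook names from the slide, in the order they occur during a cluster's lifecycle.
HOOK_ORDER = ["pre-ambari-start", "post-ambari-start", "post-cluster-install", "pre-termination"]


@dataclass(frozen=True)
class Recipe:
    name: str
    hook: str                # one of HOOK_ORDER
    node_groups: frozenset   # node groups the recipe is attached to
    language: str = "bash"   # "bash" or "python"

    def __post_init__(self):
        if self.hook not in HOOK_ORDER:
            raise ValueError(f"unknown hook: {self.hook}")
        if self.language not in ("bash", "python"):
            raise ValueError(f"unsupported script type: {self.language}")


def recipes_for(node_group, recipes):
    """Return (hook, name) pairs that run on every node of node_group, in hook order."""
    applicable = [r for r in recipes if node_group in r.node_groups]
    # sort() is stable, so recipes sharing a hook keep their input order.
    applicable.sort(key=lambda r: HOOK_ORDER.index(r.hook))
    return [(r.hook, r.name) for r in applicable]


# Hypothetical recipes for illustration.
recipes = [
    Recipe("cleanup-logs", "pre-termination", frozenset({"worker"})),
    Recipe("install-jdbc-driver", "pre-ambari-start", frozenset({"master", "worker"}), "python"),
    Recipe("tune-sysctl", "post-cluster-install", frozenset({"worker"})),
]
for hook, name in recipes_for("worker", recipes):
    print(hook, name)
```

Validating the hook name when a recipe is created catches typos early, before the recipe is attached to a cluster.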
  • 20. Cloudbreak: Add Recipes
    • Cluster Extensions > Recipes > Create
    • Add a recipe as a file, URL, or text
  • 21. Cloudbreak: Add Recipes
    • Clusters > Create Cluster > Cluster Extensions
  • 22. Kerberos Security
  • 23. Background: Kerberos
    • Strongly authenticating and establishing a user’s identity is the basis for secure access in Hadoop. Users need to be able to reliably “identify” themselves and then have that identity propagated throughout the Hadoop cluster.
    • Once this is done, those users can access resources (such as files or directories) or interact with the cluster (such as running MapReduce jobs).
    • Besides users, Hadoop cluster resources themselves (such as hosts and services) need to authenticate with each other, so that malicious systems or daemons cannot “pose as” trusted components of the cluster to gain access to data.
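In Hadoop, users and services are identified by Kerberos principals of the form primary/instance@REALM. Service principals usually put the service name in the primary and the host in the instance, for example nn/host@REALM. The Python sketch below parses the common one- and two-component forms and maps a principal to a local short name. It models only the simplest behaviour of Hadoop's DEFAULT auth_to_local rule: take the first component when the realm matches the default realm. It does not model custom `hadoop.security.auth_to_local` rules. The host names and the EXAMPLE.COM realm are made up.

```python
import re
from typing import NamedTuple, Optional

# primary[/instance]@REALM; components may not contain '/' or '@'.
_PRINCIPAL_RE = re.compile(r"^([^/@]+)(?:/([^/@]+))?@([^/@]+)$")


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(text: str) -> Principal:
    """Split a one- or two-component principal into primary, instance and realm."""
    match = _PRINCIPAL_RE.match(text)
    if match is None:
        raise ValueError(f"not a Kerberos principal: {text!r}")
    primary, instance, realm = match.groups()  # instance is None when absent
    return Principal(primary, instance, realm)


def default_short_name(text: str, default_realm: str) -> str:
    """Simplified DEFAULT rule: use the first component if the realm matches."""
    principal = parse_principal(text)
    if principal.realm != default_realm:
        raise ValueError(f"no rule maps realm {principal.realm!r}")
    return principal.primary


print(parse_principal("nn/master1.example.com@EXAMPLE.COM"))
print(default_short_name("alice@EXAMPLE.COM", "EXAMPLE.COM"))
```

In this simplified model, a user principal (alice@EXAMPLE.COM) and a service principal (nn/master1.example.com@EXAMPLE.COM) both reduce to a short name. Hadoop uses that short name for authorization decisions such as file ownership.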
  • 24. Cloudbreak: Enable Kerberos Security
    • Create Cluster > Security > Advanced
    • [ ] Enable Kerberos Security
  • 25. Options: Use Existing KDC or Use Test KDC
    Use Test KDC
    • Not for production use; for testing and evaluation purposes only.
    • Installs and configures an MIT KDC on the master node.
    • Configures the cluster to leverage that KDC.
    Use Existing KDC (Basic)
    • Provide basic information about your existing KDC.
    • Ambari Kerberos descriptors are generated automatically.
    Use Existing KDC (Advanced)
    • Provide basic information about your existing KDC.
    • Provide your own Ambari Kerberos descriptors.
  • 26. Auto Scaling
  • 27. Auto-Scaling
    • Alerts: create metric-based or time-based alerts for cluster scaling
    • Policies: scaling policies adjust cluster size based on activity and workload alerts
    • General Configurations: boundaries (minimum and maximum cluster size) and cooldown period
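The general configuration constrains every scaling decision. The Python sketch below shows how size boundaries and a cooldown period might gate a requested resize. The function name and parameters are hypothetical. The rule of ignoring requests during cooldown, rather than queueing them, is an assumption and not Cloudbreak's documented algorithm.

```python
from datetime import datetime, timedelta
from typing import Optional


def apply_general_config(current_size: int, requested_size: int,
                         min_size: int, max_size: int,
                         cooldown: timedelta, last_scaled: Optional[datetime],
                         now: datetime) -> int:
    """Return the cluster size to use after applying boundaries and cooldown.

    Within `cooldown` of the previous scaling event the request is ignored;
    otherwise the requested size is clamped to [min_size, max_size].
    """
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    if last_scaled is not None and now - last_scaled < cooldown:
        return current_size  # still cooling down: keep the current size
    return max(min_size, min(max_size, requested_size))


now = datetime(2018, 10, 1, 12, 0)
cooldown = timedelta(minutes=30)
print(apply_general_config(5, 20, 3, 10, cooldown, now - timedelta(minutes=10), now))  # 5
print(apply_general_config(5, 20, 3, 10, cooldown, now - timedelta(minutes=45), now))  # 10
print(apply_general_config(5, 1, 3, 10, cooldown, None, now))  # 3
```

The cooldown prevents oscillation. A freshly added node needs time to pick up work before the alert that triggered the resize can reflect it, so evaluating again too soon would risk another unnecessary scale-out.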
  • 28. Auto-Scaling: Time-Based Alert
    Example: fire at 10:15 am every day.
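A time-based alert fires on a wall-clock schedule. The Python sketch below computes when a daily alert such as "10:15 am every day" fires next. It assumes a single fixed time per day and either naive datetimes or a fixed UTC offset, so daylight-saving transitions are not handled. It also does not model the richer schedule expressions that a real alert definition may accept. The `next_daily_fire` name is hypothetical.

```python
from datetime import datetime, time, timedelta


def next_daily_fire(now: datetime, at: time) -> datetime:
    """Return the first datetime strictly after `now` whose wall-clock time is `at`."""
    candidate = datetime.combine(now.date(), at, tzinfo=now.tzinfo)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed: use tomorrow
    return candidate


fire_at = time(10, 15)
print(next_daily_fire(datetime(2018, 10, 1, 9, 0), fire_at))   # 2018-10-01 10:15:00
print(next_daily_fire(datetime(2018, 10, 1, 10, 15), fire_at))  # 2018-10-02 10:15:00
```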
  • 29. Auto-Scaling: Metric-Based Alert
    Example: fire after NodeManagers have been CRITICAL for 10 minutes.
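A metric-based alert like the example above fires only when a condition has held for a period of time. The Python sketch below checks that a state has been sustained, using timestamped observations. The observation format and the `sustained` helper are hypothetical. It is also an assumption that any sample in another state resets the timer. The period is measured from the first to the last sample of the current run.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def sustained(observations: Iterable[Tuple[datetime, str]], state: str,
              period: timedelta) -> bool:
    """True if the latest unbroken run of `state` samples spans at least `period`.

    `observations` must be sorted by timestamp; any other state resets the run,
    so the result is False unless the most recent sample is in `state`.
    """
    run_start: Optional[datetime] = None
    last_seen: Optional[datetime] = None
    for timestamp, observed in observations:
        if observed == state:
            if run_start is None:
                run_start = timestamp
            last_seen = timestamp
        else:
            run_start = last_seen = None
    return run_start is not None and last_seen - run_start >= period


t0 = datetime(2018, 10, 1, 12, 0)
samples = [(t0 + timedelta(minutes=m), "CRITICAL") for m in range(11)]
print(sustained(samples, "CRITICAL", timedelta(minutes=10)))  # True
samples[5] = (samples[5][0], "OK")  # one healthy sample breaks the run
print(sustained(samples, "CRITICAL", timedelta(minutes=10)))  # False
```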
  • 30. Auto-Scaling Policies
    • Define the scale adjustment (node count, percentage, or exact)
    • Select the host group to scale
    • Select the alert that, when fired, executes the policy
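The three scale adjustment types can be read as three ways of turning a policy into a target host group size. The Python sketch below rests on several assumptions. Node count is taken to mean adding or removing that many nodes. Percentage is taken to mean a change of that percentage of the current size, rounded toward zero. Exact is taken to mean setting the size to the given value. The rounding rule, the `Adjustment` enum and the `target_size` function are hypothetical, not Cloudbreak's documented behaviour.

```python
import math
from enum import Enum


class Adjustment(Enum):
    NODE_COUNT = "node_count"  # add (positive) or remove (negative) a number of nodes
    PERCENTAGE = "percentage"  # change by a percentage of the current size
    EXACT = "exact"            # set the host group to an exact size


def target_size(current: int, kind: Adjustment, value: int) -> int:
    """Return the host group size a policy would request (never below zero)."""
    if kind is Adjustment.NODE_COUNT:
        target = current + value
    elif kind is Adjustment.PERCENTAGE:
        # math.trunc rounds toward zero, so small percentages may cause no change.
        target = current + math.trunc(current * value / 100)
    elif kind is Adjustment.EXACT:
        target = value
    else:
        raise ValueError(f"unknown adjustment: {kind}")
    return max(0, target)


print(target_size(8, Adjustment.NODE_COUNT, 2))    # 10
print(target_size(8, Adjustment.PERCENTAGE, 50))   # 12
print(target_size(8, Adjustment.PERCENTAGE, -25))  # 6
print(target_size(8, Adjustment.EXACT, 4))         # 4
```

In practice the requested size would still be subject to the cluster's general configuration (size boundaries and cooldown) before any nodes are added or removed.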
  • 31. Learn More
    • Try Ambari
      https://docs.hortonworks.com/HDPDocuments/Ambari/Ambari-2.7.0.0/index.html
    • Try Cloudbreak 2.8 (Technical Preview)
      https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.8.0/index.html
  • 32. Questions?
  • 33. Thank you!