1 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Lessons Learned Running a Container
Cloud on Apache Hadoop YARN
Billie Rinaldi
Software Engineering YARN R&D - Hortonworks
Hadoop, YARN, HDFS, Ambari, Ranger, Atlas, and Apache
are trademarks of the Apache Software Foundation
Overview:
• Introduction
• Building Blocks for the Container Cloud
• Lessons Learned
• Q&A
Disclaimer
• Some features discussed today are still experimental and/or under current
development.
• Security is a work in progress and a security assessment should be performed
before implementing these features.
• Many features were released in Apache Hadoop 3.1.0 in April 2018, but some will
not come out until 3.1.1 or 3.2.0.
Introduction
We Build, Test, and Release Open Source Software
• The rapid pace of open source results in
• dozens of releases a year
• tens of thousands of tests per release.
• Many permutations of tests
• Over a dozen supported Linux operating systems.
• Multiple backend databases.
• Nearly thirty open source products in our stack.
• Multiple supported versions of HDP per release.
Addressing the Challenges
• How is the industry addressing these challenges? What are customers asking for?
• How can we reduce overhead to achieve greater density and improve hardware
utilization?
• How can we improve the speed at which tests run?
• How can we reuse packaging and automation?
Solution: Container Cloud
• Containers eliminate the bulk of the virtualization overhead, improving density per
node.
• Containers help reduce image variance through composition and simplified
packaging.
• Container startup is fast; there is no real boot sequence.
• Containers naturally fit into YARN and its container model.
• Allow us to “use what we ship and ship what we use.”
Container Cloud Architecture
Testing HDP and HDF releases in container clusters
[Diagram]
• Shared Services: Resource Management (YARN), Management and Monitoring (Ambari),
Storage (HDFS), Service Discovery and REST API (YARN Services), Security and
Governance (Ranger and Atlas).
• Test flow: Jenkins → Submit Test → Worker (Docker) → Launch Test → HDP (Docker).
• Multiple Worker (Docker) and HDP (Docker) containers run side by side.
Two years later …
5.8+ million containers and many lessons learned.
Building Blocks for the Container Cloud
• YARN Docker Support – Enables additional container types to make it easier
to onboard new applications and services on YARN.
• YARN Services Framework – Provides AM implementation and NM
improvements that enable long running services on YARN.
• YARN Service Discovery – Allows services running on YARN to discover one
another.
Adding Docker on YARN
• Why Docker?
• Provides a lightweight mechanism for
packaging, distributing, and isolating
processes.
• Currently the most popular containerization
framework.
• Allows YARN developers to focus on
integration instead of container primitives.
• Mostly fits into the YARN Container model.
• Buzzword compliant.
YARN Containers
• What is a YARN container?
• Process.
• Local Resources (scripts, jars, security tokens).
• Resource Requirements (CPU, Memory, I/O).
• AM requests containers.
• RM allocates containers on NMs.
• NM runs containers.
• Container Executor encapsulates platform-
specific logic needed to start a YARN container.
New Abstraction: Container Runtimes
• Challenge: Container Executor approach is cluster wide; need more flexibility while still
being able to leverage existing Container Executor features.
• Solution: Runtimes added to LinuxContainerExecutor (YARN-3611), initially released in
Apache Hadoop 2.8.0; improvements are ongoing.
• DefaultLinuxContainerRuntime: existing Linux process-based execution.
• DockerLinuxContainerRuntime: uses Docker to run and monitor a container.
Distributed Shell and MR on Docker Examples
Environment variables are currently used to set the Container Runtime options.*
*WARNING: might change.
> yarn jar $YARN_EXAMPLES_JAR pi \
    -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" \
    -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" \
    1 40000
> yarn jar $DSHELL_JAR \
    -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
    -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7 \
    -shell_command "sleep 120" \
    -jar $DSHELL_JAR \
    -num_containers 1
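The MapReduce example passes the same comma-separated environment string to both the map and reduce tasks. A minimal Python sketch of how a wrapper script might assemble those arguments follows; the helper names `docker_runtime_env` and `mapreduce_docker_args` are hypothetical and not part of Hadoop, and only the two environment variables shown on the slide are used.

```python
def docker_runtime_env(image):
    """Build the comma-separated KEY=VALUE string used in the slide's example."""
    settings = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    return ",".join(f"{key}={value}" for key, value in settings.items())


def mapreduce_docker_args(image):
    """Return the -D options that apply the same runtime settings to map and reduce tasks."""
    env = docker_runtime_env(image)
    return [
        f"-Dmapreduce.map.env={env}",
        f"-Dmapreduce.reduce.env={env}",
    ]


if __name__ == "__main__":
    # Prints the two options from the MapReduce pi example above.
    for arg in mapreduce_docker_args("centos:7"):
        print(arg)
```

Keeping the string in one place avoids the map and reduce settings drifting apart. As the slide warns, the variable names themselves might change.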
Recent Improvements: Container Lifecycle
• Improvements to stopping and cleaning up of containers (YARN-5366)
• Improvements to handling short lived containers (YARN-5366, YARN-7914)
• Container relaunch improvements to reuse existing container (YARN-7973)
• Data in the container’s root filesystem and workdir can be recovered on
the same node
• Support for sending specific signals to the container’s root process (YARN-5366)
• Delayed deletion for debugging (YARN-5366)
Recent Improvements: Container Security
• ACLs for privileged containers, with the ability to disable privileged
containers system wide (YARN-6623)
• Sudo / group check for running privileged containers (YARN-7221)
• Default untrusted mode for running unmodified images out of the box
(YARN-7516)
• Username to UID/GID mapping to ensure privacy (YARN-4266)
• User supplied bind mounts validated against an admin supplied whitelist
(YARN-5534)
• More restrictive YARN mounts to limit host exposure (YARN-7815)
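The idea of checking user supplied bind mounts against an admin supplied whitelist can be illustrated with a short Python sketch. This is not how YARN implements the check (the slides do not show that); the function name `is_mount_allowed` and the example paths are assumptions chosen for illustration.

```python
from pathlib import PurePosixPath


def is_mount_allowed(source, allowed_roots):
    """Return True if `source` equals, or lies beneath, one of the allowed roots.

    Hypothetical helper: relative paths and any path containing '..' are
    rejected outright so a request cannot climb out of an allowed directory.
    """
    src = PurePosixPath(source)
    if not src.is_absolute() or ".." in src.parts:
        return False
    for root in allowed_roots:
        root_path = PurePosixPath(root)
        if src == root_path or root_path in src.parents:
            return True
    return False


if __name__ == "__main__":
    whitelist = ["/etc/passwd", "/sys/fs/cgroup"]  # example admin whitelist
    for requested in ["/sys/fs/cgroup/cpu", "/etc/shadow"]:
        print(requested, is_mount_allowed(requested, whitelist))
```

Comparing path components rather than raw string prefixes keeps a request such as /etc/passwd2 from matching a whitelisted /etc/passwd.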
YARN Services Goals
• Long Running – Simplify the deployment and management of long running
applications on YARN.
• Easily Bring New Applications – Remove tedious process of bringing new
applications to YARN.
• Easy to Manage Applications – REST API and Command Line tools.
• Declarative Configuration – Provide configuration to the applications,
declare resource needs, specify placement policies.
YARN Services Overview
• Apache Slider – incubating at Apache since 2014, designed to make it easier
to run long-running applications on YARN.
• Simplified and first-class support for services in YARN (YARN-4692) – initiated
in 2016.
• Container orchestrator to provision Docker-based or native process-based containers
(YARN-5079); integrates Slider core into YARN.
• REST API for managing services on YARN (YARN-4793).
• Simplified discovery of services via DNS mechanisms (YARN-4757).
• Released in Apache Hadoop 3.1.0!
YARN Services Docker Httpd Example
{
  "name": "simple-httpd-service",
  "version": "1.0.0",
  "lifetime": "3600",
  "components": [
    {
      "name": "httpd",
      "number_of_containers": 2,
      "launch_command": "/usr/bin/run-httpd",
      "artifact": {
        "id": "centos/httpd-24-centos7:latest",
        "type": "DOCKER"
      },
      "resource": {
        "cpus": 1,
        "memory": "1024"
      },
      ...
> yarn app -launch simple-httpd-service simple-httpd-service.json
YARN Services Docker Httpd Example continued
      "readiness_check": {
        "type": "HTTP",
        "properties": {
          "url": "http://${THIS_HOST}:8080"
        }
      },
      "configuration": {
        "files": [
          {
            "type": "TEMPLATE",
            "dest_file": "/var/www/html/index.html",
            "properties": {
              "content": "<html><body>Hello from ${COMPONENT_INSTANCE_NAME}!</body></html>"
            }
          }
        ]
      }
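Because the service definition is plain JSON, it can also be generated rather than written by hand. The following Python sketch rebuilds the httpd example from only the fields visible on the two slides; whatever the "..." on the first slide elides is left out. The function names are hypothetical, and placing readiness_check and configuration inside the component is an assumption based on where the fragment continues.

```python
import json


def httpd_component(containers=2):
    """Return the httpd component using only the fields shown on the slides."""
    return {
        "name": "httpd",
        "number_of_containers": containers,
        "launch_command": "/usr/bin/run-httpd",
        "artifact": {"id": "centos/httpd-24-centos7:latest", "type": "DOCKER"},
        "resource": {"cpus": 1, "memory": "1024"},
        # Assumption: these two blocks belong to the component, not the service.
        "readiness_check": {
            "type": "HTTP",
            "properties": {"url": "http://${THIS_HOST}:8080"},
        },
        "configuration": {
            "files": [
                {
                    "type": "TEMPLATE",
                    "dest_file": "/var/www/html/index.html",
                    "properties": {
                        "content": "<html><body>Hello from "
                        "${COMPONENT_INSTANCE_NAME}!</body></html>"
                    },
                }
            ]
        },
    }


def simple_httpd_service(containers=2):
    """Wrap the component in the top-level service fields from the first slide."""
    return {
        "name": "simple-httpd-service",
        "version": "1.0.0",
        "lifetime": "3600",
        "components": [httpd_component(containers)],
    }


if __name__ == "__main__":
    # Write this output to simple-httpd-service.json before launching.
    print(json.dumps(simple_httpd_service(), indent=2))
```

Generating the file this way makes it easy to vary number_of_containers per test run while keeping the rest of the definition fixed.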
Service assembly
{
  "name": "httpd-proxy-service",
  "version": "1.0.0",
  "components": [
    {
      "artifact": {
        "id": "simple-httpd-service",
        "type": "SERVICE"
      }
    },
    {
      "name": "httpd-proxy",
      "number_of_containers": 1,
      "dependencies": [ "httpd" ],
      "artifact": {
        "id": "centos/httpd-24-centos7",
        "type": "DOCKER"
      }, ...
> yarn app -save simple-httpd-service simple-httpd-service.json
> yarn app -launch httpd-proxy-service httpd-proxy-service.json
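The "dependencies" entry implies an ordering: httpd-proxy should not start before httpd. A small Python sketch of deriving a start order from such declarations is shown here, using the standard library's graphlib (Python 3.9+). It illustrates the ordering idea only and is not the service AM's actual logic; `start_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def start_order(components):
    """Return component names ordered so each one follows its dependencies."""
    graph = {
        component["name"]: set(component.get("dependencies", []))
        for component in components
    }
    # static_order() raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    components = [
        {"name": "httpd-proxy", "dependencies": ["httpd"]},
        {"name": "httpd"},
    ]
    print(start_order(components))
```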
Other features in progress
• Container upgrade
– Localize new resources while the container is running (YARN-1503); portions released in 2.9.0
– Restart container with new resources using the same container allocation (YARN-4726)
– Support in service AM and service REST API (YARN-7512); slated for release in 3.2.0
• Placement policy support in YARN services (YARN-7142)
• User supplied Docker client configs in YARN services (YARN-7996)
• Entrypoint support (YARN-7654)
• And many more!
YARN Service Registry
• The YARN Service Registry allows deployed applications to register
themselves to allow discovery by other applications (YARN-913).
• Entries are stored in ZooKeeper as the default k/v store, providing HA and
consistency.
• Native Java clients, REST, and CLI interfaces exist for accessing the YARN Service
Registry.
Simplified Discovery via DNS
Challenge - Native Java clients and REST interfaces are not ideal
discovery mechanisms for existing applications.
Solution - Exposing YARN Service Registry entries via a more
generic and widely used discovery mechanism: DNS.
The YARN Registry DNS server (YARN-4757) meets these needs.
• Watches the YARN Service Registry for new application and container
registration/deregistration.
• Creates the appropriate DNS records for the container:
– componentInstanceName.serviceName.user.domain
– ctr-e138-1518143905142-215498-01-000007.domain
• Supports zone transfers or zone forwarding.
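The first record format, componentInstanceName.serviceName.user.domain, can be sketched as a small Python helper. The function name `container_dns_name` and the sample values ("httpd-0", "alice", "example.com") are assumptions for illustration; the label check follows the usual DNS hostname rules rather than anything stated on the slide.

```python
import re

# One DNS label: letters, digits, and hyphens, 1 to 63 characters, no leading
# or trailing hyphen.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def container_dns_name(instance, service, user, domain):
    """Join the parts into componentInstanceName.serviceName.user.domain."""
    labels = [part.lower() for part in (instance, service, user)]
    for label in labels:
        if not _LABEL.match(label):
            raise ValueError(f"not a valid DNS label: {label!r}")
    return ".".join(labels + [domain.lower()])


if __name__ == "__main__":
    print(container_dns_name("httpd-0", "simple-httpd-service", "alice", "example.com"))
```

A client in another container could then look the name up with ordinary DNS tooling instead of a registry-specific client.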
Lessons Learned
Success!
5.8+ million containers, 1.1+ million tests
Huge uptick in adoption
Successes continued
• First full HDP release tested and certified end to end on the container cloud!
• All supported operating systems (CentOS 6/7, SLES, Ubuntu 14/16, Debian)
running in containers on CentOS 7.3 hosts!
• Density per node improved by 2.5x!
[Chart: 14 virtual machines vs. 35 containers per node]
…but not without some pain
“No gains without pains.”
- Benjamin Franklin, Poor Richard's Almanack
IP Management
Challenge: YARN does not manage IP addresses directly.
What we did:
• Allocate a pool of IP addresses to the cluster on the same VLAN.
• Use Docker’s bridge networking with the fixed-cidr option.
• Each node in the cluster is allocated 64 IP addresses from the pool.
• Use docker inspect to get the container IP address and add it to the
YARN Service Registry.
• YARN Registry DNS Server registers DNS records for easy lookup.
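Handing each node 64 addresses from a shared pool amounts to carving the pool into /26 subnets. A minimal Python sketch using the standard ipaddress module follows; the pool 10.10.0.0/24, the node names, and the helper `allocate_node_ranges` are assumptions, since the slides do not show how the pool was actually divided.

```python
import ipaddress


def allocate_node_ranges(pool_cidr, nodes, addresses_per_node=64):
    """Assign each node one subnet of `addresses_per_node` addresses from the pool."""
    if addresses_per_node < 1 or addresses_per_node & (addresses_per_node - 1):
        raise ValueError("addresses_per_node must be a power of two")
    pool = ipaddress.ip_network(pool_cidr)
    # 64 addresses need 6 host bits, so an IPv4 pool is split into /26 subnets.
    prefix = pool.max_prefixlen - (addresses_per_node.bit_length() - 1)
    subnets = pool.subnets(new_prefix=prefix)
    allocation = {}
    for node in nodes:
        try:
            allocation[node] = next(subnets)
        except StopIteration:
            raise ValueError("pool is too small for the number of nodes") from None
    return allocation


if __name__ == "__main__":
    for node, subnet in allocate_node_ranges("10.10.0.0/24", ["node1", "node2"]).items():
        print(node, subnet)
```

Each resulting subnet is the kind of value that could be supplied as a node's fixed CIDR for the Docker bridge.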
Docker Storage Driver and Filesystem
Challenge: Many Docker Storage drivers, lots of limitations.
What is the best option for us?
What we did:
• Extensive testing of create, stop, and delete operation timings.
• Eliminate options that require significant modifications to the Linux
OS to ease adoption in enterprises.
• If possible, use a driver deemed production ready by maintainers.
• Ultimately, we landed on DeviceMapper LVM thinpool with ext4
backing filesystem ... or so we thought.
DeviceMapper kernel oops and performance
Challenge: Heavy writes to the container’s root filesystem
causing kernel panics, uninterruptible processes, and high IO wait.
What we did:
• DeviceMapper is the only viable option for our workloads, so we can’t
switch to a different storage driver.
• Install SSDs and configure Docker’s graph storage to use them to
eliminate high IO wait.
• Test various RAID controller firmware, Linux kernel, and backing
filesystems to find a stable combination that doesn’t result in panics
and Docker hangs (most recently testing upgrade to CentOS 7.4 with
4.15 kernel, so we could look at overlay2 now).
User Namespacing
Challenge: YARN provides features for localized resources and log aggregation.
Running containers as the submitting user presents challenges.
What we did:
• Run all containers as the nobody user so that user and application directories
configured by YARN are available to the container.
• Update images so that nobody UID/GID match in image and host.
• Allow for “vanilla containers” that do not bind mount the YARN directories,
allowing the process in the container to run as any user, but logs can no
longer be aggregated (Docker run override disabled).
• User namespacing in Docker is lacking, as it can only remap a single user. As
this restriction is removed, we expect to migrate to this feature. Recent
improvements allow setting UID:GID pair, but further support is needed.
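Checking that the nobody UID/GID agree between host and image can be automated by comparing the two passwd files. The Python sketch below works on passwd-format text; the helper names are hypothetical, and how the image's /etc/passwd is obtained (for example by copying it out of the image) is left as an assumption.

```python
def parse_passwd(text):
    """Map user name to (uid, gid) from passwd-format text."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # skip malformed lines
        entries[fields[0]] = (int(fields[2]), int(fields[3]))
    return entries


def ids_match(host_passwd, image_passwd, user="nobody"):
    """Return True only if `user` exists in both files with the same UID and GID."""
    host = parse_passwd(host_passwd).get(user)
    image = parse_passwd(image_passwd).get(user)
    return host is not None and host == image


if __name__ == "__main__":
    host = "nobody:x:99:99:Nobody:/:/sbin/nologin\n"
    image = "nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\n"
    # The two sample entries differ, so this image would need updating.
    print(ids_match(host, image))
```

A mismatch like the one in the sample data is what the "update images" step is meant to remove.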
Image Management
Challenge: The implicit pull of a large image can lead to task timeouts.
SSD space is at a premium, making image cleanup important.
What we did:
• Run an internal private registry to reduce WAN load.
• Jenkins job that builds and distributes the image to all nodes in the
cluster, avoiding the implicit pull from “docker run”.
• Jenkins job to clean up images that are no longer needed. Full
thinpool can also cause kernel panics – aggressively age off images.
• Reuse base images where possible to reduce bandwidth.
• Work being discussed to provide first-class YARN support for image
management (YARN-3854).
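The age-off decision in a cleanup job can be reduced to a small pure function. The Python sketch below is an illustration under stated assumptions: the last-used timestamps, the 14-day threshold, and the `images_to_remove` helper are hypothetical, and the slides do not describe how the actual Jenkins job chooses images.

```python
from datetime import datetime, timedelta


def images_to_remove(last_used, now, max_age, keep=()):
    """Return image names unused for longer than `max_age`, excluding `keep`."""
    return sorted(
        image
        for image, used in last_used.items()
        if image not in keep and now - used > max_age
    )


if __name__ == "__main__":
    now = datetime(2018, 6, 1)
    last_used = {
        "hdp-test:1": datetime(2018, 5, 1),
        "hdp-test:2": datetime(2018, 5, 30),
        "centos:7": datetime(2018, 1, 1),  # shared base image, always kept
    }
    print(images_to_remove(last_used, now, timedelta(days=14), keep={"centos:7"}))
```

Keeping base images on an explicit keep list fits the advice to reuse base images where possible, since removing them would force a fresh pull.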
Summary
• Massive density improvements.
• Greatly improved ease of use.
• Many real world lessons learned.
• Widespread internal adoption.
• Improved self service capabilities.
• Internal use of long running services a reality!
Get Involved
• Still plenty of work to do!
• Improve Docker image management
and user handling.
• Networking plugins.
• Security/permissions models.
• Bring Your Own Image challenges.
• Follow along.
• Try out Apache Hadoop 3.1.0 or check out trunk
and build it! Ansible/Vagrant setups available.
Questions?
billie@hortonworks.com
skumpf@hortonworks.com

Ransomware_Q4_2023. The report. [EN].pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

Lessons learned running a container cloud on YARN

  • 1. Lessons Learned Running a Container Cloud on Apache Hadoop YARN
      Billie Rinaldi, Software Engineering, YARN R&D, Hortonworks
      © Hortonworks Inc. 2011 – 2018. All Rights Reserved
      Hadoop, YARN, HDFS, Ambari, Ranger, Atlas, and Apache are trademarks of the Apache Software Foundation
  • 2. Overview
      • Introduction
      • Building Blocks for the Container Cloud
      • Lessons Learned
      • Q&A
  • 3. Disclaimer
      • Some features discussed today are still experimental and/or under current development.
      • Security is a work in progress, and a security assessment should be performed before implementing these features.
      • Many features were released in Apache Hadoop 3.1.0 in April 2018, but some will not come out until 3.1.1 or 3.2.0.
  • 4. Introduction
  • 5. We Build, Test, and Release Open Source Software
      • The rapid pace of open source results in:
          – Dozens of releases a year.
          – Tens of thousands of tests per release.
      • Many permutations of tests:
          – Over a dozen supported Linux operating systems.
          – Multiple backend databases.
          – Nearly thirty open source products in our stack.
          – Multiple supported versions of HDP per release.
  • 6. Addressing the Challenges
      • How is the industry addressing these challenges? What are customers asking for?
      • How can we reduce overhead to achieve greater density and improve hardware utilization?
      • How can we improve the speed at which tests run?
      • How can we reuse packaging and automation?
  • 7. Solution: Container Cloud
      • Containers eliminate the bulk of the virtualization overhead, improving density per node.
      • Containers help reduce image variance through composition and simplified packaging.
      • Container startup is fast, with no real boot sequence.
      • Containers naturally fit into YARN and its container model.
      • Containers allow us to “use what we ship and ship what we use.”
  • 8. Container Cloud Architecture
      [Architecture diagram: Testing HDP and HDF releases in container clusters]
      • Shared services: Resource Management (YARN), Management and Monitoring (Ambari), Storage (HDFS), Service Discovery and REST API (YARN Services), Security and Governance (Ranger and Atlas).
      • Diagram labels: Jenkins, SubmitTest, Worker (Docker), LaunchTest, HDP (Docker).
  • 9. Two years later … 5.8+ million containers and many lessons learned.
  • 10. Building Blocks for the Container Cloud
  • 11. Building Blocks for the Container Cloud
      • YARN Docker Support: Enables additional container types to make it easier to onboard new applications and services on YARN.
      • YARN Services Framework: Provides an AM implementation and NM improvements that enable long-running services on YARN.
      • YARN Service Discovery: Allows services running on YARN to discover one another.
  • 12. Adding Docker on YARN
      • Why Docker?
          – Provides a lightweight mechanism for packaging, distributing, and isolating processes.
          – Currently the most popular containerization framework.
          – Allows YARN developers to focus on integration instead of container primitives.
          – Mostly fits into the YARN Container model.
          – Buzzword compliant.
  • 13. YARN Containers
      • What is a YARN container?
          – Process.
          – Local Resources (scripts, jars, security tokens).
          – Resource Requirements (CPU, Memory, I/O).
      • AM requests containers.
      • RM allocates containers on NMs.
      • NM runs containers.
      • Container Executor encapsulates the platform-specific logic needed to start a YARN container.
      Image source: https://www.pepperfry.com/tupperware-mini-rectangular-white-container-850ml-1109991.html
  • 14. New Abstraction: Container Runtimes
      • Challenge: The Container Executor approach is cluster wide; we need more flexibility while still being able to leverage existing Container Executor features.
      • Solution: Runtimes added to LinuxContainerExecutor (YARN-3611, initially released in Apache Hadoop 2.8.0; improvements are ongoing).
          – DefaultLinuxContainerRuntime: Existing Linux process-based execution.
          – DockerLinuxContainerRuntime: Using Docker to run and monitor a container.
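The runtime abstraction on slide 14 can be pictured as a per-container dispatch. The following is only an illustrative Python sketch, not YARN's actual Java implementation: the function names, the `RUNTIMES` table, and the simplified `docker run` argument list are all assumptions made for illustration. Only the two environment variable names come from the deck.

```python
from typing import Callable, Dict, List

Runtime = Callable[[List[str], Dict[str, str]], List[str]]


def default_runtime(command: List[str], env: Dict[str, str]) -> List[str]:
    """Process-based execution: launch the command unchanged."""
    return list(command)


def docker_runtime(command: List[str], env: Dict[str, str]) -> List[str]:
    """Docker-based execution: wrap the command in a (simplified) docker run."""
    image = env["YARN_CONTAINER_RUNTIME_DOCKER_IMAGE"]
    return ["docker", "run", "--rm", image, *command]


# Hypothetical registry of runtimes, keyed by the requested runtime type.
RUNTIMES: Dict[str, Runtime] = {
    "default": default_runtime,
    "docker": docker_runtime,
}


def build_launch_command(command: List[str], env: Dict[str, str]) -> List[str]:
    """Pick a runtime per container from its environment, not per cluster."""
    runtime_type = env.get("YARN_CONTAINER_RUNTIME_TYPE", "default")
    if runtime_type not in RUNTIMES:
        raise ValueError(f"unknown container runtime: {runtime_type}")
    return RUNTIMES[runtime_type](command, env)
```

The point of the sketch is the design choice: the choice of runtime travels with each container request, so process-based and Docker-based containers can coexist under one executor.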
  • 15. Distributed Shell and MR on Docker Examples
      Environment variables are currently used to set the Container Runtime options. (WARNING: this might change.)
      > yarn jar $YARN_EXAMPLES_JAR pi -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" 1 40000
      > yarn jar $DSHELL_JAR -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7 -shell_command "sleep 120" -jar $DSHELL_JAR -num_containers 1
      Image source: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTaGXMdZCdR6_RUC235TdafDqURxk-KJIptwALUmg5ZmCb3YBW7
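Because the same comma-separated environment string is repeated for the map and reduce tasks, it can help to generate the command rather than type it. This is a minimal Python sketch; the helper names `docker_runtime_env` and `mapreduce_pi_command` are hypothetical, and the variable names and argument order are copied from the slide 15 command.

```python
from typing import List


def docker_runtime_env(image: str) -> str:
    """Build the comma-separated env string used by the MapReduce example."""
    settings = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    return ",".join(f"{key}={value}" for key, value in settings.items())


def mapreduce_pi_command(examples_jar: str, image: str,
                         maps: int = 1, samples: int = 40000) -> List[str]:
    """Assemble the argv for the pi example with Docker for map and reduce."""
    env = docker_runtime_env(image)
    return [
        "yarn", "jar", examples_jar, "pi",
        f"-Dmapreduce.map.env={env}",
        f"-Dmapreduce.reduce.env={env}",
        str(maps), str(samples),
    ]
```

Passing the resulting list to `subprocess.run` avoids shell quoting problems, such as the typographic closing quote that slipped into the original slide.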
  • 16. Recent Improvements: Container Lifecycle
      • Improvements to stopping and cleaning up of containers (YARN-5366).
      • Improvements to handling short-lived containers (YARN-5366, YARN-7914).
      • Container relaunch improvements to reuse the existing container (YARN-7973).
          – Data in the container’s root filesystem and workdir can be recovered on the same node.
      • Support for sending specific signals to the container’s root process (YARN-5366).
      • Delayed deletion for debugging (YARN-5366).
  • 17. Recent Improvements: Container Security
      • ACLs for privileged containers, with the ability to disable privileged containers system wide (YARN-6623).
      • Sudo / group check for running privileged containers (YARN-7221).
      • Default untrusted mode for running unmodified images out of the box (YARN-7516).
      • Username to UID/GID mapping to ensure privacy (YARN-4266).
      • User-supplied bind mounts validated against an admin-supplied whitelist (YARN-5534).
      • More restrictive YARN mounts to limit host exposure (YARN-7815).
  • 18. Building Blocks for the Container Cloud
      • YARN Docker Support: Enables additional container types to make it easier to onboard new applications and services on YARN.
      • YARN Services Framework: Provides an AM implementation and NM improvements that enable long-running services on YARN.
      • YARN Service Discovery: Allows services running on YARN to discover one another.
  • 19. YARN Services Goals
      • Long Running: Simplify the deployment and management of long-running applications on YARN.
      • Easily Bring New Applications: Remove the tedious process of bringing new applications to YARN.
      • Easy to Manage Applications: REST API and command line tools.
      • Declarative Configuration: Provide configuration to the applications, declare resource needs, specify placement policies.
  • 20. YARN Services Overview
      • Apache Slider: incubating at Apache since 2014, designed to make it easier to run long-running applications on YARN.
      • Simplified and first-class support for services in YARN (YARN-4692), initiated in 2016.
      • Container orchestrator to provision Docker-based or native process-based containers (YARN-5079); integrates the Slider core into YARN.
      • REST API for managing services on YARN (YARN-4793).
      • Simplified discovery of services via DNS mechanisms (YARN-4757).
      • Released in Apache Hadoop 3.1.0!
  • 21. YARN Services Docker Httpd Example
      {
        "name": "simple-httpd-service",
        "version": "1.0.0",
        "lifetime": "3600",
        "components": [
          {
            "name": "httpd",
            "number_of_containers": 2,
            "launch_command": "/usr/bin/run-httpd",
            "artifact": {
              "id": "centos/httpd-24-centos7:latest",
              "type": "DOCKER"
            },
            "resource": {
              "cpus": 1,
              "memory": "1024"
            },
            ...
      > yarn app -launch simple-httpd-service simple-httpd-service.json
  • 22. YARN Services Docker Httpd Example continued
            "readiness_check": {
              "type": "HTTP",
              "properties": {
                "url": "http://${THIS_HOST}:8080"
              }
            },
            "configuration": {
              "files": [
                {
                  "type": "TEMPLATE",
                  "dest_file": "/var/www/html/index.html",
                  "properties": {
                    "content": "<html><body>Hello from ${COMPONENT_INSTANCE_NAME}!</body></html>"
                  }
                }
              ]
            }
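The two slides above show fragments of one service specification. As a rough sketch, the same spec can be assembled and written out from Python before handing the file to `yarn app -launch`. The function name `httpd_service_spec` is hypothetical, and placing `readiness_check` and `configuration` inside the `httpd` component is an assumption, because the slides elide the text between the two fragments.

```python
import json
from typing import Any, Dict


def httpd_service_spec(name: str = "simple-httpd-service",
                       containers: int = 2) -> Dict[str, Any]:
    """Assemble the httpd service spec shown on slides 21 and 22."""
    component = {
        "name": "httpd",
        "number_of_containers": containers,
        "launch_command": "/usr/bin/run-httpd",
        "artifact": {"id": "centos/httpd-24-centos7:latest", "type": "DOCKER"},
        "resource": {"cpus": 1, "memory": "1024"},
        # Assumption: these two blocks belong to the component itself.
        "readiness_check": {
            "type": "HTTP",
            "properties": {"url": "http://${THIS_HOST}:8080"},
        },
        "configuration": {
            "files": [{
                "type": "TEMPLATE",
                "dest_file": "/var/www/html/index.html",
                "properties": {
                    "content": "<html><body>Hello from "
                               "${COMPONENT_INSTANCE_NAME}!</body></html>"
                },
            }]
        },
    }
    return {
        "name": name,
        "version": "1.0.0",
        "lifetime": "3600",
        "components": [component],
    }


def write_spec(path: str, spec: Dict[str, Any]) -> None:
    """Write the spec as JSON, ready for `yarn app -launch <name> <path>`."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(spec, handle, indent=2)
```

The `${THIS_HOST}` and `${COMPONENT_INSTANCE_NAME}` placeholders are left as literal text in the JSON; they are meant to be substituted by the services framework, not by Python.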
  • 23. Service assembly
      {
        "name": "httpd-proxy-service",
        "version": "1.0.0",
        "components": [
          {
            "artifact": {
              "id": "simple-httpd-service",
              "type": "SERVICE"
            }
          },
          {
            "name": "httpd-proxy",
            "number_of_containers": 1,
            "dependencies": [ "httpd" ],
            "artifact": {
              "id": "centos/httpd-24-centos7",
              "type": "DOCKER"
            },
            ...
      > yarn app -save simple-httpd-service simple-httpd-service.json
      > yarn app -launch httpd-proxy-service httpd-proxy-service.json
  • 24. Other features in progress
      • Container upgrade
          – Localize new resources while the container is running (YARN-1503), portions in 2.9.0.
          – Restart the container with new resources using the same container allocation (YARN-4726).
          – Support in service AM and service REST API (YARN-7512), slated for release 3.2.0.
      • Placement policy support in YARN services (YARN-7142).
      • User-supplied Docker client configs in YARN services (YARN-7996).
      • Entrypoint support (YARN-7654).
      • And many more!
  • 25. Building Blocks for the Container Cloud
      • YARN Docker Support: Enables additional container types to make it easier to onboard new applications and services on YARN.
      • YARN Services Framework: Provides an AM implementation and NM improvements that enable long-running services on YARN.
      • YARN Service Discovery: Allows services running on YARN to discover one another.
  • 26. YARN Service Registry
      • The YARN Service Registry lets deployed applications register themselves so that other applications can discover them (YARN-913).
      • Entries are stored in ZooKeeper as the default k/v store, providing HA and consistency.
      • Native Java clients, REST, and CLI interfaces exist for accessing the YARN Service Registry.
      Image source: http://api-university.com/blog/let-developers-try-your-apis-without-registration/
  • 27. Simplified Discovery via DNS
      • Challenge: Native Java clients and REST interfaces are not ideal discovery mechanisms for existing applications.
      • Solution: Expose YARN Service Registry entries via a more generic and widely used discovery mechanism: DNS.
      • The YARN Registry DNS server (YARN-4757) meets these needs:
          – Watches the YARN Service Registry for new application and container registration/deregistration.
          – Creates the appropriate DNS records for the container:
              componentInstanceName.serviceName.user.domain
              ctr-e138-1518143905142-215498-01-000007.domain
          – Supports zone transfers or zone forwarding.
      Image source: http://www.idownloadblog.com/2016/03/05/how-to-use-custom-dns-settings/
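The first record format on slide 27 is a plain concatenation of four parts, which the short Python sketch below makes concrete. The helper name and the sample values (`httpd-0`, `billie`, `example.com`) are hypothetical; only the `componentInstanceName.serviceName.user.domain` pattern comes from the slide. The label check follows the usual DNS hostname rules and is not taken from the Registry DNS code.

```python
import re

# Conventional DNS label: letters, digits, hyphens; 1 to 63 characters.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def container_dns_name(component_instance: str, service: str,
                       user: str, domain: str) -> str:
    """Compose componentInstanceName.serviceName.user.domain."""
    labels = [component_instance.lower(), service.lower(), user.lower()]
    for label in labels:
        if not _LABEL.match(label):
            raise ValueError(f"not a valid DNS label: {label!r}")
    return ".".join(labels + [domain.strip(".").lower()])
```

A client that only knows the service name, the submitting user, and the cluster's DNS domain can then resolve an instance with ordinary tools such as `socket.gethostbyname`, without any YARN client library.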
  • 28. Lessons Learned
  • 29. Success!
      • 5.8+ million containers, 1.1+ million tests.
      • Huge uptick in adoption.
  • 30. Successes continued
      • First full HDP release tested and certified end to end on the container cloud!
      • All supported operating systems (CentOS 6/7, SLES, Ubuntu 14/16, Debian) running in containers on CentOS 7.3 hosts!
      • Density per node improved by 2.5x: 14 virtual machines vs. 35 containers.
  • 31. …but not without some pain
      “No gains without pains.” (Benjamin Franklin, Poor Richard's Almanack)
  • 32. IP Management
      Challenge: YARN does not manage IP addresses directly.
      What we did:
      • Allocate a pool of IP addresses to the cluster on the same VLAN.
      • Use Docker’s bridge networking with the fixed_cidr option.
      • Each node in the cluster is allocated 64 IP addresses from the pool.
      • Use docker inspect to get the container IP address and add it to the YARN Service Registry.
      • The YARN Registry DNS server registers DNS records for easy lookup.
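Carving the pool into 64-address blocks is simple subnet arithmetic: 64 addresses correspond to a /26 in IPv4. Below is a minimal Python sketch using the standard `ipaddress` module. The pool `10.20.0.0/24`, the node names, and the function name are made-up examples, and how the resulting CIDR is fed to each Docker daemon is left out.

```python
import ipaddress
from typing import Dict, List


def allocate_node_blocks(pool_cidr: str, nodes: List[str],
                         addresses_per_node: int = 64) -> Dict[str, str]:
    """Give each node one equally sized block carved out of the pool."""
    if addresses_per_node < 1 or addresses_per_node & (addresses_per_node - 1):
        raise ValueError("addresses_per_node must be a power of two")
    pool = ipaddress.ip_network(pool_cidr)
    # 64 addresses -> 6 host bits -> /26 for IPv4.
    new_prefix = pool.max_prefixlen - (addresses_per_node.bit_length() - 1)
    if new_prefix < pool.prefixlen:
        raise ValueError("pool is smaller than a single block")
    blocks = pool.subnets(new_prefix=new_prefix)
    allocation: Dict[str, str] = {}
    for node in nodes:
        try:
            allocation[node] = str(next(blocks))
        except StopIteration:
            raise ValueError("not enough addresses in the pool") from None
    return allocation
```

For example, a /24 pool yields four such blocks, so a fifth node raises an error instead of silently overlapping with an existing range.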
  • 33. Docker Storage Driver and Filesystem
      Challenge: There are many Docker storage drivers, each with its own limitations. What is the best option for us?
      What we did:
      • Extensive testing of create, stop, and delete operation timings.
      • Eliminate options that require significant modifications to the Linux OS, to ease adoption in enterprises.
      • If possible, use a driver deemed production ready by its maintainers.
      • Ultimately, we landed on the DeviceMapper LVM thinpool with an ext4 backing filesystem ... or so we thought.
  • 34. DeviceMapper kernel oops and performance
      Challenge: Heavy writes to the container’s root filesystem cause kernel panics, uninterruptible processes, and high IO wait.
      What we did:
      • DeviceMapper is the only viable option for our workloads, so we can’t switch to a different storage driver.
      • Install SSDs and configure Docker’s graph storage to use them, to eliminate high IO wait.
      • Test various RAID controller firmware versions, Linux kernels, and backing filesystems to find a stable combination that doesn’t result in panics and Docker hangs. Most recently we have been testing an upgrade to CentOS 7.4 with a 4.15 kernel, so we could look at overlay2 now.
  • 35. User Namespacing
      Challenge: YARN provides features for localized resources and log aggregation. Running containers as the submitting user presents challenges.
      What we did:
      • Run all containers as the nobody user so that user and application directories configured by YARN are available to the container.
      • Update images so that the nobody UID/GID match in image and host.
      • Allow for “vanilla containers” that do not bind mount the YARN directories, allowing the process in the container to run as any user, but logs can no longer be aggregated (Docker run override disabled).
      • User namespacing in Docker is lacking, as it can only remap a single user. As this restriction is removed, we expect to migrate to this feature. Recent improvements allow setting a UID:GID pair, but further support is needed.
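The UID/GID requirement is easy to check mechanically by comparing the `nobody` entry in the host's and the image's `/etc/passwd`. This Python sketch works on the file contents as text; the function names are hypothetical, and obtaining the image's passwd file (for example by copying it out of a container) is left to the reader.

```python
from typing import Tuple


def passwd_ids(passwd_text: str, user: str) -> Tuple[int, int]:
    """Return (uid, gid) for a user from /etc/passwd-formatted text."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 4 and fields[0] == user:
            return int(fields[2]), int(fields[3])
    raise KeyError(user)


def ids_match(host_passwd: str, image_passwd: str,
              user: str = "nobody") -> bool:
    """True when the user has the same UID and GID on host and in image."""
    return passwd_ids(host_passwd, user) == passwd_ids(image_passwd, user)
```

This matters in a mixed-OS setup like the one described here: CentOS 7 conventionally defines `nobody` as 99:99, while Debian and Ubuntu use 65534:65534, so an unmodified Ubuntu image would fail the check on a CentOS host.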
  • 36. Image Management
      Challenge: The implicit pull of a large image can lead to task timeouts. SSD space is at a premium, making image cleanup important.
      What we did:
      • Run an internal private registry to reduce WAN load.
      • Jenkins job that builds and distributes the image to all nodes in the cluster, avoiding the implicit pull from “docker run”.
      • Jenkins job to clean up images that are no longer needed. A full thinpool can also cause kernel panics, so aggressively age off images.
      • Reuse base images where possible to reduce bandwidth.
      • Work is being discussed to provide first-class YARN support for image management (YARN-3854).
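The deck does not describe how the cleanup job decides what to delete, so the following Python sketch is only one plausible age-off policy, stated as an assumption: remove images that no running container uses and that have not been used within a maximum age. The function name, the inputs, and the image names in the usage example are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set


def images_to_remove(last_used: Dict[str, datetime], in_use: Set[str],
                     now: datetime, max_age: timedelta) -> List[str]:
    """Select images that are idle and older than max_age, in sorted order."""
    return sorted(
        image
        for image, used_at in last_used.items()
        if image not in in_use and now - used_at > max_age
    )
```

As a usage example, with `max_age=timedelta(days=7)` an image last used ten days ago is selected unless a running container still references it. A scheduled job could pass the result to `docker rmi` on each node.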
  • 37. Summary
  • 38. Summary
      • Massive density improvements.
      • Greatly improved ease of use.
      • Many real-world lessons learned.
      • Widespread internal adoption.
      • Improved self-service capabilities.
      • Internal use of long-running services a reality!
  • 39. Get Involved
      • Still plenty of work to go!
          – Improve Docker image management and user handling.
          – Networking plugins.
          – Security/permissions models.
          – Bring Your Own Image challenges.
      • Follow along.
      • Try out Apache Hadoop 3.1.0, or check out trunk and build it! Ansible/Vagrant setups available.
  • 40. Questions?
      billie@hortonworks.com
      skumpf@hortonworks.com