Lessons Learned Running a Container Cloud on Apache Hadoop YARN
Billie Rinaldi
Talk given at DataWorks Summit Berlin 2018 on April 18, 2018.
1.
Lessons Learned Running a Container Cloud on Apache Hadoop YARN
Billie Rinaldi, Software Engineering, YARN R&D, Hortonworks
© Hortonworks Inc. 2011 – 2018. All Rights Reserved
Hadoop, YARN, HDFS, Ambari, Ranger, Atlas, and Apache are trademarks of the Apache Software Foundation
2.
Overview:
• Introduction
• Building Blocks for the Container Cloud
• Lessons Learned
• Q&A
3.
Disclaimer
• Some features discussed today are still experimental and/or under current development.
• Security is a work in progress and a security assessment should be performed before implementing these features.
• Many features are released in Apache Hadoop 3.1.0 in April 2018, but some will not come out until 3.2.0 or 3.1.1.
4.
Introduction
5.
We Build, Test, and Release Open Source Software
• The rapid pace of open source results in
  • dozens of releases a year.
  • tens of thousands of tests per release.
• Many permutations of tests
  • Over a dozen supported Linux operating systems.
  • Multiple backend databases.
  • Nearly thirty open source products in our stack.
  • Multiple supported versions of HDP per release.
6.
Addressing the Challenges
• How is the industry addressing these challenges? What are customers asking for?
• How can we reduce overhead to achieve greater density and improve hardware utilization?
• How can we improve the speed at which tests run?
• How can we reuse packaging and automation?
7.
Solution: Container Cloud
• Containers eliminate the bulk of the virtualization overhead, improving density per node.
• Containers help reduce image variance through composition and simplified packaging.
• Container startup is fast, with no real boot sequence.
• Containers naturally fit into YARN and its container model.
• Containers allow us to “use what we ship and ship what we use.”
8.
Container Cloud Architecture
[Diagram: Testing HDP and HDF releases in container clusters. Components shown: Jenkins, Worker (Docker) containers, and HDP (Docker) containers, connected by arrows labeled SubmitTest and LaunchTest. Shared Services: Resource Management (YARN), Management and Monitoring (Ambari), Storage (HDFS), Service Discovery and REST API (YARN Services), Security and Governance (Ranger and Atlas).]
9.
Two years later … 5.8+ million containers and many lessons learned.
10.
Building Blocks for the Container Cloud
11.
Building Blocks for the Container Cloud
• YARN Docker Support – Enables additional container types to make it easier to onboard new applications and services on YARN.
• YARN Services Framework – Provides AM implementation and NM improvements that enable long running services on YARN.
• YARN Service Discovery – Allows services running on YARN to discover one another.
12.
Adding Docker on YARN
• Why Docker?
  • Provides a lightweight mechanism for packaging, distributing, and isolating processes.
  • Currently the most popular containerization framework.
  • Allows YARN developers to focus on integration instead of container primitives.
  • Mostly fits into the YARN Container model.
  • Buzzword compliant.
13.
YARN Containers
• What is a YARN container?
  • Process.
  • Local Resources (scripts, jars, security tokens).
  • Resource Requirements (CPU, Memory, I/O).
• AM requests containers.
• RM allocates containers on NMs.
• NM runs containers.
• Container Executor encapsulates platform-specific logic needed to start a YARN container.
Image source: https://www.pepperfry.com/tupperware-mini-rectangular-white-container-850ml-1109991.html
14.
New Abstraction: Container Runtimes
• Challenge: The Container Executor approach is cluster wide; we need more flexibility while still being able to leverage existing Container Executor features.
• Solution: Runtimes added to LinuxContainerExecutor (YARN-3611; initially released in Apache Hadoop 2.8.0, with improvements ongoing).
  • DefaultLinuxContainerRuntime: Existing Linux process-based execution.
  • DockerLinuxContainerRuntime: Using Docker to run and monitor a container.
15.
Distributed Shell and MR on Docker Examples
Environment variables are currently used to set the Container Runtime options.*
*WARNING: might change.
MapReduce:
> yarn jar $YARN_EXAMPLES_JAR pi -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" 1 40000
Distributed Shell:
> yarn jar $DSHELL_JAR -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7 -shell_command "sleep 120" -jar $DSHELL_JAR -num_containers 1
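The MapReduce command above repeats the same runtime settings for map and reduce tasks. A minimal sketch of how a launcher script might compose those options, assuming only the two environment variables shown on the slide; the helper names `runtime_env` and `mapreduce_docker_args` are hypothetical and not part of Hadoop:

```python
def runtime_env(image, runtime_type="docker"):
    """Return the comma-separated env string that selects a container runtime."""
    # Only the two variables from the slide are set; others may exist.
    settings = {
        "YARN_CONTAINER_RUNTIME_TYPE": runtime_type,
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    return ",".join(f"{key}={value}" for key, value in settings.items())


def mapreduce_docker_args(image):
    """Build the -D options that apply the same runtime to map and reduce tasks."""
    env = runtime_env(image)
    return [
        f"-Dmapreduce.map.env={env}",
        f"-Dmapreduce.reduce.env={env}",
    ]


if __name__ == "__main__":
    for arg in mapreduce_docker_args("centos:7"):
        print(arg)
```

Because the slide warns that these variables might change, keeping them in one place makes a later rename a single edit.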
16.
Recent Improvements: Container Lifecycle
• Improvements to stopping and cleaning up of containers (YARN-5366)
• Improvements to handling short lived containers (YARN-5366, YARN-7914)
• Container relaunch improvements to reuse existing container (YARN-7973)
  • Data in the container’s root filesystem and workdir can be recovered on the same node
• Support for sending specific signals to the container’s root process (YARN-5366)
• Delayed deletion for debugging (YARN-5366)
17.
Recent Improvements: Container Security
• ACLs for privileged containers, with the ability to disable privileged containers system wide (YARN-6623)
• Sudo / group check for running privileged containers (YARN-7221)
• Default untrusted mode for running unmodified images out of the box (YARN-7516)
• Username to UID/GID mapping to ensure privacy (YARN-4266)
• User supplied bind mounts validated against an admin supplied whitelist (YARN-5534)
• More restrictive YARN mounts to limit host exposure (YARN-7815)
18.
Building Blocks for the Container Cloud
• YARN Docker Support – Enables additional container types to make it easier to onboard new applications and services on YARN.
• YARN Services Framework – Provides AM implementation and NM improvements that enable long running services on YARN.
• YARN Service Discovery – Allows services running on YARN to discover one another.
19.
YARN Services Goals
• Long Running – Simplify the deployment and management of long running applications on YARN.
• Easily Bring New Applications – Remove the tedious process of bringing new applications to YARN.
• Easy to Manage Applications – REST API and Command Line tools.
• Declarative Configuration – Provide configuration to the applications, declare resource needs, specify placement policies.
20.
YARN Services Overview
• Apache Slider – incubating at Apache since 2014, designed to make it easier to run long-running applications on YARN.
• Simplified and first-class support for services in YARN (YARN-4692) – initiated in 2016.
• Container orchestrator to provision docker-based or native-process based containers (YARN-5079), integrates Slider core into YARN.
• REST API for managing services on YARN (YARN-4793).
• Simplified discovery of services via DNS mechanisms (YARN-4757).
• Released in Apache Hadoop 3.1.0!
21.
YARN Services Docker Httpd Example
{
  "name": "simple-httpd-service",
  "version": "1.0.0",
  "lifetime": "3600",
  "components": [
    {
      "name": "httpd",
      "number_of_containers": 2,
      "launch_command": "/usr/bin/run-httpd",
      "artifact": {
        "id": "centos/httpd-24-centos7:latest",
        "type": "DOCKER"
      },
      "resource": {
        "cpus": 1,
        "memory": "1024"
      },
      ...
> yarn app -launch simple-httpd-service simple-httpd-service.json
22.
YARN Services Docker Httpd Example continued
      "readiness_check": {
        "type": "HTTP",
        "properties": {
          "url": "http://${THIS_HOST}:8080"
        }
      },
      "configuration": {
        "files": [
          {
            "type": "TEMPLATE",
            "dest_file": "/var/www/html/index.html",
            "properties": {
              "content": "<html><body>Hello from ${COMPONENT_INSTANCE_NAME}!</body></html>"
            }
          }
        ]
      }
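The two httpd slides show one service definition split across pages, with a portion elided. A minimal sketch that assembles only the fields visible on the slides into a single document and writes it out as JSON. It assumes that `readiness_check` and `configuration` belong to the `httpd` component; anything the slides elide is left out, and the function names are hypothetical:

```python
import json


def httpd_component(containers=2):
    """Assemble the httpd component from the fields shown on the two slides."""
    return {
        "name": "httpd",
        "number_of_containers": containers,
        "launch_command": "/usr/bin/run-httpd",
        "artifact": {"id": "centos/httpd-24-centos7:latest", "type": "DOCKER"},
        "resource": {"cpus": 1, "memory": "1024"},
        # Assumption: the next two keys sit at component level.
        "readiness_check": {
            "type": "HTTP",
            "properties": {"url": "http://${THIS_HOST}:8080"},
        },
        "configuration": {
            "files": [
                {
                    "type": "TEMPLATE",
                    "dest_file": "/var/www/html/index.html",
                    "properties": {
                        "content": "<html><body>Hello from "
                                   "${COMPONENT_INSTANCE_NAME}!</body></html>"
                    },
                }
            ]
        },
    }


def service_spec():
    """Wrap the component in the top-level service fields from the slide."""
    return {
        "name": "simple-httpd-service",
        "version": "1.0.0",
        "lifetime": "3600",
        "components": [httpd_component()],
    }


if __name__ == "__main__":
    print(json.dumps(service_spec(), indent=2))
```

Generating the file from code is one way to keep several near-identical service definitions consistent; a hand-written JSON file works just as well for a single service.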
23.
Service assembly
{
  "name": "httpd-proxy-service",
  "version": "1.0.0",
  "components": [
    {
      "artifact": {
        "id": "simple-httpd-service",
        "type": "SERVICE"
      }
    },
    {
      "name": "httpd-proxy",
      "number_of_containers": 1,
      "dependencies": [ "httpd" ],
      "artifact": {
        "id": "centos/httpd-24-centos7",
        "type": "DOCKER"
      },
      ...
> yarn app -save simple-httpd-service simple-httpd-service.json
> yarn app -launch httpd-proxy-service httpd-proxy-service.json
24.
Other features in progress
• Container upgrade
  – Localize new resource while container is running (YARN-1503), portions released in 2.9.0
  – Restart container with new resources using same container allocation (YARN-4726)
  – Support in service AM and service REST API (YARN-7512), slated for release 3.2.0
• Placement policy support in YARN services (YARN-7142)
• User supplied Docker client configs in YARN services (YARN-7996)
• Entrypoint support (YARN-7654)
• And many more!
25.
Building Blocks for the Container Cloud
• YARN Docker Support – Enables additional container types to make it easier to onboard new applications and services on YARN.
• YARN Services Framework – Provides AM implementation and NM improvements that enable long running services on YARN.
• YARN Service Discovery – Allows services running on YARN to discover one another.
26.
YARN Service Registry
• The YARN Service Registry lets deployed applications register themselves so that other applications can discover them (YARN-913).
• Entries are stored in ZooKeeper as the default k/v store, providing HA and consistency.
• Native Java clients, REST, and CLI interfaces exist for accessing the YARN Service Registry.
Image source: http://api-university.com/blog/let-developers-try-your-apis-without-registration/
27.
Simplified Discovery via DNS
Challenge: Native Java clients and REST interfaces are not ideal discovery mechanisms for existing applications.
Solution: Exposing YARN Service Registry entries via a more generic and widely used discovery mechanism: DNS. The YARN Registry DNS server (YARN-4757) meets these needs.
• Watches the YARN Service Registry for new application and container registration/deregistration.
• Creates the appropriate DNS records for the container:
  componentInstanceName.serviceName.user.domain
  ctr-e138-1518143905142-215498-01-000007.domain
• Supports zone transfers or zone forwarding.
Image source: http://www.idownloadblog.com/2016/03/05/how-to-use-custom-dns-settings/
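The two record formats above can be illustrated with a small sketch. It only joins the labels in the order the slide gives; the function names and the sample values (`httpd-0`, `alice`, `example.com`) are hypothetical, and any normalization the real Registry DNS server applies to names is not modeled here:

```python
def component_dns_name(component_instance, service, user, domain):
    """Compose componentInstanceName.serviceName.user.domain."""
    labels = [component_instance, service, user, domain]
    # Strip stray dots so the joined name has exactly one dot between labels.
    return ".".join(label.strip(".") for label in labels)


def container_dns_name(container_id, domain):
    """Compose the container-ID form of the record, e.g. ctr-....domain."""
    return f"{container_id.strip('.')}.{domain.strip('.')}"


if __name__ == "__main__":
    print(component_dns_name("httpd-0", "simple-httpd-service", "alice", "example.com"))
    print(container_dns_name("ctr-e138-1518143905142-215498-01-000007", "example.com"))
```

A client that knows the service name, user, and domain can therefore compute the hostname of a component instance without calling the registry's Java or REST interfaces.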
28.
Lessons Learned
29.
Success!
• 5.8+ million containers, 1.1+ million tests
• Huge uptick in adoption
30.
Successes continued
• First full HDP release tested and certified end to end on the container cloud!
• All supported operating systems (CentOS 6/7, SLES, Ubuntu 14/16, Debian) running in containers on CentOS 7.3 hosts!
• Density per node improved by 2.5x!
[Chart: Virtual Machines 14, Containers 35]
31.
…but not without some pain
“No gains without pains.” - Benjamin Franklin, Poor Richard's Almanack
32.
IP Management
Challenge: YARN does not manage IP addresses directly.
What we did:
• Allocate a pool of IP addresses to the cluster on the same VLAN.
• Use Docker’s bridge networking with the fixed-cidr option.
• Each node in the cluster is allocated 64 IP addresses from the pool.
• Use docker inspect to get the container IP address and add it to the YARN Service Registry.
• YARN Registry DNS Server registers DNS records for easy lookup.
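The per-node allocation described above amounts to carving the cluster's pool into fixed-size subnets, where 64 addresses correspond to an IPv4 /26. A minimal sketch using Python's standard `ipaddress` module; the pool `10.0.0.0/22`, the node names, and the function name are hypothetical and were not given in the talk:

```python
import ipaddress


def allocate_node_subnets(pool_cidr, nodes, addresses_per_node=64):
    """Assign each node one equally sized subnet carved from the pool."""
    if addresses_per_node & (addresses_per_node - 1):
        raise ValueError("addresses_per_node must be a power of two")
    pool = ipaddress.ip_network(pool_cidr)
    # 64 addresses -> 6 host bits -> prefix length 32 - 6 = 26 for IPv4.
    prefix = pool.max_prefixlen - (addresses_per_node.bit_length() - 1)
    subnets = list(pool.subnets(new_prefix=prefix))
    if len(nodes) > len(subnets):
        raise ValueError("pool too small for the number of nodes")
    return {node: str(subnet) for node, subnet in zip(nodes, subnets)}


if __name__ == "__main__":
    plan = allocate_node_subnets("10.0.0.0/22", ["node1", "node2", "node3"])
    for node, cidr in plan.items():
        # Each CIDR could then be supplied as that node's Docker bridge range.
        print(node, cidr)
```

Giving every node a disjoint range keeps container IPs unique across the cluster without a central allocator, at the cost of capping each node at a fixed number of addresses.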
33.
Docker Storage Driver and Filesystem
Challenge: There are many Docker storage drivers, each with limitations. What is the best option for us?
What we did:
• Extensive testing of create, stop, and delete operation timings.
• Eliminate options that require significant modifications to the Linux OS to ease adoption in enterprises.
• If possible, use a driver deemed production ready by maintainers.
• Ultimately, we landed on DeviceMapper LVM thinpool with ext4 backing filesystem ... or so we thought.
34.
DeviceMapper kernel oops and performance
Challenge: Heavy writes to the container’s root filesystem cause kernel panics, uninterruptible processes, and high IO wait.
What we did:
• DeviceMapper is the only viable option for our workloads, so we can’t switch to a different storage driver.
• Install SSDs and configure Docker’s graph storage to use them to eliminate high IO wait.
• Test various RAID controller firmware, Linux kernels, and backing filesystems to find a stable combination that doesn’t result in panics and Docker hangs (most recently testing an upgrade to CentOS 7.4 with the 4.15 kernel, so we could look at overlay2 now).
35.
User Namespacing
Challenge: YARN provides features for localized resources and log aggregation. Running containers as the submitting user presents challenges.
What we did:
• Run all containers as the nobody user so that user and application directories configured by YARN are available to the container.
• Update images so that nobody UID/GID match in image and host.
• Allow for “vanilla containers” that do not bind mount the YARN directories, allowing the process in the container to run as any user, but logs can no longer be aggregated (Docker run override disabled).
• User namespacing in Docker is lacking, as it can only remap a single user. As this restriction is removed, we expect to migrate to this feature. Recent improvements allow setting UID:GID pair, but further support is needed.
36.
Image Management
Challenge: The implicit pull of a large image can lead to task timeouts. SSD space is at a premium, making image cleanup important.
What we did:
• Run an internal private registry to reduce WAN load.
• Jenkins job that builds and distributes the image to all nodes in the cluster, avoiding the implicit pull from “docker run”.
• Jenkins job to clean up images that are no longer needed. A full thinpool can also cause kernel panics, so aggressively age off images.
• Reuse base images where possible to reduce bandwidth.
• Work being discussed to provide first-class YARN support for image management (YARN-3854).
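The age-off step in the cleanup job can be sketched as a pure selection function. This is an illustration only, not the Jenkins job from the talk: the 7-day threshold, the image names, and the idea of tracking a last-used timestamp per image are all assumptions.

```python
from datetime import datetime, timedelta


def images_to_remove(last_used, in_use, now, max_age_days=7):
    """Pick images unused since the cutoff that no container currently needs."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        image
        for image, used_at in last_used.items()
        # Never select an image that is still in use, however old it is.
        if used_at < cutoff and image not in in_use
    )


if __name__ == "__main__":
    now = datetime(2018, 4, 18)
    last_used = {
        "hdp-test:old": datetime(2018, 4, 1),
        "hdp-test:current": datetime(2018, 4, 17),
        "base-os:7": datetime(2018, 3, 1),
    }
    for image in images_to_remove(last_used, in_use={"base-os:7"}, now=now):
        print(image)
```

Separating the selection from the actual deletion makes the policy easy to test and to tighten when thinpool space runs low.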
37.
Summary
38.
Summary
• Massive density improvements.
• Greatly improved ease of use.
• Many real world lessons learned.
• Widespread internal adoption.
• Improved self service capabilities.
• Internal use of long running services a reality!
39.
Get Involved
• Still plenty of work to go!
  • Improve Docker image management and user handling.
  • Networking plugins.
  • Security/permissions models.
  • Bring Your Own Image challenges.
• Follow Along.
• Try out Apache Hadoop 3.1.0 or check out trunk and build it! Ansible/vagrant setups available.
40.
Questions?
billie@hortonworks.com
skumpf@hortonworks.com