Lessons Learned Running a Container Cloud on Apache Hadoop YARN
Billie Rinaldi
Talk given at DataWorks Summit Berlin 2018 on April 18, 2018.
1.
Lessons Learned Running a Container Cloud on Apache Hadoop YARN
Billie Rinaldi, Software Engineering, YARN R&D - Hortonworks
© Hortonworks Inc. 2011 – 2018. All Rights Reserved
Hadoop, YARN, HDFS, Ambari, Ranger, Atlas, and Apache are trademarks of the Apache Software Foundation
2.
Overview:
• Introduction
• Building Blocks for the Container Cloud
• Lessons Learned
• Q&A
3.
Disclaimer
• Some features discussed today are still experimental and/or under current development.
• Security is a work in progress and a security assessment should be performed before implementing these features.
• Many features are released in Apache Hadoop 3.1.0 in April 2018, but some will not come out until 3.2.0 or 3.1.1.
4.
Introduction
5.
We Build, Test, and Release Open Source Software
• The rapid pace of open source results in
  – dozens of releases a year
  – tens of thousands of tests per release.
• Many permutations of tests
  – Over a dozen supported Linux operating systems.
  – Multiple backend databases.
  – Nearly thirty open source products in our stack.
  – Multiple supported versions of HDP per release.
6.
Addressing the Challenges
• How is the industry addressing these challenges? What are customers asking for?
• How can we reduce overhead to achieve greater density and improve hardware utilization?
• How can we improve the speed at which tests run?
• How can we reuse packaging and automation?
7.
Solution: Container Cloud
• Containers eliminate the bulk of the virtualization overhead, improving density per node.
• Containers help reduce image variance through composition and simplified packaging.
• Container startup is fast; there is no real boot sequence.
• Containers naturally fit into YARN and its container model.
• Containers allow us to “use what we ship and ship what we use.”
8.
Container Cloud Architecture
[Architecture diagram. Caption: Testing HDP and HDF releases in container clusters. Labels: Jenkins; SubmitTest; LaunchTest; Worker (Docker) containers; HDP (Docker) containers. Shared Services: Resource Management (YARN); Management and Monitoring (Ambari); Storage (HDFS); Service Discovery and REST API (YARN Services); Security and Governance (Ranger and Atlas).]
9.
Two years later … 5.8+ million containers and many lessons learned.
10.
Building Blocks for the Container Cloud
11.
Building Blocks for the Container Cloud
• YARN Docker Support – Enables additional container types to make it easier to onboard new applications and services on YARN.
• YARN Services Framework – Provides AM implementation and NM improvements that enable long running services on YARN.
• YARN Service Discovery – Allows services running on YARN to discover one another.
12.
Adding Docker on YARN
• Why Docker?
  – Provides a lightweight mechanism for packaging, distributing, and isolating processes.
  – Currently the most popular containerization framework.
  – Allows YARN developers to focus on integration instead of container primitives.
  – Mostly fits into the YARN Container model.
  – Buzzword compliant.
13.
YARN Containers
• What is a YARN container?
  – Process.
  – Local Resources (scripts, jars, security tokens).
  – Resource Requirements (CPU, Memory, I/O).
• AM requests containers.
• RM allocates containers on NMs.
• NM runs containers.
• Container Executor encapsulates platform-specific logic needed to start a YARN container.
Image: https://www.pepperfry.com/tupperware-mini-rectangular-white-container-850ml-1109991.html
14.
New Abstraction: Container Runtimes
• Challenge: The Container Executor approach is cluster wide; we need more flexibility while still being able to leverage existing Container Executor features.
• Solution: Runtimes added to LinuxContainerExecutor (YARN-3611; initially released in Apache Hadoop 2.8.0, with improvements ongoing).
  – DefaultLinuxContainerRuntime: Existing Linux process-based execution.
  – DockerLinuxContainerRuntime: Using Docker to run and monitor a container.
15.
Distributed Shell and MR on Docker Examples
Environment variables are currently used to set the Container Runtime options.*
*WARNING: might change.
Image: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTaGXMdZCdR6_RUC235TdafDqURxk-KJIptwALUmg5ZmCb3YBW7
MapReduce:
> yarn jar $YARN_EXAMPLES_JAR pi -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" 1 40000
Distributed Shell:
> yarn jar $DSHELL_JAR -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7 -shell_command "sleep 120" -jar $DSHELL_JAR -num_containers 1
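The MapReduce command above passes the same two runtime variables to both map and reduce tasks as comma-separated KEY=VALUE pairs. As a minimal sketch of how such arguments could be assembled programmatically, the Python below builds them from a dictionary. The helper names runtime_env and mapreduce_env_args are hypothetical; only the environment variable names and the mapreduce.map.env / mapreduce.reduce.env properties come from the slide.

```python
# Sketch only: helper names are hypothetical. The variable names and the
# mapreduce.{map,reduce}.env properties are the ones shown on the slide.


def runtime_env(image: str) -> dict[str, str]:
    """Environment variables that select the Docker runtime for a container."""
    return {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }


def mapreduce_env_args(image: str) -> list[str]:
    """Format the variables as comma-separated KEY=VALUE pairs for map and reduce tasks."""
    pairs = ",".join(f"{key}={value}" for key, value in runtime_env(image).items())
    return [f"-Dmapreduce.map.env={pairs}", f"-Dmapreduce.reduce.env={pairs}"]


if __name__ == "__main__":
    for arg in mapreduce_env_args("centos:7"):
        print(arg)
```

Since the slide warns that this mechanism might change, keeping the variable names in one place like this makes a later switch cheaper.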
16.
Recent Improvements: Container Lifecycle
• Improvements to stopping and cleaning up of containers (YARN-5366)
• Improvements to handling short-lived containers (YARN-5366, YARN-7914)
• Container relaunch improvements to reuse existing container (YARN-7973)
  – Data in the container’s root filesystem and workdir can be recovered on the same node
• Support for sending specific signals to the container’s root process (YARN-5366)
• Delayed deletion for debugging (YARN-5366)
17.
Recent Improvements: Container Security
• ACLs for privileged containers, with the ability to disable privileged containers system wide (YARN-6623)
• Sudo / group check for running privileged containers (YARN-7221)
• Default untrusted mode for running unmodified images out of the box (YARN-7516)
• Username to UID/GID mapping to ensure privacy (YARN-4266)
• User supplied bind mounts validated against an admin supplied whitelist (YARN-5534)
• More restrictive YARN mounts to limit host exposure (YARN-7815)
18.
Building Blocks for the Container Cloud
• YARN Docker Support – Enables additional container types to make it easier to onboard new applications and services on YARN.
• YARN Services Framework – Provides AM implementation and NM improvements that enable long running services on YARN.
• YARN Service Discovery – Allows services running on YARN to discover one another.
19.
YARN Services Goals
• Long Running – Simplify the deployment and management of long running applications on YARN.
• Easily Bring New Applications – Remove the tedious process of bringing new applications to YARN.
• Easy to Manage Applications – REST API and Command Line tools.
• Declarative Configuration – Provide configuration to the applications, declare resource needs, specify placement policies.
20.
YARN Services Overview
• Apache Slider – incubating at Apache since 2014, designed to make it easier to run long-running applications on YARN.
• Simplified and first-class support for services in YARN (YARN-4692) – initiated in 2016.
• Container orchestrator to provision Docker-based or native-process based containers (YARN-5079), integrates Slider core into YARN.
• REST API for managing services on YARN (YARN-4793).
• Simplified discovery of services via DNS mechanisms (YARN-4757).
• Released in Apache Hadoop 3.1.0!
21.
YARN Services Docker Httpd Example
{
  "name": "simple-httpd-service",
  "version": "1.0.0",
  "lifetime": "3600",
  "components": [
    {
      "name": "httpd",
      "number_of_containers": 2,
      "launch_command": "/usr/bin/run-httpd",
      "artifact": {
        "id": "centos/httpd-24-centos7:latest",
        "type": "DOCKER"
      },
      "resource": {
        "cpus": 1,
        "memory": "1024"
      },
      ...
> yarn app -launch simple-httpd-service simple-httpd-service.json
22.
YARN Services Docker Httpd Example continued
"readiness_check": {
  "type": "HTTP",
  "properties": {
    "url": "http://${THIS_HOST}:8080"
  }
},
"configuration": {
  "files": [
    {
      "type": "TEMPLATE",
      "dest_file": "/var/www/html/index.html",
      "properties": {
        "content": "<html><body>Hello from ${COMPONENT_INSTANCE_NAME}!</body></html>"
      }
    }
  ]
}
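The two slides above show the httpd service specification in pieces. The sketch below assembles those pieces into one Python dictionary and serializes it, which can be handy when specs are generated rather than hand-written. The function name httpd_service_spec is hypothetical, and placing readiness_check and configuration inside the httpd component is an assumption, because the slides elide the surrounding nesting.

```python
import json


def httpd_service_spec(containers: int = 2) -> dict:
    """Assemble the fields shown on the slides into a single specification."""
    component = {
        "name": "httpd",
        "number_of_containers": containers,
        "launch_command": "/usr/bin/run-httpd",
        "artifact": {"id": "centos/httpd-24-centos7:latest", "type": "DOCKER"},
        "resource": {"cpus": 1, "memory": "1024"},
        # Assumption: these two blocks sit at component level; the slides
        # do not show where they are nested.
        "readiness_check": {
            "type": "HTTP",
            "properties": {"url": "http://${THIS_HOST}:8080"},
        },
        "configuration": {
            "files": [
                {
                    "type": "TEMPLATE",
                    "dest_file": "/var/www/html/index.html",
                    "properties": {
                        "content": "<html><body>Hello from "
                        "${COMPONENT_INSTANCE_NAME}!</body></html>"
                    },
                }
            ]
        },
    }
    return {
        "name": "simple-httpd-service",
        "version": "1.0.0",
        "lifetime": "3600",
        "components": [component],
    }


if __name__ == "__main__":
    # Write the result to simple-httpd-service.json before launching it.
    print(json.dumps(httpd_service_spec(), indent=2))
```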
23.
Service assembly
{
  "name": "httpd-proxy-service",
  "version": "1.0.0",
  "components": [
    {
      "artifact": {
        "id": "simple-httpd-service",
        "type": "SERVICE"
      }
    },
    {
      "name": "httpd-proxy",
      "number_of_containers": 1,
      "dependencies": [ "httpd" ],
      "artifact": {
        "id": "centos/httpd-24-centos7",
        "type": "DOCKER"
      },
      ...
> yarn app -save simple-httpd-service simple-httpd-service.json
> yarn app -launch httpd-proxy-service httpd-proxy-service.json
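In the assembly above, httpd-proxy lists httpd under "dependencies", so it should only start once httpd is ready. The sketch below illustrates the start order that such declarations imply, using a topological sort from the Python standard library. It is an illustration of the ordering only, not how the YARN service AM is implemented, and the function name start_order is hypothetical.

```python
from graphlib import TopologicalSorter


def start_order(dependencies: dict[str, list[str]]) -> list[str]:
    """Return component names so that each one follows all of its dependencies."""
    # TopologicalSorter expects a mapping from a node to its predecessors,
    # which matches the shape of the "dependencies" field.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    components = {"httpd": [], "httpd-proxy": ["httpd"]}
    print(start_order(components))
```

A cyclic set of dependencies raises graphlib.CycleError, which is a convenient early check before submitting a spec.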
24.
Other features in progress
• Container upgrade
  – Localize new resource while container is running (YARN-1503), portions in 2.9.0
  – Restart container with new resources using same container allocation (YARN-4726)
  – Support in service AM and service REST API (YARN-7512), slated for release 3.2.0
• Placement policy support in YARN services (YARN-7142)
• User supplied Docker client configs in YARN services (YARN-7996)
• Entrypoint support (YARN-7654)
• And many more!
25.
Building Blocks for the Container Cloud
• YARN Docker Support – Enables additional container types to make it easier to onboard new applications and services on YARN.
• YARN Services Framework – Provides AM implementation and NM improvements that enable long running services on YARN.
• YARN Service Discovery – Allows services running on YARN to discover one another.
26.
YARN Service Registry
• The YARN Service Registry allows deployed applications to register themselves so that other applications can discover them (YARN-913).
• Entries are stored in ZooKeeper as the default k/v store, providing HA and consistency.
• Native Java clients, REST, and CLI interfaces exist for accessing the YARN Service Registry.
Image: http://api-university.com/blog/let-developers-try-your-apis-without-registration/
27.
Simplified Discovery via DNS
Challenge: Native Java clients and REST interfaces are not ideal discovery mechanisms for existing applications.
Solution: Exposing YARN Service Registry entries via a more generic and widely used discovery mechanism: DNS.
The YARN Registry DNS server (YARN-4757) meets these needs.
• Watches the YARN Service Registry for new application and container registration/deregistration.
• Creates the appropriate DNS records for the container
  – componentInstanceName.serviceName.user.domain
  – ctr-e138-1518143905142-215498-01-000007.domain
• Supports zone transfers or zone forwarding.
Image: http://www.idownloadblog.com/2016/03/05/how-to-use-custom-dns-settings/
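The two record shapes listed above can be sketched as simple string builders. The function names, the user alice, the instance name httpd-0, and the domain example.com are all hypothetical. The container-ID conversion (container_ prefix shortened to ctr-, underscores turned into hyphens) is inferred from the shape of the example record on the slide and is an assumption, not a documented rule.

```python
def component_record(instance: str, service: str, user: str, domain: str) -> str:
    """Build a name of the form componentInstanceName.serviceName.user.domain."""
    return ".".join([instance, service, user, domain])


def container_record(container_id: str, domain: str) -> str:
    """Build a name of the form ctr-<...>.domain from a YARN container ID.

    Assumption: the mapping below is inferred from the slide's example.
    """
    host = container_id.replace("container_", "ctr-", 1).replace("_", "-")
    return f"{host}.{domain}"


if __name__ == "__main__":
    print(component_record("httpd-0", "simple-httpd-service", "alice", "example.com"))
    print(container_record("container_e138_1518143905142_215498_01_000007", "example.com"))
```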
28.
Lessons Learned
29.
Success!
• 5.8+ million containers, 1.1+ million tests
• Huge uptick in adoption
30.
Successes continued
• First full HDP release tested and certified end to end on the container cloud!
• All supported operating systems (CentOS 6/7, SLES, Ubuntu 14/16, Debian) running in containers on CentOS 7.3 hosts!
• Density per node improved by 2.5x!
[Chart: Virtual Machines 14, Containers 35]
31.
…but not without some pain
“No gains without pains.” - Benjamin Franklin, Poor Richard's Almanack
32.
IP Management
Challenge: YARN does not manage IP addresses directly.
What we did:
• Allocate a pool of IP addresses to the cluster on the same VLAN.
• Use Docker’s bridge networking with fixed_cidr option.
• Each node in the cluster is allocated 64 IP addresses from the pool.
• Use docker inspect to get the container IP address and add it to the YARN Service Registry.
• YARN Registry DNS Server registers DNS records for easy lookup.
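Giving each node 64 addresses from a shared pool amounts to carving the pool into /26 blocks, one per node, which each node's Docker bridge can then be restricted to. The sketch below shows that arithmetic with the Python standard library. The pool 10.0.0.0/22, the node names, and the function name allocate_blocks are hypothetical; the slides do not say how the blocks were actually assigned.

```python
import ipaddress


def allocate_blocks(pool: str, nodes: list[str], addresses_per_node: int = 64) -> dict:
    """Assign each node one equally sized block of addresses from the pool."""
    if addresses_per_node <= 0 or addresses_per_node & (addresses_per_node - 1):
        raise ValueError("addresses_per_node must be a power of two")
    network = ipaddress.ip_network(pool)
    # 64 addresses need 6 host bits, so an IPv4 block is a /26.
    host_bits = addresses_per_node.bit_length() - 1
    blocks = list(network.subnets(new_prefix=network.max_prefixlen - host_bits))
    if len(nodes) > len(blocks):
        raise ValueError("pool is too small for the number of nodes")
    return dict(zip(nodes, blocks))


if __name__ == "__main__":
    for node, block in allocate_blocks("10.0.0.0/22", ["node1", "node2", "node3"]).items():
        print(node, block)
```

A /22 pool yields 16 such blocks, so the pool size bounds the number of nodes that can join the cluster under this scheme.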
33.
Docker Storage Driver and Filesystem
Challenge: Many Docker Storage drivers, lots of limitations. What is the best option for us?
What we did:
• Extensive testing of create, stop, and delete operation timings.
• Eliminate options that require significant modifications to the Linux OS to ease adoption in enterprises.
• If possible, use a driver deemed production ready by maintainers.
• Ultimately, we landed on DeviceMapper LVM thinpool with ext4 backing filesystem ... or so we thought.
34.
DeviceMapper kernel oops and performance
Challenge: Heavy writes to the container’s root filesystem cause kernel panics, uninterruptible processes, and high IO wait.
What we did:
• DeviceMapper is the only viable option for our workloads, so we can’t switch to a different storage driver.
• Install SSDs and configure Docker’s graph storage to use them to eliminate high IO wait.
• Test various RAID controller firmware, Linux kernels, and backing filesystems to find a stable combination that doesn’t result in panics and Docker hangs (most recently testing an upgrade to CentOS 7.4 with the 4.15 kernel, so we could look at overlay2 now).
35.
User Namespacing
Challenge: YARN provides features for localized resources and log aggregation. Running containers as the submitting user presents challenges.
What we did:
• Run all containers as the nobody user so that user and application directories configured by YARN are available to the container.
• Update images so that the nobody UID/GID match in image and host.
• Allow for “vanilla containers” that do not bind mount the YARN directories, allowing the process in the container to run as any user, but logs can no longer be aggregated (Docker run override disabled).
• User namespacing in Docker is lacking, as it can only remap a single user. As this restriction is removed, we expect to migrate to this feature. Recent improvements allow setting a UID:GID pair, but further support is needed.
36.
Image Management
Challenge: The implicit pull of a large image can lead to task timeouts. SSD space is at a premium, making image cleanup important.
What we did:
• Run an internal private registry to reduce WAN load.
• Jenkins job that builds and distributes the image to all nodes in the cluster, avoiding the implicit pull from “docker run”.
• Jenkins job to clean up images that are no longer needed. A full thinpool can also cause kernel panics, so aggressively age off images.
• Reuse base images where possible to reduce bandwidth.
• Work being discussed to provide first-class YARN support for image management (YARN-3854).
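The slides do not describe the logic of the cleanup job, so the following is only a sketch of one possible age-off policy: remove any image that no running container uses and that has not been used for longer than a cutoff. The function name, the image tags, and the seven-day cutoff are hypothetical.

```python
from datetime import datetime, timedelta


def images_to_remove(
    last_used: dict[str, datetime],
    in_use: set[str],
    now: datetime,
    max_age: timedelta,
) -> list[str]:
    """Select images that are idle and older than max_age, in a stable order."""
    return sorted(
        image
        for image, used in last_used.items()
        if image not in in_use and now - used > max_age
    )


if __name__ == "__main__":
    now = datetime(2018, 4, 18)
    last_used = {
        "registry.example.com/hdp:old": now - timedelta(days=30),
        "registry.example.com/hdp:current": now - timedelta(days=1),
        "registry.example.com/base:centos7": now - timedelta(days=60),
    }
    # The base image is old but still referenced by a running container.
    running = {"registry.example.com/base:centos7"}
    print(images_to_remove(last_used, running, now, timedelta(days=7)))
```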
37.
Summary
38.
Summary
• Massive density improvements.
• Greatly improved ease of use.
• Many real world lessons learned.
• Widespread internal adoption.
• Improved self service capabilities.
• Internal use of long running services a reality!
39.
Get Involved
• Still plenty of work to go!
  – Improve Docker image management and user handling.
  – Networking plugins.
  – Security/permissions models.
  – Bring Your Own Image challenges.
• Follow along.
• Try out Apache Hadoop 3.1.0 or check out trunk and build it! Ansible/Vagrant setups available.
40.
Questions?
billie@hortonworks.com
skumpf@hortonworks.com