
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps on YARN

Apache Hadoop YARN is a modern resource-management platform that handles resource scheduling, isolation and multi-tenancy for a variety of data processing engines that can co-exist and share a single data-center in a cost-effective manner.

In the first half of the talk, we are going to give a brief look into some of the big efforts cooking in the Apache Hadoop YARN community.

We will then dig deeper into one of these efforts: supporting the Docker runtime in YARN. Docker is an application container engine that enables developers and sysadmins to build, deploy and run containerized applications. In this half, we'll discuss container runtimes in YARN, with a focus on using the DockerContainerRuntime to run various Docker applications under YARN. Support for container runtimes (including the Docker container runtime) was recently added to the Linux Container Executor (YARN-3611 and its sub-tasks). We'll walk through various aspects of running Docker containers under YARN: resource isolation, some security aspects (for example container capabilities, privileged containers and user namespaces), and work-in-progress features such as image localization and support for different networking modes.

Speakers:

Vinod Kumar Vavilapalli is the Hadoop YARN and MapReduce guy at Hortonworks. He is a long-term Hadoop contributor at Apache, a Hadoop committer and a member of the Apache Hadoop PMC. He has a Bachelor's degree in Computer Science and Engineering from the Indian Institute of Technology Roorkee. He has been working on Hadoop for nearly 9 years and he still has fun doing it. Straight out of college, he joined the Hadoop team at Yahoo! Bangalore, before Hortonworks happened. He is passionate about using computers to change the world for the better, bit by bit.

Sidharta Seethana is a software engineer at Hortonworks. He works on the YARN team, focusing on bringing new kinds of workloads to YARN. Prior to joining Hortonworks, Sidharta spent 10 years at Yahoo! Inc., working on a variety of large-scale distributed systems for core platforms/web services, search and marketplace properties, developer network and personalization.

Published in: Technology

  1. The latest of Apache Hadoop YARN & running your docker apps on YARN
     52nd Bay Area Hadoop User Group (HUG) Meetup
     Sidharta Seethana, Vinod Kumar Vavilapalli
     © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  2. The latest of Apache Hadoop YARN
     • 2.7.3 / 2.6.4 / 2.8.0 ..
     • Timeline Service V2: What is happening in the cluster?
     • Federation: Scale to 100K nodes!
     • Over subscription: Get more of your hardware
     • Better SLAs, preemption enhancements
     • New web UI
     • First class services
     • Generalized scheduling strategies
     • Docker runtime!
  3. YARN and the Docker Container Runtime
  4. Context
     ⬢ LinuxContainerExecutor
     ⬢ Why docker with YARN?
  5. Container Runtimes in LinuxContainerExecutor
     ⬢ Support for container runtimes in YARN was added as part of YARN-3611 (umbrella). Multiple container types are supported in the same executor.
     ⬢ The current mechanism of handling container lifecycle is moved into its own runtime
     ⬢ A new docker container runtime is introduced that manages docker containers
     ⬢ LinuxContainerExecutor can delegate to either runtime on a per-application basis
     ⬢ Clients specify which container type they want to use, currently via environment variables but eventually through well-defined client APIs.
     ⬢ We could support more container types in the future.
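As a rough illustration of selecting a container type through environment variables, the sketch below builds a launch environment that asks for the Docker runtime. The variable names `YARN_CONTAINER_RUNTIME_TYPE` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE` do not appear in the slides; they are taken from the Apache Hadoop Docker runtime documentation and should be checked against the Hadoop version in use. The helper name and the image name are hypothetical.

```python
def docker_launch_env(image, base_env=None):
    """Return a copy of base_env that requests the Docker container runtime.

    The two variable names are assumptions drawn from Hadoop's documentation,
    not from the slides; verify them for your Hadoop release.
    """
    env = dict(base_env or {})  # copy, so the caller's mapping is not mutated
    env["YARN_CONTAINER_RUNTIME_TYPE"] = "docker"
    env["YARN_CONTAINER_RUNTIME_DOCKER_IMAGE"] = image
    return env


# Hypothetical image name, for illustration only.
launch_env = docker_launch_env("example/centos-app:1.0", {"JAVA_HOME": "/usr/java"})
```

The resulting mapping would be passed as the container launch environment by whatever client submits the application.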
  6. DockerContainerRuntime in LinuxContainerExecutor
     ⬢ Exposes a subset of Docker container lifecycle functionality
     ⬢ Docker v1.10.x required for some of the work being planned (e.g. user namespaces)
     ⬢ A recent Linux kernel is required (3.10+) for basic functionality. Some features (e.g. overlay fs) require an even more recent kernel.
  7. DockerContainerRuntime: Resource Isolation
     ⬢ Support added in YARN-4553
     ⬢ LinuxContainerExecutor still manages resource isolation and enforcement.
     ⬢ Docker uses the cgroup specified by LCE (--cgroup-parent was introduced in Docker v1.6; net_cls support was added to libcontainer recently, with support added in v1.9)
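A minimal sketch of how a cgroup chosen by LCE could be handed to Docker through the `--cgroup-parent` flag of `docker run`. The path layout (a hierarchy name followed by the container id) and the example values are assumptions for illustration; the slides do not spell out the exact path YARN uses.

```python
def cgroup_parent_args(hierarchy, container_id):
    """Build the `docker run` argument that places the container under a given cgroup.

    `hierarchy` and `container_id` are hypothetical inputs; the real path
    is whatever LinuxContainerExecutor created for the container.
    """
    if "/" in container_id:
        raise ValueError("container id must not contain '/'")
    parent = "/" + hierarchy.strip("/") + "/" + container_id
    return ["--cgroup-parent=" + parent]


# Hypothetical hierarchy and container id.
args = cgroup_parent_args("/hadoop-yarn", "container_0001_01_000002")
```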
  8. DockerContainerRuntime: Linux Capabilities
     ⬢ Support added in YARN-4258
     ⬢ Based on Linux capabilities
     ⬢ Admin controlled: the cluster administrator can control which capabilities docker containers have on the cluster (yarn.nodemanager.runtime.linux.docker.capabilities)
     ⬢ Default set is based on what docker uses by default.
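One plausible way to turn an admin-configured capability list into `docker run` flags is to drop every capability and add back only the configured ones. The slides do not say how the runtime performs this translation, so the sketch below is an assumption; it also assumes the property value is a comma-separated list. `--cap-drop` and `--cap-add` are standard `docker run` flags.

```python
def capability_args(configured):
    """Translate a comma-separated capability list into `docker run` flags.

    Assumed strategy: drop all capabilities, then re-add the admin's set.
    """
    caps = [c.strip().upper() for c in configured.split(",") if c.strip()]
    args = ["--cap-drop=ALL"]
    args += ["--cap-add=" + cap for cap in caps]
    return args


# Hypothetical value for yarn.nodemanager.runtime.linux.docker.capabilities.
flags = capability_args("CHOWN, SETUID, NET_BIND_SERVICE")
```

With this approach a container never gains a capability the administrator did not list, regardless of Docker's own defaults.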
  9. DockerContainerRuntime: Privileged Containers
     ⬢ Support added in YARN-4262
     ⬢ Allows certain applications to run in docker containers, e.g. Oracle
     ⬢ This could be a security hazard, so access to this needs to be controlled:
       – Disabled by default (yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed)
       – Admin-controlled whitelist (yarn.nodemanager.runtime.linux.docker.privileged-containers.acl)
       – Users on this whitelist are allowed to launch privileged containers, but they must explicitly request it (YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER)
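The three checks for privileged containers (explicit request, cluster-wide switch, whitelist) can be sketched as a single decision function. This is an illustration of the policy described on the slide, not YARN's implementation: the ACL is simplified to a set of user names, and treating the value "true" as the way to request a privileged container is an assumption.

```python
PRIVILEGED_ENV = "YARN_CONTAINER_RUNTIME_DOCKER_RUN_PRIVILEGED_CONTAINER"


def privileged_args(user, env, allowed=False, acl=frozenset()):
    """Return ['--privileged'] if the request is permitted, [] if none was made.

    allowed: stands in for ...privileged-containers.allowed (off by default).
    acl:     simplified stand-in for ...privileged-containers.acl (user names only).
    """
    # Users must opt in explicitly; the "true" value is an assumption.
    if env.get(PRIVILEGED_ENV, "false").lower() != "true":
        return []
    if not allowed:
        raise PermissionError("privileged containers are disabled on this cluster")
    if user not in acl:
        raise PermissionError("user %r may not launch privileged containers" % user)
    return ["--privileged"]
```

Rejecting a disallowed request outright, rather than silently launching an unprivileged container, is a design choice of this sketch; it avoids an application starting without the access it asked for.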
  10. DockerContainerRuntime: Users
     ⬢ Docker runs the container as the specified user (-u)
     ⬢ This user needs to be available in the image being used.
     ⬢ Depending on capabilities (CAP_SETUID), privilege escalation could occur.
     ⬢ Docker v1.10 added support for user namespaces, which requires daemon re-configuration. Support for this in DockerContainerRuntime needs to be planned.
  11. DockerContainerRuntime: Networking
     ⬢ Defaults to --net=host in the docker container runtime.
       – This is not secure, but this is the only way some applications can run.
       – We need to switch the default to bridged mode or an admin specified network plugin
     ⬢ Network plug-in support was added in v1.9.
       – This should allow for more sophisticated networking scenarios
       – YARN doesn’t have to do anything except delegate to the specified network plugin when launching the container.
       – Support for this is a work in progress (YARN-4007)
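A small sketch of what choosing the network mode could look like once the default is configurable. The idea of an admin-supplied list of permitted networks and all parameter names are assumptions; the slides only state that the runtime currently defaults to `--net=host` and that plugin support is in progress. `--net` is the `docker run` flag named on the slide.

```python
def network_args(requested=None, allowed=("host", "bridge"), default="host"):
    """Pick the `docker run` network flag for a container.

    requested: network asked for by the application, if any (hypothetical).
    allowed:   networks the administrator permits (hypothetical setting).
    default:   used when nothing is requested; "host" mirrors the current default.
    """
    network = requested if requested else default
    if network not in allowed:
        raise ValueError("network %r is not permitted on this cluster" % network)
    return ["--net=" + network]
```

Switching the cluster default to bridged mode would then amount to passing `default="bridge"`, with YARN delegating everything else to Docker or the named plugin.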
  12. DockerContainerRuntime: Images
     ⬢ Localization via HDFS? We could localize images from HDFS and load them using ‘docker load’.
     ⬢ This approach has the advantage of using an existing HDFS instance for storage/distribution at scale.
     ⬢ However:
       – We lose some of the optimizations/functionality that using a full-fledged docker registry might provide.
       – We have to figure out security implications. What if users clobber each other's images when ‘loading’?
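The HDFS-based localization idea could be sketched as two commands: copy the image archive out of HDFS, then hand it to `docker load`. This is only an illustration of the proposal on the slide, which is still an open question there; the paths are hypothetical and the function only builds the command lines without running them. It does nothing to address the clobbering concern raised above.

```python
import posixpath


def image_load_commands(hdfs_path, local_dir):
    """Return the commands that would localize an image tarball and load it.

    hdfs_path: location of a `docker save` archive in HDFS (hypothetical).
    local_dir: node-local scratch directory (hypothetical).
    """
    local_tar = posixpath.join(local_dir, posixpath.basename(hdfs_path))
    return [
        ["hdfs", "dfs", "-get", hdfs_path, local_tar],  # copy archive to local disk
        ["docker", "load", "-i", local_tar],            # import it into the local daemon
    ]


commands = image_load_commands("hdfs:///images/app.tar", "/tmp/yarn-images")
```

Each command list could be passed to `subprocess.run(cmd, check=True)` on a node that has both the HDFS client and Docker installed.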
  13. Demos
     ⬢ Spark on Docker
     ⬢ YARN on YARN
  14. Q&A
