
Running Services on YARN


Apache Hadoop YARN is a modern resource-management platform that can host multiple data processing engines for various workloads, such as batch processing (MapReduce), interactive SQL (Hive, Tez), real-time processing (Storm), existing services, and a wide variety of custom applications. These applications can all co-exist on YARN and share a single data center in a cost-effective manner, with the platform taking care of resource management, isolation, and multi-tenancy.

YARN is now adding first-class support for services. This talk will first cover the challenges of running services on YARN, then the changes made to the ResourceManager to support scheduling services (such as affinity and anti-affinity). It will then cover the changes made in the NodeManager, including features such as container restart and container upgrades, as well as new additions to YARN like the new application manager (which lets users bring services workloads onto YARN by providing features such as container orchestration and management) and the DNS server that uses the YARN registry to enable service discovery.


Running Services on YARN

  1. Running Services on YARN – Munich, April 2017 – Varun Vasudev
  2. About myself ⬢ Apache Hadoop contributor since 2014 ⬢ Apache Hadoop committer and PMC member ⬢ Currently working for Hortonworks ⬢ vvasudev@apache.org
  3. Introduction to Apache Hadoop YARN ⬢ Architectural center of big data workloads ⬢ Enterprise adoption – Secure mode is popular – Multi-tenant support ⬢ SLAs – Tolerance for slow-running jobs decreasing – Consistent performance desired ⬢ Diverse workloads increasing – LLAP on Slider
  4. Introduction to Apache Hadoop YARN [architecture diagram: YARN as the Data Operating System (cluster resource management) on top of HDFS, hosting batch, interactive & real-time data access engines – Script (Pig), SQL (Hive on Tez), Java/Scala (Cascading on Tez), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo on Slider), In-Memory (Spark), and other ISV engines] YARN, the architectural center of Hadoop: • Common data platform, many applications • Support multi-tenant access & processing • Batch, interactive & real-time use cases
  5. Several important trends in the age of Hadoop 3.0+ [diagram: YARN and other platform services – storage, resource management, security, service discovery, management/monitoring/alerts – hosting an IoT assembly (Kafka, Storm, HBase, Solr), governance, and frameworks such as MR, Tez, Spark; innovating frameworks: Flink, DL (TensorFlow), etc.; various environments: on premise, private cloud, public cloud]
  6. Service workloads becoming more popular ⬢ Users are running more and more long-running services like LLAP, HiveServer, HBase, etc. ⬢ Service workloads are gaining more importance – Need a webserver to serve results from an MR job – New YARN UI can be run in its own container – ATSv2 would like to launch ATS reader containers as well – Applications want to run their own shuffle service
  7. Application Lifecycle [diagram: launching Application 1 – the ResourceManager (active) receives container requests and allocates containers; on Node 1 (NodeManager, 128G, 16 vcores) the AM process and container processes are launched via a ContainerExecutor (DCE, LCE, WSCE), which monitors and isolates memory and CPU; History Server (ATS – leveldb, JHS – HDFS); log aggregation to HDFS]
  8. Application Lifecycle ⬢ Designed for batch jobs – Jobs run for hours, days – Jobs are using frameworks (like MR, Tez, Spark) which are aware of YARN – Container failure is bad but frameworks have logic to handle it • Local container state loss is handled – Jobs are chained/pipelined using application ids – Debugging is an offline event ⬢ Doesn't carry over cleanly for services – Services run for longer periods of time – Services may or may not be aware of YARN – Container loss is a bigger problem, can have really bad consequences – Services would like to discover other services – Debugging is an online event
  9. Enabling Services on YARN
  10. Enabling Services on YARN ⬢ AM to manage services ⬢ Service discovery ⬢ Container lifecycle ⬢ Scheduler changes ⬢ YARN UI ⬢ Application upgrades ⬢ Other issues – Log collection – Support for monitoring
  11. AM to manage services ⬢ Any service/job on YARN requires an AM – AMs are hard to write – Different services will re-implement the same functionality – AM has to keep up with changes in Apache Hadoop ⬢ Native YARN framework layer for services (YARN-5079) – Provides an AM that ships as part of Apache Hadoop and can be used to manage services – AM is from the Apache Slider project – AM provides REST APIs to manage applications – Has support for functionality such as port scheduling and flexing the number of containers – Maintained by the Apache Hadoop developers so it's kept up to date with the rest of YARN – New YARN REST APIs to launch services
  12. YARN REST API to launch services
    {
      "name": "vvasudev-druid-2017-03-16",
      "resource": {
        "cpus": 16,
        "memory": "51200"
      },
      "components": [
        {
          "name": "vvasudev-druid",
          "dependencies": [],
          "artifact": {
            "id": "druid-image:0.1.0.0-25",
            "type": "DOCKER"
          },
          "configuration": {
            "properties": {
              "env.CUSTOM_SERVICE_PROPERTIES": "true",
              "env.ZK_HOSTS": "zkhost1:2181,zkhost2:2181,zkhost3:2181"
            }
          }
        }
      ],
      "number_of_containers": 5,
      "launch_command": "/opt/druid/start-druid.sh",
      "queue": "yarn-services"
    }
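
To make the spec above concrete, here is a minimal sketch of submitting it over HTTP from Java. The host and port (rm-host:8088), the /app/v1/services path, and the druid-service.json filename are illustrative assumptions, not the confirmed endpoint of the YARN-5079 API.

    // Minimal sketch: POSTing a service spec like the one above to the
    // services REST API described on this slide. The endpoint path and host
    // are assumptions for illustration, not the confirmed YARN-5079 API.
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class SubmitServiceSpec {
        public static void main(String[] args) throws Exception {
            // Assumed endpoint; substitute whatever the cluster actually exposes.
            URL url = new URL("http://rm-host:8088/app/v1/services");
            byte[] spec = Files.readAllBytes(Paths.get("druid-service.json"));

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(spec);          // send the JSON spec as the request body
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }
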
  13. Service discovery ⬢ Long running services require a way to discover them – Application ids are constant for the lifetime of the application – Container ids are constant for the lifetime of the container, but containers will come up and go down ⬢ Add support for discovery of long running services using DNS and the Registry Service (YARN-4757) – DNS is well understood – The registry service keeps a record mapping the application to its DNS name – YARN has a DNS server, but currently this is for testing and experimentation only – YARN will need to add support for DNS updates to fit into existing DNS solutions
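
As an illustration of what DNS-based discovery buys a client, the sketch below resolves a service component with a plain DNS lookup. The <component-instance>.<service>.<user>.<domain> naming convention and the hostname used here are assumptions about how the registry-backed DNS server lays out its records, not something stated in the slides.

    // Sketch of service discovery via plain DNS once the registry-backed
    // DNS server (YARN-4757) is serving records. The record name assumes a
    // <component-instance>.<service>.<user>.<domain> convention.
    import java.net.InetAddress;

    public class ResolveService {
        public static void main(String[] args) throws Exception {
            // Hypothetical record for the first region server of an "hbase-app"
            // service launched by user "hadoop" in the ycluster.example.com zone.
            String name = "regionserver-0.hbase-app.hadoop.ycluster.example.com";
            InetAddress addr = InetAddress.getByName(name);
            System.out.println(name + " -> " + addr.getHostAddress());
        }
    }
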
  14. Service Discovery [diagram: a User resolving services through the DNS Server, which is backed by the Registry Service on a ZooKeeper ensemble; the ApplicationManager, ResourceManager, and NodeManagers make up the rest of the picture]
  15. Container lifecycle ⬢ When the container exits, the NodeManager (NM) reclaims all the resources immediately – NM also cleans up any local state that the container maintained ⬢ AM may or may not be able to get a container back on the same node – NM has to download any private resources again for the container, leading to delays in restarts ⬢ Added support for first-class container retries (YARN-4725) – AM can specify a retry policy when starting the container – On process exit, the NM will not clean up any state or resources – Instead it will attempt to retry the container – AM can specify limits on the number of retries as well as the delay between retries
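
A minimal sketch of how an AM could attach such a retry policy when launching a container, using the ContainerRetryContext record introduced by YARN-4725. The factory and setter signatures are reproduced from memory of that API, and the launch command is a placeholder; verify both against the Hadoop version in use.

    // Sketch: an AM attaching a retry policy to a container launch, using the
    // ContainerRetryContext introduced by YARN-4725. Check the exact
    // org.apache.hadoop.yarn.api.records signatures against the deployed release.
    import java.util.Collections;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.ContainerRetryContext;
    import org.apache.hadoop.yarn.api.records.ContainerRetryPolicy;

    public class RetryingLaunch {
        static ContainerLaunchContext buildLaunchContext() {
            ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
                    null, null,                                      // local resources, environment
                    Collections.singletonList("/opt/svc/start.sh"),  // placeholder launch command
                    null, null, null);                               // service data, tokens, ACLs
            // Retry on any non-zero exit code, at most 5 times, waiting 10s between tries.
            ContainerRetryContext retry = ContainerRetryContext.newInstance(
                    ContainerRetryPolicy.RETRY_ON_ALL_ERRORS, null, 5, 10_000);
            ctx.setContainerRetryContext(retry);
            return ctx;
        }
    }
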
  16. Container Lifecycle [diagram: a NodeManager running a container process, with application and container data spread across local disks (Disk 1, Disk 2, Disk 3) and HDFS]
  17. Scheduler improvements ⬢ In case of services, affinity and anti-affinity become important – Affinity and anti-affinity apply at a container and an application level – e.g. don't schedule two HBase region servers on the same node, but schedule the Spark containers on the same nodes as the region servers ⬢ Support is being added for affinity and anti-affinity in the RM (YARN-5907) – Slider AM already has some basic support for container affinity and anti-affinity via retries – RM can do a better job of container placement if it has first-class support – AMs can specify affinity and anti-affinity policies to get the placement they need
  18. Scheduler improvements - Affinity and Anti-affinity ⬢ Anti-Affinity – Some services don't want their daemons run on the same host/rack, for better fault recovery or performance. – For example, don't run more than one HBase region server in the same fault zone.
  19. Scheduler Improvements - Affinity and Anti-affinity ⬢ Affinity – Some services want to run their daemons on the same host/rack, etc. for performance. – For example, run Storm workers as close as possible for better data exchange performance. (SW = Storm Worker)
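
YARN-5907 was still in progress at the time of this talk; later Hadoop releases expose the idea through a PlacementConstraints helper API, and the sketch below uses it to express the two examples from the slides above (anti-affinity between HBase region servers, affinity between Storm workers). The allocation tags are illustrative, and the class and method names reflect the eventual API rather than anything shown in the talk.

    // Sketch of the affinity / anti-affinity examples from these slides using
    // the PlacementConstraints helpers that later Hadoop releases provide
    // (the API was still being designed under YARN-5907 at the time).
    import org.apache.hadoop.yarn.api.resource.PlacementConstraint;
    import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.*;
    import static org.apache.hadoop.yarn.api.resource.PlacementConstraints.PlacementTargets.*;

    public class PlacementExamples {
        public static void main(String[] args) {
            // Anti-affinity: never place two containers tagged "hbase-regionserver"
            // on the same node.
            PlacementConstraint antiAffinity =
                    build(targetNotIn(NODE, allocationTag("hbase-regionserver")));

            // Affinity: place containers tagged "storm-worker" on nodes that
            // already run other "storm-worker" containers.
            PlacementConstraint affinity =
                    build(targetIn(NODE, allocationTag("storm-worker")));

            System.out.println(antiAffinity);
            System.out.println(affinity);
        }
    }
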
  20. YARN UI (YARN-3368)
  21. YARN UI - Services
  22. Application upgrades ⬢ YARN has no support for container or application upgrades – Container upgrade support needs to be added in the NM – Application upgrade support has to be added in the RM ⬢ Support added for container upgrade and rollback (YARN-4726) – Application upgrade support still to be carried out
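
A sketch of how an AM could drive an in-place container upgrade with rollback, assuming the NMClient re-initialization calls that accompany the YARN-4726 work (reInitializeContainer, commitLastReInitialization, rollbackLastReInitialization). Treat the method names as assumptions to check against the deployed Hadoop version.

    // Sketch: an AM upgrading a running container in place and rolling back on
    // failure, via the NMClient re-initialization calls tied to YARN-4726.
    import org.apache.hadoop.yarn.api.records.ContainerId;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.client.api.NMClient;

    public class UpgradeContainer {
        static void upgrade(NMClient nmClient, ContainerId containerId,
                            ContainerLaunchContext newLaunchContext) {
            try {
                // Re-launch the same container with the new launch context.
                // autoCommit=false keeps the old context around so we can roll back.
                nmClient.reInitializeContainer(containerId, newLaunchContext, false);
                // ... health-check the upgraded container here ...
                nmClient.commitLastReInitialization(containerId);   // keep the upgrade
            } catch (Exception e) {
                try {
                    nmClient.rollbackLastReInitialization(containerId); // revert to old context
                } catch (Exception rollbackFailure) {
                    e.addSuppressed(rollbackFailure);  // surface both problems to the caller
                }
                throw new RuntimeException("Container upgrade failed", e);
            }
        }
    }
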
  23. Other issues ⬢ Log rotation – Log rotation used to run on application completion – Support has been added to fetch the logs for running containers ⬢ Support for container monitoring/health checks
  24. In Conclusion ⬢ Service workloads are becoming more and more popular on YARN ⬢ The fundamental pieces to add support for services are in place, but a few additional pieces remain
  25. Thank you!
