Accumulo Summit 2014: Accumulo on YARN



Speaker: Billie Rinaldi

In their OSDI 2006 paper, Google's Bigtable authors note that "Bigtable depends on a cluster management system for scheduling jobs, managing resources on shared machines, dealing with machine failures, and monitoring machine status." Until recently, no such system existed for Apache Accumulo to rely upon. Apache Hadoop 2 introduced the YARN resource management system to the Hadoop ecosystem. This talk describes the benefits YARN can provide for Accumulo installations and how the Slider project (proposed for the Apache Incubator) makes it easier to deploy long-running applications on YARN. It covers the details of the Accumulo app package for Slider, how to use Slider to deploy an Accumulo instance, and how instances can be actively managed by other applications such as Apache Ambari.


Slide notes:
  • The client talks to the YARN ResourceManager, which gets a container from a NodeManager and starts the Application Master (AM)
  • The AM requests containers for the other specified roles of the cluster (e.g. tserver, monitor) and launches those processes
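The launch-and-recovery flow in these notes can be sketched as a toy simulation. All class and role names here are illustrative stand-ins, not real YARN APIs (a real client would use ApplicationClientProtocol and the AM would use AMRMClient):

```python
# Toy model of the YARN flow described in the notes above.
class ResourceManager:
    """Hands out containers on nodes; the real RM schedules across NodeManagers."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.next_id = 0

    def allocate(self, count=1):
        containers = []
        for _ in range(count):
            node = self.nodes[self.next_id % len(self.nodes)]
            containers.append((self.next_id, node))
            self.next_id += 1
        return containers

class AppMaster:
    """Started in the first container; requests one container per role."""
    def __init__(self, rm, roles):
        self.rm = rm
        self.roles = {}
        for role in roles:
            (cid, node), = rm.allocate(1)
            self.roles[role] = (cid, node)

    def on_container_failed(self, role):
        # YARN notifies the AM of a failure; the AM re-requests a container.
        (cid, node), = self.rm.allocate(1)
        self.roles[role] = (cid, node)

rm = ResourceManager(nodes=["node1", "node2", "node3"])
# The client asks the RM for a container to start the AM; the AM then
# requests containers for the other roles (e.g. tserver, monitor).
am = AppMaster(rm, roles=["tserver", "monitor"])
am.on_container_failed("tserver")   # failed tserver gets a fresh container
print(sorted(am.roles))
```

The same loop is what makes YARN apps self-healing: the AM never pins a role to a machine, it only keeps asking the RM for capacity.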

    1. Apache Accumulo on YARN with Apache Slider
       Billie Rinaldi, Sr. Member of Technical Staff, Hortonworks, Inc.
       June 12, 2014
       © Hortonworks Inc. 2014. Apache, Accumulo, Slider, Ambari, Hadoop, YARN, Apache Accumulo, Apache Slider, Apache Ambari, and the Accumulo logo are trademarks of the Apache Software Foundation.
    2. Topics
       • What is YARN?
       • Why would you want to run Accumulo on YARN?
       • What is Slider and why is it needed?
       • How is Accumulo deployed & managed with Slider?
    3. Getting more from Hadoop
       • Hadoop 1.0: HDFS (redundant, reliable storage) + MapReduce (cluster resource management & data processing); primarily batch
       • Hadoop 2.0: HDFS2 (redundant, reliable storage) + YARN (cluster resource management) + MapReduce, App X, App Y (data processing); batch, interactive, online, streaming, ...
       • Failure handling and resource management are no longer just for MapReduce, and this separation enables much more flexibility
    4. App on YARN Use Cases
       • Small app clusters in a large YARN cluster
       • Dynamic clusters
       • Self-healing clusters
       • Elastic clusters
       • Transient clusters for workflows
       • Custom versions & configurations
       • More efficient utilization/sharing
    5. YARN Structure
       • Servers run YARN NodeManagers
       • NMs heartbeat to the ResourceManager ("the RM")
       • The RM schedules work over the cluster
       • The RM allocates containers to apps
       • NMs start containers
       • NMs report container health
    6. Client Creates App Master
       (Diagram: the client talks to the RM, which has a NodeManager start the Application Master in a container.)
    7. AM Asks for Containers
       (Diagram: the AM requests containers from the RM; NodeManagers start them across the cluster.)
    8. YARN Notifies AM of Failures
       (Diagram: when a container fails, YARN notifies the AM so it can request a replacement.)
    9. Issues to Consider
       • Do I need to re-write parts of my application?
       • How do I package my application for YARN?
       • How do I configure my application?
       • How do I debug my application?
       • Can I still manage my application?
       • Can I monitor my application?
       • Can I manage inter-/intra-application dependencies?
       • How will the external clients communicate?
       • What does it take to secure the application?
    10. Apache Slider
        Apache Slider is a project in incubation at the Apache Software Foundation with the goal of making it possible and easy to deploy existing applications onto a YARN cluster.
        • History
          – HBase on YARN (HOYA)
          – AccumuloProvider/HBaseProvider on YARN
          – Agent provider + app packages for Accumulo/HBase/Storm/...
        • Goals for long-lived applications
          – Execute management operations (start/stop, reconfigure, scale up/down, rolling restart, decommission/recommission, upgrade)
          – Detect and remedy failures
          – Manage logs
          – Monitor (Ganglia, JMX)
    11. Components of Slider
        • AppMaster
        • AgentProvider
        • Agent
        • Component Instance
        • AppPackage
        • CLI
        • Registry
        (Diagram: the Slider CLI submits an app package; the AM/agent provider runs under the RM, an agent manages each component instance on the NodeManagers, and a registry supports discovery.)
    12. Application by Slider
        Similar to any YARN application:
        1. CLI starts an instance of the AM
        2. AM requests containers
        3. Containers activate with an agent
        4. Agent gets the application definition
        5. Agent registers with the AM
        6. AM issues commands
        7. Agent reports back status, configuration, etc.
        8. AM publishes endpoints, configurations
        AM commands: install, start, stop, status, ...
        CLI commands: create, freeze, thaw, flex, destroy
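Steps 4 through 8 above can be sketched as a toy agent/AM exchange. The command names (install, start, stop) come from the slide; the classes, states, and message shapes are illustrative, not Slider's actual protocol:

```python
# Toy sketch of the Slider agent/AM command loop described above.
class Agent:
    """Runs in each container; executes AM commands and reports status."""
    def __init__(self, component):
        self.component = component
        self.state = "REGISTERED"   # step 5: the agent registers with the AM

    def execute(self, command):
        # steps 6-7: the AM issues a command, the agent reports the new state
        transitions = {"install": "INSTALLED", "start": "STARTED",
                       "stop": "STOPPED"}
        self.state = transitions.get(command, self.state)
        return {"component": self.component, "state": self.state}

class AppMaster:
    """Tracks per-component status so it can publish it (step 8)."""
    def __init__(self):
        self.status = {}

    def issue(self, agent, command):
        report = agent.execute(command)
        self.status[report["component"]] = report["state"]

am = AppMaster()
tserver = Agent("ACCUMULO_TSERVER")
for cmd in ("install", "start"):
    am.issue(tserver, cmd)
print(am.status)
```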
    13. Accumulo Slider App Package
    14. Slider Metainfo

        <metainfo><services><service>
          <!-- application info -->
          <name>ACCUMULO</name>
          <version>1.5.1</version>
          <!-- URIs can be published -->
          <exportGroups><exportGroup>
            <name>QuickLinks</name>
            <exports><export>
              <name>org.apache.slider.monitor</name>
              <value>http://${ACCUMULO_MONITOR_HOST}:${site.accumulo-site.monitor.port.client}</value>
            </export></exports>
          </exportGroup></exportGroups>
          <!-- commands have dependencies -->
          <commandOrders><commandOrder>
            <command>ACCUMULO_TSERVER-START</command>
            <requires>ACCUMULO_MASTER-STARTED</requires>
          </commandOrder></commandOrders>
          <!-- component information; commands are implemented as scripts -->
          <components><component>
            <name>ACCUMULO_MASTER</name>
            <category>MASTER</category>
            <minInstanceCount>1</minInstanceCount>
            <commandScript>
              <script>scripts/</script>
            </commandScript>
          </component></components>
        </service></services></metainfo>
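Because the metainfo is plain XML, its declared command dependencies are easy to inspect mechanically. A minimal sketch using only the Python standard library (the fragment below is reduced from the slide; element names match the metainfo format shown above):

```python
# Extract the command ordering from a Slider metainfo.xml fragment.
import xml.etree.ElementTree as ET

metainfo = """
<metainfo><services><service>
  <name>ACCUMULO</name>
  <commandOrders><commandOrder>
    <command>ACCUMULO_TSERVER-START</command>
    <requires>ACCUMULO_MASTER-STARTED</requires>
  </commandOrder></commandOrders>
</service></services></metainfo>
"""

root = ET.fromstring(metainfo)
# Each commandOrder says: <command> may only run once <requires> has happened.
orders = [(co.findtext("command"), co.findtext("requires"))
          for co in root.iter("commandOrder")]
print(orders)
```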
    15. Slider App Resource Spec

        {
          "schema": "",
          "metadata": { },
          "global": { },
          "components": {
            "slider-appmaster": { },
            "ACCUMULO_MASTER": {
              "yarn.role.priority": "1",
              "yarn.component.instances": "1"
            },
            "ACCUMULO_TSERVER": {
              "yarn.role.priority": "2",
              "yarn.component.instances": "1"
            },
            "ACCUMULO_MONITOR": {
              "yarn.role.priority": "3",
              "yarn.component.instances": "1"
            },
            "ACCUMULO_GC": {
              "yarn.role.priority": "4",
              "yarn.component.instances": "1"
            },
            "ACCUMULO_TRACER": {
              "yarn.role.priority": "5",
              "yarn.component.instances": "1"
            }
          }
        }

        Components declare YARN resource requirements; each needs a unique priority.
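The "unique priorities" requirement on the slide is easy to check before submitting a spec. A small sketch over a reduced version of the resources.json above (the validation itself is my own illustration, not a Slider tool):

```python
# Check the unique-priority invariant in a Slider resources spec.
import json

spec = json.loads("""{
  "components": {
    "slider-appmaster": {},
    "ACCUMULO_MASTER":  {"yarn.role.priority": "1", "yarn.component.instances": "1"},
    "ACCUMULO_TSERVER": {"yarn.role.priority": "2", "yarn.component.instances": "1"},
    "ACCUMULO_MONITOR": {"yarn.role.priority": "3", "yarn.component.instances": "1"}
  }
}""")

# The appmaster entry carries no role priority; skip components without one.
priorities = [c["yarn.role.priority"]
              for c in spec["components"].values()
              if "yarn.role.priority" in c]
assert len(priorities) == len(set(priorities)), "priorities must be unique"
print("components with priorities:", len(priorities))
```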
    16. Slider AppConfig Spec

        {
          "application.def": "/slider/",
          "java_home": "/usr/jdk64/jdk1.7.0_45",
          "": "${AGENT_LOG_ROOT}/app/log",
          "": "${AGENT_WORK_ROOT}/app/run",
          "": "128m",
          "": "/usr/lib/hadoop",
          "": "/usr/lib/zookeeper",
          "": "instancename",
          "": "secret",
          "site.accumulo-site.instance.dfs.dir": "/apps/accumulo/data",
          "site.accumulo-site.master.port.client": "0",
          "site.accumulo-site.trace.port.client": "0",
          "site.accumulo-site.tserver.port.client": "0",
          "site.accumulo-site.gc.port.client": "0",
          "site.accumulo-site.monitor.port.log4j": "0",
          "site.accumulo-site.monitor.port.client": "${ACCUMULO_MONITOR.ALLOCATED_PORT}",
          "": "${ZK_HOST}"
        }

        A representative sampling of configuration types: configurations needed by Slider, named variables, site variables for the application, named variables for cluster details, ports to allocate and advertise, and variables for the application scripts.
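Values like ${AGENT_LOG_ROOT}/app/log and ${ZK_HOST} are named variables that get resolved against cluster details at deploy time. A minimal sketch of that substitution (the resolver and the example values are illustrative, not Slider's implementation):

```python
# Resolve ${NAME}-style variables in appConfig values.
import re

context = {
    "AGENT_LOG_ROOT": "/var/log/slider-agent",   # example value, not canonical
    "ZK_HOST": "zk1:2181,zk2:2181",              # example value, not canonical
}

def resolve(value, ctx):
    # Leave unknown variables untouched rather than failing.
    return re.sub(r"\$\{([^}]+)\}",
                  lambda m: ctx.get(m.group(1), m.group(0)), value)

print(resolve("${AGENT_LOG_ROOT}/app/log", context))
print(resolve("${ZK_HOST}", context))
```

Special forms like ${ACCUMULO_MONITOR.ALLOCATED_PORT} go further: Slider allocates a free port and advertises it, which is why the other port settings above are simply "0".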
    17. Slider Install
        • Set up local install:
          – mvn clean package -DskipTests (builds tarball)
          – Get slider-0.31.0-incubating-SNAPSHOT-all.tar.gz from slider-assembly/target/
          – Untar the tarball in the desired directory
          – Edit conf/slider-client.xml: yarn.application.classpath, slider.zookeeper.quorum, yarn.resourcemanager.address, yarn.resourcemanager.scheduler.address, fs.defaultFS
        • Set up HDFS:
          /slider/
          /slider/agent
          /slider/agent/conf
          /slider/agent/conf/agent.ini
          /slider/agent/slider-agent.tar.gz
          plus any additional directories needed by the app
    18. Slider Execution
        • Create an Accumulo instance:
          bin/slider create name --image hdfs:// slider/agent/slider-agent.tar.gz --template appConfig.json --resources resources.json
        • Modify an existing instance:
          bin/slider freeze name
          bin/slider thaw name
          bin/slider destroy name
          bin/slider flex name --component ACCUMULO_TSERVER 2
    19. Managing a YARN Application
        The goal is to have Slider integrate with any application management framework, e.g. Ambari.
        Apache Ambari is an open source framework for provisioning, managing, and monitoring Apache Hadoop clusters.
        • Ambari Views allows development of custom user interfaces
        • The Slider App View will deploy, monitor, and manage YARN apps using Slider, embedded in Ambari (currently a tech preview)
    20. (screenshot)
    21. (screenshot)
    22. What’s Next in Slider
        • Lock in the application specification
        • Integration with the YARN registry
        • Inter-/intra-application dependencies
        • Robust failure handling
        • Improved debugging
        • Security
        • More applications!
    23. YARN-896: Long-Lived Apps
        • Container reconnect on AM restart (mostly complete)
        • Token renewal on long-lived apps (patch available)
        • Containers: signaling, >1 process sequence
        • AM/RM-managed gang scheduling
        • Anti-affinity hint in container requests
        • ZK service registry
        • Logging
    24. Slider is Seeking Contributors
        • Bring your favorite applications to YARN
          – Create packages, give feedback, create patches, ...
        • Useful links
          – Source:
          – Website:
          – Mailing list:
          – JIRA:
        • Current and upcoming releases
          – Slider 0.30-incubating (May)
          – Slider 0.40-incubating (planned)
    25. Questions? IRC: #accumulo
    26. AM Restart: leading edge
        AM state is either persisted in HDFS, rebuilt, or transient:
        • NodeMap: model of the YARN cluster
        • ComponentHistory: persistent history of component placements
        • Specification: resources.json &c
        • Container queues: requested, starting, releasing
        • Component map: container ID -> component instance
        • Event history: application history
        ctx.setKeepContainersAcrossApplicationAttempts(true)
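The split between persisted and transient state can be sketched with a toy ComponentHistory: placements survive an AM restart because they live in durable storage, while queues are simply rebuilt empty. Here a local JSON file stands in for HDFS, and the class shape is illustrative, not Slider's actual code:

```python
# Toy sketch: placement history persisted across AM restarts.
import json
import os
import tempfile

class ComponentHistory:
    """Persists component placements; a new AM attempt reloads them."""
    def __init__(self, path):
        self.path = path
        self.placements = {}
        if os.path.exists(path):
            with open(path) as f:
                self.placements = json.load(f)   # rebuilt from durable storage

    def record(self, component, node):
        self.placements[component] = node
        with open(self.path, "w") as f:
            json.dump(self.placements, f)        # persisted on every change

path = os.path.join(tempfile.mkdtemp(), "history.json")
hist = ComponentHistory(path)
hist.record("ACCUMULO_TSERVER", "node2")

restarted = ComponentHistory(path)   # a new AM attempt after a restart
print(restarted.placements)
```

With setKeepContainersAcrossApplicationAttempts(true), the restarted AM can also reconnect to containers that kept running, using the reloaded history to match them back to components.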
    27. Application Registry
        • A common problem (not specific to Slider)
        • Current
          – Apache Curator based
          – Register URLs pointing to actual data
          – The AM doubles up as a webserver for published data
        • Future
          – The registry should be stand-alone
          – Slider is a consumer as well as a publisher
          – Slider focuses on a declarative solution for applications to publish data
          – Allows integration of applications independent of how they are hosted