Accumulo Summit 2014: Accumulo on YARN

Speaker: Billie Rinaldi

In their OSDI 2006 paper, Google notes that "Bigtable depends on a cluster management system for scheduling jobs, managing resources on shared machines, dealing with machine failures, and monitoring machine status." Until recently, no such system existed for Apache Accumulo to rely upon. Apache Hadoop 2 introduced the YARN resource management system to the Hadoop ecosystem. This talk will describe the benefits YARN can provide for Accumulo installations and how the Slider project (proposed for the Apache Incubator) makes it easier to deploy long-running applications on YARN. It will describe the details of the Accumulo App Package for Slider and how to use Slider to deploy an Accumulo instance, as well as how instances can be actively managed by other applications such as Apache Ambari.

Speaker notes
  • The client talks to the YARN ResourceManager, which allocates a container on a NodeManager and starts the Application Master (AM) in it (a minimal client-side sketch follows these notes)
  • The AM then requests containers for the other specified roles of the cluster (e.g. tserver, monitor) and launches those processes
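
To make these notes concrete, here is a minimal, hypothetical sketch of the client-side submission step using the Hadoop 2 YARN Java API. This is not Slider's actual code: the application name, the AM class com.example.MyAppMaster, the command line, and the memory/core sizes are illustrative placeholders.

    // Sketch only: submit an application whose AM the RM will launch in a
    // container on some NodeManager. Names and sizes are placeholders.
    import java.util.Collections;
    import org.apache.hadoop.yarn.api.ApplicationConstants;
    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
    import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.client.api.YarnClientApplication;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.Records;

    public class SubmitAppMaster {
      public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the ResourceManager for a new application id and submission context.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("accumulo-on-yarn-demo");

        // Describe the container that will run the Application Master.
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList(
            "$JAVA_HOME/bin/java -Xmx256m com.example.MyAppMaster"
            + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
            + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
        appContext.setAMContainerSpec(amContainer);

        Resource amResource = Records.newRecord(Resource.class);
        amResource.setMemory(512);
        amResource.setVirtualCores(1);
        appContext.setResource(amResource);

        // The RM schedules the AM container onto a NodeManager and starts it.
        ApplicationId appId = yarnClient.submitApplication(appContext);
        System.out.println("Submitted application " + appId);
      }
    }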

Transcript

  • 1. © Hortonworks Inc. 2014 Apache Accumulo on YARN with Apache Slider Billie Rinaldi Sr. Member of Technical Staff Hortonworks, Inc. June 12, 2014 Page 1 Apache, Accumulo, Slider, Ambari, Hadoop, Yarn, Apache Accumulo, Apache Slider, Apache Ambari, and the Accumulo logo are trademarks of the Apache Software Foundation.
  • 2. © Hortonworks Inc. 2014 Topics •What is YARN? •Why would you want to run Accumulo on YARN? •What is Slider and why is it needed? •How is Accumulo deployed & managed with Slider? Page 2
  • 3. © Hortonworks Inc. 2014 Getting more from Hadoop Page 3 [Diagram: Hadoop 1.0: HDFS (redundant, reliable storage) + MapReduce (cluster resource management & data processing), primarily batch; Hadoop 2.0: HDFS2 (redundant, reliable storage) + YARN (cluster resource management) with MapReduce, App X, App Y (data processing) on top, supporting batch, interactive, online, streaming, …] Failure handling and resource management are no longer just for MapReduce … and this separation enables much more flexibility
  • 4. © Hortonworks Inc. 2014 App on YARN Use Cases •Small app clusters in a large YARN cluster •Dynamic clusters •Self-healing clusters •Elastic clusters •Transient clusters for workflows •Custom versions & configurations •More efficient utilization/sharing Page 4
  • 5. © Hortonworks Inc. 2014 YARN Structure Page 5 [Diagram: a YARN Resource Manager (“The RM”) and several YARN Node Managers, each co-located with HDFS] • Servers run YARN Node Managers • NMs heartbeat to the Resource Manager • RM schedules work over the cluster • RM allocates containers to apps • NMs start containers • NMs report container health
  • 6. © Hortonworks Inc. 2014 Client Creates App Master Page 6 [Diagram: the Client, the YARN Resource Manager (“The RM”), YARN Node Managers with HDFS, and an Application Master started on one of the Node Managers]
  • 7. © Hortonworks Inc. 2014 AM Asks for Containers Page 7 [Diagram: the Application Master and the containers it requested, running on YARN Node Managers alongside the Resource Manager (“The RM”)] (an illustrative AMRMClient sketch follows this slide)
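
As a rough illustration of this step (again, not Slider's implementation), an Application Master can register with the RM and request containers for its worker roles through the AMRMClient API. The role count, container sizes, and priority below are made up.

    // Sketch only: an AM registers with the RM and asks for worker containers.
    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.Records;

    public class RequestRoleContainers {
      public static void main(String[] args) throws Exception {
        AMRMClient<AMRMClient.ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new YarnConfiguration());
        rmClient.start();
        // Register with the RM; host, RPC port, and tracking URL left empty for brevity.
        rmClient.registerApplicationMaster("", 0, "");

        Resource capability = Records.newRecord(Resource.class);
        capability.setMemory(256);
        capability.setVirtualCores(1);
        Priority priority = Records.newRecord(Priority.class);
        priority.setPriority(1);

        // One request per desired container (e.g. per tablet server); no node/rack constraints.
        int numTservers = 3;
        for (int i = 0; i < numTservers; i++) {
          rmClient.addContainerRequest(
              new AMRMClient.ContainerRequest(capability, null, null, priority));
        }

        // Heartbeat/allocate loop: the RM hands back containers as capacity is found.
        while (true) {
          AllocateResponse response = rmClient.allocate(0.1f);
          for (Container c : response.getAllocatedContainers()) {
            // Launch the role's process in c via NMClient (omitted here).
            System.out.println("Allocated " + c.getId() + " on " + c.getNodeId());
          }
          Thread.sleep(1000);
        }
      }
    }

In Slider's case each component (tserver, monitor, gc, ...) gets its own priority and instance count, as shown later in the resources.json slide.
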
  • 8. © Hortonworks Inc. 2014 YARN Notifies AM of Failures Page 8 [Diagram: containers and the Application Master on YARN Node Managers; the Resource Manager (“The RM”) notifies the AM when containers fail] (an illustrative failure-handling sketch follows this slide)
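
With the asynchronous client API, the notification in this slide arrives as a callback on the AM. The following is a speculative sketch of a self-healing handler, not Slider's code; the replacement-request template and the wiring are placeholders. The handler would be passed to AMRMClientAsync.createAMRMClientAsync(intervalMs, handler).

    // Sketch only: react to container failures by requesting replacements.
    import java.util.List;
    import org.apache.hadoop.yarn.api.records.Container;
    import org.apache.hadoop.yarn.api.records.ContainerStatus;
    import org.apache.hadoop.yarn.api.records.NodeReport;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
    import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

    public class SelfHealingHandler implements AMRMClientAsync.CallbackHandler {
      private AMRMClientAsync<ContainerRequest> rmClient;
      private final ContainerRequest replacementTemplate;  // how a replacement should look

      public SelfHealingHandler(ContainerRequest replacementTemplate) {
        this.replacementTemplate = replacementTemplate;
      }

      // Set after AMRMClientAsync.createAMRMClientAsync(1000, handler) is called,
      // since the client and the handler reference each other.
      public void setClient(AMRMClientAsync<ContainerRequest> rmClient) {
        this.rmClient = rmClient;
      }

      @Override
      public void onContainersCompleted(List<ContainerStatus> statuses) {
        for (ContainerStatus status : statuses) {
          if (status.getExitStatus() != 0) {
            // A container exited abnormally: ask the RM for a replacement.
            rmClient.addContainerRequest(replacementTemplate);
          }
        }
      }

      @Override public void onContainersAllocated(List<Container> containers) { /* relaunch the role here */ }
      @Override public void onShutdownRequest() { }
      @Override public void onNodesUpdated(List<NodeReport> updatedNodes) { }
      @Override public void onError(Throwable e) { }
      @Override public float getProgress() { return 0.0f; }
    }
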
  • 9. © Hortonworks Inc. 2014 Issues to Consider •Do I need to re-write parts of my application? •How do I package my application for YARN? •How do I configure my application? •How do I debug my application? •Can I still manage my application? •Can I monitor my application? •Can I manage inter-/intra-application dependencies? •How will the external clients communicate? •What does it take to secure the application? Page 9
  • 10. © Hortonworks Inc. 2014 Apache Slider Apache Slider is a project in incubation at the Apache Software Foundation with the goal of making it possible and easy to deploy existing applications onto a YARN cluster • History – HBase on YARN (HOYA) – AccumuloProvider/HBaseProvider on YARN – Agent Provider + App Packages for Accumulo/HBase/Storm/… • Goals for long-lived applications – Execute management operations (Start/Stop, Reconfigure, Scale up/down, Rolling-restart, Decommission/Recommission, Upgrade) – Detect and remedy failures – Manage logs – Monitor (Ganglia, JMX) Page 10
  • 11. © Hortonworks Inc. 2014 Components of Slider Page 11 [Diagram: the Slider CLI and Slider App Package, the YARN Resource Manager (“The RM”), YARN Node Managers each running an Agent and a Component Instance, the App Master / Agent Provider, and a Registry] • AppMaster • AgentProvider • Agent • Component Instance • AppPackage • CLI • Registry
  • 12. © Hortonworks Inc. 2014 Application by Slider Page 12 Similar to any YARN application: 1. CLI starts an instance of the AM 2. AM requests containers 3. Containers activate with an Agent 4. Agent gets application definition 5. Agent registers with AM 6. AM issues commands 7. Agent reports back status, configuration, etc. 8. AM publishes endpoints, configurations [Diagram: Slider CLI and App Package, the YARN Resource Manager (“The RM”), Node Managers with Agent and Component Instance, the Application Registry, and the App Master/Agent Provider] AM commands: install, start, stop, status, … CLI commands: create, freeze, thaw, flex, destroy
  • 13. © Hortonworks Inc. 2014 Accumulo Slider App Package Page 13
  • 14. © Hortonworks Inc. 2014 Slider Metainfo Page 14
    <metainfo>
      <services>
        <service>
          <name>ACCUMULO</name>
          <version>1.5.1</version>
          <exportGroups>
            <exportGroup>
              <name>QuickLinks</name>
              <exports>
                <export>
                  <name>org.apache.slider.monitor</name>
                  <value>http://${ACCUMULO_MONITOR_HOST}:${site.accumulo-site.monitor.port.client}</value>
                </export>
              </exports>
            </exportGroup>
          </exportGroups>
          <commandOrders>
            <commandOrder>
              <command>ACCUMULO_TSERVER-START</command>
              <requires>ACCUMULO_MASTER-STARTED</requires>
            </commandOrder>
          </commandOrders>
          <components>
            <component>
              <name>ACCUMULO_MASTER</name>
              <category>MASTER</category>
              <minInstanceCount>1</minInstanceCount>
              <commandScript>
                <script>scripts/accumulo_master.py</script>
              </commandScript>
            </component>
          </components>
        </service>
      </services>
    </metainfo>
    Callouts: application info; commands have dependencies; URIs can be published; component information; commands are implemented as scripts
  • 15. © Hortonworks Inc. 2014 Slider App Resource Spec Page 15
    {
      "schema": "http://example.org/specification/v2.0.0",
      "metadata": { },
      "global": { },
      "components": {
        "ACCUMULO_MASTER": { "yarn.role.priority": "1", "yarn.component.instances": "1" },
        "slider-appmaster": { },
        "ACCUMULO_TSERVER": { "yarn.role.priority": "2", "yarn.component.instances": "1" },
        "ACCUMULO_MONITOR": { "yarn.role.priority": "3", "yarn.component.instances": "1" },
        "ACCUMULO_GC": { "yarn.role.priority": "4", "yarn.component.instances": "1" },
        "ACCUMULO_TRACER": { "yarn.role.priority": "5", "yarn.component.instances": "1" }
      }
    }
    Callouts: YARN resource requirements; unique priorities
  • 16. © Hortonworks Inc. 2014 Slider AppConfig Spec Page 16
    {
      "application.def": "/slider/accumulo_v151.zip",
      "java_home": "/usr/jdk64/jdk1.7.0_45",
      "site.global.app_log_dir": "${AGENT_LOG_ROOT}/app/log",
      "site.global.app_pid_dir": "${AGENT_WORK_ROOT}/app/run",
      "site.global.tserver_heapsize": "128m",
      "site.global.hadoop_prefix": "/usr/lib/hadoop",
      "site.global.zookeeper_home": "/usr/lib/zookeeper",
      "site.global.accumulo_instance_name": "instancename",
      "site.global.accumulo_root_password": "secret",
      "site.accumulo-site.instance.dfs.dir": "/apps/accumulo/data",
      "site.accumulo-site.master.port.client": "0",
      "site.accumulo-site.trace.port.client": "0",
      "site.accumulo-site.tserver.port.client": "0",
      "site.accumulo-site.gc.port.client": "0",
      "site.accumulo-site.monitor.port.log4j": "0",
      "site.accumulo-site.monitor.port.client": "${ACCUMULO_MONITOR.ALLOCATED_PORT}",
      "site.accumulo-site.instance.zookeeper.host": "${ZK_HOST}"
    }
    Callouts: configurations needed by Slider; named variables; site variables for the application; named variables for cluster details; allocate and advertise; variables for the application scripts (a representative sampling of various types of configuration parameters)
  • 17. © Hortonworks Inc. 2014 Slider Install Page 17
    • Set up local install: mvn clean package -DskipTests (builds tarball); get slider-0.31.0-incubating-SNAPSHOT-all.tar.gz from slider-assembly/target/; untar the tarball in the desired directory; edit conf/slider-client.xml (yarn.application.classpath, slider.zookeeper.quorum, yarn.resourcemanager.address, yarn.resourcemanager.scheduler.address, fs.defaultFS)
    • Set up HDFS: /slider/accumulo_v151.zip, /slider/agent, /slider/agent/conf, /slider/agent/conf/agent.ini, /slider/agent/slider-agent.tar.gz, plus any additional directories needed by the app
  • 18. © Hortonworks Inc. 2014 Slider Execution Page 18
    • Create an Accumulo instance:
      bin/slider create name --image hdfs://c6401.ambari.apache.org:8020/slider/agent/slider-agent.tar.gz --template appConfig.json --resources resources.json
    • Modify an existing instance:
      bin/slider freeze name
      bin/slider thaw name
      bin/slider destroy name
      bin/slider flex name --component ACCUMULO_TSERVER 2
    (an Accumulo client connection sketch follows this slide)
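
Once Slider has created and started the instance, ordinary Accumulo clients connect with the standard Accumulo client API. A hedged sketch follows, assuming the instance name and root password shown in the appConfig slide above and a made-up ZooKeeper quorum standing in for whatever ${ZK_HOST} resolves to in your cluster.

    // Sketch only: connect to the Slider-deployed instance and write one cell.
    import org.apache.accumulo.core.client.BatchWriter;
    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Instance;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;

    public class ConnectToSliderAccumulo {
      public static void main(String[] args) throws Exception {
        // Instance name and password come from appConfig.json above;
        // the ZooKeeper quorum below is a placeholder for ${ZK_HOST}.
        Instance instance = new ZooKeeperInstance("instancename", "zk1.example.com:2181");
        Connector conn = instance.getConnector("root", new PasswordToken("secret"));

        if (!conn.tableOperations().exists("demo")) {
          conn.tableOperations().create("demo");
        }

        // Write a single cell to show the instance is usable.
        BatchWriter writer = conn.createBatchWriter("demo", new BatchWriterConfig());
        Mutation m = new Mutation("row1");
        m.put("family", "qualifier", new Value("value".getBytes()));
        writer.addMutation(m);
        writer.close();
      }
    }
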
  • 19. © Hortonworks Inc. 2014 Managing a YARN Application Page 19 Goal is to have Slider integrate with any application management framework, e.g. Ambari. Apache Ambari is an open source framework for provisioning, managing and monitoring Apache Hadoop clusters • Ambari Views allows development of custom user interfaces • The Slider App View, embedded in Ambari, will deploy, monitor, and manage YARN apps using Slider (currently a Tech Preview) [Diagram: Ambari Web front end with a View UI; Ambari Server with a View back end and the Slider CLI; YARN Node Managers with HDFS]
  • 20. © Hortonworks Inc. 2014 Page 20
  • 21. © Hortonworks Inc. 2014 Page 21
  • 22. © Hortonworks Inc. 2014 What’s Next in Slider Page 22 •Lock-in Application Specification •Integration with the YARN Registry •Inter/Intra-Application Dependencies •Robust failure handling •Improved debugging •Security •More applications!
  • 23. © Hortonworks Inc. 2014 YARN-896: Long-Lived Apps •Container reconnect on AM restart – mostly complete •Token renewal on long-lived apps – patch available •Containers: signaling, >1 process sequence •AM/RM managed gang scheduling •Anti-affinity hint in container requests •ZK Service Registry •Logging Page 23
  • 24. © Hortonworks Inc. 2014 Slider is Seeking Contributors • Bring Your Favorite Applications to YARN –Create packages, give feedback, create patches, … • Useful Links –Source: https://git-wip-us.apache.org/repos/asf/incubator-slider.git –Website: http://slider.incubator.apache.org –Mailing List: dev@slider.incubator.apache.org –JIRA: https://issues.apache.org/jira/browse/SLIDER • Current and Upcoming Releases –Slider 0.30-incubating (May) –Slider 0.40-incubating (planned) Page 24
  • 25. © Hortonworks Inc. 2014 Questions? billie@hortonworks.com dev@accumulo.apache.org user@accumulo.apache.org IRC #accumulo Page 25
  • 26. © Hortonworks Inc. 2014 AM Restart – leading edge Page 26 [Diagram of AM state: NodeMap (model of YARN cluster); ComponentHistory (persistent history of component placements); Specification (resources.json &c); Container Queues (requested, starting, releasing); Component Map (container ID -> component instance); Event History (application history); each piece of state is either persisted in HDFS, rebuilt, or transient] ctx.setKeepContainersAcrossApplicationAttempts(true) (a snippet showing where this call is made follows this slide)
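
The ctx.setKeepContainersAcrossApplicationAttempts(true) call shown in this slide is made on the client's ApplicationSubmissionContext at submission time. A minimal illustrative helper (the class and method names are mine, not Slider's) shows where it belongs.

    import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;

    public class KeepContainersFlag {
      // Sets the flag from the slide so that, when the AM is restarted, YARN keeps
      // the application's running containers and hands them back to the new AM
      // attempt instead of killing them.
      static void enableAmRestartWithRunningContainers(ApplicationSubmissionContext ctx) {
        ctx.setKeepContainersAcrossApplicationAttempts(true);
        ctx.setMaxAppAttempts(2);  // tolerate at least one AM failure
      }
    }
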
  • 27. © Hortonworks Inc. 2014 Application Registry Page 27 • A common problem (not specific to Slider) https://issues.apache.org/jira/browse/YARN-913 • Current – Apache Curator based – Register URLs pointing to actual data – AM doubles up as a webserver for published data • Future – Registry should be stand-alone – Slider is a consumer as well as publisher – Slider focuses on declarative solution for Applications to publish data – Allows integration of Applications independent of how they are hosted