Hoya for Code Review


An iteration of the Hoya slides, put up as part of a review of the code with others writing YARN services. It looks at what Hoya offers, what we need from applications to be able to deploy them this way, and what we need from YARN.
Speaker notes:
  • JMX port binding & publishing of port; web port
  • Rolling restart of NM/RM
  • AM retry logic
  • Testing: chaos monkey for YARN
  • Logging: running Samza without HDFS, and the costs in ops & latency. They run Samza clusters only with YARN; YARN puts logs into HDFS by default, so without HDFS you are stuffed.
  • Rollover of stdout & stderr: have the NM implement the rolling
  • Samza could log from log4j and append to Kafka; need a way to pull the logs out and view them. Adds 15 min
  • YARN can use http: URLs to pull in local resources
  • Samza can handle outages of a few minutes, but other services need rolling restart/upgrade
  • No classic scheduling; assume the full code can run, or buy more hardware
  • RM to tell the AM when a request can't be satisfied
  • co-existence
  • Hoya for Code Review

    1. Hoya: HBase on YARN
       Steve Loughran & Devaraj Das
       {stevel, ddas} at hortonworks.com, @steveloughran, @ddraj
       August 2013
       © Hortonworks Inc. 2013
    2. Hadoop as Next-Gen Platform
       • HADOOP 1.0: HDFS (redundant, reliable storage) + MapReduce (cluster resource management & data processing). A single-use system: batch apps.
       • HADOOP 2.0: HDFS2 (redundant, reliable storage) + YARN (cluster resource management) + MapReduce and others (data processing). A multi-purpose platform: batch, interactive, online, streaming, ...
    3. YARN: Taking Hadoop Beyond Batch
       Applications run natively IN Hadoop, on HDFS2 (redundant, reliable storage) and YARN (cluster resource management):
       BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4, Samza, ...), IN-MEMORY (Spark), GRAPH (Giraph), HPC MPI (OpenMPI), OTHER (Search, Weave, ...)
       Store ALL DATA in one place; interact with that data in MULTIPLE WAYS, with predictable performance and quality of service.
    4. And HBase?
       The same stack: HDFS2 (redundant, reliable storage), YARN (cluster resource management), and the processing engines above, with HBase running alongside them.
    5. (image-only slide)
    6. Hoya: On-demand HBase clusters
       1. Small HBase cluster in large YARN cluster
       2. Dynamic HBase clusters
       3. Self-healing HBase cluster
       4. Elastic HBase clusters
       5. Transient/intermittent clusters for workflows
       6. Custom versions & configurations
       7. More efficient utilization/sharing of cluster
    7. Goal: No code changes in HBase
       • Today: none needed.
         HBase 0.95.2$ mvn install -Dhadoop.version=2.0
       But we'd like:
       • ZK reporting of web UI ports
       • Allocation of tables in region servers to be block-location aware
       • A way to get from a failed region server to its YARN container (a configurable ID is enough)
    8. Hoya, the tool
       • Hoya (HBase On YArn): a Java tool, completely CLI driven
       • Input: cluster description as JSON
         – Specification of cluster: node options, ZK params
         – Configuration generated
         – Entire state persisted
       • Actions: create, freeze/thaw, flex, exists <cluster>
       • Can change cluster state later: add/remove nodes, started/stopped states
    9. YARN manages the cluster
       (Diagram: servers each running HDFS and a YARN Node Manager; one server running the YARN Resource Manager.)
       • Servers run YARN Node Managers
       • NMs heartbeat to the Resource Manager
       • RM schedules work over the cluster
       • RM allocates containers to apps
       • NMs start containers
       • NMs report container health
    10. Hoya Client creates App Master
        (Diagram: the Hoya Client talks to the YARN Resource Manager, which starts the Hoya AM in a container on one of the Node Managers.)
    11. AM deploys HBase with YARN
        (Diagram: the Hoya AM [HBase Master] requests containers from the Resource Manager; HBase Region Servers start on the Node Managers.)
    12. HBase & clients bind via ZooKeeper
        (Diagram: the HBase Region Servers and the Hoya AM [HBase Master] register in ZooKeeper; the HBase Client finds them there, independently of the Hoya Client.)
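Since binding goes through ZooKeeper, an HBase client needs only the quorum details, not the (dynamically assigned) container hosts. Hoya can emit the generated configuration as XML (slide 19); a hand-written sketch of the client-side hbase-site.xml settings for the cl1 cluster, using the zkHosts/zkPort/zkPath values from the cluster specification on slide 17 (illustrative, not actual Hoya output):

```xml
<configuration>
  <!-- ZooKeeper quorum the Hoya-deployed HBase cluster registered with -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <!-- Per-cluster znode parent, so several Hoya clusters can share one quorum -->
  <property>
    <name>zookeeper.znode.parent</name>
    <value>/yarnapps_hoya_stevel_cl1</value>
  </property>
</configuration>
```

A distinct znode parent per cluster is what lets multiple on-demand HBase instances coexist on one ZooKeeper ensemble.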
    13. YARN notifies AM of failures
        (Diagram: a Region Server's container fails; YARN notifies the Hoya AM [HBase Master], which requests a replacement container and starts a new Region Server.)
    14. HOYA: cool bits
        • Cluster specification stored as JSON in HDFS
        • Conf dir cached, dynamically patched before pushing up as local resources for master & region servers
        • HBase .tar file stored in HDFS; clusters can use the same or different HBase versions
        • Handling of cluster flexing is the same code as unplanned container loss
        • No Hoya code on region servers
    15. HOYA: AM RPC API

        // shut down
        public void stopCluster();

        // change # of worker nodes in cluster
        public boolean flexNodes(int workers);

        // get JSON description of live cluster
        public String getClusterStatus();
    16. Flexing/failure handling is same code

        public boolean flexNodes(int workers) throws IOException {
          log.info("Flexing cluster count from {} to {}",
                   numTotalContainers, workers);
          if (numTotalContainers == workers) {
            // no-op
            log.info("Flex is a no-op");
            return false;
          }
          // update the # of workers
          numTotalContainers = workers;
          // ask for more containers if needed
          reviewRequestAndReleaseNodes();
          return true;
        }
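The slide does not show reviewRequestAndReleaseNodes() itself. A minimal sketch of the reconciliation it implies: compare the desired worker count against the live count and compute how many containers to request (positive) or release (negative). The same comparison serves both planned flexing and unplanned container loss, which is why one code path covers both. All names below are hypothetical stand-ins, not Hoya's actual classes:

```java
// Sketch of desired-vs-actual container reconciliation, as the flexNodes()
// slide implies. Class and method names are hypothetical, not Hoya's code.
public class FlexSketch {
    private int desiredWorkers;
    private int liveWorkers;

    public FlexSketch(int desiredWorkers, int liveWorkers) {
        this.desiredWorkers = desiredWorkers;
        this.liveWorkers = liveWorkers;
    }

    /** Positive: containers to request from the RM; negative: to release. */
    public int reviewRequestAndReleaseNodes() {
        return desiredWorkers - liveWorkers;
    }

    /** Unplanned loss goes through the same reconciliation as flexing. */
    public void onContainerLost() {
        liveWorkers--;                       // YARN reported a failed container
        int delta = reviewRequestAndReleaseNodes();
        // delta > 0 here, so the AM would ask the RM for a replacement
    }

    public static void main(String[] args) {
        FlexSketch am = new FlexSketch(5, 5);
        am.onContainerLost();                // a region server died
        assert am.reviewRequestAndReleaseNodes() == 1;
    }
}
```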
    17. Cluster specification: persistent & wire format

        {
          "version" : "1.0",
          "name" : "cl1",
          "type" : "hbase",
          "state" : 1,
          "createTime" : 1377276630308,
          "originConfigurationPath" : "hdfs://ubuntu:9000/user/stevel/.hoya/cluster/cl1/original",
          "generatedConfigurationPath" : "hdfs://ubuntu:9000/user/stevel/.hoya/cluster/cl1/generated",
          "zkHosts" : "localhost",
          "zkPort" : 2181,
          "zkPath" : "/yarnapps_hoya_stevel_cl1",
          "hbaseDataPath" : "hdfs://ubuntu:9000/user/stevel/.hoya/cluster/cl1/hbase",
          "imagePath" : "hdfs://ubuntu:9000/hbase.tar",
          "options" : { "hoya.test" : "true" },
          ...
        }
    18. Role specifications

        "roles" : {
          "worker" : {
            "yarn.memory" : "256",
            "role.instances" : "5",
            "role.name" : "worker",
            "jvm.heapsize" : "256",
            "yarn.vcores" : "1",
            "app.infoport" : "0",
            "env.MALLOC_ARENA_MAX" : "4"
          },
          "master" : {
            "yarn.memory" : "128",
            "role.instances" : "1",
            "role.name" : "master",
            "jvm.heapsize" : "128",
            "yarn.vcores" : "1",
            "app.infoport" : "8585"
          }
        }
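Each role block maps directly onto a YARN container request: role.instances gives the count, yarn.memory and yarn.vcores the per-container resources. A sketch of that mapping in plain Java; RoleRequest is a stand-in for YARN's Resource/ContainerRequest types, and the defaults are illustrative, not Hoya's:

```java
import java.util.Map;

// Sketch: turning a Hoya role specification into container-request
// parameters. RoleRequest stands in for YARN's Resource/ContainerRequest.
public class RoleSpecSketch {
    public static final class RoleRequest {
        public final String name;
        public final int instances, memoryMb, vcores;
        RoleRequest(String name, int instances, int memoryMb, int vcores) {
            this.name = name; this.instances = instances;
            this.memoryMb = memoryMb; this.vcores = vcores;
        }
    }

    /** Read the yarn.* and role.* keys shown on the slide, with defaults. */
    public static RoleRequest toRequest(Map<String, String> role) {
        return new RoleRequest(
            role.getOrDefault("role.name", "unknown"),
            Integer.parseInt(role.getOrDefault("role.instances", "1")),
            Integer.parseInt(role.getOrDefault("yarn.memory", "256")),
            Integer.parseInt(role.getOrDefault("yarn.vcores", "1")));
    }

    public static void main(String[] args) {
        // The "worker" role from the slide: 5 containers of 256 MB / 1 vcore
        Map<String, String> worker = Map.of(
            "role.name", "worker", "role.instances", "5",
            "yarn.memory", "256", "yarn.vcores", "1");
        RoleRequest r = toRequest(worker);
        assert r.instances == 5 && r.memoryMb == 256 && r.vcores == 1;
    }
}
```

Keeping the options as opaque string key/value pairs is what lets new roles (monitor, etc., slide 20) be added without changing the wire format.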
    19. Current status
        • Able to create & stop on-demand HBase clusters; RegionServer failures handled
        • Able to specify a specific HBase configuration: hbase-home or .tar.gz
        • Cluster stop, restart, flex
        • Get (dynamic) configuration as XML or properties
    20. Ongoing
        • Multiple roles: worker, master, monitor
          --role worker --roleopts worker yarn.vcores 2
        • Multiple providers: HBase + others
          – Client side: preflight, configuration patching
          – Server side: starting roles, liveness
        • Liveness probes: HTTP GET, RPC port, RPC op?
        • What do we need in YARN for production?
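The simplest of the liveness probes listed above is the HTTP GET: hit a role instance's web port and treat any 2xx response as "live". A self-contained sketch (hypothetical helper, not Hoya's implementation); the stub server in main stands in for a region server's info port:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// Sketch of an HTTP GET liveness probe, one of the probe styles the
// slide lists. Names are illustrative, not Hoya's actual code.
public class HttpProbeSketch {
    /** Returns true if url answers an HTTP GET with a 2xx status. */
    public static boolean isLive(String url, int timeoutMs) {
        try {
            HttpURLConnection c =
                (HttpURLConnection) new URL(url).openConnection();
            c.setConnectTimeout(timeoutMs);
            c.setReadTimeout(timeoutMs);
            int status = c.getResponseCode();
            c.disconnect();
            return status >= 200 && status < 300;
        } catch (IOException e) {
            return false;   // connection refused or timeout: not live
        }
    }

    public static void main(String[] args) throws IOException {
        // Stub "web UI" standing in for a deployed role instance
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/", ex -> {
            ex.sendResponseHeaders(200, -1);   // 200 OK, no body
            ex.close();
        });
        server.start();
        int port = server.getAddress().getPort();
        assert isLive("http://127.0.0.1:" + port + "/", 1000);
        server.stop(0);
    }
}
```

An RPC-port probe would be the same shape with a plain socket connect; an RPC-op probe additionally exercises application logic, which is why the slide leaves it as a question.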
    21. Requirements of an app: MUST
        • Install from tarball; run as a normal user
        • Deploy/start without human intervention
        • Pre-configurable, static instance configuration data
        • Support dynamic discovery/binding of peers
        • Co-existence with other app instances in the cluster and on the same nodes
        • Handle co-located role instances
        • Persist data to HDFS
        • Support 'kill' as a shutdown option
        • Handle failed role instances
        • Support role instances moving after failure
    22. Requirements of an app: SHOULD
        • Be configurable by Hadoop XML files
        • Publish dynamically assigned web UI & RPC ports
        • Support cluster flexing up/down
        • Support an API to determine role instance status
        • Make it possible to determine role instance ID from the app
        • Support simple remote liveness probes
    23. YARN-896: long-lived services
        1. Container reconnect on AM restart
        2. Token renewal on long-lived apps
        3. Containers: signalling, >1 process sequence
        4. AM/RM-managed gang scheduling
        5. Anti-affinity hint in container requests
        6. Service registry: ZK?
        7. Logging
        All post Hadoop 2.1.
    24. SLAs & co-existence with MapReduce
        1. Make IO bandwidth/IOPS a resource used in scheduling & limits
        2. Need to monitor what's going on w.r.t. IO & network load, from containers to apps to queues
        3. Dynamic adaptation of cgroup HDD, network, RAM limits
    25. Hoya needs a home!
        https://github.com/hortonworks/hoya
    26. Questions?
        hortonworks.com