Toward Better Multi-Tenancy Support from HDFS

  1. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. Toward Better Multi-Tenancy Support from HDFS. Xiaoyu Yao, Email: xyao@hortonworks.com
  2. About myself ⬢ Member of Technical Staff at Hortonworks since 2014. ⬢ Apache Hadoop Committer and PMC member. ⬢ Currently working on HDFS. ⬢ This talk aims to give a better understanding of HDFS multi-tenancy support and of ongoing work on better resource management.
  3. Agenda ⬢ Overview ⬢ Hadoop multi-tenancy features ⬢ HDFS resources and multi-tenancy offerings ⬢ HDFS resource management via resource coupon ⬢ Q&A
  4. Overview ⬢ Centrally managed infrastructure –Consolidate to simplify management and lower TCO –Better utilization and efficiency ⬢ Requirements –Resource Sharing –Resource Isolation –Resource Control
  5. Multi-Tenancy Support from Hadoop (table columns: Resource Sharing; Resource Isolation; Resource Management) ⬢ HBASE: Y; Namespace, Region Server Group; Quota ⬢ YARN: Y; Queue, Node Label, ...; Capacity Scheduler, ... ⬢ HDFS: Y; Federation; Quota, FairCallQueue, Backoff
  6. HDFS Resources ⬢ Capacity –Namespace –Storage Space –Storage Type ⬢ Operational Resources –Namenode •RPC –Datanode •Disk & Network
  7. HDFS Resource Sharing/Isolation – Federation
  8. HDFS Capacity Management – Quota ⬢ Quota –Namespace –Storage Space –HDFS-7584 Quota by Storage Types ⬢ Limitations –Static –Per directory –No per user/job control
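
To make the quota model on this slide concrete, here is a minimal sketch of a per-directory check for the namespace and storage space quotas. The class and method names (`DirQuota`, `add_file`, `QuotaExceeded`) are hypothetical and not part of any HDFS API; the only HDFS behavior assumed is that every replica of a block counts against the storage space quota.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised when a write would exceed one of the directory's quotas."""


@dataclass
class DirQuota:
    """Hypothetical per-directory quota record (illustration only)."""
    ns_quota: int         # max number of names (files and directories)
    space_quota: int      # max bytes, counting every replica
    ns_used: int = 0
    space_used: int = 0

    def add_file(self, size_bytes: int, replication: int = 3) -> None:
        # Storage space is charged per replica, so a 50-byte file with
        # replication 3 consumes 150 bytes of quota.
        needed = size_bytes * replication
        if self.ns_used + 1 > self.ns_quota:
            raise QuotaExceeded("namespace quota exceeded")
        if self.space_used + needed > self.space_quota:
            raise QuotaExceeded("storage space quota exceeded")
        self.ns_used += 1
        self.space_used += needed
```

The limits are fixed numbers attached to a directory, which is exactly the limitation the slide lists: nothing in this model knows which user or job issued the write.
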
  9. HDFS Operational Resource Management – Namenode RPC Isolation (1) ⬢ Internal RPC –DN->NN block report, heartbeat, etc. –ZKFC->NN liveness check ⬢ External RPC –Client RPCs from HDFS clients such as MR jobs/Hive queries/HBase Client [Diagram: Listener → Readers → Call Queue → Handlers → FSN]
  10. HDFS Operational Resource Management – Namenode RPC Isolation (2) ⬢ Use cases: –HDFS access from normal jobs impacted by offending jobs –Internal RPCs impacted by external RPCs –One blocked RPC method could affect others ⬢ Protect HDFS internal RPCs: –Dedicated service RPC server/port •Isolate DN->NN block report, heartbeat, etc. –Dedicated lifeline RPC server/port •Protect ZKFC->NN liveness check ⬢ All external RPCs go to the default port (e.g., 8020)
  11. HDFS Resource Management – Namenode RPC Call Queue ⬢ In a multi-tenancy scenario, the call queue should act like a shock absorber that accommodates different workloads, converting bursty arrivals into smooth, steady departures. ⬢ Good call queue –No call bloat –Catches and handles bursts with no more than a temporary increase in queue delay –Maximum server utilization ⬢ Bad call queue –Exhibits call bloat –Fills up and stays full after bursts –Low utilization and high queue latency
  12. HDFS Resource Management - Fair Call Queue ⬢ Before HADOOP-9640: LinkedBlockingQueue –Single queue –Clients block and then time out or fail when the queue is full ⬢ HADOOP-9640 - Fair Call Queue –Multiple priority levels and call queues with different processing priority –Each RPC is assigned a priority by the scheduler –High priority RPC calls are put into a call queue with a higher probability of being executed. [Diagram: Scheduler → Queue 0 ... Queue 2 → Multiplexer (WRR)]
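
The weighted round-robin (WRR) multiplexer on this slide can be sketched as follows. This is an illustration of the idea rather than Hadoop's implementation: the class name `FairCallQueueSketch` and the weights `(4, 2, 1)` are assumptions chosen for the example.

```python
from collections import deque


class FairCallQueueSketch:
    """Several priority queues drained by a weighted round-robin multiplexer.

    Hypothetical illustration; names and weights are not Hadoop's.
    """

    def __init__(self, weights=(4, 2, 1)):
        self.weights = weights                  # calls served per turn, per level
        self.queues = [deque() for _ in weights]
        self._level = 0                         # queue currently being served
        self._served = 0                        # calls taken from it this turn

    def put(self, call, priority):
        # The scheduler decides `priority`; 0 is the highest.
        self.queues[priority].append(call)

    def _advance(self):
        self._level = (self._level + 1) % len(self.queues)
        self._served = 0

    def take(self):
        # Visit each level at most once, starting from the current one.
        for _ in range(len(self.queues)):
            queue = self.queues[self._level]
            if queue:
                call = queue.popleft()
                self._served += 1
                if self._served >= self.weights[self._level]:
                    self._advance()
                return call
            self._advance()                     # empty level: give up its turn
        return None                             # all queues are empty
```

Because every level eventually gets a turn, low priority calls are slowed down rather than starved, while higher priority queues receive a larger share of the handlers.
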
  13. HDFS Resource Management – Namenode RPC Throttling <1> ⬢ HADOOP-10597 Backoff when the call queue is full –Send back a retriable exception –Let the client wait exponentially and retry instead of blocking, timing out, or failing the call.
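
On the client side, the "exponential wait and retry" behavior might look like the sketch below. `RetriableError` is a stand-in for the server's retriable reply, and the function name, delay values, and random jitter are assumptions made for illustration, not Hadoop's retry policy.

```python
import random
import time


class RetriableError(Exception):
    """Stand-in for the server's 'call queue is full, retry later' reply."""


def call_with_backoff(rpc, max_retries=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Invoke rpc(); on RetriableError wait exponentially longer and retry."""
    for attempt in range(max_retries + 1):
        try:
            return rpc()
        except RetriableError:
            if attempt == max_retries:
                raise                           # retries exhausted
            # The delay cap doubles each attempt: 0.1s, 0.2s, 0.4s, ...
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, cap))
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.
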
  14. HDFS Resource Management – Namenode RPC Throttling <2> ⬢ HADOOP-12916 Backoff based on response time –Basic idea: back off earlier to avoid call queue overload so that the namenode can recover quickly. –Low priority calls get backed off if the response time of high priority calls is over a predefined threshold. –More per user/queue metrics added for troubleshooting.
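
The rule on this slide can be expressed in a few lines. This sketch assumes a simple running average per priority level and checks only the levels strictly above the caller's; the class name, the averaging method, and the thresholds are all hypothetical and are not taken from the HADOOP-12916 patch.

```python
class ResponseTimeBackoffSketch:
    """Back off a call when a higher priority level is responding slowly."""

    def __init__(self, thresholds_ms):
        self.thresholds_ms = list(thresholds_ms)   # one threshold per level
        self._total_ms = [0.0] * len(self.thresholds_ms)
        self._count = [0] * len(self.thresholds_ms)

    def record(self, priority, elapsed_ms):
        # Called after a request at `priority` completes.
        self._total_ms[priority] += elapsed_ms
        self._count[priority] += 1

    def average_ms(self, priority):
        if self._count[priority] == 0:
            return 0.0
        return self._total_ms[priority] / self._count[priority]

    def should_back_off(self, priority):
        # Level 0 is the highest priority, so range(priority) covers every
        # level above the caller's own.
        return any(self.average_ms(p) > self.thresholds_ms[p]
                   for p in range(priority))
```

With thresholds of 10 ms and 20 ms, a single 30 ms response at level 0 is enough to make level 1 callers back off, while level 0 callers continue to be served.
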
  15. HDFS Resource Management – Namenode RPC Throttling <3> ⬢ Abstract the scheduler interface from the call queue for pluggable RPC priority assignment –DefaultRpcScheduler: all RPC calls have the same priority –DecayRpcScheduler: from the original FairCallQueue; priority is assigned based on users' previous call volumes. –Other experimental schedulers: configurable list of high priority users/groups for low latency jobs, medium priority users/groups for normal jobs, and low priority users/groups for batch jobs.
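
A decay-based scheduler of the kind described here can be sketched as below: each user's call count is periodically multiplied by a decay factor, and a call's priority depends on that user's share of recent traffic. The class name, thresholds, and decay factor are illustrative assumptions, not a copy of DecayRpcScheduler.

```python
from collections import defaultdict


class DecaySchedulerSketch:
    """Assign lower priority to users with a larger share of recent calls."""

    def __init__(self, thresholds=(0.125, 0.25, 0.5), decay_factor=0.5):
        self.thresholds = thresholds      # share cut-offs for levels 1, 2, 3
        self.decay_factor = decay_factor
        self.counts = defaultdict(float)  # user -> decayed call count

    def decay(self):
        # Meant to run on a timer so that old calls matter less over time.
        for user in list(self.counts):
            self.counts[user] *= self.decay_factor

    def priority_for(self, user):
        self.counts[user] += 1
        share = self.counts[user] / sum(self.counts.values())
        priority = 0                      # 0 is the highest priority
        for level, cutoff in enumerate(self.thresholds, start=1):
            if share >= cutoff:
                priority = level          # heavier users sink to lower levels
        return priority
```

A user responsible for most recent calls lands in the lowest priority level, while an occasional caller stays at level 0; once the heavy user stops, repeated decay lets its count shrink and its priority recover.
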
  16. HDFS resource management - QoS ⬢ Use case: –Allow a high performance QoS mechanism with minimum decoding effort on the server side ⬢ HADOOP-9194 QoS support for Hadoop RPC –One byte in the RPC header to facilitate the QoS mechanism –E.g., differentiate OLTP/OLAP, batch/streaming against the same HDFS ⬢ Limitation –No mechanism level implementation yet
  17. HDFS resource management with YARN ⬢ Use case –Priority inversion without centralized resource management (e.g., RPC calls from high priority YARN jobs may be put into a low priority HDFS namenode call queue) –Identify and manage “bad” callers effectively ⬢ Namenode – RPC handler –FairCallQueue offers fair use of namenode RPC handlers –No guarantee of differentiation ⬢ Datanode – I/O bandwidth –No differentiation of writer/reader and bandwidth usage. –Datanode allows static throttling of balancer I/O.
  18. HDFS Namenode Resource Reservation ⬢ HADOOP-13128 proposes HDFS namenode resource reservation via resource coupon –From throttling to managing –Similar to a delegation token in many aspects –Works for both Kerberos and non-Kerberos clusters –Allows only privileged service users to request resource coupons from the namenode. –A coupon can be serialized/deserialized for use within a container. –A coupon can be renewed for long running jobs or canceled after the intended job has finished.
  19. HDFS Namenode Resource Coupon ⬢ Coupon Identifier –Finer-grained owner (MR job ID, Hive Query ID) to help identify and manage “good” and “bad” callers –Resource type (Namenode RPC or Datanode I/O bandwidth) –Flexible management unit for different resources. •Min/Max percentage (e.g., Namenode RPC) •Absolute value (e.g., Datanode I/O bandwidth)
  20. HDFS Namenode Resource Coupon Manager (RCM) ⬢ Grant/Renew/Cancel resource coupon ⬢ Monitor and report resource usage ⬢ Check and validate resource use requests
  21. HDFS Namenode Resource Pool [Diagram: HDFS Namenode Resource Pool, divided into a Fairness Pool and a Managed Pool; callers shown are applications supporting Resource Coupon (YARN/HBASE) and legacy applications without Resource Coupon]
  22. HDFS Namenode Resource Coupon Manager (RCM) [Diagram components: Client, YARN Resource Manager, HDFS Namenode with RCM (marked NEW), HDFS Datanode, YARN Node Manager, YARN Container]
  23. HDFS Resource Management – Datanode ⬢ Use case: –When a client writes to HDFS faster than the disk bandwidth of the DNs, it saturates the disk bandwidth and puts the DNs into an unresponsive state. –The client only backs off by aborting/recovering the pipeline, which causes failed writes and unnecessary pipeline recovery. ⬢ Static I/O Throttling –HDFS-7265 Support HDFS IO throttling –HDFS-9796 Use a throttler for replica write in datanode –HDFS-4412 Add throttler for datanode bandwidth –HADOOP-10410 datanode QoS via ioprio_set on DataXceiver thread ⬢ Dynamic I/O Throttling –HDFS-7270 Add congestion signaling capability to DataNode write pipeline (ECN) ⬢ Future work: I/O bandwidth reservation with resource coupon
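
Static I/O throttling of the kind listed on this slide can be illustrated with a token bucket. This is a generic technique swapped in for the example and is not claimed to be how the datanode throttler in the JIRAs above works; the class name, the one-second burst allowance, and the injectable `clock` and `sleep` parameters are assumptions.

```python
import time


class BandwidthThrottlerSketch:
    """Token-bucket limiter: callers block once they exceed bytes_per_sec."""

    def __init__(self, bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(bytes_per_sec)
        self.capacity = float(bytes_per_sec)   # allow one second of burst
        self.tokens = self.capacity
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Credit tokens for the time elapsed since the last call.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def throttle(self, num_bytes):
        """Account for num_bytes and sleep long enough to stay under the rate."""
        self._refill()
        self.tokens -= num_bytes
        if self.tokens < 0:
            # The deficit is repaid by waiting; the next refill cancels it.
            self.sleep(-self.tokens / self.rate)
```

A writer would call `throttle(len(packet))` before each write. The limit is fixed when the object is created, which matches the "static" label on the slide; the congestion signaling work in HDFS-7270 addresses the dynamic case.
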
  24. Thank you! Q&A
