
Improving HDFS Availability with IPC Quality of Service


  1. 1. Improving HDFS Availability with Hadoop RPC Quality of Service Hadoop Summit 2015
  2. 2. Who We Are • Ming Ma, Twitter Hadoop Team • Chris Li, Data Platform • Hadoop performance at scale • Hadoop reliability and scalability
  3. 3. @twitterhadoop Agenda ‣Diagnosis of Namenode Congestion • How does QoS help? • How to use QoS in your clusters
  4. 4. @twitterhadoop Hadoop Workloads @ Twitter, ebay • Large scale • Thousands of machines • Tens of thousands of jobs / day • Diverse • Production vs ad-hoc • Batch vs interactive vs iterative • Require performance isolation
  5. 5. @twitterhadoop Solutions for Performance Isolation • YARN: flexible cluster resource management • Cross Data Center Traffic QoS • Set QoS policy via DSCP bits in IP header • HDFS Federation • Cluster Separation: run high SLA jobs in another cluster
  6. 6. @twitterhadoop Unsolved Extreme Cluster Slowdown
  7. 7. @twitterhadoop Unsolved Extreme Cluster Slowdown • hadoop fs -ls takes 5+ seconds
  8. 8. @twitterhadoop Unsolved Extreme Cluster Slowdown • hadoop fs -ls takes 5+ seconds • Worst case: cluster outage • Namenode lost some datanode heartbeats → replication storm
  9. 9. @twitterhadoop Audit Logs to the Rescue
  10. 10. @twitterhadoop Audit Logs to the Rescue • Username, operation type, and date are logged for each operation
  11. 11. @twitterhadoop Audit Logs to the Rescue • Username, operation type, and date are logged for each operation • We automatically back these logs up into HDFS
  12. 12. @twitterhadoop (Hadoop Learning about Itself)
  13. 13. @twitterhadoop Cause: Resource Monopolization (each color is a different user; area is the number of calls)
  14. 14. @twitterhadoop What’s wrong with this code? while (true) { fileSystem.exists("/foo"); }
  15. 15. @twitterhadoop What’s wrong with this code? while (true) { fileSystem.exists("/foo"); } Don’t do this at home
  16. 16. @twitterhadoop What’s wrong with this code? while (true) { fileSystem.exists("/foo"); } Don’t do this at home Unless QoS is on ;)
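Spelled out as a complete program, the slide's loop looks roughly like the sketch below (the class name and filesystem URI are placeholders, not from the deck). Each fs.exists() call is one metadata RPC to the namenode, and when a MapReduce job fans this out across thousands of tasks the calls swamp the shared RPC call queue, which is exactly the scenario the next slides illustrate.

```java
// Hypothetical, minimal reconstruction of the slide's anti-pattern; do not run this
// against a cluster you care about.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamenodeHammer {
  public static void main(String[] args) throws Exception {
    // "hdfs://namenode:8020" is a placeholder URI.
    FileSystem fileSystem = FileSystem.get(URI.create("hdfs://namenode:8020"), new Configuration());
    Path foo = new Path("/foo");
    while (true) {
      fileSystem.exists(foo);   // one namenode RPC per iteration: no sleep, no backoff, no caching
    }
  }
}
```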
  17. 17. @twitterhadoop Bad Code + MapReduce = DDoS on Namenode! Namenode Bad User Good Users Other Users
  18. 18. @twitterhadoop Bad Code + MapReduce = DDoS on Namenode! Namenode Bad User Good Users Other Users
  19. 19. @twitterhadoop Bad Code + MapReduce = DDoS on Namenode! Namenode Bad User Good Users Other Users
  20. 20. @twitterhadoop Bad Code + MapReduce = DDoS on Namenode! Namenode Bad User Good Users Other Users
  21. 21. @twitterhadoop Bad Code + MapReduce = DDoS on Namenode! Namenode Bad User Good Users Other Users
  22. 22. @twitterhadoop Bad Code + MapReduce = DDoS on Namenode! Namenode Bad User Good Users Other Users
  23. 23. @twitterhadoop Hadoop RPC Overview [diagram: client process (DFS Client → RPC Client) → namenode process (RPC Server: Readers → FIFO Call Queue → Handlers → Namenode Service / NN Lock → Responders)]
  24. 24. @twitterhadoop Hadoop RPC Overview FIFO Call Queue HandlersReaders
  25. 25. @twitterhadoop Hadoop RPC Overview FIFO Call Queue HandlersReaders
  26. 26. @twitterhadoop Diagnosing Congestion FIFO Call Queue HandlersReaders
  27. 27. @twitterhadoop Diagnosing Congestion Good User Bad User FIFO Call Queue HandlersReaders
  28. 28. @twitterhadoop Diagnosing Congestion HandlersReaders Good User Bad User
  29. 29. @twitterhadoop Diagnosing Congestion HandlersReaders Good User Bad User
  30. 30. @twitterhadoop Diagnosing Congestion HandlersReaders Good User Bad User
  31. 31. @twitterhadoop Diagnosing Congestion HandlersReaders Good User Bad User
  32. 32. @twitterhadoop Diagnosing Congestion HandlersReaders Good User Bad User
  33. 33. @twitterhadoop Diagnosing Congestion HandlersReaders Good User Bad User
  34. 34. @twitterhadoop Diagnosing Congestion [diagram: the FIFO call queue between Readers and Handlers completely filled with the bad user's calls, leaving no room for the good user]
  35. 35. @twitterhadoop Solutions we’ve considered
  36. 36. @twitterhadoop Solutions we’ve considered • HDFS Federation
  37. 37. @twitterhadoop Solutions we’ve considered • HDFS Federation • Use separate RPC server for datanode requests (service RPC)
  38. 38. @twitterhadoop Solutions we’ve considered • HDFS Federation • Use separate RPC server for datanode requests (service RPC) • Namenode global lock
  39. 39. @twitterhadoop Agenda ✓ Diagnosis of Namenode Congestion ‣How does QoS help? • How to use QoS in your clusters
  40. 40. @twitterhadoop Goals • Achieve Fairness and QoS • No performance degradation • High throughput • Low overhead
  41. 41. @twitterhadoop Model it as a scheduling problem • Available resource is the RPC handler thread • Users should be given a fair share of resources
  42. 42. @twitterhadoop Design Considerations • Pluggable, configurable • Simplifying assumptions: • All users are equal • All RPC calls have the same cost • Leverage existing scheduling algorithms
  43. 43. @twitterhadoop Solving Congestion with FairCallQueue [diagram: Readers → Scheduler → sub-queues Queue 0 to Queue 3 → Multiplexer → Handlers; good-user and bad-user calls both enter through the Scheduler]
  44. 44. @twitterhadoop Fair Scheduling Call Queue HandlersReaders Good User Bad User
  45. 45. @twitterhadoop Fair Scheduling: Good User Call Queue HandlersReaders Good User Bad User
  46. 46. @twitterhadoop Fair Scheduling: Good User Call Queue HandlersReaders Good User Bad User 11%
  47. 47. @twitterhadoop Fair Scheduling: Good User Call Queue HandlersReaders Good User Bad User
  48. 48. @twitterhadoop Fair Scheduling: Good User Call Queue HandlersReaders Good User Bad User
  49. 49. @twitterhadoop Fair Scheduling: Good User Call Queue HandlersReaders Good User Bad User Queue 0: < 12%
  50. 50. @twitterhadoop Fair Scheduling: Good User Call Queue HandlersReaders Good User Bad User
  51. 51. @twitterhadoop Fair Scheduling: Good User Call Queue HandlersReaders Good User Bad User
  52. 52. @twitterhadoop Fair Scheduling: Bad User Call Queue HandlersReaders Good User Bad User
  53. 53. @twitterhadoop Fair Scheduling: Bad User Call Queue HandlersReaders Good User Bad User
  54. 54. @twitterhadoop Fair Scheduling: Bad User Call Queue HandlersReaders Good User Bad User
  55. 55. @twitterhadoop Fair Scheduling: Bad User Call Queue HandlersReaders Good User Bad User 80%
  56. 56. @twitterhadoop Fair Scheduling: Bad User Call Queue HandlersReaders Good User Bad User
  57. 57. @twitterhadoop Fair Scheduling: Bad User Call Queue HandlersReaders Good User Bad User
  58. 58. @twitterhadoop Fair Scheduling: Bad User Call Queue HandlersReaders Good User Bad User Queue 3: > 50%
  59. 59. @twitterhadoop Fair Scheduling: Bad User Call Queue HandlersReaders Good User Bad User
  60. 60. @twitterhadoop Fair Scheduling: Bad User Call Queue HandlersReaders Good User Bad User
  61. 61. @twitterhadoop Fair Scheduling Result Call Queue HandlersReaders Good User Bad User
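The scheduling idea behind the slides above can be sketched in a few lines: track each caller's share of recent traffic and map it to a sub-queue using the thresholds (12%, 25%, 50% by default, detailed in the appendix). This is a simplified illustration of the mechanism, not Hadoop's actual scheduler code.

```java
// Simplified sketch of threshold-based priority assignment. Callers below the first
// threshold go to queue 0 (highest priority); callers above the last threshold fall
// into the lowest-priority queue.
public class ThresholdSchedulerSketch {
  // Default thresholds from the deck: 12%, 25%, 50% boundaries for 4 sub-queues.
  private static final double[] THRESHOLDS = {0.12, 0.25, 0.50};

  /** @param userShare the caller's fraction of all calls seen in the current window */
  static int queueFor(double userShare) {
    for (int q = 0; q < THRESHOLDS.length; q++) {
      if (userShare < THRESHOLDS[q]) {
        return q;               // light user: served from a high-priority queue
      }
    }
    return THRESHOLDS.length;   // heaviest users land in the last queue
  }

  public static void main(String[] args) {
    System.out.println(queueFor(0.11));  // 0: under 12% of traffic, like the good user above
    System.out.println(queueFor(0.80));  // 3: over 50% of traffic, like the bad user above
  }
}
```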
  62. 62. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User
  63. 63. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User
  64. 64. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User Take 3
  65. 65. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User
  66. 66. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User Take 2
  67. 67. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User
  68. 68. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User
  69. 69. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User
  70. 70. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User
  71. 71. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User
  72. 72. @twitterhadoop Weighted Round-Robin Multiplexing Call Queue HandlersReaders Good User Bad User Repeat
  73. 73. @twitterhadoop FairCallQueue preventing high latency [chart: request latency with the FIFO CallQueue vs. the FairCallQueue]
  74. 74. @twitterhadoop RPC Backoff • Prevents RPC queue from completely filling up • Clients are told to wait and retry with exponential backoff
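The "wait and retry with exponential backoff" part can be illustrated with a small, generic sketch; this is not Hadoop's retry-policy code (the HDFS client performs these retries itself), it only shows how the wait grows with each rejected attempt.

```java
// Generic exponential-backoff sketch, for illustration only: each rejected attempt
// roughly doubles the wait, with jitter so retries do not arrive in lockstep.
import java.util.concurrent.ThreadLocalRandom;

public final class BackoffSketch {
  /** Milliseconds to wait before retry number `attempt` (0-based), capped and jittered. */
  static long delayMs(int attempt, long baseMs, long maxMs) {
    long exp = Math.min(maxMs, baseMs << Math.min(attempt, 30));    // base * 2^attempt, capped
    return ThreadLocalRandom.current().nextLong(exp / 2, exp + 1);  // jitter within [exp/2, exp]
  }

  public static void main(String[] args) {
    for (int attempt = 0; attempt < 6; attempt++) {
      System.out.printf("attempt %d -> wait ~%d ms%n", attempt, delayMs(attempt, 100, 60_000));
    }
  }
}
```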
  75. 75. RPC Backoff Good User Bad User Call Queue HandlersReaders Good User
  76. 76. RPC Backoff Good User Bad User Call Queue HandlersReaders Good User RetriableException
  77. 77. RPC Backoff Good User Bad User Call Queue HandlersReaders Good User
  78. 78. @twitterhadoop RPC Backoff Effects [chart: good-app latency in ms (0 to 9000) vs. abusive-app load (clients x connections: 100 x 100, 1k x 1k, 10k x 100, 10k x 500, 10k x 10k, 50k x 50k) for Normal, FairCallQueue, and FairCallQueue + RPC Backoff; ConnectTimeoutException marked on the heaviest runs]
  79. 79. @twitterhadoop Current Status • Enabled on all Twitter and ebay production clusters for 6+ months • Open source availability: HADOOP-9640 • Swappable call queue in 2.4 • FairCallQueue in 2.6 • RPC Backoff in 2.8
  80. 80. @twitterhadoop Agenda ✓ Diagnosis of Namenode Congestion ✓ How does QoS help? ‣How to use QoS in your clusters
  81. 81. @twitterhadoop QoS is Easy to Enable. In hdfs-site.xml (8020 is the port you want QoS on): <property> <name>ipc.8020.callqueue.impl</name> <value>org.apache.hadoop.ipc.FairCallQueue</value> </property> <property> <name>ipc.8020.backoff.enable</name> <value>true</value> </property>
  82. 82. @twitterhadoop Future Possibilities • RPC scheduling improvements • Weighted share per user • Prioritize datanode RPCs over client RPC • Overall HDFS QoS • Namenode fine-grained locking • Fairness for data transfers • HTTP based payloads such as webHDFS
  83. 83. @twitterhadoop Conclusion • Try it out! • No more namenode congestion since it’s been enabled at both Twitter and ebay • Providing QoS at the RPC level is an important step towards HDFS fine-grained QoS
  84. 84. @twitterhadoop Special thanks to our reviewers: • Arpit Agarwal (Hortonworks) • Daryn Sharp (Yahoo) • Andrew Wang (Cloudera) • Benoy Antony (ebay) • Jing Zhao (Hortonworks) • Hiroshi Ideka (vic.co.jp) • Eddy Xu (Cloudera) • Steve Loughran (Hortonworks) • Suresh Srinivas (Hortonworks) • Kihwal Lee (Yahoo) • Joep Rottinghuis (Twitter) • Lohit VijayaRenu (Twitter)
  85. 85. @twitterhadoop Questions and Answers • For help setting up QoS, feature ideas, questions: Ming Ma Chris Li @twitterhadoop @mingmasplace chrili_sf@ebaysf.com
  86. 86. @twitterhadoop Appendix
  87. 87. @twitterhadoop FairCallQueue Data • 37-node cluster • 10 users each run a job with: • 20 mappers; each mapper: • runs 100 threads; each thread: • continuously calls hdfs.exists() in a tight loop • Spikes are caused by garbage collection, a separate issue
  88. 88. @twitterhadoop Client Backoff Data • See https://issues.apache.org/jira/secure/attachment/12670619/MoreRPCClientBackoffEvaluation.pdf
  89. 89. @twitterhadoop Related JIRAs • FairCallQueue + Backoff: HADOOP-9640 • Cross Data Center Traffic QoS: HDFS-5175 • nntop: HDFS-6982 • Datanode Congestion Control: HDFS-7270 • Namenode fine-grained locking: HDFS-5453
  90. 90. @twitterhadoop Thoughts on Tuning • Worth considering if you run a larger cluster or have many users • Make your life easier while tuning by refreshing the queue with hadoop dfsadmin -refreshCallQueue
  91. 91. @twitterhadoop Anatomy of a QoS conf key • core-site.xml • ipc.8020.faircallqueue.priority-levels • 8020 is the RPC server's port; customize it if you use a non-default port or the service RPC port
  92. 92. @twitterhadoop Number of Sub-queues • More sub-queues = more unique classes of service • Recommend 10 for larger clusters • key: ipc.8020.faircallqueue.priority-levels • default: 4
  93. 93. @twitterhadoop Scheduler: Decay Factor • Controls how much accumulated call counts are decayed on each sweep; larger values decay more slowly • Ex: 1024 calls with a decay factor of 0.5 will take 10 sweeps to decay, assuming the user makes no additional calls • key: ipc.8020.faircallqueue.decay-scheduler.decay-factor • default: 0.5
  94. 94. @twitterhadoop Scheduler: Sweep Period • How many ms between each decay sweep; smaller is more responsive, but sweeps have overhead • Ex: if it takes 10 sweeps to decay and we sweep every 5 seconds, a user's activity will remain for 50s • key: ipc.8020.faircallqueue.decay-scheduler.period-ms • default: 5000
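Putting the two keys above together: every period-ms a sweep multiplies each caller's accumulated call count by the decay factor. The sketch below is an illustration of that idea, not the code Hadoop ships, and it reproduces the slide's example of 1024 calls decaying to 1 after 10 sweeps at factor 0.5.

```java
// Sketch of the decay mechanism: recordCall() counts traffic per user, decaySweep()
// runs once per sweep period and scales every count down by the decay factor, so old
// activity fades while a currently-heavy caller keeps a high count.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class DecaySketch {
  private final Map<String, Long> callCounts = new ConcurrentHashMap<>();
  private final double decayFactor;   // e.g. 0.5, the default above

  public DecaySketch(double decayFactor) { this.decayFactor = decayFactor; }

  void recordCall(String user) {
    callCounts.merge(user, 1L, Long::sum);
  }

  /** Run once per sweep period (default 5000 ms above). */
  void decaySweep() {
    callCounts.replaceAll((user, count) -> (long) (count * decayFactor));
    callCounts.values().removeIf(count -> count == 0);   // forget callers that have gone idle
  }

  public static void main(String[] args) {
    DecaySketch s = new DecaySketch(0.5);
    for (int i = 0; i < 1024; i++) s.recordCall("heavy-user");
    for (int sweep = 1; sweep <= 10; sweep++) s.decaySweep();
    System.out.println(s.callCounts.get("heavy-user"));   // prints 1 (1024 * 0.5^10)
  }
}
```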
  95. 95. @twitterhadoop Scheduler: Thresholds • List of floats that determines the boundaries between service classes; if you have 4 queues, you'll have 3 bounds • Each number represents a percentage of total calls • The first number is the threshold for going into queue 0 (highest priority), the second decides queue 1 vs. the rest, and so on • Recommend trying even splits (10, 20, 30, … 90) or exponential (the default) • key: ipc.8020.faircallqueue.decay-scheduler.thresholds • default: 12%, 25%, 50%
  96. 96. @twitterhadoop Multiplexer: Weights • Each weight is how many times the mux will read from the corresponding sub-queue before moving on to the next one • Ex: 4,3,1 for 3 queues means read up to 4 times from queue 0, up to 3 times from queue 1, once from queue 2, then repeat • The mux controls the penalty for being in a low-priority queue; recommend not setting any weight to 0, since that can starve a queue • key: ipc.8020.faircallqueue.multiplexer.weights • default: 8,4,2,1
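The weight semantics can be sketched as below; this is an illustration, not Hadoop's actual multiplexer implementation. With the default weights 8,4,2,1 a handler drains up to 8 calls from queue 0 for every single call from queue 3, so low-priority callers are slowed down but never starved.

```java
// Illustrative weighted round-robin over sub-queues: read up to weights[i] calls from
// sub-queue i, then advance to the next sub-queue, wrapping around after the last one.
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

public class WrrMuxSketch {
  private final int[] weights;     // one weight per sub-queue, e.g. {8, 4, 2, 1}
  private int currentQueue = 0;    // sub-queue we are currently draining
  private int reads = 0;           // calls taken from it so far in this turn

  public WrrMuxSketch(int[] weights) { this.weights = weights; }

  /** Pick the next call for a handler thread, or null if every sub-queue is empty. */
  public <T> T take(List<Queue<T>> queues) {
    for (int scanned = 0; scanned <= queues.size(); scanned++) {
      if (reads >= weights[currentQueue] || queues.get(currentQueue).isEmpty()) {
        currentQueue = (currentQueue + 1) % queues.size();   // done with this queue for now
        reads = 0;
        continue;
      }
      reads++;
      return queues.get(currentQueue).poll();
    }
    return null;
  }

  public static void main(String[] args) {
    List<Queue<String>> queues = List.of(new ArrayDeque<>(), new ArrayDeque<>());
    queues.get(0).add("light-user call");
    queues.get(1).add("heavy-user call");
    WrrMuxSketch mux = new WrrMuxSketch(new int[] {2, 1});   // hypothetical 2-queue setup
    System.out.println(mux.take(queues));   // "light-user call": queue 0 is served first
    System.out.println(mux.take(queues));   // "heavy-user call"
  }
}
```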
  97. 97. @twitterhadoop Backoff Max Attempts • The default is equivalent to 90 seconds of retrying • To get the equivalent of 10 minutes of retrying, set it to 44 • key: dfs.client.retry.max.attempts • default: 10
