Collection of Small Tips on Further Stabilizing your Hadoop Cluster


  1. Collection of small tips on further stabilizing your Hadoop cluster
     Presented by Koji Noguchi | June 3, 2014 | 2014 Hadoop Summit, San Jose, California
  2. Agenda
     Who I am
     What's NOT covered
     List of tips that I found useful
     Q&A
  3. Who I am
     Grid Support/Solutions at Yahoo.
     › Helping users on the internal Hadoop clusters
     (Diagram: User / Ops / Dev)
  4. Who I am
     › Helping users on the internal Hadoop clusters
     • Covering everything!?
  5. Who I am
     • Covering everything!? → Covering any tiny pieces not picked up by others…
  6. What's NOT covered in this talk
     How to maintain the clusters (ops)
     › Automation of break-fixing, upgrading, monitoring
     How to configure Hadoop clusters (dev)
     › Healthcheck script, reserving disk space, number of slots per node, etc.
     How to tune your Hadoop jobs (user)
     › Less spilling and identifying bottlenecks
  7. Some items that turned out to be useful
     Identifying…
     • Slow nodes
     • Misconfigured nodes
     • CPU-wasting jobs
     • HDFS-wasting users
     • Queue congestion
  8. Some items that turned out to be useful (same list; the talk turns to slow nodes first)
  9. Slow nodes hurting the Hadoop cluster?
     One of Hadoop's strengths: FAULT TOLERANCE
  10. Slow nodes hurting the Hadoop cluster?
     Complete failure → GREAT! Fast recovery.
     Partial failure → NOT so great… Tasks scheduled on slow nodes can take forever…
  11. Speculative execution helps?
     Run a redundant copy of the slow task and take the output from whichever attempt finishes first.
     (Diagram: task1 attempt0 on NodeA reads from HDFS and writes output)
  12. Speculative execution helps?
     (Diagram: task1 attempt1 launched on NodeB; take the output from the faster attempt)
  13. Speculative execution helps?
     (Diagram: both attempts reading from datanodes NodeX, NodeY, NodeZ)
  14. Speculative execution helps a lot, BUT…
     1. Both nodes could be slow
     2. Two attempts can still hit a slow datanode
     3. Not all jobs can use speculative execution
  15. How it used to work
     With 100s of users → at least 1 user who's good at policing
  16. AUTOMATION
     https://www.flickr.com/photos/antiuniverse/410462775
  17. How it used to work (image slide)
  18. How it used to work (image slide)
  19. Identifying slow nodes: comparing performance (image slide)
  20. Identifying slow nodes: comparing performance (image slide)
  21. Identifying slow nodes: comparing performance (image slide)
  22. Identifying slow nodes
     Speculative execution
     BEFORE: at runtime, users use it to work around hitting the slow nodes.
     HERE: a day (or hours) later, use the logs to identify the slow nodes.
  23. Speculative execution again
     (Timeline diagram: task1 attempt0 on NodeA, attempt1 on NodeB, from 0:00 to 20:00)
  24. Speculative execution again
     attempt0 on NodeA is KILLED → LOSE['NodeA'] += 1
  25. Speculative execution again
     attempt1 on NodeB produces the output → WIN['NodeB'] += 1
  26. JobHistory log
     {"type":"MAP_ATTEMPT_STARTED","event":… "attemptId":"attempt_1399615563645_308371_m_000000_0","startTime":1400522576570,"trackerName":"gsbl31747.blue.ygrid.yahoo.com",…
     {"type":"MAP_ATTEMPT_FINISHED","event":…,"taskType":"MAP","taskStatus":"SUCCEEDED","mapFinishTime":1400522582385
  27. Pig job analyzing the job history
     A = LOAD 'starling.starling_task_attempts' USING org.apache.hcatalog.pig.HCatLoader();
     B = FILTER A BY dt >= '$STARTDATE';
     DESCRIBE B;
     C = FOREACH B GENERATE grid, dt, task_id, task_attempt_id, type, host_name, status,
         start_ts, shuffle_time, sort_time, finish_time;
     D = FILTER C BY type == 'MAP' OR type == 'REDUCE';
     ATTEMPT0 = FILTER D BY LAST_INDEX_OF(task_attempt_id, '_0') == (SIZE(task_attempt_id) - 2);
     ATTEMPT1 = FILTER D BY LAST_INDEX_OF(task_attempt_id, '_1') == (SIZE(task_attempt_id) - 2);
     -- This filters out any task that had only 1 attempt
     TaskWithAtLeastTwoAttempts = JOIN ATTEMPT0 BY task_id, ATTEMPT1 BY task_id;
     -- For simplicity, only looking at tasks whose second attempt succeeded
     TaskWith2ndAttemptSuccess = FILTER TaskWithAtLeastTwoAttempts BY
         (ATTEMPT1::status == 'SUCCESS' OR ATTEMPT1::status == 'SUCCEEDED')
         AND ATTEMPT0::status == 'KILLED'
         AND ATTEMPT0::start_ts + ATTEMPT0::shuffle_time + ATTEMPT0::sort_time
             + ATTEMPT0::finish_time > ATTEMPT1::start_ts;
     -- Counting the number of first-attempt fail/kill events for each node
     FirstFailedKilledAttempt = FOREACH TaskWith2ndAttemptSuccess GENERATE
         ATTEMPT0::grid, ATTEMPT0::host_name, ATTEMPT0::dt;
     FirstFailedKilledAttempt2 = GROUP FirstFailedKilledAttempt BY (grid, host_name, dt);
     FirstFailedKilledAttempt3 = FOREACH FirstFailedKilledAttempt2 GENERATE
         group.grid, group.host_name, group.dt,
         COUNT(FirstFailedKilledAttempt) AS firstFailedCounts;
     -- Counting failures alone gave too many false positives; also count how many times each node won
     SecondSuccessAttempt = FOREACH TaskWith2ndAttemptSuccess GENERATE
         ATTEMPT1::grid, ATTEMPT1::host_name, ATTEMPT1::dt;
     SecondSuccessAttempt2 = GROUP SecondSuccessAttempt BY (grid, host_name, dt);
     SecondSuccessAttempt3 = FOREACH SecondSuccessAttempt2 GENERATE
         group.grid, group.host_name, group.dt,
         COUNT(SecondSuccessAttempt) AS secondSuccessfulCounts;
     GridNodeSuccessFailedCounts = JOIN FirstFailedKilledAttempt3 BY (grid, host_name, dt) LEFT OUTER,
         SecondSuccessAttempt3 BY (grid, host_name, dt);
     GridNodeSuccessFailedCounts2 = FILTER GridNodeSuccessFailedCounts BY
         firstFailedCounts > 50
         AND firstFailedCounts > (secondSuccessfulCounts IS NULL ? 0 : secondSuccessfulCounts) * 4;
  28. Pig job analyzing the job history (in pseudocode)
     For any task with attempt0 KILLED and attempt1 SUCCESS,
     where attempt0's finish time > attempt1's start time:
         WIN[attempt1's node] += 1
         LOSE[attempt0's node] += 1
     Aggregate, then print any node with LOSE[node] > WIN[node] * 4 && LOSE[node] > 50
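(The WIN/LOSE bookkeeping above is easy to prototype outside of Pig. Below is a minimal Python sketch; the `attempts` input of per-attempt records parsed from the job history JSON is hypothetical and its field names are illustrative, while the ">50 losses" and "4x wins" thresholds come from the slide.)

   from collections import Counter, defaultdict

   def find_slow_nodes(attempts, min_losses=50, win_factor=4):
       # attempts: iterable of dicts with keys task_id, attempt_index,
       # node, status, start_ts, finish_ts (parsed from job history logs).
       by_task = defaultdict(dict)
       for a in attempts:
           by_task[a["task_id"]][a["attempt_index"]] = a

       win, lose = Counter(), Counter()
       for task in by_task.values():
           a0, a1 = task.get(0), task.get(1)
           if a0 is None or a1 is None:
               continue  # only consider tasks that actually ran two attempts
           # attempt0 was killed after attempt1 started, and attempt1 succeeded:
           # the speculative copy won the race, so attempt0's node "loses".
           if (a0["status"] == "KILLED"
                   and a1["status"] in ("SUCCESS", "SUCCEEDED")
                   and a0["finish_ts"] > a1["start_ts"]):
               lose[a0["node"]] += 1
               win[a1["node"]] += 1

       # Flag nodes that lose far more often than they win (slide thresholds).
       return [node for node, losses in lose.items()
               if losses > min_losses and losses > win[node] * win_factor]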
  29. Result
     Extremely slow nodes stood out:
     › Losing over 50 times while winning 0 or 1 times
     › Report to ops if this happens 2 days in a row
     With a mixed-config and mixed-hardware cluster:
     › Showed a trend of one type of node winning over the others
  30. (image slide)
  31. Some items that turned out to be useful (same list; next: misconfigured nodes)
  32. Misconfigured nodes
     Misconfigured nodes → tasks repeatedly fail for some users & jobs.
     Like termites: by the time users notice, it's too late.
  33. Finding misconfigured nodes
     Modify the previous slow-node detection script:
     For any task with attempt0 KILLED and attempt1 SUCCESS,
     where attempt0's finish time > attempt1's start time
  34. Finding misconfigured nodes
     Same script, but match attempt0 FAILED instead of KILLED.
     Aggregate per-node fail counts > 30 per day.
     Include the first 4 attempt IDs and error messages in the report.
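(A rough sketch of that variant, using the same hypothetical parsed-attempts input as before plus assumed attempt_id and error_message fields; the >30/day threshold and 4-sample report come from the slide.)

   from collections import defaultdict

   def find_misconfigured_nodes(attempts, min_failures=30, sample_size=4):
       by_task = defaultdict(dict)
       for a in attempts:
           by_task[a["task_id"]][a["attempt_index"]] = a

       failures = defaultdict(list)  # node -> [(attempt_id, error_message), ...]
       for task in by_task.values():
           a0, a1 = task.get(0), task.get(1)
           if a0 is None or a1 is None:
               continue
           # attempt0 FAILED on one node but the rerun elsewhere SUCCEEDED:
           # a strong hint that the first node, not the job, is broken.
           if (a0["status"] == "FAILED"
                   and a1["status"] in ("SUCCESS", "SUCCEEDED")
                   and a0["finish_ts"] > a1["start_ts"]):
               failures[a0["node"]].append((a0["attempt_id"], a0["error_message"]))

       # Report suspect nodes with a few sample attempts to ease debugging.
       return {node: samples[:sample_size]
               for node, samples in failures.items()
               if len(samples) > min_failures}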
  35. Results → actively finding issues (but still manual)
     1. Misconfigured nodes, with job/error references
     2. Detect regressions in OS rolling upgrades
        › Users' code failing
        › Some OS-specific errors (disk / user-lookup / etc.)
     3. Detect partial network failures/slowness
        › Pairs of nodes failing with map fetch failures
        › Nodes failing on localization
     4. Detect Hadoop bugs
        › Like the disk-fail-in-place bug with the distributed cache
  36. Some items that turned out to be useful (same list; next: CPU-wasting jobs)
  37. When the cluster is bottlenecked on CPU…
     In 0.23 + CapacityScheduler, scheduling is based on memory.
     The memory limit is enforced by the NodeManager, but CPU is not.
     Less important after Hadoop 2.x added CPU-based/aware scheduling.
  38. Job & task counters (screenshot)
  39. JobHistory log
     For each task attempt:
     {"name":"CPU_MILLISECONDS","displayName":"CPU time spent (ms)","value":826430}
     {"name":"GC_TIME_MILLIS","displayName":"GC time elapsed (ms)","value":38863}
  40. Find possible jobs wasting CPU
     For each task attempt:
     › CPU_TIME / attempt_time (0 ~ 20) [CPU_RATIO]
     › GC_TIME / attempt_time (0 ~ 1.0) [GC_RATIO]
     Aggregate per job and show:
     › MAX_CPU_RATIO, MAX_GC_RATIO, AVG_CPU_RATIO, AVG_GC_RATIO
     Also collect, per day per job:
     › % of resources (MBytes)
     › % of CPU_TIME
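(A minimal sketch of the ratio computation, assuming a hypothetical per-attempt input with job_id, cpu_ms, gc_ms, and wall_ms fields taken from the CPU_MILLISECONDS / GC_TIME_MILLIS counters shown above.)

   from collections import defaultdict

   def cpu_gc_ratios(attempts):
       per_job = defaultdict(list)
       for a in attempts:
           wall = max(a["wall_ms"], 1)  # guard against zero-length attempts
           per_job[a["job_id"]].append(
               (a["cpu_ms"] / wall,     # CPU_RATIO: can exceed 1 with extra threads
                a["gc_ms"] / wall))     # GC_RATIO: near 1.0 means mostly GC

       report = {}
       for job, ratios in per_job.items():
           cpus, gcs = zip(*ratios)
           report[job] = {"max_cpu_ratio": max(cpus),
                          "avg_cpu_ratio": sum(cpus) / len(cpus),
                          "max_gc_ratio": max(gcs),
                          "avg_gc_ratio": sum(gcs) / len(gcs)}
       return report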
  41. Results
     Able to reach out to users wasting CPU:
     › A job with a task taking 10-20 times the CPU time
     › A job using __% of resources but __% of CPU time
     For one job with 85% GC time, a 3x speedup by switching ParallelGC → UseSerialGC (50% GC time).
     Another +25% with G1GC, but with more CPU time.
  42. Some items that turned out to be useful (same list; next: HDFS-wasting users)
  43. Limited HDFS space
     HDFS quotas have significantly reduced the number of abuse cases,
     but we still see the occasional "HDFS almost full! Please delete" broadcast email.
  44. Space to look for
     1. Large directory that hasn't changed
     2. Large directory that suddenly increased
     3. Large directory that hasn't been accessed
     4. Large directory not compressed
  45. Space to look for (same list; items 1 and 2 first)
  46. Space to look for
     1. Large directory that hasn't changed
     2. Large directory that suddenly increased
     Save the following result daily:
     hdfs dfs -count /user/* /projects/*/*
     → Take a diff from __ days back.
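(One way to script that daily snapshot-and-diff, as a hedged Python sketch: the snapshot directory, 7-day lookback, and 1 TB growth threshold are illustrative choices; `hdfs dfs -count` itself prints DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATHNAME. An unchanged size over the window similarly flags stale directories.)

   import datetime
   import pathlib
   import subprocess

   SNAP_DIR = pathlib.Path("/var/tmp/hdfs-count-snapshots")  # illustrative location

   def take_snapshot():
       # Save today's counts; HDFS expands the path globs itself.
       SNAP_DIR.mkdir(parents=True, exist_ok=True)
       out = subprocess.run(["hdfs", "dfs", "-count", "/user/*", "/projects/*/*"],
                            capture_output=True, text=True, check=True).stdout
       (SNAP_DIR / f"{datetime.date.today()}.txt").write_text(out)

   def parse(path):
       # Map pathname -> content size in bytes (last two -count columns).
       sizes = {}
       for line in path.read_text().splitlines():
           cols = line.split()
           if len(cols) >= 4:
               sizes[cols[-1]] = int(cols[-2])
       return sizes

   def grown_dirs(days_back=7, min_growth=1 << 40):  # flag > 1 TB growth
       today = datetime.date.today()
       old = parse(SNAP_DIR / f"{today - datetime.timedelta(days=days_back)}.txt")
       new = parse(SNAP_DIR / f"{today}.txt")
       return {d: size - old.get(d, 0)
               for d, size in new.items() if size - old.get(d, 0) > min_growth}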
  47. 4. Large directory not compressed
     Too big to search and read the entire HDFS; need to cut down the search space.
     Interested in data created daily/hourly/etc.
  48. 4. Large directory not compressed
     listdir = (/)
     while (listdir not empty)
         dir = listdir.pop
         if (dir.size() < 5 TBytes) { skip/continue }
         if (dir has #subdirs with timestamp > 7) {
             pick one large file from a recent timestamped subdir
             hcat $file | head --bytes=10MB | gzip -c | wc --bytes
         } else {
             push all subdirs to listdir
         }
  49. 4. Large directory not compressed
     DIRNAME: /projects/DDD/d1/d2/d3/d4
     DIRSIZE: 77,912,005,675,237 (~70 TB)
     CLUSTER: mycluster-tan
     Username: ddd_aa
     Compression ratio: 12.6718
     Sample file: /projects/DDD/d1/d2/d3/d4/2014051405/part-m-00000
     Sample filesize: 134,217,852
     A couple of hours in a sequential script, per cluster.
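(A runnable rendering of the walk from slide 48, as a Python sketch. `hdfs dfs -ls`, `-du -s`, and `-cat` are real commands; the 5 TB / 10 MB thresholds come from the slide, while the min_ratio cutoff and the "most recent partition" heuristic are assumptions for illustration.)

   import gzip
   import subprocess

   FIVE_TB = 5 * (1 << 40)
   PROBE_BYTES = 10 * (1 << 20)  # sample the first 10 MB, as on the slide

   def hdfs_ls(path):
       # Yield (is_dir, size, name) tuples from `hdfs dfs -ls`.
       out = subprocess.run(["hdfs", "dfs", "-ls", path],
                            capture_output=True, text=True, check=True).stdout
       for line in out.splitlines():
           cols = line.split()
           if len(cols) >= 8:
               yield cols[0].startswith("d"), int(cols[4]), cols[-1]

   def dir_size(path):
       out = subprocess.run(["hdfs", "dfs", "-du", "-s", path],
                            capture_output=True, text=True, check=True).stdout
       return int(out.split()[0])

   def sample_ratio(path):
       # Gzip the first PROBE_BYTES; a high ratio suggests uncompressed data.
       p = subprocess.Popen(["hdfs", "dfs", "-cat", path], stdout=subprocess.PIPE)
       raw = p.stdout.read(PROBE_BYTES)
       p.kill()
       return len(raw) / max(len(gzip.compress(raw)), 1)

   def find_uncompressed(root="/", min_size=FIVE_TB, min_ratio=3.0):
       stack = [root]
       while stack:
           d = stack.pop()
           if dir_size(d) < min_size:
               continue  # too small to matter
           subs = [name for is_dir, _, name in hdfs_ls(d) if is_dir]
           if len(subs) > 7:  # looks like timestamped daily/hourly partitions
               files = [(size, name) for is_dir, size, name
                        in hdfs_ls(sorted(subs)[-1]) if not is_dir]
               if files:
                   size, name = max(files)  # one large file from a recent partition
                   ratio = sample_ratio(name)
                   if ratio > min_ratio:
                       print(d, name, round(ratio, 2))
           else:
               stack.extend(subs)  # descend into non-partitioned structure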
  50. Results
     By periodically collecting HDFS usage and compression state:
     › Identify stale dirs
     › Identify suddenly increasing dirs
     › Identify uncompressed dirs
  51. Some items that turned out to be useful (same list; last: queue congestion)
  52. Why did my tiny job take hours yesterday?
     A bug in the user's code? Queue full? Cluster full?
     If it's a queue/cluster resource issue, what changed recently?
  53. Needed a way to look back
     Periodically save the output of:
     % mapred job -list
     JobId State StartTime UserName Queue Priority UsedContainers RsvdContainers UsedMem RsvdMem NeededMem AM info
     job_1400781790269_206630 RUNNING 1400867814129 user1 queue1 NORMAL 2 0 3072M 0M 3072M mycluster.___.com:8088/proxy/application_1400781790269_206630/ …
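(The periodic capture itself is a few lines; below is a minimal Python sketch, with the snapshot directory as an illustrative choice. Run it from cron every few minutes to build a queue-usage history.)

   import datetime
   import pathlib
   import subprocess

   SNAP_DIR = pathlib.Path("/var/tmp/job-list-snapshots")  # illustrative location

   def snapshot_job_list():
       # Save one timestamped `mapred job -list` output per invocation.
       SNAP_DIR.mkdir(parents=True, exist_ok=True)
       out = subprocess.run(["mapred", "job", "-list"],
                            capture_output=True, text=True, check=True).stdout
       stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M")
       (SNAP_DIR / f"{stamp}.txt").write_text(out)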
  54. Needed a way to look back (2) (chart slide)
  55. Results
     Users can look back and see whether their jobs hung due to queue/cluster contention.
     Saving the 'mapred job -list' outputs let me go back and check individual jobs.
  56. What's covered
     Identifying…
     • Slow nodes
     • Misconfigured nodes
     • CPU-wasting jobs
     • HDFS-wasting users
     • Queue congestion
  57. Thank you!
     @kojinoguchi
     We are hiring! Stop by Kiosk P9 or reach out to us at bigdata@yahoo-inc.com.
