Hadoop Operations: Starting Out Small / So Your Cluster Isn't Yahoo-sized (yet)

Hadoop Summit 2012 - Deployment and Operations track
Everyone hears about large clusters with thousands of machines and petabytes of storage, yet not everyone starts their first Hadoop deployment with dozens of cabinets of equipment. What do you do when you don't have quite as large a deployment? What decisions should you make now and which should you postpone for later? This session is for SysAdmins who have not yet jumped into the Hadoop fray or have done so only recently. You will be presented with the knowledge gained from two years of operational experience at a (currently) small Hadoop site. We will discuss what is initially important for a small (10-100 node) cluster and what happens when you outgrow your first deployment.

  1. Hadoop Operations: Starting Out Small
     So Your Cluster Isn't Yahoo-sized (yet)
     Michael Arnold, Principal Systems Engineer, Apollo Group
     14 June 2012
  2. Agenda
     • Who
     • What (Definitions)
     • Decisions for Now
     • Decisions for Later
     • Lessons Learned
     © 2012 Apollo Group
  3. Who
  4. Who is Apollo?
     Apollo Group is a leading provider of higher education programs for working adults.
  5. Who is Michael Arnold?
     • Systems Administrator
     • Automation geek
     • 13 years in IT
     • I deal with:
       – Server hardware specification/configuration
       – Server firmware
       – Server operating system
       – Hadoop application health
       – Monitoring all the above
  6. What: Definitions
  7. Definitions
     Q: What is a tiny/small/medium/large cluster?
     A: By node count:
       – Tiny: 1-9
       – Small: 10-99
       – Medium: 100-999
       – Large: 1000+
       – Yahoo-sized: 4000
  8. Definitions
     Q: What is a “headnode”?
     A: A server that runs one or more of the following Hadoop processes:
       – NameNode
       – JobTracker
       – Secondary NameNode
       – ZooKeeper
       – HBase Master
  9. Decisions for Now
     What decisions should you make now and which can you postpone for later?
  10. Which Hadoop distribution?
     • Amazon
     • Apache
     • Cloudera
     • Greenplum
     • Hortonworks
     • IBM
     • MapR
     • Platform Computing
  11. Should you virtualize?
     • Can be OK for small clusters BUT
       – virtualization adds overhead
       – can cause performance degradation
       – cannot take advantage of Hadoop rack locality
     • Virtualization can be good for:
       – functional testing of M/R job or workflow changes
       – evaluation of Hadoop upgrades
  12. What sort of hardware should you be considering?
     • Inexpensive
     • Not “enterprisey” hardware
       – No RAID*
       – No redundant power*
     • Low power consumption
     • No optical drives
       – get systems that can boot off the network
     * except in headnodes
  13. Plan for capacity expansion
     • Start at the bottom and work your way up
     • Leave room in your cabinets for more machines
  14. Plan for capacity expansion (cont.)
     • Deploy your initial cluster in two cabinets
       – One headnode, one switch, and several (five) datanodes per cabinet
  15. Plan for capacity expansion (cont.)
     • Install a second cluster in the empty space in the upper half of the cabinet
  16. Decisions for Later
     What decisions should you make now and which can you postpone for later?
  17. What size cluster?
     Depends upon your:
     • Budget
     • Data size
     • Workload characteristics
     • SLA
  18. What size cluster? (cont.)
     Are your MapReduce jobs:
     • compute-intensive?
     • reading lots of data?
     http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
  19. Should you implement rack awareness?
     If more than one switch in the cluster: YES
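
For readers who have not set up rack awareness before: Hadoop learns which rack a node sits in by calling an administrator-supplied script (named by the `topology.script.file.name` property in Hadoop 1.x-era releases) with one or more IP addresses or hostnames as arguments, and it expects one rack path per argument on standard output. The following is a minimal sketch of such a script, not the one used at Apollo. The subnets and rack names are hypothetical and assume one subnet per cabinet/switch.

```python
#!/usr/bin/env python3
"""Minimal rack-topology script sketch for Hadoop rack awareness."""
import ipaddress
import sys

# Hypothetical mapping (not from the slides): one subnet per cabinet/switch.
RACKS = {
    ipaddress.ip_network("10.1.1.0/24"): "/dc1/cabinet1",
    ipaddress.ip_network("10.1.2.0/24"): "/dc1/cabinet2",
}
# Rack path returned for anything we cannot place.
DEFAULT_RACK = "/default-rack"


def rack_for(address):
    """Return the rack path for an IP address, or the default rack."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Hostnames (or garbage) are not resolved in this sketch.
        return DEFAULT_RACK
    for network, rack in RACKS.items():
        if ip in network:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # Hadoop may pass several addresses at once: print one rack per argument.
    for arg in sys.argv[1:]:
        print(rack_for(arg))
```

Keeping the mapping in a small table means adding a cabinet later is a one-line change, which fits the "plan for capacity expansion" advice earlier in the deck.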
  20. Should you use automation?
     If not in the beginning, then as soon as possible.
     • Boot disks will fail.
     • Automated OS and application installs:
       – Save time
       – Reduce errors
         • Cobbler/Spacewalk/Foreman/xCAT/etc.
         • Puppet/Chef/CFEngine/shell scripts/etc.
  21. Lessons Learned
  22. Keep It Simple
     Don't add redundancy and features (server/network) that will make things more complicated and expensive.
     Hadoop has built-in redundancies. Don't overlook them.
  23. Automate the Hardware
     • Twelve hours of manual work in the datacenter is not fun.
     • Make sure all server firmware is configured identically.
       – HP SmartStart Scripting Toolkit
       – Dell OpenManage Deployment Toolkit
       – IBM ServerGuide Scripting Toolkit
  24. Rolling upgrades are possible
     (Just not of the Hadoop software.)
     • Datanodes can be decommissioned, patched, and added back into the cluster without service downtime.
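
The decommission/recommission cycle above can be scripted. The sketch below assumes the NameNode is configured with an exclude file (the `dfs.hosts.exclude` property) and that an operator runs `hadoop dfsadmin -refreshNodes` after each change. The helper name, file path, and hostnames are hypothetical, and this is not necessarily how Apollo did it.

```python
from pathlib import Path


def set_excluded(exclude_file, hostname, excluded=True):
    """Add or remove a hostname in an HDFS exclude file, idempotently.

    Returns the sorted list of hosts now present in the file. The caller is
    expected to run `hadoop dfsadmin -refreshNodes` afterwards so that the
    NameNode re-reads the file.
    """
    path = Path(exclude_file)
    hosts = set(path.read_text().split()) if path.exists() else set()
    if excluded:
        hosts.add(hostname)        # start decommissioning this datanode
    else:
        hosts.discard(hostname)    # allow the patched datanode back in
    path.write_text("".join(host + "\n" for host in sorted(hosts)))
    return sorted(hosts)
```

A rolling patch run would call `set_excluded(path, host)`, refresh the NameNode, wait for the node to report as decommissioned, patch it, and then call `set_excluded(path, host, excluded=False)` and refresh again.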
  25. The smallest thing can have a big impact on the cluster
     • Bad NIC/switchport can cause cluster slowness.
     • Slow disks can cause intermittent job slowdowns.
  26. HDFS blocks are weird
     • On ext3/ext4:
       – Small blocks are not padded to the HDFS block size; they take only the actual size of the data.
       – Each HDFS block is actually two files on the datanode's filesystem:
         • The actual data and
         • A metadata/checksum file
       # ls -l blk_1058778885645824207*
       -rw-r--r-- 1 hdfs hdfs 35094 May 14 01:26 blk_1058778885645824207
       -rw-r--r-- 1 hdfs hdfs   283 May 14 01:26 blk_1058778885645824207_19155994.meta
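
The sizes in that listing can be reproduced with a little arithmetic. The sketch below assumes the usual defaults of that era, none of which are stated on the slide: one 4-byte CRC32 checksum per 512 bytes of data (`io.bytes.per.checksum`) and a 7-byte header at the start of the `.meta` file.

```python
# Assumptions (not from the slides): typical HDFS defaults of the period.
META_HEADER_BYTES = 7      # assumed: 2-byte version + 1-byte checksum type + 4-byte chunk size
BYTES_PER_CHECKSUM = 512   # assumed default for io.bytes.per.checksum
CHECKSUM_BYTES = 4         # one CRC32 value per chunk


def expected_meta_size(block_bytes):
    """Estimate the size of a block's .meta file from the block's data size."""
    # Integer ceiling division: a partial final chunk still gets a checksum.
    chunks = (block_bytes + BYTES_PER_CHECKSUM - 1) // BYTES_PER_CHECKSUM
    return META_HEADER_BYTES + chunks * CHECKSUM_BYTES


# The block on the slide: 35094 data bytes -> 69 chunks -> 7 + 69 * 4 bytes.
print(expected_meta_size(35094))  # 283
```

Under these assumptions the estimate matches the 283-byte `.meta` file in the listing, which supports the slide's point: a small block costs only its real size on disk, plus a small checksum file and two inodes.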
  27. Do not prematurely optimize
     • Be careful tuning your datanode filesystems.
         • mkfs -t ext4 -T largefile4 ... (probably bad)
         • mkfs -t ext4 -i 131072 -m 0 ... (better)
       /etc/mke2fs.conf
       [fs_types]
           hadoop = {
               features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
               inode_ratio = 131072
               blocksize = -1
               reserved_ratio = 0
               default_mntopts = acl,user_xattr
           }
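
Why `largefile4` is "probably bad" becomes clearer with rough numbers. In the stock `mke2fs.conf`, `largefile4` means one inode per 4 MiB of disk, and every HDFS block needs two inodes (data file plus `.meta` file). The sketch below compares that with the 131072-byte ratio from the slide for a hypothetical 2 TiB data disk. It ignores directories and mke2fs rounding, so treat the results as estimates.

```python
LARGEFILE4_RATIO = 4 * 1024 * 1024   # bytes per inode for "-T largefile4" in stock mke2fs.conf
SLIDE_RATIO = 131072                 # bytes per inode from the slide ("-i 131072")
FILES_PER_HDFS_BLOCK = 2             # block data file + .meta checksum file


def inode_count(partition_bytes, inode_ratio):
    """Approximate number of inodes mke2fs creates for a partition."""
    return partition_bytes // inode_ratio


def max_hdfs_blocks(partition_bytes, inode_ratio):
    """Approximate number of HDFS blocks the partition can hold before inodes run out."""
    return inode_count(partition_bytes, inode_ratio) // FILES_PER_HDFS_BLOCK


DISK = 2 * 1024**4  # hypothetical 2 TiB datanode disk

print(max_hdfs_blocks(DISK, LARGEFILE4_RATIO))  # 262144
print(max_hdfs_blocks(DISK, SLIDE_RATIO))       # 8388608
```

With about 262,144 blocks per disk, a workload full of small blocks like the 35 KB one shown earlier would exhaust inodes after only a few gigabytes of data, long before the disk is full.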
  28. Use DNS-friendly names for services
     • hdfs://hdfs.delta.hadoop.apollogrp.edu:8020/
     • mapred.delta.hadoop.apollogrp.edu:8021
     • http://oozie.delta.hadoop.apollogrp.edu:11000/
     • hiveserver.delta.hadoop.apollogrp.edu:10000
     Yes, the names are long, but I bet you can figure out how to connect to Bravo Cluster.
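
The naming pattern is regular enough to generate. The helper below is a hypothetical illustration that rebuilds the four endpoints on the slide from a service name and a cluster name. The `bravo` result is inferred from the pattern, not taken from the slides.

```python
# Scheme (if any) and port for each service, as shown on the slide.
SERVICES = {
    "hdfs": ("hdfs", 8020),
    "mapred": (None, 8021),
    "oozie": ("http", 11000),
    "hiveserver": (None, 10000),
}


def service_endpoint(service, cluster, domain="hadoop.apollogrp.edu"):
    """Build the endpoint for a service on a named cluster."""
    scheme, port = SERVICES[service]
    hostport = f"{service}.{cluster}.{domain}:{port}"
    # Services shown with a URL scheme on the slide get a full URL.
    return f"{scheme}://{hostport}/" if scheme else hostport


print(service_endpoint("hdfs", "delta"))    # hdfs://hdfs.delta.hadoop.apollogrp.edu:8020/
print(service_endpoint("mapred", "bravo"))  # mapred.bravo.hadoop.apollogrp.edu:8021
```

Because clients refer to a service alias rather than a physical host, a headnode can be replaced or a service moved by changing one DNS record.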
  29. Use a parallel, remote execution tool
     • pdsh/Cluster SSH/mussh/etc.
       (SSH in a for loop is so 2010)
     • FUNC/MCollective
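
To show what these tools do that a for loop does not, here is a minimal sketch of fanning one command out to many hosts concurrently. It is not a replacement for pdsh or MCollective. It assumes key-based SSH is already set up, and the function names are illustrative.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_on_host(host, command, timeout=60):
    """Run a command on one host over ssh; return (host, exit code, stdout)."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],  # BatchMode: never prompt for a password
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return host, result.returncode, result.stdout.strip()


def run_everywhere(hosts, command, runner=run_on_host, max_workers=16):
    """Run a command on all hosts in parallel; results come back in host order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda host: runner(host, command), hosts))
```

The `runner` parameter exists so the fan-out logic can be exercised without touching real machines, for example `run_everywhere(hosts, "uptime", runner=fake_runner)`. With a for loop, one hung host stalls everything behind it; here it only ties up one worker until its timeout.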
  30. Make your log directories as large as you can.
     • 20-100GB /var/log
       – Implement log purging cronjobs or your log directories will fill up.
     • Beware: M/R jobs can fill up /tmp as well.
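
A purge job can be very small. The sketch below deletes regular files older than a given number of days under a directory and could be called from cron. The retention period and any paths passed to it are choices for the operator, not recommendations from the slides.

```python
import os
import time


def purge_old_logs(directory, max_age_days, now=None):
    """Delete files under `directory` not modified within `max_age_days`.

    Returns the sorted list of paths that were removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

Before pointing this at a live log directory, it is worth checking that no running daemon still holds the old files open, since deleting an open file does not free its space until the process closes it.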
  31. Insist on IPMI 2.0 for out-of-band management of server hardware.
     • Serial Over LAN is awesome when booting a system.
     • Standardized hardware/temperature monitoring.
     • Simple remote power control.
  32. Spanning-tree is the devil
     • Enable portfast on your server switch ports or the BMCs may never get a DHCP lease.
  33. Apollo has re-built its cluster four times.
     You may end up doing so as well.
  34. Apollo Timeline: First build
     • Cloudera Professional Services helped install CDH
     • Four nodes
     • Manually build OS via USB CDROM.
     • CDH2
  35. Apollo Timeline: Second build
     • Cobbler
     • All software deployment is via kickstart. Very little is in puppet. Config files are deployed via wget.
     • CDH2
  36. Apollo Timeline: Third build
     • OS filesystem partitioning needed to change.
     • Most software deployment still via kickstart.
     • CDH3b2
  37. Apollo Timeline: Fourth build
     • HDFS filesystem inodes needed to be increased.
     • Full puppet automation.
     • Added redundant/hotswap enterprise hardware for headnodes.
     • CDH3u1
  38. Cluster failures at Apollo
     • Hardware
       – disk failures (40+)
       – disk cabling (6)
       – RAM (2)
       – switch port (1)
     • Software
       – Cluster
         • NFS (NN -> 2NN metadata)
       – Job
         • TT java heap
         • Running out of /tmp or /var/log/hadoop
         • Running out of HDFS space
  39. Know your workload
     You can spend all the time in the world trying to get the best CPU/RAM/HDD/switch/cabinet configuration, but you are running on pure luck until you understand your cluster's workload.
  40. Questions?
