Hadoop operations


  1. Hadoop Operations: Starting Out Small
     So Your Cluster Isn't Yahoo-sized (yet)
     Michael Arnold, Principal Systems Engineer, Apollo Group
     14 June 2012
  2. 2. Agenda Who What (Definitions) Decisions for Now Decisions for Later Lessons LearnedAPOLLO GROUP © 2012 Apollo Group 2
  3. Who
  4. Who is Apollo?
     Apollo Group is a leading provider of higher education programs for working adults.
  5. Who is Michael Arnold?
     Systems administrator
     Automation geek
     13 years in IT
     I deal with:
       – Server hardware specification/configuration
       – Server firmware
       – Server operating system
       – Hadoop application health
       – Monitoring all of the above
  6. What: Definitions
  7. Definitions
     Q: What is a tiny/small/medium/large cluster?
     A (number of nodes):
       – Tiny: 1-9
       – Small: 10-99
       – Medium: 100-999
       – Large: 1000+
       – Yahoo-sized: 4000
  8. Definitions
     Q: What is a “headnode”?
     A: A server that runs one or more of the following Hadoop processes:
       – NameNode
       – JobTracker
       – Secondary NameNode
       – ZooKeeper
       – HBase Master
  9. Decisions for Now
     What decisions should you make now, and which can you postpone until later?
  10. Which Hadoop distribution?
      Amazon
      Apache
      Cloudera
      Greenplum
      Hortonworks
      IBM
      MapR
      Platform Computing
  11. Should you virtualize?
      It can be OK for small clusters, BUT:
        – Virtualization adds overhead.
        – It can cause performance degradation.
        – It cannot take advantage of Hadoop rack locality.
      Virtualization can be good for:
        – Functional testing of M/R job or workflow changes
        – Evaluation of Hadoop upgrades
  12. What sort of hardware should you be considering?
      Inexpensive
      Not “enterprisey” hardware
        – No RAID*
        – No redundant power*
      Low power consumption
      No optical drives
        – Get systems that can boot off the network.
      * Except in headnodes
  13. Plan for capacity expansion
      Start at the bottom of each cabinet and work your way up.
      Leave room in your cabinets for more machines.
  14. Plan for capacity expansion (cont.)
      Deploy your initial cluster in two cabinets:
        – One headnode, one switch, and several (five) datanodes per cabinet
  15. Plan for capacity expansion (cont.)
      Install a second cluster in the empty space in the upper half of the cabinets.
  16. Decisions for Later
      What decisions should you make now, and which can you postpone until later?
  17. What size cluster?
      It depends upon your:
        – Budget
        – Data size
        – Workload characteristics
        – SLA
  18. What size cluster? (cont.)
      Are your MapReduce jobs:
        – compute-intensive?
        – reading lots of data?
      http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
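If the workload turns out to be storage-bound, a first-pass node count is simple arithmetic. The sketch below is illustrative only: it uses HDFS's default replication factor of 3 and reserves an assumed 25% of each node's disk for MapReduce intermediate data, logs, and /tmp. The 25% figure, the example disk sizes, and the function name are assumptions, not numbers taken from the slides or the linked post.

```python
import math


def datanodes_needed(data_tb, usable_tb_per_node, replication=3, non_hdfs_fraction=0.25):
    """Rough storage-bound datanode count (illustrative sketch).

    replication: HDFS default dfs.replication is 3.
    non_hdfs_fraction: ASSUMED share of each node's disk kept free for
    MapReduce intermediate output, logs, and /tmp.
    """
    if not 0 <= non_hdfs_fraction < 1:
        raise ValueError("non_hdfs_fraction must be in [0, 1)")
    raw_tb = data_tb * replication                        # every block is stored `replication` times
    hdfs_tb_per_node = usable_tb_per_node * (1 - non_hdfs_fraction)
    return math.ceil(raw_tb / hdfs_tb_per_node)           # a partial node still means a whole node


# Hypothetical example: 100 TB of data on nodes with 12 TB of usable disk each.
print(datanodes_needed(100, 12))
```

This ignores data growth and CPU needs; a compute-intensive workload may need more nodes than the storage arithmetic alone suggests.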
  19. Should you implement rack awareness?
      If there is more than one switch in the cluster: YES
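Rack awareness is usually configured with a topology script: Hadoop runs the executable named by `topology.script.file.name` (`net.topology.script.file.name` in Hadoop 2 and later) with one or more IP addresses or hostnames as arguments and reads one rack path per argument from stdout. A minimal sketch, assuming one /24 subnet per cabinet switch; the subnets and rack names are hypothetical placeholders for your own layout.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop rack-topology script: prints one rack path per argument."""
import ipaddress
import sys

# ASSUMED layout: one /24 subnet per cabinet/switch. Replace with your own.
SUBNET_TO_RACK = {
    ipaddress.ip_network("10.1.1.0/24"): "/dc1/rack1",
    ipaddress.ip_network("10.1.2.0/24"): "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"  # Hadoop's own default rack name


def rack_for(host):
    """Map an IP address to its rack; anything unrecognized gets the default rack."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a hostname); this sketch does not resolve names.
        return DEFAULT_RACK
    for net, rack in SUBNET_TO_RACK.items():
        if addr in net:
            return rack
    return DEFAULT_RACK


def main(args):
    # Hadoop may pass several hosts at once; answer in the same order.
    print(" ".join(rack_for(host) for host in args))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it by hand, for example `./topology.py 10.1.2.17 10.1.1.5`, is an easy way to check the mapping before pointing Hadoop at it.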
  20. Should you use automation?
      If not in the beginning, then as soon as possible.
      Boot disks will fail.
      Automated OS and application installs:
        – Save time
        – Reduce errors
          • Cobbler/Spacewalk/Foreman/xCat/etc.
          • Puppet/Chef/Cfengine/shell scripts/etc.
  21. Lessons Learned
  22. Keep It Simple
      Don't add redundancy and features (server/network) that will make things more complicated and expensive.
      Hadoop has built-in redundancy. Don't overlook it.
  23. Automate the Hardware
      Twelve hours of manual work in the datacenter is not fun.
      Make sure all server firmware is configured identically:
        – HP SmartStart Scripting Toolkit
        – Dell OpenManage Deployment Toolkit
        – IBM ServerGuide Scripting Toolkit
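Once firmware versions have been collected from every node, spotting the odd one out is a small comparison. A minimal sketch, assuming an inventory already gathered as hostname → {component: version}; how it is gathered (a vendor toolkit, `dmidecode`, and so on) is outside the sketch, and the host and component names are hypothetical. It treats the most common version of each component as the baseline.

```python
from collections import Counter


def firmware_outliers(inventory):
    """Return {host: {component: (found_version, baseline_version)}} for mismatches.

    inventory: {hostname: {component: version}}. The baseline for each component
    is simply its most common version across the cluster (an assumption).
    A component missing on a host is reported with found_version None.
    """
    components = {comp for versions in inventory.values() for comp in versions}
    outliers = {}
    for comp in sorted(components):
        counts = Counter(versions.get(comp) for versions in inventory.values())
        baseline, _ = counts.most_common(1)[0]
        for host, versions in sorted(inventory.items()):
            found = versions.get(comp)
            if found != baseline:
                outliers.setdefault(host, {})[comp] = (found, baseline)
    return outliers


# Hypothetical inventory for three datanodes.
inventory = {
    "dn01": {"bios": "1.3", "raid": "5.1"},
    "dn02": {"bios": "1.3", "raid": "5.1"},
    "dn03": {"bios": "1.2", "raid": "5.1"},
}
print(firmware_outliers(inventory))
```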
  24. Rolling upgrades are possible
      (Just not of the Hadoop software.)
      Datanodes can be decommissioned, patched, and added back into the cluster without service downtime.
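Decommissioning is driven by the NameNode's exclude file: list the host in the file named by `dfs.hosts.exclude` in hdfs-site.xml, then run `hadoop dfsadmin -refreshNodes`, and the NameNode re-replicates that node's blocks elsewhere before marking it decommissioned. A minimal planning sketch, assuming you patch a fixed-size batch of datanodes at a time; the batch size and function names are hypothetical.

```python
def rolling_batches(datanodes, batch_size):
    """Split datanodes into batches to decommission, patch, and re-add in turn."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [datanodes[i:i + batch_size] for i in range(0, len(datanodes), batch_size)]


def exclude_file_text(current_excludes, add=(), remove=()):
    """New contents for the dfs.hosts.exclude file: one hostname per line, sorted."""
    hosts = (set(current_excludes) | set(add)) - set(remove)
    return "".join(f"{host}\n" for host in sorted(hosts))


# Hypothetical cluster of five datanodes, patched two at a time.
for batch in rolling_batches(["dn01", "dn02", "dn03", "dn04", "dn05"], 2):
    print("decommission:", exclude_file_text([], add=batch).split())
```

For each batch: write the exclude file with the batch added, run `hadoop dfsadmin -refreshNodes`, wait until `hadoop dfsadmin -report` shows the nodes as decommissioned, patch and reboot them, then remove them from the file, refresh again, and make sure each DataNode process is running.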
  25. The smallest thing can have a big impact on the cluster
      A bad NIC or switch port can slow down the whole cluster.
      Slow disks can cause intermittent job slowdowns.
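One way to catch a slow disk (or a slow NIC) before it shows up as mysterious job slowdowns is to compare each device with its peers. This sketch swaps in a simple median-comparison heuristic, not anything the slides prescribe: it flags devices below an assumed 50% of the median throughput. The measurements themselves (for example from timed `dd` reads or `iostat`) and the device names are hypothetical.

```python
from statistics import median


def slow_devices(throughput_mb_s, threshold=0.5):
    """Return devices slower than `threshold` times the median of all devices.

    throughput_mb_s: {device_name: measured MB/s}. The 0.5 threshold is an assumption.
    """
    if not throughput_mb_s:
        return []
    mid = median(throughput_mb_s.values())
    return sorted(dev for dev, rate in throughput_mb_s.items() if rate < threshold * mid)


# Hypothetical sequential-read results from one datanode's disks.
print(slow_devices({"sda": 120, "sdb": 118, "sdc": 45, "sdd": 122}))
```

The same comparison works across nodes, for example on per-node network throughput, to find the bad NIC or switch port.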
  26. HDFS blocks are weird
      On ext3/ext4:
        – Small blocks are not padded to the HDFS block size; they take up only the actual size of the data.
        – Each HDFS block is actually two files on the datanode's filesystem:
          • the actual data, and
          • a metadata/checksum file
      # ls -l blk_1058778885645824207*
      -rw-r--r-- 1 hdfs hdfs 35094 May 14 01:26 blk_1058778885645824207
      -rw-r--r-- 1 hdfs hdfs   283 May 14 01:26 blk_1058778885645824207_19155994.meta
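The size of the .meta file follows from the checksum layout. A sketch, assuming the Hadoop 1.x/CDH3 format with CRC32 checksums and the default `io.bytes.per.checksum` of 512: a 7-byte header plus one 4-byte checksum per 512-byte chunk. Under those assumptions, the 35094-byte block in the listing needs 69 chunks, or 7 + 69 × 4 = 283 bytes, which matches the listing. The function names are hypothetical.

```python
import re

# ASSUMED Hadoop 1.x/CDH3 .meta layout with CRC32 checksums.
HEADER_BYTES = 7          # 2-byte format version + 1-byte checksum type + 4-byte bytes-per-checksum
CHECKSUM_BYTES = 4        # one CRC32 value per chunk
BYTES_PER_CHECKSUM = 512  # io.bytes.per.checksum default

# blk_<block id> for data, blk_<block id>_<generation stamp>.meta for checksums.
# Block IDs can be negative in this era of Hadoop.
BLOCK_FILE = re.compile(r"^blk_(-?\d+)(?:_(\d+)\.meta)?$")


def meta_file_size(block_bytes, bytes_per_checksum=BYTES_PER_CHECKSUM):
    """Expected .meta size for a block holding block_bytes of data."""
    chunks = (block_bytes + bytes_per_checksum - 1) // bytes_per_checksum  # ceiling division
    return HEADER_BYTES + CHECKSUM_BYTES * chunks


def parse_block_file(name):
    """Split a datanode file name into (block_id, generation_stamp or None)."""
    match = BLOCK_FILE.match(name)
    if match is None:
        return None
    block_id, genstamp = match.groups()
    return int(block_id), (int(genstamp) if genstamp else None)


print(meta_file_size(35094))
print(parse_block_file("blk_1058778885645824207_19155994.meta"))
```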
  27. Do not prematurely optimize
      Be careful tuning your datanode filesystems.
        • mkfs -t ext4 -T largefile4 ...   (probably bad: one inode per 4 MiB leaves too few inodes)
        • mkfs -t ext4 -i 131072 -m 0 ...  (better: one inode per 128 KiB, no root-reserved blocks)
      /etc/mke2fs.conf (selected with mkfs -t ext4 -T hadoop):
        [fs_types]
            hadoop = {
                features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
                inode_ratio = 131072
                blocksize = -1
                reserved_ratio = 0
                default_mntopts = acl,user_xattr
            }
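The inode arithmetic shows why `largefile4` is risky: it creates one inode per 4 MiB, while every HDFS block, however small, uses two inodes (data plus .meta). A rough sketch, assuming a hypothetical 2 TB (2 × 10^12 byte) disk and ignoring the inodes used by directories and the per-group rounding mke2fs applies; the 35094-byte block size comes from the listing on the previous slide.

```python
DISK_BYTES = 2 * 10**12  # hypothetical 2 TB datanode disk
SMALL_BLOCK = 35094      # block size seen in the ls listing on the previous slide

RATIOS = {
    "-T largefile4": 4 * 1024 * 1024,  # inode_ratio of largefile4 in the stock mke2fs.conf
    "-i 131072": 128 * 1024,
}


def inode_count(fs_bytes, bytes_per_inode):
    """Approximate inode count (mke2fs rounds per block group, so real values differ slightly)."""
    return fs_bytes // bytes_per_inode


def max_hdfs_blocks(fs_bytes, bytes_per_inode):
    """Each HDFS block needs two files (data + .meta), hence two inodes; directories ignored."""
    return inode_count(fs_bytes, bytes_per_inode) // 2


for option, ratio in RATIOS.items():
    blocks = max_hdfs_blocks(DISK_BYTES, ratio)
    # If every block were this small, inodes would run out after this much data.
    gb_until_full = blocks * SMALL_BLOCK / 1e9
    print(f"{option}: {inode_count(DISK_BYTES, ratio)} inodes, "
          f"at most {blocks} HDFS blocks, ~{gb_until_full:.1f} GB of small blocks")
```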
  28. Use DNS-friendly names for services
      hdfs://hdfs.delta.hadoop.apollogrp.edu:8020/
      mapred.delta.hadoop.apollogrp.edu:8021
      http://oozie.delta.hadoop.apollogrp.edu:11000/
      hiveserver.delta.hadoop.apollogrp.edu:10000
      Yes, the names are long, but I bet you can figure out how to connect to the Bravo cluster.
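The naming scheme on the slide is regular enough to capture in a template: service, cluster name, then a shared domain, with each service on the port shown. A minimal sketch, assuming the pattern holds for every cluster; the `SERVICES` table and helper name are hypothetical, and names for other clusters such as Bravo are extrapolated from the pattern rather than listed in the slides.

```python
DOMAIN = "hadoop.apollogrp.edu"

# Templates copied from the Delta cluster examples; {cluster} is the cluster's name.
SERVICES = {
    "hdfs": "hdfs://hdfs.{cluster}.{domain}:8020/",
    "jobtracker": "mapred.{cluster}.{domain}:8021",
    "oozie": "http://oozie.{cluster}.{domain}:11000/",
    "hiveserver": "hiveserver.{cluster}.{domain}:10000",
}


def service_endpoint(service, cluster, domain=DOMAIN):
    """Build a service address for a cluster following the naming convention."""
    return SERVICES[service].format(cluster=cluster.lower(), domain=domain)


for name in SERVICES:
    print(service_endpoint(name, "Bravo"))
```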
  29. Use a parallel, remote execution tool
      pdsh/Cluster SSH/mussh/etc.
      SSH in a for loop is so 2010.
      FUNC/MCollective
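Tools like pdsh do this out of the box (for example `pdsh -w dn[01-10] uptime`). To show what they replace, here is a minimal sketch of the idea in Python: run one command on many hosts concurrently over SSH and collect exit codes and output per host. It assumes key-based SSH access; the function name and defaults are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# BatchMode=yes makes ssh fail instead of prompting for a password.
SSH = ("ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10")


def run_on_hosts(hosts, command, max_workers=32, ssh=SSH, timeout=300):
    """Run `command` on every host in parallel; return {host: (exit_code, output)}."""
    def run_one(host):
        try:
            proc = subprocess.run([*ssh, host, command], capture_output=True,
                                  text=True, timeout=timeout)
            return host, (proc.returncode, proc.stdout + proc.stderr)
        except subprocess.TimeoutExpired:
            return host, (None, "timed out")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_one, hosts))
```

Usage might look like `run_on_hosts(["dn01", "dn02"], "df -h /var/log")`; any host with a non-zero or `None` exit code needs a closer look.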
  30. Make your log directories as large as you can
      20-100 GB /var/log
        – Implement log-purging cron jobs, or your log directories will fill up.
      Beware: M/R jobs can fill up /tmp as well.
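A log-purging job can be a few lines. A minimal sketch, assuming logs older than a configurable age (7 days by default, an arbitrary choice) are safe to delete; the function name is hypothetical, and a cron entry such as `0 3 * * *` would run a script built around it against a directory like /var/log/hadoop.

```python
import os
import time


def purge_old_logs(log_dir, max_age_days=7, dry_run=False, now=None):
    """Delete regular files under log_dir not modified for max_age_days.

    Returns the sorted list of paths removed (or that would be removed when dry_run).
    Symlinks are skipped so the purge never follows a link out of log_dir.
    """
    cutoff = (time.time() if now is None else now) - max_age_days * 86400
    removed = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                if os.path.getmtime(path) < cutoff:
                    if not dry_run:
                        os.remove(path)
                    removed.append(path)
            except FileNotFoundError:
                continue  # rotated or deleted by someone else while we were scanning
    return sorted(removed)
```

Running it first with `dry_run=True` shows what would be deleted before anything is removed.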
  31. Insist on IPMI 2.0 for out-of-band management of server hardware
      Serial Over LAN is awesome when booting a system.
      Standardized hardware/temperature monitoring.
      Simple remote power control.
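With IPMI 2.0, all three features are reachable through `ipmitool` and its `lanplus` interface. A minimal sketch that builds the command lines; the BMC hostname, user, and action names are hypothetical, and `-E` tells ipmitool to read the password from the `IPMI_PASSWORD` environment variable so it does not appear on the command line.

```python
# Action names are hypothetical; the ipmitool subcommands are standard.
ACTIONS = {
    "power-status": ["chassis", "power", "status"],
    "power-cycle": ["chassis", "power", "cycle"],
    "console": ["sol", "activate"],                 # Serial Over LAN console
    "temperatures": ["sdr", "type", "Temperature"],  # temperature sensors
}


def ipmitool_argv(bmc_host, user, action):
    """Build an ipmitool argv for an IPMI 2.0 (lanplus) BMC."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E", *ACTIONS[action]]


print(" ".join(ipmitool_argv("dn01-ipmi", "admin", "power-status")))
```

Pass the list to `subprocess.run(..., check=True)` with `IPMI_PASSWORD` set in the environment to actually execute it.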
  32. Spanning tree is the devil
      Enable portfast on your server switch ports, or the BMCs may never get a DHCP lease.
  33. Apollo has rebuilt its cluster four times
      You may end up doing so as well.
  34. Apollo Timeline: First build
      Cloudera Professional Services helped install CDH.
      Four nodes.
      OS built manually via USB CD-ROM.
      CDH2
  35. Apollo Timeline: Second build
      Cobbler.
      All software deployment via kickstart; very little in Puppet.
      Config files deployed via wget.
      CDH2
  36. Apollo Timeline: Third build
      OS filesystem partitioning needed to change.
      Most software deployment still via kickstart.
      CDH3b2
  37. Apollo Timeline: Fourth build
      HDFS filesystem inodes needed to be increased.
      Full Puppet automation.
      Added redundant/hot-swap enterprise hardware for headnodes.
      CDH3u1
  38. Cluster failures at Apollo
      Hardware:
        – Disk failures (40+)
        – Disk cabling (6)
        – RAM (2)
        – Switch port (1)
      Software:
        – Cluster
          • NFS (NameNode -> Secondary NameNode metadata)
        – Job
          • TaskTracker Java heap
          • Running out of /tmp or /var/log/hadoop
          • Running out of HDFS space
  39. Know your workload
      You can spend all the time in the world trying to get the best CPU/RAM/HDD/switch/cabinet configuration, but you are running on pure luck until you understand your cluster's workload.
  40. Questions?