Hadoop operations


Published in: Technology

  1. APOLLO GROUP
     Hadoop Operations: Starting Out Small
     So Your Cluster Isn't Yahoo-sized (yet)
     Michael Arnold, Principal Systems Engineer
     14 June 2012
  2. Agenda Who What (Definitions) Decisions for Now Decisions for Later Lessons LearnedAPOLLO GROUP © 2012 Apollo Group 2
  3. Who
  4. Who is Apollo?
     Apollo Group is a leading provider of higher education programs for working adults.
  5. Who is Michael Arnold?
     Systems Administrator
     Automation geek
     13 years in IT
     I deal with:
     – Server hardware specification/configuration
     – Server firmware
     – Server operating system
     – Hadoop application health
     – Monitoring all the above
  6. What: Definitions
  7. Definitions
     Q: What is a tiny/small/medium/large cluster?
     A:
     – Tiny: 1-9
     – Small: 10-99
     – Medium: 100-999
     – Large: 1000+
     – Yahoo-sized: 4000
  8. Definitions
     Q: What is a “headnode”?
     A: A server that runs one or more of the following Hadoop processes:
     – NameNode
     – JobTracker
     – Secondary NameNode
     – ZooKeeper
     – HBase Master
  9. Decisions for Now
     What decisions should you make now, and which can you postpone for later?
  10. Which Hadoop distribution?
      Amazon
      Apache
      Cloudera
      Greenplum
      Hortonworks
      IBM
      MapR
      Platform Computing
  11. Should you virtualize?
      Can be OK for small clusters, BUT
      – virtualization adds overhead
      – can cause performance degradation
      – cannot take advantage of Hadoop rack locality
      Virtualization can be good for:
      – functional testing of M/R job or workflow changes
      – evaluation of Hadoop upgrades
  12. What sort of hardware should you be considering?
      Inexpensive
      Not “enterprisey” hardware
      – No RAID*
      – No redundant power*
      Low power consumption
      No optical drives
      – get systems that can boot off the network
      * except in headnodes
  13. Plan for capacity expansion
      Start at the bottom and work your way up.
      Leave room in your cabinets for more machines.
  14. Plan for capacity expansion (cont.)
      Deploy your initial cluster in two cabinets
      – One headnode, one switch, and several (five) datanodes per cabinet
  15. Plan for capacity expansion (cont.)
      Install a second cluster in the empty space in the upper half of the cabinet.
  16. Decisions for Later
      What decisions should you make now, and which can you postpone for later?
  17. What size cluster?
      Depends upon your:
      – Budget
      – Data size
      – Workload characteristics
      – SLA
  18. What size cluster? (cont.)
      Are your MapReduce jobs:
      – compute-intensive?
      – reading lots of data?
      http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
  19. Should you implement rack awareness?
      If more than one switch in the cluster: YES
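Rack awareness is usually enabled by pointing Hadoop at a topology script (the `topology.script.file.name` property in Hadoop 1.x-era releases such as CDH3). Hadoop calls the script with one or more node addresses as arguments and reads one rack path per address from standard output. The following is a minimal sketch of such a script, not the one used at Apollo; the subnets and rack names are hypothetical and assume one subnet per cabinet switch.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop topology script: print one rack path per address given."""
import ipaddress
import sys

# Hypothetical layout: one subnet per cabinet/switch. Replace with your own.
RACKS = {
    ipaddress.ip_network("10.1.1.0/24"): "/cabinet1",
    ipaddress.ip_network("10.1.2.0/24"): "/cabinet2",
}
DEFAULT_RACK = "/default-rack"


def rack_for(address):
    """Return the rack path for an IP address, or the default rack if unknown."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        # Hadoop may pass a hostname instead of an IP; this sketch does not resolve it.
        return DEFAULT_RACK
    for network, rack in RACKS.items():
        if ip in network:
            return rack
    return DEFAULT_RACK


if __name__ == "__main__":
    # One rack path per argument, in the same order, separated by spaces.
    print(" ".join(rack_for(arg) for arg in sys.argv[1:]))
```

With two cabinets and one switch each, as in the deployment plan, every datanode resolves to one of two racks, which is enough for HDFS to spread block replicas across both switches.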
  20. Should you use automation?
      If not in the beginning, then as soon as possible.
      Boot disks will fail.
      Automated OS and application installs:
      – Save time
      – Reduce errors
        • Cobbler/Spacewalk/Foreman/xCat/etc.
        • Puppet/Chef/Cfengine/shell scripts/etc.
  21. Lessons Learned
  22. Keep It Simple
      Don't add redundancy and features (server/network) that will make things more complicated and expensive.
      Hadoop has built-in redundancies. Don't overlook them.
  23. Automate the Hardware
      Twelve hours of manual work in the datacenter is not fun.
      Make sure all server firmware is configured identically.
      – HP SmartStart Scripting Toolkit
      – Dell OpenManage Deployment Toolkit
      – IBM ServerGuide Scripting Toolkit
  24. Rolling upgrades are possible
      (Just not of the Hadoop software.)
      Datanodes can be decommissioned, patched, and added back into the cluster without service downtime.
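Decommissioning is driven by an exclude file (the file named by the `dfs.hosts.exclude` property): add the datanode's hostname, run `hadoop dfsadmin -refreshNodes`, wait for the node to finish decommissioning, patch it, then remove the hostname and refresh again. The helper below is a rough sketch of the file-editing half only; the function name and the file location used by a caller are assumptions for illustration, and the refresh command is listed rather than executed.

```python
from pathlib import Path

# Admin command that makes the NameNode re-read its include/exclude files.
REFRESH_COMMAND = ["hadoop", "dfsadmin", "-refreshNodes"]


def set_excluded(exclude_file, host, excluded):
    """Add (excluded=True) or remove (excluded=False) a host in the exclude file.

    Returns the resulting list of excluded hosts. The operation is idempotent,
    so re-running it for the same host does not create duplicate entries.
    """
    path = Path(exclude_file)
    hosts = path.read_text().split() if path.exists() else []
    if excluded and host not in hosts:
        hosts.append(host)
    if not excluded:
        hosts = [h for h in hosts if h != host]
    path.write_text("".join(h + "\n" for h in hosts))
    return hosts
```

After each change to the file, `REFRESH_COMMAND` would be run on the NameNode (for example with `subprocess.run`) so the change takes effect.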
  25. The smallest thing can have a big impact on the cluster
      A bad NIC or switch port can cause cluster slowness.
      Slow disks can cause intermittent job slowdowns.
  26. HDFS blocks are weird
      On ext3/ext4:
      – Small blocks are not padded to the HDFS block size; they occupy only the actual size of the data.
      – Each HDFS block is actually two files on the datanode's filesystem:
        • The actual data and
        • A metadata/checksum file
      # ls -l blk_1058778885645824207*
      -rw-r--r-- 1 hdfs hdfs 35094 May 14 01:26 blk_1058778885645824207
      -rw-r--r-- 1 hdfs hdfs   283 May 14 01:26 blk_1058778885645824207_19155994.meta
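The sizes in the listing above are consistent with how the checksum file is laid out, assuming the Hadoop defaults of that era: a 7-byte header followed by one 4-byte CRC32 for every 512 bytes of block data. Those three constants are assumptions about the cluster's configuration rather than something the slide states. A small worked calculation:

```python
import math


def expected_meta_size(block_bytes, bytes_per_checksum=512,
                       checksum_bytes=4, header_bytes=7):
    """Estimate the size of a block's .meta file from the block's data size.

    Defaults assume a 7-byte header and a 4-byte CRC32 per 512-byte chunk.
    """
    chunks = math.ceil(block_bytes / bytes_per_checksum)
    return header_bytes + chunks * checksum_bytes


# The 35094-byte block from the listing needs 69 checksummed chunks:
# 7 + 69 * 4 = 283 bytes, matching the .meta file size shown.
print(expected_meta_size(35094))  # 283
```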
  27. Do not prematurely optimize
      Be careful tuning your datanode filesystems.
      • mkfs -t ext4 -T largefile4 ... (probably bad)
      • mkfs -t ext4 -i 131072 -m 0 ... (better)
      /etc/mke2fs.conf
        [fs_types]
          hadoop = {
            features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
            inode_ratio = 131072
            blocksize = -1
            reserved_ratio = 0
            default_mntopts = acl,user_xattr
          }
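One way to see why `largefile4` is risky: it allocates one inode per 4 MiB of disk (an `inode_ratio` of 4194304 in the stock mke2fs.conf), and every HDFS block consumes two inodes (data file plus .meta file). Because small blocks are stored at their actual size, a datanode holding many small blocks can run out of inodes long before it runs out of space. The estimate below is a rough sketch; the 2 TB disk size is a hypothetical example, and it ignores directories and the handful of inodes ext4 reserves for itself.

```python
def inode_count(fs_bytes, bytes_per_inode):
    """Approximate number of inodes mke2fs creates for a given inode ratio."""
    return fs_bytes // bytes_per_inode


def max_hdfs_blocks(fs_bytes, bytes_per_inode):
    """Upper bound on HDFS blocks per filesystem: each block uses two inodes."""
    return inode_count(fs_bytes, bytes_per_inode) // 2


DISK_BYTES = 2 * 10**12  # hypothetical 2 TB data disk

print(max_hdfs_blocks(DISK_BYTES, 4194304))  # -T largefile4: 238418 blocks
print(max_hdfs_blocks(DISK_BYTES, 131072))   # -i 131072:     7629394 blocks
```

Under these assumptions, the `largefile4` filesystem only fills its disk if blocks average about 8 MiB or more, while the `-i 131072` setting tolerates averages down to about 256 KiB.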
  28. Use DNS-friendly names for services
      hdfs://hdfs.delta.hadoop.apollogrp.edu:8020/
      mapred.delta.hadoop.apollogrp.edu:8021
      http://oozie.delta.hadoop.apollogrp.edu:11000/
      hiveserver.delta.hadoop.apollogrp.edu:10000
      Yes, the names are long, but I bet you can figure out how to connect to Bravo Cluster.
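The naming pattern above (service, then cluster, then a shared domain) is regular enough that endpoints can be generated rather than memorized. A small sketch of that idea follows; the table and function name are illustrative, and the ports and URL schemes are simply copied from the Delta examples on the slide.

```python
DOMAIN = "hadoop.apollogrp.edu"

# service -> (URL scheme prefix, port, trailing path), as shown for Delta.
SERVICES = {
    "hdfs": ("hdfs://", 8020, "/"),
    "mapred": ("", 8021, ""),
    "oozie": ("http://", 11000, "/"),
    "hiveserver": ("", 10000, ""),
}


def service_endpoint(service, cluster):
    """Build the endpoint for a service on a named cluster."""
    scheme, port, path = SERVICES[service]
    return f"{scheme}{service}.{cluster}.{DOMAIN}:{port}{path}"


print(service_endpoint("hdfs", "bravo"))
# hdfs://hdfs.bravo.hadoop.apollogrp.edu:8020/
```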
  29. Use a parallel, remote execution tool
      pdsh/Cluster SSH/mussh/etc.
      – SSH in a for loop is so 2010
      FUNC/MCollective
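To illustrate what these tools buy you over a serial loop, here is a minimal sketch of fanning one command out to many hosts concurrently. It is not a replacement for pdsh or MCollective; the function names are made up for this example, and it assumes key-based SSH access to every host.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_ssh(host, command):
    """Run a command on one host over SSH; return (exit code, stdout)."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


def run_everywhere(hosts, command, runner=run_ssh, max_workers=16):
    """Run the command on all hosts in parallel; return {host: runner result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}
```

The `runner` parameter exists so the fan-out logic can be exercised without a network, for example `run_everywhere(["dn1", "dn2"], "uptime", runner=lambda h, c: (0, h))`.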
  30. Make your log directories as large as you can.
      20-100GB /var/log
      – Implement log purging cronjobs or your log directories will fill up.
      Beware: M/R jobs can fill up /tmp as well.
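A purge job can be as simple as deleting files that have not been modified for some number of days. The sketch below shows one way to do that from a cron-driven script; the function name and the seven-day retention in the usage note are assumptions, not values from the talk.

```python
import os
import time


def purge_old_logs(log_dir, max_age_days, now=None):
    """Delete files under log_dir last modified more than max_age_days ago.

    Returns the sorted list of removed paths. Directories are left in place.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```

A daily cron entry could call, for instance, `purge_old_logs("/var/log/hadoop", 7)`; the same function pointed at a job scratch directory addresses the /tmp warning.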
  31. Insist on IPMI 2.0 for out-of-band management of server hardware.
      Serial Over LAN is awesome when booting a system.
      Standardized hardware/temperature monitoring.
      Simple remote power control.
  32. Spanning-tree is the devil
      Enable portfast on your server switch ports or the BMCs may never get a DHCP lease.
  33. Apollo has rebuilt its cluster four times.
      You may end up doing so as well.
  34. Apollo Timeline: First build
      Cloudera Professional Services helped install CDH.
      Four nodes.
      OS built manually via USB CD-ROM.
      CDH2
  35. Apollo Timeline: Second build
      Cobbler
      All software deployment is via kickstart. Very little is in puppet.
      Config files are deployed via wget.
      CDH2
  36. Apollo Timeline: Third build
      OS filesystem partitioning needed to change.
      Most software deployment still via kickstart.
      CDH3b2
  37. Apollo Timeline: Fourth build
      HDFS filesystem inodes needed to be increased.
      Full puppet automation.
      Added redundant/hotswap enterprise hardware for headnodes.
      CDH3u1
  38. Cluster failures at Apollo
      Hardware
      – disk failures (40+)
      – disk cabling (6)
      – RAM (2)
      – switch port (1)
      Software
      – Cluster
        • NFS (NN -> 2NN metadata)
      – Job
        • TT java heap
        • Running out of /tmp or /var/log/hadoop
        • Running out of HDFS space
  39. Know your workload
      You can spend all the time in the world trying to get the best CPU/RAM/HDD/switch/cabinet configuration, but you are running on pure luck until you understand your cluster's workload.
  40. Questions?