Hadoop operations
- 2. Agenda
Who
What (Definitions)
Decisions for Now
Decisions for Later
Lessons Learned
APOLLO GROUP © 2012 Apollo Group 2
- 4. Who is Apollo?
Apollo Group is a leading provider of higher education programs for working adults.
- 5. Who is Michael Arnold?
Systems Administrator
Automation geek
13 years in IT
I deal with:
–Server hardware specification/configuration
–Server firmware
–Server operating system
–Hadoop application health
–Monitoring all the above
- 6. What (Definitions)
- 7. Definitions
Q: What is a tiny/small/medium/large cluster?
A:
–Tiny: 1-9 nodes
–Small: 10-99 nodes
–Medium: 100-999 nodes
–Large: 1,000+ nodes
–Yahoo-sized: ~4,000 nodes
- 8. Definitions
Q: What is a “headnode”?
A: A server that runs one or more of the following Hadoop processes:
–NameNode
–JobTracker
–Secondary NameNode
–ZooKeeper
–HBase Master
- 9. Decisions for Now
What decisions should you make now, and which can you postpone until later?
- 10. Which Hadoop distribution?
Amazon
Apache
Cloudera
Greenplum
Hortonworks
IBM
MapR
Platform Computing
- 11. Should you virtualize?
Virtualization can be OK for small clusters, BUT:
–virtualization adds overhead
–can cause performance degradation
–cannot take advantage of Hadoop rack locality
Virtualization can be good for:
–functional testing of M/R job or workflow changes
–evaluation of Hadoop upgrades
- 12. What sort of hardware should you be considering?
Inexpensive
Not “enterprisey” hardware
–No RAID*
–No redundant power*
Low power consumption
No optical drives
–get systems that can boot off the network
* except in headnodes
- 13. Plan for capacity expansion
Start at the bottom and work your way up.
Leave room in your cabinets for more machines.
- 14. Plan for capacity expansion (cont.)
Deploy your initial cluster in two cabinets:
–One headnode, one switch, and several (five) datanodes per cabinet
- 15. Plan for capacity expansion (cont.)
Install a second cluster in the empty space in the upper half of the cabinet.
- 16. Decisions for Later
What decisions should you make now, and which can you postpone until later?
- 17. What size cluster?
Depends upon your:
Budget
Data size
Workload characteristics
SLA
- 18. What size cluster? (cont.)
Are your MapReduce jobs:
compute-intensive?
reading lots of data?
http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/
- 19. Should you implement rack awareness?
If more than one switch in the cluster:
YES
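Rack awareness is driven by a user-supplied topology script (pointed to by topology.script.file.name in core-site.xml): Hadoop invokes it with node addresses as arguments and reads one rack path per node from stdout. A minimal sketch, with an invented subnet-to-rack mapping:

```shell
# Hypothetical rack topology script body, expressed as a function.
# The subnet-to-rack mapping below is made up for illustration.
rack_for() {
  case "$1" in
    10.1.1.*) echo "/dc1/rack1" ;;
    10.1.2.*) echo "/dc1/rack2" ;;
    *)        echo "/default-rack" ;;
  esac
}

# In the actual script file, loop over the arguments Hadoop passes in:
#   for node in "$@"; do rack_for "$node"; done
```

Nodes that fall outside any known subnet map to /default-rack, which is also what Hadoop assumes when no script is configured at all.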
- 20. Should you use automation?
If not in the beginning, then as soon as possible.
Boot disks will fail.
Automated OS and application installs:
–Save time
–Reduce errors
•Cobbler/Spacewalk/Foreman/xCat/etc
•Puppet/Chef/Cfengine/shell scripts/etc
- 21. Lessons Learned
- 22. Keep It Simple
Don't add redundancy and features (server/network) that will make things more complicated and expensive.
Hadoop has built-in redundancies.
Don't overlook them.
- 23. Automate the Hardware
Twelve hours of manual work in the datacenter is not fun.
Make sure all server firmware is configured identically.
–HP SmartStart Scripting Toolkit
–Dell OpenManage Deployment Toolkit
–IBM ServerGuide Scripting Toolkit
- 24. Rolling upgrades are possible
(Just not of the Hadoop software.)
Datanodes can be decommissioned, patched, and added back into the cluster without service downtime.
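The decommission step can be sketched as a small helper, assuming an excludes file at a hypothetical path that dfs.hosts.exclude points to in hdfs-site.xml; the refreshNodes command is only echoed here, not executed:

```shell
# Hedged sketch of the HDFS decommission workflow. The excludes-file path
# is an assumption; point dfs.hosts.exclude at it in hdfs-site.xml.
EXCLUDES="${EXCLUDES:-/etc/hadoop/conf/dfs.hosts.exclude}"

decommission() {
  # Add the host to the excludes file (idempotently), then tell the
  # NameNode to re-read it; the datanode drains its blocks and can be
  # patched and rebooted safely before being removed from the file again.
  host="$1"
  grep -qxF "$host" "$EXCLUDES" 2>/dev/null || echo "$host" >> "$EXCLUDES"
  echo "next: hadoop dfsadmin -refreshNodes"
}
```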
- 25. The smallest thing can have a big impact on the cluster
Bad NIC/switchport can cause cluster slowness.
Slow disks can cause intermittent job slowdowns.
- 26. HDFS blocks are weird
On ext3/ext4:
–Small blocks are not padded to the HDFS block size; they occupy only the actual size of the data.
–Each HDFS block is actually two files on the datanode's filesystem:
•the actual data and
•a metadata/checksum file
# ls -l blk_1058778885645824207*
-rw-r--r-- 1 hdfs hdfs 35094 May 14 01:26 blk_1058778885645824207
-rw-r--r-- 1 hdfs hdfs 283 May 14 01:26 blk_1058778885645824207_19155994.meta
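The 283-byte .meta file above can be accounted for, assuming the default 512-byte checksum chunk (io.bytes.per.checksum), a 4-byte CRC32 per chunk, and a 7-byte metadata header (values for CDH3-era HDFS):

```shell
# Back-of-the-envelope check of the .meta file size shown in the listing.
data_bytes=35094
chunks=$(( (data_bytes + 511) / 512 ))   # ceiling division: 69 chunks
meta_bytes=$(( 7 + chunks * 4 ))         # 7-byte header + 4 bytes/chunk
echo "$meta_bytes"                       # 283, matching the ls output
```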
- 27. Do not prematurely optimize
Be careful tuning your datanode filesystems.
• mkfs -t ext4 -T largefile4 ... (probably bad)
• mkfs -t ext4 -i 131072 -m 0 ... (better)
/etc/mke2fs.conf:
[fs_types]
  hadoop = {
    features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
    inode_ratio = 131072
    blocksize = -1
    reserved_ratio = 0
    default_mntopts = acl,user_xattr
  }
- 28. Use DNS-friendly names for services
hdfs://hdfs.delta.hadoop.apollogrp.edu:8020/
mapred.delta.hadoop.apollogrp.edu:8021
http://oozie.delta.hadoop.apollogrp.edu:11000/
hiveserver.delta.hadoop.apollogrp.edu:10000
Yes, the names are long, but I bet you can figure out how to connect to Bravo Cluster.
- 29. Use a parallel, remote execution tool
pdsh/Cluster SSH/mussh/etc
SSH in a for loop is so 2010
FUNC/MCollective
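Where pdsh isn't available, the same fan-out pattern can be sketched with plain shell background jobs. run_remote below is a stub that just echoes, so the sketch is self-contained; swap in `ssh "$host" "$@"` for real use:

```shell
# Minimal stand-in for pdsh: run a command "against" each host in parallel.
# run_remote is a hypothetical stub; replace its body with ssh for real use.
run_remote() { host="$1"; shift; echo "$host: $*"; }

prun() {
  cmd="$1"; shift
  for host in "$@"; do
    run_remote "$host" "$cmd" &   # fan out in the background
  done
  wait                            # block until every host has finished
}
```

Something like `prun 'df -h /data' datanode01 datanode02` then checks all nodes at once instead of one ssh at a time.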
- 30. Make your log directories as large as you can.
20-100GB /var/log
–Implement log-purging cron jobs or your log directories will fill up.
Beware: M/R jobs can fill up /tmp as well.
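A log-purging cron job can be as simple as a find over the log directory; the path pattern and 14-day retention below are assumptions, not a recommendation:

```shell
# One possible log-purging job: delete logs untouched for more than N days.
# In crontab, something like:  30 3 * * *  root  /usr/local/sbin/purge-hadoop-logs
purge_logs() {
  dir="$1"; days="${2:-14}"
  # -mtime +N matches files last modified more than N days ago
  find "$dir" -type f -name '*.log*' -mtime +"$days" -print -delete
}
```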
- 31. Insist on IPMI 2.0 for out-of-band management of server hardware.
Serial-over-LAN is awesome when booting a system.
Standardized hardware/temperature monitoring.
Simple remote power control.
- 32. Spanning-tree is the devil
Enable portfast on your server switch ports or the BMCs may never get a DHCP lease.
- 33. Apollo has rebuilt its cluster four times.
You may end up doing so as well.
- 34. Apollo Timeline
First build
Cloudera Professional Services helped install CDH
Four nodes
Manually built the OS via USB CD-ROM.
CDH2
- 35. Apollo Timeline
Second build
Cobbler
All software deployment is via kickstart. Very little is in puppet. Config files are deployed via wget.
CDH2
- 36. Apollo Timeline
Third build
OS filesystem partitioning needed to change.
Most software deployment still via kickstart.
CDH3b2
- 37. Apollo Timeline
Fourth build
HDFS filesystem inodes needed to be increased.
Full puppet automation.
Added redundant/hotswap enterprise hardware for
headnodes.
CDH3u1
- 38. Cluster failures at Apollo
Hardware
–disk failures (40+)
–disk cabling (6)
–RAM (2)
–switch port (1)
Software
–Cluster
•NFS (NameNode -> Secondary NameNode metadata)
–Job
•TaskTracker Java heap
•Running out of /tmp or /var/log/hadoop
•Running out of HDFS space
- 39. Know your workload
You can spend all the time in the world trying to get the best CPU/RAM/HDD/switch/cabinet configuration, but you are running on pure luck until you understand your cluster's workload.
- 40. Questions?