Managing a Large Hadoop Cluster


Jeff Hammerbacher
Manager, Data
May 28 - 29, 2008
One in a series of presentations given at the IBM Cloud Computing Center in Dublin.


Anatomy of the Facebook Cluster

Hardware
▪   Individual nodes
    ▪   CPU: Intel Xeon, dual socket, quad core (8 cores per box)
    ▪   Memory: 16 GB ECC DRAM
    ▪   Disk: 4 x 1 TB 7200 RPM SATA
    ▪   Network: 1 GbE
▪   Topology
    ▪   320 nodes arranged into 8 racks of 40 nodes each
    ▪   8 x 1 Gbps links out to the core switch
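The hardware figures above imply some cluster-wide totals: 320 nodes, 2,560 cores, 5,120 GB of RAM, and 1,280 TB of raw disk before HDFS replication. The sketch below does that arithmetic. It also assumes that the "8 x 1 Gbps links" are per rack, which the slide does not state. Under that assumption, each rack's uplink is oversubscribed 5:1 relative to its 40 node NICs.

```python
# Per-node and topology figures from the hardware slide.
RACKS = 8
NODES_PER_RACK = 40
CORES_PER_NODE = 8        # dual-socket, quad-core Xeon
RAM_GB_PER_NODE = 16
DISKS_PER_NODE = 4
TB_PER_DISK = 1
NIC_GBPS = 1
UPLINKS_PER_RACK = 8      # ASSUMPTION: the "8 x 1 Gbps links" are counted per rack
UPLINK_GBPS = 1


def cluster_totals():
    """Aggregate compute, memory, and raw disk across all racks."""
    nodes = RACKS * NODES_PER_RACK
    return {
        "nodes": nodes,
        "cores": nodes * CORES_PER_NODE,
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "raw_disk_tb": nodes * DISKS_PER_NODE * TB_PER_DISK,
    }


def rack_oversubscription():
    """Ratio of in-rack host bandwidth to the rack's uplink bandwidth."""
    return (NODES_PER_RACK * NIC_GBPS) / (UPLINKS_PER_RACK * UPLINK_GBPS)


if __name__ == "__main__":
    print(cluster_totals())
    print(f"rack oversubscription: {rack_oversubscription():.0f}:1")
```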

Functional Separation
▪   Need to have test, staging, and production clusters
▪   Break nodes into groups of 10
    ▪   First 30 machines on each rack run DFS
    ▪   Last 10 machines used for DFS and upgrade testing or left idle
    ▪   Run main MapReduce cluster on 20 machines in each rack
    ▪   Run test MapReduce cluster on 10 machines in four racks
    ▪   Do MapReduce testing on 10 machines in four racks
▪   A few other MapReduce clusters for isolated applications
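Tallying the allocation above by role shows that the two DFS groups (30 + 10 machines per rack) already cover every machine in the cluster. The MapReduce clusters therefore have to share machines with those groups. The following is a short sketch, and the role labels are paraphrased from the slide.

```python
NODES_PER_RACK = 40

# (role, machines per rack, number of racks), paraphrased from the slide
ALLOCATIONS = [
    ("DFS", 30, 8),
    ("DFS/upgrade testing or idle", 10, 8),
    ("main MapReduce", 20, 8),
    ("test MapReduce", 10, 4),
    ("MapReduce testing", 10, 4),
]


def machines_per_role(allocations):
    """Total machines assigned to each role across the cluster."""
    return {role: per_rack * racks for role, per_rack, racks in allocations}


if __name__ == "__main__":
    totals = machines_per_role(ALLOCATIONS)
    dfs_groups = totals["DFS"] + totals["DFS/upgrade testing or idle"]
    mapred = totals["main MapReduce"] + totals["test MapReduce"] + totals["MapReduce testing"]
    print(totals)
    # The DFS groups account for all 8 * 40 machines, so every MapReduce
    # machine is also one of the DFS-group machines.
    print(dfs_groups, mapred, 8 * NODES_PER_RACK)
```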

Software for Administration
▪   Most utilities are included in hadoop/bin
    ▪   Format DFS, start/stop daemons, fsck, rebalance blocks, etc.
▪   Hypershell (internal): provides distributed shell functionality
    ▪   See also: dsh, GXP, Capistrano, ClusterIt
▪   Cfengine: ensure uniform system images, configuration, and libraries
▪   ODS (internal): monitoring and alerting
    ▪   See also: Ganglia for monitoring, Nagios for alerting
▪   Cacti: network monitoring
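Hypershell is internal, but the core idea behind it and tools like dsh can be sketched in a few lines: run one command on many hosts in parallel and collect the results. This is a hypothetical illustration, not Hypershell's interface. It assumes passwordless `ssh` to every host, and the runner is injectable so the fan-out logic can be exercised without a cluster.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command):
    """Run `command` on `host` via ssh; assumes passwordless ssh is set up."""
    result = subprocess.run(["ssh", host, command], capture_output=True, text=True)
    return result.returncode, result.stdout


def run_everywhere(hosts, command, runner=ssh_runner, parallelism=32):
    """Run `command` on every host concurrently; return {host: (returncode, stdout)}."""
    hosts = list(hosts)
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(lambda host: runner(host, command), hosts))
    return dict(zip(hosts, results))


def failed_hosts(results):
    """Hosts whose command exited with a non-zero status."""
    return [host for host, (code, _) in results.items() if code != 0]
```

For example, `failed_hosts(run_everywhere(["node001", "node002"], "uptime"))` would list the unreachable or failing nodes (the host names are hypothetical). A thread pool is enough here because each worker only waits on an `ssh` subprocess.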

Excerpts from Facebook’s conf/hadoop-site.xml

Property                            Value                                      Note
dfs.block.size                      134,217,728                                Larger block size (128 MB) for less NN metadata
dfs.datanode.du.reserved            1,024,000,000                              Don’t fill up the local disk
dfs.namenode.handler.count          40                                         More NN server threads for DN RPCs
dfs.network.script                  /mnt/vol/hive/stable/bin/rackid.pl         Print machine network name
fs.trash.interval                   1,440                                      Minutes between trash checkpoints (1 day)
fs.trash.root                       /Trash
io.file.buffer.size                 32,768                                     Size of r/w buffer used by SequenceFile
io.sort.factor                      100                                        More streams merged while sorting
io.sort.mb                          200                                        Higher memory limit while sorting data
mapred.child.java.opts              -Xmx1024m -Djava.net.preferIPv4Stack=true  Large heap size; avoid RPC timeout
mapred.linerecordreader.maxlength   1,000,000                                  Skip malformed lines
mapred.min.split.size               65,536
mapred.reduce.copy.backoff          5
mapred.reduce.parallel.copies       20                                         More threads to fetch map output data
mapred.tasktracker.tasks.maximum    5
mapred.speculative.map.enabled      TRUE
mapred.speculative.reduce.enabled   FALSE
mapred.speculative.map.gap          1
webinterface.private.actions        TRUE
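For reference, a few of these settings can be emitted in the `<configuration>`/`<property>`/`<name>`/`<value>` layout that `hadoop-site.xml` uses. This is a minimal sketch. The thousands separators from the slide are dropped because Hadoop parses these values as plain numbers. `TRUE` is written as lowercase `true`, on the assumption that the capitalization is only slide formatting.

```python
import xml.etree.ElementTree as ET

# A subset of the slide's settings, with thousands separators removed.
SETTINGS = {
    "dfs.block.size": "134217728",
    "dfs.datanode.du.reserved": "1024000000",
    "dfs.namenode.handler.count": "40",
    "fs.trash.interval": "1440",
    "io.sort.factor": "100",
    "io.sort.mb": "200",
    "mapred.child.java.opts": "-Xmx1024m -Djava.net.preferIPv4Stack=true",
    "mapred.reduce.parallel.copies": "20",
    "webinterface.private.actions": "true",  # ASSUMPTION: lowercase boolean
}


def build_site_xml(settings):
    """Render name/value pairs as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_site_xml(SETTINGS))
```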

HDFS Tips from Dhruba Borthakur
▪   Be careful when using profilers to examine NN state
▪   Never load many small files
▪   Always use Java 1.6; otherwise the NN will consume about 50% more CPU
▪   When decommissioning DNs, do a maximum of about 10 machines at a time; otherwise the NN gets overloaded
▪   Run fsck every night and monitor the number of missing/under-replicated blocks
▪   If a block stays unreplicated, force its replication factor up, then down
▪   When adding new DNs to the cluster, run the rebalancing script
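A nightly fsck check can be scripted by scanning the summary that `hadoop fsck /` prints. The sketch below assumes the summary contains a line labelled "Under-replicated blocks:" and, when data has been lost, a line labelled "MISSING BLOCKS:". Labels vary between Hadoop versions, so check them against your own fsck output. A counter whose line is absent is reported as zero.

```python
import re

# ASSUMPTION: fsck summary labels; verify against your Hadoop version's output.
_FSCK_COUNTERS = {
    "under_replicated": re.compile(r"under-replicated blocks:\s*(\d+)", re.IGNORECASE),
    "missing": re.compile(r"missing blocks:\s*(\d+)", re.IGNORECASE),
}


def parse_fsck_summary(text):
    """Extract block counters from fsck output; absent counters are reported as 0."""
    counts = {}
    for name, pattern in _FSCK_COUNTERS.items():
        match = pattern.search(text)
        counts[name] = int(match.group(1)) if match else 0
    return counts


def needs_attention(counts, max_under_replicated=0):
    """Alert on any missing block, or on more under-replicated blocks than tolerated."""
    return counts["missing"] > 0 or counts["under_replicated"] > max_under_replicated
```

In a cron job, the text could come from `subprocess.run(["hadoop", "fsck", "/"], capture_output=True, text=True).stdout`.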

Common Issues
▪   Client libraries out of sync
▪   Non-uniform availability of software or libraries on TT nodes
▪   Bad disk: manifests as a read-only file system (ROFS)
▪   NIC decides to go into 100 Mbps Ethernet mode
▪   DN reserved amount not honored, resulting in a disk filled to capacity
▪   Resource contention
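Two of these failure modes can be caught with a simple per-node check on Linux. The first is a failing disk whose file system has been remounted read-only. The second is a NIC that negotiated 100 Mbps instead of 1 Gbps. This is a sketch, and it assumes the DN data directories are mounted under a hypothetical `/mnt` prefix and that the interface is named `eth0`. `/proc/mounts` and `/sys/class/net/<iface>/speed` (link speed in Mb/s) are standard Linux interfaces.

```python
def _is_under(path, prefix):
    """True if `path` equals `prefix` or lies beneath it."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def read_only_mounts(proc_mounts_text, prefixes=("/mnt",)):
    """Mount points under `prefixes` (hypothetical data-dir root) mounted `ro`."""
    found = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, options = fields[1], fields[3].split(",")
        if "ro" in options and any(_is_under(mount_point, p) for p in prefixes):
            found.append(mount_point)
    return found


def link_speed_mbps(iface="eth0"):
    """Negotiated link speed in Mb/s, as reported by Linux sysfs."""
    with open(f"/sys/class/net/{iface}/speed") as f:
        return int(f.read().strip())


def is_degraded_link(speed_mbps, expected_mbps=1000):
    """True if the NIC fell back to a slower mode, e.g. 100 Mbps on 1 GbE."""
    return speed_mbps < expected_mbps
```

On a node, `read_only_mounts(open("/proc/mounts").read())` and `is_degraded_link(link_speed_mbps())` could feed the cron'd monitoring scripts mentioned in the next slide.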

More About Monitoring
▪   Hadoop has an abstract interface for metrics reporting
    ▪   org.apache.hadoop.metrics.spi
    ▪   Currently has “file” and “ganglia” implementations
    ▪   Every Metric belongs to a Context and a Record
    ▪   Metrics can also have Tags for disambiguation
    ▪   See conf/hadoop-metrics.properties for configuration
▪   Web interfaces to NN and JT also have detailed information
▪   A variety of cron’d scripts also take care of system-level monitoring
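As an illustration of the configuration side, the sketch below writes `hadoop-metrics.properties` entries that route each context to either the file implementation or the Ganglia implementation. The key names (`<context>.class`, `.period`, `.fileName`, `.servers`) follow the commented examples in the properties template shipped with Hadoop releases of this era. The log directory, period, and Ganglia address are placeholder assumptions.

```python
FILE_CONTEXT = "org.apache.hadoop.metrics.file.FileContext"
GANGLIA_CONTEXT = "org.apache.hadoop.metrics.ganglia.GangliaContext"


def metrics_properties(contexts, backend, period=10,
                       ganglia_servers="localhost:8649", log_dir="/tmp"):
    """Return hadoop-metrics.properties text routing each context to one backend."""
    lines = []
    for ctx in contexts:
        if backend == "file":
            lines += [f"{ctx}.class={FILE_CONTEXT}",
                      f"{ctx}.period={period}",
                      f"{ctx}.fileName={log_dir}/{ctx}metrics.log"]
        elif backend == "ganglia":
            lines += [f"{ctx}.class={GANGLIA_CONTEXT}",
                      f"{ctx}.period={period}",
                      f"{ctx}.servers={ganglia_servers}"]
        else:
            raise ValueError(f"unknown backend: {backend!r}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(metrics_properties(["dfs", "mapred"], "ganglia"), end="")
```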

More About Performance
▪   In addition to the metrics package, logs are a rich source of information
    ▪   Starting to regularly parse logs and store information into a MySQL db
▪   Multiple research labs working on this area
    ▪   Berkeley RAD Lab
    ▪   Carnegie Mellon PDL
    ▪   Watch OSDI this year for papers
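The following is a minimal sketch of the log-parsing idea, with SQLite standing in for MySQL so the example is self-contained. It assumes records shaped like Hadoop's job history lines, meaning a record type followed by `KEY="value"` pairs. It also assumes that a task's start and finish times may appear on separate lines, as epoch milliseconds. The field names used here (`TASKID`, `TASK_TYPE`, `START_TIME`, `FINISH_TIME`) are assumptions to check against real logs.

```python
import re
import sqlite3

# ASSUMPTION: records look like  Task TASKID="..." TASK_TYPE="MAP" START_TIME="..."
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_record(line):
    """Split a log record into its type and a dict of KEY="value" fields."""
    record_type, _, rest = line.strip().partition(" ")
    return record_type, dict(_PAIR.findall(rest))


def load_task_durations(lines, conn):
    """Merge Task records by TASKID and store one duration row per finished task."""
    tasks = {}
    for line in lines:
        record_type, fields = parse_record(line)
        if record_type == "Task" and "TASKID" in fields:
            tasks.setdefault(fields["TASKID"], {}).update(fields)
    rows = [
        # ASSUMPTION: START_TIME and FINISH_TIME are epoch milliseconds
        (task_id, f.get("TASK_TYPE"), int(f["FINISH_TIME"]) - int(f["START_TIME"]))
        for task_id, f in tasks.items()
        if "START_TIME" in f and "FINISH_TIME" in f
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_runs "
        "(task_id TEXT, task_type TEXT, duration_ms INTEGER)"
    )
    conn.executemany("INSERT INTO task_runs VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Once loaded, a query such as `SELECT task_type, AVG(duration_ms) FROM task_runs GROUP BY task_type` gives per-type averages comparable to the statistics on the last slide.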

Recent DFS Performance Numbers
▪   All DNs are on the same rack to isolate switch performance from the test
▪   8 DNs, each with 2 map slots: hence performance levels off at 16 files
▪   Each mapper writes 1 GB/file. Block size is 128 MB. Replication factor is 3.
▪   Uses Java 1.6

Number of files   0.15.4 (MB/s)   0.17.0 (MB/s)
1                 30              60
2                 25              53
3                 20              43
5                 18              33
8                 9               27
13                8               18
20                9               17
24                8               18
28                8               16
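Both releases were measured on the same setup, so the ratio in each row gives the 0.17.0 improvement directly. The short sketch below recomputes those ratios from the table. The speedup ranges from about 1.8x (5 files) to 3x (8 files).

```python
# (number of files, 0.15.4 MB/s, 0.17.0 MB/s), copied from the table above
RESULTS = [
    (1, 30, 60), (2, 25, 53), (3, 20, 43), (5, 18, 33), (8, 9, 27),
    (13, 8, 18), (20, 9, 17), (24, 8, 18), (28, 8, 16),
]


def speedups(results):
    """Map number of files to the 0.17.0 / 0.15.4 throughput ratio."""
    return {files: new / old for files, old, new in results}


if __name__ == "__main__":
    for files, ratio in speedups(RESULTS).items():
        print(f"{files:>2} files: {ratio:.2f}x")
```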

X-Trace + Hadoop
HDFS Performance analysis

Resource Management and Job Scheduling
▪   By far the most intensive cluster management responsibility
▪   At Facebook: manually set job priorities and kill jobs
▪   HOD
    ▪   Integrates with Torque resource manager
    ▪   Torque frequently paired with Maui cluster scheduler
▪   Other options
    ▪   Sun Grid Engine
    ▪   Condor
    ▪   Platform LSF (commercial)

Manual Job Scheduling
Job Priorities and “Kill this Job” from JT Web Interface
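The manual policy described above (set priorities, kill runaway jobs) can be sketched as follows. The priority names are the five levels Hadoop defines for jobs. Ordering by priority and then by start time is our reading of the default FIFO behavior. The runtime threshold and job IDs are hypothetical. The kill helper only builds `hadoop job -kill <job-id>` command lines and does not run them.

```python
from dataclasses import dataclass

# Hadoop's five job priority levels, most urgent first.
PRIORITY_RANK = {"VERY_HIGH": 0, "HIGH": 1, "NORMAL": 2, "LOW": 3, "VERY_LOW": 4}


@dataclass
class Job:
    job_id: str
    priority: str    # one of PRIORITY_RANK's keys
    start_time: int  # seconds since epoch (hypothetical field for this sketch)


def queue_order(jobs):
    """Order jobs by priority, then by start time (FIFO within a priority level)."""
    return sorted(jobs, key=lambda j: (PRIORITY_RANK[j.priority], j.start_time))


def kill_commands(jobs, now, max_runtime_s):
    """Build `hadoop job -kill` argument lists for jobs running past max_runtime_s."""
    return [["hadoop", "job", "-kill", j.job_id]
            for j in jobs if now - j.start_time > max_runtime_s]
```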

Recent Cluster Statistics
▪   From May 2nd to May 21st:
    ▪   Total jobs: 8,794
    ▪   Total map tasks: 1,362,429
    ▪   Total reduce tasks: 86,806
    ▪   Average duration of a successful job: 296 s
    ▪   Average duration of a successful map: 81 s
    ▪   Average duration of a successful reduce: 678 s
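A few derived figures help put these totals in context. The sketch assumes the reporting window includes both May 2nd and May 21st, for a total of 20 days. Under that assumption the cluster ran roughly 440 jobs per day, with about 155 map tasks and 10 reduce tasks per job.

```python
from datetime import date

TOTAL_JOBS = 8_794
TOTAL_MAPS = 1_362_429
TOTAL_REDUCES = 86_806
# ASSUMPTION: "May 2nd to May 21st" includes both endpoints
DAYS = (date(2008, 5, 21) - date(2008, 5, 2)).days + 1


def derived_stats():
    """Per-day and per-job ratios computed from the slide's totals."""
    return {
        "days": DAYS,
        "jobs_per_day": TOTAL_JOBS / DAYS,
        "maps_per_job": TOTAL_MAPS / TOTAL_JOBS,
        "reduces_per_job": TOTAL_REDUCES / TOTAL_JOBS,
        "maps_per_reduce": TOTAL_MAPS / TOTAL_REDUCES,
    }


if __name__ == "__main__":
    for name, value in derived_stats().items():
        print(f"{name}: {value:.1f}")
```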

(c) 2008 Facebook, Inc. or its licensors. “Facebook” is a registered trademark of Facebook, Inc. All rights reserved.