One in a series of presentations given at the IBM Cloud Computing Center in Dublin.

  • 1. Managing a Large Hadoop Cluster. Jeff Hammerbacher, Manager, Data. May 28-29, 2008
  • 2. Anatomy of the Facebook Cluster: Hardware
    ▪ Individual nodes
      ▪ CPU: Intel Xeon, dual-socket quad-core (8 cores per box)
      ▪ Memory: 16 GB ECC DRAM
      ▪ Disk: 4 x 1 TB 7200 RPM SATA
      ▪ Network: 1 GbE
    ▪ Topology
      ▪ 320 nodes arranged into 8 racks of 40 nodes each
      ▪ 8 x 1 Gbps links out to the core switch
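Some back-of-envelope arithmetic on that hardware. Two assumptions not stated on the slide: HDFS's default 3x replication, and reading the 8 x 1 Gbps core links as one uplink per rack.

```python
# Capacity and network figures derived from the slide's numbers.
NODES = 320
DISKS_PER_NODE = 4
DISK_TB = 1
REPLICATION = 3  # assumed: HDFS default replication factor

raw_tb = NODES * DISKS_PER_NODE * DISK_TB  # 1280 TB raw across the cluster
usable_tb = raw_tb / REPLICATION           # ~427 TB after 3x replication

# Assuming one 1 Gbps uplink per rack, 40 hosts at 1 Gbps each
# share a single 1 Gbps link out: a 40:1 oversubscription ratio.
NODES_PER_RACK = 40
oversubscription = NODES_PER_RACK * 1 / 1

print(raw_tb, round(usable_tb, 1), oversubscription)
```

The oversubscription figure is why Hadoop tries hard to schedule tasks near their data: cross-rack traffic is the scarce resource in this topology.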
  • 3. Anatomy of the Facebook Cluster: Functional Separation
    ▪ Need to have test, staging, and production clusters
    ▪ Break nodes into groups of 10
    ▪ First 30 machines on each rack run DFS
    ▪ Last 10 machines used for DFS and upgrade testing, or left idle
    ▪ Run main MapReduce cluster on 20 machines in each rack
    ▪ Run test MapReduce cluster on 10 machines in four racks
    ▪ Do MapReduce testing on 10 machines in four racks
    ▪ A few other MapReduce clusters for isolated applications
  • 4. Anatomy of the Facebook Cluster: Software for Administration
    ▪ Most utilities are included in hadoop/bin
      ▪ Format DFS, start/stop daemons, fsck, rebalance blocks, etc.
    ▪ Hypershell (internal): provides distributed shell functionality
      ▪ See also: dsh, GXP, Capistrano, ClusterIt
    ▪ Cfengine: ensure uniform system images, configuration, and libraries
    ▪ ODS (internal): monitoring and alerting
      ▪ See also: Ganglia for monitoring, Nagios for alerting
    ▪ Cacti: network monitoring
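The hadoop/bin utilities referred to above look roughly like this in a stock ~0.17-era distribution (a sketch; script names vary across Hadoop versions):

```
bin/hadoop namenode -format   # format DFS (destroys existing NN metadata)
bin/start-dfs.sh              # start the NameNode and DataNodes
bin/start-mapred.sh           # start the JobTracker and TaskTrackers
bin/hadoop fsck /             # filesystem consistency check
bin/start-balancer.sh         # rebalance blocks across DataNodes
bin/stop-all.sh               # stop all daemons
```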
  • 5. Anatomy of the Facebook Cluster: Excerpts from Facebook’s conf/hadoop-site.xml
    ▪ dfs.block.size = 134217728 (larger block size for less NN metadata)
    ▪ dfs.datanode.du.reserved = 1024000000 (don’t fill up the local disk)
    ▪ dfs.namenode.handler.count = 40 (more NN server threads for DN RPCs)
    ▪ /mnt/vol/hive/stable/bin/ (print machine network name)
    ▪ fs.trash.interval = 1440
    ▪ fs.trash.root = /Trash
    ▪ io.file.buffer.size = 32768 (size of r/w buffer used by SequenceFile)
    ▪ io.sort.factor = 100 (more streams merged while sorting)
    ▪ io.sort.mb = 200 (higher memory limit while sorting data)
    ▪ -Xmx1024m (large heap size; avoid RPC timeout)
    ▪ mapred.linerecordreader.maxlength = 1000000 (skip malformed lines)
    ▪ mapred.min.split.size = 65536
    ▪ mapred.reduce.copy.backoff = 5
    ▪ mapred.reduce.parallel.copies = 20 (more threads to fetch map output data)
    ▪ mapred.tasktracker.tasks.maximum = 5
    ▪ TRUE
    ▪ mapred.speculative.reduce.enabled = FALSE
    ▪ 1
    ▪ webinterface.private.actions = TRUE
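Overrides like these are set in conf/hadoop-site.xml as property blocks; a minimal sketch using two of the values from the slide:

```xml
<configuration>
  <!-- 128 MB blocks: fewer blocks per file means less NameNode metadata -->
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
  <!-- Reserve ~1 GB per local disk so the DataNode never fills it -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>1024000000</value>
  </property>
</configuration>
```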
  • 6. Anatomy of the Facebook Cluster: HDFS Tips from Dhruba Borthakur
    ▪ Be careful when using profilers to examine NN state
    ▪ Never load many small files
    ▪ Always use Java 1.6; otherwise the NN will consume about 50% more CPU
    ▪ When decommissioning DNs, do a maximum of 10 machines or so at a time, otherwise the NN gets overloaded
    ▪ Run fsck every night and monitor the number of missing/under-replicated blocks
    ▪ If a block stays under-replicated, force its replication factor up, then down
    ▪ When adding new DNs to the cluster, run the rebalancing script
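The nightly-fsck and "force replication up, then down" tips map onto the standard CLI roughly like this (the file path is a placeholder, and the exact fsck output wording varies by version):

```
# Nightly: report corrupt / missing / under-replicated blocks
hadoop fsck / | grep -iE 'missing|under-replicated|corrupt'

# For a file whose blocks stay under-replicated: raise the replication
# factor (-w waits for it to take effect), then set it back down
hadoop fs -setrep -w 4 /path/to/affected/file
hadoop fs -setrep 3 /path/to/affected/file
```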
  • 7. Anatomy of the Facebook Cluster: Common Issues
    ▪ Client libraries out of sync
    ▪ Non-uniform availability of software or libraries on TT nodes
    ▪ Bad disk: manifests as a read-only file system (ROFS)
    ▪ NIC decides to drop into 100 Mbps Ethernet mode
    ▪ DN reserved amount not honored, resulting in disk filled to capacity
    ▪ Resource contention
  • 8. Anatomy of the Facebook Cluster: More About Monitoring
    ▪ Hadoop has an abstract interface for metrics reporting
      ▪ org.apache.hadoop.metrics.spi
      ▪ Currently has “file” and “ganglia” implementations
    ▪ Every Metric belongs to a Context and a Record
      ▪ Metrics can also have Tags for disambiguation
    ▪ See conf/ for configuration
    ▪ Web interfaces to the NN and JT also have detailed information
    ▪ A variety of cron’d scripts also take care of system-level monitoring
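The "file" and "ganglia" implementations are selected per context in conf/hadoop-metrics.properties; a sketch, with the gmond host:port as a placeholder:

```
# Dump DFS metrics to a local file every 10 seconds
dfs.class=org.apache.hadoop.metrics.file.FileContext
dfs.period=10
dfs.fileName=/tmp/dfsmetrics.log

# Push MapReduce metrics to Ganglia instead
mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext
mapred.period=10
mapred.servers=localhost:8649
```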
  • 9. Anatomy of the Facebook Cluster: More About Performance
    ▪ In addition to the metrics package, logs are a rich source of information
    ▪ Starting to regularly parse logs and store the information in a MySQL db
    ▪ Multiple research labs working on this area
      ▪ Berkeley RAD Lab
      ▪ Carnegie Mellon PDL
    ▪ Watch OSDI this year for papers
  • 10. Anatomy of the Facebook Cluster: Recent DFS Performance Numbers
    ▪ All DNs are on the same rack to isolate switch performance from the test
    ▪ 8 DNs, each with 2 map slots: hence performance levels off at 16 files
    ▪ Each mapper writes 1 GB/file; block size is 128 MB; replication factor is 3
    ▪ Uses Java 1.6

    Number of Files   0.15.4 (MB/s)   0.17.0 (MB/s)
          1                30              60
          2                25              53
          3                20              43
          5                18              33
          8                 9              27
         13                 8              18
         20                 9              17
         24                 8              18
         28                 8              16
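Reading those columns as per-file write throughput, 0.17.0 improves on 0.15.4 by roughly 2x at every point; a quick check of that claim:

```python
# Per-file DFS write throughput (MB/s) from the slide's table.
files   = [1, 2, 3, 5, 8, 13, 20, 24, 28]
v0_15_4 = [30, 25, 20, 18, 9, 8, 9, 8, 8]
v0_17_0 = [60, 53, 43, 33, 27, 18, 17, 18, 16]

# Ratio of 0.17.0 to 0.15.4 throughput at each concurrency level
speedup = [new / old for old, new in zip(v0_15_4, v0_17_0)]
print([round(s, 2) for s in speedup])
```

Every ratio lands between roughly 1.8x and 3x, so the "levels off at 16 files" shape holds for both versions while 0.17.0 shifts the whole curve up.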
  • 11. X-Trace + Hadoop: HDFS performance analysis
  • 12. Anatomy of the Facebook Cluster: Resource Management and Job Scheduling
    ▪ By far the most intensive cluster management responsibility
    ▪ At Facebook: manually set job priorities and kill jobs
    ▪ HOD (Hadoop on Demand)
      ▪ Integrates with the Torque resource manager
      ▪ Torque frequently paired with the Maui cluster scheduler
    ▪ Other options
      ▪ Sun Grid Engine
      ▪ Condor
      ▪ Platform LSF (commercial)
  • 13. Manual Job Scheduling Job Priorities and “Kill this Job” from JT Web Interface
  • 14. Anatomy of the Facebook Cluster: Recent Cluster Statistics
    ▪ From May 2nd to May 21st:
      ▪ Total jobs: 8,794
      ▪ Total map tasks: 1,362,429
      ▪ Total reduce tasks: 86,806
      ▪ Average duration of a successful job: 296 s
      ▪ Average duration of a successful map: 81 s
      ▪ Average duration of a successful reduce: 678 s
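A few figures derived from those totals, treating May 2-21 as roughly 20 days (the per-day and per-job averages below are computed, not stated on the slide):

```python
# Cluster totals from the slide, May 2nd - May 21st.
jobs, maps, reduces = 8794, 1362429, 86806
days = 20  # approximate span of the reporting window

jobs_per_day = jobs / days          # ~440 jobs per day
maps_per_job = maps / jobs          # ~155 map tasks per job
map_reduce_ratio = maps / reduces   # ~16 maps per reduce

print(round(jobs_per_day), round(maps_per_job), round(map_reduce_ratio, 1))
```

The heavy map-to-reduce skew fits the workload the deck describes: lots of scan-style jobs over large inputs, with comparatively few, long-running reduces (678 s average versus 81 s per map).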
  • 15. (c) 2008 Facebook, Inc. or its licensors. “Facebook” is a registered trademark of Facebook, Inc. All rights reserved. 1.0