HDFS Issues
Look at things you should be aware of before you go live with an HDFS cluster


HDFS Issues Presentation Transcript

  • 1. HDFS Issues: Steve Loughran, Julio Guijarro, Paolo Castagna
  • 2. HDFS: Hadoop Distributed Filesystem
    • A filesystem designed to scale to tens of petabytes
    • High I/O bandwidth by moving the code near the data
    • Replication across machines
    • Commodity hardware: SATA disks, CPU blades and a gigabit LAN
    • Not POSIX: no random read/write files, no locks (see the sketch below)
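    A minimal sketch of that write-once model through the Java FileSystem API. The path and the idea of picking up the cluster address from core-site.xml are illustrative assumptions, not details from the slides.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class WriteOnceExample {
            public static void main(String[] args) throws Exception {
                // Picks up fs.default.name from the core-site.xml on the classpath.
                Configuration conf = new Configuration();
                FileSystem fs = FileSystem.get(conf);

                // Hypothetical path: the file is created, written and closed once.
                Path p = new Path("/tmp/hdfs-issues/demo.txt");
                FSDataOutputStream out = fs.create(p, true /* overwrite */);
                out.writeUTF("written once");
                out.close();

                // No random writes and (in 0.20) no reliable append: to change
                // the content you rewrite the whole file.
            }
        }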
  • 3. Locality is key for Hadoop performance
    • Where does data live?
    • Each site provides a shell script that maps hostnames/IPs to racks
    • Future improvement: inline JavaScript/regexp pattern?
    • Keep a simple IP -> rack/switch mapping (see the sketch below)
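    As an alternative to the shell script, Hadoop can also load a Java DNSToSwitchMapping implementation (plugged in through the 0.20-era topology.node.switch.mapping.impl property). The IP-to-rack scheme below is purely a hypothetical example of the "simple mapping" the slide suggests.

        import java.util.ArrayList;
        import java.util.List;
        import org.apache.hadoop.net.DNSToSwitchMapping;

        // Hypothetical scheme: addresses 10.1.<rack>.<host> map to /rack<rack>.
        public class SimpleRackMapping implements DNSToSwitchMapping {
            public List<String> resolve(List<String> names) {
                List<String> racks = new ArrayList<String>(names.size());
                for (String name : names) {
                    String[] octets = name.split("\\.");
                    if (octets.length == 4) {
                        racks.add("/rack" + octets[2]);   // third octet picks the rack
                    } else {
                        racks.add("/default-rack");       // unknown names share one rack
                    }
                }
                return racks;
            }
        }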
  • 4. Append
    • "critical" for HBase performance
    • Not yet stable or reliable (HADOOP-5332)
    • Disabled in Hadoop 0.20; can be turned on with dfs.support.append=true (see the sketch below)
    • Stable in 0.21? Let Y! find out first.
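    A sketch of what turning the flag on looks like from a Java client, under the assumption that dfs.support.append is also enabled in the cluster's hdfs-site.xml (setting it only on the client is not enough). The path is hypothetical, and per the slide the feature itself is not yet trustworthy.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class AppendExample {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Off by default in 0.20 because it is not considered stable (HADOOP-5332);
                // normally set in hdfs-site.xml on the cluster, not just here.
                conf.setBoolean("dfs.support.append", true);

                FileSystem fs = FileSystem.get(conf);
                Path log = new Path("/tmp/hdfs-issues/append-demo.log"); // hypothetical path
                FSDataOutputStream out = fs.append(log);  // throws IOException if append is disabled
                out.writeUTF("one more record");
                out.close();
            }
        }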
  • 5. Data Loss
    • HADOOP-4810 Data lost at cluster startup time
    • HADOOP-4702 Failure to clean up failed copies added invalid tmp blocks
    • HADOOP-4663 Bad import of data in tmp files added invalid tmp blocks to the filesystem
    • => you need a good backup strategy
  • 6. Handling of full disks
    • Joost: a namenode disk overflow corrupted the edit log, leading to a crash on every cluster restart
    • HADOOP-3574 Better Datanode DiskOutOfSpaceException handling.
    • An ongoing problem on the mailing lists
    • No good tests yet
    • ==> don't let the namenode disks fill up (see the monitoring sketch below)
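    The slides don't prescribe a mechanism; one minimal, hypothetical precaution is a standalone check of free space under the namenode's metadata directory (dfs.name.dir) that cron or monitoring can run. The default path and the 1 GB threshold below are assumptions made for this sketch.

        import java.io.File;

        // Hypothetical watchdog: warn well before the namenode metadata volume fills up.
        public class NameDirSpaceCheck {
            public static void main(String[] args) {
                File nameDir = new File(args.length > 0 ? args[0] : "/var/hadoop/dfs/name");
                long freeMB = nameDir.getUsableSpace() / (1024 * 1024);
                long thresholdMB = 1024;  // arbitrary 1 GB floor for this sketch
                if (freeMB < thresholdMB) {
                    System.err.println("WARNING: only " + freeMB + " MB free under " + nameDir);
                    System.exit(1);  // non-zero exit so cron/monitoring can raise an alert
                }
                System.out.println(freeMB + " MB free under " + nameDir);
            }
        }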
  • 7. Under-replication / bad handling of corrupt data
    • HADOOP-4543, HADOOP-3314: inadequate detection of truncated/incorrectly sized blocks (they could be picked up on startup; otherwise only the (slower) checksum scanner will find them eventually)
    • HADOOP-5133: when the block lengths are all inconsistent, which one do you choose?
    • ==> use a minimum replication of 3 and stay close to the Yahoo! configuration (see the sketch below)
    • Better handling of missing blocks?
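    A sketch of setting replication from a Java client; the path is hypothetical, and the cluster-wide default normally lives in hdfs-site.xml as dfs.replication. Running hadoop fsck / periodically is a common way to spot under-replicated or corrupt blocks.

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class ReplicationExample {
            public static void main(String[] args) throws Exception {
                Configuration conf = new Configuration();
                // Affects only files created through this client; the cluster-wide
                // default is dfs.replication in hdfs-site.xml.
                conf.setInt("dfs.replication", 3);

                FileSystem fs = FileSystem.get(conf);
                // Raise replication on an existing (hypothetical) path after the fact.
                fs.setReplication(new Path("/data/important"), (short) 3);
            }
        }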
  • 8. Limits to the scale of the namenode
    • Everything (the whole namespace) is held in memory (see the arithmetic sketch below)
    • Y! run 32+GB machines and a big blocksize
    • Run a secondary namenode for faster restart times
    • Secondary namenode memory should be the same as that of the primary.
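    A back-of-envelope illustration of why block size matters for namenode heap. The cluster size and the per-block memory cost are assumptions made up for this sketch, not figures from the slides; real costs vary by Hadoop version.

        // Illustrative arithmetic only; the 150-byte-per-block figure is an assumption.
        public class NamenodeMemoryEstimate {
            public static void main(String[] args) {
                long petabyte = 1024L * 1024 * 1024 * 1024 * 1024;
                long dataStored = 10 * petabyte;          // hypothetical cluster size
                long blockSize = 128L * 1024 * 1024;      // a "big blocksize" of 128 MB
                long bytesPerBlockObject = 150;           // assumed in-memory cost per block

                long blocks = dataStored / blockSize;
                long heapMB = blocks * bytesPerBlockObject / (1024 * 1024);
                System.out.println(blocks + " blocks -> roughly " + heapMB
                        + " MB of namenode heap just for block objects");
                // Halving the block size doubles both numbers, which is one reason
                // to run large heaps and a large block size.
            }
        }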
  • 9. Failover handling
    • The secondary namenode is not a failover server; it is a log server
    • You need to restart the primary namenode and replay its actions
    • Dynamic hosts? All JVMs (currently) cache the namenode's DNS entries (see the sketch below)
    • Worker nodes don't reload their configuration while waiting for masters to come back up
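    One JVM-level mitigation for the DNS caching point: bound the positive and negative lookup caches before the Hadoop client makes its first connection. The TTL values are arbitrary choices for this sketch, and this only addresses the JVM cache, not addresses Hadoop itself has already resolved and stored.

        import java.security.Security;

        public class DnsCacheTtl {
            public static void main(String[] args) throws Exception {
                // With a security manager the JVM caches successful lookups forever by
                // default; a bounded TTL lets clients find a namenode that comes back
                // on a new address. Must run before the first lookup happens.
                Security.setProperty("networkaddress.cache.ttl", "60");
                Security.setProperty("networkaddress.cache.negative.ttl", "10");

                // ... create the Hadoop Configuration / FileSystem after this point ...
            }
        }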
  • 10. Rate of change of filesystem/APIs
    • One-way upgrades 3x a year?
    • Y! run the 0.x.1 releases live, though they skipped 0.19 entirely
    • Most people are on 0.18.3; 0.20.1 looks good (with append disabled)
    • Roll back via distcp to another cluster
  • 11. Security: none
    • User identification was added after last.fm deleted a filesystem by accident.
    • The caller provides a name, which is taken on trust
    • Working towards running MR jobs with restricted user rights
    • There is no security, just defence against accidents
  • 12. Is HDFS ready for production? Maybe, but with care
  • 13. 7 May 2009