AHUG Presentation: Fun with Hadoop File Systems

The presentation given by Brad Childs on April 25, 2013, at the Austin Hadoop Users Group (AHUG): Fun with Hadoop File Systems

Presentation Transcript

    • FUN WITH HADOOP FILE SYSTEMS
      © Bradley Childs / bdc@redhat.com
    • HISTORY
      •  Distributed file systems have been around for a long time
      •  DFS implementations battle to optimize within the CAP theorem
      •  Hadoop's DFS implementation is called HDFS
      •  With the wide adoption of Hadoop, users have been forced to use HDFS as the only alternative
      •  HDFS has technical trade-offs and limitations
    • HDFS ARCHITECTURE
      [Architecture diagram: clients, a single NameNode, and multiple DataNodes; Store & Compute]
    • HDFS ISSUES
      Handy
      •  Locking around metadata operations permitted by the single name node
      •  File locking permitted by the single name node
      Frustrating
      •  Difficult to get data in and out (ingest)
      •  Name Node is a single point of failure
      •  Name Node is a system bottleneck
    • GLUSTER FILE SYSTEM
      Gluster is an open source multi purpose DFS
      Features:
      •  Data Striping
      •  Global elastic hashing for file placement
      •  Basic and GEO Replication
      •  Full POSIX Compliant Interface
      •  Flexible architecture
      •  Supports Storage Resident Apps: Compute and Data on the same machine
      More Info: www.gluster.org
    • GLUSTER ARCHITECTURE
      [Architecture diagram: clients, trusted peers, data bricks grouped into volumes; Store & Compute]
    • HCFS
      HCFS: Hadoop Compatible File System
      •  Implementing the o.a.h.fs.FileSystem interface is not enough for existing Hadoop jobs to run on a different file system
      •  The HDFS architecture created semantics and assumptions
      •  HCFS defines these semantics so any file system can replace HDFS without fear of compatibility problems
      •  Open, ongoing effort to define file system semantics decoupled from architecture
      JIRA: issues.apache.org/jira/browse/HADOOP-9371
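      A minimal, hypothetical sketch of what implementing the o.a.h.fs.FileSystem interface looks like in Java; the class name and "myfs" scheme are invented for illustration, and the class is left abstract because only a few methods are shown. A real HCFS implementation must also honour the semantics on the following slides, not just compile against this interface.

        import java.io.IOException;
        import java.net.URI;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        // Hypothetical skeleton of a Hadoop-compatible file system ("myfs://").
        public abstract class MyFileSystem extends FileSystem {

          private URI uri;

          @Override
          public void initialize(URI name, Configuration conf) throws IOException {
            super.initialize(name, conf);  // keep the base class bookkeeping
            this.uri = name;               // e.g. myfs://host/
            // connect to the backing store here
          }

          @Override
          public URI getUri() {
            return uri;                    // scheme + authority this instance serves
          }

          @Override
          public FSDataInputStream open(Path f, int bufferSize) throws IOException {
            // Return a stream over the backing store. Failures MUST surface as
            // IOException, never RuntimeException (see the network failure slide).
            throw new IOException("open() not implemented in this sketch: " + f);
          }

          // The remaining abstract methods (create(), append(), rename(), delete(),
          // listStatus(), mkdirs(), getFileStatus(), get/setWorkingDirectory())
          // must be filled in the same way before Hadoop jobs can run against it.
        }

      Such a class is conventionally wired in through a fs.<scheme>.impl entry in the Hadoop configuration so that myfs:// paths resolve to it; that registration step is the usual mechanism, not something stated on the slide.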
    • COMMON FILE SYSTEM ATTRIBUTES
      •  Hierarchical structure of directories containing directories and files
      •  Files contain between 0 and MAX_SIZE bytes of data
      •  Directories contain 0 or more files or directories
      •  Directories have no data, only child elements
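      A short sketch of how these attributes surface through the stock FileSystem API: directories yield children via listStatus(), files report a length. The starting path /tmp/demo is illustrative only.

        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FileStatus;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class WalkTree {
          public static void main(String[] args) throws IOException {
            // Whatever file system fs.defaultFS points at (HDFS, Gluster, local, ...).
            FileSystem fs = FileSystem.get(new Configuration());
            walk(fs, new Path("/tmp/demo"), "");
          }

          static void walk(FileSystem fs, Path dir, String indent) throws IOException {
            for (FileStatus child : fs.listStatus(dir)) {
              if (child.isDirectory()) {
                // Directories carry no data, only child elements.
                System.out.println(indent + child.getPath().getName() + "/");
                walk(fs, child.getPath(), indent + "  ");
              } else {
                // Files carry between 0 and MAX_SIZE bytes of data.
                System.out.println(indent + child.getPath().getName()
                    + " (" + child.getLen() + " bytes)");
              }
            }
          }
        }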
    • NETWORK ASSUMPTIONS
      •  The final state of a file system after a network failure is undefined
      •  The immediate consistency state of a file system after a network failure is undefined
      •  If a network failure can be reported to the client, the failure MUST be an instance of IOException
    • NETWORK FAILURE
      •  Any operation with a file system MAY signal an error by throwing an instance of IOException
      •  File system operations MUST NOT throw RuntimeException exceptions on the failure of remote operations, authentication, or other operational problems
      •  Stream read operations MAY fail if the read channel has been idle for a file system specific period of time
      •  Stream write operations MAY fail if the write channel has been idle for a file system specific period of time
      •  Network failures MAY be raised in the Stream close() operation
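      A hedged sketch of coding against that contract: every remote failure that reaches the client, including one raised by close(), arrives as an IOException. The output path below is illustrative.

        import java.io.IOException;
        import java.nio.charset.StandardCharsets;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class WriteWithFailureHandling {
          public static void main(String[] args) {
            Path out = new Path("/tmp/demo/part-00000");
            try {
              FileSystem fs = FileSystem.get(new Configuration());
              // close() itself may raise the network failure, so the
              // try-with-resources block sits inside the same catch.
              try (FSDataOutputStream stream = fs.create(out, true /* overwrite */)) {
                stream.write("hello, hcfs".getBytes(StandardCharsets.UTF_8));
              }
            } catch (IOException e) {
              // The only exception type a well-behaved file system should use
              // for remote, authentication, or other operational problems.
              System.err.println("write failed: " + e);
            }
          }
        }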
    • ATOMICITY
      •  Rename of a file MUST be atomic
      •  Rename of a directory SHOULD be atomic
      •  Delete of a file MUST be atomic
      •  Delete of an empty directory MUST be atomic
      •  Recursive directory deletion MAY be atomic. Although HDFS offers atomic recursive directory deletion, none of the other file systems that Hadoop supports offers such a guarantee, including the local file systems
      •  mkdir() SHOULD be atomic
      •  mkdirs() MAY be atomic. [It is currently atomic on HDFS, but this is not the case for most other file systems, and cannot be guaranteed for future versions of HDFS]
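      Because single-file rename MUST be atomic, the usual commit idiom is to write to a temporary path and rename it into place, so readers see either nothing or the whole file. This is a sketch of that pattern with made-up paths, not the committer Hadoop itself ships.

        import java.io.IOException;
        import java.nio.charset.StandardCharsets;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class AtomicCommit {
          public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path tmp  = new Path("/tmp/demo/_tmp.result");
            Path done = new Path("/tmp/demo/result");

            try (FSDataOutputStream out = fs.create(tmp, true)) {
              out.write("all of the output".getBytes(StandardCharsets.UTF_8));
            }

            // One atomic step: afterwards "result" either holds the complete
            // contents or does not exist. rename() returns false if it could
            // not be carried out (e.g. the destination already exists).
            if (!fs.rename(tmp, done)) {
              throw new IOException("could not commit " + tmp + " to " + done);
            }
          }
        }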
    • CONCURRENCY
      •  The data added to a file during a write or append MAY be visible while the write operation is in progress
      •  If a client opens a file for a read() operation while another read() operation is in progress, the second operation MUST succeed. Both clients MUST have a consistent view of the same data
      •  If a file is deleted while a read() operation is in progress, the read() operation MAY complete successfully. Implementations MAY cause read() operations to fail with an IOException instead
      •  Multiple writers MAY open a file for writing. If this occurs, the outcome is undefined
      •  Undefined: the action of delete() while a write or append operation is in progress
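      A small sketch of the multiple-reader rule: a second open() while another read is in progress must succeed, and both readers must see the same bytes. The path is illustrative.

        import java.io.IOException;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class TwoReaders {
          public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/tmp/demo/result");

            try (FSDataInputStream first = fs.open(p);
                 FSDataInputStream second = fs.open(p)) {  // second open MUST succeed
              int a = first.read();
              int b = second.read();
              // Consistent view: both clients see the same first byte.
              System.out.println("consistent: " + (a == b));
            }
          }
        }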
    • CONSISTENCY
      The consistency model of a Hadoop file system is one-copy-update-semantics; generally that of a traditional POSIX file system.
      •  Create: once the close() operation on an output stream writing a newly created file has completed, in-cluster operations querying the file metadata and contents MUST immediately see the file and its data
      •  Update: once the close() operation on an output stream writing to an existing file has completed, in-cluster operations querying the file metadata and contents MUST immediately see the new data
      •  Delete: once a delete() operation on a file has completed, listStatus(), open(), rename() and append() operations MUST fail
      •  When a file is deleted and then overwritten, listStatus(), open(), rename() and append() operations MUST succeed: the file is visible
      •  Rename: after a rename has completed, operations against the new path MUST succeed; operations against the old path MUST fail
      •  The consistency semantics seen by out-of-cluster clients MUST be the same as for in-cluster clients: all clients calling read() on a closed file MUST see the same metadata and data until it is changed by a create(), append(), rename() or delete() operation
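      A sketch of the create-consistency rule in code: once close() on the newly created file's output stream returns, an immediate getFileStatus()/open() must see the file and all of its data. Path and contents are illustrative.

        import java.io.IOException;
        import java.nio.charset.StandardCharsets;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataInputStream;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;

        public class CreateThenRead {
          public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/tmp/demo/consistency-check");
            byte[] payload = "visible right after close()".getBytes(StandardCharsets.UTF_8);

            try (FSDataOutputStream out = fs.create(p, true)) {
              out.write(payload);
            } // close() completes here

            // Metadata and data MUST now be visible to any client.
            long len = fs.getFileStatus(p).getLen();
            byte[] back = new byte[(int) len];
            try (FSDataInputStream in = fs.open(p)) {
              in.readFully(0, back);
            }
            System.out.println("length matches: " + (len == payload.length));
          }
        }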
    • REFERENCES
      Apache HCFS Wiki: wiki.apache.org/hadoop/HCFS
      Apache file system semantics JIRA: issues.apache.org/jira/browse/HADOOP-9371
      Some of this text is taken from the working draft linked in the above JIRA; credit Steve Loughran et al.
      The opinions expressed do not necessarily represent those of Red Hat Inc. or any of its affiliates.
      © Bradley Childs / bdc@redhat.com