HDFS - What's New and Future


Hadoop 2.0 offers significant HDFS improvements: a new append pipeline, federation, wire compatibility, NameNode HA, performance improvements, and more. We describe these features and their benefits. We also discuss development underway for the next HDFS release, including much-needed data management features such as Snapshots and Disaster Recovery. We add support for different classes of storage devices such as SSDs and open interfaces such as NFS; together these extend HDFS into a more general storage system. As with every release, we will continue to improve the performance, diagnosability and manageability of HDFS.




    HDFS - What's New and Future: Presentation Transcript

    • HDFS: What’s New and Future
      Suresh Srinivas, suresh@hortonworks.com, @suresh_m_s
      © Hortonworks Inc. 2013 Page 1
    • About Me
      • Architect & Founder at Hortonworks
      • Apache Hadoop committer and PMC member
      • More than 4.5 years working on HDFS
      Architecting the Future of Big Data Page 2 © Hortonworks Inc. 2013
    • Agenda
      • HDFS: what’s new
        – Federation
        – HA
        – Snapshots
        – Other features
      • Future
        – Major architectural directions
        – Short term and long term features
    • We have been hard at work…
      • Progress is being made in many areas
        – Scalability
        – Performance
        – Enterprise features
        – Ongoing operability improvements
        – Enhancements for other projects in the ecosystem
        – Expand Hadoop ecosystem to more platforms and use cases
      • 2192 commits in Hadoop in the last year
        – Almost a million lines of changes
        – ~150 contributors
        – Many new contributors: ~80 with fewer than 3 patches
      • 350K lines of changes in HDFS and Common
    • Building on a Rock-solid Foundation
      • Original design choices: simple and robust
        – Storage: rely on the OS’s file system rather than use raw disk
        – Storage fault tolerance: multiple replicas, active monitoring
        – Single Namenode master
      • Reliability
        – Over 7 9’s of data reliability
        – Less than 0.38 failures across 25 clusters
      • Operability
        – Small teams can manage large clusters
          • An operator per 3K node cluster
        – Fast time to repair on node or disk failure
          • Minutes to an hour vs. RAID array repairs taking many long hours
      • Scalable: proven by large scale deployments not bits
        – > 100 PB storage, > 400 million files, > 4500 nodes in a single cluster
        – > 70 K nodes of HDFS in deployment and use
    • Federation
      [Figure: federation architecture. Namespace layer: NN-1 … NN-k … NN-n with NS1 … NS k … Foreign NS n. Block storage layer: block pools Pool 1 … Pool k … Pool n over common storage on DN 1, DN 2 … DN m.]
      • Block Storage as generic storage service
        – DNs store blocks in Block Pools for all the Namespace Volumes
      • Multiple independent Namenodes and Namespace Volumes in a cluster
        – Scalability by adding more namenodes/namespaces
        – Isolation: separating applications to their own namespaces
        – Client side mount tables/ViewFS for integrated views
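The client side mount table mentioned on the Federation slide can be pictured as a longest-prefix lookup from a path to the namespace that owns it. The following is a minimal sketch of that idea only, not ViewFS code: the function `resolve`, the mount points, and the `hdfs://nn1` style targets are all hypothetical names chosen for illustration.

```python
from typing import Dict


def resolve(mount_table: Dict[str, str], path: str) -> str:
    """Map a client-visible path to a target URI via the longest matching mount point.

    Assumption: mount points are absolute paths without a trailing slash,
    and each maps to a target URI in some namespace (hypothetical values).
    """
    # A mount point matches if it equals the path or is a whole-component prefix of it.
    matches = [m for m in mount_table if path == m or path.startswith(m + "/")]
    if not matches:
        raise KeyError(f"no mount point covers {path}")
    mount = max(matches, key=len)  # the longest (most specific) prefix wins
    return mount_table[mount] + path[len(mount):]


# Hypothetical mount table: three namespaces presented as one integrated view.
table = {
    "/user": "hdfs://nn1/user",
    "/data": "hdfs://nn2/data",
    "/data/logs": "hdfs://nn3/logs",
}
print(resolve(table, "/user/alice/report.txt"))
print(resolve(table, "/data/logs/2013/app.log"))
```

Because the table lives in the client, adding a namenode in this sketch only means adding an entry; no namenode needs to know about the others.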
    • High Availability
      • Support standby namenode and failover
        – Planned downtime
        – Unplanned downtime
      • Release 1.1
        – Cold standby
        – Uses NFS as shared storage
        – Standard HA frameworks as failover controller
          • Linux HA and VMware vSphere
        – Suitable for small clusters up to 500 nodes
    • Hadoop Full Stack HA
      [Figure: slave nodes of the Hadoop cluster running jobs, plus apps running outside the cluster; on failover the JT goes into safemode; the NN and JT servers form an HA cluster for master daemons.]
    • High Availability: Release 2.0
      • Supports manual and automatic failover
      • Automatic failover with Failover Controller
        – Active NN election and failure detection using ZooKeeper
        – Periodic NN health check
        – Failover on NN failure
      • Removed shared storage dependency
        – Quorum Journal Manager
          • 3 to 5 Journal Nodes for storing the editlog
          • Each edit must be written to a quorum of Journal Nodes
      • Available in Release 2.0.3-alpha
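The quorum rule on this slide can be illustrated with a small sketch. It assumes that "quorum" means a simple majority of the Journal Nodes, and every name in it (`JournalNode`, `write_edit`, `quorum_size`) is hypothetical. The real Quorum Journal Manager writes in parallel and does considerably more bookkeeping, so treat this as a model of the rule only.

```python
from typing import List, Tuple


def quorum_size(n: int) -> int:
    """Smallest majority of n nodes (assumption: quorum = simple majority)."""
    return n // 2 + 1


class JournalNode:
    """Toy stand-in for a Journal Node that stores edit log entries in memory."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.log: List[Tuple[int, str]] = []

    def append(self, txid: int, edit: str) -> bool:
        # An unhealthy node never acknowledges the write.
        if not self.healthy:
            return False
        self.log.append((txid, edit))
        return True


def write_edit(nodes: List[JournalNode], txid: int, edit: str) -> bool:
    """Send the edit to every node; it is durable once a majority has acknowledged it."""
    acks = sum(1 for jn in nodes if jn.append(txid, edit))
    return acks >= quorum_size(len(nodes))


nodes = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3", healthy=False)]
print(write_edit(nodes, 1, "mkdir /tmp"))  # 2 of 3 acknowledge
```

Under the majority assumption, 3 Journal Nodes tolerate one failure and 5 tolerate two, which matches the "3 to 5" sizing on the slide.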
    • [Figure: Namenode HA architecture. Three ZooKeeper (ZK) nodes receive heartbeats from two FailoverControllers, one for the active NN and one for the standby NN. Each FailoverController monitors the health of its NN, OS and hardware, and sends commands to it. The active and standby NNs share state through a quorum of JournalNodes (JN). Datanodes send block reports to both active and standby. DN fencing: DNs only obey commands from the active NN.]
      • Namenode HA has no external dependency
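The "DN fencing" note in the figure says that Datanodes only obey commands from the active Namenode. A minimal sketch of that rule follows. The class and method names are hypothetical, and the sketch simply trusts the most recent state each Namenode reports; a real implementation also has to cope with two Namenodes that both claim to be active, which is not modelled here.

```python
from typing import List, Optional


class DataNode:
    """Toy Datanode that reports to two Namenodes but obeys only the active one."""

    def __init__(self) -> None:
        self.active_nn: Optional[str] = None
        self.executed: List[str] = []

    def on_heartbeat_response(self, nn_id: str, state: str) -> None:
        # Assumption: each heartbeat response carries the Namenode's HA state.
        if state == "active":
            self.active_nn = nn_id
        elif self.active_nn == nn_id:
            # The Namenode we believed active has stepped down.
            self.active_nn = None

    def on_command(self, nn_id: str, command: str) -> bool:
        # Fencing: drop commands from any Namenode that is not the current active.
        if nn_id != self.active_nn:
            return False
        self.executed.append(command)
        return True


dn = DataNode()
dn.on_heartbeat_response("nn1", "active")
dn.on_heartbeat_response("nn2", "standby")
print(dn.on_command("nn1", "replicate blk_1"))  # obeyed
print(dn.on_command("nn2", "delete blk_1"))     # ignored
```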
    • Snapshots (HDFS-2802)
      • Support for read-only copy-on-write (COW) snapshots
        – Design allows read-write snapshots
      • Namenode only operation: no data copy made
        – Metadata in namenode: no complicated distributed mechanism
        – Datanodes have no knowledge of snapshots
      • Snapshot entire namespace or sub directories
        – Nested snapshots allowed
        – Managed by Admin
          • Users can take snapshots of directories they own
      • Efficient
        – Instantaneous creation
        – Memory used is highly optimized
        – Does not affect regular HDFS operations
    • Snapshot Design
      [Figure: the current state followed by snapshots Sn, Sn-1, …, S0, with diffs ∆n, ∆n-1, …, ∆0 between them.]
      • Based on Persistent Data Structures
        – Maintains changes in the diff list at the Inodes
          • Tracks creation, deletion, and modification
        – Snapshot state Sn = current - ∆n
      • A large number of snapshots supported
        – State proportional to the changes between the snapshots
        – Supports millions of snapshots
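The relation Sn = current - ∆n can be made concrete with a toy diff list on a single directory. This is a sketch of the general idea under simplifying assumptions (one flat directory, file contents as strings). The class `SnapshottableDir` and its methods are hypothetical and are not the HDFS implementation.

```python
from typing import Dict, List, Tuple

_ABSENT = object()  # marks "this name did not exist when the snapshot was taken"


class SnapshottableDir:
    """Flat directory that keeps one diff per snapshot instead of a full copy."""

    def __init__(self) -> None:
        self.current: Dict[str, str] = {}
        # Oldest first; each diff maps a name to its value at snapshot time.
        self.diffs: List[Tuple[str, Dict[str, object]]] = []

    def create_snapshot(self, name: str) -> None:
        # Constant-time: nothing is copied, only an empty diff is opened.
        self.diffs.append((name, {}))

    def _record(self, name: str) -> None:
        # Remember the pre-change value once, in the newest snapshot's diff.
        if self.diffs:
            diff = self.diffs[-1][1]
            if name not in diff:
                diff[name] = self.current.get(name, _ABSENT)

    def write(self, name: str, content: str) -> None:
        self._record(name)
        self.current[name] = content

    def delete(self, name: str) -> None:
        self._record(name)
        self.current.pop(name, None)

    def read_snapshot(self, snapshot: str) -> Dict[str, str]:
        # Start from the current state and undo diffs, newest first: Sn = current - ∆n.
        state = dict(self.current)
        for snap_name, diff in reversed(self.diffs):
            for name, old in diff.items():
                if old is _ABSENT:
                    state.pop(name, None)
                else:
                    state[name] = old
            if snap_name == snapshot:
                return state
        raise KeyError(snapshot)


d = SnapshottableDir()
d.write("a.txt", "v1")
d.create_snapshot("s0")
d.write("a.txt", "v2")
d.write("b.txt", "new")
print(d.read_snapshot("s0"))
```

In this sketch the memory cost of a snapshot grows only with the names changed after it was taken, which is the property the slide describes as "state proportional to the changes between the snapshots".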
    • Snapshot: APIs and CLIs
      • All regular commands & APIs can be used with snapshot path
        – /<path>/.snapshot/<snapshot_name>/file.txt
      • CLIs
        – Allow snapshots
          • dfsadmin -allowSnapshot <dir>
          • dfsadmin -disallowSnapshot <dir>
        – Create/delete/rename snapshots
          • fs -createSnapshot <dir> [snapshot_name]
          • fs -deleteSnapshot <dir> <snapshot_name>
          • fs -renameSnapshot <dir> <old_name> <new_name>
        – Tool to print diff between snapshots
        – Admin tool to print all snapshottable directories and snapshots
      • Status
        – Work almost complete, ready to be integrated into trunk
        – Additional work needed for integration with Ambari
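Since a snapshot is addressed through an ordinary path, client code only needs to build the `/<path>/.snapshot/<snapshot_name>/...` form shown above. The helper below is a hypothetical convenience function, not part of any HDFS client library; the directory and snapshot names in the example are made up.

```python
import posixpath


def snapshot_path(snapshottable_dir: str, snapshot_name: str, relative: str = "") -> str:
    """Build the path of a file as it appeared in a given snapshot.

    snapshottable_dir: directory on which snapshots are allowed
    snapshot_name:     name given when the snapshot was created
    relative:          path of the file relative to snapshottable_dir
    """
    # Strip leading slashes so join() does not treat `relative` as absolute.
    return posixpath.join(
        snapshottable_dir, ".snapshot", snapshot_name, relative.lstrip("/")
    )


print(snapshot_path("/user/alice", "s20130301", "file.txt"))
```

The resulting string can then be passed to whatever regular read or list call the application already uses.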
    • Performance Improvements
      • Many improvements
        – SSE4.2 CRC32C: ~3x less CPU on read path
        – Read path improvements for fewer memory copies
        – Short-circuit read for 2-3x faster random reads (HBase workloads)
        – Unix domain socket based local reads (almost done)
          • Simpler to configure and generic for many applications
        – I/O improvements using posix_fadvise()
        – libhdfs improvements for zero copy reads
      • Significant improvements: I/O 2.5x to 5x faster
        – Many improvements backported to release 1.x
          • Available in Apache release 1.1 and HDP 1.1
    • Other Features
      • New append pipeline
      • Protobuf, wire compatibility
        – Post 2.0 GA stronger wire compatibility in Apache Hadoop and HDP releases
      • Rolling upgrades
        – With relaxed version checks
      • Improvements for other projects
        – Stale node detection to improve HBase MTTR
      • Block placement enhancements
        – Better support for other topologies such as VMs and Cloud
      • On the wire encryption
        – Both data and RPC
      • Support for NFS gateway
        – Work in progress, available soon
      • Expanding ecosystem, platforms and applicability
        – Native support for Windows
    • Enterprise Readiness
      • Storage fault-tolerance: built into HDFS
        – Over 7 9’s of data reliability
      • High Availability
      • Standard Interfaces
        – WebHDFS (REST) & HttpFS, FUSE, NFS, libwebhdfs and libhdfs
      • Wire protocol compatibility
      • Rolling upgrades
      • Snapshots
      • Disaster Recovery
        – DistCp for parallel and incremental copies across clusters
        – Apache Ambari and HDP for automated management
    • HDFS Futures
    • Storage Abstraction
      • Fundamental storage abstraction improvements
      • Short Term
        – Heterogeneous storage
          • Support SSDs and disks for different storage categories
          • Match storage to different access patterns
          • Disk/storage addressing/locality and status collection
        – Block level APIs for apps that don’t need file system interface
        – Granular block placement policies
      • Long Term
        – Explore support for objects/key value store and APIs
        – Serving from Datanodes optimized based on file structure
    • Higher Scalability
      • Even higher scalability of namespace
        – Only working set in Namenode memory
        – Namenode as container of namespaces
          • Support large number of namespaces
        – Explore new types of namespaces
      • Further scale the block storage
        – Block management to Datanodes
        – Block collection/Mega block group abstraction
    • High Availability
      • Further enhancements to HA
        – Expand Full stack HA to include other dependent services
        – Support multiple standby nodes
        – Use standby for reads
        – Simplify management: eliminate special daemons for journals
          • Move Namenode metadata to HDFS
    • Q&A
      • Myths and misinformation
        – Not reliable (was never true)
        – If the Namenode dies, all state is lost (was never true)
        – Hard to operate
        – Slow and not performant
        – Namenode is a single point of failure
        – Needs shared NFS storage
        – Does not have point in time recovery
        – Does not support disaster recovery
      Thank You!