HDFS Federation++

8,557 views

Published on

Hortonworks HDFS Federation and New Features presentation from Hadoop Summit 2011.

Published in: Technology, Business
0 Comments
16 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
8,557
On SlideShare
0
From Embeds
0
Number of Embeds
1,766
Actions
Shares
0
Downloads
256
Comments
0
Likes
16
Embeds 0
No embeds

No notes for slide
  • Improve random or short read performanceAllow reuse of connection (HDFS-941)Speedup DFS read pathPure Java CRC implementation (HADOOP-6148)28% improvement for 32 bit JVM and 58% improvement for 64 bit JVMCRC and other improvements (HDFS-2080)Significant speedup and CPU savings – work on going
  • HDFS Federation++

    1. 1. HDFS Federation andOther New Features<br />Suresh Srinivas<br />Founder & Architect – Hortonworks<br />Yahoo Hadoop Team (3 years)<br />@suresh_m_s<br />Sanjay Radia<br />Founder & Architect – Hortonworks<br />Yahoo Hadoop Team (4 years)<br />@srr<br />© Hortonworks Inc. 2011<br />
    2. 2. Agenda<br />HDFS Architecture Overview<br />Current Limitations<br />HDFS Federation<br />Other HDFS Features in Release 0.23<br />Futures<br />© Hortonworks Inc. 2011<br />2<br />
    3. 3. Two main layers<br />Namespace<br />Consists of dirs, files and blocks<br />Supports create, delete, modify and list files or dirs operations<br />Block Storage<br />Block Management<br />Datanode cluster membership<br />Supports create/delete/modify/get block location operations<br />Manages replication and replica placement<br />Storage - provides read and write access to blocks<br />Implemented as<br /><ul><li>Single Namespace Volume</li></ul>Namespace Volume = Namespace + Blocks<br /><ul><li>Single namenode with a namespace</li></ul>Entire namespace is in memory<br />Provides Block Management<br />Datanodes store block replicas<br />Block files stored on local file system on the nodes<br />HDFS Architecture<br />© Hortonworks Inc. 2011<br />3<br />NS<br />Namenode<br />Namespace<br />Block Management<br />Storage<br />Block Storage<br />Datanode<br />Datanode<br />…<br />
    4. 4. Limitation - Single Namenode<br />© Hortonworks Inc. 2011<br />4<br />Scalability<br /><ul><li>Storage scales horizontally - namespace doesn’t
    5. 5. Limited number of files, dirs and blocks
    6. 6. 250 million files and blocks at 64GB Namenode heap size</li></ul>Performance<br /><ul><li>File system operations limited by a single node</li></ul>Poor Isolation<br /><ul><li>All the tenants share a single namespace
    7. 7. Separate volume for tenants is not possible
    8. 8. Lacks separate namespace for different categories of applications
    9. 9. Experimental apps can affect production apps
    10. 10. Example - HBase could use its own namespace</li></li></ul><li>Namespace and Block Management are distinct services<br />Tightly coupled due to co-location<br />Scaling block management independent of namespace is simpler<br />Simplifies Namespace and scaling it<br />Block Storage could be a generic service<br />Namespace is one of the applications to use the service<br />Other services can be built directly on Block Storage<br />HBase<br />Foreign namespaces<br />Limitation – Tight coupling<br />© Hortonworks Inc. 2011<br />5<br />
    11. 11. © Hortonworks Inc. 2011<br />6<br />Isolation is a problem for even small clusters<br />
    12. 12. HDFS Federation<br />© Hortonworks Inc. 2011<br />7<br />...<br />...<br />NN-n<br />NN-k<br />NN-1<br />Foreign NS n<br />Namespace<br /> NS1<br /> NS k<br />...<br />...<br />Pool k<br />Pool n<br />Pool 1<br /> Block Pools<br />Balancer<br />Block Storage<br />Datanode 2<br />Datanode m<br />Datanode 1<br />...<br />Common Storage<br />Multiple independent Namenodes and Namespace Volumes in a cluster<br />Namespace Volume = Namespace + Block Pool<br />Block Storage as generic storage service<br />Set of blocks for a Namespace Volume is called a Block Pool<br />DNs store blocks for all the Namespace Volumes – no partitioning<br />
    13. 13. Key Ideas & Benefits<br />Distributed Namespace: Partitioned across namenodes<br />Simple and Robust due to independent masters<br />Each master serves a namespace volume<br />Preserves namenode stability - very little code change in the namenodes<br />Scalability – 6K nodes, 100K tasks, 120PB and 1 billion files<br />Block Pools enable generic storage service<br />Enables Namespace Volumes to be independent of each other<br />Fuels innovation and Rapid development<br />New implementations of file systems and Applications on top of block storage possible<br />New block pool categories – tmp storage, distributed cache, small object storage<br />In future, move Block Management out of namenode to separate set of nodes<br />Simplifies namespace/application implementation<br />Distributed namenode becomes significantly simpler<br />© Hortonworks Inc. 2011<br />8<br />Alternate NN Implementation<br />HBase<br />HDFS<br />Namespace<br />MR tmp<br />Storage Service<br />
    14. 14. HDFS Federation – First Phase Details<br />Simple design<br />Little change in the Namenode, most change in Datanode, Config and Tools<br />Core development in 4 months<br />Namespace and Block Management remain in Namenode<br />Block Management could be moved out of namenode in the future<br />Little impact on existing deployments<br />Single namenode configuration runs as is<br />Cluster level Web UI for better manageability<br />Provides cluster summary, namenode list and decommissioning status<br />Tools<br />Decommissioning, Balancer and other tools work with multiple namespaces<br />© Hortonworks Inc. 2011<br />9<br />
    15. 15. Managing Namespaces<br />Federation has multiple namespaces – Don’t you need a single global namespace?<br />Key is to share the data and the names used to access the data<br />A global namespace is one way to do that <br />Client-side mount table is another way to share.<br />Shared mount-table => “global” shared view<br />Personalized mount-table => per-application view<br />Share the data that matter by mounting it<br />Client-side implementation of mount tables<br />No single point of failure<br />No hotspot for root and top level directories<br />© Hortonworks Inc. 2011<br />10<br />Client-side mount-table<br />/<br />project<br />home<br />tmp<br />data<br />NS4<br />NS3<br />NS2<br />NS1<br />
    16. 16. HDFS I/O Improvements<br />Ongoing improvements for HBase (HDFS-1599)<br />Append, Flush and Sync improvements<br />Allow reuse of connections to datanodes (HDFS-941)<br />35-65% improvement for random reads<br />~40% improvement in latency as seen by HBase<br />Pure Java CRC implementation (HADOOP-6148)<br />~2x improvement<br />CRC and other improvements in progress (HDFS-2080)<br />2.5x performance improvement for CPU bound sequential reads<br />Improved locking for multithreaded readers (HDFS-1148)<br />15% improvement for HBase<br />Improvements in threading model of readers(HDFS-918/HDFS-1323)<br />10-15% improvement in reads<br />© Hortonworks Inc. 2011<br />11<br />
    17. 17. Other HDFS Features In Release 0.23<br />HDFS Raid (HDFS-503)<br />Lower replication for better storage efficiency using Erasure Codes<br />Edits cleanup (HDFS-1073)<br />Simpler model for fsimage and editlog<br />Write pipeline recovery (HDFS-1606)<br />Handles loss of datanode in pipeline for better data reliability<br />Improved decommissioning<br />Datanode Protocol Cleanup (HDFS-1937)<br />Towards cleaner separation of Block Storage<br />Metrics (HADOOP-6728)<br />Better metrics management<br />HTTP support<br />Read Write lock (0.22)<br />Better namenode operational throughput<br />© Hortonworks Inc. 2011<br />12<br />Other Features<br /><ul><li>Startup Time (HDFS-1443)
    18. 18. Startup time from 1 hour to 20 mins
    19. 19. Upgrade from 3 hours to 25 mins
    20. 20. Disk Fail In Place (HADOOP-7123)
    21. 21. Currently disk failure brings down a datanode
    22. 22. Loss of significant storage with high density datanode
    23. 23. Changes and best practices to make datanode resilient to disk failures</li></ul>Operational<br />
    24. 24. Futures<br />HA for Namenode (HDFS-1623)<br />Work in progress<br />Wire compatibility (HADOOP-7347)<br />Data transfer protocol cleanup and use Protocol Buffers (done)<br />Protocol Buffers for RPC<br />Continue I/Operformance improvements<br />Lot of work already done in 0.23<br />Partial namespace in memory<br />Further scale the namespace size supported by the namenode<br />More management and operability improvements<br />© Hortonworks Inc. 2011<br />13<br />
    25. 25. © Hortonworks Inc. 2011<br />Thank you.<br />14<br />

    ×