HDFS Federation++

Hortonworks HDFS Federation and New Features presentation from Hadoop Summit 2011.

Published in: Technology, Business

  • Notes
      • Improve random or short read performance
          • Allow reuse of connection (HDFS-941)
      • Speed up DFS read path
          • Pure Java CRC implementation (HADOOP-6148): 28% improvement for 32-bit JVM and 58% improvement for 64-bit JVM
          • CRC and other improvements (HDFS-2080): significant speedup and CPU savings, work ongoing
  • Transcript

    • 1. HDFS Federation and Other New Features
        Suresh Srinivas, Founder & Architect – Hortonworks, Yahoo Hadoop Team (3 years), @suresh_m_s
        Sanjay Radia, Founder & Architect – Hortonworks, Yahoo Hadoop Team (4 years), @srr
        © Hortonworks Inc. 2011
    • 2. Agenda
        • HDFS Architecture Overview
        • Current Limitations
        • HDFS Federation
        • Other HDFS Features in Release 0.23
        • Futures
    • 3. HDFS Architecture
        Two main layers
        • Namespace
            • Consists of dirs, files and blocks
            • Supports create, delete, modify and list files or dirs operations
        • Block Storage
            • Block Management
                • Datanode cluster membership
                • Supports create/delete/modify/get block location operations
                • Manages replication and replica placement
            • Storage - provides read and write access to blocks
        Implemented as
        • Single Namespace Volume
            • Namespace Volume = Namespace + Blocks
        • Single namenode with a namespace
            • Entire namespace is in memory
            • Provides Block Management
        • Datanodes store block replicas
            • Block files stored on local file system on the nodes
        [Diagram: a Namenode holding the Namespace (NS) and Block Management, above a Block Storage layer of Datanodes providing Storage]
    • 4. Limitation - Single Namenode
        Scalability
        • Storage scales horizontally - namespace doesn’t
        • Limited number of files, dirs and blocks
        • 250 million files and blocks at 64GB Namenode heap size
        Performance
        • File system operations limited by a single node
        Poor Isolation
        • All the tenants share a single namespace
        • Separate volume for tenants is not possible
        • Lacks separate namespace for different categories of applications
        • Experimental apps can affect production apps
        • Example - HBase could use its own namespace
    • 5. Limitation – Tight coupling
        • Namespace and Block Management are distinct services
            • Tightly coupled due to co-location
            • Scaling block management independent of namespace is simpler
            • Simplifies Namespace and scaling it
        • Block Storage could be a generic service
            • Namespace is one of the applications to use the service
            • Other services can be built directly on Block Storage
                • HBase
                • Foreign namespaces
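
The single-Namenode scalability figure (250 million files and blocks at a 64GB heap) implies a rough per-object memory cost. The sketch below only does that arithmetic; reading "64GB" as 64 GiB and scaling linearly to other heap sizes are assumptions, and `objects_for_heap` is a hypothetical helper, not part of Hadoop.

```python
# Back-of-the-envelope estimate of Namenode heap used per namespace object.
# Inputs are the two figures from the slide; treating 64GB as 64 GiB is an assumption.
HEAP_BYTES = 64 * 1024**3      # 64 GB Namenode heap
OBJECTS = 250_000_000          # files and blocks held in memory

bytes_per_object = HEAP_BYTES / OBJECTS


def objects_for_heap(heap_gib):
    """Scale the slide's ratio linearly to another heap size (a simplification)."""
    return round(heap_gib * 1024**3 / bytes_per_object)


print(f"{bytes_per_object:.0f} bytes per object")
print(f"{objects_for_heap(128):,} objects at 128 GiB")
```

On these assumptions each file or block costs roughly 275 bytes of heap, which is why the namespace, unlike storage, cannot grow simply by adding nodes.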
    • 6. Isolation is a problem for even small clusters
    • 7. HDFS Federation
        [Diagram: Namenodes NN-1 ... NN-k ... NN-n serve namespaces NS1 ... NS k ... Foreign NS n; each namespace has its own block pool (Pool 1 ... Pool k ... Pool n); Datanode 1, Datanode 2 ... Datanode m provide common block storage, alongside a Balancer]
        • Multiple independent Namenodes and Namespace Volumes in a cluster
            • Namespace Volume = Namespace + Block Pool
        • Block Storage as generic storage service
            • Set of blocks for a Namespace Volume is called a Block Pool
            • DNs store blocks for all the Namespace Volumes – no partitioning
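
The relationship between namenodes, block pools and datanodes described on the federation slide can be sketched as a toy model. This is not Hadoop code: the `Datanode` and `Namenode` classes, the pool names and the block ids are all hypothetical, chosen only to show that every datanode holds blocks for every pool while each namenode sees only its own.

```python
from collections import defaultdict


class Datanode:
    """Toy datanode that stores block replicas for every block pool (no partitioning)."""

    def __init__(self, name):
        self.name = name
        self.blocks = defaultdict(set)  # block pool id -> set of block ids

    def store(self, pool_id, block_id):
        self.blocks[pool_id].add(block_id)

    def block_report(self, pool_id):
        # A namenode is only told about the blocks in its own pool.
        return sorted(self.blocks[pool_id])


class Namenode:
    """Toy namenode owning one namespace volume = namespace + block pool."""

    def __init__(self, pool_id):
        self.pool_id = pool_id
        self.namespace = {}  # path -> list of block ids

    def create(self, path, block_ids, datanodes):
        self.namespace[path] = list(block_ids)
        for dn in datanodes:
            for block_id in block_ids:
                dn.store(self.pool_id, block_id)


datanodes = [Datanode("dn1"), Datanode("dn2")]
nn1, nn2 = Namenode("pool-1"), Namenode("pool-2")
nn1.create("/data/a", ["blk_1", "blk_2"], datanodes)
nn2.create("/tmp/b", ["blk_1"], datanodes)  # same block id in another pool does not clash
```

Keying blocks by pool is what lets the namenodes stay independent of each other in this model: neither needs to coordinate block ids or namespace contents with the other.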
    • 8. Key Ideas & Benefits
        • Distributed Namespace: Partitioned across namenodes
            • Simple and Robust due to independent masters
                • Each master serves a namespace volume
                • Preserves namenode stability - very little code change in the namenodes
            • Scalability – 6K nodes, 100K tasks, 120PB and 1 billion files
        • Block Pools enable generic storage service
            • Enables Namespace Volumes to be independent of each other
            • Fuels innovation and Rapid development
                • New implementations of file systems and Applications on top of block storage possible
                • New block pool categories – tmp storage, distributed cache, small object storage
        • In future, move Block Management out of namenode to separate set of nodes
            • Simplifies namespace/application implementation
                • Distributed namenode becomes significantly simpler
        [Diagram: HDFS Namespace, Alternate NN Implementation, HBase and MR tmp all sitting on top of a shared Storage Service]
    • 9. HDFS Federation – First Phase Details
        • Simple design
            • Little change in the Namenode, most change in Datanode, Config and Tools
            • Core development in 4 months
            • Namespace and Block Management remain in Namenode
                • Block Management could be moved out of namenode in the future
        • Little impact on existing deployments
            • Single namenode configuration runs as is
        • Cluster level Web UI for better manageability
            • Provides cluster summary, namenode list and decommissioning status
        • Tools
            • Decommissioning, Balancer and other tools work with multiple namespaces
    • 10. Managing Namespaces
        • Federation has multiple namespaces – Don’t you need a single global namespace?
            • Key is to share the data and the names used to access the data
        • A global namespace is one way to do that
        • Client-side mount table is another way to share
            • Shared mount-table => “global” shared view
            • Personalized mount-table => per-application view
                • Share the data that matter by mounting it
        • Client-side implementation of mount tables
            • No single point of failure
            • No hotspot for root and top level directories
        [Diagram: client-side mount-table with root / and entries project, home, tmp and data, backed by namespaces NS1 to NS4]
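
The client-side mount table idea can be sketched as a longest-prefix lookup performed entirely on the client. This is a minimal illustration rather than the Hadoop implementation: `MOUNT_TABLE` and `resolve` are hypothetical names, and the pairing of each directory with a particular namespace is assumed, since the slide diagram does not make it explicit.

```python
# Hypothetical mount table: which namespace backs which top-level directory.
# The directory-to-namespace pairing is assumed for illustration.
MOUNT_TABLE = {
    "/data": "NS1",
    "/tmp": "NS2",
    "/home": "NS3",
    "/project": "NS4",
}


def resolve(path):
    """Return (namespace, path inside that namespace) using longest-prefix match."""
    best = None
    for mount in MOUNT_TABLE:
        # Match whole path components only, so "/tmpfoo" does not hit "/tmp".
        if path == mount or path.startswith(mount + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    remainder = path[len(best):] or "/"
    return MOUNT_TABLE[best], remainder


namespace, inner_path = resolve("/home/alice/report.txt")
```

Because the lookup runs in each client, no server sits in front of the root directory, which matches the slide's points about no single point of failure and no hotspot. Giving different clients different tables yields the personalized, per-application view.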
    • 11. HDFS I/O Improvements
        • Ongoing improvements for HBase (HDFS-1599)
            • Append, Flush and Sync improvements
        • Allow reuse of connections to datanodes (HDFS-941)
            • 35-65% improvement for random reads
            • ~40% improvement in latency as seen by HBase
        • Pure Java CRC implementation (HADOOP-6148)
            • ~2x improvement
        • CRC and other improvements in progress (HDFS-2080)
            • 2.5x performance improvement for CPU bound sequential reads
        • Improved locking for multithreaded readers (HDFS-1148)
            • 15% improvement for HBase
        • Improvements in threading model of readers (HDFS-918/HDFS-1323)
            • 10-15% improvement in reads
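
Several of the I/O items concern checksum cost on the read path. The sketch below shows the general pattern of per-chunk CRC verification and why it is CPU work that scales with bytes read. It is an illustration only: Python's `zlib.crc32` stands in for the Java implementation, the 512-byte chunk size is an assumption, and both function names are hypothetical.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size for this sketch


def chunk_checksums(data, chunk=BYTES_PER_CHECKSUM):
    """Compute one CRC32 per fixed-size chunk of the data."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def first_corrupt_chunk(data, expected, chunk=BYTES_PER_CHECKSUM):
    """Return the index of the first chunk whose CRC differs, or None if all match."""
    for idx, (got, want) in enumerate(zip(chunk_checksums(data, chunk), expected)):
        if got != want:
            return idx
    return None


original = bytes(range(256)) * 8           # 2048 bytes, i.e. four chunks
stored = chunk_checksums(original)         # checksums written alongside the data
damaged = bytearray(original)
damaged[600] ^= 0xFF                       # flip one byte inside the second chunk
```

Every byte read passes through the CRC routine, so a faster CRC translates directly into the sequential-read gains the slide reports for CPU-bound workloads.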
    • 12. Other HDFS Features In Release 0.23
        Other Features
        • HDFS Raid (HDFS-503)
            • Lower replication for better storage efficiency using Erasure Codes
        • Edits cleanup (HDFS-1073)
            • Simpler model for fsimage and editlog
        • Write pipeline recovery (HDFS-1606)
            • Handles loss of datanode in pipeline for better data reliability
        • Improved decommissioning
        • Datanode Protocol Cleanup (HDFS-1937)
            • Towards cleaner separation of Block Storage
        • Metrics (HADOOP-6728)
            • Better metrics management
        • HTTP support
        • Read Write lock (0.22)
            • Better namenode operational throughput
        Operational
        • Startup Time (HDFS-1443)
            • Startup time from 1 hour to 20 mins
            • Upgrade from 3 hours to 25 mins
        • Disk Fail In Place (HADOOP-7123)
            • Currently disk failure brings down a datanode
            • Loss of significant storage with high density datanode
            • Changes and best practices to make datanode resilient to disk failures
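
The HDFS Raid item (HDFS-503) trades replication for erasure codes. The simplest erasure code is a single XOR parity block, sketched below to show how a lost block is rebuilt from the survivors. This is a toy illustration, not the HDFS Raid implementation; `xor_parity`, `recover` and the stripe contents are hypothetical, and single parity tolerates only one lost block per stripe.

```python
def xor_parity(blocks):
    """XOR equal-length blocks together into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover(surviving_blocks, parity):
    """Rebuild the single missing block from the surviving blocks plus parity."""
    return xor_parity(list(surviving_blocks) + [parity])


stripe = [b"aaaa", b"bbbb", b"cccc"]        # three data blocks in one stripe
parity = xor_parity(stripe)                 # one extra block protects the stripe
rebuilt = recover([stripe[0], stripe[2]], parity)  # pretend the middle block was lost
```

In this toy stripe, three data blocks plus one parity block occupy four blocks of storage, whereas keeping three full replicas of each data block would occupy nine.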
    • 13. Futures
        • HA for Namenode (HDFS-1623)
            • Work in progress
        • Wire compatibility (HADOOP-7347)
            • Data transfer protocol cleanup and use Protocol Buffers (done)
            • Protocol Buffers for RPC
        • Continue I/O performance improvements
            • Lot of work already done in 0.23
        • Partial namespace in memory
            • Further scale the namespace size supported by the namenode
        • More management and operability improvements
    • 14. Thank you.
