Your SlideShare is downloading. ×
HDFS Federation++
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

HDFS Federation++

7,282

Published on

Hortonworks HDFS Federation and New Features presentation from Hadoop Summit 2011.

Hortonworks HDFS Federation and New Features presentation from Hadoop Summit 2011.

Published in: Technology, Business
0 Comments
13 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
7,282
On Slideshare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
233
Comments
0
Likes
13
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • Improve random or short read performanceAllow reuse of connection (HDFS-941)Speedup DFS read pathPure Java CRC implementation (HADOOP-6148)28% improvement for 32 bit JVM and 58% improvement for 64 bit JVMCRC and other improvements (HDFS-2080)Significant speedup and CPU savings – work on going
  • Transcript

    • 1. HDFS Federation andOther New Features
      Suresh Srinivas
      Founder & Architect – Hortonworks
      Yahoo Hadoop Team (3 years)
      @suresh_m_s
      Sanjay Radia
      Founder & Architect – Hortonworks
      Yahoo Hadoop Team (4 years)
      @srr
      © Hortonworks Inc. 2011
    • 2. Agenda
      HDFS Architecture Overview
      Current Limitations
      HDFS Federation
      Other HDFS Features in Release 0.23
      Futures
      © Hortonworks Inc. 2011
      2
    • 3. Two main layers
      Namespace
      Consists of dirs, files and blocks
      Supports create, delete, modify and list files or dirs operations
      Block Storage
      Block Management
      Datanode cluster membership
      Supports create/delete/modify/get block location operations
      Manages replication and replica placement
      Storage - provides read and write access to blocks
      Implemented as
      • Single Namespace Volume
      Namespace Volume = Namespace + Blocks
      • Single namenode with a namespace
      Entire namespace is in memory
      Provides Block Management
      Datanodes store block replicas
      Block files stored on local file system on the nodes
      HDFS Architecture
      © Hortonworks Inc. 2011
      3
      NS
      Namenode
      Namespace
      Block Management
      Storage
      Block Storage
      Datanode
      Datanode

    • 4. Limitation - Single Namenode
      © Hortonworks Inc. 2011
      4
      Scalability
      • Storage scales horizontally - namespace doesn’t
      • 5. Limited number of files, dirs and blocks
      • 6. 250 million files and blocks at 64GB Namenode heap size
      Performance
      • File system operations limited by a single node
      Poor Isolation
      • All the tenants share a single namespace
      • 7. Separate volume for tenants is not possible
      • 8. Lacks separate namespace for different categories of applications
      • 9. Experimental apps can affect production apps
      • 10. Example - HBase could use its own namespace
    • Namespace and Block Management are distinct services
      Tightly coupled due to co-location
      Scaling block management independent of namespace is simpler
      Simplifies Namespace and scaling it
      Block Storage could be a generic service
      Namespace is one of the applications to use the service
      Other services can be built directly on Block Storage
      HBase
      Foreign namespaces
      Limitation – Tight coupling
      © Hortonworks Inc. 2011
      5
    • 11. © Hortonworks Inc. 2011
      6
      Isolation is a problem for even small clusters
    • 12. HDFS Federation
      © Hortonworks Inc. 2011
      7
      ...
      ...
      NN-n
      NN-k
      NN-1
      Foreign NS n
      Namespace
      NS1
      NS k
      ...
      ...
      Pool k
      Pool n
      Pool 1
      Block Pools
      Balancer
      Block Storage
      Datanode 2
      Datanode m
      Datanode 1
      ...
      Common Storage
      Multiple independent Namenodes and Namespace Volumes in a cluster
      Namespace Volume = Namespace + Block Pool
      Block Storage as generic storage service
      Set of blocks for a Namespace Volume is called a Block Pool
      DNs store blocks for all the Namespace Volumes – no partitioning
    • 13. Key Ideas & Benefits
      Distributed Namespace: Partitioned across namenodes
      Simple and Robust due to independent masters
      Each master serves a namespace volume
      Preserves namenode stability - very little code change in the namenodes
      Scalability – 6K nodes, 100K tasks, 120PB and 1 billion files
      Block Pools enable generic storage service
      Enables Namespace Volumes to be independent of each other
      Fuels innovation and Rapid development
      New implementations of file systems and Applications on top of block storage possible
      New block pool categories – tmp storage, distributed cache, small object storage
      In future, move Block Management out of namenode to separate set of nodes
      Simplifies namespace/application implementation
      Distributed namenode becomes significantly simpler
      © Hortonworks Inc. 2011
      8
      Alternate NN Implementation
      HBase
      HDFS
      Namespace
      MR tmp
      Storage Service
    • 14. HDFS Federation – First Phase Details
      Simple design
      Little change in the Namenode, most change in Datanode, Config and Tools
      Core development in 4 months
      Namespace and Block Management remain in Namenode
      Block Management could be moved out of namenode in the future
      Little impact on existing deployments
      Single namenode configuration runs as is
      Cluster level Web UI for better manageability
      Provides cluster summary, namenode list and decommissioning status
      Tools
      Decommissioning, Balancer and other tools work with multiple namespaces
      © Hortonworks Inc. 2011
      9
    • 15. Managing Namespaces
      Federation has multiple namespaces – Don’t you need a single global namespace?
      Key is to share the data and the names used to access the data
      A global namespace is one way to do that
      Client-side mount table is another way to share.
      Shared mount-table => “global” shared view
      Personalized mount-table => per-application view
      Share the data that matter by mounting it
      Client-side implementation of mount tables
      No single point of failure
      No hotspot for root and top level directories
      © Hortonworks Inc. 2011
      10
      Client-side mount-table
      /
      project
      home
      tmp
      data
      NS4
      NS3
      NS2
      NS1
    • 16. HDFS I/O Improvements
      Ongoing improvements for HBase (HDFS-1599)
      Append, Flush and Sync improvements
      Allow reuse of connections to datanodes (HDFS-941)
      35-65% improvement for random reads
      ~40% improvement in latency as seen by HBase
      Pure Java CRC implementation (HADOOP-6148)
      ~2x improvement
      CRC and other improvements in progress (HDFS-2080)
      2.5x performance improvement for CPU bound sequential reads
      Improved locking for multithreaded readers (HDFS-1148)
      15% improvement for HBase
      Improvements in threading model of readers(HDFS-918/HDFS-1323)
      10-15% improvement in reads
      © Hortonworks Inc. 2011
      11
    • 17. Other HDFS Features In Release 0.23
      HDFS Raid (HDFS-503)
      Lower replication for better storage efficiency using Erasure Codes
      Edits cleanup (HDFS-1073)
      Simpler model for fsimage and editlog
      Write pipeline recovery (HDFS-1606)
      Handles loss of datanode in pipeline for better data reliability
      Improved decommissioning
      Datanode Protocol Cleanup (HDFS-1937)
      Towards cleaner separation of Block Storage
      Metrics (HADOOP-6728)
      Better metrics management
      HTTP support
      Read Write lock (0.22)
      Better namenode operational throughput
      © Hortonworks Inc. 2011
      12
      Other Features
      • Startup Time (HDFS-1443)
      • 18. Startup time from 1 hour to 20 mins
      • 19. Upgrade from 3 hours to 25 mins
      • 20. Disk Fail In Place (HADOOP-7123)
      • 21. Currently disk failure brings down a datanode
      • 22. Loss of significant storage with high density datanode
      • 23. Changes and best practices to make datanode resilient to disk failures
      Operational
    • 24. Futures
      HA for Namenode (HDFS-1623)
      Work in progress
      Wire compatibility (HADOOP-7347)
      Data transfer protocol cleanup and use Protocol Buffers (done)
      Protocol Buffers for RPC
      Continue I/Operformance improvements
      Lot of work already done in 0.23
      Partial namespace in memory
      Further scale the namespace size supported by the namenode
      More management and operability improvements
      © Hortonworks Inc. 2011
      13
    • 25. © Hortonworks Inc. 2011
      Thank you.
      14

    ×