HDFS High Availability

The HDFS NameNode is a robust and reliable service as seen in practice in production at Yahoo and other customers. However, the NameNode does not have automatic failover support. A hot failover solution called HA NameNode is currently under active development (HDFS-1623). This talk will cover the architecture, design and setup. We will also discuss the future direction for HA NameNode.


1. HDFS: Now and Future
Sanjay Radia (sanjay@hortonworks.com)
Suresh Srinivas (suresh@hortonworks.com)
© Hortonworks Inc. 2011
2. Outline
•  Hadoop 1 and Hadoop 2 Releases
•  Generalized storage service
   –  Leverage it for further innovation
•  Enterprise Use Cases
•  HDFS Infrastructure Improvements
•  HA in Hadoop 1!
3. Hadoop 1 and Hadoop 2
Hadoop 1 (GA)
•  Security
•  Append/Fsync (HBase)
•  WebHdfs + Spnego
•  Write pipeline improvements
•  Local write optimization
•  Performance improvements
•  Disk-fail-in-place
Hadoop 2 (alpha)
•  New Append
•  Federation
•  Wire compatibility
•  Edit logs rewrite
•  Faster startup
•  HA NameNode
4. Testing & Quality – Used for each stable release
Nightly Testing
   –  1200 automated tests on 30 nodes
   –  Live data and applications
QE Certification for Release
   –  Large variety and scale tests on 500 nodes
   –  Performance benchmarking
   –  QE HIT integration testing of the whole stack
Release Testing – alpha and beta
•  Sandbox clusters – 3 clusters, each with 400–1K nodes
   –  Major releases: 2 months of testing on actual data – all production projects must sign off
•  Research clusters – 6 clusters (non-revenue production jobs, 4K nodes)
   –  Major releases – minimum 2 months before moving to production
   –  0.25 to 0.5 million jobs per week; if a release clears research, it is mostly fine in production
Release
•  Production clusters – 11 clusters (4.5K nodes)
   –  Revenue generating, stricter SLAs
5. Hadoop 1 and Hadoop 2 Timelines
[Timeline diagram, 2008–2012; each release passes through DEV, QA, and beta phases.]
Hadoop 1 line: 0.20.1 → 0.20.2 (Security) → 0.20.1xx (Operability, Multi-Tenancy) → 0.20.2xx (Old Append) → Hadoop 1.0 GA
Hadoop 2 line: 0.21 (New Append) → 0.22 (Security port) → 0.23 (Federation, YARN; alpha) → Hadoop 2.0 alpha (HA, Wire Compatibility)
6. Outline
•  Hadoop 1 and Hadoop 2 Releases
•  Generalized storage service
   –  Leverage it for further innovation
•  Enterprise Use Cases
•  HDFS Infrastructure Improvements
•  HA in Hadoop 1!
7. Federation: Generalized Block Storage
[Diagram: NameNodes NN-1 … NN-k … NN-n, each owning a namespace (NS1 … NSk … NSn) and its block pool (Pool 1 … Pool k … Pool n); a foreign namespace may attach its own pool. DataNodes 1 … m below form the common block storage shared by all pools.]
•  Block storage as a generic storage service
   –  The set of blocks for a Namespace Volume is called a Block Pool
   –  DNs store blocks for all the Namespace Volumes – no partitioning
•  Multiple independent NameNodes and Namespace Volumes in a cluster
   –  Namespace Volume = Namespace + Block Pool
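The sharing relationship on this slide can be sketched in a few lines. This is an illustrative model, not Hadoop code: the `DataNode` class and the pool/block identifiers are made up, but the invariant it demonstrates is the slide's point — every DataNode stores blocks for every block pool, while each NameNode owns exactly one namespace volume.

```python
# Illustrative sketch of federation's storage model (not Hadoop's code):
# DataNodes hold blocks from all block pools, with no partitioning.
from collections import defaultdict

class DataNode:
    def __init__(self, name):
        self.name = name
        self.blocks = defaultdict(set)  # block-pool id -> set of block ids

    def store(self, pool_id, block_id):
        self.blocks[pool_id].add(block_id)

    def pools(self):
        return set(self.blocks)

# Three DataNodes form the common storage layer.
dns = [DataNode(f"dn{i}") for i in range(3)]

# Two independent namespace volumes, each with its own block pool,
# write blocks across the same set of DataNodes.
for pool in ("BP-ns1", "BP-ns2"):
    for i, dn in enumerate(dns):
        dn.store(pool, f"{pool}-blk-{i}")

# Every DataNode ends up holding blocks from both pools:
# the storage is shared, not split per namespace.
assert all(dn.pools() == {"BP-ns1", "BP-ns2"} for dn in dns)
```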
8. HDFS’ Generic Storage Service: Opportunities for Innovation
[Diagram: a common storage service layer beneath the HDFS namespace, an alternate NN implementation, HBase, and MR tmp.]
•  Federation – Distributed (Partitioned) Namespace
   –  Simple and robust due to independent masters
   –  Scalability, Isolation, Availability
•  New Services – Independent Block Pools
   –  New FS – partial namespace in memory
   –  MR tmp storage, HBase directly on block storage
   –  Shadow file system – caches HDFS, NFS, S3
•  Future: move block management into DataNodes
   –  Simplifies namespace/application implementation
   –  Distributed namenode becomes significantly simpler
9. Shadow File System for Another Namespace
[Diagram: a custom shadow namespace over HDFS DataNodes, shadowing sources such as S3 or NFS.]
•  Custom namespace to shadow the namespace of another system
   –  Uses a private block pool
•  Different policies on the data
   –  E.g. single replica; fetch missing ones from the source
   –  Hadoop can serve as a processing engine for source data without putting much load on the source
   –  E.g. reduce the replication factor for data duplicated in another cluster
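The "fetch missing ones from source" policy is essentially a read-through cache. The sketch below is a minimal illustration of that idea under made-up names (`ShadowFS`, the source dictionary); the real shadow file system would operate on blocks in a private block pool rather than whole paths.

```python
# Illustrative read-through "shadow" sketch: reads are served from the
# local shadow copy when possible and fetched from the source system on
# a miss, so the source only sees traffic for data not yet shadowed.

class ShadowFS:
    def __init__(self, source):
        self.source = source    # stand-in for an S3- or NFS-like backend
        self.cache = {}         # stand-in for the private block pool
        self.source_reads = 0

    def read(self, path):
        if path not in self.cache:
            self.cache[path] = self.source[path]  # fetch missing from source
            self.source_reads += 1
        return self.cache[path]

source = {"/logs/a": b"aa", "/logs/b": b"bb"}
fs = ShadowFS(source)
fs.read("/logs/a")
fs.read("/logs/a")            # second read served from the shadow copy
assert fs.source_reads == 1   # the source was contacted only once
```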
10. Managing Namespaces
[Diagram: a client-side mount table mapping /data, /project, /home and /tmp onto namespaces NS1–NS4.]
•  Federation has multiple namespaces
•  Don’t you need a single global namespace?
   –  Some tenants want a private namespace
      •  Hadoop as a service – each tenant gets its own namespace
   –  Global? The key is to share the data and the names used to access the data
•  A single global namespace is one way to share
•  A client-side mount table is another way to share
   –  Shared mount table => “global” shared view
   –  Personalized mount table => per-application view
      •  Share the data that matters by mounting it
•  Client-side implementation of mount tables
   –  No single point of failure
   –  No hotspot for root and top-level directories
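The mount-table idea above can be sketched as longest-prefix path resolution. The table contents here are made up for illustration; in Hadoop this mapping is configured declaratively (the viewfs client-side mount tables), not coded by hand.

```python
# Sketch of client-side mount-table resolution: the longest mount prefix
# that matches a path decides which namespace serves it. Entries mirror
# the diagram on the slide but are otherwise hypothetical.

MOUNT_TABLE = {
    "/data":    "hdfs://ns1",
    "/project": "hdfs://ns2",
    "/home":    "hdfs://ns3",
    "/tmp":     "hdfs://ns4",
}

def resolve(path):
    # Pick the longest mount point that is the path itself or a prefix
    # ending at a path-component boundary.
    best = max((m for m in MOUNT_TABLE
                if path == m or path.startswith(m + "/")),
               key=len, default=None)
    if best is None:
        raise KeyError(f"no mount covers {path}")
    return MOUNT_TABLE[best] + path

assert resolve("/data/feeds/2012") == "hdfs://ns1/data/feeds/2012"
assert resolve("/tmp/job1") == "hdfs://ns4/tmp/job1"
```

Because the table lives in the client, there is no server in the resolution path — which is exactly why the slide notes there is no single point of failure and no hotspot at the root.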
11. Next Steps… First-Class Support for Volumes
[Diagram: NameServers as containers of namespaces, above a DataNode storage layer.]
•  NameServer – a container for namespaces
   ›  Lots of small namespace volumes
      –  Chosen per user/tenant/data feed
      –  Management policies (quota, …)
      –  Mount tables for a unified namespace
•  Can be managed by a central volume server
   ›  Move namespaces for balancing
•  Working set of each namespace kept in memory
   ›  Many more namespaces per server
•  Number of NameServers determined by:
   ›  Sum of (namespace working set)
   ›  Sum of (namespace throughput)
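One plausible reading of the sizing rule on this slide: you need enough NameServers both to hold the combined namespace working sets in memory and to serve the combined throughput, so the tighter of the two constraints wins. The function name and all capacity figures below are made up for illustration.

```python
# Hypothetical capacity sketch for the slide's sizing rule: the server
# count must satisfy both the memory and the throughput sums.
import math

def nameservers_needed(working_sets_gb, throughputs_ops,
                       mem_per_server_gb, ops_per_server):
    by_memory = math.ceil(sum(working_sets_gb) / mem_per_server_gb)
    by_throughput = math.ceil(sum(throughputs_ops) / ops_per_server)
    return max(by_memory, by_throughput)  # tighter constraint wins

# Four tenant volumes: working sets in GB, request rates in ops/sec.
n = nameservers_needed([20, 5, 40, 15], [3000, 500, 8000, 1500],
                       mem_per_server_gb=32, ops_per_server=5000)
assert n == 3  # memory: ceil(80/32) = 3; throughput: ceil(13000/5000) = 3
```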
12. Outline
•  Hadoop 1 and Hadoop 2 Releases
•  Generalized storage service
   –  Leverage it for further innovation
•  Enterprise Use Cases
•  HDFS Infrastructure Improvements
•  HA in Hadoop 1!
13. Enterprise Use Cases
•  High Availability ✓
•  Standard Interfaces ✓
   –  WebHdfs (REST) ✓, Fuse ✓ and NFS access
•  Snapshots – in progress
•  Disaster Recovery
   –  Distcp does parallel and incremental copies ✓
   –  Enhance using journal interface & snapshots
•  Data Efficiency/RAID
   –  Productize the tools and experience at Facebook
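WebHdfs, checked off above, exposes HDFS over plain HTTP REST. The sketch below only builds request URLs for two common operations; the hostname, port, and paths are placeholders, and actually sending the requests (e.g. with `urllib.request`) would need a live cluster.

```python
# Sketch of WebHDFS request URLs (REST interface to HDFS). Host, port,
# and file paths are illustrative placeholders, not a real cluster.
from urllib.parse import urlencode

def webhdfs_url(host, port, path, op, **params):
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"

# List a directory, and open a file for reading.
assert (webhdfs_url("nn.example.com", 50070, "/user/alice", "LISTSTATUS")
        == "http://nn.example.com:50070/webhdfs/v1/user/alice"
           "?op=LISTSTATUS")
assert (webhdfs_url("nn.example.com", 50070, "/user/alice/f.txt", "OPEN",
                    offset=0)
        == "http://nn.example.com:50070/webhdfs/v1/user/alice/f.txt"
           "?op=OPEN&offset=0")
```

Because the protocol is plain REST, any HTTP client can talk to HDFS without Hadoop's Java libraries installed, which is what makes it a "standard interface" in the slide's sense.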
14. Outline
•  Hadoop 1 and Hadoop 2 Releases
•  Generalized storage service
   –  Leverage it for further innovation
•  Enterprise Use Cases
•  HDFS Infrastructure Improvements
•  HA in Hadoop 1!
15. Infrastructure Improvements
•  Netty
   –  Better connection and thread management
•  Image/Edits management
   –  HDFS image/edits stored within HDFS
•  Parallel writes
   –  Lower latency
•  Grouping blocks
   –  Scaling the number of blocks and block reports
•  Support for heterogeneous storage
   –  SSD, archival storage
•  Rolling upgrade improvements
   –  Wire compatibility done
16. Outline
•  Hadoop 1 and Hadoop 2 Releases
•  Generalized storage service
   –  Leverage it for further innovation
•  Enterprise Use Cases
•  HDFS Infrastructure Improvements
•  HA in Hadoop 1!
17. HA in 1.0
Using Full Stack HA Architecture
18. Hadoop Full Stack HA Architecture
[Diagram: jobs from apps running outside the cluster keep flowing to the slave nodes; on failure the JobTracker goes into safemode while the NN and JT fail over within an N+K HA cluster of master servers.]
19. HA in Hadoop 1 with HDP1
•  Full Stack HA Architecture
   –  NameNode
      –  Clients pause automatically
      –  JobTracker pauses automatically
   –  HA for other Hadoop master daemons coming
•  Use industry-standard HA frameworks
   –  VMware vSphere-HA, and others soon
   –  Industry proven
      –  Failover, fencing, …
      –  Deals with tricky corner cases and prevents corruption
   –  Additional benefits
      –  N-N & N+K failover
      –  Migration for maintenance
20. Hadoop NN/JT HA with vSphere
[Architecture diagram; no further text on this slide.]
21. NN HA with Linux-HA
[Diagram: two nodes, each running a Linux-HA resource manager (watchdog) that monitors the health of the NN, OS, and hardware; heartbeats and commands flow between them. The active NameNode writes shared state; a cold-standby NameNode takes over on failure; DataNodes report to the active NN.]
22. Failover Times
•  NameNode failover times with vSphere and Linux-HA
   –  Failure detection and failover – 0.5 to 2 minutes
   –  OS boot-up needed for vSphere – 1 minute
   –  NameNode startup (exit safemode)
      –  Small/medium clusters – 1 to 2 minutes
      –  Large clusters – 5 to 15 minutes
•  NameNode startup time measurements
   –  60 nodes, 60K files, 6 million blocks, 300 TB raw storage – 40 sec
   –  180 nodes, 200K files, 18 million blocks, 900 TB raw storage – 120 sec
Cold failover is good enough for small/medium clusters.
Failure detection and automatic failover dominate the total time.
23. Summary
•  Hadoop 1 – the most stable release
   –  Now with Full-Stack HA using industry-proven tools
•  Hadoop 2 – in alpha testing
   –  3 years of development – significant new features now in alpha/beta testing
   –  Generalized storage layer – opportunities for innovation
      –  Partial namespace in memory, shadow/caching file system, MR tmp, etc.
   –  Hadoop 2 HA – the main difference is warm/hot failover
•  Snapshot and DR improvements are coming
24. Thanks
