Data Availability and Integrity
in Apache Hadoop
Steve Loughran
@steveloughran
stevel@apache.org




© Hortonworks Inc. 2012
Questions Hadoop Ops teams ask

• Can Hadoop keep my data safe?

• Can Hadoop keep my data available?

• What happens when things go wrong?

• Can you improve this?




Can Hadoop Keep My Data Safe?
[Diagram: a core switch above three top-of-rack (ToR) switches. One rack holds the NameNode (which maps each file to its blocks: block1, block2, block3, …), the Secondary NameNode and the JobTracker; the other racks hold DataNodes.]
Replication handles data integrity
• CRC32 checksum per 512 bytes
• Verified across datanodes on write
• Verified on all reads
• Background verification of all blocks (~weekly)
• Corrupt blocks re-replicated
• All replicas corrupt → operations team
  intervention

2009: Yahoo! lost 19 out of 329M blocks on 20K
servers; bugs now fixed
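
A minimal sketch of the per-chunk checksum idea, assuming standard CRC32 as provided by Python's zlib module. The function names and the sample block are hypothetical, and this is an illustration of the technique, not HDFS's own code path.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # chunk size named on the slide


def chunk_checksums(data, chunk_size=BYTES_PER_CHECKSUM):
    """Return one CRC32 value per chunk_size-byte chunk of data."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data, expected, chunk_size=BYTES_PER_CHECKSUM):
    """Return the indexes of chunks whose CRC32 no longer matches."""
    actual = chunk_checksums(data, chunk_size)
    if len(actual) != len(expected):
        raise ValueError("checksum count does not match data length")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Hypothetical sample data: 1280 bytes, so three chunks (512, 512, 256 bytes).
block = bytes(range(256)) * 5
stored = chunk_checksums(block)      # checksums computed at write time
damaged = bytearray(block)
damaged[600] ^= 0x01                 # flip one bit inside chunk 1
```

Verifying on read then means recomputing the checksums and comparing them with the stored ones; a mismatch identifies which 512-byte chunk is bad, so the replica can be reported as corrupt and re-replicated from a good copy.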
Harder: Switch failure
[Diagram: core switch, ToR switches, NameNode, Secondary NameNode, JobTracker and DataNodes, laid out as on the "Can Hadoop Keep My Data Safe?" slide.]
Bonded 1 GbE links to more than one switch
Protects against hardware failures, not software failures




NameNode failure: rare but costly

  1. Try to reboot/restart

  2. Bring up new NameNode server
     - with same IP
     - or restart DataNodes

[Diagram: ToR switch; NameNode reachable at the NN IP; shared storage for filesystem image and journal ("edit log"); Secondary NN receives streamed journal and checkpoints filesystem image.]



Yahoo!: 22 NameNode failures on 25 clusters in 18 months = .99999 availability
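
The arithmetic behind figures like these can be sketched as below. The failure counts come from the slide. The slide does not say how .99999 was derived, so the helper only shows what an availability figure means if it is read as the fraction of hours a cluster is up; that reading is an assumption.

```python
HOURS_PER_YEAR = 365 * 24

failures = 22
clusters = 25
years = 18 / 12

cluster_years = clusters * years                        # 37.5 cluster-years observed
failures_per_cluster_year = failures / cluster_years    # roughly 0.59
years_between_failures = cluster_years / failures       # roughly 1.7 per cluster


def downtime_minutes_per_year(availability):
    """Downtime per year implied by an availability fraction (assumed reading)."""
    return (1 - availability) * HOURS_PER_YEAR * 60


five_nines_budget = downtime_minutes_per_year(0.99999)  # a little over 5 minutes
```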

What to improve

• Address costs of NameNode failure in Hadoop 1

• Add live NN failover (HDFS 2.0)

• Eliminate shared storage (HDFS 2.x)

• Add resilience to the entire stack




Full Stack HA
Add resilience to planned/unplanned outages of
layers underneath




HA in Hadoop 1 (HDP1)
Use existing HA clustering technologies to add
cold failover of key manager services:
   VMware vSphere HA
   Red Hat HA Linux




Red Hat HA Linux

[Diagram: behind the ToR switches, four servers sit alongside the DataNodes. IP1 and IP2 are each labelled NN IP and NameNode; IP3 is labelled 2NN IP and Secondary NameNode; IP4 is labelled JT IP and JobTracker.]

    HA Linux: heartbeats & failover



Linux HA Implementation

• Replace init.d script with “Resource Agent” script

• Probe deep state of HDFS, Job Tracker

• Detecting and handling hung processes is hard

• Test in virtual + physical environments

• Testing with physical clusters
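
One way to tell a hung service from a dead one is to run a health check under a timeout. The sketch below is a Python illustration of that idea only; a real Resource Agent follows the cluster manager's own script conventions, and the `probe` function and the stand-in commands are hypothetical.

```python
import subprocess
import sys


def probe(command, timeout_seconds):
    """Classify a service by running a health-check command.

    Returns "running" if the command exits 0 in time, "failed" if it exits
    non-zero, and "hung" if it does not finish within timeout_seconds.
    """
    try:
        result = subprocess.run(command, timeout=timeout_seconds,
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return "hung"
    return "running" if result.returncode == 0 else "failed"


# Stand-in health checks (hypothetical); a real probe would query HDFS or
# the Job Tracker instead of running a Python one-liner.
healthy = [sys.executable, "-c", "pass"]
broken = [sys.executable, "-c", "raise SystemExit(1)"]
stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
```

A process that still exists but never answers is reported as "hung", not "running", which is the case a simple PID check would miss.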




Yes, but does it work?

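public void testKillHungNN() {
  // Signal 19 is SIGSTOP on most Linux platforms: it suspends the
  // NameNode process, simulating a hang instead of a crash
  assertRestartsHDFS {
    nnServer.kill(19,
      "/var/run/hadoop/hadoop-hadoop-namenode.pid")
  }
}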


 Groovy JUnit tests
 “Tools of Chaos” to break remote hosts and
 infrastructures



And how long does it take?

Small cluster: 1-3 minutes

Medium cluster: 2-4 minutes

where medium == a petabyte or less



Cold failover is good enough for small/medium clusters
“Full Stack”: IPC client
Configurable retry & time to block
  ipc.client.connect.max.retries
  dfs.client.retry.policy.enabled


1. Blocking works for most clients (HBase, Pig…)

2. Failure-aware applications can tune/disable

3. Job tracker added “Safe Mode” for outages
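
The blocking behaviour can be pictured as a bounded retry loop. This is a sketch of the general pattern, not Hadoop's IPC client: it does not read the two properties named above, and `call_with_retries` and `FlakyService` are hypothetical names.

```python
import time


def call_with_retries(operation, max_retries, sleep_seconds, sleep=time.sleep):
    """Call operation(); on ConnectionError wait and retry up to max_retries times.

    The caller blocks for roughly max_retries * sleep_seconds at most before
    the last error is re-raised.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if attempt >= max_retries:
                raise
            attempt += 1
            sleep(sleep_seconds)


class FlakyService:
    """Hypothetical service that refuses the first `outage` calls."""

    def __init__(self, outage):
        self.outage = outage
        self.calls = 0

    def ping(self):
        self.calls += 1
        if self.calls <= self.outage:
            raise ConnectionError("service unavailable")
        return "ok"
```

A client that simply blocks rides through a short outage without seeing an error. A failure-aware application would pass a small `max_retries` (or zero) so it learns about the outage quickly and can react itself.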


Putting it all together: Demo




HA in Hadoop HDFS 2




Hadoop 2.0 HA


[Diagram: three ZooKeeper nodes; two NameNodes, at IP1 and IP2, each paired with a Failure Controller and each able to be Active or Standby; DataNodes.]
When will HDFS 2 be ready?
Moving from alpha to beta ... production in 2013

Download and play with early releases!




Moving forward
• Retry policies for all remote client
  protocols/libraries in the stack.

• Dynamic (ZooKeeper?) service lookup

• YARN needs HA of Resource Manager, individual
  MR clusters

• “No more Managers”


Summary
• HDFS handles corruption and partial loss of data
  today

• Hadoop 1 now has cold failover for small/medium
  clusters

• Hadoop 2 adding hot failover

• Full Stack HA for resilience to outages


Single Points of Failure
There's always a SPOF

Q. How do you find it?

A. It finds you


                              Page 22
    © Hortonworks Inc. 2012
Page 23
© Hortonworks Inc. 2012

More Related Content

What's hot

Dash7 Technology 2009 2011 Seoul Briefing
Dash7 Technology 2009 2011 Seoul BriefingDash7 Technology 2009 2011 Seoul Briefing
Dash7 Technology 2009 2011 Seoul Briefing
Haystack Technologies
 
Running with the Devil: Mechanical Sympathetic Networking
Running with the Devil: Mechanical Sympathetic NetworkingRunning with the Devil: Mechanical Sympathetic Networking
Running with the Devil: Mechanical Sympathetic Networking
Todd Montgomery
 
Couchbase Server 2.0 - XDCR - Deep dive
Couchbase Server 2.0 - XDCR - Deep diveCouchbase Server 2.0 - XDCR - Deep dive
Couchbase Server 2.0 - XDCR - Deep dive
Dipti Borkar
 
Integration of Cloud and Grid Middleware at DGRZR
Integration of Cloud and Grid Middleware at DGRZRIntegration of Cloud and Grid Middleware at DGRZR
Integration of Cloud and Grid Middleware at DGRZR
Stefan Freitag
 
Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValues Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValues
Lucidworks (Archived)
 
Faster and Smaller qcow2 Files with Subcluster-based Allocation
Faster and Smaller qcow2 Files with Subcluster-based AllocationFaster and Smaller qcow2 Files with Subcluster-based Allocation
Faster and Smaller qcow2 Files with Subcluster-based Allocation
Igalia
 
Kernel Recipes 2019 - XDP closer integration with network stack
Kernel Recipes 2019 -  XDP closer integration with network stackKernel Recipes 2019 -  XDP closer integration with network stack
Kernel Recipes 2019 - XDP closer integration with network stack
Anne Nicolas
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
Vipin Varghese
 
How swift is your Swift - SD.pptx
How swift is your Swift - SD.pptxHow swift is your Swift - SD.pptx
How swift is your Swift - SD.pptx
OpenStack Foundation
 
Userspace networking
Userspace networkingUserspace networking
Userspace networking
Stephen Hemminger
 
Ncar globally accessible user environment
Ncar globally accessible user environmentNcar globally accessible user environment
Ncar globally accessible user environment
inside-BigData.com
 
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
Jim St. Leger
 
Oow2007 performance
Oow2007 performanceOow2007 performance
Oow2007 performance
Ricky Zhu
 
Collaborate vdb performance
Collaborate vdb performanceCollaborate vdb performance
Collaborate vdb performance
Kyle Hailey
 
NFS and Oracle
NFS and OracleNFS and Oracle
NFS and Oracle
Kyle Hailey
 
Collaborate nfs kyle_final
Collaborate nfs kyle_finalCollaborate nfs kyle_final
Collaborate nfs kyle_final
Kyle Hailey
 
Transition to ipv6 cgv6-edited
Transition to ipv6  cgv6-editedTransition to ipv6  cgv6-edited
Transition to ipv6 cgv6-edited
Fred Bovy
 
20121205 open stack_accelerating_science_v3
20121205 open stack_accelerating_science_v320121205 open stack_accelerating_science_v3
20121205 open stack_accelerating_science_v3
Tim Bell
 
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Jeffrey Breen
 
Mongo db roma replication and sharding
Mongo db roma replication and shardingMongo db roma replication and sharding
Mongo db roma replication and sharding
Guglielmo Incisa Di Camerana
 

What's hot (20)

Dash7 Technology 2009 2011 Seoul Briefing
Dash7 Technology 2009 2011 Seoul BriefingDash7 Technology 2009 2011 Seoul Briefing
Dash7 Technology 2009 2011 Seoul Briefing
 
Running with the Devil: Mechanical Sympathetic Networking
Running with the Devil: Mechanical Sympathetic NetworkingRunning with the Devil: Mechanical Sympathetic Networking
Running with the Devil: Mechanical Sympathetic Networking
 
Couchbase Server 2.0 - XDCR - Deep dive
Couchbase Server 2.0 - XDCR - Deep diveCouchbase Server 2.0 - XDCR - Deep dive
Couchbase Server 2.0 - XDCR - Deep dive
 
Integration of Cloud and Grid Middleware at DGRZR
Integration of Cloud and Grid Middleware at DGRZRIntegration of Cloud and Grid Middleware at DGRZR
Integration of Cloud and Grid Middleware at DGRZR
 
Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValues Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValues
 
Faster and Smaller qcow2 Files with Subcluster-based Allocation
Faster and Smaller qcow2 Files with Subcluster-based AllocationFaster and Smaller qcow2 Files with Subcluster-based Allocation
Faster and Smaller qcow2 Files with Subcluster-based Allocation
 
Kernel Recipes 2019 - XDP closer integration with network stack
Kernel Recipes 2019 -  XDP closer integration with network stackKernel Recipes 2019 -  XDP closer integration with network stack
Kernel Recipes 2019 - XDP closer integration with network stack
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
How swift is your Swift - SD.pptx
How swift is your Swift - SD.pptxHow swift is your Swift - SD.pptx
How swift is your Swift - SD.pptx
 
Userspace networking
Userspace networkingUserspace networking
Userspace networking
 
Ncar globally accessible user environment
Ncar globally accessible user environmentNcar globally accessible user environment
Ncar globally accessible user environment
 
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
DPDK Summit - 08 Sept 2014 - 6WIND - High Perf Networking Leveraging the DPDK...
 
Oow2007 performance
Oow2007 performanceOow2007 performance
Oow2007 performance
 
Collaborate vdb performance
Collaborate vdb performanceCollaborate vdb performance
Collaborate vdb performance
 
NFS and Oracle
NFS and OracleNFS and Oracle
NFS and Oracle
 
Collaborate nfs kyle_final
Collaborate nfs kyle_finalCollaborate nfs kyle_final
Collaborate nfs kyle_final
 
Transition to ipv6 cgv6-edited
Transition to ipv6  cgv6-editedTransition to ipv6  cgv6-edited
Transition to ipv6 cgv6-edited
 
20121205 open stack_accelerating_science_v3
20121205 open stack_accelerating_science_v320121205 open stack_accelerating_science_v3
20121205 open stack_accelerating_science_v3
 
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
 
Mongo db roma replication and sharding
Mongo db roma replication and shardingMongo db roma replication and sharding
Mongo db roma replication and sharding
 

Viewers also liked

Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environments
DataWorks Summit
 
Inside hadoop-dev
Inside hadoop-devInside hadoop-dev
Inside hadoop-dev
Steve Loughran
 
Importance of Big data for your Business
Importance of Big data for your BusinessImportance of Big data for your Business
Importance of Big data for your Business
azuyo.com
 
Hadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata LondonHadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata London
hadooparchbook
 
Architectural considerations for Hadoop Applications
Architectural considerations for Hadoop ApplicationsArchitectural considerations for Hadoop Applications
Architectural considerations for Hadoop Applications
hadooparchbook
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
Christopher Pezza
 
Hadoop I/O Analysis
Hadoop I/O AnalysisHadoop I/O Analysis
Hadoop I/O Analysis
Richard McDougall
 
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Uwe Printz
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
Arvind Kumar
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
DataWorks Summit
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
Avkash Chauhan
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed Introduction
Hanborq Inc.
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
Hortonworks
 
Hadoop
HadoopHadoop
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
Ricardo Varela
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBase
Hortonworks
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
sudhakara st
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
Nasrin Hussain
 

Viewers also liked (20)

Data protection for hadoop environments
Data protection for hadoop environmentsData protection for hadoop environments
Data protection for hadoop environments
 
Inside hadoop-dev
Inside hadoop-devInside hadoop-dev
Inside hadoop-dev
 
Importance of Big data for your Business
Importance of Big data for your BusinessImportance of Big data for your Business
Importance of Big data for your Business
 
Hadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata LondonHadoop Application Architectures tutorial - Strata London
Hadoop Application Architectures tutorial - Strata London
 
Architectural considerations for Hadoop Applications
Architectural considerations for Hadoop ApplicationsArchitectural considerations for Hadoop Applications
Architectural considerations for Hadoop Applications
 
Introduction to Apache Hadoop
Introduction to Apache HadoopIntroduction to Apache Hadoop
Introduction to Apache Hadoop
 
Hadoop I/O Analysis
Hadoop I/O AnalysisHadoop I/O Analysis
Hadoop I/O Analysis
 
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
Introduction to the Hadoop Ecosystem with Hadoop 2.0 aka YARN (Java Serbia Ed...
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Introduction to Apache Hive
Introduction to Apache HiveIntroduction to Apache Hive
Introduction to Apache Hive
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed Introduction
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Hadoop
HadoopHadoop
Hadoop
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBase
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 

Similar to Availability and Integrity in hadoop (Strata EU Edition)

HA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talkHA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talk
Steve Loughran
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
Jared Winick
 
Understanding hdfs
Understanding hdfsUnderstanding hdfs
Understanding hdfs
Thirunavukkarasu Ps
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmine
DataWorks Summit
 
HDFS - What's New and Future
HDFS - What's New and FutureHDFS - What's New and Future
HDFS - What's New and Future
DataWorks Summit
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
Ovidiu Dimulescu
 
Hadoop Inside
Hadoop InsideHadoop Inside
Hadoop Inside
Eun-Jo Lee
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Ovidiu Dimulescu
 
Conference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance TuningConference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance Tuning
Severalnines
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and Future
Cloudera, Inc.
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
Richard McDougall
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)
outstanding59
 
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?  Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
EMC
 
High Availability != High-cost
High Availability != High-costHigh Availability != High-cost
High Availability != High-cost
normanmaurer
 
Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HA
Hanborq Inc.
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
Steve Loughran
 
An Hour of DB2 Tips
An Hour of DB2 TipsAn Hour of DB2 Tips
An Hour of DB2 Tips
Craig Mullins
 
Hadoop at Rakuten, 2011/07/06
Hadoop at Rakuten, 2011/07/06Hadoop at Rakuten, 2011/07/06
Hadoop at Rakuten, 2011/07/06
Rakuten Group, Inc.
 
[OSDC 2013] Hadoop Cluster HA 的經驗分享
[OSDC 2013] Hadoop Cluster HA 的經驗分享[OSDC 2013] Hadoop Cluster HA 的經驗分享
[OSDC 2013] Hadoop Cluster HA 的經驗分享
Tsu-Fen Han
 

Similar to Availability and Integrity in hadoop (Strata EU Edition) (20)

HA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talkHA Hadoop -ApacheCon talk
HA Hadoop -ApacheCon talk
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
 
Understanding hdfs
Understanding hdfsUnderstanding hdfs
Understanding hdfs
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmine
 
HDFS - What's New and Future
HDFS - What's New and FutureHDFS - What's New and Future
HDFS - What's New and Future
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 
Hadoop Inside
Hadoop InsideHadoop Inside
Hadoop Inside
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Conference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance TuningConference slides: MySQL Cluster Performance Tuning
Conference slides: MySQL Cluster Performance Tuning
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and Future
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)
 
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?  Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
Greenplum Analytics Workbench - What Can a Private Hadoop Cloud Do For You?
 
High Availability != High-cost
High Availability != High-costHigh Availability != High-cost
High Availability != High-cost
 
Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HA
 
Hadoop: today and tomorrow
Hadoop: today and tomorrowHadoop: today and tomorrow
Hadoop: today and tomorrow
 
An Hour of DB2 Tips
An Hour of DB2 TipsAn Hour of DB2 Tips
An Hour of DB2 Tips
 
Hadoop at Rakuten, 2011/07/06
Hadoop at Rakuten, 2011/07/06Hadoop at Rakuten, 2011/07/06
Hadoop at Rakuten, 2011/07/06
 
[OSDC 2013] Hadoop Cluster HA 的經驗分享
[OSDC 2013] Hadoop Cluster HA 的經驗分享[OSDC 2013] Hadoop Cluster HA 的經驗分享
[OSDC 2013] Hadoop Cluster HA 的經驗分享
 

More from Steve Loughran

Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
Steve Loughran
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
Steve Loughran
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
Steve Loughran
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
Steve Loughran
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
Steve Loughran
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()
Steve Loughran
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
Steve Loughran
 
Testing
TestingTesting
I hate mocking
I hate mockingI hate mocking
I hate mocking
Steve Loughran
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?
Steve Loughran
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Steve Loughran
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
Steve Loughran
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
Steve Loughran
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
Steve Loughran
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object Stores
Steve Loughran
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
Steve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Availability and Integrity in Hadoop (Strata EU Edition)

  • 1. Data Availability and Integrity in Apache Hadoop Steve Loughran @steveloughran stevel@apache.org © Hortonworks Inc. 2012
  • 2. Questions Hadoop Ops teams ask • Can Hadoop keep my data safe? • Can Hadoop keep my data available? • What happens when things go wrong? • Can you improve this?
  • 3. Can Hadoop Keep My Data Safe? [Diagram: a core switch above three ToR switches; NameNode, secondary NameNode, (Job Tracker) and DataNodes; a file is shown split into block1, block2, block3, …]
  • 4. Replication handles data integrity • CRC32 checksum per 512 bytes • Verified across datanodes on write • Verified on all reads • Background verification of all blocks (~weekly) • Corrupt blocks re-replicated • All replicas corrupt → operations team intervention. 2009: Yahoo! lost 19 out of 329M blocks on 20K servers (bugs now fixed)
  • 5. Harder: Switch failure [Diagram: a core switch above three ToR switches; NameNode, secondary NameNode, (Job Tracker) and DataNodes; a file is shown split into block1, block2, block3, …]
  • 6. Bonded 1 GbE, >1 switch. Avoids hardware problems, not software
  • 7. NameNode failure rare but costs. 1. Try to reboot/restart. 2. Bring up new NameNode server, with same IP, or restart DataNodes. [Diagram: ToR switch, NameNode and secondary NameNode with NN IP labels; shared storage for the filesystem image and journal ("edit log"); the secondary NN receives the streamed journal and checkpoints the filesystem image.] Yahoo!: 22 NameNode failures on 25 clusters in 18 months = .99999 availability
  • 8. What to improve • Address costs of NameNode failure in Hadoop 1 • Add live NN failover (HDFS 2.0) • Eliminate shared storage (HDFS 2.x) • Add resilience to the entire stack
  • 9. Full Stack HA: add resilience to planned/unplanned outages of layers underneath
  • 10. HA in Hadoop 1 (HDP1). Use existing HA clustering technologies to add cold failover of key manager services: VMWare vSphere HA, RedHat HA Linux
  • 11. RedHat HA Linux [Diagram: ToR switches; four servers (IP1 to IP4) labelled NameNode, NameNode, secondary NameNode and (Job Tracker), with the service addresses NN IP (shown twice), 2NN IP and JT IP; DataNodes.] HA Linux: heartbeats & failover
  • 12. Linux HA Implementation • Replace init.d script with “Resource Agent” script • Probe deep state of HDFS, Job Tracker • Detection & handling of hung process hard • Test in virtual + physical environments • Testing with physical clusters
  • 13. Yes, but does it work? public void testKillHungNN() { assertRestartsHDFS { nnServer.kill(19, "/var/run/hadoop/hadoop-hadoop-namenode.pid") } } Groovy JUnit tests; “Tools of Chaos” to break remote hosts and infrastructures
  • 14. And how long does it take? Small cluster: 1-3 minutes. Medium cluster: 2-4 minutes, where Medium == a petabyte or less. Cold failover is good enough for small/medium clusters
  • 15. “Full Stack”: IPC client. Configurable retry & time to block: ipc.client.connect.max.retries, dfs.client.retry.policy.enabled. 1. Blocking works for most clients (HBase, Pig…) 2. Failure-aware applications can tune/disable 3. Job tracker added “Safe Mode” for outages
  • 16. Putting it all together: Demo
  • 17. HA in Hadoop HDFS 2
  • 18. Hadoop 2.0 HA [Diagram: three ZooKeeper nodes; two NameNodes (IP1 and IP2), each with a Failure Controller and Active/Standby state labels; DataNodes.]
  • 19. When will HDFS 2 be ready? Moving from alpha to beta ... production in 2013. Download and play with early releases!
  • 20. Moving forward • Retry policies for all remote client protocols/libraries in the stack. • Dynamic (zookeeper?) service lookup • YARN needs HA of Resource Manager, individual MR clusters • “No more Managers”
  • 21. Summary • HDFS handles corruption and partial loss of data today • Hadoop 1 now has cold failover for small/medium clusters • Hadoop 2 adding hot failover • Full Stack HA for resilience to outages
  • 22. Single Points of Failure. There's always a SPOF. Q. How do you find it? A. It finds you
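
The "CRC32 checksum per 512 bytes" idea from slide 4 can be illustrated with a minimal Python sketch. This is not HDFS's implementation (HDFS is written in Java and the slides do not describe its checksum file format); the function names `chunk_checksums` and `find_corrupt_chunks` and the use of Python's `zlib.crc32` are assumptions made purely for illustration.

```python
import zlib

CHUNK_SIZE = 512  # bytes covered by each checksum, as stated on slide 4


def chunk_checksums(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return one CRC32 value per chunk_size-byte chunk of data."""
    return [
        zlib.crc32(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def find_corrupt_chunks(data: bytes, expected: list[int],
                        chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return the indexes of chunks whose CRC32 no longer matches.

    Hypothetical helper: it assumes data and expected cover the same
    number of chunks and only compares them pairwise.
    """
    actual = chunk_checksums(data, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    block = bytes(range(256)) * 8      # 2048 bytes, i.e. 4 chunks
    sums = chunk_checksums(block)      # computed "at write time"
    damaged = bytearray(block)
    damaged[600] ^= 0xFF               # flip one byte inside chunk 1
    print(find_corrupt_chunks(bytes(damaged), sums))
```

Keeping one checksum per small chunk rather than one per block means a reader can tell which 512-byte region is bad; in the sketch only chunk index 1 is reported for the damaged copy.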

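Slide 15's "configurable retry & time to block" can be sketched as a blocking retry wrapper. This is only an analogy in Python, not the Hadoop IPC client: the function `call_with_retries`, its parameters and the fixed delay between attempts are hypothetical, and the sketch does not read `ipc.client.connect.max.retries` or `dfs.client.retry.policy.enabled`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T], max_retries: int,
                      delay_seconds: float,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation, blocking and retrying while it raises ConnectionError.

    Makes one initial attempt plus up to max_retries retries, then
    re-raises the last ConnectionError. The sleep argument is injectable
    so the wrapper can be exercised without real waiting.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if attempt >= max_retries:
                raise  # retries exhausted: let the caller see the failure
            attempt += 1
            sleep(delay_seconds)  # block the caller until the next attempt
```

A caller that simply wants to ride out a short outage would pick a generous `max_retries`; a failure-aware application could pass `max_retries=0` to see the error immediately, which mirrors the "tune/disable" option on the slide.
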
Editor's Notes

  1. Once you adopt Hadoop, it can rapidly become the biggest central storage point of data in an organisation, at which point you start caring about how well it looks after your data. This talk aims to answer these questions.
  2. HDFS is built on the concept that in a large cluster, disk failure is inevitable. The system is designed to change the impact of this from the beeping of pagers to a background hum. A key part of the HDFS design: copying the blocks across machines means that the data stays available despite the loss of a disk, a server or even an entire rack.
  3. The data is checksummed extensively to pick up corruption: CRCs are created at write time (and even verified end-to-end in a cross-machine write) and checked at read time.
  4. Rack failures can generate a lot of replication traffic, as every block that was stored in the rack needs to be replicated at least once. The replication still has to follow the constraint of no more than one block copy per server. Much of this traffic is intra-rack, but every block which already has 2x replicas on a single rack will be replicated to another rack if possible. This is what scares ops teams. Important: there is no specific notion of "mass failure" or "network partition". Here HDFS only sees that four machines have gone down.
  5. When the NameNode fails, the cluster is offline. Client applications, including the MapReduce layer and HBase, see this and fail.
  6. One question is "how long does this take?" For clusters of under a few hundred machines, with not that much of an edit log to replay, failure detection dominates the time. vSphere is the slowest there as it has to notice that the monitor has stopped sending heartbeats (~60s), then boot the OS (~60s) before bringing up the NN. This is why it's better for smaller clusters. LinuxHA fails over faster; if set to poll every 10-15 s then failover begins within 20 s of the outage.
  7. The risk here isn't just the failover, it's all the other changes that went into HDFS 2; the bug reporting numbers for HDFS 2 are way down on the previous releases, which means that it was either much better, or hasn't been tested enough. The big issue here is that the data in your HDFS cluster may be one of the most valuable assets of an organization; you can't afford to lose it. That's why it's good to be cautious with any filesystem, be it Linux or a layer above.
  8. When the NN fails, it's obvious to remote applications, which fail, and to the ops team, which has to fix it. It also adds cost to the system. Does it happen often? No, almost never, but having the ops team and hardware ready for it is the cost.
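
The failover timings in the notes (vSphere: ~60 s to detect the failure plus ~60 s to boot the OS; LinuxHA: failover begins within 20 s) can be added up in a short Python sketch. The class name `FailoverTimings`, the 60 s NameNode start time and the zero OS boot time for LinuxHA are all assumptions introduced for illustration; the notes give no figure for how long the NameNode itself takes to come up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverTimings:
    """Per-phase timings, in seconds, for a cold NameNode failover."""
    detection: float       # time to notice the failure
    os_boot: float         # time to boot a replacement OS, if one is needed
    namenode_start: float  # ASSUMPTION: time to load the image and replay the edit log

    def total(self) -> float:
        """Total outage as the simple sum of the three phases."""
        return self.detection + self.os_boot + self.namenode_start


ASSUMED_NN_START = 60.0  # hypothetical value, not taken from the talk

# Detection and boot figures for vSphere come from the notes above.
vsphere = FailoverTimings(detection=60.0, os_boot=60.0,
                          namenode_start=ASSUMED_NN_START)
# ASSUMPTION: the LinuxHA target host is already running, so no OS boot.
linux_ha = FailoverTimings(detection=20.0, os_boot=0.0,
                           namenode_start=ASSUMED_NN_START)

if __name__ == "__main__":
    print(f"vSphere: {vsphere.total():.0f} s, LinuxHA: {linux_ha.total():.0f} s")
```

Under these assumptions both totals land inside the 1-3 minute range that slide 14 quotes for small clusters, and the gap between them comes entirely from detection and boot time, which is the point the note makes about failure detection dominating.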