Availability
Reliability in distributed system 
•To be truly reliable, a distributed system must have the following characteristics: 
–Fault-Tolerant: It can recover from component failures without performing incorrect actions. 
–Highly Available: It can restore operations, permitting it to resume providing services even when some components have failed. 
–Recoverable: Failed components can restart themselves and rejoin the system, after the cause of failure has been repaired. 
–Consistent: The system can coordinate actions by multiple components often in the presence of concurrency and failure. This underlies the ability of a distributed system to act like a non-distributed system. 
–Scalable: It can operate correctly even as some aspect of the system is scaled to a larger size. For example, we might increase the size of the network on which the system is running. This increases the frequency of network outages and could degrade a "non-scalable" system. Similarly, we might increase the number of users or servers, or overall load on the system. In a scalable system, this should not have a significant effect. 
–Predictable Performance: The ability to provide desired responsiveness in a timely manner. 
–Secure: The system authenticates access to data and services.
SPOF 
•The combination of 
–replicating namenode metadata on multiple file-systems, and 
–using the secondary namenode to create checkpoints 
•Protects against data loss 
•But does not provide high availability of the file-system.
SPOF 
•The namenode is still a single point of failure (SPOF): if it failed, all clients, including MapReduce jobs, would be unable to read, write, or list files, 
•because the namenode is the sole repository of the metadata and the file-to-block mapping.
SPOF 
•In such an event, the whole Hadoop system would effectively be “out of service” until a new namenode could be brought online.
Reasons for downtime 
•An important part of improving availability and articulating requirements is understanding the causes of downtime.
•There are many types of failures in distributed systems, ways to classify them, and analyses of how failures result in downtime.
Maintenance 
•Maintenance to a master host normally requires a restart of the entire system.
Hardware failures 
•Hosts and their connections may fail. 
•Hardware failures on the master host, or a failure in the connection between the master and the majority of the slaves, can cause system downtime.
Software failures 
•Software bugs may cause a component in the system to stop functioning or require a restart. 
•For example, a bug in upgrade code could result in downtime due to data corruption. 
•A dependent software component may become unavailable (e.g. the Java garbage collector enters a stop-the-world phase). 
•A software bug in a master service will likely cause downtime.
Software failures 
•Software failures are a significant issue in distributed systems. 
•Even with rigorous testing, software bugs account for a substantial fraction of unplanned downtime (estimated at 25-35%). 
•Residual bugs in mature systems can be classified into two main categories.
Heisenbug 
•A bug that seems to disappear or alter its characteristics when it is observed or researched. 
•A common example is a bug that occurs in a release-mode compile of a program, but not when researched under debug mode. 
•The name "heisenbug" is a pun on the "Heisenberg uncertainty principle," a quantum physics term which is commonly (yet inaccurately) used to refer to the way in which observers affect the measurements of the things that they are observing, by the act of observing alone (this is actually the observer effect, and is commonly confused with the Heisenberg uncertainty principle).
Bohrbug 
•A bug (named after the Bohr atom model) that, in contrast to a heisenbug, does not disappear or alter its characteristics when it is researched. 
•A Bohrbug typically manifests itself reliably under a well-defined set of conditions.
Software failures 
•Heisenbugs tend to be more prevalent in distributed systems than in local systems. 
•One reason for this is the difficulty programmers have in obtaining a coherent and comprehensive view of the interactions of concurrent processes.
Operator errors 
•People make mistakes. 
•Hadoop attempts to limit operator error by simplifying administration, validating its configuration, and providing useful messages in logs and UI components; 
•however operator mistakes may still cause downtime.
Strategy 
•Strategies by severity of database downtime versus latency of database recovery: 
–Planned downtime: online maintenance (no downtime), offline maintenance. 
–Unplanned downtime: continuous availability (no downtime), high-availability clusters, switching and warm-standby replication, cold standby (in order of increasing recovery latency). 
–Catastrophic downtime: disaster recovery.
Recall 
•The NameNode stores modifications to the file system as a log appended to a native file system file, edits. 
•When a NameNode starts up, it reads HDFS state from an image file, fsimage, and then applies edits from the edits log file. 
•It then writes new HDFS state to the fsimage and starts normal operation with an empty edits file. 
•Since the NameNode merges the fsimage and edits files only during startup, the edits log file could get very large over time on a busy cluster. 
•Another side effect of a larger edits file is that the next restart of the NameNode takes longer.
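The checkpoint-plus-journal startup described above can be sketched as a minimal Python model (illustrative only; the operation names and data layout are hypothetical, not Hadoop's actual on-disk formats):

```python
# Minimal model of fsimage + edits recovery: load the checkpoint,
# then replay the journal to rebuild the current namespace.

def apply_edit(namespace, edit):
    """Apply one journal entry to the in-memory namespace."""
    op, path = edit
    if op == "mkdir":
        namespace[path] = {"type": "dir"}
    elif op == "create":
        namespace[path] = {"type": "file"}
    elif op == "delete":
        namespace.pop(path, None)
    return namespace

def start_namenode(fsimage, edits):
    """Startup: load fsimage into memory, replay edits in order.
    The merged state becomes the new fsimage; edits starts empty."""
    namespace = dict(fsimage)
    for edit in edits:
        apply_edit(namespace, edit)
    return namespace, []

fsimage = {"/a": {"type": "dir"}}
edits = [("create", "/a/f1"), ("mkdir", "/b"), ("delete", "/a/f1")]
state, new_edits = start_namenode(fsimage, edits)
```

Note that startup cost grows with the length of `edits`, which is exactly why a large edits file makes the next restart slower.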
Availability – Attempt 1 – Secondary namenode 
•Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. 
•The secondary namenode usually runs on a separate physical machine, since it requires plenty of CPU and as much memory as the namenode to perform the merge. 
•It keeps a copy of the merged namespace image, which can be used in the event of the namenode failing. 
•However, the state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost certain. 
•The usual course of action in this case is to copy the namenode's metadata files that are on NFS to the secondary and run it as the new primary. 
•The secondary NameNode stores the latest checkpoint in a directory that is structured the same way as the primary NameNode's directory, 
•so that the checkpointed image is always ready to be read by the primary NameNode if necessary.
Long Recovery 
•To recover from a failed namenode, an administrator starts a new primary namenode with one of the file-system metadata replicas, and configures datanodes and clients to use this new namenode. 
•The new namenode is not able to serve requests until it has 
–loaded its namespace image into memory, 
–replayed its edit log, and 
–received enough block reports from the datanodes to leave safe mode. 
•On large clusters with many files and blocks, the time it takes for a namenode to start from cold can be 30 minutes or more. 
•The long recovery time is a problem for routine maintenance too. 
•In fact, since unexpected failure of the namenode is so rare, the case for planned downtime is actually more important in practice.
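The safe-mode condition in the recovery steps above can be modelled as a simple threshold check (a sketch; in real HDFS the fraction is governed by the `dfs.namenode.safemode.threshold-pct` property, which defaults to 0.999):

```python
def can_leave_safe_mode(blocks_reported, total_blocks, threshold_pct=0.999):
    """A namenode stays in (read-only) safe mode until enough datanodes
    have reported their blocks to reach the configured fraction."""
    if total_blocks == 0:
        return True
    return blocks_reported / total_blocks >= threshold_pct

# With 10,000 blocks, 9,985 reports is not yet enough at 99.9%:
assert not can_leave_safe_mode(9_985, 10_000)
assert can_leave_safe_mode(9_990, 10_000)
```

On a large cluster this waiting-for-block-reports phase dominates cold-start time, since every datanode must scan and report its local blocks.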
Other roads to availability 
•The NameNode persists its namespace using two files: 
–fsimage, the latest checkpoint of the namespace, and 
–edits, a journal (log) of changes to the namespace since the checkpoint. 
•When a NameNode starts up, it merges the fsimage and edits journal to provide an up-to-date view of the file system metadata. 
•The NameNode then overwrites fsimage with the new HDFS state and begins a new edits journal. 
•The secondary name-node acts as a mere checkpointer. 
•It should be transformed into a standby name-node (SNN): a warm standby. 
•Real-time streaming of edits to the SNN would keep its namespace state up to date.
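The proposed transformation, streaming each journal entry to the standby as it is written, can be sketched like this (illustrative Python with hypothetical class and method names, not Hadoop's implementation):

```python
class StandbyNameNode:
    """Warm standby that tails the active's edit stream and applies each
    entry to its own in-memory namespace as it arrives, instead of
    merging batched edits only at checkpoint time."""

    def __init__(self, checkpoint):
        self.namespace = dict(checkpoint)

    def receive_edit(self, op, path):
        if op in ("mkdir", "create"):
            self.namespace[path] = op
        elif op == "delete":
            self.namespace.pop(path, None)

standby = StandbyNameNode({"/": "mkdir"})
# The active streams edits in real time rather than batching them:
for edit in [("mkdir", "/logs"), ("create", "/logs/app.log")]:
    standby.receive_edit(*edit)
# On failover the standby's namespace is already current; no replay needed.
```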
Availability – Attempt 2 – Checkpoint node and Backup node 
•The Checkpoint node periodically creates checkpoints of the namespace. 
•It downloads fsimage and edits from the active NameNode, 
•merges them locally, and uploads the new image back to the active NameNode. 
•The Backup node provides 
–the same checkpointing functionality as the Checkpoint node, 
–as well as maintaining an in-memory, up-to-date copy of the file system namespace 
•It is always synchronized with the active NameNode state, maintaining an up-to-date copy of the filesystem namespace in memory. 
•The active and Backup nodes run on different servers, since their memory requirements are of the same order. 
•The Backup node does not need to download fsimage and edits, since it already has an up-to-date namespace state in memory.
Terminology 
•Active NN 
–NN that is actively serving the read and write operations from the clients. 
•Standby NN 
–this NN waits and becomes active when the Active dies or is unhealthy. 
–Backup Node as in Hadoop release 0.21 could be used to implement the Standby for the “shared-nothing” storage of the filesystem namespace. 
•Cold Standby 
–Standby NN has zero state (e.g. it is started after the Active is declared dead) 
•Warm Standby 
–Standby has partial state: 
–it has loaded fsimage and edit logs but has not received any block reports, or 
–it has loaded fsimage, rolled logs, and all block reports 
•Hot Standby 
–Standby has all or most of the Active's state and can take over immediately
High Level Use Cases 
•Planned Downtime : 
–A Hadoop cluster is often shut down in order to upgrade the software or configuration. 
–A Hadoop cluster of 4000 nodes takes approximately 2 hours to be restarted. 
•Unplanned Downtime or Unresponsive Service. 
–The failover of the Namenode service can occur due to hardware or OS failure, a failure of the Namenode daemon, or because the Namenode daemon becomes unresponsive for a few minutes. 
–While this is not as common as one might expect, such failures can occur at unexpected times and may have an impact on meeting the SLAs of some critical applications.
Specific use case 
1.Single NN configuration; no failover. 
2.Active and Standby with manual failover. 
a)Standby could be cold/warm/hot. 
3.Active and Standby with automatic failover. 
a)Both NNs started, one automatically becomes active and the other standby 
b)Active and Standby running 
c)Active fails, or is unhealthy; Standby takes over. 
d)Active and Standby running; Active is shut down 
e)Active and Standby running, Standby fails. Active continues. 
f)Active running, Standby down for maintenance. Active dies and cannot start. Standby is started and takes over as active. 
g)Both NNs started, only one comes up. It becomes active 
h)Active and Standby running; Active state is unknown (e.g. disconnected from heartbeat) and Standby takes over.
HDFS high-availability (HDFS-HA)
HDFS-HA 
•In this implementation there is a pair of namenodes in an active-standby configuration. 
•In the event of the failure of the active namenode, the standby takes over its duties to continue servicing client requests without a significant interruption. 
•A few architectural changes are needed to allow this to happen: 
–The namenodes must use highly available shared storage to share the edit log. (In the initial implementation of HA this will require an NFS filer, but in future releases more options will be provided, such as a BookKeeper-based system built on ZooKeeper.) 
–When a standby namenode comes up it reads up to the end of the shared edit log to synchronize its state with the active namenode, and then continues to read new entries as they are written by the active namenode. 
–Datanodes must send block reports to both namenodes, since the block mappings are stored in a namenode's memory, and not on disk. 
–Clients must be configured to handle namenode failover, which uses a mechanism that is transparent to users.
NN HA with Shared Storage and Zookeeper
Failover in HDFS-HA 
•If the active namenode fails, then the standby can take over very quickly (in a few tens of seconds) since it has the latest state available in memory: 
–both the latest edit log entries, and 
–an up-to-date block mapping. 
•The actual observed failover time will be longer in practice (around a minute or so), since the system needs to be conservative in deciding that the active namenode has failed. 
•In the unlikely event of the standby being down when the active fails, the administrator can still start the standby from cold. 
•This is no worse than the non-HA case, and from an operational point of view it’s an improvement, since the process is a standard operational procedure built into Hadoop. 
•The transition from the active namenode to the standby is managed by a new entity in the system called the failover controller. 
•Failover controllers are pluggable, but the first implementation uses ZooKeeper to ensure that only one namenode is active. 
•Each namenode runs a lightweight failover controller process whose job is to monitor its namenode for failures (using a simple heartbeating mechanism) and trigger a failover should a namenode fail.
Fencing 
•It is vital for the correct operation of an HA cluster that only one of the NameNodes be Active at a time. 
•Otherwise, the namespace state would quickly diverge between the two, risking data loss or other incorrect results. 
•In order to ensure this and prevent the so-called "split-brain scenario," the administrator must configure at least one fencing method for the shared storage. 
•The HA implementation goes to great lengths to ensure that the previously active namenode is prevented from doing any damage and causing corruption, a method known as fencing. 
•Fencing mechanisms: 
–killing the namenode's process, 
–revoking its access to the shared storage directory (typically by using a vendor-specific NFS command), and 
–disabling its network port via a remote management command. 
–As a last resort, the previously active namenode can be fenced with a technique rather graphically known as STONITH, or “shoot the other node in the head”, which uses a specialized power distribution unit to forcibly power down the host machine.
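The escalating fencing methods listed above can be modelled as a list tried in order until one confirms the old active is isolated (a sketch; the method names and lambdas are hypothetical stand-ins for the real actions):

```python
def fence_old_active(methods):
    """Try each fencing method in order; stop at the first that succeeds.
    Failover may proceed only once some method has confirmed the old
    active is isolated, preventing a split-brain scenario."""
    for name, method in methods:
        if method():
            return name
    raise RuntimeError("all fencing methods failed; aborting failover")

# Hypothetical fencing actions, from gentlest to STONITH:
methods = [
    ("kill-process",   lambda: False),  # kill the old NN process (failed here)
    ("revoke-storage", lambda: True),   # revoke NFS access to shared edits
    ("disable-port",   lambda: True),   # remote management command
    ("stonith",        lambda: True),   # forcibly power down the host
]
assert fence_old_active(methods) == "revoke-storage"
```

The key design point is that failover is aborted, not attempted optimistically, if no fencing method succeeds.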
Client side 
•Client failover is handled transparently by the client library. 
•The simplest implementation uses client-side configuration to control failover. 
•The HDFS URI uses a logical hostname which is mapped to a pair of namenode addresses (in the configuration file), and the client library tries each namenode address until the operation succeeds.
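The client-side behaviour described above can be sketched as a retry loop over the addresses behind the logical hostname (illustrative only; real HDFS clients delegate this to a configured failover proxy provider, and the hostnames below are hypothetical):

```python
def call_with_failover(addresses, rpc):
    """Resolve a logical service name to concrete namenode addresses
    and try each in turn until one accepts the operation."""
    last_error = None
    for addr in addresses:
        try:
            return rpc(addr)
        except ConnectionError as e:
            last_error = e  # namenode down or in standby; try the next one
    raise last_error

# Hypothetical mapping from a logical URI like hdfs://mycluster to a NN pair:
namenodes = ["nn1.example.com:8020", "nn2.example.com:8020"]

def rpc(addr):
    if addr.startswith("nn1"):
        raise ConnectionError("nn1 is down")
    return f"listing from {addr}"

result = call_with_failover(namenodes, rpc)
```

Because the logical name lives in client configuration, failover needs no DNS change and is invisible to users.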
End of session 
Day –1: Availability

Subhas Kumar Ghosh
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
Subhas Kumar Ghosh
 
Hadoop job chaining
Hadoop job chainingHadoop job chaining
Hadoop job chaining
Subhas Kumar Ghosh
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
Subhas Kumar Ghosh
 
Hadoop scheduler
Hadoop schedulerHadoop scheduler
Hadoop scheduler
Subhas Kumar Ghosh
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
Subhas Kumar Ghosh
 

More from Subhas Kumar Ghosh (10)

06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering06 how to write a map reduce version of k-means clustering
06 how to write a map reduce version of k-means clustering
 
05 k-means clustering
05 k-means clustering05 k-means clustering
05 k-means clustering
 
02 data warehouse applications with hive
02 data warehouse applications with hive02 data warehouse applications with hive
02 data warehouse applications with hive
 
06 pig etl features
06 pig etl features06 pig etl features
06 pig etl features
 
05 pig user defined functions (udfs)
05 pig user defined functions (udfs)05 pig user defined functions (udfs)
05 pig user defined functions (udfs)
 
Hadoop performance optimization tips
Hadoop performance optimization tipsHadoop performance optimization tips
Hadoop performance optimization tips
 
Hadoop job chaining
Hadoop job chainingHadoop job chaining
Hadoop job chaining
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
 
Hadoop scheduler
Hadoop schedulerHadoop scheduler
Hadoop scheduler
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
 

Recently uploaded

Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 

Recently uploaded (20)

Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 

Hadoop availability

  • 2. Reliability in distributed system •To be truly reliable, a distributed system must have the following characteristics: –Fault-Tolerant: It can recover from component failures without performing incorrect actions. –Highly Available: It can restore operations, permitting it to resume providing services even when some components have failed. –Recoverable: Failed components can restart themselves and rejoin the system, after the cause of failure has been repaired. –Consistent: The system can coordinate actions by multiple components often in the presence of concurrency and failure. This underlies the ability of a distributed system to act like a non-distributed system. –Scalable: It can operate correctly even as some aspect of the system is scaled to a larger size. For example, we might increase the size of the network on which the system is running. This increases the frequency of network outages and could degrade a "non-scalable" system. Similarly, we might increase the number of users or servers, or overall load on the system. In a scalable system, this should not have a significant effect. –Predictable Performance: The ability to provide desired responsiveness in a timely manner. –Secure: The system authenticates access to data and services
  • 3. SPOF •The combination of –replicating namenode metadata on multiple file-systems, and –using the secondary namenode to create checkpoints •Protects against data loss •But does not provide high-availability of the filesystem.
  • 4. SPOF •The namenode is still •a single point of failure (SPOF), •since if it did fail –all clients –including MapReduce jobs •would be unable to read, write, or list files •Because the namenode is the sole repository of –the metadata and the –file-to-block mapping
  • 5. SPOF •In such an event •The whole Hadoop system would effectively be •“Out of service” •Until a new namenode could be brought online.
  • 6. Reasons for downtime •An important part of improving availability and articulating requirements is understanding the causes of downtime. •There are many types of failures in distributed systems, ways to classify them, and analyses of how failures result in downtime.
  • 7. Maintenance •Maintenance to a master host normally requires a restart of the entire system.
  • 8. Hardware failures •Hosts and their connections may fail •Hardware failures on the master host •or a failure in the connection between the master and the majority of the slaves •can cause system downtime
  • 9. Software failures •Software bugs may cause a component in the system to stop functioning or require a restart. •For example, a bug in upgrade code could result in downtime due to data corruption. •A dependent software component may become unavailable (e.g. the Java garbage collector enters a stop-the-world phase). •A software bug in a master service will likely cause downtime.
  • 10. Software failures •Software failures are a significant issue in distributed systems. •Even with rigorous testing, software bugs account for a substantial fraction of unplanned downtime (estimated at 25-35%). •Residual bugs in mature systems can be classified into two main categories: heisenbugs and bohrbugs.
  • 11. Heisenbug •A bug that seems to disappear or alter its characteristics when it is observed or researched. •A common example is a bug that occurs in a release-mode compile of a program, but not when researched under debug mode. •The name "heisenbug" is a pun on the "Heisenberg uncertainty principle," a quantum physics term which is commonly (yet inaccurately) used to refer to the way in which observers affect the measurements of the things that they are observing, by the act of observing alone (this is actually the observer effect, and is commonly confused with the Heisenberg uncertainty principle).
  • 12. Bohrbug •A bug (named after the Bohr atom model) that, in contrast to a heisenbug, does not disappear or alter its characteristics when it is researched. •A bohrbug typically manifests itself reliably under a well-defined set of conditions.
  • 13. Software failures •Heisenbugs tend to be more prevalent in distributed systems than in local systems. •One reason for this is the difficulty programmers have in obtaining a coherent and comprehensive view of the interactions of concurrent processes.
  • 14. Operator errors •People make mistakes. •Hadoop attempts to limit operator error by simplifying administration, validating its configuration, and providing useful messages in logs and UI components; •however, operator mistakes may still cause downtime.
  • 15. Strategy •[Chart] Strategies plotted by severity of database downtime (planned, unplanned, catastrophic) against latency of recovery: continuous availability (no downtime), high availability (high-availability clusters, switching and warm-standby replication), disaster recovery (cold standby), plus online and offline maintenance for planned downtime.
  • 16. Recall •The NameNode stores modifications to the file system as a log appended to a native file system file, edits. •When a NameNode starts up, it reads HDFS state from an image file, fsimage, and then applies edits from the edits log file. •It then writes new HDFS state to the fsimage and starts normal operation with an empty edits file. •Since the NameNode merges the fsimage and edits files only during start up, the edits log file could get very large over time on a busy cluster. •Another side effect of a larger edits file is that the next restart of the NameNode takes longer.
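The startup sequence above can be sketched in a few lines. This is a minimal illustration, not Hadoop code: the JSON file layout, the operation names, and the function names are all assumptions made for the example; only the load-checkpoint/replay-journal/re-checkpoint pattern mirrors the slide.

```python
# Illustrative sketch of a NameNode-style startup: load the last
# checkpoint (fsimage), replay the journal (edits) on top of it,
# then write a fresh checkpoint and truncate the journal.
import json
import os

def load_state(fsimage_path, edits_path):
    # Start from the last checkpointed namespace image.
    state = {}
    if os.path.exists(fsimage_path):
        with open(fsimage_path) as f:
            state = json.load(f)
    # Replay every logged modification recorded since that checkpoint.
    if os.path.exists(edits_path):
        with open(edits_path) as f:
            for line in f:
                op = json.loads(line)
                if op["op"] == "create":
                    state[op["path"]] = op["blocks"]
                elif op["op"] == "delete":
                    state.pop(op["path"], None)
    return state

def checkpoint(state, fsimage_path, edits_path):
    # Write the merged state as the new fsimage and start an empty edits log.
    with open(fsimage_path, "w") as f:
        json.dump(state, f)
    open(edits_path, "w").close()
```

The sketch also makes the slide's point concrete: the work at startup is proportional to the length of the edits journal, which is why an ever-growing journal makes restarts slower.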
  • 17. Availability – Attempt 1: Secondary namenode •Its main role is to periodically merge the namespace image with the edit log to prevent the edit log from becoming too large. •The secondary namenode usually runs on a separate physical machine, since it requires plenty of CPU and as much memory as the namenode to perform the merge. •It keeps a copy of the merged namespace image, which can be used in the event of the namenode failing. •However, the state of the secondary namenode lags that of the primary, so in the event of total failure of the primary, data loss is almost certain. •The usual course of action in this case is to copy the namenode's metadata files that are on NFS to the secondary and run it as the new primary. •The secondary NameNode stores the latest checkpoint in a directory which is structured the same way as the primary NameNode's directory, •so that the checkpointed image is always ready to be read by the primary NameNode if necessary.
  • 18. Long Recovery •To recover from a failed namenode, an administrator starts a new primary namenode with one of the file-system metadata replicas, and configures datanodes and clients to use this new namenode. •The new namenode is not able to serve requests until it has –loaded its namespace image into memory, –replayed its edit log, and –received enough block reports from the datanodes to leave safe mode. •On large clusters with many files and blocks, the time it takes for a namenode to start from cold can be 30 minutes or more. •The long recovery time is a problem for routine maintenance too. •In fact, since unexpected failure of the namenode is so rare, the case for planned downtime is actually more important in practice.
  • 19. Other roads to availability •The NameNode persists its namespace using two files: –fsimage, which is the latest checkpoint of the namespace, and –edits, a journal (log) of changes to the namespace since the checkpoint. •When a NameNode starts up, it merges the fsimage and edits journal to provide an up-to-date view of the file system metadata. •The NameNode then overwrites fsimage with the new HDFS state and begins a new edits journal. •The secondary name-node acts as a mere checkpointer. •The secondary name-node should be transformed into a standby name-node (SNN). •Make it a warm standby. •Provide real-time streaming of edits to the SNN so that it contains the up-to-date namespace state.
  • 20. Availability – Attempt 2: Backup node / Checkpoint node •The Checkpoint node periodically creates checkpoints of the namespace. •It downloads fsimage and edits from the active NameNode, merges them locally, and uploads the new image back to the active NameNode. •The Backup node provides –the same checkpointing functionality as the Checkpoint node, –as well as maintaining an in-memory, up-to-date copy of the file system namespace, •always synchronized with the active NameNode state. •It maintains an up-to-date copy of the filesystem namespace in memory. •Both run on different servers –primary and backup node– •since the memory requirements are of the same order. •The Backup node does not need to download fsimage and edits, since it already has an up-to-date state of the namespace in memory.
  • 21. Terminology •Active NN –the NN that is actively serving read and write operations from clients. •Standby NN –this NN waits and becomes active when the Active dies or is unhealthy. –The Backup Node as in Hadoop release 0.21 could be used to implement the Standby for the "shared-nothing" storage of the filesystem namespace. •Cold Standby –the Standby NN has zero state (e.g. it is started after the Active is declared dead). •Warm Standby –the Standby has partial state: –it has loaded fsImage and editLogs but has not received any block reports, or –it has loaded fsImage and rolled logs and all block reports. •Hot Standby –the Standby has almost all of the Active's state and can start immediately.
  • 22. High Level Use Cases •Planned Downtime: –A Hadoop cluster is often shut down in order to upgrade the software or configuration. –A Hadoop cluster of 4000 nodes takes approximately 2 hours to be restarted. •Unplanned Downtime or Unresponsive Service: –A failover of the Namenode service can occur due to hardware or OS failure, a failure of the Namenode daemon, or because the Namenode daemon becomes unresponsive for a few minutes. –While this is not as common as one may expect, the failure can occur at unexpected times and may have an impact on meeting the SLAs of some critical applications.
  • 23. Specific use cases 1. Single NN configuration; no failover. 2. Active and Standby with manual failover. a) Standby could be cold/warm/hot. 3. Active and Standby with automatic failover. a) Both NNs started; one automatically becomes active and the other standby. b) Active and Standby running. c) Active fails or is unhealthy; Standby takes over. d) Active and Standby running; Active is shut down. e) Active and Standby running; Standby fails; Active continues. f) Active running, Standby down for maintenance; Active dies and cannot start; Standby is started and takes over as active. g) Both NNs started, only one comes up; it becomes active. h) Active and Standby running; Active state is unknown (e.g. disconnected from heartbeat) and Standby takes over.
  • 25. HDFS-HA •In this implementation there is a pair of namenodes in an active-standby configuration. •In the event of the failure of the active namenode, the standby takes over its duties to continue servicing client requests without a significant interruption. •A few architectural changes are needed to allow this to happen: –The namenodes must use highly-available shared storage to share the edit log. (In the initial implementation of HA this will require an NFS filer, but in future releases more options will be provided, such as a BookKeeper-based system built on ZooKeeper.) –When a standby namenode comes up, it reads up to the end of the shared edit log to synchronize its state with the active namenode, and then continues to read new entries as they are written by the active namenode. –Datanodes must send block reports to both namenodes, since the block mappings are stored in a namenode's memory, and not on disk. –Clients must be configured to handle namenode failover, which uses a mechanism that is transparent to users.
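The second architectural change above, a standby continuously reading new entries from the shared edit log, can be sketched as follows. This is an illustration of the tailing idea only, with the journal modeled as an in-memory list of operations; the class and field names are invented for the example and are not a Hadoop API.

```python
# Illustrative sketch: a standby node tails a shared edits journal,
# applying only the entries written since its last read, so that its
# in-memory namespace stays close to the active node's state.
class Standby:
    def __init__(self):
        self.state = {}   # path -> block list (toy namespace)
        self.offset = 0   # number of journal entries already applied

    def tail(self, shared_edits):
        # Apply entries past the point we have already replayed.
        for op in shared_edits[self.offset:]:
            if op["op"] == "create":
                self.state[op["path"]] = op["blocks"]
            elif op["op"] == "delete":
                self.state.pop(op["path"], None)
        self.offset = len(shared_edits)
```

Because the standby tracks an offset rather than re-reading the whole log, each tailing pass does work proportional only to what the active node has written since the last pass.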
  • 26. NN HA with Shared Storage and ZooKeeper
  • 27. Failover in HDFS-HA •If the active namenode fails, then the standby can take over very quickly (in a few tens of seconds) since it has the latest state available in memory: –both the latest edit log entries, and –an up-to-date block mapping. •The actual observed failover time will be longer in practice (around a minute or so), since the system needs to be conservative in deciding that the active namenode has failed. •In the unlikely event of the standby being down when the active fails, the administrator can still start the standby from cold. •This is no worse than the non-HA case, and from an operational point of view it's an improvement, since the process is a standard operational procedure built into Hadoop. •The transition from the active namenode to the standby is managed by a new entity in the system called the failover controller. •Failover controllers are pluggable, but the first implementation uses ZooKeeper to ensure that only one namenode is active. •Each namenode runs a lightweight failover controller process whose job is to monitor its namenode for failures (using a simple heartbeating mechanism) and trigger a failover should a namenode fail.
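The "conservative in deciding that the active namenode has failed" point can be made concrete with a toy heartbeat monitor. This is a sketch of the decision rule only (the real controller coordinates through ZooKeeper); the class, the 60-second grace period, and the injectable clock are assumptions for the example.

```python
# Illustrative heartbeat-based failover decision: fail over only after
# the active node has missed heartbeats for longer than a grace period,
# so a transient stall does not trigger an unnecessary failover.
import time

class FailoverController:
    def __init__(self, grace_seconds=60.0, clock=time.monotonic):
        self.grace = grace_seconds
        self.clock = clock                 # injectable for testing
        self.last_heartbeat = clock()      # assume healthy at startup

    def on_heartbeat(self):
        # Called each time the monitored namenode reports in.
        self.last_heartbeat = self.clock()

    def should_failover(self):
        # True only once the silence exceeds the grace period.
        return self.clock() - self.last_heartbeat > self.grace
```

The grace period is exactly the trade-off the slide describes: a longer period means slower observed failover, a shorter one risks failing over on a garbage-collection pause or a brief network hiccup.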
  • 28. Fencing •It is vital for the correct operation of an HA cluster that only one of the NameNodes be Active at a time. •Otherwise, the namespace state would quickly diverge between the two, risking data loss or other incorrect results. •In order to ensure this and prevent the so-called "split-brain scenario," the administrator must configure at least one fencing method for the shared storage. •The HA implementation goes to great lengths to ensure that the previously active namenode is prevented from doing any damage and causing corruption, a technique known as fencing. •Fencing mechanisms include: –killing the namenode's process, –revoking its access to the shared storage directory (typically by using a vendor-specific NFS command), and –disabling its network port via a remote management command. –As a last resort, the previously active namenode can be fenced with a technique rather graphically known as STONITH, or "shoot the other node in the head", which uses a specialized power distribution unit to forcibly power down the host machine.
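The escalation logic implied by the slide (try the gentlest method first, fall through to STONITH last) can be sketched as an ordered list of attempts. The function name and the method names in the usage line are illustrative labels mirroring the slide, not a Hadoop configuration format.

```python
# Illustrative escalating fencing: try each configured method in order
# and stop at the first one that confirms the old active node is fenced.
# Failing over without a successful fence would risk split-brain.
def fence(methods):
    """methods: ordered list of (name, attempt) pairs; each attempt is a
    callable returning True if it successfully fenced the old active."""
    for name, attempt in methods:
        if attempt():
            return name  # old active confirmed fenced; safe to fail over
    raise RuntimeError("all fencing methods failed; unsafe to fail over")
```

A caller would order the methods from least to most drastic, e.g. `fence([("kill_process", ...), ("revoke_storage_access", ...), ("disable_port", ...), ("stonith", ...)])`, and must abort the failover if the function raises.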
  • 29. Client side •Client failover is handled transparently by the client library. •The simplest implementation uses client-side configuration to control failover. •The HDFS URI uses a logical hostname which is mapped to a pair of namenode addresses (in the configuration file), and the client library tries each namenode address until the operation succeeds.
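The retry loop described on this slide can be sketched directly: resolve the logical name to a list of addresses, then try each in turn. The function name, the `rpc` callable, and the addresses in the test are invented for the example; only the try-each-until-success behavior comes from the slide.

```python
# Illustrative client-side failover: a logical service name maps to a
# list of namenode addresses, and the client retries the operation
# against each address until one succeeds.
def call_with_failover(addresses, rpc):
    """rpc(address) performs the operation against one namenode and
    raises ConnectionError if that namenode is down or in standby."""
    last_error = None
    for addr in addresses:
        try:
            return rpc(addr)
        except ConnectionError as err:
            last_error = err  # remember and try the next namenode
    raise last_error  # every configured namenode refused the call
```

This is why the mechanism is transparent to users: the application keeps calling through the same logical name, and the library absorbs the address change during a failover.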
  • 30. End of session, Day 1: Availability