SlideShare a Scribd company logo
HDFS BASICS
Pre-requisites
 Basic understanding of what file system means
 Practical working knowledge of UNIX file system basics
HDFS : Hadoop distributed file system
(BASIC HDFS)
Architecture of HDFS
How does HDFS store the file internally
Failure handling and recovery mechanism
Rack awareness
Role of name node and secondary name node
When to use HDFS and when not to use it
Agenda
HDFS – The Storage Layer in Hadoop
Splitting of file into blocks in HDFS
File Storage in HDFS
7
Multilayer switches/routers interconnect the switches on each
rack
Hadoop cluster deployed in a production
environment into multiple racks
DN4
DN3
DN2
DN5
DN1
DN6
DN7
DN8
DN9
DN10
DN11
DN12
Q)What happens in the event of a data node
Failure ? (eg : DN 10 fails)
A)Data saved on that node will be lost
To avoid loss of data, copies of the
Data blocks on data node is stored on multiple
data nodes. This is called data replication.
Failure of a data node
 How many copies of each block to save?
Its decided by REPLICATION FACTOR (by
default its 3, i.e. every block of data on each
data node is saved on 2 more machines so that
there is total 3 copies of the same data block on
different machines)
This replication factor can be set on per file
basis while the file is being written to HDFS for
the first time.
9
Replication of data blocks
Design and Architecture Overview
DN4
DN3
DN2
DN5
DN1
DN6
DN7
DN8
DN9
DN10
DN11
DN12
Replication factor =2
DN10 fails
Replication factor for block N2 is now 1 !
Data replication on failure and failed node recovers
DN4
DN3
DN2
DN5
DN1
DN6
DN7
DN8
DN9
DN10
DN11
DN12
Replication factor =2
DN10 fails
The data on the failed node would be copied
to some other node in the cluster
automatically. In this case its copied to
DN12 from its nearest neighbour DN11
Data replication on failure and failed node recovers Pt2
DN4
DN3
DN2
DN5
DN1
DN6
DN7
DN8
DN9
DN10
DN11
DN12
Replication factor = 3 for block N2
HDFS will delete one extra copy of N2 from
any of the 3 nodes (DN10, DN11 or DN12)
This ensures that the replication count is
maintained all the time
DN10 is up again after some time …
Q. How does namenode choose which datanodes to store replicas on?
 Replica Placements are rack aware. Namenode uses the network
location when determining where to place block replicas.
 Tradeoff: Reliability v/s read/write bandwidth e.g.
– If all replica is on single node - lowest write bandwidth but no redundancy if nodes fails
– If replica is off-rack - real redundancy but high read bandwidth (more time)
– If replica is off datacenter – best redundancy at the cost of huge bandwidth
 Hadoop’s default strategy:
– 1st replica on same node as client
– 2nd replica on off rack any random node
– 3rd replica is same rack as 2nd but other node
 Clients always read from the nearest node
 Once the replica locations is chosen a
pipeline is built taking network topology
into account
Replica Placement Strategy
15
I know where
the file blocks are..
I shall back up the data of
name node
BEWARE !
I do not work in HOT STANDBY
mode in the event of name node
failure…..
In Hadoop 1.0, there is no active standby secondary name node.
(HA : Highly available is another term used for HOT/ACTIVE STANDBY )
If the name node fails, the entire cluster goes down ! We need to manually restart
The name node and the contents of the secondary name node has to be copied to it.
Name node & secondary name node in Hadoop 1.0
 HDFS is Good for…
 Storing large files
 Terabytes, Petabytes, etc.
 millions rather billions of files (less number of large files)
 Each file typically 100MB or more
 Streaming data
 WORM - write once read many times patterns
 Optimized for batch/streaming reads rather than random reads
 Append operation added to Hadoop 0.21
 Cheap commodity hardware
 HDFS is not so Good for…
 Large amount of small files
 Better for less no of large files instead of more small files
 Low latency reads
 Many writes: write once, no random writes, append mode write at end of file
When to/not to use HDFS?
17
Introduction to HDFS
Replication factor
Rack awareness
When not to use HDFS
17
Summary

More Related Content

Similar to HDFS+basics.pptx

Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
Bhavesh Padharia
 
Hdfs
HdfsHdfs
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
MindsMapped Consulting
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
senthil0809
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
Delhi/NCR HUG
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
Adam Kawa
 
Hadoop -HDFS.ppt
Hadoop -HDFS.pptHadoop -HDFS.ppt
Hadoop -HDFS.ppt
RamyaMurugesan12
 
13160296.ppt
13160296.ppt13160296.ppt
13160296.ppt
snowflakebatch
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hari Shankar Sreekumar
 
module 2.pptx
module 2.pptxmodule 2.pptx
module 2.pptx
ssuser6e8e41
 
HDFS.ppt
HDFS.pptHDFS.ppt
HDFS.ppt
ssuserec53e73
 
Hdfs
HdfsHdfs
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
Mahendran Ponnusamy
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
Jazan University
 
Anatomy of file read in hadoop
Anatomy of file read in hadoopAnatomy of file read in hadoop
Anatomy of file read in hadoop
Rajesh Ananda Kumar
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Yahoo Developer Network
 
Unit 1
Unit 1Unit 1
HDFS.ppt
HDFS.pptHDFS.ppt
HDFS.ppt
ssuserec53e73
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Simplilearn
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impala
NAVER D2
 

Similar to HDFS+basics.pptx (20)

Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
Hdfs
HdfsHdfs
Hdfs
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 
Hadoop -HDFS.ppt
Hadoop -HDFS.pptHadoop -HDFS.ppt
Hadoop -HDFS.ppt
 
13160296.ppt
13160296.ppt13160296.ppt
13160296.ppt
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
module 2.pptx
module 2.pptxmodule 2.pptx
module 2.pptx
 
HDFS.ppt
HDFS.pptHDFS.ppt
HDFS.ppt
 
Hdfs
HdfsHdfs
Hdfs
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Anatomy of file read in hadoop
Anatomy of file read in hadoopAnatomy of file read in hadoop
Anatomy of file read in hadoop
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
 
Unit 1
Unit 1Unit 1
Unit 1
 
HDFS.ppt
HDFS.pptHDFS.ppt
HDFS.ppt
 
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
Hadoop Architecture | HDFS Architecture | Hadoop Architecture Tutorial | HDFS...
 
(Aaron myers) hdfs impala
(Aaron myers)   hdfs impala(Aaron myers)   hdfs impala
(Aaron myers) hdfs impala
 

Recently uploaded

一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
fkyes25
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
74nqk8xf
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 

Recently uploaded (20)

一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Natural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptxNatural Language Processing (NLP), RAG and its applications .pptx
Natural Language Processing (NLP), RAG and its applications .pptx
 
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
一比一原版(牛布毕业证书)牛津布鲁克斯大学毕业证如何办理
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 

HDFS+basics.pptx

  • 2. Pre-requisites  Basic understanding of what file system means  Practical working knowledge of UNIX file system basics HDFS : Hadoop distributed file system (BASIC HDFS)
  • 3. Architecture of HDFS How does HDFS store the file internally Failure handling and recovery mechanism Rack awareness Role of name node and secondary name node When to use HDFS and when not to use it Agenda
  • 4. HDFS – The Storage Layer in Hadoop
  • 5. Splitting of file into blocks in HDFS
  • 7. 7 Multilayer switches/routers interconnect the switches on each rack Hadoop cluster deployed in a production environment into multiple racks
  • 8. DN4 DN3 DN2 DN5 DN1 DN6 DN7 DN8 DN9 DN10 DN11 DN12 Q)What happens in the event of a data node Failure ? (eg : DN 10 fails) A)Data saved on that node will be lost To avoid loss of data, copies of the Data blocks on data node is stored on multiple data nodes. This is called data replication. Failure of a data node
  • 9.  How many copies of each block to save? Its decided by REPLICATION FACTOR (by default its 3, i.e. every block of data on each data node is saved on 2 more machines so that there is total 3 copies of the same data block on different machines) This replication factor can be set on per file basis while the file is being written to HDFS for the first time. 9 Replication of data blocks
  • 11. DN4 DN3 DN2 DN5 DN1 DN6 DN7 DN8 DN9 DN10 DN11 DN12 Replication factor =2 DN10 fails Replication factor for block N2 is now 1 ! Data replication on failure and failed node recovers
  • 12. DN4 DN3 DN2 DN5 DN1 DN6 DN7 DN8 DN9 DN10 DN11 DN12 Replication factor =2 DN10 fails The data on the failed node would be copied to some other node in the cluster automatically. In this case its copied to DN12 from its nearest neighbour DN11 Data replication on failure and failed node recovers Pt2
  • 13. DN4 DN3 DN2 DN5 DN1 DN6 DN7 DN8 DN9 DN10 DN11 DN12 Replication factor = 3 for block N2 HDFS will delete one extra copy of N2 from any of the 3 nodes (DN10, DN11 or DN12) This ensures that the replication count is maintained all the time DN10 is up again after some time …
  • 14. Q. How does namenode choose which datanodes to store replicas on?  Replica Placements are rack aware. Namenode uses the network location when determining where to place block replicas.  Tradeoff: Reliability v/s read/write bandwidth e.g. – If all replica is on single node - lowest write bandwidth but no redundancy if nodes fails – If replica is off-rack - real redundancy but high read bandwidth (more time) – If replica is off datacenter – best redundancy at the cost of huge bandwidth  Hadoop’s default strategy: – 1st replica on same node as client – 2nd replica on off rack any random node – 3rd replica is same rack as 2nd but other node  Clients always read from the nearest node  Once the replica locations is chosen a pipeline is built taking network topology into account Replica Placement Strategy
  • 15. 15 I know where the file blocks are.. I shall back up the data of name node BEWARE ! I do not work in HOT STANDBY mode in the event of name node failure….. In Hadoop 1.0, there is no active standby secondary name node. (HA : Highly available is another term used for HOT/ACTIVE STANDBY ) If the name node fails, the entire cluster goes down ! We need to manually restart The name node and the contents of the secondary name node has to be copied to it. Name node & secondary name node in Hadoop 1.0
  • 16.  HDFS is Good for…  Storing large files  Terabytes, Petabytes, etc.  millions rather billions of files (less number of large files)  Each file typically 100MB or more  Streaming data  WORM - write once read many times patterns  Optimized for batch/streaming reads rather than random reads  Append operation added to Hadoop 0.21  Cheap commodity hardware  HDFS is not so Good for…  Large amount of small files  Better for less no of large files instead of more small files  Low latency reads  Many writes: write once, no random writes, append mode write at end of file When to/not to use HDFS?
  • 17. 17 Introduction to HDFS Replication factor Rack awareness When not to use HDFS 17 Summary