SlideShare a Scribd company logo
HDFS 
HADOOP DISTRIBUTED FILE SYSTEM
What is HDFS 
HDFS is a Filesystem designed for storing very large files running on 
cluster of commodity hardware 
It provides fault-tolerant storage layer for Hadoop and other sub-projects 
It is a distributed file system that provides high-throughput access to 
application data 
It stores data reliably even in case of hardware faliure
HDFS Architecture 
There are two types of nodes, works in master-slaves pattern 
◦ Master (or called as Namenode) – It manages all the slave nodes and assign 
work to slaves 
◦ Slave (or called as Datanode) – Actual worker nodes, which does actual task
HDFS Architecture
Distributed storage 
Data is distributed across the cluster of nodes 
Data is splitted into smaller pieces and stored across multiple nodes of 
the cluster 
In this way it provides a way to Map-Reduce to process subset of large 
data parallely on multiple nodes 
Distributed storage also provides fault-tolerance capability
Namenode Datanode 
Two Daemons Runs for HDFS (for storage) 
◦ NameNode 
◦ This daemon runs on Master 
◦ Namenode stores all the metadata like filename, path, number of blocks, blockIds, block 
locations, number of replicas, slave related config, etc. 
◦ DataNode 
◦ This daemon runs on all the slaves 
◦ Datanodes stores the actual data
Namenode Datanode
Blocks 
Files are splitted into number of blocks 
Unlike 1 KB block size of OS filesystem, HDFS’s default block size is 64 
MB (which can be increased according to the requirements) 
It saves disk seek time, and other advantage is in the case of processing 
(1 Mapper processes 1 block at a time)
Blocks
Replication 
In HDFS all the data blocks are replicated across the cluster of nodes 
Ideally one replica is present at one geographical location 
The default replication factor is 3, which can be changed according to 
the requirements. 
We can change replication factor to desired value by editing 
configuration files
High Availability 
HDFS provides high availability of data by creating multiple replicas of 
data blocks 
If a network link or node or some hardware goes down data will be 
available from some other path 
If a hardware goes down data can be accessed from other nodes or 
other paths as we have replicated data (default 3 replicas)
Reliable Data Storage 
Data is stored quite reliably in HDFS, this feature is related to High 
Availability. 
Since data is highly available due to replication, hence if a node or few 
nodes or some hardware crashes, the data will be stored reliably. 
We can again balance the cluster by making the replication factor to 
desirable value (by running few commands) 
In other words we can say Hadoop (HDFS) is fault tolerant
Reliable Data Storage
Scalability 
Two ways to scale the Filesystem 
◦ Add more disks on the nodes of the cluster 
◦ For this we need to edit the configuration files and add corresponding entries of newly added 
disks 
◦ Horizontal Scaling 
◦ Another option is to add more nodes in the cluster 
◦ We can add n number of nodes in the cluster on the fly (without any downtime)
File Access & other Operations 
Programming 
CLI (Command Line Interface) 
We can perform almost all the operations which we can do on local 
filesystem 
HDFS provides different access rights rwx to ugo 
We can also browse the filesystem by browser (http://Master-IP:50070)
Read a File 
To read a file from HDFS, client needs to interact with Namenode, as 
namenode is single place where all the the metadata is stored 
Namenode provides the address of the slaves where file is stored 
Client will interact with respective datanodes to read the file 
Namenode also provide a token to the client which it shows to data 
node for authentication
Read a File
Write a File 
To write a file into HDFS, client needs to interact with namenode. 
Namenode provides the address of the slave on which client will start 
writing the data 
As soon as client finishes writing the block, the slave starts copying the 
block to another slave which in tern copy the block to another slave (3 
replicas by default) 
After required replicas are created then it will send the acknowledge to 
the client 
Authentication process (same as file read)
Write a File
Thank You

More Related Content

What's hot

Distributed file systems dfs
Distributed file systems   dfsDistributed file systems   dfs
Distributed file systems dfs
Pragati Startup Presentation Designer firm
 
Hadoop at a glance
Hadoop at a glanceHadoop at a glance
Hadoop at a glance
Tan Tran
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
yaevents
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
Konstantin V. Shvachko
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
Apache Apex
 
Interacting with hdfs
Interacting with hdfsInteracting with hdfs
Interacting with hdfs
Pradeep Kumbhar
 
Hadoop HDFS
Hadoop HDFSHadoop HDFS
Hadoop HDFS
Vigen Sahakyan
 
Snapshot in Hadoop Distributed File System
Snapshot in Hadoop Distributed File SystemSnapshot in Hadoop Distributed File System
Snapshot in Hadoop Distributed File System
Bhavesh Padharia
 
March 2011 HUG: HDFS Federation
March 2011 HUG: HDFS FederationMarch 2011 HUG: HDFS Federation
March 2011 HUG: HDFS Federation
Yahoo Developer Network
 
Hadoop and HDFS
Hadoop and HDFSHadoop and HDFS
Hadoop and HDFS
SatyaHadoop
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
Subhas Kumar Ghosh
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed Introduction
Hanborq Inc.
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
Biju Nair
 
Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HA
Hanborq Inc.
 
Coordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed SystemsCoordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed Systems
Konstantin V. Shvachko
 
HDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsHDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once Semantics
DataWorks Summit
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
tutorialvillage
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
Uday Vakalapudi
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
ProTechSkills Training
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
tutorialvillage
 

What's hot (20)

Distributed file systems dfs
Distributed file systems   dfsDistributed file systems   dfs
Distributed file systems dfs
 
Hadoop at a glance
Hadoop at a glanceHadoop at a glance
Hadoop at a glance
 
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, FacebookМасштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
Масштабируемость Hadoop в Facebook. Дмитрий Мольков, Facebook
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Interacting with hdfs
Interacting with hdfsInteracting with hdfs
Interacting with hdfs
 
Hadoop HDFS
Hadoop HDFSHadoop HDFS
Hadoop HDFS
 
Snapshot in Hadoop Distributed File System
Snapshot in Hadoop Distributed File SystemSnapshot in Hadoop Distributed File System
Snapshot in Hadoop Distributed File System
 
March 2011 HUG: HDFS Federation
March 2011 HUG: HDFS FederationMarch 2011 HUG: HDFS Federation
March 2011 HUG: HDFS Federation
 
Hadoop and HDFS
Hadoop and HDFSHadoop and HDFS
Hadoop and HDFS
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed Introduction
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HA
 
Coordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed SystemsCoordinating Metadata Replication: Survival Strategy for Distributed Systems
Coordinating Metadata Replication: Survival Strategy for Distributed Systems
 
HDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once SemanticsHDFS Trunncate: Evolving Beyond Write-Once Semantics
HDFS Trunncate: Evolving Beyond Write-Once Semantics
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 

Viewers also liked

Distributed File Systems
Distributed File SystemsDistributed File Systems
Distributed File Systems
awesomesos
 
Distributed File Systems: An Overview
Distributed File Systems: An OverviewDistributed File Systems: An Overview
Distributed File Systems: An Overview
Anant Narayanan
 
Chapter 8 distributed file systems
Chapter 8 distributed file systemsChapter 8 distributed file systems
Chapter 8 distributed file systemsAbDul ThaYyal
 
Chapter 17 - Distributed File Systems
Chapter 17 - Distributed File SystemsChapter 17 - Distributed File Systems
Chapter 17 - Distributed File Systems
Wayne Jones Jnr
 
Unico Riv Maya
Unico Riv MayaUnico Riv Maya
Unico Riv Maya
chglat
 
Royalton Punta Cana
Royalton Punta Cana Royalton Punta Cana
Royalton Punta Cana
chglat
 
Друзья-редакторы. Как СМИ становятся социальными
Друзья-редакторы. Как СМИ становятся социальнымиДрузья-редакторы. Как СМИ становятся социальными
Друзья-редакторы. Как СМИ становятся социальными
Vsevolod Pulya
 
R meetup talk
R meetup talkR meetup talk
R meetup talk
Joseph Adler
 
Livingwaterspresents
LivingwaterspresentsLivingwaterspresents
Livingwaterspresents
Runner00
 
Sono Innamorato Di Te
Sono Innamorato Di TeSono Innamorato Di Te
Sono Innamorato Di Tedargula70
 
Secrets Playa Mujeres
Secrets Playa MujeresSecrets Playa Mujeres
Secrets Playa Mujeres
chglat
 
Kauai
KauaiKauai
Kauai
chglat
 
Spencer Bahamas Option
Spencer Bahamas OptionSpencer Bahamas Option
Spencer Bahamas Optionchglat
 
Ocsid test
Ocsid testOcsid test
Ocsid testenhee_92
 
Креативный. 3-хлетие. планы на ближайшее время
Креативный. 3-хлетие. планы на ближайшее времяКреативный. 3-хлетие. планы на ближайшее время
Креативный. 3-хлетие. планы на ближайшее времяAlexandr Mikhailov
 
Servant plan II
Servant plan IIServant plan II
Servant plan IImgiiicm
 

Viewers also liked (20)

Distributed File Systems
Distributed File SystemsDistributed File Systems
Distributed File Systems
 
Distributed File Systems: An Overview
Distributed File Systems: An OverviewDistributed File Systems: An Overview
Distributed File Systems: An Overview
 
Chapter 8 distributed file systems
Chapter 8 distributed file systemsChapter 8 distributed file systems
Chapter 8 distributed file systems
 
Chapter 17 - Distributed File Systems
Chapter 17 - Distributed File SystemsChapter 17 - Distributed File Systems
Chapter 17 - Distributed File Systems
 
Unico Riv Maya
Unico Riv MayaUnico Riv Maya
Unico Riv Maya
 
Royalton Punta Cana
Royalton Punta Cana Royalton Punta Cana
Royalton Punta Cana
 
Друзья-редакторы. Как СМИ становятся социальными
Друзья-редакторы. Как СМИ становятся социальнымиДрузья-редакторы. Как СМИ становятся социальными
Друзья-редакторы. Как СМИ становятся социальными
 
R meetup talk
R meetup talkR meetup talk
R meetup talk
 
24 7
24 724 7
24 7
 
Livingwaterspresents
LivingwaterspresentsLivingwaterspresents
Livingwaterspresents
 
Sono Innamorato Di Te
Sono Innamorato Di TeSono Innamorato Di Te
Sono Innamorato Di Te
 
Secrets Playa Mujeres
Secrets Playa MujeresSecrets Playa Mujeres
Secrets Playa Mujeres
 
Kauai
KauaiKauai
Kauai
 
DB - Leadership experiences
DB -  Leadership experiencesDB -  Leadership experiences
DB - Leadership experiences
 
2013 10 31 halloween
2013 10 31 halloween2013 10 31 halloween
2013 10 31 halloween
 
Yasmin2
Yasmin2Yasmin2
Yasmin2
 
Spencer Bahamas Option
Spencer Bahamas OptionSpencer Bahamas Option
Spencer Bahamas Option
 
Ocsid test
Ocsid testOcsid test
Ocsid test
 
Креативный. 3-хлетие. планы на ближайшее время
Креативный. 3-хлетие. планы на ближайшее времяКреативный. 3-хлетие. планы на ближайшее время
Креативный. 3-хлетие. планы на ближайшее время
 
Servant plan II
Servant plan IIServant plan II
Servant plan II
 

Similar to Hdfs

Introduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxIntroduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptx
sunithachphd
 
module 2.pptx
module 2.pptxmodule 2.pptx
module 2.pptx
ssuser6e8e41
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
Bhavesh Padharia
 
big data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing databig data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing data
preetik9044
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
srikanthhadoop
 
Hadoop
HadoopHadoop
Hadoop
HadoopHadoop
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
sudhakara st
 
Introduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptxIntroduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptx
SakthiVinoth78
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Simplilearn
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
Adam Kawa
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
senthil0809
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data Analytics
DrPDShebaKeziaMalarc
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
Kalyan Hadoop
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemMilad Sobhkhiz
 
HADOOP.pptx
HADOOP.pptxHADOOP.pptx
HADOOP.pptx
Bharathi567510
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
Subhas Kumar Ghosh
 
Big Data Reverse Knowledge Transfer.pptx
Big Data Reverse Knowledge Transfer.pptxBig Data Reverse Knowledge Transfer.pptx
Big Data Reverse Knowledge Transfer.pptx
ssuser8c3ea7
 

Similar to Hdfs (20)

Introduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptxIntroduction_to_HDFS sun.pptx
Introduction_to_HDFS sun.pptx
 
module 2.pptx
module 2.pptxmodule 2.pptx
module 2.pptx
 
Introduction to HDFS
Introduction to HDFSIntroduction to HDFS
Introduction to HDFS
 
big data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing databig data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing data
 
Hadoop distributed file system
Hadoop distributed file systemHadoop distributed file system
Hadoop distributed file system
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop HDFS Architeture and Design
Hadoop HDFS Architeture and DesignHadoop HDFS Architeture and Design
Hadoop HDFS Architeture and Design
 
Introduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptxIntroduction to Hadoop Distributed File System(HDFS).pptx
Introduction to Hadoop Distributed File System(HDFS).pptx
 
Unit 1
Unit 1Unit 1
Unit 1
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
 
Apache Hadoop In Theory And Practice
Apache Hadoop In Theory And PracticeApache Hadoop In Theory And Practice
Apache Hadoop In Theory And Practice
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data Analytics
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
HADOOP.pptx
HADOOP.pptxHADOOP.pptx
HADOOP.pptx
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
Hadoop
HadoopHadoop
Hadoop
 
Big Data Reverse Knowledge Transfer.pptx
Big Data Reverse Knowledge Transfer.pptxBig Data Reverse Knowledge Transfer.pptx
Big Data Reverse Knowledge Transfer.pptx
 

More from Chirag Ahuja

Deploy hadoop cluster
Deploy hadoop clusterDeploy hadoop cluster
Deploy hadoop cluster
Chirag Ahuja
 
Word count example in hadoop mapreduce using java
Word count example in hadoop mapreduce using javaWord count example in hadoop mapreduce using java
Word count example in hadoop mapreduce using java
Chirag Ahuja
 
Big data introduction
Big data introductionBig data introduction
Big data introduction
Chirag Ahuja
 
Flume
FlumeFlume
Hbase
HbaseHbase
Pig
PigPig
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
Chirag Ahuja
 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
Chirag Ahuja
 
MapReduce basic
MapReduce basicMapReduce basic
MapReduce basic
Chirag Ahuja
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
Chirag Ahuja
 

More from Chirag Ahuja (10)

Deploy hadoop cluster
Deploy hadoop clusterDeploy hadoop cluster
Deploy hadoop cluster
 
Word count example in hadoop mapreduce using java
Word count example in hadoop mapreduce using javaWord count example in hadoop mapreduce using java
Word count example in hadoop mapreduce using java
 
Big data introduction
Big data introductionBig data introduction
Big data introduction
 
Flume
FlumeFlume
Flume
 
Hbase
HbaseHbase
Hbase
 
Pig
PigPig
Pig
 
Hive : WareHousing Over hadoop
Hive :  WareHousing Over hadoopHive :  WareHousing Over hadoop
Hive : WareHousing Over hadoop
 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
 
MapReduce basic
MapReduce basicMapReduce basic
MapReduce basic
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 

Recently uploaded

一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
Tiktokethiodaily
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
ewymefz
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 

Recently uploaded (20)

一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 

Hdfs

  • 2. What is HDFS HDFS is a Filesystem designed for storing very large files running on cluster of commodity hardware It provides fault-tolerant storage layer for Hadoop and other sub-projects It is a distributed file system that provides high-throughput access to application data It stores data reliably even in case of hardware faliure
  • 3. HDFS Architecture There are two types of nodes, works in master-slaves pattern ◦ Master (or called as Namenode) – It manages all the slave nodes and assign work to slaves ◦ Slave (or called as Datanode) – Actual worker nodes, which does actual task
  • 5. Distributed storage Data is distributed across the cluster of nodes Data is splitted into smaller pieces and stored across multiple nodes of the cluster In this way it provides a way to Map-Reduce to process subset of large data parallely on multiple nodes Distributed storage also provides fault-tolerance capability
  • 6. Namenode Datanode Two Daemons Runs for HDFS (for storage) ◦ NameNode ◦ This daemon runs on Master ◦ Namenode stores all the metadata like filename, path, number of blocks, blockIds, block locations, number of replicas, slave related config, etc. ◦ DataNode ◦ This daemon runs on all the slaves ◦ Datanodes stores the actual data
  • 8. Blocks Files are splitted into number of blocks Unlike 1 KB block size of OS filesystem, HDFS’s default block size is 64 MB (which can be increased according to the requirements) It saves disk seek time, and other advantage is in the case of processing (1 Mapper processes 1 block at a time)
  • 10. Replication In HDFS all the data blocks are replicated across the cluster of nodes Ideally one replica is present at one geographical location The default replication factor is 3, which can be changed according to the requirements. We can change replication factor to desired value by editing configuration files
  • 11. High Availability HDFS provides high availability of data by creating multiple replicas of data blocks If a network link or node or some hardware goes down data will be available from some other path If a hardware goes down data can be accessed from other nodes or other paths as we have replicated data (default 3 replicas)
  • 12. Reliable Data Storage Data is stored quite reliably in HDFS, this feature is related to High Availability. Since data is highly available due to replication, hence if a node or few nodes or some hardware crashes, the data will be stored reliably. We can again balance the cluster by making the replication factor to desirable value (by running few commands) In other words we can say Hadoop (HDFS) is fault tolerant
  • 14. Scalability Two ways to scale the Filesystem ◦ Add more disks on the nodes of the cluster ◦ For this we need to edit the configuration files and add corresponding entries of newly added disks ◦ Horizontal Scaling ◦ Another option is to add more nodes in the cluster ◦ We can add n number of nodes in the cluster on the fly (without any downtime)
  • 15. File Access & other Operations Programming CLI (Command Line Interface) We can perform almost all the operations which we can do on local filesystem HDFS provides different access rights rwx to ugo We can also browse the filesystem by browser (http://Master-IP:50070)
  • 16. Read a File To read a file from HDFS, client needs to interact with Namenode, as namenode is single place where all the the metadata is stored Namenode provides the address of the slaves where file is stored Client will interact with respective datanodes to read the file Namenode also provide a token to the client which it shows to data node for authentication
  • 18. Write a File To write a file into HDFS, client needs to interact with namenode. Namenode provides the address of the slave on which client will start writing the data As soon as client finishes writing the block, the slave starts copying the block to another slave which in tern copy the block to another slave (3 replicas by default) After required replicas are created then it will send the acknowledge to the client Authentication process (same as file read)