Hadoop Distributed File System:
The Architecture
Aisha Siddiqa
aasiddiqa@gmail.com
Outline
• Motivation
• Introduction
• Basic Features
• Architecture
• Namenode
• Datanodes
• File System namespace
• Replication
• Replica Placement
• Replica Selection
• Namenode Startup
• Conclusion
Motivation
• Recent research trends focus on exploring and developing solutions for big data
• Hadoop is one of the most popular frameworks for analyzing big data
• Working effectively with Hadoop requires understanding the distributed file system it is built on
INTRODUCTION
Basic Features
• Highly fault-tolerant
• Suitable for applications with large data sets
• High throughput
• Streaming access to file system data
• Can be built out of commodity hardware
• Platform independent
• Write-once-read-many access model; appending to files is supported
• MapReduce applications fit this model well
ARCHITECTURE
Multi-Node Cluster
[Figure: HDFS multi-node cluster. Components: Client, NameNode holding metadata (name, replicas, ...), and DataNodes storing blocks on Rack 1 and Rack 2. Labeled flows: metadata ops, block ops, read, write, replication.]
Master/slave architecture
Namenode
• A single Namenode per cluster
• Manages the file system namespace and regulates client access to files
Datanodes
• Many Datanodes, usually one per node in the cluster
• Manage the storage attached to the nodes they run on
• Serve read/write requests and perform block creation, deletion, and replication on instruction from the Namenode
• Running multiple Datanodes on the same machine is rare
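The division of labor above can be sketched as a toy read path: the client asks the Namenode only for metadata (which blocks, on which Datanodes) and then fetches the block data from the Datanodes directly. This is a minimal in-memory sketch, not the Hadoop API; the class names, method names, and the choice of always reading from the first listed holder are all assumptions made for illustration.

```python
class ToyNamenode:
    """Holds metadata only: path -> ordered list of (block_id, [datanode names])."""

    def __init__(self):
        self.block_map = {}

    def get_block_locations(self, path):
        # Metadata operation: no file data passes through the Namenode.
        return self.block_map[path]


class ToyDatanode:
    """Holds block data only: block_id -> bytes."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(path, namenode, datanodes):
    """Assemble a file by reading each block from one Datanode that holds it.

    `datanodes` maps a Datanode name to its ToyDatanode object (hypothetical).
    """
    data = b""
    for block_id, holders in namenode.get_block_locations(path):
        # Simplification: always use the first holder in the list.
        data += datanodes[holders[0]].read_block(block_id)
    return data
```

A real client would also pick among holders by proximity and retry on failure; both are omitted here.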
Namenode
• Keeps an image of the entire file system namespace and the file Blockmap in memory
• 4 GB of local RAM is sufficient to support a large number of files and directories
Periodic checkpointing
• At startup, the Namenode reads the FsImage and EditLog from its local file system
• It applies the EditLog transactions to the FsImage
• It stores a copy of the updated FsImage on the local file system as a checkpoint
• After a crash, the system can recover to the last checkpointed state
EditLog
• A transaction log that records every change to the file system metadata
FsImage
• Stores the file system namespace, including the mapping of blocks to files and file system properties
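The checkpointing steps can be illustrated with a small sketch: replay each logged transaction onto a copy of the namespace image, then start a fresh, empty log. The in-memory layout (a dict from path to block ids) and the four operation names are assumptions chosen for the example; the real FsImage and EditLog are on-disk formats with many more operation types.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFsImage:
    """Toy namespace snapshot: path -> list of block ids (hypothetical layout)."""
    files: dict = field(default_factory=dict)


def apply_edit(image, edit):
    """Apply one logged transaction, given as a tuple (op, path, *args)."""
    op, path, *args = edit
    if op == "create":
        image.files[path] = []
    elif op == "add_block":
        image.files[path].append(args[0])
    elif op == "rename":
        image.files[args[0]] = image.files.pop(path)
    elif op == "delete":
        image.files.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(image, edit_log):
    """Replay the log onto a copy of the image.

    Returns the new image and an empty log, since every logged change
    is now reflected in the image itself.
    """
    new_image = ToyFsImage(files={p: list(b) for p, b in image.files.items()})
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []
```

Working on a copy means that a failure halfway through the replay leaves the previous checkpoint untouched, which is the property that makes recovery to the last checkpointed state possible.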
Datanode
• Stores HDFS data in files in its local file system
• Has no knowledge of HDFS files
• Stores each block of HDFS data in a separate local file
• Does not create all files in the same directory
• Uses heuristics to determine the optimal number of files per directory and creates subdirectories accordingly
• Research issue?
• At startup, a Datanode scans its local file system, generates a list of all HDFS blocks it holds, and sends this report to the Namenode: the Blockreport
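Building a Blockreport amounts to walking the Datanode's storage directories and collecting the block ids found there. The sketch below assumes, purely for illustration, that each block lives in a file named `blk_<numeric id>` and that any other file can be ignored; the function name is hypothetical.

```python
import os


def build_block_report(storage_dir):
    """Scan storage_dir recursively and return the sorted ids of all block files.

    Assumption for this sketch: a block file is named 'blk_<digits>'.
    Files with any other name are skipped.
    """
    block_ids = []
    for _root, _dirs, files in os.walk(storage_dir):
        for name in files:
            suffix = name[len("blk_"):]
            if name.startswith("blk_") and suffix.isdigit():
                block_ids.append(int(suffix))
    return sorted(block_ids)
```

Because the scan is recursive, it works regardless of how the blocks are spread across subdirectories.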
File system Namespace
• Hierarchical file system with directories and files
• Supports create, remove, move, rename, etc.
• The Namenode maintains the file system namespace
Metadata
• Any change to the file system namespace or its properties is recorded by the Namenode
• An application can specify the number of replicas of a file
• The replication factor of each file is stored by the Namenode
Data Replication
• Each file is stored as a sequence of blocks
• All blocks in a file except the last are the same size
• Blocks are replicated for fault tolerance
• Block size and replication factor are configurable per file
• Each Datanode periodically sends a Heartbeat and a Blockreport to the Namenode
• A Heartbeat indicates that the Datanode is alive
• A Blockreport lists all the blocks on a Datanode
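Two of the points above lend themselves to a short sketch: how a file of a given size breaks into equal-size blocks with a shorter final block, and how Heartbeat timestamps can be used to decide which Datanodes are alive. The function names, the block size used in the example, and the timeout are illustrative assumptions, not HDFS defaults.

```python
def split_into_blocks(file_size, block_size):
    """Return the block sizes for a file: all equal except possibly the last."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def live_datanodes(last_heartbeat, now, timeout):
    """Return the names of Datanodes whose last Heartbeat is recent enough.

    last_heartbeat maps a Datanode name to the time of its last Heartbeat.
    """
    return sorted(name for name, t in last_heartbeat.items() if now - t <= timeout)
```

For example, a 300-unit file with a block size of 128 is stored as blocks of 128, 128, and 44 units.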
Replica Selection
• Goal: minimize global bandwidth consumption and read latency
• A replica on the reader's own node or rack is most preferred
• A replica in the local data center is preferred over a remote one
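The preference order can be expressed as a distance function over (data center, rack, node) and a selection of the closest replica. The numeric distances 0, 2, 4, 6 are an illustrative scale chosen for this sketch, and the `Location` type and function names are hypothetical.

```python
from typing import NamedTuple


class Location(NamedTuple):
    datacenter: str
    rack: str
    node: str


def distance(a, b):
    """Smaller means closer: same node < same rack < same data center < remote."""
    if a.datacenter != b.datacenter:
        return 6
    if a.rack != b.rack:
        return 4
    if a.node != b.node:
        return 2
    return 0


def select_replica(reader, replicas):
    """Pick the replica location closest to the reader."""
    return min(replicas, key=lambda replica: distance(reader, replica))
```

When several replicas are equally close, `min` returns the first one in the list; a real system might randomize among ties to spread load.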
Replica Placement
• Optimized replica placement
• Rack-aware replica placement:
• Aims to improve reliability, availability, and network bandwidth utilization
• Research topic
• A cluster has many racks; communication between racks goes through switches
• Network bandwidth between nodes in the same rack is greater than between nodes in different racks
• A simple policy places each replica on a unique rack
• Simple but non-optimal
• Writes are expensive because blocks must be transferred to multiple racks
• Typical replication factor is 3
• Another research topic?
• Replicas are placed: one on a node in the local rack, one on a different node in the local rack, and one on a node in a different rack
• As a result, 1/3 of the replicas are on one node, 2/3 are on one rack, and the remaining 1/3 are distributed evenly across the other racks
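The three-replica policy as stated on this slide can be sketched as follows: keep the first replica on the writer's node, put the second on another node in the same rack, and put the third on a node in a different rack. The topology representation (a dict from rack name to node names), the function name, and the uniform random choice among candidates are assumptions for illustration.

```python
import random


def place_replicas(writer, topology, rng=None):
    """Choose three (rack, node) targets following the policy on this slide.

    writer:   (rack, node) of the node writing the block
    topology: dict mapping rack name -> list of node names (hypothetical layout)
    rng:      optional random.Random, so results can be made reproducible
    """
    if rng is None:
        rng = random.Random()
    local_rack, local_node = writer
    same_rack_others = [n for n in topology[local_rack] if n != local_node]
    remote_nodes = [
        (rack, node)
        for rack, nodes in topology.items()
        if rack != local_rack
        for node in nodes
    ]
    if not same_rack_others or not remote_nodes:
        raise ValueError("topology too small for the three-replica policy")
    second = (local_rack, rng.choice(same_rack_others))
    third = rng.choice(remote_nodes)
    return [writer, second, third]
```

With this placement only one of the three copies crosses a rack boundary during the write, while the block still survives the loss of an entire rack.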
Namenode Startup
• On startup, the Namenode enters Safemode
• No block replication takes place in Safemode
• Each Datanode checks in with a Heartbeat and a Blockreport
• The Namenode verifies that each block has an acceptable number of replicas
• Once enough blocks are safely replicated, the Namenode exits Safemode
• It then builds the list of blocks that still need to be replicated
• The Namenode then proceeds to replicate these blocks to other Datanodes
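The startup sequence boils down to two checks over the replica counts gathered from Blockreports: whether enough blocks are safely replicated to leave Safemode, and which blocks still need more replicas afterwards. In this sketch the parameter names and the idea of expressing "enough" as a fraction of all blocks are assumptions for illustration.

```python
def safemode_status(block_replicas, min_replicas, target_replicas, threshold):
    """Decide whether Safemode can end and which blocks need re-replication.

    block_replicas:  dict block id -> number of replicas reported so far
    min_replicas:    replicas needed for a block to count as safely replicated
    target_replicas: desired replication factor
    threshold:       fraction of blocks that must be safe before exiting

    Returns (can_exit, under_replicated_block_ids).
    """
    total = len(block_replicas)
    safe = sum(1 for count in block_replicas.values() if count >= min_replicas)
    can_exit = total == 0 or safe / total >= threshold
    under_replicated = sorted(
        block for block, count in block_replicas.items() if count < target_replicas
    )
    return can_exit, under_replicated
```

The under-replicated list is only acted on after Safemode ends, matching the rule that no replication takes place while the Namenode is still collecting reports.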
Conclusion
• A discussion of HDFS Architecture
• Some policies are unique and provide future research directions
• Files and Directories per datanode
• Replica Placement
• Rack-aware replica placement