SlideShare a Scribd company logo
1 of 16
Hello!
Myself Archana R
Assistant Professor In
Dept Of CS
SACWC.
I am here because I love
to give presentation
about
HADOOP
• Need to process huge datasets on large clusters of computers
• Very expensive to build reliability into each application
• Nodes fail every day
Failure is expected, rather than exceptional
The number of nodes in a cluster is not constant
• Need a common infrastructure
Efficient, reliable, easy to use
Open Source, Apache Licence
Who uses Hadoop?
• Amazon/A9
• Facebook
• Google
• New York Times
• Veoh
• Yahoo!
• …. many more
Commodity Hardware
• Typically in 2 level architecture
Nodes are commodity PCs
30-40 nodes/rack
Uplink from rack is 3-4 gigabit
Rack-internal is 1 gigabit
Aggregation switch
Rack switch
Goals of HDFS
• Very Large Distributed File System
10K nodes, 100 million files, 10PB
• Assumes Commodity Hardware
Files are replicated to handle hardware failure
Detect failures and recover from them
• Optimized for Batch Processing
Data locations exposed so that computations can move to where data resides
Provides very high aggregate bandwidth
Distributed File System
• Single Namespace for entire cluster
• Data Coherency
Write-once-read-many access model
Client can only append to existing files
• Files are broken up into blocks
Typically 64MB block size
Each block replicated on multiple DataNodes
• Intelligent Client
Client can find location of blocks
Client accesses data directly from DataNode
HDFS Architecture
Functions of Name Node
• Manages File System Namespace
• Maps a file name to a set of blocks
• Maps a block to the DataNodes where it resides
• Cluster Configuration Management
• Replication Engine for Blocks
Name Node Metadata
• Metadata in Memory
• The entire metadata is in main memory
• No demand paging of metadata
• Types of metadata
• List of files
• List of Blocks for each file
• List of DataNodes for each block
• File attributes, e.g. creation time, replication factor
• A Transaction Log
• Records file creations, file deletions etc
Data Node
• A Block Server
Stores data in the local file system (e.g. ext3)
Stores metadata of a block (e.g. CRC)
Serves data and metadata to Clients
• Block Report
Periodically sends a report of all existing blocks to the Name Node
• Facilitates Pipelining of Data
Forwards data to other specified Data Nodes
Block Placement
• Current Strategy
One replica on local node
Second replica on a remote rack
Third replica on same remote rack
Additional replicas are randomly placed
• Clients read from nearest replicas
• Would like to make this policy pluggable
Heart beats
• Data Nodes send hear beat to the Name Node
Once every 3 seconds
• Name Node uses heartbeats to detect Data Node failure
Replication Engine
• Name Node detects Data Node failures
Chooses new Data Nodes for new replicas
Balances disk usage
Balances communication traffic to Data Nodes
Data Correctness
• Use Checksums to validate data
Use CRC32
• File Creation
Client computes checksum per 512 bytes
DataNode stores the checksum
• File access
Client retrieves the data and checksum from DataNode
If Validation fails, Client tries other replicas
Data Pipelining
• Client retrieves a list of DataNodes on which to place replicas of a block
• Client writes block to the first DataNode
• The first DataNode forwards the data to the next node in the Pipeline
• When all replicas are written, the Client moves on to write the next block in file
User Interface
• Commads for HDFS User:
hadoop dfs -mkdir /foodir
hadoop dfs -cat /foodir/myfile.txt
hadoop dfs -rm /foodir/myfile.txt
• Commands for HDFS Administrator
hadoop dfsadmin -report
hadoop dfsadmin -decommision datanodename
• Web Interface
http://host:port/dfshealth.jsp
THANK YOU

More Related Content

What's hot

Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5RojaT4
 
Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asiaMuhammad Rifqi
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDr. C.V. Suresh Babu
 
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenHadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenmaharajothip1
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architectureHarikrishnan K
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystemnallagangus
 
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCE
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCEHADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCE
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCEHarsha Siva Sai
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streamshktripathy
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologiesneeraj rathore
 
Big data analysis using hadoop cluster
Big data analysis using hadoop clusterBig data analysis using hadoop cluster
Big data analysis using hadoop clusterFurqan Haider
 
Intro to bigdata on gcp (1)
Intro to bigdata on gcp (1)Intro to bigdata on gcp (1)
Intro to bigdata on gcp (1)SahilRaina21
 

What's hot (20)

Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5
 
Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asia
 
Design of Hadoop Distributed File System
Design of Hadoop Distributed File SystemDesign of Hadoop Distributed File System
Design of Hadoop Distributed File System
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for womenHadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
Hadoop Maharajathi,II-M.sc.,Computer Science,Bonsecours college for women
 
Hadoop
HadoopHadoop
Hadoop
 
Apache hadoop introduction and architecture
Apache hadoop  introduction and architectureApache hadoop  introduction and architecture
Apache hadoop introduction and architecture
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCE
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCEHADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCE
HADOOP DISTRIBUTED FILE SYSTEM AND MAPREDUCE
 
Hadoop
HadoopHadoop
Hadoop
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Lecture6 introduction to data streams
Lecture6 introduction to data streamsLecture6 introduction to data streams
Lecture6 introduction to data streams
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data & hadoop
Big data & hadoopBig data & hadoop
Big data & hadoop
 
Seminar ppt
Seminar pptSeminar ppt
Seminar ppt
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 
Big data analysis using hadoop cluster
Big data analysis using hadoop clusterBig data analysis using hadoop cluster
Big data analysis using hadoop cluster
 
Hadoop
HadoopHadoop
Hadoop
 
Intro to bigdata on gcp (1)
Intro to bigdata on gcp (1)Intro to bigdata on gcp (1)
Intro to bigdata on gcp (1)
 

Similar to Introduction to Hadoop Distributed File System (HDFS) Architecture

HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.pptvijayapraba1
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inRahulBhole12
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File Systemelliando dias
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesDavid Martínez Rego
 
Borthakur hadoop univ-research
Borthakur hadoop univ-researchBorthakur hadoop univ-research
Borthakur hadoop univ-researchsaintdevil163
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiridatastack
 
NoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementNoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementDATAVERSITY
 
Hadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming ArchitectureHadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming ArchitectureInSemble
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331Fengchang Xie
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
UNIT II DIS.pptx
UNIT II DIS.pptxUNIT II DIS.pptx
UNIT II DIS.pptxSamPrem3
 
InfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxData
 

Similar to Introduction to Hadoop Distributed File System (HDFS) Architecture (20)

HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
 
Cloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation inCloud computing UNIT 2.1 presentation in
Cloud computing UNIT 2.1 presentation in
 
Hdfs architecture
Hdfs architectureHdfs architecture
Hdfs architecture
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Borthakur hadoop univ-research
Borthakur hadoop univ-researchBorthakur hadoop univ-research
Borthakur hadoop univ-research
 
HADOOP.pptx
HADOOP.pptxHADOOP.pptx
HADOOP.pptx
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Big Data Architecture Workshop - Vahid Amiri
Big Data Architecture Workshop -  Vahid AmiriBig Data Architecture Workshop -  Vahid Amiri
Big Data Architecture Workshop - Vahid Amiri
 
NoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application EnablementNoSQL – Data Center Centric Application Enablement
NoSQL – Data Center Centric Application Enablement
 
Big Data for QAs
Big Data for QAsBig Data for QAs
Big Data for QAs
 
Hadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming ArchitectureHadoop Ecosystem and Low Latency Streaming Architecture
Hadoop Ecosystem and Low Latency Streaming Architecture
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Hadoop
HadoopHadoop
Hadoop
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
UNIT II DIS.pptx
UNIT II DIS.pptxUNIT II DIS.pptx
UNIT II DIS.pptx
 
InfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
 

More from Archana Gopinath

Data Transfer & Manipulation.pptx
Data Transfer & Manipulation.pptxData Transfer & Manipulation.pptx
Data Transfer & Manipulation.pptxArchana Gopinath
 
DP _ CO Instruction Format.pptx
DP _ CO Instruction Format.pptxDP _ CO Instruction Format.pptx
DP _ CO Instruction Format.pptxArchana Gopinath
 
Language for specifying lexical Analyzer
Language for specifying lexical AnalyzerLanguage for specifying lexical Analyzer
Language for specifying lexical AnalyzerArchana Gopinath
 
Implementation of lexical analyser
Implementation of lexical analyserImplementation of lexical analyser
Implementation of lexical analyserArchana Gopinath
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzersArchana Gopinath
 
A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical AnalyzerArchana Gopinath
 
minimization the number of states of DFA
minimization the number of states of DFAminimization the number of states of DFA
minimization the number of states of DFAArchana Gopinath
 
Regular Expression to Finite Automata
Regular Expression to Finite AutomataRegular Expression to Finite Automata
Regular Expression to Finite AutomataArchana Gopinath
 
Fundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopFundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopArchana Gopinath
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSArchana Gopinath
 
Programming with R in Big Data Analytics
Programming with R in Big Data AnalyticsProgramming with R in Big Data Analytics
Programming with R in Big Data AnalyticsArchana Gopinath
 
If statements in c programming
If statements in c programmingIf statements in c programming
If statements in c programmingArchana Gopinath
 
Guided media Transmission Media
Guided media Transmission MediaGuided media Transmission Media
Guided media Transmission MediaArchana Gopinath
 

More from Archana Gopinath (18)

Data Transfer & Manipulation.pptx
Data Transfer & Manipulation.pptxData Transfer & Manipulation.pptx
Data Transfer & Manipulation.pptx
 
DP _ CO Instruction Format.pptx
DP _ CO Instruction Format.pptxDP _ CO Instruction Format.pptx
DP _ CO Instruction Format.pptx
 
Language for specifying lexical Analyzer
Language for specifying lexical AnalyzerLanguage for specifying lexical Analyzer
Language for specifying lexical Analyzer
 
Implementation of lexical analyser
Implementation of lexical analyserImplementation of lexical analyser
Implementation of lexical analyser
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzers
 
A Role of Lexical Analyzer
A Role of Lexical AnalyzerA Role of Lexical Analyzer
A Role of Lexical Analyzer
 
minimization the number of states of DFA
minimization the number of states of DFAminimization the number of states of DFA
minimization the number of states of DFA
 
Regular Expression to Finite Automata
Regular Expression to Finite AutomataRegular Expression to Finite Automata
Regular Expression to Finite Automata
 
Fundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopFundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and Hadoop
 
Map reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICSMap reduce in Hadoop BIG DATA ANALYTICS
Map reduce in Hadoop BIG DATA ANALYTICS
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Programming with R in Big Data Analytics
Programming with R in Big Data AnalyticsProgramming with R in Big Data Analytics
Programming with R in Big Data Analytics
 
If statements in c programming
If statements in c programmingIf statements in c programming
If statements in c programming
 
un Guided media
un Guided mediaun Guided media
un Guided media
 
Guided media Transmission Media
Guided media Transmission MediaGuided media Transmission Media
Guided media Transmission Media
 
Main Memory RAM and ROM
Main Memory RAM and ROMMain Memory RAM and ROM
Main Memory RAM and ROM
 
Java thread life cycle
Java thread life cycleJava thread life cycle
Java thread life cycle
 
PCSTt11 overview of java
PCSTt11 overview of javaPCSTt11 overview of java
PCSTt11 overview of java
 

Recently uploaded

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 

Recently uploaded (20)

The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 

Introduction to Hadoop Distributed File System (HDFS) Architecture

  • 1. Hello! Myself Archana R Assistant Professor In Dept Of CS SACWC. I am here because I love to give presentation about
  • 2. HADOOP • Need to process huge datasets on large clusters of computers • Very expensive to build reliability into each application • Nodes fail every day Failure is expected, rather than exceptional The number of nodes in a cluster is not constant • Need a common infrastructure Efficient, reliable, easy to use Open Source, Apache Licence
  • 3. Who uses Hadoop? • Amazon/A9 • Facebook • Google • New York Times • Veoh • Yahoo! • …. many more
  • 4. Commodity Hardware • Typically in 2 level architecture Nodes are commodity PCs 30-40 nodes/rack Uplink from rack is 3-4 gigabit Rack-internal is 1 gigabit Aggregation switch Rack switch
  • 5. Goals of HDFS • Very Large Distributed File System 10K nodes, 100 million files, 10PB • Assumes Commodity Hardware Files are replicated to handle hardware failure Detect failures and recover from them • Optimized for Batch Processing Data locations exposed so that computations can move to where data resides Provides very high aggregate bandwidth
  • 6. Distributed File System • Single Namespace for entire cluster • Data Coherency Write-once-read-many access model Client can only append to existing files • Files are broken up into blocks Typically 64MB block size Each block replicated on multiple DataNodes • Intelligent Client Client can find location of blocks Client accesses data directly from DataNode
  • 8. Functions of Name Node • Manages File System Namespace • Maps a file name to a set of blocks • Maps a block to the DataNodes where it resides • Cluster Configuration Management • Replication Engine for Blocks
  • 9. Name Node Metadata • Metadata in Memory • The entire metadata is in main memory • No demand paging of metadata • Types of metadata • List of files • List of Blocks for each file • List of DataNodes for each block • File attributes, e.g. creation time, replication factor • A Transaction Log • Records file creations, file deletions etc
  • 10. Data Node • A Block Server Stores data in the local file system (e.g. ext3) Stores metadata of a block (e.g. CRC) Serves data and metadata to Clients • Block Report Periodically sends a report of all existing blocks to the Name Node • Facilitates Pipelining of Data Forwards data to other specified Data Nodes
  • 11. Block Placement • Current Strategy One replica on local node Second replica on a remote rack Third replica on same remote rack Additional replicas are randomly placed • Clients read from nearest replicas • Would like to make this policy pluggable
  • 12. Heart beats • Data Nodes send hear beat to the Name Node Once every 3 seconds • Name Node uses heartbeats to detect Data Node failure Replication Engine • Name Node detects Data Node failures Chooses new Data Nodes for new replicas Balances disk usage Balances communication traffic to Data Nodes
  • 13. Data Correctness • Use Checksums to validate data Use CRC32 • File Creation Client computes checksum per 512 bytes DataNode stores the checksum • File access Client retrieves the data and checksum from DataNode If Validation fails, Client tries other replicas
  • 14. Data Pipelining • Client retrieves a list of DataNodes on which to place replicas of a block • Client writes block to the first DataNode • The first DataNode forwards the data to the next node in the Pipeline • When all replicas are written, the Client moves on to write the next block in file
  • 15. User Interface • Commads for HDFS User: hadoop dfs -mkdir /foodir hadoop dfs -cat /foodir/myfile.txt hadoop dfs -rm /foodir/myfile.txt • Commands for HDFS Administrator hadoop dfsadmin -report hadoop dfsadmin -decommision datanodename • Web Interface http://host:port/dfshealth.jsp