Hadoop Distributed File System
Big Data Analytics
Nadar Saraswathi College of Arts & Science
Submitted By
N. Nagapandiyammal
M.Sc Computer Science
Hadoop Distributed File System
• The Hadoop Distributed File System (HDFS) is the primary data storage system used by Hadoop applications.
• It uses a NameNode and DataNode architecture to implement a distributed file system that provides high-performance access to data across highly scalable Hadoop clusters.
• HDFS is a key part of the Hadoop ecosystem: it provides a reliable means of managing pools of big data and of supporting the related big data analytics applications.
• HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
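A Hadoop 1.x cluster runs five services
• 1. Name Node (HDFS, master)
• 2. Secondary Name Node (HDFS, master)
• 3. Job Tracker (MapReduce, master)
• 4. Data Node (HDFS, slave)
• 5. Task Tracker (MapReduce, slave)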
Name Node
• HDFS has only one Name Node, also called the master node. It tracks the files, manages the file system namespace and holds the metadata for all the data in the cluster; the file contents themselves are stored on the Data Nodes.
• In particular, the Name Node records the number of blocks in each file, which Data Nodes hold each block, where the replicas are stored, and other details.
• Because there is only one Name Node, it is a single point of failure. Clients connect to it directly.
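The bookkeeping described above can be illustrated with a toy model. This is only a sketch, not how the real Name Node is implemented: the class name `ToyNameNode`, the 128 MiB block size, the replication factor of 3 and the round-robin placement are all assumptions chosen for the example (real HDFS uses a rack-aware placement policy).

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MiB); configurable in a real cluster
REPLICATION = 3                 # assumed replication factor


@dataclass
class ToyNameNode:
    """Hypothetical model: holds metadata only, never file contents."""
    data_nodes: list
    files: dict = field(default_factory=dict)            # path -> [block_id, ...]
    block_locations: dict = field(default_factory=dict)  # block_id -> [data node, ...]
    _next_block: int = 0

    def add_file(self, path, size_bytes):
        # Number of blocks needed, rounding up; an empty file has no blocks.
        n_blocks = -(-size_bytes // BLOCK_SIZE)
        block_ids = []
        for _ in range(n_blocks):
            block_id = self._next_block
            self._next_block += 1
            # Round-robin placement: a stand-in for the real rack-aware policy.
            replicas = [
                self.data_nodes[(block_id + r) % len(self.data_nodes)]
                for r in range(min(REPLICATION, len(self.data_nodes)))
            ]
            self.block_locations[block_id] = replicas
            block_ids.append(block_id)
        self.files[path] = block_ids
        return block_ids

    def locate(self, path):
        """What a client asks before reading: where each block of the file lives."""
        return [(b, self.block_locations[b]) for b in self.files[path]]


nn = ToyNameNode(data_nodes=["dn1", "dn2", "dn3", "dn4"])
nn.add_file("/logs/a.log", 300 * 1024 * 1024)  # 300 MiB needs 3 blocks
for block_id, nodes in nn.locate("/logs/a.log"):
    print(block_id, nodes)
```

Note that the model stores only paths, block ids and node names. A client would use the answer from `locate` to fetch the bytes from the Data Nodes directly.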
Data Node
• A Data Node stores data as blocks. It is also known as a slave node: it holds the actual data in HDFS and serves read and write requests from clients.
• Data Nodes run as slave daemons. Every Data Node sends a heartbeat message to the Name Node every 3 seconds to signal that it is alive.
• When the Name Node receives no heartbeat from a Data Node for an extended period (10 minutes 30 seconds with the default settings), it marks that Data Node as dead and starts re-replicating its blocks on other Data Nodes.
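The heartbeat logic can be sketched as follows. The class `HeartbeatMonitor` and the helper `blocks_to_rereplicate` are hypothetical names for illustration. The timeout rule (twice the recheck interval plus ten heartbeat intervals) and the 5-minute recheck interval reflect the stock defaults as I understand them; a cluster may be configured differently, so pass another `timeout_s` to model other settings.

```python
HEARTBEAT_INTERVAL_S = 3      # heartbeat period stated on the slide
RECHECK_INTERVAL_S = 5 * 60   # assumed Name Node recheck interval (5 minutes)

# Assumed timeout rule: 2 * recheck interval + 10 * heartbeat interval.
DEAD_TIMEOUT_S = 2 * RECHECK_INTERVAL_S + 10 * HEARTBEAT_INTERVAL_S  # 630 seconds


class HeartbeatMonitor:
    """Hypothetical model of the Name Node's liveness tracking."""

    def __init__(self, timeout_s=DEAD_TIMEOUT_S):
        self.timeout_s = timeout_s
        self.last_seen = {}  # data node name -> time of last heartbeat, in seconds

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def dead_nodes(self, now):
        # A node is dead once its silence exceeds the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout_s)


def blocks_to_rereplicate(block_locations, dead, target):
    """Return blocks whose live replica count fell below the target."""
    dead = set(dead)
    result = {}
    for block, nodes in block_locations.items():
        live = [n for n in nodes if n not in dead]
        if len(live) < target:
            result[block] = live
    return result


monitor = HeartbeatMonitor()
for node in ("dn1", "dn2", "dn3"):
    monitor.heartbeat(node, now=0)
monitor.heartbeat("dn1", now=700)
monitor.heartbeat("dn2", now=700)  # dn3 has gone silent since t=0

dead = monitor.dead_nodes(now=700)
print(dead)
print(blocks_to_rereplicate({"blk_1": ["dn1", "dn3"], "blk_2": ["dn1", "dn2"]}, dead, target=2))
```

Here `blk_1` loses one of its two replicas when `dn3` is declared dead, so it is the only block queued for re-replication.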
Secondary Name Node
• The Secondary Name Node only takes care of checkpoints of the file system metadata held by the Name Node.
• It is also known as the checkpoint node. It is a helper for the Name Node, not a standby that can take over from it.
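A checkpoint can be pictured as merging a log of recent changes into a saved snapshot of the namespace. The sketch below is a simplified assumption of that idea: the snapshot is a plain dictionary standing in for the on-disk image, the log is a list of tuples, and the function names `apply_edit` and `checkpoint` are hypothetical.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (path -> size)."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = args[0]
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[args[0]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the image; return (new_image, empty_log)."""
    new_image = dict(fsimage)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


fsimage = {"/a": 10}
edits = [("create", "/b", 20), ("rename", "/a", "/c"), ("delete", "/b")]
fsimage, edits = checkpoint(fsimage, edits)
print(fsimage, edits)
```

After the merge the log is empty, which is the point of checkpointing: the Name Node has less to replay when it restarts.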
Job Tracker
• The Job Tracker manages the processing of the data. It receives requests for MapReduce execution from the client.
• The Job Tracker asks the Name Node where the data to be processed is located.
• The Name Node responds by returning the metadata to the Job Tracker.
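One reason the Job Tracker needs block locations is so that work can be sent to nodes that already hold the data. The sketch below illustrates that idea under simple assumptions: the function `assign_map_tasks` is hypothetical, it picks the least-loaded tracker among the nodes holding a block, and it falls back to any tracker when no holder runs one. The real scheduler is considerably more involved.

```python
def assign_map_tasks(block_locations, task_trackers):
    """Pick one tracker per block, preferring a node that already stores the block."""
    load = {t: 0 for t in task_trackers}
    assignment = {}
    for block, holders in block_locations.items():
        local = [t for t in holders if t in load]
        candidates = local or list(task_trackers)  # fall back to any tracker
        # Least-loaded candidate; ties are broken by name for determinism.
        chosen = min(candidates, key=lambda t: (load[t], t))
        load[chosen] += 1
        assignment[block] = chosen
    return assignment


# Metadata of the kind the Name Node would return: block -> nodes holding a replica.
locations = {
    "blk_0": ["node1", "node2"],
    "blk_1": ["node1", "node3"],
    "blk_2": ["node4"],  # node4 runs no tracker in this example
}
print(assign_map_tasks(locations, ["node1", "node2", "node3"]))
```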
Task Tracker
• The Task Tracker is the slave node for the Job Tracker: it receives both its tasks and the code to run from the Job Tracker.
• The Task Tracker applies that code to the file. The code applied to the file in this step is known as the Mapper.
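A word count is the usual way to show what a mapper looks like. The sketch below is plain Python rather than the Hadoop API: `mapper`, `run_map_task` and `reducer` are hypothetical names, and the reduce step is included only to show where the mapper's output goes.

```python
from collections import defaultdict


def mapper(line):
    """User code shipped to the Task Tracker: emit (word, 1) for each word."""
    for word in line.lower().split():
        yield word, 1


def run_map_task(lines):
    """What a tracker does with one input split: apply the mapper to every record."""
    pairs = []
    for line in lines:
        pairs.extend(mapper(line))
    return pairs


def reducer(pairs):
    """Sum the counts emitted for each word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


split = ["HDFS stores blocks", "blocks are replicated"]
pairs = run_map_task(split)
print(pairs)
print(reducer(pairs))
```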
File systems supported by Hadoop
• HDFS: Hadoop's own rack-aware file system. It is designed to scale to tens of petabytes of storage and runs on top of the file systems of the underlying operating systems.
• FTP file system: stores all its data on remotely accessible FTP servers.
• Amazon S3 (Simple Storage Service) object storage: targeted at clusters hosted on the Amazon Elastic Compute Cloud server-on-demand infrastructure. There is no rack awareness in this file system, as it is all remote.
• Windows Azure Storage Blobs (WASB) file system: an extension of HDFS that allows distributions of Hadoop to access data in Azure blob stores without moving the data permanently into the cluster.
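Hadoop chooses among these file systems by the scheme of the path URI. The sketch below shows that dispatch in plain Python. The scheme table is an assumption limited to the four file systems listed above (using `s3a` for S3 and `wasb`/`wasbs` for Azure), the host names in the examples are made up, and `file_system_for` is a hypothetical helper, not a Hadoop call.

```python
from urllib.parse import urlparse

# Assumed scheme table covering only the file systems listed above.
SCHEMES = {
    "hdfs": "HDFS",
    "ftp": "FTP file system",
    "s3a": "Amazon S3 object storage",
    "wasb": "Windows Azure Storage Blobs",
    "wasbs": "Windows Azure Storage Blobs (over TLS)",
}


def file_system_for(uri):
    """Return the name of the file system that would serve this URI."""
    scheme = urlparse(uri).scheme
    if scheme not in SCHEMES:
        raise ValueError(f"no file system registered for scheme {scheme!r}")
    return SCHEMES[scheme]


for uri in (
    "hdfs://namenode.example:8020/data/events.csv",
    "s3a://example-bucket/data/events.csv",
    "wasb://container@account.blob.core.windows.net/data/events.csv",
):
    print(uri, "->", file_system_for(uri))
```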
Why use HDFS?
• HDFS arose at Yahoo as part of that company's ad serving and search engine requirements. Like other web-oriented companies, Yahoo found itself juggling a variety of applications that were accessed by a growing number of users, who were creating more and more data.
• Facebook, eBay, LinkedIn and Twitter are among the web companies that used HDFS to underpin big data analytics to address these same requirements.
• The New York Times used HDFS for large-scale image conversions, Media6Degrees for log processing and machine learning, LiveBet for log storage and odds analysis, Joost for session analysis, and Fox Audience Network for log analysis and data mining.
• HDFS is also at the core of many open source data warehouse alternatives, sometimes called data lakes.
HDFS and Hadoop history
• In 2006, Hadoop's originators ceded their work on HDFS and MapReduce to the Apache Software Foundation. In 2012, HDFS and Hadoop became available in Version 1.0. HDFS has been updated continuously since its inception.
• With Version 2.0 of Hadoop in 2013, the general-purpose YARN resource manager was added, and MapReduce and HDFS were effectively decoupled. Thereafter, Hadoop supported diverse data processing frameworks and file systems.
• While MapReduce was often replaced by Apache Spark, HDFS continued to be a prevalent file system for Hadoop. After four alpha releases and one beta, Apache Hadoop 3.0.0 became generally available in December 2017, with HDFS enhancements supporting additional NameNodes, erasure coding facilities and greater data compression.
• At the same time, HDFS tooling has advanced: tools such as LinkedIn's open source Dr. Elephant and Dynamometer performance testing tools have enabled the development of ever larger HDFS implementations.
Thank You
