SlideShare a Scribd company logo
1 of 36
Hadoop manages:
› processor time
› memory
› disk space
› network bandwidth
 Does not have a security model
 Can handle HW failure
www.kellytechno.com
 Issues:
› race conditions
› synchronization
› deadlock
 i.e., same issues as distributed OS &
distributed filesystem
www.kellytechno.com
 Grid computing: (What is this?)
 e.g. Condor
› MPI model is more complicated
› does not automatically distribute data
› requires separate managed SAN
www.kellytechno.com
 Hadoop:
› simplified programming model
› data distributed as it is loaded
 HDFS splits large data files across machines
 HDFS replicates data
› failure causes additional replication
www.kellytechno.com
www.kellytechno.com
 Core idea: records are processed in
isolation
 Benefit: reduced communication
 Jargon:
› mapper – task that processes records
› Reducer – task that aggregates results from
mappers
www.kellytechno.com
www.kellytechno.com
How is the previous picture different from
normal grid/cluster computing?
Grid/cluster:
Programmer manages communication via MPI
vs
Hadoop:
communication is implicit
Hadoop manages data transfer and cluster topology
issues
www.kellytechno.com
 Hadoop overhead
› MPI does better for small numbers of nodes
 Hadoop – flat scalabity  pays off with
large data
› Little extra work to go from few to many nodes
 MPI – requires explicit refactoring from small
to larger number of nodes
www.kellytechno.com
 NFS: the Network File System
› Saw this in OS class
› Supports file system exporting
› Supports mounting of remote file system
www.kellytechno.com
www.kellytechno.com
Mounts Cascading mounts
www.kellytechno.com
 Establishes logical connection between server
and client.
 Mount operation: name of remote directory &
name of server
› Mount request is mapped to corresponding RPC
and forwarded to mount server running on server
machine.
› Export list – specifies local file systems that server
exports for mounting, along with names of
machines that are permitted to mount them.
www.kellytechno.com
 server returns a file handle—a key for further
accesses.
 File handle – a file-system identifier, and an
inode number to identify the mounted directory
 The mount operation changes only the user’s
view and does not affect the server side.
www.kellytechno.com
NFS Advantages
› Transparency – clients unaware of local vs remote
› Standard operations - open(), close(), fread(), etc.
NFS disadvantages
› Files in an NFS volume reside on a single machine
› No reliability guarantees if that machine goes down
› All clients must go to this machine to retrieve their
data
www.kellytechno.com
 HDFS Advantages:
› designed to store terabytes or petabytes
› data spread across a large number of machines
› supports much larger file sizes than NFS
› stores data reliably (replication)
www.kellytechno.com
 HDFS Advantages:
›  provides fast, scalable access
› serve more clients by adding more machines
› integrates with MapReduce local
computation
www.kellytechno.com
 HDFS Disadvantages
› Not as general-purpose as NFS
› Design restricts use to a particular class of
applications
› HDFS optimized for streaming read performance
not good at random access
www.kellytechno.com
 HDFS Disadvantages
› Write once read many model
› Updating a files after it has been closed is not
supported (can’t append data)
› System does not provide a mechanism for local
caching of data
www.kellytechno.com
 HDFS – block-structured file system
 File broken into blocks distributed among
DataNodes
 DataNodes – machines used to store data blocks
www.kellytechno.com
 Target machines chosen randomly on a block-
by-block basis
 Supports file sizes far larger than a single-
machine DFS
 Each block replicated across a number of
machines (3, by default)
www.kellytechno.com
www.kellytechno.com
 Expects large file size
› Small number of large files
› Hundreds of MB to GB each
 Expects sequential access
 Default block size in HDFS is 64MB
 Result:
› Reduces amount of metadata storage per file
›  Supports fast streaming of data (large amounts
of contiguous data)
www.kellytechno.com
 HDFS expects to read a block start-to-
finish
› Useful for MapReduce
› Not good for random access
› Not a good general purpose file system
www.kellytechno.com
 HDFS files are NOT part of the ordinary file system
 HDFS files are in separate name space
 Not possible to interact with files using ls, cp, mv,
etc.
 Don’t worry: HDFS provides similar utilities
www.kellytechno.com
 Meta data handled by NameNode
› Deal with synchronization by only allowing
one machine to handle it
› Store meta data for entire file system
› Not much data: file names, permissions, &
locations of each block of each file
www.kellytechno.com
www.kellytechno.com
 What happens if the NameNode fails?
› Bigger problem than failed DataNode
› Better be using RAID ;-)
› Cluster is kaput until NameNode restored
 Not exactly relevant but:
› DataNodes are more likely to fail.
› Why?
www.kellytechno.com
 First download and unzip a copy of Hadoop (
http://hadoop.apache.org/releases.html)
 Or better yet, follow this lecture first ;-)
  Important links:
› Hadoop website http://hadoop.apache.org/index.html
› Hadoop Users Guide http://hadoop.apache.org/docs/current/hadoop-
project-dist/hadoop-hdfs/HdfsUserGuide.html
› 2012 Edition of Hadoop User’s Guide http://it-ebooks.info/book/635/
www.kellytechno.com
  HDFS configuration is in conf/hadoop-defaults.xml
› Don’t change this file.
› Instead modify conf/hadoop-site.xml
› Be sure to replicate this file across all nodes in your cluster
› Format of entries in this file:
<property>
<name>property-name</name>
<value>property-value</value>
</property>
www.kellytechno.com
Necessary settings:
1.fs.default.name - describes the NameNode
› Format: protocol specifier, hostname, port
› Example: hdfs://punchbowl.cse.sc.edu:9000
1.dfs.data.dir – path on the local file system in which the
DataNode instance should store its data
› Format: pathname
› Example: /home/sauron/hdfs/data
› Can differ from DataNode to DataNode
› Default is /tmp
› /tmp is not a good idea in a production system ;-)
www.kellytechno.com
3. dfs.name.dir - path on the local FS of the
NameNode where the NameNode metadata is
stored
› Format: pathname
› Example: /home/sauron/hdfs/name
› Only used by NameNode
› Default is /tmp
› /tmp is not a good idea in a production system ;-)
3. dfs.replication – default replication factor
› Default is 3
› Fewer than 3 will impact availability of data.
www.kellytechno.com
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://your.server.name.com:9000</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/username/hdfs/data</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/home/username/hdfs/name</value>
</property>
</configuration>
www.kellytechno.com
 The Master Node needs to know the names of the
DataNode machines
› Add hostnames to conf/slaves
› One fully-qualified hostname per line
› (NameNode runs on Master Node)
 Create Necessary directories
› user@EachMachine$ mkdir -p $HOME/hdfs/data
› user@namenode$ mkdir -p $HOME/hdfs/name
› Note: owner needs read/write access to all directories
› Can run under your own name in a single machine cluster
› Do not run Hadoop as root. Duh!
www.kellytechno.com
www.kellytechno.com

More Related Content

What's hot

Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapakapa rohit
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisSameer Tiwari
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemRutvik Bapat
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHanborq Inc.
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceUday Vakalapudi
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User ReferenceBiju Nair
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationAdam Kawa
 
HDFS introduction
HDFS introductionHDFS introduction
HDFS introductioninjae yeo
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemAnand Kulkarni
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFSApache Apex
 

What's hot (20)

Interacting with hdfs
Interacting with hdfsInteracting with hdfs
Interacting with hdfs
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 
Hadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapaHadoop HDFS by rohitkapa
Hadoop HDFS by rohitkapa
 
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - RedisStorage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
Storage Systems for big data - HDFS, HBase, and intro to KV Store - Redis
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Hadoop HDFS
Hadoop HDFSHadoop HDFS
Hadoop HDFS
 
Anatomy of file write in hadoop
Anatomy of file write in hadoopAnatomy of file write in hadoop
Anatomy of file write in hadoop
 
HDFS Design Principles
HDFS Design PrinciplesHDFS Design Principles
HDFS Design Principles
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed Introduction
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
Apache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS FederationApache Hadoop YARN, NameNode HA, HDFS Federation
Apache Hadoop YARN, NameNode HA, HDFS Federation
 
HDFS introduction
HDFS introductionHDFS introduction
HDFS introduction
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Redis vs Memcached
Redis vs MemcachedRedis vs Memcached
Redis vs Memcached
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
Hadoop HDFS Concepts
Hadoop HDFS ConceptsHadoop HDFS Concepts
Hadoop HDFS Concepts
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Hadoop hdfs
Hadoop hdfsHadoop hdfs
Hadoop hdfs
 

Viewers also liked

Riju_Dey Resume
Riju_Dey ResumeRiju_Dey Resume
Riju_Dey ResumeRiju Dey
 
Data science training in hyderabad
Data science training in hyderabadData science training in hyderabad
Data science training in hyderabadKelly Technologies
 
Budi bandwidth-manajemen-dengan-pcq-pada-mikrotik
Budi bandwidth-manajemen-dengan-pcq-pada-mikrotikBudi bandwidth-manajemen-dengan-pcq-pada-mikrotik
Budi bandwidth-manajemen-dengan-pcq-pada-mikrotikFrance Xaviery
 
Project Jansori - location based scheduler
Project Jansori - location based schedulerProject Jansori - location based scheduler
Project Jansori - location based scheduler원준 이
 
Hadoop training institutes in bangalore
Hadoop training institutes in bangaloreHadoop training institutes in bangalore
Hadoop training institutes in bangaloreKelly Technologies
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadKelly Technologies
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesKelly Technologies
 

Viewers also liked (10)

Hi lite corinth
Hi lite corinthHi lite corinth
Hi lite corinth
 
Riju_Dey Resume
Riju_Dey ResumeRiju_Dey Resume
Riju_Dey Resume
 
Data science training in hyderabad
Data science training in hyderabadData science training in hyderabad
Data science training in hyderabad
 
Budi bandwidth-manajemen-dengan-pcq-pada-mikrotik
Budi bandwidth-manajemen-dengan-pcq-pada-mikrotikBudi bandwidth-manajemen-dengan-pcq-pada-mikrotik
Budi bandwidth-manajemen-dengan-pcq-pada-mikrotik
 
Project Jansori - location based scheduler
Project Jansori - location based schedulerProject Jansori - location based scheduler
Project Jansori - location based scheduler
 
Hadoop training institutes in bangalore
Hadoop training institutes in bangaloreHadoop training institutes in bangalore
Hadoop training institutes in bangalore
 
Hadoop training-in-hyderabad
Hadoop training-in-hyderabadHadoop training-in-hyderabad
Hadoop training-in-hyderabad
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Hadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologiesHadoop training in hyderabad-kellytechnologies
Hadoop training in hyderabad-kellytechnologies
 
Tableau training in bangalore
Tableau training in bangaloreTableau training in bangalore
Tableau training in bangalore
 

Similar to Hadoop Distributed File System (HDFS) Overview

Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hivesrikanthhadoop
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreducesenthil0809
 
Unit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptxUnit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptxAnkitChauhan817826
 
Hadoop at a glance
Hadoop at a glanceHadoop at a glance
Hadoop at a glanceTan Tran
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesappaji intelhunt
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Simplilearn
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfsshrey mehrotra
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
Hadoop tutorial for beginners-tibacademy.in
Hadoop tutorial for beginners-tibacademy.inHadoop tutorial for beginners-tibacademy.in
Hadoop tutorial for beginners-tibacademy.inTIB Academy
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyJay Nagar
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nageSantosh Nage
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basicHafizur Rahman
 
Dfs (Distributed computing)
Dfs (Distributed computing)Dfs (Distributed computing)
Dfs (Distributed computing)Sri Prasanna
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiUnmesh Baile
 
Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiUnmesh Baile
 
Hadoop File System.pptx
Hadoop File System.pptxHadoop File System.pptx
Hadoop File System.pptxAakashBerlia1
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsDrPDShebaKeziaMalarc
 

Similar to Hadoop Distributed File System (HDFS) Overview (20)

Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hive
 
Big data with HDFS and Mapreduce
Big data  with HDFS and MapreduceBig data  with HDFS and Mapreduce
Big data with HDFS and Mapreduce
 
Unit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptxUnit-1 Introduction to Big Data.pptx
Unit-1 Introduction to Big Data.pptx
 
Hadoop at a glance
Hadoop at a glanceHadoop at a glance
Hadoop at a glance
 
Hdfs
HdfsHdfs
Hdfs
 
Hadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologiesHadoop training in bangalore-kellytechnologies
Hadoop training in bangalore-kellytechnologies
 
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ...
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Hadoop tutorial for beginners-tibacademy.in
Hadoop tutorial for beginners-tibacademy.inHadoop tutorial for beginners-tibacademy.in
Hadoop tutorial for beginners-tibacademy.in
 
Apache Hadoop Big Data Technology
Apache Hadoop Big Data TechnologyApache Hadoop Big Data Technology
Apache Hadoop Big Data Technology
 
Hadoop installation by santosh nage
Hadoop installation by santosh nageHadoop installation by santosh nage
Hadoop installation by santosh nage
 
Lecture 2 part 1
Lecture 2 part 1Lecture 2 part 1
Lecture 2 part 1
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basic
 
Hadoop data management
Hadoop data managementHadoop data management
Hadoop data management
 
Dfs (Distributed computing)
Dfs (Distributed computing)Dfs (Distributed computing)
Dfs (Distributed computing)
 
Hadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbaiHadoop professional-software-development-course-in-mumbai
Hadoop professional-software-development-course-in-mumbai
 
Hadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbaiHadoop-professional-software-development-course-in-mumbai
Hadoop-professional-software-development-course-in-mumbai
 
Hadoop File System.pptx
Hadoop File System.pptxHadoop File System.pptx
Hadoop File System.pptx
 
Hadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data AnalyticsHadoop Distributed File System for Big Data Analytics
Hadoop Distributed File System for Big Data Analytics
 

More from Kelly Technologies

Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesKelly Technologies
 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesKelly Technologies
 
Data science institutes in hyderabad
Data science institutes in hyderabadData science institutes in hyderabad
Data science institutes in hyderabadKelly Technologies
 
Hadoop training institute in hyderabad
Hadoop training institute in hyderabadHadoop training institute in hyderabad
Hadoop training institute in hyderabadKelly Technologies
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabadKelly Technologies
 
Websphere mb training in hyderabad
Websphere mb training in hyderabadWebsphere mb training in hyderabad
Websphere mb training in hyderabadKelly Technologies
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreKelly Technologies
 
Oracle training-institutes-in-hyderabad
Oracle training-institutes-in-hyderabadOracle training-institutes-in-hyderabad
Oracle training-institutes-in-hyderabadKelly Technologies
 
Salesforce crm-training-in-bangalore
Salesforce crm-training-in-bangaloreSalesforce crm-training-in-bangalore
Salesforce crm-training-in-bangaloreKelly Technologies
 
Qlikview training in hyderabad
Qlikview training in hyderabadQlikview training in hyderabad
Qlikview training in hyderabadKelly Technologies
 
Project Management Planning training in hyderabad
Project Management Planning training in hyderabadProject Management Planning training in hyderabad
Project Management Planning training in hyderabadKelly Technologies
 
Ax finance training in hyderabad
Ax finance training in hyderabadAx finance training in hyderabad
Ax finance training in hyderabadKelly Technologies
 

More from Kelly Technologies (17)

Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologies
 
Data science institutes in hyderabad
Data science institutes in hyderabadData science institutes in hyderabad
Data science institutes in hyderabad
 
Hadoop training institute in hyderabad
Hadoop training institute in hyderabadHadoop training institute in hyderabad
Hadoop training institute in hyderabad
 
Hadoop institutes in hyderabad
Hadoop institutes in hyderabadHadoop institutes in hyderabad
Hadoop institutes in hyderabad
 
Sas training in hyderabad
Sas training in hyderabadSas training in hyderabad
Sas training in hyderabad
 
Websphere mb training in hyderabad
Websphere mb training in hyderabadWebsphere mb training in hyderabad
Websphere mb training in hyderabad
 
Hadoop institutes-in-bangalore
Hadoop institutes-in-bangaloreHadoop institutes-in-bangalore
Hadoop institutes-in-bangalore
 
Oracle training-institutes-in-hyderabad
Oracle training-institutes-in-hyderabadOracle training-institutes-in-hyderabad
Oracle training-institutes-in-hyderabad
 
Salesforce crm-training-in-bangalore
Salesforce crm-training-in-bangaloreSalesforce crm-training-in-bangalore
Salesforce crm-training-in-bangalore
 
Oracle training in hyderabad
Oracle training in hyderabadOracle training in hyderabad
Oracle training in hyderabad
 
Qlikview training in hyderabad
Qlikview training in hyderabadQlikview training in hyderabad
Qlikview training in hyderabad
 
Spark training-in-bangalore
Spark training-in-bangaloreSpark training-in-bangalore
Spark training-in-bangalore
 
Project Management Planning training in hyderabad
Project Management Planning training in hyderabadProject Management Planning training in hyderabad
Project Management Planning training in hyderabad
 
Hadoop training in bangalore
Hadoop training in bangaloreHadoop training in bangalore
Hadoop training in bangalore
 
Oracle training in_hyderabad
Oracle training in_hyderabadOracle training in_hyderabad
Oracle training in_hyderabad
 
Ax finance training in hyderabad
Ax finance training in hyderabadAx finance training in hyderabad
Ax finance training in hyderabad
 

Recently uploaded

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 

Recently uploaded (20)

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 

Hadoop Distributed File System (HDFS) Overview

  • 1.
  • 2. Hadoop manages: › processor time › memory › disk space › network bandwidth  Does not have a security model  Can handle HW failure www.kellytechno.com
  • 3.  Issues: › race conditions › synchronization › deadlock  i.e., same issues as distributed OS & distributed filesystem www.kellytechno.com
  • 4.  Grid computing: (What is this?)  e.g. Condor › MPI model is more complicated › does not automatically distribute data › requires separate managed SAN www.kellytechno.com
  • 5.  Hadoop: › simplified programming model › data distributed as it is loaded  HDFS splits large data files across machines  HDFS replicates data › failure causes additional replication www.kellytechno.com
  • 7.  Core idea: records are processed in isolation  Benefit: reduced communication  Jargon: › mapper – task that processes records › Reducer – task that aggregates results from mappers www.kellytechno.com
  • 9. How is the previous picture different from normal grid/cluster computing? Grid/cluster: Programmer manages communication via MPI vs Hadoop: communication is implicit Hadoop manages data transfer and cluster topology issues www.kellytechno.com
  • 10.  Hadoop overhead › MPI does better for small numbers of nodes  Hadoop – flat scalabity  pays off with large data › Little extra work to go from few to many nodes  MPI – requires explicit refactoring from small to larger number of nodes www.kellytechno.com
  • 11.  NFS: the Network File System › Saw this in OS class › Supports file system exporting › Supports mounting of remote file system www.kellytechno.com
  • 14.  Establishes logical connection between server and client.  Mount operation: name of remote directory & name of server › Mount request is mapped to corresponding RPC and forwarded to mount server running on server machine. › Export list – specifies local file systems that server exports for mounting, along with names of machines that are permitted to mount them. www.kellytechno.com
  • 15.  server returns a file handle—a key for further accesses.  File handle – a file-system identifier, and an inode number to identify the mounted directory  The mount operation changes only the user’s view and does not affect the server side. www.kellytechno.com
  • 16. NFS Advantages › Transparency – clients unaware of local vs remote › Standard operations - open(), close(), fread(), etc. NFS disadvantages › Files in an NFS volume reside on a single machine › No reliability guarantees if that machine goes down › All clients must go to this machine to retrieve their data www.kellytechno.com
  • 17.  HDFS Advantages: › designed to store terabytes or petabytes › data spread across a large number of machines › supports much larger file sizes than NFS › stores data reliably (replication) www.kellytechno.com
  • 18.  HDFS Advantages: ›  provides fast, scalable access › serve more clients by adding more machines › integrates with MapReduce local computation www.kellytechno.com
  • 19.  HDFS Disadvantages › Not as general-purpose as NFS › Design restricts use to a particular class of applications › HDFS optimized for streaming read performance not good at random access www.kellytechno.com
  • 20.  HDFS Disadvantages › Write once read many model › Updating a files after it has been closed is not supported (can’t append data) › System does not provide a mechanism for local caching of data www.kellytechno.com
  • 21.  HDFS – block-structured file system  File broken into blocks distributed among DataNodes  DataNodes – machines used to store data blocks www.kellytechno.com
  • 22.  Target machines chosen randomly on a block- by-block basis  Supports file sizes far larger than a single- machine DFS  Each block replicated across a number of machines (3, by default) www.kellytechno.com
  • 24.  Expects large file size › Small number of large files › Hundreds of MB to GB each  Expects sequential access  Default block size in HDFS is 64MB  Result: › Reduces amount of metadata storage per file ›  Supports fast streaming of data (large amounts of contiguous data) www.kellytechno.com
  • 25.  HDFS expects to read a block start-to- finish › Useful for MapReduce › Not good for random access › Not a good general purpose file system www.kellytechno.com
  • 26.  HDFS files are NOT part of the ordinary file system  HDFS files are in separate name space  Not possible to interact with files using ls, cp, mv, etc.  Don’t worry: HDFS provides similar utilities www.kellytechno.com
  • 27.  Meta data handled by NameNode › Deal with synchronization by only allowing one machine to handle it › Store meta data for entire file system › Not much data: file names, permissions, & locations of each block of each file www.kellytechno.com
  • 29.  What happens if the NameNode fails? › Bigger problem than failed DataNode › Better be using RAID ;-) › Cluster is kaput until NameNode restored  Not exactly relevant but: › DataNodes are more likely to fail. › Why? www.kellytechno.com
  • 30.  First download and unzip a copy of Hadoop ( http://hadoop.apache.org/releases.html)  Or better yet, follow this lecture first ;-)   Important links: › Hadoop website http://hadoop.apache.org/index.html › Hadoop Users Guide http://hadoop.apache.org/docs/current/hadoop- project-dist/hadoop-hdfs/HdfsUserGuide.html › 2012 Edition of Hadoop User’s Guide http://it-ebooks.info/book/635/ www.kellytechno.com
  • 31.   HDFS configuration is in conf/hadoop-defaults.xml › Don’t change this file. › Instead modify conf/hadoop-site.xml › Be sure to replicate this file across all nodes in your cluster › Format of entries in this file: <property> <name>property-name</name> <value>property-value</value> </property> www.kellytechno.com
  • 32. Necessary settings: 1.fs.default.name - describes the NameNode › Format: protocol specifier, hostname, port › Example: hdfs://punchbowl.cse.sc.edu:9000 1.dfs.data.dir – path on the local file system in which the DataNode instance should store its data › Format: pathname › Example: /home/sauron/hdfs/data › Can differ from DataNode to DataNode › Default is /tmp › /tmp is not a good idea in a production system ;-) www.kellytechno.com
  • 33. 3. dfs.name.dir - path on the local FS of the NameNode where the NameNode metadata is stored › Format: pathname › Example: /home/sauron/hdfs/name › Only used by NameNode › Default is /tmp › /tmp is not a good idea in a production system ;-) 3. dfs.replication – default replication factor › Default is 3 › Fewer than 3 will impact availability of data. www.kellytechno.com
  • 35.  The Master Node needs to know the names of the DataNode machines › Add hostnames to conf/slaves › One fully-qualified hostname per line › (NameNode runs on Master Node)  Create Necessary directories › user@EachMachine$ mkdir -p $HOME/hdfs/data › user@namenode$ mkdir -p $HOME/hdfs/name › Note: owner needs read/write access to all directories › Can run under your own name in a single machine cluster › Do not run Hadoop as root. Duh! www.kellytechno.com