SlideShare a Scribd company logo
1 of 13
Download to read offline
Apache Hadoop,
The Hadoop
Ecosystem
BY ROHIT RAJ
Why Hadoop?
Suppose we want to process a data. In the traditional approach, we used to store data on local machines. This data was then
processed. Now as data started increasing, the local machines or computers were not capable enough to store this huge data set.
So, data was then started to be stored on remote servers. Now suppose we need to process that data. So, in the traditional
approach, this data has to be fetched from the servers and then processed upon. Suppose this data is of 500 GB. Now, practically it
is very complex and expensive to fetch this data. This approach is also called Enterprise Approach.
In the new Hadoop Approach, instead of fetching the data on local machines we send the query to the data. Obviously, the
query to process the data will not be as huge as the data itself. Moreover, at the server, the query is divided into several
parts. All these parts process the data simultaneously. This is called parallel execution and is possible because of Map
Reduce. So, now not only there is no need to fetch the data, but also the processing takes lesser time. The result of the query
is then sent to the user. Thus the Hadoop makes data storage, processing and analyzing way easier than its traditional
approach.
What Is Hadoop?
Facebook
Google
JPMorgan and Chase
Goldmansachs
Yahoo
AWS
Microsoft
IBM
Cloudera
IQVIA
Rackspace Technology
And Many More
Companies Uses Hadoop Based System
Data Scientist
Big Data Visualizer
Big Data Research Analyst
Big Data Engineer
Big Data Analyst
Big Data Architect
And Many More
Jobs Opportunity With Apache Hadoop
Hadoop Architecture
Hadoop Distributed File System: In our local PC, by default the block size in Hard Disk is 4KB. When we install Hadoop, the HDFS by
default changes the block size to 64 MB. Since it is used to store huge data. We can also change the block size to 128 MB. Now HDFS
works with Data Node and Name Node. While Name Node is a master service and it keeps the metadata as for on which commodity
hardware, the data is residing, the Data Node stores the actual data. Now, since the block size is of 64 MB thus the storage required to
store metadata is reduced thus making HDFS better. Also, Hadoop stores three copies of every dataset at three different locations.
This ensures that the Hadoop is not prone to single point of failure.
Map Reduce: In the simplest manner, it can be understood that MapReduce breaks a query into multiple parts and now each part
process the data coherently. This parallel execution helps to execute a query faster and makes Hadoop a suitable and optimal choice
to deal with Big Data.
YARN: As we know that Yet Another Resource Negotiator works like an operating system to Hadoop and as operating systems are
resource managers so YARN manages the resources of Hadoop so that Hadoop serves big data in a better way.
1.
2.
3.
Hadoop has a master-slave
topology. In this topology, we have
one master node and multiple slave
nodes. Master node’s function is to
assign a task to various slave nodes
and manage resources. The slave
nodes do the actual computing.
Slave nodes store the real data
whereas on master we have
metadata. This means it stores data
about data.
Map Reduce
MapReduce is the processing layer of Hadoop. MapReduce programming model is designed for
processing large volumes of data in parallel by dividing the work into a set of independent tasks.
You need to put business logic in the way MapReduce works and rest things will be taken care by
the framework. Work (complete job) which is submitted by the user to master is divided into small
works (tasks) and assigned to slaves.
MapReduce programs are written in a particular style influenced by functional programming
constructs, specifical idioms for processing lists of data. Here in MapReduce, we get inputs from a
list and it converts it into output which is again a list. It is the heart of Hadoop. Hadoop is so much
powerful and efficient due to MapRreduce as here parallel processing is done.
Map() performs sorting and filtering of data and thereby organizing them in the
form of group. Map generates a key-value pair based result which is later on
processed by the Reduce() method.
Reduce(), as the name suggests does the summarization by aggregating the
mapped data. In simple, Reduce() takes the output generated by Map() as input
and combines those tuples into smaller set of tuples
Two Key Words :- 1. Map , 2.Reduce
1.
2.
Hadoop Yarn
Yet Another Resource Negotiator, as the name implies, YARN is the
one who helps to manage the resources across the clusters. In short,
it performs scheduling and resource allocation for the Hadoop
System.
Consists of three major components i.e.
Resource Manager
Nodes Manager
Application Manager
Resource manager has the privilege of allocating resources for the
applications in a system whereas Node managers work on the
allocation of resources such as CPU, memory, bandwidth per
machine and later on acknowledges the resource manager.
Application manager works as an interface between the resource
manager and node manager and performs negotiations as per the
requirement of the two.
a.
b.
c.
Hadoop HDFS
HDFS stands for Hadoop Distributed File System.
It provides for data storage of Hadoop. HDFS
splits the data unit into smaller units called
blocks and stores them in a distributed manner.
It has got two daemons running. One for master
node – NameNode and other for slave nodes –
DataNode.
HDFS has a Master-slave architecture. The daemon called
NameNode runs on the master server. It is responsible for
Namespace management and regulates file access by the client.
DataNode daemon runs on slave nodes. It is responsible for
storing actual business data. Internally, a file gets split into a
number of data blocks and stored on a group of slave machines.
Namenode manages modifications to file system namespace.
These are actions like the opening, closing and renaming files or
directories. NameNode also keeps track of mapping of blocks to
DataNodes. This DataNodes serves read/write request from the file
system’s client. DataNode also creates, deletes and replicates
blocks on demand from NameNode.
HeartBeat : It is the signal that datanode continuously sends to
namenode. If namenode doesn’t receive heartbeat from a
datanode then it will consider it dead.
Balancing : If a datanode is crashed the blocks present on it will
be gone too and the blocks will be under-replicated compared to
the remaining blocks. Here master node(namenode) will give a
signal to datanodes containing replicas of those lost blocks to
replicate so that overall distribution of blocks is balanced.
Replication:: It is done by datanode.
Terms related to HDFS:
Features Of Hadoop
Economically Feasible:
It is cheaper to store
data and process it than
it was in the traditional
approach. Since the
actual machines used to
store data are only
commodity hardware.
Easy to Use: The
projects or set of tools
provided by Apache
Hadoop are easy to
work upon in order to
analyze complex data
sets.
Open Source: Since
Hadoop is distributed as
an open source software
under Apache License,
so one does not need to
pay for it, just download
it and use it.
Scalability: Hadoop is
highly scalable in
nature. If one needs to
scale up or scale down
the cluster, one only
needs to change the
number of commodity
hardware in the cluster.
Fault Tolerance: Since Hadoop stores
three copies of data, so even if one copy is
lost because of any commodity hardware
failure, the data is safe. Moreover, as
Hadoop version 3 has multiple name
nodes, so even the single point of failure
of Hadoop has also been removed.
Locality of Data: This is one of the most
alluring and promising features of
Hadoop. In Hadoop, to process a query
over a data set, instead of bringing the
data to the local computer we send the
query to the server and fetch the final
result from there. This is called data
locality.
Distributed Processing:
HDFS and Map Reduce
ensures distributed
storage and processing
of the data.
Advantage and Disadvantage
Difference B/w Hadoop and RDBMS
The Hadoop EcoSystem
Its A Platform/Framework
Helps To Solve Big Data Problems
Bibliography
https://data-flair.training/blogs/hadoop-architecture/
https://www.geeksforgeeks.org/hadoop-introduction/
https://hadoop.apache.org/docs/current/
https://github.com/apache/hadoop
https://youtu.be/1vbXmCrkT3Y
Thank You

More Related Content

What's hot

Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programPraveen Kumar Donta
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hari Shankar Sreekumar
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopRan Ziv
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationSameer Tiwari
 
Hadoop hdfs interview questions
Hadoop hdfs interview questionsHadoop hdfs interview questions
Hadoop hdfs interview questionsKalyan Hadoop
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache HadoopOleksiy Krotov
 
Hadoop Distributed file system.pdf
Hadoop Distributed file system.pdfHadoop Distributed file system.pdf
Hadoop Distributed file system.pdfvishal choudhary
 
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overview
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overviewHdfs, Map Reduce & hadoop 1.0 vs 2.0 overview
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overviewNitesh Ghosh
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoopjeffturner
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 
White paper hadoop performancetuning
White paper hadoop performancetuningWhite paper hadoop performancetuning
White paper hadoop performancetuningAnil Reddy
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
02.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 201302.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 2013WANdisco Plc
 

What's hot (17)

An Introduction to Hadoop
An Introduction to HadoopAn Introduction to Hadoop
An Introduction to Hadoop
 
Hadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce programHadoop installation, Configuration, and Mapreduce program
Hadoop installation, Configuration, and Mapreduce program
 
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
Hadoop architecture (Delhi Hadoop User Group Meetup 10 Sep 2011)
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animation
 
Hadoop hdfs interview questions
Hadoop hdfs interview questionsHadoop hdfs interview questions
Hadoop hdfs interview questions
 
BIG DATA: Apache Hadoop
BIG DATA: Apache HadoopBIG DATA: Apache Hadoop
BIG DATA: Apache Hadoop
 
Hadoop Distributed file system.pdf
Hadoop Distributed file system.pdfHadoop Distributed file system.pdf
Hadoop Distributed file system.pdf
 
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overview
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overviewHdfs, Map Reduce & hadoop 1.0 vs 2.0 overview
Hdfs, Map Reduce & hadoop 1.0 vs 2.0 overview
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
Understanding hdfs
Understanding hdfsUnderstanding hdfs
Understanding hdfs
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 
White paper hadoop performancetuning
White paper hadoop performancetuningWhite paper hadoop performancetuning
White paper hadoop performancetuning
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
02.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 201302.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 2013
 
Hadoop
HadoopHadoop
Hadoop
 

Similar to Hadoop Ecosystem

Distributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxDistributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxUttara University
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component rebeccatho
 
Hadoop by kamran khan
Hadoop by kamran khanHadoop by kamran khan
Hadoop by kamran khanKamranKhan587
 
BIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdfBIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdfDIVYA370851
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxAltafKhadim
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and HadoopMr. Ankit
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with HadoopNalini Mehta
 
Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Rupak Roy
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherJanBask Training
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorialvinayiqbusiness
 
Hadoop – Architecture.pptx
Hadoop – Architecture.pptxHadoop – Architecture.pptx
Hadoop – Architecture.pptxSakthiVinoth78
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introductionXuan-Chao Huang
 
big data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing databig data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing datapreetik9044
 

Similar to Hadoop Ecosystem (20)

Distributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptxDistributed Systems Hadoop.pptx
Distributed Systems Hadoop.pptx
 
2.1-HADOOP.pdf
2.1-HADOOP.pdf2.1-HADOOP.pdf
2.1-HADOOP.pdf
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
 
Hadoop by kamran khan
Hadoop by kamran khanHadoop by kamran khan
Hadoop by kamran khan
 
BIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdfBIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdf
 
OPERATING SYSTEM .pptx
OPERATING SYSTEM .pptxOPERATING SYSTEM .pptx
OPERATING SYSTEM .pptx
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Hadoop Tutorial for Beginners
Hadoop Tutorial for BeginnersHadoop Tutorial for Beginners
Hadoop Tutorial for Beginners
 
hadoop
hadoophadoop
hadoop
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
 
Seminar ppt
Seminar pptSeminar ppt
Seminar ppt
 
hadoop.pptx
hadoop.pptxhadoop.pptx
hadoop.pptx
 
What is hadoop
What is hadoopWhat is hadoop
What is hadoop
 
Introduction to hadoop ecosystem
Introduction to hadoop ecosystem Introduction to hadoop ecosystem
Introduction to hadoop ecosystem
 
Top Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for FresherTop Hadoop Big Data Interview Questions and Answers for Fresher
Top Hadoop Big Data Interview Questions and Answers for Fresher
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Hadoop – Architecture.pptx
Hadoop – Architecture.pptxHadoop – Architecture.pptx
Hadoop – Architecture.pptx
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 
big data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing databig data hadoop technonolgy for storing and processing data
big data hadoop technonolgy for storing and processing data
 
Bigdata ppt
Bigdata pptBigdata ppt
Bigdata ppt
 

Recently uploaded

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 

Recently uploaded (20)

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 

Hadoop Ecosystem

  • 2. Why Hadoop? Suppose we want to process a data. In the traditional approach, we used to store data on local machines. This data was then processed. Now as data started increasing, the local machines or computers were not capable enough to store this huge data set. So, data was then started to be stored on remote servers. Now suppose we need to process that data. So, in the traditional approach, this data has to be fetched from the servers and then processed upon. Suppose this data is of 500 GB. Now, practically it is very complex and expensive to fetch this data. This approach is also called Enterprise Approach. In the new Hadoop Approach, instead of fetching the data on local machines we send the query to the data. Obviously, the query to process the data will not be as huge as the data itself. Moreover, at the server, the query is divided into several parts. All these parts process the data simultaneously. This is called parallel execution and is possible because of Map Reduce. So, now not only there is no need to fetch the data, but also the processing takes lesser time. The result of the query is then sent to the user. Thus the Hadoop makes data storage, processing and analyzing way easier than its traditional approach.
  • 3. What Is Hadoop? Facebook Google JPMorgan and Chase Goldmansachs Yahoo AWS Microsoft IBM Cloudera IQVIA Rackspace Technology And Many More Companies Uses Hadoop Based System Data Scientist Big Data Visualizer Big Data Research Analyst Big Data Engineer Big Data Analyst Big Data Architect And Many More Jobs Opportunity With Apache Hadoop
  • 4. Hadoop Architecture Hadoop Distributed File System: In our local PC, by default the block size in Hard Disk is 4KB. When we install Hadoop, the HDFS by default changes the block size to 64 MB. Since it is used to store huge data. We can also change the block size to 128 MB. Now HDFS works with Data Node and Name Node. While Name Node is a master service and it keeps the metadata as for on which commodity hardware, the data is residing, the Data Node stores the actual data. Now, since the block size is of 64 MB thus the storage required to store metadata is reduced thus making HDFS better. Also, Hadoop stores three copies of every dataset at three different locations. This ensures that the Hadoop is not prone to single point of failure. Map Reduce: In the simplest manner, it can be understood that MapReduce breaks a query into multiple parts and now each part process the data coherently. This parallel execution helps to execute a query faster and makes Hadoop a suitable and optimal choice to deal with Big Data. YARN: As we know that Yet Another Resource Negotiator works like an operating system to Hadoop and as operating systems are resource managers so YARN manages the resources of Hadoop so that Hadoop serves big data in a better way. 1. 2. 3. Hadoop has a master-slave topology. In this topology, we have one master node and multiple slave nodes. Master node’s function is to assign a task to various slave nodes and manage resources. The slave nodes do the actual computing. Slave nodes store the real data whereas on master we have metadata. This means it stores data about data.
  • 5. Map Reduce MapReduce is the processing layer of Hadoop. MapReduce programming model is designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks. You need to put business logic in the way MapReduce works and rest things will be taken care by the framework. Work (complete job) which is submitted by the user to master is divided into small works (tasks) and assigned to slaves. MapReduce programs are written in a particular style influenced by functional programming constructs, specifical idioms for processing lists of data. Here in MapReduce, we get inputs from a list and it converts it into output which is again a list. It is the heart of Hadoop. Hadoop is so much powerful and efficient due to MapRreduce as here parallel processing is done. Map() performs sorting and filtering of data and thereby organizing them in the form of group. Map generates a key-value pair based result which is later on processed by the Reduce() method. Reduce(), as the name suggests does the summarization by aggregating the mapped data. In simple, Reduce() takes the output generated by Map() as input and combines those tuples into smaller set of tuples Two Key Words :- 1. Map , 2.Reduce 1. 2.
  • 6. Hadoop Yarn Yet Another Resource Negotiator, as the name implies, YARN is the one who helps to manage the resources across the clusters. In short, it performs scheduling and resource allocation for the Hadoop System. Consists of three major components i.e. Resource Manager Nodes Manager Application Manager Resource manager has the privilege of allocating resources for the applications in a system whereas Node managers work on the allocation of resources such as CPU, memory, bandwidth per machine and later on acknowledges the resource manager. Application manager works as an interface between the resource manager and node manager and performs negotiations as per the requirement of the two. a. b. c.
  • 7. Hadoop HDFS HDFS stands for Hadoop Distributed File System. It provides for data storage of Hadoop. HDFS splits the data unit into smaller units called blocks and stores them in a distributed manner. It has got two daemons running. One for master node – NameNode and other for slave nodes – DataNode. HDFS has a Master-slave architecture. The daemon called NameNode runs on the master server. It is responsible for Namespace management and regulates file access by the client. DataNode daemon runs on slave nodes. It is responsible for storing actual business data. Internally, a file gets split into a number of data blocks and stored on a group of slave machines. Namenode manages modifications to file system namespace. These are actions like the opening, closing and renaming files or directories. NameNode also keeps track of mapping of blocks to DataNodes. This DataNodes serves read/write request from the file system’s client. DataNode also creates, deletes and replicates blocks on demand from NameNode. HeartBeat : It is the signal that datanode continuously sends to namenode. If namenode doesn’t receive heartbeat from a datanode then it will consider it dead. Balancing : If a datanode is crashed the blocks present on it will be gone too and the blocks will be under-replicated compared to the remaining blocks. Here master node(namenode) will give a signal to datanodes containing replicas of those lost blocks to replicate so that overall distribution of blocks is balanced. Replication:: It is done by datanode. Terms related to HDFS:
  • 8. Features Of Hadoop Economically Feasible: It is cheaper to store data and process it than it was in the traditional approach. Since the actual machines used to store data are only commodity hardware. Easy to Use: The projects or set of tools provided by Apache Hadoop are easy to work upon in order to analyze complex data sets. Open Source: Since Hadoop is distributed as an open source software under Apache License, so one does not need to pay for it, just download it and use it. Scalability: Hadoop is highly scalable in nature. If one needs to scale up or scale down the cluster, one only needs to change the number of commodity hardware in the cluster. Fault Tolerance: Since Hadoop stores three copies of data, so even if one copy is lost because of any commodity hardware failure, the data is safe. Moreover, as Hadoop version 3 has multiple name nodes, so even the single point of failure of Hadoop has also been removed. Locality of Data: This is one of the most alluring and promising features of Hadoop. In Hadoop, to process a query over a data set, instead of bringing the data to the local computer we send the query to the server and fetch the final result from there. This is called data locality. Distributed Processing: HDFS and Map Reduce ensures distributed storage and processing of the data.
  • 11. The Hadoop EcoSystem Its A Platform/Framework Helps To Solve Big Data Problems