62_Tazeen_Sayed_Hadoop_Ecosystem.pptx

Hadoop Ecosystem: From Big
Data to Big Results
Name: Tazeen Gulrez Sayed
Class: TE - A
Roll number: 62
Subject: Data Science And Big Data Analytics

INDEX
1. Introduction to Hadoop Ecosystem
2. HDFS - Hadoop Distributed File System
3. YARN - Yet Another Resource Negotiator
4. MapReduce
5. Other Components of Hadoop Ecosystem
6. Conclusion

Introduction to Hadoop
Ecosystem
Hadoop is an open-source software framework that is
used for distributed storage and processing of big
data. It was created by Doug Cutting and Mike
Cafarella in 2005, and it has since become one of the
most popular big data processing platforms in the
world.
The Hadoop ecosystem consists of several
components, including HDFS (Hadoop Distributed File
System), YARN (Yet Another Resource Negotiator),
and MapReduce. These components work together to
provide a scalable, fault-tolerant platform for
processing large amounts of data.

HDFS - Hadoop Distributed File
System
HDFS is the distributed storage layer of Hadoop. It
stores big data across the nodes of a cluster, and
together with YARN and MapReduce it provides a
scalable, fault-tolerant platform for processing
large amounts of data.

YARN - Yet Another Resource
Negotiator
YARN is the resource management layer of Hadoop. It
is responsible for managing the resources of a Hadoop
cluster, chiefly CPU and memory. YARN
allows multiple applications to share the same
cluster without interfering with each other.
YARN also enables dynamic allocation of resources,
allowing applications to request additional resources
as needed. This makes it possible to run complex big
data applications that require significant amounts of
resources.
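
The idea of applications requesting resources from a shared cluster can be illustrated with a toy model. The sketch below is not YARN's API: the `Node` and `ToyResourceManager` names, the first-fit placement rule, and the node sizes are all assumptions made for illustration. It tracks only CPU cores and memory.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One machine in the cluster with its currently free capacity."""
    name: str
    free_vcores: int
    free_memory_mb: int


class ToyResourceManager:
    """Hypothetical, simplified stand-in for a cluster resource manager."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.allocations = []  # (app, node name, vcores, memory_mb)

    def request(self, app, vcores, memory_mb):
        """Grant a container on the first node with enough free capacity.

        Returns the node name, or None if no node can satisfy the request.
        """
        for node in self.nodes:
            if node.free_vcores >= vcores and node.free_memory_mb >= memory_mb:
                node.free_vcores -= vcores
                node.free_memory_mb -= memory_mb
                self.allocations.append((app, node.name, vcores, memory_mb))
                return node.name
        return None


# Two applications share the same two-node cluster.
rm = ToyResourceManager([Node("node1", 4, 8192), Node("node2", 4, 8192)])
first = rm.request("app-a", vcores=2, memory_mb=4096)
second = rm.request("app-b", vcores=3, memory_mb=2048)
third = rm.request("app-a", vcores=2, memory_mb=4096)   # app-a asks for more
fourth = rm.request("app-b", vcores=2, memory_mb=1024)  # cluster is now full
```

Each application only receives capacity that is still free, so one application cannot take resources already granted to another. A request that does not fit anywhere is refused rather than overcommitting a node.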

MapReduce
MapReduce is a programming model used for
processing large datasets in parallel. It works by
breaking down a large dataset into smaller chunks,
which are then processed in parallel across multiple
nodes in a cluster. MapReduce consists of two main
functions: map and reduce.
The map function takes input data and converts it into
key-value pairs, while the reduce function takes the
output of the map function, grouped by key, and
combines it into a smaller set of key-value pairs.
MapReduce is highly scalable and fault-tolerant,
making it ideal for processing large datasets.
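
The map and reduce steps can be sketched with a word count in plain Python. This runs in a single process and only imitates the model, not Hadoop's own MapReduce API. The function names, the `shuffle` helper, and the sample chunks are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map: emit a (word, 1) key-value pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce: combine all values for one key into a single pair."""
    return (word, sum(counts))


def run_job(chunks):
    """Apply map to every line of every chunk, then shuffle and reduce."""
    mapped = chain.from_iterable(
        map_words(line) for chunk in chunks for line in chunk
    )
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


# Each inner list stands for one chunk of the dataset. On a real cluster
# the chunks would be processed on different nodes.
chunks = [["big data big results"], ["big clusters process data"]]
result = run_job(chunks)
print(result)
```

Because `map_words` looks at one line at a time and `reduce_counts` at one key at a time, both steps can be spread across many nodes without the functions needing to know about each other.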

Other Components of Hadoop
Ecosystem
In addition to HDFS, YARN, and MapReduce, the
Hadoop ecosystem includes several other
components that provide additional functionality.
These include Hive, Pig, HBase, and Spark.
Hive is a data warehouse system that provides SQL-
like querying capabilities for Hadoop. Pig is a high-
level platform for creating MapReduce programs.
HBase is a NoSQL database that provides real-time
access to data stored in Hadoop. Spark is a fast, in-
memory data processing engine that can be used with
Hadoop to perform real-time analytics.

CONCLUSION
The Hadoop ecosystem is a powerful platform
for processing large amounts of data. With its
distributed architecture, fault tolerance, and
scalability, Hadoop has become a widely adopted
solution for big data processing.
By understanding the various components of the
Hadoop ecosystem, businesses and
organizations can take advantage of its
capabilities to gain insights and make informed
decisions based on their data.
