Distributed Systems Hadoop.pptx

This presentation provides a comprehensive introduction to the Hadoop Distributed System, a powerful and widely used framework for distributed storage and processing of large-scale data. Hadoop has revolutionized the way organizations manage and analyze data, making it a crucial tool in the field of big data and data analytics. In this presentation, we explore the key components and features of Hadoop, shedding light on the fundamental building blocks that enable its exceptional data processing capabilities. We cover essential topics, including the Hadoop Distributed File System (HDFS), MapReduce, YARN (Yet Another Resource Negotiator), and Hadoop Ecosystem components like Hive, Pig, and Spark.

Distributed Systems (Hadoop)
Name: Alamin
Stu Id: 23-92971-2
Table of Contents
• What is a distributed system?
• What is Hadoop?
• How does Hadoop work?
• Important components of Hadoop
  - Hadoop Common
  - Hadoop HDFS
  - Hadoop YARN
  - Hadoop MapReduce
• Key features of Hadoop
What is a distributed system?
• A distributed system is a collection of interconnected computers, or nodes, that work together to achieve a common goal.
• The nodes are physically separate machines that communicate with each other over a network, such as the internet or a local area network (LAN).
• Distributed computing makes computers work together like a team: a big job is broken into smaller pieces, and each piece is given to a different computer to work on.
• Distributed computing is used in many kinds of applications, from scientific research to business intelligence to video games.
• It can solve problems that would be too big or too hard for a single computer to handle.
Some common types of distributed systems
There are many kinds of distributed systems, such as:
• Client-server systems
• Peer-to-peer (P2P) systems
• Cluster and grid computing
• Cloud computing
• Distributed databases
• Distributed file systems
What is Hadoop?
• Hadoop follows a distributed architecture; in other words, Hadoop is itself a distributed system.
• Hadoop is an open-source framework for storing and processing large datasets in a parallel and distributed manner.
• The distributed environment is a cluster of machines that work closely together to give the impression of a single working machine.
• It is designed to handle massive amounts of data across a cluster of commodity hardware.
• Hadoop was originally developed by Doug Cutting and Mike Cafarella in 2005 and is now maintained by the Apache Software Foundation.
How does Hadoop work?
Hadoop distributes large datasets across a cluster of computers and processes them there, providing a framework for scalable and fault-tolerant data storage and analysis. Here is an overview of how Hadoop works:
• Data storage with HDFS (Hadoop Distributed File System):
  - HDFS divides large files into smaller blocks (typically 128 MB or 256 MB in size).
  - The blocks are replicated across multiple nodes in the cluster for fault tolerance. The default replication factor is 3, so each block is stored on three nodes.
• Data ingestion:
  - Data is ingested into Hadoop by copying it into HDFS, using Hadoop commands, APIs, or other tools.
• Data processing with MapReduce:
  - MapReduce is a programming model for parallel data processing. It consists of two main phases: Map and Reduce.
  - In the Map phase, the input is read as key-value pairs, and a user-defined Map function is applied to each pair.
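
The block and replication figures above can be made concrete with a little arithmetic. The following is a minimal Python sketch, not HDFS code: the function names, the node names, and the round-robin placement are assumptions made for illustration (real HDFS placement is rack-aware). It splits a hypothetical 300 MB file into 128 MB blocks and places three replicas of each block on distinct nodes.

```python
BLOCK_SIZE_MB = 128      # block size quoted in the slides
REPLICATION_FACTOR = 3   # default replication quoted in the slides


def plan_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes


def place_replicas(num_blocks, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical policy for illustration; it is not the HDFS placement policy.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + k) % len(nodes)] for k in range(replication)]
        for block_id in range(num_blocks)
    }


blocks = plan_blocks(300)                        # [128, 128, 44]
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c", "node-d"])
raw_storage_mb = sum(blocks) * REPLICATION_FACTOR  # 900
```

With three replicas, a 300 MB file occupies 900 MB of raw cluster storage, and any single node can fail without a block becoming unavailable.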

How does Hadoop work? (Continued)
• Job scheduling and execution:
  - Hadoop's resource manager (usually YARN) allocates cluster resources and schedules job execution.
  - Map and Reduce tasks are distributed across the cluster nodes; where possible, a task is scheduled on a node that already holds its input data, which minimizes data transfer over the network.
• Fault tolerance:
  - Hadoop provides fault tolerance through data replication and task recovery.
  - If a node or task fails, Hadoop automatically reschedules the affected tasks on healthy nodes and reads from the replicated data blocks.
• Monitoring and management:
  - Hadoop provides tools such as the HDFS web interface and the ResourceManager web UI for monitoring and managing the cluster.
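
The scheduling and recovery behaviour described on this slide can be sketched in a few lines of Python. This is an illustrative toy, not YARN's scheduler: the `schedule` function, the least-loaded tie-break, and the node and block names are all assumptions. It prefers a healthy node that holds a replica of the task's block and falls back to any healthy node otherwise; rerunning it with a shorter list of healthy nodes mimics rescheduling after a failure.

```python
def schedule(tasks, block_locations, healthy_nodes):
    """Pick a node for each task, preferring a healthy node that holds its block.

    tasks:           dict task_id -> block_id
    block_locations: dict block_id -> list of nodes holding a replica
    healthy_nodes:   list of nodes that are currently alive
    Returns dict task_id -> (node, ran_locally).
    """
    if not healthy_nodes:
        raise ValueError("no healthy nodes available")
    load = {node: 0 for node in healthy_nodes}
    assignment = {}
    for task_id, block_id in tasks.items():
        local = [n for n in block_locations.get(block_id, []) if n in load]
        candidates = local or healthy_nodes            # fall back to any healthy node
        node = min(candidates, key=lambda n: load[n])  # least-loaded candidate
        assignment[task_id] = (node, bool(local))
        load[node] += 1
    return assignment


block_locations = {"b0": ["n1", "n2", "n3"], "b1": ["n2", "n3", "n4"]}
tasks = {"map-0": "b0", "map-1": "b1"}

before = schedule(tasks, block_locations, ["n1", "n2", "n3", "n4"])
# Simulate n1 failing: map-0 moves to another node that holds a replica of b0.
after = schedule(tasks, block_locations, ["n2", "n3", "n4"])
```

Because every block has several replicas, the failure of `n1` does not force any data to be copied before `map-0` can run again.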
Important components of Hadoop
Hadoop is an open-source framework for distributed storage and processing of large datasets. Its four most important components are:
• Hadoop Common
• Hadoop HDFS
• Hadoop YARN
• Hadoop MapReduce
Hadoop Common
• Hadoop Common is the collection of common utilities and libraries that support the other Hadoop modules.
• It is an essential module of the Apache Hadoop framework, along with the Hadoop Distributed File System (HDFS), Hadoop YARN, and Hadoop MapReduce.
• Like all other modules, Hadoop Common assumes that hardware failures are common and should be handled automatically in software by the Hadoop framework. Hadoop Common is also known as Hadoop Core.
• Key aspects of Hadoop Common:
  - Core libraries
  - HDFS clients
  - Configuration management
  - Logging and monitoring
  - Security
  - CLI tools
  - Error handling
  - Utilities
Hadoop HDFS
Hadoop Distributed File System (HDFS): HDFS is the primary storage system in Hadoop. It divides large files into smaller blocks and distributes them across multiple DataNodes in a cluster, providing fault tolerance and high availability.
Hadoop HDFS (Continued)
• NameNode (master node)
  - Manages all the slave nodes (DataNodes) and assigns work to them.
  - Manages the file system namespace and executes namespace operations such as opening, closing, and renaming files and directories.
  - Should be deployed on reliable, well-provisioned hardware rather than on commodity hardware.
  - Keeps a record of everything: it knows the location of every DataNode and the blocks each one contains, so nothing is done without going through the NameNode.
Hadoop HDFS (Continued)
• DataNode (slave node)
  - DataNodes are the worker nodes that do the actual work, such as reading and writing data.
  - They also create, delete, and replicate blocks on instruction from the NameNode.
  - They can be deployed on commodity hardware.
  - An HDFS cluster contains multiple DataNodes, and each DataNode holds multiple data blocks.
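
The division of labour between the NameNode and the DataNodes can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the HDFS API: the classes, the `put` and `get` client helpers, the 4-byte block size, and the round-robin replica choice are all hypothetical. The point it shows is that the NameNode holds only metadata (which blocks make up a file and where they live), while the DataNodes hold the bytes.

```python
class DataNode:
    """Stores block contents; knows nothing about files."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds metadata only; assumes replication <= number of DataNodes."""

    def __init__(self, datanodes, replication=2):
        self.datanodes = datanodes
        self.replication = replication
        self.file_blocks = {}      # path -> [block_id, ...]
        self.block_locations = {}  # block_id -> [DataNode, ...]

    def allocate_block(self, path):
        """Register a new block for `path` and choose the DataNodes for it."""
        blocks = self.file_blocks.setdefault(path, [])
        block_id = f"{path}#{len(blocks)}"
        start = len(self.block_locations) % len(self.datanodes)
        targets = [self.datanodes[(start + k) % len(self.datanodes)]
                   for k in range(self.replication)]
        blocks.append(block_id)
        self.block_locations[block_id] = targets
        return block_id, targets

    def locate(self, path):
        """Return [(block_id, [DataNode, ...]), ...] for a file."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


def put(namenode, path, data, block_size=4):
    """Client write: ask the NameNode where each block goes, then send bytes to DataNodes."""
    for i in range(0, len(data), block_size):
        block_id, targets = namenode.allocate_block(path)
        for node in targets:
            node.write_block(block_id, data[i:i + block_size])


def get(namenode, path):
    """Client read: look up block locations, then read each block from its first replica."""
    return b"".join(nodes[0].read_block(block_id)
                    for block_id, nodes in namenode.locate(path))


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
nn = NameNode(nodes)
put(nn, "/logs/a.txt", b"hello hadoop")
restored = get(nn, "/logs/a.txt")
```

The tiny block size is chosen only so that a 12-byte payload spans several blocks; it has no relation to real HDFS settings.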
Hadoop YARN
YARN (Yet Another Resource Negotiator): YARN is a key component of the Hadoop ecosystem that manages and allocates resources in a Hadoop cluster. It is responsible for resource management and job scheduling, making it an integral part of distributed data processing in Hadoop.
Hadoop YARN (Continued)
• ResourceManager
  - The ResourceManager is the central component of YARN.
  - It manages and allocates cluster resources, such as CPU and memory, to different applications.
  - It tracks available resources and queues so that resources are allocated efficiently.
• NodeManager
  - Each worker node in the cluster runs a NodeManager, which monitors resource usage on that node and reports it back to the ResourceManager.
  - NodeManagers manage the execution of application containers.
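
The relationship between the ResourceManager and the NodeManagers can be sketched as follows. This is a toy model, not YARN's API or any of its real schedulers: the class fields, the `allocate` method, and the first-fit policy are assumptions chosen to keep the example short. Each NodeManager tracks its free memory and virtual cores, and the ResourceManager hands a container to the first node that can fit the request.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_memory_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def launch(self, app_id, memory_mb, vcores):
        """Start a container and deduct its resources from this node."""
        self.free_memory_mb -= memory_mb
        self.free_vcores -= vcores
        self.containers.append((app_id, memory_mb, vcores))


class ResourceManager:
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, app_id, memory_mb, vcores):
        """First-fit: return the name of the node that got the container, or None."""
        for nm in self.node_managers:
            if nm.free_memory_mb >= memory_mb and nm.free_vcores >= vcores:
                nm.launch(app_id, memory_mb, vcores)
                return nm.name
        return None  # no node can satisfy the request right now


rm = ResourceManager([NodeManager("nm1", 4096, 2), NodeManager("nm2", 8192, 4)])
placements = [
    rm.allocate("app-1", 2048, 1),
    rm.allocate("app-1", 2048, 1),
    rm.allocate("app-2", 4096, 2),
    rm.allocate("app-2", 8192, 1),  # larger than any node's remaining memory
]
```

The last request is refused because no single node has 8192 MB free; a real scheduler would queue it rather than drop it.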
Hadoop MapReduce
MapReduce: MapReduce is a processing technique and programming model for distributed computing; Hadoop's implementation is based on Java. A MapReduce job contains two important tasks, Map and Reduce. The Map task takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key/value pairs). The Reduce task takes the output of the Map task as its input and combines those tuples into a smaller set of tuples. As the order of the name MapReduce implies, the Reduce task is always performed after the Map task. A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage.
Hadoop MapReduce (Continued)
Map stage
• The mapper's job is to process the input data.
• Generally, the input data is a file or directory stored in the Hadoop Distributed File System (HDFS).
• The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data.
Reduce stage
• This stage is the combination of the shuffle stage and the reduce stage.
• The reducer's job is to process the data that comes from the mapper.
• After processing, it produces a new set of output, which is stored in HDFS.
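
The map, shuffle, and reduce stages can be illustrated with the classic word-count example. The sketch below runs in a single Python process and does not use the Hadoop API; the function names and the sample input lines are assumptions made for illustration. It shows only the data flow: each line is mapped to (word, 1) pairs, the pairs are grouped by key, and each group is reduced to a total.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values by key so each reducer call sees one key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


counts = word_count(["Hadoop stores data", "Hadoop processes data in parallel"])
```

In a real cluster the mapper calls would run on different nodes and the shuffle would move data over the network; here the three stages are ordinary function calls.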
Hadoop MapReduce (Continued)
The two essential daemons of classic MapReduce are the JobTracker and the TaskTracker:
• JobTracker: In Hadoop's classic MapReduce framework, the JobTracker was a central service responsible for scheduling and managing MapReduce jobs, monitoring task progress, and handling job recovery.
• TaskTracker: In the same framework, TaskTrackers ran on the worker nodes and executed individual map and reduce tasks within a MapReduce job, with a focus on data locality and failure handling.
Key features of Hadoop
• Distributed storage: Hadoop stores large datasets across multiple machines, allowing the storage and processing of extremely large amounts of data.
• Scalability: Hadoop can scale from a single server to thousands of machines, making it easy to add capacity as needed.
• Fault tolerance: Hadoop is designed to be highly fault-tolerant, meaning it can continue to operate even in the presence of hardware failures.
• Data locality: Hadoop tries to run each computation on the node where its data is stored, which reduces network traffic and improves performance.
Key features of Hadoop (Continued)
• High availability: Hadoop's high-availability features help ensure that data remains accessible and is not lost.
• Flexible data processing: Hadoop's MapReduce programming model processes data in a distributed fashion, making it easy to implement a wide variety of data processing tasks.
• Data integrity: Hadoop has built-in checksums, which help ensure that stored data is consistent and correct.
• Data replication: Hadoop replicates data across the cluster for fault tolerance.
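
The data integrity feature rests on a simple idea: store a checksum next to each chunk of data and recompute it on read. The following Python sketch illustrates that idea only. It uses `zlib.crc32` from the standard library as a stand-in, and the 512-byte chunk size and function names are assumptions for this example; it does not reproduce HDFS's actual checksum algorithm or on-disk format.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size for this sketch


def checksums(data, chunk=BYTES_PER_CHECKSUM):
    """Compute one CRC-32 value per fixed-size chunk of `data`."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def find_corrupt_chunks(data, expected, chunk=BYTES_PER_CHECKSUM):
    """Return the indexes of chunks whose checksum no longer matches."""
    actual = checksums(data, chunk)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


original = bytes(range(256)) * 5       # 1280 bytes, so three chunks
stored = checksums(original)           # kept alongside the data at write time

damaged = bytearray(original)
damaged[600] ^= 0xFF                   # flip the bits of one byte in the second chunk
bad = find_corrupt_chunks(bytes(damaged), stored)
```

When a reader finds a bad chunk like this, the replication feature listed above is what makes recovery possible: the block can be read from another replica instead.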
Key features of Hadoop (Continued)
• Data compression: Hadoop has built-in data compression, which reduces storage space and can improve performance.
• YARN: A resource management platform that allows multiple data processing engines, such as real-time streaming, batch processing, and interactive SQL, to run and process data stored in HDFS.