HADOOP
TECHVIDVAN
Hadoop – An Apache Hadoop
Tutorial for Beginners
This Hadoop tutorial covers the main aspects of the Apache Hadoop framework
and is designed to make it easy to learn Hadoop from the basics.
In this article, we answer questions such as: what is Big Data Hadoop, why is
Hadoop needed, what is the history of Hadoop, and what are the advantages and
disadvantages of the Apache Hadoop framework?
What is Hadoop?
Hadoop is an open-source software framework for distributed storage and processing
of very large data sets. Open source means it is freely available and you can
even change its source code to suit your requirements.
It consists of three layers:
The storage layer – HDFS
The batch processing engine – MapReduce
The resource management layer – YARN
Hadoop makes it possible to run applications on a cluster with thousands of
nodes. Its distributed file system provides rapid data transfer rates
among nodes and allows the system to continue operating when a node fails.
Hadoop – History
In 2002, Doug Cutting and Mike Cafarella started the Nutch project, an open-source
web search engine that had to crawl and index a very large number of web pages.
In October 2003, Google published the GFS (Google File System) paper, and Hadoop’s
storage design originated from that paper.
In 2004, Google published its MapReduce paper. By 2005, Nutch was running on its
own open-source implementations of these ideas: the Nutch Distributed File System
(NDFS) and MapReduce.
In 2006, the distributed storage and processing code created by Doug Cutting and
Mike Cafarella was split out of Nutch to form Hadoop.
In February 2006, Doug Cutting joined Yahoo. This provided the resources and a
dedicated team to turn Hadoop into a system that ran at web scale. In 2007,
Yahoo started using Hadoop on a 100-node cluster.
In January 2008, Hadoop became a top-level project at Apache, confirming
its success. By then, many companies besides Yahoo! were using Hadoop, such as the
New York Times and Facebook.
In April 2008, Hadoop broke a world record to become the fastest system to
sort a terabyte of data. Running on a 910-node cluster, it sorted one terabyte in
209 seconds.
In December 2011, Apache Hadoop released version 1.0. In August 2013, version
2.0.6 became available, and in June 2017, Apache Hadoop 3.0.0-alpha4 followed.
The Apache Software Foundation (ASF) manages and maintains Hadoop’s framework
and ecosystem of technologies.
Why Hadoop?
a. Storage for Big Data – HDFS solves this problem by storing Big Data in a distributed manner.
HDFS stores each file as blocks; a block is the smallest unit of data in the filesystem.
Suppose you have 512 MB of data and have configured HDFS to create
128 MB blocks. HDFS then divides the data into 4 blocks (512/128 = 4) and stores them across
different DataNodes. It also replicates each block on different DataNodes.
b. Scalability – HDFS also solves the scaling problem. It focuses on horizontal scaling
rather than vertical scaling: you can add extra DataNodes to the HDFS cluster as and when
required, instead of scaling up the resources of your existing DataNodes.
c. Storing a variety of data – HDFS solves this problem too. It can store all kinds of data
(structured, semi-structured, or unstructured). It also follows a write-once, read-many
model, so you can write any kind of data once and read it multiple times
to find insights.
d. Data processing speed – This is a major problem of Big Data. To solve it, Hadoop
moves computation to the data instead of moving data to the computation. This principle is
called data locality.
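The block arithmetic in item (a) and the data locality idea in item (d) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how HDFS or YARN is implemented: the function names, the DataNode names (dn1 to dn4), and the round-robin placement are hypothetical, and the replication factor of 3 is an assumed setting that the slides do not state.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block may be smaller than block_size
    return sizes


def place_replicas(num_blocks, datanodes, replication=3):
    """Assign each block to `replication` distinct DataNodes.

    Hypothetical round-robin policy, used here for illustration only.
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many DataNodes as replicas")
    return {
        block_id: [
            datanodes[(block_id + r) % len(datanodes)] for r in range(replication)
        ]
        for block_id in range(num_blocks)
    }


def pick_node_for_task(block_id, placement, free_nodes):
    """Data locality: prefer a free node that already stores the block."""
    if not free_nodes:
        raise ValueError("no free nodes available")
    for node in placement[block_id]:
        if node in free_nodes:
            return node  # the computation moves to the data
    return free_nodes[0]  # no local replica is free, so the data must travel


blocks = split_into_blocks(512 * MB, 128 * MB)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(len(blocks))                                        # 4, matching 512/128 = 4
print(placement[0])                                       # ['dn1', 'dn2', 'dn3']
print(pick_node_for_task(0, placement, ["dn4", "dn2"]))   # dn2
```

In this sketch, losing one DataNode still leaves two copies of every block, and a task for block 0 is sent to dn2 because that node already holds a replica.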
Hadoop Core Components
a. HDFS
The Hadoop Distributed File System (HDFS) is the primary storage system of Hadoop.
It stores very large files across a cluster of commodity hardware. It works best
when storing a small number of large files rather than a huge number of small
files.
b. MapReduce
MapReduce is the data processing layer of Hadoop. It processes large volumes of
structured and unstructured data stored in HDFS, and it does so in parallel by
splitting a job into independent map and reduce tasks.
c. YARN
YARN provides resource management and is often described as the operating system
of Hadoop. It is responsible for managing and monitoring workloads and for
implementing security controls. Apache YARN also serves as a central platform for
delivering data governance tools across clusters.
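To make the MapReduce description in item (b) concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word count as the job. It is plain Python rather than the Hadoop API: the function names and the sample input are hypothetical, and a real job would run the map and reduce tasks in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an iterable of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["Hadoop stores data", "Hadoop processes data in parallel"])
print(counts["hadoop"], counts["data"])  # 2 2
```

Each call to map_phase depends only on its own line, and each call to reduce_phase depends only on its own key, which is what lets a framework spread the work over many nodes.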
Advantages of Hadoop
Let’s now discuss the advantages Hadoop offers for solving Big Data problems.
Scalability – By adding nodes, we can easily grow the system to handle more data.
Flexibility – In this framework, you don’t have to preprocess data before storing
it. You can store as much data as you want and decide how to use it later.
Low cost – The open-source framework is free and runs on low-cost commodity
hardware.
Fault tolerance – If nodes go down, jobs are automatically redirected to
other nodes.
Computing power – Its distributed computing model processes Big Data fast. The
more computing nodes you use, the more processing power you have.
Disadvantages of Hadoop
Some disadvantages of the Apache Hadoop framework are given below.
Security concerns – Managing a complex application like Hadoop can be challenging.
If the person managing the platform does not know how to enable its security
features, your data could be at huge risk. Encryption at the storage and network
levels is also not enabled by default, which is a major point of concern.
Vulnerable by nature – The framework is written almost entirely in Java, one of
the most widely used languages. Java has been heavily exploited by cybercriminals
and, as a result, implicated in numerous security breaches.
Not fit for small data – Hadoop is not suited to small data. HDFS lacks the
ability to efficiently support random reads of small files.
Potential stability issues – Hadoop is an open-source framework, which means it is
created by many developers who continue to work on the project. While
improvements are constantly being made, it has had stability issues. To avoid
them, organizations should run the latest stable version.
Conclusion
In conclusion, Hadoop is one of the most popular and powerful Big Data tools.
It stores huge amounts of data in a distributed manner
and then processes the data in parallel on a cluster of nodes. It provides a
highly reliable storage layer (HDFS), a batch processing engine (MapReduce),
and a resource management layer (YARN).
Together, these components provide Hadoop’s functionality.