SlideShare a Scribd company logo
HADOOP
TECHVIDVAN
Hadoop – An Apache Hadoop
Tutorial for Beginners
The main goal of this Hadoop Tutorial is to describe each and every aspect of
the Apache Hadoop Framework. Basically, this tutorial is designed in a way that
it would be easy to Learn Hadoop from basics.
In this article, we will do our best to answer questions like what is Big data
Hadoop, What is the need for Hadoop, what is the history of Hadoop, and lastly
advantages and disadvantages of the Apache Hadoop framework.
TECHVIDVAN
What is Hadoop?
The Storage layer – HDFS
Batch processing engine – MapReduce
Resource Management Layer – YARN
It is an open-source software framework for distributed storage & processing
of huge amounts of data sets. Open source means it is freely available and even
we can change its source code as per your requirements.
It also makes it possible to run applications on a system with thousands of
nodes. It’s distributed file system has the provision of rapid data transfer rates
among nodes. It also allows the system to continue operating in case of node
failure.
TECHVIDVAN
Hadoop – History
In 2003, Google launches project Nutch to handle billions of searches. Also for
indexing millions of web pages. In October 2003 Google published GFS (Google
File System) paper, from that paper Hadoop was originated.
In 2004, Google releases paper with MapReduce. And in 2005, Nutch used GFS
and MapReduce to perform operations.
In 2006, Computer scientists Doug Cutting and Mike Cafarella created Hadoop.
In February 2006 Doug Cutting joined Yahoo. This provided resources and the
dedicated team to turn Hadoop into a system that ran at a web scale. In 2007,
Yahoo started using Hadoop on a 100-node cluster.
TECHVIDVAN
In January 2008, Hadoop made its own top-level project at Apache, confirming
its success. Many other companies used Hadoop besides Yahoo!, such as the
New York Times and Facebook.
In April 2008, Hadoop broke a world record to become the fastest system to
sort a terabyte of data. Running on a 910-node cluster, In sorted one terabyte in
209 seconds.
In December 2011, Apache Hadoop released version 1.0. In August 2013, version
2.0.6 was available. Later in June 2017, Apache Hadoop 3.0.0-alpha4 is available.
ASF (Apache Software Foundation) manages and maintains Hadoop’s framework
and ecosystem of technologies.
TECHVIDVAN
Why Hadoop?
a. Storage for Big Data – HDFS Solved this problem. It stores Big Data in Distributed Manner.
HDFS also stores each file as blocks. Block is the smallest unit of data in a filesystem.
Suppose you have 512MB of data. And you have configured HDFS such that it will create
128Mb of data blocks. So HDFS divides data into 4 blocks (512/128=4) and stores it across
different DataNodes. It also replicates the data blocks on different data nodes.
b. Scalability – It also solves the Scaling problem. It mainly focuses on horizontal scaling
rather than vertical scaling. You can add extra data nodes to the HDFS cluster as and when
required. Instead of scaling up the resources of your data nodes.
c. Storing the variety of data – HDFS solved this problem. HDFS can store all kinds of data
(structured, semi-structured, or unstructured). It also follows to write once and read many
models. Due to this, you can write any kind of data once and you can read it multiple times
for finding insights.
d. Data Processing Speed – This is the major problem of big data. In order to solve this
problem, move computation to data instead of data to computation. This principle is Data
locality.
Hadoop Core Components
a. HDFS
Hadoop distributed file system (HDFS) is the primary storage system of Hadoop.
HDFS stores very large files running on a cluster of commodity hardware. It follows
the principle of storing less number of large files rather than a huge number of small
files.
b. MapReduce
MapReduce is the data processing layer of Hadoop. It processes large structured and
unstructured data stored in HDFS. MapReduce also processes a huge amount of data
in parallel.
c. YARN
YARN provides resource management. It is the operating system of Hadoop. It is
responsible for managing and monitoring workloads, also implementing security
controls. Apache YARN is also a central platform to deliver data governance tools
across the clusters.
Advantages of Hadoop
Scalability –By adding nodes we can easily grow our system to handle more data.
Flexibility – In this framework, you don’t have to preprocess data before storing
it. You can store as much data as you want and decide how to use later.
Low-cost – Open source framework is free and runs on low-cost commodity
hardware.
Fault tolerance – If nodes go down, then jobs are automatically redirected to
other nodes.
Computing power – It’s distributed computing model processes big data fast. The
more computing nodes you use more processing power you have.
Let’s now discuss various Hadoop advantages to solve the big data problems.
TECHVIDVAN
Disadvantages of Hadoop
Security concerns – It can be challenging in managing the complex application. If
the user doesn’t know how to enable platform who is managing the platform, then
your data could be a huge risk. Since, storage and network levels Hadoop are
missing encryption, which is a major point of concern.
Vulnerable by nature – The framework is written almost in java, most widely used
language. Java is heavily exploited by cybercriminals. As a result, implicated in
numerous security breaches.
Not fit for small data –Since, it is not suited for small data. Hence, it lacks the
ability to efficiently support the random reading of small files.
Potential stability issues – As it is an open source framework. This means that it is
created by many developers who continue to work on the project. While
constantly improvements are made, It has stability issues. To avoid these issues
organizations should run on the latest stable version.
Some Disadvantage of Apache Hadoop Framework is given below-
Conclusion
In conclusion, we can say that it is the most popular and powerful Big data tool.
It stores huge amounts of data in a distributed manner.
And then processes the data in parallel on a cluster of nodes. It also provides the
world’s most reliable storage layer- HDFS. Batch processing engine MapReduce
and Resource management layer- YARN.
Hence, these daemons ensure Hadoop functionality.
TECHVIDVAN

More Related Content

Similar to Hadoop .pdf

Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
Laxmi Rauth
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
Ranjith Sekar
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
vinayiqbusiness
 
Hadoop
HadoopHadoop
Hadoop An Introduction
Hadoop An IntroductionHadoop An Introduction
Hadoop An Introduction
Mohanasundaram Ponnusamy
 
What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?
tommychauhan
 
Big Data Training in Amritsar
Big Data Training in AmritsarBig Data Training in Amritsar
Big Data Training in Amritsar
E2MATRIX
 
Bigdata
BigdataBigdata
Bigdata
renukarenuka9
 
Bigdata ppt
Bigdata pptBigdata ppt
Bigdata ppt
renukarenuka9
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a Glance
Neev Technologies
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
KrishnenduKrishh
 
Big Data Training in Mohali
Big Data Training in MohaliBig Data Training in Mohali
Big Data Training in Mohali
E2MATRIX
 
Big Data Training in Ludhiana
Big Data Training in LudhianaBig Data Training in Ludhiana
Big Data Training in Ludhiana
E2MATRIX
 
BIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdfBIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdf
DIVYA370851
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, Providers
Mrigendra Sharma
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
Thanh Nguyen
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
Thanh Nguyen
 
Hadoopppt.pptx
Hadoopppt.pptxHadoopppt.pptx
Hadoopppt.pptx
ssuser552a8f
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
umapavankumar kethavarapu
 

Similar to Hadoop .pdf (20)

Hadoop basics
Hadoop basicsHadoop basics
Hadoop basics
 
Hadoop and BigData - July 2016
Hadoop and BigData - July 2016Hadoop and BigData - July 2016
Hadoop and BigData - July 2016
 
Hadoop architecture-tutorial
Hadoop  architecture-tutorialHadoop  architecture-tutorial
Hadoop architecture-tutorial
 
Hadoop
HadoopHadoop
Hadoop
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Hadoop An Introduction
Hadoop An IntroductionHadoop An Introduction
Hadoop An Introduction
 
What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?What is Apache Hadoop and its ecosystem?
What is Apache Hadoop and its ecosystem?
 
Big Data Training in Amritsar
Big Data Training in AmritsarBig Data Training in Amritsar
Big Data Training in Amritsar
 
Bigdata
BigdataBigdata
Bigdata
 
Bigdata ppt
Bigdata pptBigdata ppt
Bigdata ppt
 
Hadoop Ecosystem at a Glance
Hadoop Ecosystem at a GlanceHadoop Ecosystem at a Glance
Hadoop Ecosystem at a Glance
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
Big Data Training in Mohali
Big Data Training in MohaliBig Data Training in Mohali
Big Data Training in Mohali
 
Big Data Training in Ludhiana
Big Data Training in LudhianaBig Data Training in Ludhiana
Big Data Training in Ludhiana
 
BIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdfBIGDATA MODULE 3.pdf
BIGDATA MODULE 3.pdf
 
Hadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, ProvidersHadoop Platforms - Introduction, Importance, Providers
Hadoop Platforms - Introduction, Importance, Providers
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Hadoopppt.pptx
Hadoopppt.pptxHadoopppt.pptx
Hadoopppt.pptx
 
Bigdata and Hadoop Introduction
Bigdata and Hadoop IntroductionBigdata and Hadoop Introduction
Bigdata and Hadoop Introduction
 

More from SudhanshiBakre1

IoT Security.pdf
IoT Security.pdfIoT Security.pdf
IoT Security.pdf
SudhanshiBakre1
 
Top Java Frameworks.pdf
Top Java Frameworks.pdfTop Java Frameworks.pdf
Top Java Frameworks.pdf
SudhanshiBakre1
 
Numpy ndarrays.pdf
Numpy ndarrays.pdfNumpy ndarrays.pdf
Numpy ndarrays.pdf
SudhanshiBakre1
 
Float Data Type in C.pdf
Float Data Type in C.pdfFloat Data Type in C.pdf
Float Data Type in C.pdf
SudhanshiBakre1
 
IoT Hardware – The Backbone of Smart Devices.pdf
IoT Hardware – The Backbone of Smart Devices.pdfIoT Hardware – The Backbone of Smart Devices.pdf
IoT Hardware – The Backbone of Smart Devices.pdf
SudhanshiBakre1
 
Internet of Things – Contiki.pdf
Internet of Things – Contiki.pdfInternet of Things – Contiki.pdf
Internet of Things – Contiki.pdf
SudhanshiBakre1
 
Java abstract Keyword.pdf
Java abstract Keyword.pdfJava abstract Keyword.pdf
Java abstract Keyword.pdf
SudhanshiBakre1
 
Node.js with MySQL.pdf
Node.js with MySQL.pdfNode.js with MySQL.pdf
Node.js with MySQL.pdf
SudhanshiBakre1
 
Collections in Python - Where Data Finds Its Perfect Home.pdf
Collections in Python - Where Data Finds Its Perfect Home.pdfCollections in Python - Where Data Finds Its Perfect Home.pdf
Collections in Python - Where Data Finds Its Perfect Home.pdf
SudhanshiBakre1
 
File Handling in Java.pdf
File Handling in Java.pdfFile Handling in Java.pdf
File Handling in Java.pdf
SudhanshiBakre1
 
Types of AI you should know.pdf
Types of AI you should know.pdfTypes of AI you should know.pdf
Types of AI you should know.pdf
SudhanshiBakre1
 
Streams in Node .pdf
Streams in Node .pdfStreams in Node .pdf
Streams in Node .pdf
SudhanshiBakre1
 
Annotations in Java with Example.pdf
Annotations in Java with Example.pdfAnnotations in Java with Example.pdf
Annotations in Java with Example.pdf
SudhanshiBakre1
 
RESTful API in Node.pdf
RESTful API in Node.pdfRESTful API in Node.pdf
RESTful API in Node.pdf
SudhanshiBakre1
 
Top Cryptocurrency Exchanges of 2023.pdf
Top Cryptocurrency Exchanges of 2023.pdfTop Cryptocurrency Exchanges of 2023.pdf
Top Cryptocurrency Exchanges of 2023.pdf
SudhanshiBakre1
 
Epic Python Face-Off -Methods vs.pdf
Epic Python Face-Off -Methods vs.pdfEpic Python Face-Off -Methods vs.pdf
Epic Python Face-Off -Methods vs.pdf
SudhanshiBakre1
 
Django Tutorial_ Let’s take a deep dive into Django’s web framework.pdf
Django Tutorial_ Let’s take a deep dive into Django’s web framework.pdfDjango Tutorial_ Let’s take a deep dive into Django’s web framework.pdf
Django Tutorial_ Let’s take a deep dive into Django’s web framework.pdf
SudhanshiBakre1
 
Benefits Of IoT Salesforce.pdf
Benefits Of IoT Salesforce.pdfBenefits Of IoT Salesforce.pdf
Benefits Of IoT Salesforce.pdf
SudhanshiBakre1
 
Epic Python Face-Off -Methods vs. Functions.pdf
Epic Python Face-Off -Methods vs. Functions.pdfEpic Python Face-Off -Methods vs. Functions.pdf
Epic Python Face-Off -Methods vs. Functions.pdf
SudhanshiBakre1
 
Python Classes_ Empowering Developers, Enabling Breakthroughs.pdf
Python Classes_ Empowering Developers, Enabling Breakthroughs.pdfPython Classes_ Empowering Developers, Enabling Breakthroughs.pdf
Python Classes_ Empowering Developers, Enabling Breakthroughs.pdf
SudhanshiBakre1
 

More from SudhanshiBakre1 (20)

IoT Security.pdf
IoT Security.pdfIoT Security.pdf
IoT Security.pdf
 
Top Java Frameworks.pdf
Top Java Frameworks.pdfTop Java Frameworks.pdf
Top Java Frameworks.pdf
 
Numpy ndarrays.pdf
Numpy ndarrays.pdfNumpy ndarrays.pdf
Numpy ndarrays.pdf
 
Float Data Type in C.pdf
Float Data Type in C.pdfFloat Data Type in C.pdf
Float Data Type in C.pdf
 
IoT Hardware – The Backbone of Smart Devices.pdf
IoT Hardware – The Backbone of Smart Devices.pdfIoT Hardware – The Backbone of Smart Devices.pdf
IoT Hardware – The Backbone of Smart Devices.pdf
 
Internet of Things – Contiki.pdf
Internet of Things – Contiki.pdfInternet of Things – Contiki.pdf
Internet of Things – Contiki.pdf
 
Java abstract Keyword.pdf
Java abstract Keyword.pdfJava abstract Keyword.pdf
Java abstract Keyword.pdf
 
Node.js with MySQL.pdf
Node.js with MySQL.pdfNode.js with MySQL.pdf
Node.js with MySQL.pdf
 
Collections in Python - Where Data Finds Its Perfect Home.pdf
Collections in Python - Where Data Finds Its Perfect Home.pdfCollections in Python - Where Data Finds Its Perfect Home.pdf
Collections in Python - Where Data Finds Its Perfect Home.pdf
 
File Handling in Java.pdf
File Handling in Java.pdfFile Handling in Java.pdf
File Handling in Java.pdf
 
Types of AI you should know.pdf
Types of AI you should know.pdfTypes of AI you should know.pdf
Types of AI you should know.pdf
 
Streams in Node .pdf
Streams in Node .pdfStreams in Node .pdf
Streams in Node .pdf
 
Annotations in Java with Example.pdf
Annotations in Java with Example.pdfAnnotations in Java with Example.pdf
Annotations in Java with Example.pdf
 
RESTful API in Node.pdf
RESTful API in Node.pdfRESTful API in Node.pdf
RESTful API in Node.pdf
 
Top Cryptocurrency Exchanges of 2023.pdf
Top Cryptocurrency Exchanges of 2023.pdfTop Cryptocurrency Exchanges of 2023.pdf
Top Cryptocurrency Exchanges of 2023.pdf
 
Epic Python Face-Off -Methods vs.pdf
Epic Python Face-Off -Methods vs.pdfEpic Python Face-Off -Methods vs.pdf
Epic Python Face-Off -Methods vs.pdf
 
Django Tutorial_ Let’s take a deep dive into Django’s web framework.pdf
Django Tutorial_ Let’s take a deep dive into Django’s web framework.pdfDjango Tutorial_ Let’s take a deep dive into Django’s web framework.pdf
Django Tutorial_ Let’s take a deep dive into Django’s web framework.pdf
 
Benefits Of IoT Salesforce.pdf
Benefits Of IoT Salesforce.pdfBenefits Of IoT Salesforce.pdf
Benefits Of IoT Salesforce.pdf
 
Epic Python Face-Off -Methods vs. Functions.pdf
Epic Python Face-Off -Methods vs. Functions.pdfEpic Python Face-Off -Methods vs. Functions.pdf
Epic Python Face-Off -Methods vs. Functions.pdf
 
Python Classes_ Empowering Developers, Enabling Breakthroughs.pdf
Python Classes_ Empowering Developers, Enabling Breakthroughs.pdfPython Classes_ Empowering Developers, Enabling Breakthroughs.pdf
Python Classes_ Empowering Developers, Enabling Breakthroughs.pdf
 

Recently uploaded

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 

Recently uploaded (20)

PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

Hadoop .pdf

  • 2. Hadoop – An Apache Hadoop Tutorial for Beginners The main goal of this Hadoop Tutorial is to describe each and every aspect of the Apache Hadoop Framework. Basically, this tutorial is designed in a way that it would be easy to Learn Hadoop from basics. In this article, we will do our best to answer questions like what is Big data Hadoop, What is the need for Hadoop, what is the history of Hadoop, and lastly advantages and disadvantages of the Apache Hadoop framework. TECHVIDVAN
  • 3. What is Hadoop? The Storage layer – HDFS Batch processing engine – MapReduce Resource Management Layer – YARN It is an open-source software framework for distributed storage & processing of huge amounts of data sets. Open source means it is freely available and even we can change its source code as per your requirements. It also makes it possible to run applications on a system with thousands of nodes. It’s distributed file system has the provision of rapid data transfer rates among nodes. It also allows the system to continue operating in case of node failure. TECHVIDVAN
  • 4. Hadoop – History In 2003, Google launches project Nutch to handle billions of searches. Also for indexing millions of web pages. In October 2003 Google published GFS (Google File System) paper, from that paper Hadoop was originated. In 2004, Google releases paper with MapReduce. And in 2005, Nutch used GFS and MapReduce to perform operations. In 2006, Computer scientists Doug Cutting and Mike Cafarella created Hadoop. In February 2006 Doug Cutting joined Yahoo. This provided resources and the dedicated team to turn Hadoop into a system that ran at a web scale. In 2007, Yahoo started using Hadoop on a 100-node cluster. TECHVIDVAN
  • 5. In January 2008, Hadoop made its own top-level project at Apache, confirming its success. Many other companies used Hadoop besides Yahoo!, such as the New York Times and Facebook. In April 2008, Hadoop broke a world record to become the fastest system to sort a terabyte of data. Running on a 910-node cluster, In sorted one terabyte in 209 seconds. In December 2011, Apache Hadoop released version 1.0. In August 2013, version 2.0.6 was available. Later in June 2017, Apache Hadoop 3.0.0-alpha4 is available. ASF (Apache Software Foundation) manages and maintains Hadoop’s framework and ecosystem of technologies. TECHVIDVAN
  • 6. Why Hadoop? a. Storage for Big Data – HDFS Solved this problem. It stores Big Data in Distributed Manner. HDFS also stores each file as blocks. Block is the smallest unit of data in a filesystem. Suppose you have 512MB of data. And you have configured HDFS such that it will create 128Mb of data blocks. So HDFS divides data into 4 blocks (512/128=4) and stores it across different DataNodes. It also replicates the data blocks on different data nodes. b. Scalability – It also solves the Scaling problem. It mainly focuses on horizontal scaling rather than vertical scaling. You can add extra data nodes to the HDFS cluster as and when required. Instead of scaling up the resources of your data nodes. c. Storing the variety of data – HDFS solved this problem. HDFS can store all kinds of data (structured, semi-structured, or unstructured). It also follows to write once and read many models. Due to this, you can write any kind of data once and you can read it multiple times for finding insights. d. Data Processing Speed – This is the major problem of big data. In order to solve this problem, move computation to data instead of data to computation. This principle is Data locality.
  • 7. Hadoop Core Components a. HDFS Hadoop distributed file system (HDFS) is the primary storage system of Hadoop. HDFS stores very large files running on a cluster of commodity hardware. It follows the principle of storing less number of large files rather than a huge number of small files. b. MapReduce MapReduce is the data processing layer of Hadoop. It processes large structured and unstructured data stored in HDFS. MapReduce also processes a huge amount of data in parallel. c. YARN YARN provides resource management. It is the operating system of Hadoop. It is responsible for managing and monitoring workloads, also implementing security controls. Apache YARN is also a central platform to deliver data governance tools across the clusters.
  • 8. Advantages of Hadoop Scalability –By adding nodes we can easily grow our system to handle more data. Flexibility – In this framework, you don’t have to preprocess data before storing it. You can store as much data as you want and decide how to use later. Low-cost – Open source framework is free and runs on low-cost commodity hardware. Fault tolerance – If nodes go down, then jobs are automatically redirected to other nodes. Computing power – It’s distributed computing model processes big data fast. The more computing nodes you use more processing power you have. Let’s now discuss various Hadoop advantages to solve the big data problems. TECHVIDVAN
  • 9. Disadvantages of Hadoop Security concerns – It can be challenging in managing the complex application. If the user doesn’t know how to enable platform who is managing the platform, then your data could be a huge risk. Since, storage and network levels Hadoop are missing encryption, which is a major point of concern. Vulnerable by nature – The framework is written almost in java, most widely used language. Java is heavily exploited by cybercriminals. As a result, implicated in numerous security breaches. Not fit for small data –Since, it is not suited for small data. Hence, it lacks the ability to efficiently support the random reading of small files. Potential stability issues – As it is an open source framework. This means that it is created by many developers who continue to work on the project. While constantly improvements are made, It has stability issues. To avoid these issues organizations should run on the latest stable version. Some Disadvantage of Apache Hadoop Framework is given below-
  • 10. Conclusion In conclusion, we can say that it is the most popular and powerful Big data tool. It stores huge amounts of data in a distributed manner. And then processes the data in parallel on a cluster of nodes. It also provides the world’s most reliable storage layer- HDFS. Batch processing engine MapReduce and Resource management layer- YARN. Hence, these daemons ensure Hadoop functionality. TECHVIDVAN