Certified Big Data & Hadoop Training – DataFlair
Hadoop Tutorial
Agenda
• Introduction to Hadoop
• Hadoop nodes & daemons
• Hadoop Architecture
• Characteristics
• Hadoop Features
What is Hadoop?
The technology that powers Yahoo, Facebook, Twitter, Walmart, and others
Hadoop
What is Hadoop?
An open source framework that allows distributed processing of large data-sets across a cluster of commodity hardware
Open Source
• Source code is freely available
• It may be redistributed and modified
Distributed Processing
• Data is processed in a distributed manner on multiple nodes / servers
• Multiple machines process the data independently
Cluster
• Multiple machines connected together
• Nodes are connected via LAN
Commodity Hardware
• Economical / affordable machines
• Typically low-performance hardware
What is Hadoop?
• Open source framework written in Java
• Inspired by Google's MapReduce programming model as well as its file system (GFS)
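As a rough illustration of the MapReduce programming model mentioned above, the sketch below counts words in plain Python using a map phase, a shuffle (group by key), and a reduce phase. It does not use Hadoop's API, and all function names here (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are invented for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

In a real cluster the map and reduce calls would run on different machines; here they run in one process only to show the data flow.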
Hadoop History (timeline, 2002 to 2009)
• Doug Cutting started working on Nutch
• Google published GFS & MapReduce papers
• Doug Cutting added DFS & MapReduce in Nutch
• Development of Hadoop started as Lucene sub-project
• The New York Times converted 4TB of image archives over 100 EC2 instances
• Hadoop became top-level project
• Hadoop defeated supercomputer
• Facebook launched Hive, SQL support for Hadoop
• Doug Cutting joined Cloudera
Hadoop Components
Hadoop consists of three key parts
Hadoop Nodes
• Master Node
• Slave Node
Hadoop Daemons
• Master Node: ResourceManager, NameNode
• Slave Node: NodeManager, DataNode
Basic Hadoop Architecture
[Figure: one unit of Work divided into many Sub Work units]
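The work / sub-work idea in this slide can be sketched in a few lines of Python. This is only an analogy: threads stand in for cluster nodes, and the helper names (`split_work`, `do_sub_work`, `run`) are hypothetical, not part of Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor


def split_work(items, parts):
    """Divide the work into at most `parts` roughly equal sub-works."""
    size = max(1, -(-len(items) // parts))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def do_sub_work(chunk):
    """Process one sub-work independently; here, just sum the numbers."""
    return sum(chunk)


def run(items, workers=4):
    """Split the work, process the sub-works in parallel, combine the results."""
    sub_works = split_work(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(do_sub_work, sub_works))
    return sum(partial_results)
```

Each sub-work is processed without knowledge of the others, and only the small partial results are combined at the end.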
Hadoop Characteristics
Open Source
• Source code is freely available
• Can be redistributed
• Can be modified
[Figure: Open Source, surrounded by Free, Affordable, Community, Transparent, Interoperable, No vendor lock-in]
Distributed Processing
• Data is processed in a distributed manner on the cluster
• Multiple nodes in the cluster process data independently
[Figure: Centralized Processing vs. Distributed Processing]
Fault Tolerance
• Node failures are recovered automatically
• The framework takes care of hardware failures as well as task failures
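One simple way to picture automatic recovery is a task that is rescheduled on another node when the first node fails. The sketch below simulates that behaviour in plain Python; the `NodeFailure` exception, the node names, and the retry policy are assumptions made for illustration and do not reflect Hadoop's actual scheduler.

```python
class NodeFailure(Exception):
    """Raised when a simulated node cannot complete a task."""


def run_on_node(node, task, failed_nodes):
    """Pretend to run a task on a node; fail if the node is marked as down."""
    if node in failed_nodes:
        raise NodeFailure(f"{node} is down")
    return f"{task} done on {node}"


def run_with_retry(task, nodes, failed_nodes):
    """Try each node in turn until one of them completes the task."""
    for node in nodes:
        try:
            return run_on_node(node, task, failed_nodes)
        except NodeFailure:
            continue  # reschedule the same task on the next node
    raise RuntimeError(f"no healthy node available for {task}")
```

The caller never handles the failure itself, which mirrors the slide's point that the framework, not the client, deals with failed hardware and tasks.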
Reliability
• Data is reliably stored on the cluster of machines despite machine failures
• Failure of nodes doesn’t cause data loss
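Storing several copies of each block on different machines is the usual way to get this kind of reliability. The following sketch assumes a replication factor of 3 and a simple round-robin placement; real HDFS placement is rack-aware and more involved, and the function names here are hypothetical.

```python
def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }
```

With four nodes and three replicas per block, any two nodes can fail and every block is still readable from a surviving copy.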
High Availability
• Data is highly available and accessible despite hardware failure
• End-user applications see no downtime due to data unavailability
Scalability
• Vertical Scalability – New hardware can be added to the nodes
• Horizontal Scalability – New nodes can be added on the fly
Economic
• No need to purchase a costly license
• No need to purchase costly hardware
Open Source + Commodity Hardware = Economic
Easy to Use
• Distributed computing challenges are handled by the framework
• Clients just need to concentrate on business logic
Data Locality
• Move computation to data instead of data to computation
• Data is processed on the nodes where it is stored
[Figure: Storage Servers sending Data to App Servers that run the Algorithm, compared with Servers that each hold Data and run the Algo locally]
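The "move computation to data" rule can be sketched as a scheduling decision: prefer a node that already stores the block, and fall back to a remote node only when none is free. The code below is a simplified illustration with invented names (`schedule`, `block_locations`, `free_nodes`); it ignores node capacity and is not Hadoop's scheduler.

```python
def schedule(block_locations, free_nodes):
    """Map each block to a node, preferring a node that already stores it.

    block_locations: dict of block id -> list of nodes holding a copy
    free_nodes: non-empty set of nodes that can currently run a task
    """
    if not free_nodes:
        raise ValueError("at least one free node is required")
    assignment = {}
    for block, holders in block_locations.items():
        local = [node for node in holders if node in free_nodes]
        if local:
            assignment[block] = (local[0], "local")
        else:
            # No holder is free: run elsewhere, so the data crosses the network.
            assignment[block] = (sorted(free_nodes)[0], "remote")
    return assignment
```

A "local" assignment means only the small program travels to the node, while a "remote" one means the block itself has to be shipped, which is the cost data locality tries to avoid.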
Summary
• Every day we generate 2.3 trillion GBs of data
• Hadoop handles huge volumes of data efficiently
• Hadoop uses the power of distributed computing
• HDFS & YARN are two main components of Hadoop
• It is highly fault tolerant, reliable & available
Thank You
DataFlair
/c/DataFlairWS /DataFlairWS

