Hadoop
Contents
• What is Hadoop?
• How big is it?
• Why do we need large-scale computing?
• How does it work?
• Advantages
• Disadvantages
• Who uses Hadoop?
• Conclusion
What is Hadoop?
• Hadoop is open-source software written in Java.
• The Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
• HDFS: self-healing, high-bandwidth clustered storage.
• MapReduce: fault-tolerant distributed processing.
• It operates on both unstructured and structured data.
How big is it?
• We have ~20,000 machines running Hadoop.
• Our largest clusters are currently 2,000 nodes.
• Several petabytes of user data (compressed, unreplicated).
• We run hundreds of thousands of jobs every month.
Why do we need large-scale computing?
• The New York Stock Exchange generates about one terabyte of new trade data per day.
• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
• The Internet Archive stores around 2 petabytes of data and is growing at a rate of 20 terabytes per month.
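To see why a single machine struggles with volumes like these, here is a rough back-of-the-envelope calculation. It assumes (this figure is not from the slides) that one disk can stream about 100 MB/s, and it uses the NYSE's ~1 TB per day from above. The point is only that spreading the same data over many disks and reading them in parallel cuts the scan time proportionally.

```python
# Illustrative assumption (not from the slides): one disk streams ~100 MB/s.
DISK_READ_BYTES_PER_S = 100 * 10**6
BYTES_PER_TB = 10**12  # decimal terabyte


def hours_to_read(total_bytes, disks=1, rate=DISK_READ_BYTES_PER_S):
    """Hours needed to scan total_bytes when the data is spread evenly
    over `disks` disks that are all read in parallel."""
    seconds = total_bytes / (rate * disks)
    return seconds / 3600


nyse_one_day = 1 * BYTES_PER_TB  # ~1 TB of new trade data per day
print(round(hours_to_read(nyse_one_day), 2))                  # 2.78 (hours, one disk)
print(round(hours_to_read(nyse_one_day, disks=100) * 60, 1))  # 1.7 (minutes, 100 disks)
```

Under these assumptions one disk needs close to three hours for a single day of trade data, while 100 disks read in parallel finish in under two minutes.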
How does it work?
MapReduce = computation
HDFS = storage
How does HDFS work?
Hadoop Distributed File System
• Files are stored as fixed-size chunks (blocks) of 64 MB each. (64 MB was the default block size in Hadoop 1.x; later releases default to 128 MB.)
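A minimal sketch of the chunking idea, assuming the 64 MB block size from the slide and a replication factor of 3 (the HDFS default). The datanode names and the round-robin placement are hypothetical; real HDFS lets the NameNode choose replica locations with a rack-aware policy.

```python
BLOCK_SIZE_MB = 64  # block size given on the slide
REPLICATION = 3     # assumed: HDFS's default replication factor


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Sizes (in MB) of the blocks a file of file_size_mb occupies.
    Only the last block may be smaller than block_size_mb."""
    full, rest = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([rest] if rest else [])


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy round-robin placement: every block lands on `replication`
    distinct (hypothetical) datanodes. Real HDFS is rack-aware."""
    if replication > len(datanodes):
        raise ValueError("need at least as many datanodes as replicas")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(200)
print(blocks)  # [64, 64, 64, 8]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

A 200 MB file therefore occupies four blocks, the last holding only 8 MB; HDFS does not pad a short final block out to the full block size on disk.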
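MapReduce
Example: word count over a given set of strings
• Input: "We love India" and "We play tennis"
• Map: "We love India" → (We, 1), (love, 1), (India, 1); "We play tennis" → (We, 1), (play, 1), (tennis, 1)
• Reduce: (We, 2), (love, 1), (India, 1), (play, 1), (tennis, 1)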
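The word count above can be simulated in a few lines of plain Python. This is a sketch of the map, shuffle, and reduce phases run on one machine, not the Hadoop Java API; lowercasing every word is an assumption made here so that "Love" and "love" count as the same word.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["We love India", "We play tennis"]))
# {'india': 1, 'love': 1, 'play': 1, 'tennis': 1, 'we': 2}
```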
• The Hadoop MapReduce framework harnesses a cluster of machines and executes user-defined MapReduce jobs across the nodes in the cluster.
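On a real cluster the mapper and reducer run as separate programs on many nodes. With Hadoop Streaming, they can be any executables that read lines from standard input and write tab-separated key/value lines to standard output; the framework sorts the mapper output by key before the reducer sees it. The sketch below follows that convention. The file name `wc.py` and its `map`/`reduce` arguments are hypothetical, and lowercasing words is an assumption.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word (words lowercased by assumption)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each word. Relies on the records being sorted by
    key, so that all records for one word are adjacent."""
    pairs = (r.rstrip("\n").split("\t", 1) for r in sorted_records if r.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: `python wc.py map` or `python wc.py reduce`,
    # reading records from stdin and writing results to stdout.
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

The two stages can be checked locally without Hadoop using a pipeline such as `cat input.txt | python wc.py map | sort | python wc.py reduce`, where `sort` stands in for the shuffle.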
Advantages
• Reliable shared storage.
• A simple analysis system.
• A distributed file system.
• Tasks are independent.
• Partial failures are easy to handle: entire nodes can fail and restart.
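Because tasks are independent, a failed task can simply be run again (on Hadoop, typically on another node) without affecting the others. The sketch below shows that idea with a local retry loop; `run_with_retries`, `make_flaky`, the limit of 4 attempts, and the simulated failures are illustrative assumptions, not Hadoop's scheduler.

```python
def run_with_retries(tasks, max_attempts=4):
    """Run independent tasks; re-run any task that raises, up to max_attempts.
    Because tasks share no state, retrying one cannot change another's result."""
    results = {}
    for task_id, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            try:
                results[task_id] = task()
                break
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up: the whole job fails
    return results


def make_flaky(value, failures):
    """Hypothetical task that fails `failures` times before succeeding."""
    state = {"left": failures}

    def task():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated node failure")
        return value

    return task


print(run_with_retries({"t1": make_flaky(10, 0), "t2": make_flaky(20, 2)}))
# {'t1': 10, 't2': 20}
```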
Disadvantages
• The lack of central data can be frustrating.
• There is still a single master, which requires care and may limit scaling.
• Managing job flow is not trivial when intermediate data must be kept.
Who uses Hadoop?
Conclusion
• Hadoop is a data grid operating system that provides an economically scalable solution for storing and processing large amounts of unstructured or structured data over long periods of time.