SlideShare a Scribd company logo
1 of 8
Apache top level project, open-
source implementation of frameworks
for reliable, scalable, distributed
computing and data storage.
It is a flexible and highly-available
architecture for large scale
computation and data processing on
a network of commodity hardware.
Hadoop’s Developers
Doug Cutting
2005: Doug Cutting and Michael J.
Cafarella developed Hadoop to
support distribution for
the Nutch search engine project.
The project was funded by Yahoo.
2006: Yahoo gave the project to
Apache
Software Foundation.
Google Origins
2003
2004
2006
• Hadoop:
• an open-source software framework that supports data-intensive
distributed applications, licensed under the Apache v2 license.
• Goals / Requirements:
• Abstract and facilitate the storage and processing of large and/or rapidly
growing data sets
• Structured and non-structured data
• Simple programming models
• High scalability and availability
• Use commodity (cheap!) hardware with little redundancy
• Fault-tolerance
• Move computation rather than data
Hadoop Framework Tools
Hadoop’s Architecture
Three main applications of Hadoop:
• Advertisement (Mining user behavior to
generate recommendations)
• Searches (group related documents)
• Security (search for uncommon
patterns)
introduction to hadoop

More Related Content

What's hot

Bridging to a hybrid cloud data services architecture
Bridging to a hybrid cloud data services architectureBridging to a hybrid cloud data services architecture
Bridging to a hybrid cloud data services architecture
IBM Analytics
 

What's hot (20)

CSB_community
CSB_communityCSB_community
CSB_community
 
Big data analysis using hadoop cluster
Big data analysis using hadoop clusterBig data analysis using hadoop cluster
Big data analysis using hadoop cluster
 
MahoutNew
MahoutNewMahoutNew
MahoutNew
 
Big data in Azure
Big data in AzureBig data in Azure
Big data in Azure
 
Bridging to a hybrid cloud data services architecture
Bridging to a hybrid cloud data services architectureBridging to a hybrid cloud data services architecture
Bridging to a hybrid cloud data services architecture
 
Spark - The beginnings
Spark -  The beginningsSpark -  The beginnings
Spark - The beginnings
 
Intake at AnacondaCon
Intake at AnacondaConIntake at AnacondaCon
Intake at AnacondaCon
 
NIF Data Federation
NIF Data FederationNIF Data Federation
NIF Data Federation
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
The HDF Group: Community models and outreach
The HDF Group: Community models and outreachThe HDF Group: Community models and outreach
The HDF Group: Community models and outreach
 
Plans for Enhanced NetCDF-4 Interface to HDF5 Data
Plans for Enhanced NetCDF-4 Interface to HDF5 DataPlans for Enhanced NetCDF-4 Interface to HDF5 Data
Plans for Enhanced NetCDF-4 Interface to HDF5 Data
 
Data Tools and the Data Scientist Shortage
Data Tools and the Data Scientist ShortageData Tools and the Data Scientist Shortage
Data Tools and the Data Scientist Shortage
 
DataFest 2019 Science Gateways
DataFest 2019 Science GatewaysDataFest 2019 Science Gateways
DataFest 2019 Science Gateways
 
Intro To Hadoop
Intro To HadoopIntro To Hadoop
Intro To Hadoop
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache ArrowSimplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow
 
Bi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in LondonBi on Big Data - Strata 2016 in London
Bi on Big Data - Strata 2016 in London
 
Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"
 
Intro to Big Data - Spark
Intro to Big Data - SparkIntro to Big Data - Spark
Intro to Big Data - Spark
 
HDF Cloud Services
HDF Cloud ServicesHDF Cloud Services
HDF Cloud Services
 

Similar to introduction to hadoop

Similar to introduction to hadoop (20)

Data analytics
Data analyticsData analytics
Data analytics
 
Hadoop foundation for analytics
Hadoop foundation for analyticsHadoop foundation for analytics
Hadoop foundation for analytics
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop training
Hadoop trainingHadoop training
Hadoop training
 
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptxM. Florence Dayana - Hadoop Foundation for Analytics.pptx
M. Florence Dayana - Hadoop Foundation for Analytics.pptx
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop Eco system
Hadoop Eco systemHadoop Eco system
Hadoop Eco system
 
Akhil's hadoop
Akhil's hadoopAkhil's hadoop
Akhil's hadoop
 
Akhil's hadoop
Akhil's hadoopAkhil's hadoop
Akhil's hadoop
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
 
Cap 10 ingles
Cap  10 inglesCap  10 ingles
Cap 10 ingles
 
Cap 10 ingles
Cap  10 inglesCap  10 ingles
Cap 10 ingles
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Big Data Hadoop Technology
Big Data Hadoop TechnologyBig Data Hadoop Technology
Big Data Hadoop Technology
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Big data - Online Training
Big data - Online TrainingBig data - Online Training
Big data - Online Training
 
Apache Hadoop Hive
Apache Hadoop HiveApache Hadoop Hive
Apache Hadoop Hive
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
Big Data Warsaw v 4 I "The Role of Hadoop Ecosystem in Advance Analytics" - R...
 
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
Bridging Oracle Database and Hadoop by Alex Gorbachev, Pythian from Oracle Op...
 

More from ASIT

Asit Never Cheats Unemployes
Asit Never Cheats UnemployesAsit Never Cheats Unemployes
Asit Never Cheats Unemployes
ASIT
 
Asit amc never cheats students
Asit amc never cheats studentsAsit amc never cheats students
Asit amc never cheats students
ASIT
 

More from ASIT (20)

Asit education student review
Asit education student reviewAsit education student review
Asit education student review
 
ASIT EDUCATION STUDENT REVIEWS
ASIT EDUCATION STUDENT REVIEWSASIT EDUCATION STUDENT REVIEWS
ASIT EDUCATION STUDENT REVIEWS
 
Asit Education
Asit EducationAsit Education
Asit Education
 
Asit Education Student Reviews
Asit Education Student ReviewsAsit Education Student Reviews
Asit Education Student Reviews
 
Asit education Student review
Asit education Student reviewAsit education Student review
Asit education Student review
 
ASIT EDUCATION REVIEW
ASIT EDUCATION REVIEWASIT EDUCATION REVIEW
ASIT EDUCATION REVIEW
 
Asit Never Cheats Unemployes
Asit Never Cheats UnemployesAsit Never Cheats Unemployes
Asit Never Cheats Unemployes
 
Latest News on Amc Square Asit
Latest News on Amc Square AsitLatest News on Amc Square Asit
Latest News on Amc Square Asit
 
Asit amc never cheats students
Asit amc never cheats studentsAsit amc never cheats students
Asit amc never cheats students
 
News on AMC Square ASIT
News on AMC Square ASITNews on AMC Square ASIT
News on AMC Square ASIT
 
News on Asit Amc
News on Asit AmcNews on Asit Amc
News on Asit Amc
 
Time Management
Time ManagementTime Management
Time Management
 
learn Ruby at ASIT
learn Ruby at ASITlearn Ruby at ASIT
learn Ruby at ASIT
 
introduction to Mongodb
introduction to Mongodbintroduction to Mongodb
introduction to Mongodb
 
ASIT REVIEWS
ASIT REVIEWSASIT REVIEWS
ASIT REVIEWS
 
ASIT REVIEWS
ASIT REVIEWSASIT REVIEWS
ASIT REVIEWS
 
Learn REST API at ASIT
Learn REST API at ASITLearn REST API at ASIT
Learn REST API at ASIT
 
Learn C LANGUAGE at ASIT
Learn C LANGUAGE at ASITLearn C LANGUAGE at ASIT
Learn C LANGUAGE at ASIT
 
Learn Advanced JAVA at ASIT
Learn Advanced JAVA at ASITLearn Advanced JAVA at ASIT
Learn Advanced JAVA at ASIT
 
Learn WCF at ASIT
Learn  WCF at ASITLearn  WCF at ASIT
Learn WCF at ASIT
 

Recently uploaded

Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
fonyou31
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 

Recently uploaded (20)

IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 

introduction to hadoop

  • 1. Apache top level project, open- source implementation of frameworks for reliable, scalable, distributed computing and data storage. It is a flexible and highly-available architecture for large scale computation and data processing on a network of commodity hardware.
  • 2. Hadoop’s Developers Doug Cutting 2005: Doug Cutting and Michael J. Cafarella developed Hadoop to support distribution for the Nutch search engine project. The project was funded by Yahoo. 2006: Yahoo gave the project to Apache Software Foundation.
  • 4. • Hadoop: • an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license. • Goals / Requirements: • Abstract and facilitate the storage and processing of large and/or rapidly growing data sets • Structured and non-structured data • Simple programming models • High scalability and availability • Use commodity (cheap!) hardware with little redundancy • Fault-tolerance • Move computation rather than data
  • 7. Three main applications of Hadoop: • Advertisement (Mining user behavior to generate recommendations) • Searches (group related documents) • Security (search for uncommon patterns)