HADOOP
Glossary
Scalable: A system is said to be scalable when adding
hardware capacity improves its performance in
proportion to the capacity added.
Nodes: A node is a point of intersection, connection
or union where several elements converge; in a
computer cluster, each node is one of the connected
computers.
Cluster: The term cluster (meaning group or bunch)
is applied to groups of computers linked together
by a high-speed network that behave as if they
were a single computer.
Open source: Open source is a
software development model
based on open collaboration
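The definition of scalability above can be made concrete with a small calculation. The sketch below is an illustrative assumption, not part of Hadoop: the hypothetical helper `scaling_efficiency` divides the relative gain in throughput by the relative gain in hardware (here, node count), so a value of 1.0 means performance grew exactly in proportion to the added capacity, and lower values mean less-than-proportional scaling.

```python
def scaling_efficiency(old_throughput: float, old_nodes: int,
                       new_throughput: float, new_nodes: int) -> float:
    """Throughput growth divided by capacity growth (1.0 = perfectly proportional).

    Hypothetical helper for illustration only; not a Hadoop API.
    """
    if old_throughput <= 0 or old_nodes <= 0 or new_nodes <= 0:
        raise ValueError("throughput and node counts must be positive")
    throughput_gain = new_throughput / old_throughput  # how much faster the system got
    capacity_gain = new_nodes / old_nodes              # how much hardware was added
    return throughput_gain / capacity_gain


# Hypothetical measurements: growing a 4-node cluster to 8 nodes
# raises throughput from 100 to 190 records per second.
print(f"{scaling_efficiency(100, 4, 190, 8):.2f}")  # 0.95
```

A result close to 1.0 is what the glossary calls a scalable system; the example numbers are invented for illustration.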
How did Hadoop come about?
• From the early years of the 21st century, Google worked on new
methods for accessing information, focused on the massive
processing of large volumes of data on parallel systems.
• Google innovations that shaped the development of Hadoop:
1. The Google File System (GFS) (2003)
2. MapReduce: Simplified Data Processing on
Large Clusters (2004)
3. Bigtable (2006)
Google File System
A scalable distributed file system for large,
distributed, data-intensive applications.
MapReduce
A programming model and an associated
implementation for processing and generating
large data sets.
Bigtable
A distributed storage system for managing
structured data, designed by Google to scale
to a very large size.
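To illustrate the MapReduce programming model described above, here is a minimal single-process sketch of the classic word-count example in plain Python. It only mimics the three phases (map, shuffle/group by key, reduce); it does not use Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit an intermediate (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key (here, by summing the counts)."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # On a real cluster, map and reduce calls would run in parallel on many nodes.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["big data big clusters", "data on clusters"]))
# {'big': 2, 'clusters': 2, 'data': 2, 'on': 1}
```

The programmer writes only the map and reduce functions; in a real framework, the grouping step and the distribution of work across machines are handled by the system.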
History of Hadoop
2003 - 2004 • Google publishes the GFS (2003) and MapReduce (2004) papers.
2006 • Doug Cutting, a software engineer who joined Yahoo in 2006, had already
built open source versions of GFS and MapReduce inside Nutch, the open source
web search project he co-founded.
• In 2006 Hadoop formally appears as a project of its own.
2007 • Google and IBM form an alliance for university research, creating a
joint research initiative around MapReduce and GFS.
2008 • Hadoop becomes popular, commercial use begins, and the Apache Software
Foundation takes responsibility for the project, making it a top-level
Apache project.
2009 - 2010
• In July 2009, Cutting becomes a member of the Board of
Directors of the Apache Software Foundation.
• Cutting leaves Yahoo and joins Cloudera, one of the most
active organizations in the development and deployment
of Hadoop.
• He becomes Chairman of the Board of the Apache Software
Foundation while working at Cloudera as a software
architect; Cloudera's Hadoop distribution leads the
market.
• Cloudera is an organization that
provides training, certification and
support, and sells tools for managing
Hadoop clusters.
• The current era of Hadoop began in
2011 when the three major database
providers (Oracle, IBM and Microsoft)
adopted it.
What is Hadoop?
• Hadoop is an open source implementation of
MapReduce created by Doug Cutting and developed
at Yahoo starting in early 2006.
• Hadoop provides a comprehensive ecosystem for
handling data at scale efficiently and economically,
especially very large volumes (terabytes and
petabytes).
• Hadoop is currently a project of the Apache Software
Foundation.
• Hadoop is a framework that allows you to
process large amounts of data at low cost.
• Hadoop runs on low-cost commodity hardware, which
reduces costs compared to other commercial data
storage and processing alternatives.
• Hadoop was designed to run on a large number of
machines that share neither memory nor disks.
Characteristics
• Hadoop is a distributed system whose main task is to solve the problem of storing
information that exceeds the capacity of a single machine.
• The core of Hadoop is MapReduce
• Hadoop consists of two fundamental parts:
• A file system (HDFS)
• The MapReduce programming paradigm
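The idea of storing information that exceeds the capacity of one machine can be sketched as follows: a file is cut into fixed-size blocks and each block is copied to several nodes. This is a simplified model, not HDFS code. The 128 MB block size and replication factor of 3 mirror common HDFS defaults (the `dfs.blocksize` and `dfs.replication` settings), but the round-robin placement is an assumption made for illustration; real HDFS placement is rack-aware and decided by the NameNode. The helper names are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # bytes; mirrors the common HDFS default block size
REPLICATION = 3                 # copies per block; mirrors the common HDFS default


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin (simplified policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


# A hypothetical 300 MB file stored on a 4-node cluster.
blocks = split_into_blocks(300 * 1024 * 1024)
print([size // (1024 * 1024) for size in blocks])  # [128, 128, 44]
print(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"]))
# {0: ['node1', 'node2', 'node3'], 1: ['node2', 'node3', 'node4'], 2: ['node3', 'node4', 'node1']}
```

Because every block lives on several nodes, losing one machine does not lose data, and no single machine ever has to hold the whole file.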
Hadoop components
1. Hadoop Distributed File System
(HDFS). Inspired by Google's GFS
project.
2. Hadoop MapReduce (map and
reduce). Distributes the processing
of data across the nodes of a
cluster, achieving a high degree of
parallelism.
3. Hadoop Common. A set of libraries
and utilities that support the other
Hadoop modules.
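The parallelism of Hadoop MapReduce comes partly from splitting the intermediate keys among several reduce tasks, so each node handles a disjoint subset of keys. Hadoop's default partitioner assigns a key to a reducer by hashing it modulo the number of reduce tasks; the sketch below imitates that idea in Python. It uses `zlib.crc32` as a stand-in hash (an assumption: Java's `hashCode` gives different values, and Python's built-in `hash` of strings changes between runs), so the exact assignments will not match a real Hadoop job. The function names are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    """Map a key to a reducer index: deterministic hash modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def assign_to_reducers(pairs: List[Tuple[str, int]],
                       num_reducers: int) -> Dict[int, List[Tuple[str, int]]]:
    """Route every (key, value) pair to the reducer that owns its key."""
    buckets: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return dict(buckets)


pairs = [("big", 1), ("data", 1), ("big", 1), ("clusters", 1)]
for reducer, items in sorted(assign_to_reducers(pairs, num_reducers=2).items()):
    print(reducer, items)
```

The key property is that all pairs with the same key reach the same reducer, so each reducer can compute its results independently of the others.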
To understand the Hadoop
system better, it is important
to know its two fundamental
building blocks: the file
system and the programming
model.
The file system (HDFS) spreads
data across many servers so
that applications can run on
them.
The programming model
(MapReduce) is a framework for
writing applications that
process that data in parallel.
Apache Hadoop Project
• The Apache Hadoop project, run by the Apache Software Foundation, develops open
source software for reliable, scalable, distributed computing.
• The Apache Hadoop software library is a framework that
allows the distributed processing of large data sets across clusters of computers.
• It is designed to scale from a few servers to thousands of
machines.
• It consists of four modules:
1. Hadoop Common
2. Hadoop Distributed File System (HDFS)
3. Hadoop YARN (job scheduling and cluster resource management)
4. Hadoop MapReduce
Companies that use Hadoop
• Facebook
• Twitter
• eBay
• eHarmony
• Netflix
• AOL
• Apple
• LinkedIn
• Tuenti
Apache projects related to Hadoop
• Avro
• Cassandra
• Chukwa
• HBase
• Hive
• Mahout
• Pig
• ZooKeeper
Hadoop platforms
In 2012 the consulting firm Forrester published a study
of Hadoop solutions; its conclusions were:
• Amazon Web Services holds the lead thanks to
Elastic MapReduce, its proven, feature-rich
subscription service
• IBM and EMC Greenplum offer Hadoop solutions
backed by strong enterprise data warehouse (EDW) portfolios
• MapR and Cloudera impress with the best enterprise-scale
distribution solutions
• Hortonworks offers an impressive portfolio of
professional services based on Hadoop
Thank You

