What is Hadoop?

Slides from the big data course by C. Levallois at EMLYON Business School, intended for business students. See the online video that accompanies these slides. They give a basic definition of Hadoop in relation to cloud computing and big data.

MK99 – Big Data 1
Big data & cross-platform analytics
MOOC lectures Pr. Clement Levallois
MK99 – Big Data 2
Focus on “Hadoop”
• Frequently mentioned in relation to big data
• The available definitions are vague and much of the talk is inflated
• This short video will clarify it.
MK99 – Big Data 3
• Note on the terminology:
– “computers” are called “servers” when they are just used
for computing / processing / storing data
– They usually have no screen, mouse or keyboard because none of these is needed.
– But they are basically computers!
MK99 – Big Data 4
“Hadoop”
• Created around 2005 by Doug Cutting and Mike Cafarella, then developed at scale by Yahoo! engineers. Named after the toy elephant of Cutting's son.
• Made open source and now developed under the Apache Software Foundation, a major open source developer community. So you will sometimes see it called “Apache Hadoop”.
• In simple words:
– Hadoop is free, open source software.
– It serves to connect several servers, so that a single task can be accomplished in parallel on them.
– So, with Hadoop and 5 servers, a data crunching task can finish up to about 5 times sooner than if you had used just one server (in practice a little less, because coordinating the servers takes some time).
– That’s it!
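
The idea of dividing one task among several servers can be sketched in a few lines of Python. This is only an illustration, not Hadoop itself: threads on a single computer stand in for the 5 servers, and the function names (`crunch`, `split`, `run_on_workers`) are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def crunch(chunk):
    """Stand-in for a data-crunching task: sum the squares of one chunk."""
    return sum(x * x for x in chunk)

def split(data, n_workers):
    """Cut the data into at most n_workers chunks of nearly equal size."""
    size = max(1, -(-len(data) // n_workers))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def run_on_workers(data, n_workers=5):
    """Give one chunk to each worker, then add up the partial results."""
    chunks = split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(crunch, chunks))
    return sum(partial_results)

data = list(range(1, 101))
total = run_on_workers(data, n_workers=5)
print(total)
```

Each worker only sees its own fifth of the data, which is why adding workers can shorten the total time on a real cluster.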
MK99 – Big Data 5
Why are Hadoop, cloud computing and big data
often discussed together?
– Imagine that you are Walmart and want to compute something on your CRM: say, which clients are most profitable for each store, based on their purchase history.
– You will need many servers to store the data, and many servers to do the computations.
– Instead of purchasing a farm of servers for this (expensive! time consuming!), you can pay for a cloud computing service (such as Amazon EC2, part of AWS) to rent servers just for this task,
– And install Hadoop on these servers to divide the task among all servers and get it to run in
parallel, speeding up computation times.
– You will get the results in minutes or hours, instead of days.
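
A minimal sketch of the Walmart example, under stated assumptions: the purchase records, store names and client names below are invented, and the three "servers" are simply three slices of a Python list. Each server adds up profit for its own share of the data, and the partial totals are then merged to find the most profitable client per store.

```python
from collections import defaultdict

# Hypothetical purchase records: (store, client, profit on the purchase).
purchases = [
    ("Store A", "client 1", 12.0),
    ("Store A", "client 2", 30.0),
    ("Store B", "client 3", 8.0),
    ("Store A", "client 1", 25.0),
    ("Store B", "client 4", 5.0),
    ("Store B", "client 3", 1.5),
]

def partial_totals(records):
    """Work done on ONE server: total profit per (store, client) for its share."""
    totals = defaultdict(float)
    for store, client, profit in records:
        totals[(store, client)] += profit
    return totals

def merge(all_partials):
    """Combine the partial totals coming back from every server."""
    merged = defaultdict(float)
    for partial in all_partials:
        for key, profit in partial.items():
            merged[key] += profit
    return merged

def most_profitable_client_per_store(totals):
    """Keep, for each store, the client with the highest total profit."""
    best = {}
    for (store, client), profit in totals.items():
        if store not in best or profit > best[store][1]:
            best[store] = (client, profit)
    return best

# Pretend the records are stored on 3 rented servers.
shares = [purchases[0:2], purchases[2:4], purchases[4:6]]
partials = [partial_totals(share) for share in shares]  # parallel on a real cluster
result = most_profitable_client_per_store(merge(partials))
print(result)
```

On a real cluster the `partial_totals` step is the part that runs at the same time on every rented server; only the small partial totals travel over the network.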
MK99 – Big Data 6
And map/reduce?
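– “Map/reduce” is also an expression often discussed in relation to cloud computing and Hadoop.
– This is a programming principle developed by engineers at Google and described in a paper they published in 2004. Google's own implementation was not made open source, but open source implementations such as Hadoop followed.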
– It is a principle that solves this problem: when I have data spread across 500 different servers, how do I search for some data on all the servers? Checking all servers one by one (sequential search) would take a very long time. MapReduce dispatches the search to all servers at once, hence it can be up to about 500 times quicker than a sequential search.
– Any software can use this programming principle. MapReduce is at the heart of Hadoop, which is one of the most popular pieces of software using it.
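
The search described above can be sketched as a "map" step that runs on every server at once and a "reduce" step that combines the answers. This is a simplified illustration in plain Python, not Hadoop's actual API: the server names and records are invented, threads stand in for servers, and the grouping-by-key step that real MapReduce performs between map and reduce is left out.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical data: each "server" holds its own list of records.
servers = {
    "server-1": ["alice bought a book", "bob bought a lamp"],
    "server-2": ["carol bought a book", "dave bought a chair"],
    "server-3": ["erin bought a table"],
}

def map_search(item, keyword):
    """Map step, run on ONE server: return the local records containing the keyword."""
    name, records = item
    return [(name, record) for record in records if keyword in record]

def reduce_matches(left, right):
    """Reduce step: merge two lists of matches into one."""
    return left + right

def search_all(servers, keyword):
    """Dispatch the search to all servers at once, then combine the results."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        per_server = list(
            pool.map(lambda item: map_search(item, keyword), servers.items())
        )
    return reduce(reduce_matches, per_server, [])

matches = search_all(servers, "book")
print(matches)
```

No server waits for another during the map step, which is where the time saving over a sequential search comes from.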

MK99 – Big Data 7
What is the business relevance of Hadoop?
• Hadoop made it possible to process large amounts of data quickly, using free software.
• It enables business models where intensive data crunching is necessary to create value.
• Examples:
– Amazon computing book recommendations for you,
– Walmart offering personalized coupons,
– NYT showing personalized display ads,
– Waze (driving app) showing the state of traffic on your road in real time,
– your electricity utility company computing how much electricity should be generated at peak hours.