MapReduce Programming Model
OUTLINE
01 Introduction: Big Data, Parallel Processing
02 MapReduce: Motivation, Map, Reduce, Sales example, Word count example
03 MapReduce daemons in Hadoop: Job tracker, Task tracker
05 Demo code: 1. Word count in Hadoop using Python, 2. Array sum demo using Java
06 Conclusion: Summary
01 INTRODUCTION
Parallel Processing
A task is broken up into multiple parts by a software tool, and each part is
distributed to a processor; each processor then performs its assigned part.
Finally, the parts are reassembled to deliver the final solution or complete
the task.
Reminder!
Parallel processing is not multiprocessing.
Motivation
NO!
What is the proposed solution to deal with ?
• Motivations
● Large-scale data processing on clusters
● Massively parallel (hundreds or thousands of CPUs)
● Reliable execution with easy data access
• Functions
● Fault-tolerance
● Status and monitoring tools
● A clean abstraction for programmers
Inspired by LISP functional programming: Map and Reduce
Lisp map function
● Input parameters: a function and a list of values.
● The function is applied to each of the values.
Lisp reduce function
● Input parameters: a binary function and a list of values.
● It combines all the values together using the binary function.
Example
(mapcar #'length '(() (a) (a b) (a b c)))
→ (list (length '()) (length '(a)) (length '(a b)) (length '(a b c)))
→ (0 1 2 3)
Use the + (add) function to reduce the list:
(reduce #'+ '(0 1 2 3))
→ 6
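The same map-then-reduce idea can be sketched in Python. This is only an illustrative analogue of the Lisp example, using the built-in map and functools.reduce; the variable names are assumptions made for this sketch.

```python
from functools import reduce
from operator import add

# Same four lists as the Lisp example: (), (a), (a b), (a b c)
lists = [[], ["a"], ["a", "b"], ["a", "b", "c"]]

# map: apply one function (len) to every element independently
lengths = list(map(len, lists))

# reduce: fold the mapped values into a single value with a binary function (+)
total = reduce(add, lengths)

print(lengths)  # [0, 1, 2, 3]
print(total)    # 6
```

Because each call to len depends only on its own element, the map step is the part that can be spread across many workers; the reduce step then combines the partial results.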
02 MapReduce
Principle
Instead of reading the file sequentially, it is divided into chunks that are read in
parallel.
Example 1: calculate the total sales for the current year.
Solution
Instead of having one person go through the whole book, we hire several.
The first group is called mappers; the second is called reducers.
Divide the book into several parts and give one part to each mapper.
[Figure: mappers emit intermediate (key, value) records; shuffle & sort groups them into (key, values); reducers produce the results]
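The sales example can be sketched end to end in a few lines of Python. This is a single-process simulation under stated assumptions, not a distributed implementation: the store names, the amounts, and the helper names (split, mapper, shuffle_and_sort, reducer) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sales book: (store, amount) records. All values are made up.
sales_book = [
    ("store_a", 120), ("store_b", 80), ("store_a", 50),
    ("store_c", 30), ("store_b", 20),
]

def split(records, parts):
    """Divide the book into roughly equal chunks, one per mapper."""
    size = max(1, -(-len(records) // parts))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]

def mapper(chunk):
    """Emit one intermediate (key, value) record per sale."""
    return [(store, amount) for store, amount in chunk]

def shuffle_and_sort(mapped_chunks):
    """Group intermediate values by key: (key, value) becomes (key, values)."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for key, value in chunk:
            groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    """Combine all values of one key into a single result."""
    return key, sum(values)

chunks = split(sales_book, parts=2)
intermediate = [mapper(chunk) for chunk in chunks]
grouped = shuffle_and_sort(intermediate)
per_store = dict(reducer(key, values) for key, values in grouped)
total_sales = sum(per_store.values())

print(per_store)    # {'store_a': 170, 'store_b': 100, 'store_c': 30}
print(total_sales)  # 300
```

In a real cluster each mapper call would run on a different node, and the shuffle and sort step would move records over the network so that every value for a given key reaches the same reducer.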
Example 2: the famous word count example
More details
Input/output specification of the word count (WC) MapReduce job
Input: a set of (key, value) pairs stored in files
key: document ID
value: the list of words that make up the document's content
Output: a set of (key, value) pairs stored in files
key: word ID
value: frequency of the word across all documents
MapReduce function specification:
map(String input_key, String input_value):
reduce(String output_key, Iterator intermediate_values):
Pseudo-code
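A minimal Python sketch of the word count job specified above. It keeps the parameter names of the specification (input_key, input_value, output_key, intermediate_values); the driver run_job, the sample documents, and the choice to lowercase and split on whitespace are assumptions made for illustration.

```python
from collections import defaultdict

def map_fn(input_key, input_value):
    """input_key: document ID; input_value: document text.
    Emits an intermediate (word, 1) pair for every word."""
    for word in input_value.lower().split():
        yield word, 1

def reduce_fn(output_key, intermediate_values):
    """output_key: a word; intermediate_values: iterator over its counts."""
    return output_key, sum(intermediate_values)

def run_job(documents):
    """Hypothetical single-process driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_fn(doc_id, text):
            groups[word].append(count)
    return dict(reduce_fn(word, iter(counts))
                for word, counts in sorted(groups.items()))

# Made-up input documents
documents = {
    "doc1": "big data big clusters",
    "doc2": "map reduce big data",
}
counts = run_job(documents)
print(counts)
# {'big': 3, 'clusters': 1, 'data': 2, 'map': 1, 'reduce': 1}
```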
03 MapReduce Daemons in Hadoop
“MapReduce has been implemented in many
programming languages and frameworks, such
as Apache Hadoop, Pig, Hive, etc.”
MapReduce daemons (brief introduction for later use)
● Job tracker: divides the work among mappers and reducers.
● Task tracker: runs on each node to execute the actual MapReduce tasks.
05 Demo Code 1
Sum array elements using MapReduce
Sum array elements using MapReduce with Java
Map: split the array of 1000 elements into 10 small chunks of 100 elements
each. Each chunk is processed concurrently by a separate thread, so we have
10 threads, and each thread iterates over 100 elements to produce the sum of
its chunk.
Reduce: take the outputs of these 10 threads and sum them again to produce
the final output.
Project structure
Main: calls the map task and the reduce task to perform the MapReduce function
Environment
Map task
● Create a thread pool of 10 threads.
● Split the array of 1,000 elements into chunks of 100 elements each.
● Save the task for each chunk in a queue.
● Save the map result of each chunk into mapOutput.
Reduce task
● Get the output of the map step and aggregate the results: for each element
in mapOut (the result of the previous map step), add it to the total.
Source code: https://github.com/HabibaAbderrahim/thread_mapReduce
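The steps of this demo can be sketched in Python as well. This is an analogue written for illustration, not the Java code from the repository linked above: the function names map_task, reduce_task and map_reduce_sum are assumptions, and concurrent.futures.ThreadPoolExecutor stands in for the Java thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

def map_task(chunk):
    """Map: produce the partial sum of one chunk."""
    return sum(chunk)

def reduce_task(map_output):
    """Reduce: add up the partial sums produced by the map tasks."""
    total = 0
    for partial in map_output:
        total += partial
    return total

def map_reduce_sum(values, chunk_size=100, workers=10):
    # Split the array into chunks of chunk_size elements
    chunks = [values[i:i + chunk_size]
              for i in range(0, len(values), chunk_size)]
    # One pool of worker threads; each chunk becomes one map task
    with ThreadPoolExecutor(max_workers=workers) as pool:
        map_output = list(pool.map(map_task, chunks))
    return reduce_task(map_output)

array = list(range(1, 1001))    # 1000 elements: 1, 2, ..., 1000
result = map_reduce_sum(array)  # 10 chunks of 100 elements
print(result)                   # 500500
```

In CPython the threads mainly illustrate the structure, since CPU-bound work like this does not run in parallel across threads; the split, map, and reduce stages are the point of the sketch.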
Demo Code 2
Word count using the Hadoop framework
Environment: pseudo-distributed (Hadoop version 3.2.1)
PS: a pseudo-distributed environment simulates a fully distributed environment
on a single server / PC.
● Java should be installed.
● Create a hadoop sudo user.
● Install Hadoop from the official website.
● Check that Hadoop is installed.
Configuration files (pseudo-distributed environment, Hadoop version 3.2.1)
● Set the Java home and the Hadoop home.
● Add the Java path.
● HDFS (Hadoop Distributed File System) configuration: namenode, datanode, replication.
● MapReduce configuration: MapReduce runs on YARN.
● Verify the Hadoop daemons.
Word count using MapReduce in Hadoop with Python
We decided to work with Python to test the Hadoop Streaming feature.
Environment: Hadoop version 3.2.1, Python version 3.5.1
Mapper
Reducer
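A sketch of what a word count mapper and reducer for Hadoop Streaming might look like. With Hadoop Streaming, the mapper and reducer are programs that read lines from standard input and write tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The function names and sample lines below are assumptions, and the two stages are written as plain functions over lines so they can be tried locally; in a real job each would sit in its own script and iterate over sys.stdin.

```python
from itertools import groupby

def mapper(lines):
    """Emit 'word<TAB>1' for every word of every input line."""
    for line in lines:
        for word in line.split():
            yield "{}\t1".format(word)

def reducer(sorted_lines):
    """Sum the counts of consecutive lines sharing the same word.
    Relies on the input being sorted by key, as after the shuffle."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "{}\t{}".format(word, sum(int(count) for _, count in group))

# Local simulation of the pipeline: mapper, then sort, then reducer.
sample = ["hello hadoop", "hello python"]  # made-up input lines
output = list(reducer(sorted(mapper(sample))))
print(output)  # ['hadoop\t1', 'hello\t2', 'python\t1']
```

str.format is used instead of f-strings so the sketch also runs on Python 3.5.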
● See what is inside our file data.txt.
● Word count in data.txt with MapReduce.
● Sort the results alphabetically.
06 Conclusion
[References]
The ideas, concepts and diagrams are taken from the following websites:
● http://www.metz.supelec.fr/metz/personnel/vialle/course/BigData-2A-CS/poly-pdf/Poly-chap6.pdf
● https://sites.cs.ucsb.edu/~tyang/class/240a17/slides/CS240TopicMapReduce.pdf
● https://fr.slideshare.net/LiliaSfaxi/bigdatachp2-hadoop-mapreduce
● https://algodaily.com/lessons/what-is-mapreduce-and-how-does-it-work
Thanks!
Do you have any questions?


Editor's Notes

  1. Parallel processing should not be confused with multiprocessing, in which multiple processors or cores work on different tasks instead of on parts of the same task, as in parallel processing.
  2. Before diving into detail, take a moment and ask yourself what does
  3. » Functional programming meets distributed computing » A batch data processing system
  4. Traditional approach: iterate over each element of the array and add it to a running total to produce the final sum.