MapReduce Programming Model
OUTLINE
01 Introduction: Big Data and parallel processing
02 MapReduce: motivation, Map and Reduce, the sales example
03 The famous word-count example
04 MapReduce daemons in Hadoop: JobTracker and TaskTracker
05 Demo code: 1. ArraySum demo using Java; 2. word count in Hadoop using Python
06 Summary and conclusion
01 INTRODUCTION
Parallel Processing
A task is broken up into multiple parts by a software tool, and each part is distributed to a processor; each processor then performs its assigned part. Finally, the parts are reassembled to deliver the final solution or execute the task.
Reminder! Parallel processing is not multiprocessing.
Motivation

What is the proposed solution to deal with Big Data?
• Motivations
● Large-scale data processing on clusters
● Massively parallel (hundreds or thousands of CPUs)
● Reliable execution with easy data access
• Functions
● Fault-tolerance
● Status and monitoring tools
● A clean abstraction for programmers
Inspired by LISP: Functional Programming

Map and Reduce
Lisp map function
● Input parameters: a function and a set of values.
● The function is applied to each of the values.

Example:
(map 'length '(() (a) (ab) (abc)))
→ ((length '()) (length '(a)) (length '(ab)) (length '(abc)))
→ (0 1 2 3)

Lisp reduce function
● Given a binary function and a set of values,
● it combines all the values together using the binary function.

Example: use the + (add) function to reduce the list:
(reduce #'+ '(0 1 2 3))
→ 6
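For comparison, the same two steps in Python (an added sketch, not from the original deck; Python's built-in map and functools.reduce play the same roles as the Lisp functions):

from functools import reduce
import operator

# map: apply len to each value, like (map 'length ...) in Lisp
lengths = list(map(len, [(), ('a',), ('a', 'b'), ('a', 'b', 'c')]))
print(lengths)                        # [0, 1, 2, 3]

# reduce: combine the values with a binary function, like (reduce #'+ ...)
print(reduce(operator.add, lengths))  # 6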
02 MapReduce
Principle: instead of reading the file sequentially, it is divided into chunks that are read in parallel.

Example 1: calculate the total sales for the current year.

Solution
Instead of having one person go through the whole book, we hire several people: a first group, called the mappers, and a second group, called the reducers. We divide the book into several parts and give one part to each mapper.
[Diagram: each mapper emits intermediate (key, value) records; the shuffle & sort phase groups them into (key, values) pairs for the reducers, which produce the final results.]
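To make the diagram concrete, here is an added toy sketch in Python for the sales example (the record format, years, and amounts are invented for illustration):

from collections import defaultdict

# Each mapper gets one part of the "book" of sales records.
chunks = [
    [("2023", 100.0), ("2024", 250.0)],
    [("2024", 75.5), ("2023", 40.0)],
]

# Map: emit intermediate (key, value) = (year, amount) pairs.
intermediate = [pair for chunk in chunks for pair in chunk]

# Shuffle & sort: group the values by key, giving (key, values).
groups = defaultdict(list)
for year, amount in intermediate:
    groups[year].append(amount)

# Reduce: combine each group's values into the final results.
print({year: sum(amounts) for year, amounts in sorted(groups.items())})
# {'2023': 140.0, '2024': 325.5}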
03 The Famous Word-Count Example
Example 2: More Details
Input/output specification of the word-count MapReduce job

Input: a set of (key, value) pairs stored in files
● key: document ID
● value: a list of words, the content of each document

Output: a set of (key, value) pairs stored in files
● key: word ID
● value: the word's frequency across all documents

MapReduce function signatures:
map(String input_key, String input_value):
reduce(String output_key, Iterator intermediate_values):
Pseudo-code
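The pseudo-code appears on the original slide as an image; below is a runnable Python rendering of the same word-count logic (an added sketch: map_fn, reduce_fn, and the tiny driver are illustrative names, with the driver standing in for the framework's shuffle & sort):

from collections import defaultdict

def map_fn(document_id, document_text):
    # Emit an intermediate (word, 1) pair for every word in the document.
    return [(word, 1) for word in document_text.split()]

def reduce_fn(word, counts):
    # Sum the partial counts collected for one word.
    return (word, sum(counts))

# A tiny driver standing in for the framework: map, shuffle & sort, reduce.
documents = {"doc1": "the cat sat", "doc2": "the cat ran"}
groups = defaultdict(list)
for doc_id, text in documents.items():
    for word, one in map_fn(doc_id, text):
        groups[word].append(one)
print(sorted(reduce_fn(w, c) for w, c in groups.items()))
# [('cat', 2), ('ran', 1), ('sat', 1), ('the', 2)]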
04 MapReduce Daemons in Hadoop
“MapReduce has been implemented in many programming languages and frameworks, such as Apache Hadoop, Pig, Hive, etc.”
MapReduce daemons (a brief introduction for later use):
● JobTracker: divides the work among the mappers and reducers.
● TaskTracker: runs on each node to execute the actual MapReduce tasks.
05 Demo Code 1: Sum array elements using MapReduce
Map: split the array of 1,000 elements into 10 small chunks (each chunk holds 100 elements). Each chunk is processed by a separate thread concurrently: we get 10 threads, and each thread iterates over its 100 elements to produce the sum of those elements.

Reduce: takes the outputs of these 10 threads and sums them again to produce the final output.
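The demo itself is written in Java (see the project link below); as an added illustration, here is a minimal Python sketch of the same split/sum/combine idea:

from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 1001))  # the array of 1000 elements
# Map side: split into 10 chunks of 100 elements each.
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

# Each chunk is summed by a separate thread from a pool of 10.
with ThreadPoolExecutor(max_workers=10) as pool:
    map_output = list(pool.map(sum, chunks))

# Reduce side: sum the 10 partial results to produce the final output.
print(sum(map_output))  # 500500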
Sum array elements using MapReduce with Java

Project structure: Main calls the map task and the reduce task to perform the MapReduce function.
Code walkthrough (map side):
● create a thread pool of 10
● split the array of 1,000 elements into chunks of 100 each
● save the task for each chunk in a queue
● save the map result of each chunk into mapOutput
Code walkthrough (reduce side):
● get the output of the map phase and aggregate the results
● for each element in mapOutput (the result of the previous map step), add it to the final sum

Source code: https://github.com/HabibaAbderrahim/thread_mapReduce
Demo Code 2: Word count using the Hadoop framework
Environment: pseudo-distributed mode (Hadoop 3.2.1)
PS: this is a pseudo environment that simulates a fully distributed environment, since we have one server / one PC.
● Java should be installed
● create a hadoop sudo user
● install Hadoop following the official website
● check that Hadoop is installed
Environment: pseudo-distributed mode, configuration files (Hadoop 3.2.1)
● set JAVA_HOME and HADOOP_HOME, and add the Java path
● HDFS (Hadoop Distributed File System) configuration: NameNode / DataNode / replication
● MapReduce configuration: MapReduce runs on YARN
● verify the Hadoop daemons
Environment: Hadoop 3.2.1, Python 3.5.1
We decided to work with Python just to test the Hadoop Streaming features.

Word count using MapReduce in Hadoop with Python
Environment: Hadoop 3.2.1, Python 3.5.1
Mapper and Reducer
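The mapper and reducer appear on the slides as screenshots; below is a minimal sketch of a typical Hadoop Streaming pair in Python (an added reconstruction, so the exact code on the slides may differ). The mapper reads lines from stdin and emits tab-separated (word, 1) pairs; the reducer relies on the shuffle & sort phase delivering all lines for a given word consecutively.

#!/usr/bin/env python
# mapper.py: emit (word, 1) for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%s' % (word, 1))

#!/usr/bin/env python
# reducer.py: sum the counts for each word; Hadoop's shuffle & sort
# guarantees all lines for a given word arrive consecutively.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.strip().split('\t', 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print('%s\t%s' % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print('%s\t%s' % (current_word, current_count))

A common way to test the pair locally before submitting the streaming job is the pipeline: cat data.txt | ./mapper.py | sort | ./reducer.py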
Environment: Hadoop 3.2.1, Python 3.5.1
Running the job:
● see what is inside our file data.txt
● word count on data.txt with MapReduce
● sort the results alphabetically
06 Conclusion
[References]
The ideas, concepts, and diagrams are taken from the following websites:
● http://www.metz.supelec.fr/metz/personnel/vialle/course/BigData-2A-CS/poly-pdf/Poly-chap6.pdf
● https://sites.cs.ucsb.edu/~tyang/class/240a17/slides/CS240TopicMapReduce.pdf
● https://fr.slideshare.net/LiliaSfaxi/bigdatachp2-hadoop-mapreduce
● https://algodaily.com/lessons/what-is-mapreduce-and-how-does-it-work
Thanks!
Do you have any questions?

Editor's Notes
● Reminder slide: parallel processing should not be confused with multiprocessing, where multiple processors or cores work on solving different tasks, instead of parts of the same task as in parallel processing.
● Motivation slide: before diving into detail, take a moment and ask yourself what the proposed solution might be.
● Map and Reduce slide: functional programming meets distributed computing; a batch data-processing system.
● ArraySum slide: in the traditional approach we would iterate over each element of the array and add it to produce the final sum.