MAP-REDUCE
RECIPES IN C#
Erik LeBel
elebel@pyxis-tech.com
WHAT IS MAP-REDUCE?

© Pyxis Technologies inc.

 Outlined in a paper published by Google in 2004
 Together with the Google File System (described in a paper published in 2003), makes it possible to crunch large data sets across multiple systems
 Is neither very complicated nor unique to Google
 Defines two operations in a data processing pipeline
WHY SHOULD I CARE?

 Is a tool worth having under your belt
 Is useful for analyzing large volumes of data of many kinds
WHAT IS IT USED FOR?
Data transformations

 Searches
 Indexing
 Restructuring data
 Data aggregation
 Anything that would traditionally have been solved by ETL
SO, ABOUT THIS MAP-REDUCE?

 A two-operation pattern for processing data
 Runs as a series of jobs
THE MAPPING OPERATION

 Extracts data from source files
 Can be run in parallel across multiple processes and systems
 Emits a collection of data items, each associated with a key
 The key is used to shuffle the data and to join the results later
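The mapping step above can be sketched in a few lines of C#. This is only an illustration, written as a C# 9 top-level program: the `Map` function, the in-memory strings standing in for source files, and the choice of one character per key are all assumptions, not code from the talk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mapper: emits one (key, 1) pair per character of a source.
IEnumerable<(char Key, int Value)> Map(string source)
{
    foreach (char c in source)
        yield return (c, 1);
}

// In-memory strings stand in for source files (an assumption).
string[] sources = { "abbacc", "babbcc" };

// Sources are independent of each other, so PLINQ may map them in parallel.
List<(char Key, int Value)> emitted = sources
    .AsParallel()
    .SelectMany(s => Map(s))
    .ToList();

Console.WriteLine($"{emitted.Count} pairs emitted");
```

Because each call to `Map` only looks at its own source, nothing has to be shared between mappers; the key on each pair is all the later stages need.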
THE REDUCE OPERATION

 The reduce function is applied to each group of data, grouped by key
 Typically, a reduce function returns a list of values for a specified key
 Gathered together, the results of all reduce operations make up the result of the Map-Reduce job
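A reduce step along these lines might look like the following sketch. The `Reduce` function and the sample pairs are assumptions made for illustration, and LINQ's `GroupBy` stands in for the shuffle that a real framework would perform between machines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical reducer: folds all values that share a key into one result.
(char Key, int Value) Reduce(char key, IEnumerable<int> values)
{
    return (key, values.Sum());
}

// Pairs as they might arrive from the mappers (made-up sample data).
var pairs = new List<(char Key, int Value)>
{
    ('a', 1), ('b', 1), ('a', 1), ('c', 1), ('a', 1), ('b', 1)
};

// Group the values by key, then call Reduce once per group.
var results = pairs
    .GroupBy(p => p.Key, p => p.Value)
    .Select(g => Reduce(g.Key, g))
    .OrderBy(pair => pair.Key)
    .ToList();

foreach (var item in results)
    Console.WriteLine($"({item.Key}, {item.Value})");
```

Here each reducer returns a single value per key; a reducer that returns several values per key would have the same shape with a collection as its result.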
SOLVING A PROBLEM

Word Count
WORD COUNT (SIMPLE)

Input: abacab

Map:
(a, 1) (b, 1) (a, 1) (c, 1) (a, 1) (b, 1)

Shuffle and Reduce:
(a, 1), (a, 1), (a, 1) -> Reduce (a) -> (a, 3)
(b, 1), (b, 1) -> Reduce (b) -> (b, 2)
(c, 1) -> Reduce (c) -> (c, 1)

Combiner:
(a, 3)
(b, 2)
(c, 1)
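The whole flow for the input abacab can be written as one small generic driver. This is a sketch under assumptions: the `MapReduce` helper and its signature are hypothetical, everything runs in a single process, and the final `ToDictionary` call plays the role of the gathering step that the slide labels Combiner.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical generic driver: map every input, shuffle by key, reduce each
// group, then gather the reduced values into one dictionary.
Dictionary<TKey, TResult> MapReduce<TInput, TKey, TValue, TResult>(
    IEnumerable<TInput> inputs,
    Func<TInput, IEnumerable<(TKey Key, TValue Value)>> map,
    Func<TKey, IEnumerable<TValue>, TResult> reduce)
    where TKey : notnull
{
    return inputs
        .SelectMany(map)                                   // Map
        .GroupBy(p => p.Key, p => p.Value)                 // Shuffle
        .ToDictionary(g => g.Key, g => reduce(g.Key, g));  // Reduce and gather
}

// Word count over characters, using the input from the slide.
var counts = MapReduce<string, char, int, int>(
    new[] { "abacab" },
    text => text.Select(c => (c, 1)),
    (key, values) => values.Sum());

foreach (var entry in counts.OrderBy(e => e.Key))
    Console.WriteLine($"({entry.Key}, {entry.Value})");
```

Only the two lambdas are specific to word count; swapping them is enough to express a different job with the same driver.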
DISTRIBUTED WORD COUNT

Input 1: abbacc
Map: (a, 1) (b, 1) (b, 1) (a, 1) (c, 1) (c, 1)

Input 2: babbcc
Map: (b, 1) (a, 1) (b, 1) (b, 1) (c, 1) (c, 1)

Input 3: abacab
Map: (a, 1) (b, 1) (a, 1) (c, 1) (a, 1) (b, 1)

Shuffle and Reduce:
(a, 1), (a, 1).. -> Reduce (a) -> (a, 6)
(b, 1), (b, 1).. -> Reduce (b) -> (b, 7)
(c, 1), (c, 1).. -> Reduce (c) -> (c, 5)

Combiner:
(a, 6)
(b, 7)
(c, 5)
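The distributed variant can be imitated on one machine by giving each input its own task. In this sketch, `MapChunk` is a hypothetical name, and `Task.Run` merely stands in for separate mapper processes or machines; a real cluster would move the pairs over the network during the shuffle.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical mapper, run once per input chunk.
List<(char Key, int Value)> MapChunk(string chunk) =>
    chunk.Select(c => (Key: c, Value: 1)).ToList();

// The three inputs shown on the slide.
string[] chunks = { "abbacc", "babbcc", "abacab" };

// One task per chunk stands in for one mapper process or machine.
Task<List<(char Key, int Value)>>[] mappers =
    chunks.Select(chunk => Task.Run(() => MapChunk(chunk))).ToArray();
Task.WaitAll(mappers);

// Shuffle: route every pair to the group for its key.
var groups = mappers
    .SelectMany(t => t.Result)
    .GroupBy(p => p.Key, p => p.Value);

// Reduce each group, then gather the totals in key order.
var totals = new SortedDictionary<char, int>();
foreach (var group in groups)
    totals[group.Key] = group.Sum();

foreach (var entry in totals)
    Console.WriteLine($"({entry.Key}, {entry.Value})");
```

The totals match the slide: a appears 6 times, b 7 times, and c 5 times across the three inputs.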
AND NOW FOR SOME CODE
Erik LeBel
elebel@pyxis-tech.com
Questions?
Thank you!

LIVE MAP-REDUCE (10 PARTICIPANTS)

 Mappers (4): split red and black
 Queues (2): keep buffer
 Reducers (2): count by suit
 Combiners (1): summarize data
 Result (1): store result
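One possible reading of this exercise, expressed as code: mappers sort playing cards into a red queue and a black queue, one reducer per queue counts cards by suit, and a combiner merges the counts into the stored result. The hand of cards, the `CountBySuit` helper, and the sequential execution are all assumptions; the live exercise itself involved people, not code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up hand of cards, identified by suit only.
string[] hand = { "Hearts", "Spades", "Diamonds", "Hearts", "Clubs", "Spades", "Hearts" };

// Queues (2): one buffer per colour.
var queues = new Dictionary<string, Queue<string>>
{
    ["red"] = new Queue<string>(),
    ["black"] = new Queue<string>()
};

// Mappers: split red and black.
foreach (string suit in hand)
{
    string colour = suit == "Hearts" || suit == "Diamonds" ? "red" : "black";
    queues[colour].Enqueue(suit);
}

// Reducers (2): each one reads a single queue and counts by suit.
Dictionary<string, int> CountBySuit(Queue<string> queue) =>
    queue.GroupBy(s => s).ToDictionary(g => g.Key, g => g.Count());

// Combiner (1): summarize both reducers' counts into one stored result.
var result = new SortedDictionary<string, int>();
foreach (var queue in queues.Values)
    foreach (var entry in CountBySuit(queue))
        result[entry.Key] = entry.Value;

foreach (var entry in result)
    Console.WriteLine($"{entry.Key}: {entry.Value}");
```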
