VanPyz, June 2, 2009




Introduction to MapReduce using Disco
Erlang and Python

by @JimRoepcke
Computing at Google Scale
                                Image Source: http://ischool.tv/news/files/2006/12/computer-grid02s.jpg



Massive databases and data streams need to be processed quickly and reliably
Thousands of commodity PCs available in Google’s cluster for computations
Faults are statistically “guaranteed” to occur
Google’s Motivation

Google has thousands of programs to process user-generated data
Even simple computations were being obscured by the complex code required to run efficiently and reliably on their clusters.
Engineers shouldn’t have to be experts in distributed systems to write scalable data-processing software.
Why not just use threads?

Threads only add concurrency, and only on one node
They do not scale beyond one node to a cluster or a cloud
Coordinating work between nodes requires distribution middleware
MapReduce is distribution middleware
MapReduce scales linearly with the number of cores and nodes
Hadoop


Apache Software Foundation project

Written in Java

Includes the Hadoop Distributed File System




Disco

Created by Ville Tuulos of the Nokia Research Center

Written in Erlang and Python

Does not include a distributed file system

  Provide your own data distribution mechanism
How MapReduce works



The big scary diagram...
Source: http://labs.google.com/papers/mapreduce-osdi04.pdf




[Figure 1: Execution overview. The user program forks a master and workers. The master assigns map tasks and reduce tasks. Map workers read the input splits and write intermediate files to their local disks; reduce workers remote-read those files and write the final output files. Flow: Input files → Map phase → Intermediate files (on local disks) → Reduce phase → Output files.]
It’s truly very simple...
Master splits input


 The (typically huge) input is split into chunks

   One or more for each “map worker”




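
As a rough sketch of the idea (not Disco’s or Google’s actual splitter; split_input and the records-per-split value are made up for illustration), in Python 2:

# Hypothetical illustration: chop a list of (key, value) records into
# fixed-size splits; one or more splits go to each map worker.
def split_input(records, records_per_split=2):
    splits = []
    for i in range(0, len(records), records_per_split):
        splits.append(records[i:i + records_per_split])
    return splits

records = [("doc1", "the quick brown fox"),
           ("doc2", "jumps over the lazy dog"),
           ("doc3", "the end")]
print split_input(records)   # two splits: the first two records, then the last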
Splits fed to map workers

The master tells each map worker which split(s) it will process

  A split is a file containing some number of input records

  Each record has a key and its associated value
Map each input


The map worker executes your problem-specific map algorithm

  Called once for each record in its input
Map emits (Key,Value) pairs

Your map algorithm emits zero or more intermediate key-value pairs for each record processed

  Let’s call these “(K,V) pairs” from now on

  Keys and values are both strings
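
To make that concrete, here is a small Python 2 sketch (run_map and word_map are names I made up, not Disco’s API; this version returns the pairs instead of calling an EmitIntermediate helper):

# Hypothetical map runner: call the user's map once per (key, value) record
# and gather every intermediate (K,V) pair it emits.
def run_map(map_fn, split):
    pairs = []
    for key, value in split:
        pairs.extend(map_fn(key, value))   # zero or more pairs per record
    return pairs

def word_map(key, value):
    # key: document name (ignored); value: the document text
    return [(word, "1") for word in value.split()]

split = [("doc1", "the quick brown fox"), ("doc2", "the lazy dog")]
print run_map(word_map, split)
# [('the', '1'), ('quick', '1'), ('brown', '1'), ('fox', '1'), ('the', '1'), ...]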
(K,V) Pairs hashed to buckets
 Each map worker has its own set of buckets

 Each (K,V) pair is placed into one of these buckets

 Which bucket is determined by a hash function


 Advanced: if you know the distribution of your intermediate keys is skewed, provide a custom hash function that distributes (K,V) pairs evenly
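
A minimal sketch of the bucketing step, assuming Python’s built-in hash as the hash function and three reduce workers (both are arbitrary choices here):

# Each (K,V) pair lands in bucket hash(K) mod R, where R is the number of
# reduce workers; a skew-aware job would swap in its own hash function.
def bucket_for(key, num_reducers=3):
    return hash(key) % num_reducers

pairs = [("the", "1"), ("quick", "1"), ("the", "1"), ("fox", "1")]
buckets = {}
for k, v in pairs:
    buckets.setdefault(bucket_for(k), []).append((k, v))
print buckets   # identical keys always land in the same bucket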
Buckets sent to Reducers
Once all map workers are finished, corresponding buckets of (K,V) pairs are sent to reduce workers

Example: Each map worker placed (K,V) pairs into its own buckets A, B, and C.

Send bucket A from each map to reduce worker 1;
Send bucket B from each map to reduce worker 2;
Send bucket C from each map to reduce worker 3.
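
Continuing the sketch with made-up data, the exchange amounts to: for reduce worker r, gather bucket r from every map worker.

# buckets_by_mapper[m][r] = the (K,V) pairs map worker m placed in bucket r
buckets_by_mapper = [
    {0: [("the", "1"), ("the", "1")], 1: [("fox", "1")]},   # map worker 0
    {0: [("the", "1")],               1: [("dog", "1")]},   # map worker 1
]

def shuffle(buckets_by_mapper, r):
    # Everything every map worker placed in bucket r goes to reduce worker r
    collected = []
    for mapper_buckets in buckets_by_mapper:
        collected.extend(mapper_buckets.get(r, []))
    return collected

print shuffle(buckets_by_mapper, 0)   # every 'the' pair ends up at reducer 0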
Reduce inputs sorted
The reduce worker first concatenates the buckets it received into one file

Then the file of (K,V) pairs is sorted by K

  Now the (K,V) pairs are grouped by key

This sorted list of (K,V) pairs is the input to the reduce worker
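
In Python 2 that step is just concatenation plus a sort on the key, which leaves identical keys adjacent (a sketch with toy data):

# Concatenate the buckets this reduce worker received, then sort by K
received = [[("the", "1"), ("fox", "1")], [("the", "1"), ("dog", "1")]]
merged = []
for bucket in received:
    merged.extend(bucket)
merged.sort(key=lambda kv: kv[0])   # identical keys are now adjacent
print merged
# [('dog', '1'), ('fox', '1'), ('the', '1'), ('the', '1')]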
Reduce the list of (K,V) pairs

 The reduce worker executes your problem-specific reduce algorithm

   Called once for each key in its input

   Writes whatever it wants to its output file
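
Since the input is sorted by key, itertools.groupby yields one group per key, so the per-key dispatch can be sketched like this (run_reduce and word_reduce are illustrative names, not Disco’s API):

from itertools import groupby

def run_reduce(reduce_fn, sorted_pairs):
    # sorted_pairs is sorted by key, so groupby yields one run per key
    for key, group in groupby(sorted_pairs, lambda kv: kv[0]):
        reduce_fn(key, [v for _, v in group])

def word_reduce(key, values):
    print key, sum(int(v) for v in values)   # Python 2 print, as in the slides

run_reduce(word_reduce, [("dog", "1"), ("the", "1"), ("the", "1")])
# dog 1
# the 2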
Output

The output of the MapReduce job is the set of output files generated by the reduce workers

What you do with this output is up to you

You might use this output as the input to another MapReduce job
Modified from source: http://labs.google.com/papers/mapreduce-osdi04.pdf




Example: Counting words
def map(key, value):
    # key: document name (ignored)
    # value: words in document (list)
    for word in value:
        EmitIntermediate(word, "1")

def reduce(key, values):
    # key: a word
    # values: a list of counts
    result = 0
    for v in values:
        result += int(v)
    print key, result
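
To tie the phases together, here is a single-process Python 2 simulation of the whole job. It is only an illustration of the flow on one machine, not how Disco or Hadoop actually run things: EmitIntermediate is a stand-in defined here, and map_fn/reduce_fn are the slide’s functions renamed to avoid shadowing Python builtins.

from itertools import groupby

emitted = []                                # collected intermediate (K,V) pairs

def EmitIntermediate(key, value):           # stand-in for the framework's emitter
    emitted.append((key, value))

def map_fn(key, value):
    for word in value:                      # value: list of words, as on the slide
        EmitIntermediate(word, "1")

def reduce_fn(key, values):
    print key, sum(int(v) for v in values)

documents = [("doc1", "the quick brown fox".split()),
             ("doc2", "the lazy dog and the fox".split())]

for key, value in documents:                # "map phase": once per record
    map_fn(key, value)

emitted.sort(key=lambda kv: kv[0])          # "shuffle + sort": group by key

for key, group in groupby(emitted, lambda kv: kv[0]):
    reduce_fn(key, [v for _, v in group])   # "reduce phase": once per key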
Stand up! Let’s do it!


 Organize yourselves into approximately equal numbers of map and reduce workers

 I’ll be the master
Disco demonstration
Wanted to demonstrate a cool puzzle solver.

No go, but I can show the code. It’s really simple!

Instead you get count_words again, but scaled way up!

python count_words.py disco://localhost
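
I can’t reproduce the exact count_words.py from the demo here. As a rough, unverified sketch of the shape an early (circa 2009) Disco word-count job took: treat the import path, Disco, new_job, result_iterator, and the fun_map/fun_reduce signatures below as assumptions to check against the Disco documentation for your version.

import sys
from disco.core import Disco, result_iterator   # assumed circa-2009 Disco API

def fun_map(e, params):
    # e: one line of input text
    return [(word, 1) for word in e.split()]

def fun_reduce(iter, out, params):
    counts = {}
    for word, count in iter:
        counts[word] = counts.get(word, 0) + int(count)
    for word, count in counts.iteritems():
        out.add(word, count)

# sys.argv[1] would be the master, e.g. disco://localhost;
# the remaining arguments would be the input URLs.
results = Disco(sys.argv[1]).new_job(name="count_words",
                                     input=sys.argv[2:],
                                     map=fun_map,
                                     reduce=fun_reduce).wait()

for word, count in result_iterator(results):
    print word, count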
