Map Reduce In 5 Minutes

    Map Reduce In 5 Minutes - Presentation Transcript

    1. MapReduce
    2. MapReduce reduced
    3. PetaMengurangi
    4. PetaMengurangi* (*"MapReduce" via Google Translate)
    5. the problem
    6. lots of data
    7. e.g. the entire interwebs
    8. single computer: not going to work
    9. lots of computers
    10. we have that
    11. cluster programming
    12. cluster programming = suck
    13. MapReduce
    14. makes the pain go away
    15. 2 main stages
    16. map
    17. map: process data on hosts
    18. reduce
    19. reduce: summarise the results
    20. example: count words on lines
    21. >>> reduce(operator.add, map(countWords, lines))
    22. >>> reduce(operator.add, map(countWords, lines))
    23. >>> reduce(operator.add, map(countWords, lines))
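The one-liner on slides 21-23 works as written in Python 2, where `reduce` is a builtin; in Python 3 it lives in `functools`. The slides do not define `countWords` or `lines`, so the versions below are assumptions made only to get a runnable sketch: `countWords` is taken to return the number of whitespace-separated words in one line.

```python
from functools import reduce
import operator


def countWords(line):
    """Hypothetical helper: number of whitespace-separated words in one line."""
    return len(line.split())


# Made-up input, purely for illustration.
lines = [
    "the quick brown fox",
    "jumps over",
    "the lazy dog",
]

# map: apply countWords to every line independently
per_line = list(map(countWords, lines))

# reduce: fold the per-line counts into a single total
total = reduce(operator.add, per_line, 0)

print(per_line)
print(total)
```

The map step touches each line on its own, which is why it can be spread over many machines; only the reduce step needs to see results from more than one line.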
    24. except in this case
    25. lots of machines
    26. typical cluster
    27. O(10^3) machines, each with 2-8 GB RAM and local IDE disks
    28. GFS distributes the data
    29. process data on hosts; summarise results
    30. split data into chunks; process data on hosts; summarise results
    31. split data into chunks; allocate machines; process data on hosts; summarise results
    32. split data into chunks; allocate machines; start processes; process data on hosts; summarise results
    33. split data into chunks; allocate machines; start processes; send data to mappers; process data on hosts; summarise results
    34. split data into chunks; allocate machines; start processes; send data to mappers; process data on hosts; monitor hosts; summarise results
    35. split data into chunks; allocate machines; start processes; send data to mappers; process data on hosts; monitor hosts; send results to reducers; summarise results
    36. split data into chunks; allocate machines; start processes; send data to mappers; process data on hosts; monitor hosts; redo failed and stragglers; send results to reducers; summarise results
    37. split data into chunks; allocate machines; start processes; send data to mappers; process data on hosts; monitor hosts; redo failed and stragglers; send results to reducers; summarise results; output final results
    38. MapReduce does the yukky stuff
    39. MapReduce: split data into chunks; allocate machines; start processes; send data to mappers; monitor hosts; redo failed and stragglers; send results to reducers; output final results. programmer: process data on hosts; summarise results
    40. handles failures
    41. handles stragglers
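The division of labour on slides 38-41 can be sketched as a toy, single-process driver. This is not how Google's library or Hadoop is implemented; every name below (`split_into_chunks`, `run_with_retry`, `map_reduce`, `word_mapper`, `count_reducer`) is hypothetical. Retrying a failed task stands in for failure handling, and stragglers are not modelled at all, since there is only one process.

```python
from collections import defaultdict


def split_into_chunks(records, chunk_size):
    """Framework step: split the input into fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def run_with_retry(task, chunk, max_attempts=3):
    """Framework step: redo a failed task, giving up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except Exception:
            if attempt == max_attempts:
                raise


def map_reduce(records, mapper, reducer, chunk_size=2):
    """Toy driver; a real cluster would run each chunk on a different host."""
    grouped = defaultdict(list)
    for chunk in split_into_chunks(records, chunk_size):
        # "send data to mappers" and "process data on hosts"
        pairs = run_with_retry(
            lambda c: [kv for rec in c for kv in mapper(rec)], chunk
        )
        # "send results to reducers": group intermediate values by key
        for key, value in pairs:
            grouped[key].append(value)
    # "summarise results" and "output final results"
    return {key: reducer(key, values) for key, values in grouped.items()}


# Programmer-supplied parts: only these two functions are problem-specific.
def word_mapper(line):
    for word in line.split():
        yield word.lower(), 1


def count_reducer(key, values):
    return sum(values)


result = map_reduce(["a b a", "b c", "a"], word_mapper, count_reducer)
print(result)
```

Swapping in a different `mapper` and `reducer` changes the job without touching the driver, which is the point the slides make about the framework absorbing the cluster plumbing.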
    42. a vanity search
    43. % of refs to Anthony
    44. % of refs to Anthony Baxter
    45. count('Anthony Baxter') / count('Anthony')
    46. C++ library
    47. ... with Python bindings, yay!
    48.
        import mrpython  # Python bindings to the C++ MapReduce library (slides 46-47)

        class AnthonyMapper(mrpython.Mapper):
            def Map(self, map_input):
                meCount = otherCount = 0
                docId = map_input.key()    # ignored - doc id
                src = map_input.value()    # document source
                # ExtractText is not defined on the slide
                text = ExtractText(src).split()
                seenAnthony = False
                for word in text:
                    if not seenAnthony:
                        if word.lower() == 'anthony':
                            seenAnthony = True
                    else:
                        # the word right after 'anthony' decides which counter to bump
                        if word.lower() == 'baxter':
                            meCount += 1
                        else:
                            otherCount += 1
                        seenAnthony = False
                yield 'me', meCount
                yield 'other', otherCount
    49.
        class AnthonyReducer(mrpython.Reducer):
            def Reducer(self, reduce_input):
                '''Passed a key (either 'me' or 'other') and a list
                of counts. Adds the counts and returns them.
                '''
                count = 0
                for val in reduce_input.values():
                    count += int(val)  # accumulate into count, the value yielded below
                yield count
    50. the result:
    51. the result: about 1 in 4000
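Since `mrpython` is only shown on the slides, the vanity search can be tried locally with plain Python. The sketch below mirrors the mapper and reducer logic from slides 48-49 but works on ordinary strings; `anthony_map`, `sum_reduce` and the three documents are made up for illustration and say nothing about the real "1 in 4000" figure.

```python
def anthony_map(text):
    """Mirror of the slide's Map logic on a plain string instead of map_input."""
    me_count = other_count = 0
    seen_anthony = False
    for word in text.split():
        if not seen_anthony:
            if word.lower() == 'anthony':
                seen_anthony = True
        else:
            if word.lower() == 'baxter':
                me_count += 1
            else:
                other_count += 1
            seen_anthony = False
    yield 'me', me_count
    yield 'other', other_count


def sum_reduce(values):
    """Mirror of the slide's reducer: add up the counts for one key."""
    return sum(int(v) for v in values)


# Made-up documents, purely for illustration.
docs = [
    "Anthony Baxter gave a talk",
    "Anthony Hopkins and Anthony Baxter",
    "Mark Anthony spoke",
]

# Group mapper output by key, as the framework would before reducing.
grouped = {'me': [], 'other': []}
for doc in docs:
    for key, count in anthony_map(doc):
        grouped[key].append(count)

totals = {key: sum_reduce(values) for key, values in grouped.items()}

# count('Anthony Baxter') / count('Anthony'), as on slide 45
fraction = totals['me'] / (totals['me'] + totals['other'])
print(totals, fraction)
```

Note that, as on the slide, an "Anthony" that is the last word of a document is counted in neither bucket, because the decision is made on the word that follows it.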
    52. other uses for MapReduce
    53. web link graphs
    54. access logs
    55. text analysis
    56. google news clustering
    57. local search
    58. road traffic
    59. take speed samples
    60. group by road segment
    61. take the average
    62. once per minute
    63. output to a map layer
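The road traffic example on slides 58-63 maps onto the same two stages: the map step keys each speed sample by road segment, and the reduce step averages the speeds per segment. A minimal sketch, assuming a sample is a `(segment_id, speed_kmh)` pair; the segment names, the speeds, and the function names are all made up, and the once-per-minute scheduling and the map-layer output are left out.

```python
from collections import defaultdict

# Hypothetical samples: (road_segment_id, speed_kmh). Values are made up.
samples = [
    ("A1-north", 60.0),
    ("A1-north", 80.0),
    ("B7-east", 30.0),
    ("A1-north", 70.0),
    ("B7-east", 50.0),
]


def map_sample(sample):
    """map: emit (road segment, speed) for one sample."""
    segment, speed = sample
    yield segment, speed


def reduce_segment(speeds):
    """reduce: average the speeds seen for one segment."""
    return sum(speeds) / len(speeds)


def average_speeds(batch):
    """Run map, group by road segment, then reduce, over one batch of samples."""
    grouped = defaultdict(list)
    for sample in batch:
        for segment, speed in map_sample(sample):
            grouped[segment].append(speed)
    return {segment: reduce_segment(speeds) for segment, speeds in grouped.items()}


averages = average_speeds(samples)
print(averages)
```

In a once-per-minute setup, `average_speeds` would be called on each minute's batch of samples and the resulting dictionary handed to whatever draws the map layer.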
    64. limitation: availability of data
    65. MapReduce is pretty cool
    66. for more information
    67. “mapreduce paper”, “gfs paper”, “google papers”
    68. if you’d like to play
    69. hadoop.apache.org
    70. open source
    71. java implementation
    72. HDFS

    Harisfazillah Jamel
