My First MapReduce

A talk on big data and MapReduce by Rubén Orta.

Published in: Technology, News & Politics

  1. My First Map/Reduce. Rubén Orta, @agileando
  2. Outline: (1) history, (2) implementation, (3) Netflix Prize in Python, (4) links
  3. Part 1. Big Data = Counting
  4. Part 1. COUNTING
  5. Part 1. Jeff Dean and Sanjay Ghemawat (authors of Google's MapReduce paper)
  6. Part 2. Pseudocode for the two functions:
       map(key, value):
           new_value = a_function(value)
           return new_key, new_value

       reduce(key, value):
           new_value = another_function(value)
           return key, new_value
  7. Part 2. Dataset: 2 million web pages. Diagram: a row of f() boxes (map) above a row of f'() boxes (reduce).
       Map:
           for each word in document:
               emit (word, 1)

       Reduce:
           total = 0
           for each item in value:
               total++
           return (key, total)
  8. Part 2 (no transcribed text)
  9. Part 3 (no transcribed text)
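  10. Part 3. Server script:
       import mincemeat

       # data_files and read_data are not shown on the slide.
       # Build a {file name: file contents} dictionary as the job's input.
       data = dict((f, read_data(f)) for f in data_files)

       s = mincemeat.Server()
       s.datasource = data
       s.mapfn = mapfn
       s.reducefn = reducefn
       results = s.run_server(password="ruben")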
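  11. Part 3. Map function:
       def mapfn(key, value):
           lines = value.splitlines()
           # The first line holds the film id followed by a colon; drop the colon.
           film_id = lines[0][:-1]
           # Every other line is "user_id,rating,date".
           for line in lines[1:]:
               items = line.split(",")
               user_id = items[0]
               rating = items[1]
               date = items[2]
               yield user_id, film_id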
  12. Part 3. Reduce function:
       def reducefn(key, values):
           # Count how many films this user (the key) has rated.
           number_of_films = 0
           for value in values:
               number_of_films += 1
           return number_of_films
  13. Part 4. Links
       Papers:
           GFS: http://research.google.com/archive/gfs.html
           MapReduce: http://research.google.com/archive/mapreduce.html
           BigTable: http://research.google.com/archive/bigtable.html
           Dynamo: http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
           Dremel: http://research.google.com/pubs/pub36632.html
           Spanner: http://research.google.com/archive/spanner.html
       Python:
           MinceMeat.py: https://github.com/michaelfairley/mincemeatpy
           Octo.py: http://code.google.com/p/octopy/
       Netflix DataSet: http://www.lifecrunch.biz/archives/207
  14. Part 4. Rubén Orta
       Slides: http://www.slideshare.net/agileando/mi-primer-map-reduce
       Blog: http://devspoke.com/
       Twitter: https://twitter.com/agileando
       GitHub: https://github.com/rubenorta
  15. WE ARE LOOKING FOR PEOPLE TO JOIN OUR TEAM. Want to join? *unix, scripting (python, perl), devops
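
The word-count pseudocode on slide 7 can be tried out on a single machine. The sketch below is only an illustration, not the speaker's code: `run_job`, `map_fn`, `reduce_fn` and the two tiny "pages" are hypothetical names and data, and the grouping step that a real framework spreads across many workers is done here with an in-memory dictionary.

```python
from collections import defaultdict


def map_fn(doc_id, text):
    """Map step: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce step: add up the 1s emitted for a single word."""
    total = 0
    for count in counts:
        total += count
    return word, total


def run_job(documents, mapper, reducer):
    """Run map, group the output by key, then reduce, all in one process."""
    grouped = defaultdict(list)
    for key, value in documents.items():
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    return dict(reducer(key, values) for key, values in grouped.items())


# Hypothetical stand-in for the 2 million web pages mentioned on the slide.
pages = {
    "page1": "big data is counting",
    "page2": "counting is big",
}
word_counts = run_job(pages, map_fn, reduce_fn)
print(word_counts)
```

Because every mapper call only sees one document and every reducer call only sees one word, both steps could be handed to separate workers without changing the functions themselves.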

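
The map and reduce functions from slides 11 and 12 can be checked without starting a mincemeat server. The following is a minimal sketch under stated assumptions: `run_locally` is a hypothetical single-process stand-in for the server and its clients, and the file names, user ids and ratings in `sample_files` are invented, arranged in the layout the map function expects (a film id followed by a colon on the first line, then one `user_id,rating,date` line per rating).

```python
from collections import defaultdict


def mapfn(key, value):
    """Same logic as slide 11: emit (user_id, film_id) for each rating line."""
    lines = value.splitlines()
    film_id = lines[0][:-1]  # header line looks like "1:"; drop the colon
    for line in lines[1:]:
        user_id, rating, date = line.split(",")
        yield user_id, film_id


def reducefn(key, values):
    """Same result as slide 12: the number of films one user has rated."""
    return sum(1 for _ in values)


def run_locally(datasource, mapper, reducer):
    """Hypothetical helper: map every file, group by key, reduce each group."""
    grouped = defaultdict(list)
    for key, value in datasource.items():
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)
    return {key: reducer(key, values) for key, values in grouped.items()}


# Invented sample data, not taken from the real Netflix Prize dataset.
sample_files = {
    "film1.txt": "1:\n101,3,2005-01-01\n202,5,2005-01-02\n",
    "film2.txt": "2:\n101,4,2005-02-01\n",
}
films_per_user = run_locally(sample_files, mapfn, reducefn)
print(films_per_user)
```

The result maps each user id to the number of films that user rated, which is the same shape of output the slide's `results` dictionary would hold.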