
Map/Reduce intro


Slides on the Map/Reduce programming model (for academic purposes), adapting some examples from the book MapReduce Design Patterns.

Special thanks to the following authors:

-http://shop.oreilly.com/product/0636920025122.do
-http://mapreducepatterns.com/index.php?title=Main_Page
-http://highlyscalable.wordpress.com/2012/02/01/mapreduce-patterns/

Published in: Technology

Map/Reduce intro

  1. 1. MapReduce Intro: The MapReduce Programming Model. Introduction and Examples
     Dr. Jose María Alvarez-Rodríguez
     “Quality Management in Service-based Systems and Cloud Applications”, FP7 RELATE-ITN
     South East European Research Center, Thessaloniki, 10th of April, 2013
  2. 2. Outline
       1. MapReduce in a nutshell
       2. Thinking in MapReduce
       3. Applying MapReduce
       4. Success Stories with MapReduce
       5. Summary and Conclusions
  3. 3. MapReduce in a nutshell: Features
     A programming model...
       1. Large-scale distributed data processing
       2. Simple but restricted
       3. Parallel programming
       4. Extensible
  4. 4. MapReduce in a nutshell: Antecedents
     Functional programming
       1. Inspired...
       2. ...but not equivalent
     Example in Python: “Given a list of numbers between 1 and 50, print only the even numbers”
         print(list(filter(lambda x: x % 2 == 0, range(1, 51))))
       - A list of numbers (data)
       - A condition (even numbers)
       - A function, filter, that is applied to the list (map)
  6. 6. MapReduce in a nutshell: ...Other examples...
     Example in Python: “Return the sum of the squares of a list of numbers between 1 and 50”
         from functools import reduce
         import operator
         reduce(operator.add, map(lambda x: x ** 2, range(1, 51)), 0)
       - “reduce” is equivalent to “foldl” in other functional languages such as Haskell
       - Other mathematical considerations should be taken into account (the kind of operator, e.g. whether it is associative and commutative)...
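
The remark about the kind of operator can be made concrete with a small sketch. It assumes the input is split into fixed-size chunks (the helper `chunked` and the chunk size are illustrative, not part of the slides): because addition is associative, reducing each chunk separately and then reducing the partial results gives the same answer as one sequential fold, which is what makes the work distributable.

```python
from functools import reduce
import operator


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares_sequential(numbers):
    """One left fold over the whole list, as on the slide."""
    return reduce(operator.add, map(lambda x: x ** 2, numbers), 0)


def sum_of_squares_chunked(numbers, chunk_size=10):
    """Fold each chunk independently, then fold the partial sums."""
    # Each chunk could be processed by a different worker.
    partials = [sum_of_squares_sequential(chunk)
                for chunk in chunked(numbers, chunk_size)]
    return reduce(operator.add, partials, 0)


numbers = list(range(1, 51))
print(sum_of_squares_sequential(numbers))  # 42925
print(sum_of_squares_chunked(numbers))     # 42925
```

With a non-associative operator such as subtraction the grouping matters: (10 - 3) - 2 is 5, while 10 - (3 - 2) is 9, so the chunked version would no longer be safe.
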
  7. 7. MapReduce in a nutshell: Some interesting points...
     The MapReduce framework...
       1. Inspired by functional programming concepts (but not equivalent)
       2. Problems that can be parallelized
       3. Sometimes recursive solutions
       4. ...
  8. 8. MapReduce in a nutshell: Basic Model
     “MapReduce: The Programming Model and Practice”, SIGMETRICS Tutorials 2009, Google.
  9. 9. MapReduce in a nutshell: Map Function
     Figure: Mapping creates a new output list by applying a function to individual elements of an input list. “Module 4: MapReduce”, Hadoop Tutorial, Yahoo!.
  10. 10. MapReduce in a nutshell: Reduce Function
     Figure: Reducing a list iterates over the input values to produce an aggregate value as output. “Module 4: MapReduce”, Hadoop Tutorial, Yahoo!.
  11. 11. MapReduce in a nutshell: MapReduce Flow
     Figure: High-level MapReduce pipeline. “Module 4: MapReduce”, Hadoop Tutorial, Yahoo!.
  12. 12. MapReduce in a nutshell: MapReduce Flow
     Figure: Detailed Hadoop MapReduce data flow.
  13. 13. MapReduce in a nutshell: Tip
     What is MapReduce? It is a framework inspired by functional programming to tackle problems in which steps can be parallelized by applying a divide-and-conquer approach.
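
The map, shuffle and reduce stages can be imitated in a few lines of single-process Python. This is only a sketch of the data flow, not of Hadoop itself: `run_mapreduce`, `letter_mapper` and `count_reducer` are names invented here, and the "shuffle" is just a dictionary that groups values by key.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run mapper and reducer over in-memory records and return a dict."""
    # Map phase: every record yields zero or more (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))

    # Shuffle phase: group all values that share a key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: each key is reduced independently of the others.
    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


def letter_mapper(line):
    """Emit (letter, 1) for every alphabetic character in the line."""
    return [(ch, 1) for ch in line.lower() if ch.isalpha()]


def count_reducer(key, values):
    """Sum the counts collected for one key."""
    return [(key, sum(values))]


print(run_mapreduce(["Map", "Reduce"], letter_mapper, count_reducer))
```

Because each key is reduced on its own, the reduce calls could run on different machines; the same holds for the map calls on different records.
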
  14. 14. Thinking in MapReduce: When should I use MapReduce?
     Query
       - Index and search: inverted index
       - Filtering
       - Classification
       - Recommendations: clustering or collaborative filtering
     Analytics
       - Summarization and statistics
       - Sorting and merging
       - Frequency distribution
       - SQL-based queries: group-by, having, etc.
       - Generation of graphics: histograms, scatter plots
     Others
       - Message passing, such as Breadth-First Search or PageRank algorithms
  17. 17. Thinking in MapReduce: How Google uses MapReduce (80% of data processing)
       - Large-scale web search indexing
       - Clustering problems for Google News
       - Producing reports for popular queries, e.g. Google Trends
       - Processing of satellite imagery data
       - Language model processing for statistical machine translation
       - Large-scale machine learning problems
       - ...
  18. 18. Thinking in MapReduce: Comparison of MapReduce and other approaches
     “MapReduce: The Programming Model and Practice”, SIGMETRICS Tutorials 2009, Google.
  19. 19. Thinking in MapReduce: Evaluation of MapReduce and other approaches
     “MapReduce: The Programming Model and Practice”, SIGMETRICS Tutorials 2009, Google.
  20. 20. Thinking in MapReduce: Apache Hadoop
     MapReduce definition: The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
     Figure: Apache Hadoop Logo.
  21. 21. Thinking in MapReduce: Tip
     What can I do in MapReduce? Three main functions:
       1. Querying
       2. Summarizing
       3. Analyzing
     ...large datasets in off-line mode to boost other on-line processes.
  22. 22. Applying MapReduce: MapReduce in Action
     MapReduce Patterns
       1. Summarization
       2. Filtering
       3. Data Organization (sort, merging, etc.)
       4. Relational-based (join, selection, projection, etc.)
       5. Iterative Message Passing (graph processing)
       6. Others (depending on the implementation):
            - Simulation of distributed systems
            - Cross-correlation
            - Metapatterns
            - Input-output
            - ...
  23. 23. Applying MapReduce: Overview (stages) - Counting Letters
  24. 24. Applying MapReduce: Summarization
     Types
       1. Numerical summarizations
       2. Inverted index
       3. Counting and counters
  25. 25. Applying MapReduce: Numerical Summarization - I
     Description: A general pattern for calculating aggregate statistical values over your data.
     Intent: Group records together by a key field and calculate a numerical aggregate per group to get a top-level view of the larger data set.
  26. 26. Applying MapReduce: Numerical Summarization - II
     Applicability
       - To deal with numerical data or counting
       - To group data by specific fields
     Examples
       1. Word count
       2. Record count
       3. Min/Max/Count
       4. Average/Median/Standard deviation
       5. ...
  27. 27. Applying MapReduce: Numerical Summarization - Pseudocode
         class Mapper
           method Map(recordid id, record r)
             for all term t in record r do
               Emit(term t, count 1)
         class Reducer
           method Reduce(term t, counts [c1, c2,...])
             sum = 0
             for all count c in [c1, c2,...] do
               sum = sum + c
             Emit(term t, count sum)
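
The pseudocode above translates almost line by line into Python. In this sketch the function names (`map_terms`, `reduce_counts`, `word_count`) are illustrative, records are plain strings split on whitespace, and sorting followed by `itertools.groupby` stands in for the framework's shuffle-and-sort step.

```python
from itertools import groupby
from operator import itemgetter


def map_terms(record_id, record):
    """Map: emit (term, 1) for every whitespace-separated term."""
    for term in record.split():
        yield term, 1


def reduce_counts(term, counts):
    """Reduce: add up all counts emitted for one term."""
    total = 0
    for count in counts:
        total += count
    yield term, total


def word_count(records):
    """Run map, sort by key (the shuffle), then reduce each group."""
    pairs = [pair
             for record_id, record in enumerate(records)
             for pair in map_terms(record_id, record)]
    pairs.sort(key=itemgetter(0))
    output = []
    for term, group in groupby(pairs, key=itemgetter(0)):
        output.extend(reduce_counts(term, (count for _, count in group)))
    return output


print(word_count(["to be or not to be", "to map and to reduce"]))
```
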
  28. 28. Applying MapReduce: Overview - Word Counter
  29. 29. Applying MapReduce: Numerical Summarization - Word Counter
         // word (a reusable Text) and one (an IntWritable holding 1) are
         // fields of the enclosing Mapper class; they are not shown on the slide.
         public void map(LongWritable key, Text value, Context context)
                 throws IOException, InterruptedException {
             String line = value.toString();
             StringTokenizer tokenizer = new StringTokenizer(line);
             while (tokenizer.hasMoreTokens()) {
                 word.set(tokenizer.nextToken());
                 context.write(word, one);   // emit (term, 1)
             }
         }

         public void reduce(Text key, Iterable<IntWritable> values, Context context)
                 throws IOException, InterruptedException {
             int sum = 0;
             for (IntWritable val : values) {
                 sum += val.get();
             }
             context.write(key, new IntWritable(sum));   // emit (term, total)
         }
  30. 30. Applying MapReduce: Example II - Min/Max
     Given a list of tweets (username, date, text), determine the first and last time a user commented and the number of times.
     Implementation: see https://github.com/chemaar/seqos/tree/master/prototypes/mapreduce-intro
  31. 31. Applying MapReduce: Overview - Min/Max
     * Min and max creation date are the same in the map phase.
  32. 32. Applying MapReduce: Example II - Min/Max, function Map
         public void map(Object key, Text value, Context context)
                 throws IOException, InterruptedException {
             Map<String, String> parsed = MRDPUtils.parse(value.toString());
             String strDate = parsed.get(MRDPUtils.CREATION_DATE);
             String userId = parsed.get(MRDPUtils.USER_ID);
             if (strDate == null || userId == null) {
                 return;
             }
             Date creationDate;
             try {
                 creationDate = MRDPUtils.frmt.parse(strDate);
             } catch (ParseException e) {
                 // Mapper.map cannot declare ParseException, so wrap it
                 throw new IOException(e);
             }
             // A single record is both the earliest and the latest seen so far
             outTuple.setMin(creationDate);
             outTuple.setMax(creationDate);
             outTuple.setCount(1);
             outUserId.set(userId);
             context.write(outUserId, outTuple);
         }
  33. 33. Applying MapReduce: Example II - Min/Max, function Reduce
         public void reduce(Text key, Iterable<MinMaxCountTuple> values, Context context)
                 throws IOException, InterruptedException {
             result.setMin(null);
             result.setMax(null);
             int sum = 0;
             for (MinMaxCountTuple val : values) {
                 if (result.getMin() == null
                         || val.getMin().compareTo(result.getMin()) < 0) {
                     result.setMin(val.getMin());
                 }
                 if (result.getMax() == null
                         || val.getMax().compareTo(result.getMax()) > 0) {
                     result.setMax(val.getMax());
                 }
                 sum += val.getCount();
             }
             result.setCount(sum);
             context.write(key, result);
         }
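
The same min/max/count logic can be sketched in plain Python to make the shape of the intermediate value explicit. The tweet layout (user, date string, text), the date format and the function names are assumptions made for this illustration; the real job uses the Hadoop classes shown on the slides.

```python
from datetime import datetime


def map_tweet(tweet):
    """Emit (user, (min_date, max_date, count)) for one tweet."""
    user, date_str, _text = tweet
    created = datetime.strptime(date_str, "%Y-%m-%d %H:%M")
    # In the map phase min and max are the same date and the count is 1.
    return user, (created, created, 1)


def merge(a, b):
    """Combine two (min, max, count) tuples into one."""
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2])


def min_max_count(tweets):
    """Group by user and fold the per-tweet tuples together."""
    result = {}
    for tweet in tweets:
        user, value = map_tweet(tweet)
        result[user] = merge(result[user], value) if user in result else value
    return result


tweets = [
    ("ana", "2013-04-10 09:00", "first"),
    ("bob", "2013-04-10 10:30", "hello"),
    ("ana", "2013-04-11 18:45", "second"),
]
for user, (first, last, count) in sorted(min_max_count(tweets).items()):
    print(user, first, last, count)
```

Since `merge` takes and returns the same kind of tuple, it can be applied to partial results in any grouping, which is the property a combiner relies on.
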
  34. 34. Applying MapReduce: Example III - Average
     Given a list of tweets (username, date, text), determine the average comment length per hour of day.
     Implementation: see https://github.com/chemaar/seqos/tree/master/prototypes/mapreduce-intro
  35. 35. Applying MapReduce: Overview - Average
  36. 36. Applying MapReduce: Example III - Average, function Map
         public void map(Object key, Text value, Context context)
                 throws IOException, InterruptedException {
             Map<String, String> parsed = MRDPUtils.parse(value.toString());
             String strDate = parsed.get(MRDPUtils.CREATION_DATE);
             String text = parsed.get(MRDPUtils.TEXT);
             if (strDate == null || text == null) {
                 return;
             }
             Date creationDate;
             try {
                 creationDate = MRDPUtils.frmt.parse(strDate);
             } catch (ParseException e) {
                 // Mapper.map cannot declare ParseException, so wrap it
                 throw new IOException(e);
             }
             outHour.set(creationDate.getHours());
             // One record: count 1, "average" equal to its own length
             outCountAverage.setCount(1);
             outCountAverage.setAverage(text.length());
             context.write(outHour, outCountAverage);
         }
  37. 37. Applying MapReduce: Example III - Average, function Reduce
         public void reduce(IntWritable key, Iterable<CountAverageTuple> values, Context context)
                 throws IOException, InterruptedException {
             float sum = 0;
             float count = 0;
             for (CountAverageTuple val : values) {
                 // Weight each partial average by the number of records behind it
                 sum += val.getCount() * val.getAverage();
                 count += val.getCount();
             }
             result.setCount(count);
             result.setAverage(sum / count);
             context.write(key, result);
         }
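
Why the reducer multiplies each average by its count can be seen with a small numeric sketch. The function name `merge_count_average` and the sample lengths are made up for this illustration; the point is that a (count, average) pair can be merged correctly, while a plain average of averages cannot.

```python
def merge_count_average(tuples):
    """Merge (count, average) pairs into a single (count, average) pair."""
    total = 0.0
    count = 0
    for c, avg in tuples:
        total += c * avg   # recover the sum behind each partial average
        count += c
    return count, total / count


# Three tweets posted in the same hour, with lengths 10, 20 and 60.
# Suppose the first two were pre-aggregated on one node:
partial_a = merge_count_average([(1, 10.0), (1, 20.0)])   # (2, 15.0)
partial_b = (1, 60.0)

print(merge_count_average([partial_a, partial_b]))   # (3, 30.0)

# Averaging the two partial averages directly gives the wrong answer:
print((partial_a[1] + partial_b[1]) / 2)   # 37.5
```
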
  38. 38. Applying MapReduce: Numerical Summarization - Other approaches
     Relation to SQL
         SELECT MIN(numcol1), MAX(numcol1), COUNT(*)
         FROM table
         GROUP BY groupcol2;
     Implementation in Pig
         b = GROUP a BY groupcol2;
         c = FOREACH b GENERATE group, MIN(a.numcol1), MAX(a.numcol1), COUNT_STAR(a);
  40. 40. Applying MapReduce: Filtering
     Types
       1. Filtering
       2. Top N records
       3. Bloom filtering
       4. Distinct
  41. 41. Applying MapReduce: Filtering - I
     Description: It evaluates each record separately and decides, based on some condition, whether it should stay or go.
     Intent: Filter out records that are not of interest and keep ones that are.
  42. 42. Applying MapReduce: Filtering - II
     Applicability
       - To collate data
     Examples
       1. Closer view of dataset
       2. Data cleansing
       3. Tracking a thread of events
       4. Simple random sampling
       5. Distributed Grep
       6. Removing low scoring dataset
       7. Log Analysis
       8. Data Querying
       9. Data Validation
       10. ...
  43. 43. Applying MapReduce: Filtering - Pseudocode
         class Mapper
           method Map(recordid id, record r)
             field f = extract(r)
             if predicate(f)
               Emit(recordid id, value(r))
         class Reducer
           method Reduce(recordid id, values [r1, r2,...])
             // Whatever
             Emit(recordid id, aggregate(values))
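
A minimal Python rendering of the filtering pseudocode is sketched below as a map-only job, i.e. without the optional reducer. The names `filter_mapper` and `run_filter`, the tuple layout of a tweet and the length threshold are assumptions chosen for the example.

```python
def filter_mapper(record_id, record, extract, predicate):
    """Emit the record unchanged if the extracted field passes the predicate."""
    field = extract(record)
    if predicate(field):
        yield record_id, record


def run_filter(records, extract, predicate):
    """Apply the mapper to every record; each call is independent."""
    kept = []
    for record_id, record in enumerate(records):
        kept.extend(filter_mapper(record_id, record, extract, predicate))
    return kept


tweets = [
    ("ana", "2013-04-10", "short"),
    ("bob", "2013-04-10", "a considerably longer tweet"),
]
long_tweets = run_filter(
    tweets,
    extract=lambda tweet: tweet[2],           # the text field
    predicate=lambda text: len(text) > 10,    # keep only longer texts
)
print(long_tweets)
```

Each record is judged without looking at any other record, so the filter needs no shuffle and scales with the number of mappers.
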
  44. 44. Applying MapReduce: Example IV - Distributed Grep
     Given a list of tweets (username, date, text), determine the tweets that contain a word.
     Implementation: see https://github.com/chemaar/seqos/tree/master/prototypes/mapreduce-intro
  45. 45. Applying MapReduce: Overview - Distributed Grep
  46. 46. Applying MapReduce: Example IV - Distributed Grep, function Map
         public void map(Object key, Text value, Context context)
                 throws IOException, InterruptedException {
             Map<String, String> parsed = MRDPUtils.parse(value.toString());
             String txt = parsed.get(MRDPUtils.TEXT);
             // The word to search for is read from the job configuration
             String mapRegex = ".*\\b"
                     + context.getConfiguration().get("mapregex")
                     + "(.)*\\b.*";
             if (txt.matches(mapRegex)) {
                 context.write(NullWritable.get(), value);
             }
         }
     ...and the Reduce function? In this case it is not necessary: the output values of the map phase are written directly to the output.
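
A hedged Python sketch of the same idea follows. It matches the search word as a whole word using `\b` boundaries, which is a slightly stricter assumption than the regular expression on the slide; the function names and the sample tweets are invented for the example.

```python
import re


def grep_mapper(tweet, pattern):
    """Emit the tweet unchanged if its text matches the pattern."""
    _user, _date, text = tweet
    if pattern.search(text):
        yield tweet


def distributed_grep(tweets, word):
    """Map-only job: no reducer, matching tweets go straight to the output."""
    # re.escape keeps characters such as '.' or '+' in the word literal.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [match for tweet in tweets for match in grep_mapper(tweet, pattern)]


tweets = [
    ("ana", "2013-04-10", "Learning Hadoop today"),
    ("bob", "2013-04-10", "Nothing to see here"),
    ("eve", "2013-04-11", "hadoopy is not the same word"),
]
print(distributed_grep(tweets, "hadoop"))
```
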
  47. 47. Applying MapReduce: Example V - Top 5
     Given a list of tweets (username, date, text), determine the 5 users who wrote the longest tweets.
     Implementation: see https://github.com/chemaar/seqos/tree/master/prototypes/mapreduce-intro
  48. 48. Applying MapReduce: Overview - Top 5
  49. 49. Applying MapReduce: Example V - Top 5, function Map
         private TreeMap<Integer, Text> repToRecordMap = new TreeMap<Integer, Text>();

         public void map(Object key, Text value, Context context)
                 throws IOException, InterruptedException {
             Map<String, String> parsed = MRDPUtils.parse(value.toString());
             if (parsed == null) {
                 return;
             }
             String userId = parsed.get(MRDPUtils.USER_ID);
             String text = parsed.get(MRDPUtils.TEXT);
             if (userId == null || text == null) {
                 return;
             }
             // Longer tweets get a higher "reputation"
             int reputation = text.length();
             repToRecordMap.put(reputation, new Text(value));
             // Keep only the MAX_TOP largest keys seen by this mapper
             if (repToRecordMap.size() > MAX_TOP) {
                 repToRecordMap.remove(repToRecordMap.firstKey());
             }
             // Emitting the retained records (e.g. once the mapper has
             // finished) is not shown on the slide.
         }
  50. 50. Applying MapReduce: Example V - Top 5, function Reduce
         public void reduce(NullWritable key, Iterable<Text> values, Context context)
                 throws IOException, InterruptedException {
             for (Text value : values) {
                 Map<String, String> parsed = MRDPUtils.parse(value.toString());
                 repToRecordMap.put(parsed.get(MRDPUtils.TEXT).length(), new Text(value));
                 if (repToRecordMap.size() > MAX_TOP) {
                     repToRecordMap.remove(repToRecordMap.firstKey());
                 }
             }
             // Write the survivors from longest to shortest
             for (Text t : repToRecordMap.descendingMap().values()) {
                 context.write(NullWritable.get(), t);
             }
         }
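
The two-level structure of the Top N pattern (a local top N per mapper, then a global top N in a single reducer) can be sketched in Python with `heapq.nlargest`. The function names, the tuple layout of a tweet and the way the input is pre-split are assumptions made for this illustration.

```python
import heapq


def top_n_local(tweets, n):
    """Return the n tweets with the longest text from one collection."""
    return heapq.nlargest(n, tweets, key=lambda tweet: len(tweet[2]))


def top_n(splits, n=5):
    """Each split keeps its own top n; one final pass picks the global top n."""
    candidates = []
    for split in splits:              # one split per (hypothetical) mapper
        candidates.extend(top_n_local(split, n))
    return top_n_local(candidates, n)  # the single reducer


splits = [
    [("ana", "2013-04-10", "xx"), ("bob", "2013-04-10", "xxxxx")],
    [("eve", "2013-04-11", "x"), ("dan", "2013-04-11", "xxxx")],
]
print(top_n(splits, n=2))
```

At most n records leave each mapper, so the reducer sees a small input regardless of the data size. Unlike a map keyed by length, this sketch does not overwrite tweets that happen to have the same length.
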
  51. 51. Applying MapReduce: Filtering - Other approaches
     Relation to SQL
         SELECT * FROM table WHERE colvalue < VALUE;
     Implementation in Pig
         b = FILTER a BY colvalue < VALUE;
  53. 53. Applying MapReduce: Tip
     How can I use and run a MapReduce framework? Identify what kind of problem you are addressing and apply a design pattern, implemented in a framework such as Apache Hadoop.
  54. 54. Success Stories with MapReduce: Tip
     Who is using MapReduce? Companies dealing with Big Data analytics problems, such as:
       - Cloudera
       - Datasalt
       - Elasticsearch
       - ...
  55. 55. Success Stories with MapReduce: Apache Hadoop-Related Projects
  56. 56. Success Stories with MapReduce: More tips
     FAQ
       - MapReduce is a framework based on a simple programming model...
       - ...to deal with large datasets in a distributed fashion
       - ...with scalability, replication, fault tolerance, etc.
       - Apache Hadoop is not a database
       - New frameworks on top of Hadoop for specific tasks: querying, analysis, etc.
       - Other similar frameworks: Storm, Signal/Collect, etc.
       - ...
  57. 57. Summary and Conclusions: Summary
  58. 58. Summary and Conclusions: Conclusions
     What is MapReduce? It is a framework inspired by functional programming to tackle problems in which steps can be parallelized by applying a divide-and-conquer approach.
     What can I do in MapReduce? Three main functions:
       1. Querying
       2. Summarizing
       3. Analyzing
     ...large datasets in off-line mode to boost other on-line processes.
     How can I use and run a MapReduce framework? Identify what kind of problem you are addressing and apply a design pattern, implemented in a framework such as Apache Hadoop.
  61. 61. Summary and Conclusions: What’s next?
       - ...
       - Concatenate MapReduce jobs
       - Optimization using combiners and setting the parameters (size of partition, etc.)
       - Pipelining with other languages such as Python
       - Hadoop in Action: more examples, etc.
       - New trending problems (image/video processing)
       - Real-time processing
       - ...
  62. 62. References
       - J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, Jan. 2008.
       - J. R. Owens, J. Lentz, and B. Femiano. Hadoop Real-World Solutions Cookbook. Packt Publishing Ltd, 2013.
       - C. Lam. Hadoop in Action. Manning Publications Co., Greenwich, CT, USA, 1st edition, 2010.
       - J. Lin and C. Dyer. Data-intensive text processing with MapReduce. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts, NAACL-Tutorials ’09, pages 1–2, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.
       - D. Miner and A. Shook. MapReduce Design Patterns. O’Reilly and Associates Inc, 2012.
       - S. Perera and T. Gunarathne. Hadoop MapReduce Cookbook. Packt Publishing Ltd, 2013.
       - T. White. Hadoop: The Definitive Guide. O’Reilly Media, Inc., 1st edition, 2009.
       - I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2005.