Map/Reduce - In the Small and in the Cloud


    Map/Reduce - In the Small and in the Cloud - Presentation Transcript

    1. Map | Filter | Reduce: In the Small and in the Cloud. Sebastian Bergmann, March 19th 2009
    2. Who I Am: Sebastian Bergmann. Involved in the PHP project since 2000; Creator of PHPUnit; Co-Founder and Principal Consultant with thePHP.cc
    3. Programming Paradigms: Declarative vs. Imperative; Functional vs. Imperative; Object-Oriented vs. Procedural; Aspect-Oriented vs. Object-Oriented; ...
    4. Functional Programming Concepts: Higher-Order Functions; Pure Functions; Recursion; Non-Strict Evaluation / Lazy Evaluation
    5. Functional Programming Concepts: Higher-Order Functions (a function that takes one or more functions as input and/or returns a function); Pure Functions; Recursion; Non-Strict Evaluation / Lazy Evaluation
    6. Functional Programming Concepts: Higher-Order Functions (a function that takes one or more functions as input and/or returns a function; Lambda Calculus only allows unary functions: single input, single output); Pure Functions; Recursion; Non-Strict Evaluation / Lazy Evaluation
    7. Functional Programming Concepts: Higher-Order Functions; Pure Functions (a function that has no semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices); Recursion; Non-Strict Evaluation / Lazy Evaluation
    8. Functional Programming Concepts: Higher-Order Functions; Pure Functions; Recursion (a method of defining functions in which the function being defined is applied within its own definition); Non-Strict Evaluation / Lazy Evaluation
    9. Functional Programming Concepts: Higher-Order Functions; Pure Functions; Recursion; Non-Strict Evaluation / Lazy Evaluation (a technique of delaying a computation until such time as the result of the computation is known to be needed)
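The four concepts on slides 4 to 9 can be illustrated in a few lines of PHP. This is a minimal sketch, not code from the talk: the names `compose`, `square`, `factorial`, and `naturals` are made up for illustration, and the lazy part assumes a PHP version with generators (5.5 or newer), which is later than the closures used elsewhere in the slides.

```php
<?php
// Higher-order function: takes two functions and returns a new function.
function compose($f, $g)
{
    return function ($x) use ($f, $g) {
        return $f($g($x));
    };
}

// Pure function: the result depends only on the argument; no side effects.
function square($x)
{
    return $x * $x;
}

// Recursion: factorial() is applied within its own definition.
function factorial($n)
{
    return $n <= 1 ? 1 : $n * factorial($n - 1);
}

// Lazy evaluation: a generator computes each value only when it is requested.
function naturals()
{
    $n = 1;
    while (true) {
        yield $n++;
    }
}

$squareOfFactorial = compose('square', 'factorial');
print $squareOfFactorial(3) . "\n"; // square(factorial(3)) = 36

$firstThree = array();
foreach (naturals() as $n) {
    if ($n > 3) {
        break; // the infinite sequence is never computed in full
    }
    $firstThree[] = $n;
}
print implode(', ', $firstThree) . "\n"; // 1, 2, 3
```

PHP evaluates function arguments strictly, so a generator is only an approximation of non-strict evaluation as found in languages such as Haskell.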
    10. Functional Programming (Examples of Higher-Order Functions): Map; Filter; Reduce
    11. Functional Programming (Examples of Higher-Order Functions): Map (applies a given function to a sequence of elements and returns a sequence of results); Filter; Reduce
    12. Functional Programming (Examples of Higher-Order Functions): Map; Filter (filters a sequence of elements using a given predicate function); Reduce
    13. Functional Programming (Examples of Higher-Order Functions): Map; Filter; Reduce (processes a sequence of elements and builds up a single return value)
    14. Higher-Order PHP Functions (array_map()): array_map() applies a callback to all elements of an array and returns the resulting array
    15. Higher-Order PHP Functions (array_map()): <?php print_r( array_map( function ($x) { return $x + 1; }, array(1, 2, 3, 4, 5) ) ); ?> Output: Array ( [0] => 2 [1] => 3 [2] => 4 [3] => 5 [4] => 6 )
    16. Higher-Order PHP Functions (array_filter()): <?php print_r( array_filter( array(1, 2, 3, 4, 5), function ($x) { return $x % 2 == 0; } ) ); ?> Output: Array ( [1] => 2 [3] => 4 )
    17. Higher-Order PHP Functions (array_reduce()): array_reduce() iteratively reduces an array to a single value using a callback
    18. Higher-Order PHP Functions (array_reduce()): <?php print array_reduce( array(1, 2, 3, 4, 5), function ($x, $y) { return $x + $y; } ); ?> Output: 15
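The three functions above can also be chained. The following sketch (not from the talk; the variable names are illustrative) sums the squares of the even numbers. Note that array_map() takes the callback first while array_filter() and array_reduce() take the array first, and that array_filter() preserves the original keys.

```php
<?php
$numbers = array(1, 2, 3, 4, 5);

// Filter: keep the even numbers; keys are preserved: array(1 => 2, 3 => 4)
$even = array_filter($numbers, function ($x) { return $x % 2 == 0; });

// Map: square each remaining element: array(1 => 4, 3 => 16)
$squares = array_map(function ($x) { return $x * $x; }, $even);

// Reduce: sum the squares, starting from an explicit initial value of 0
$sum = array_reduce($squares, function ($carry, $x) { return $carry + $x; }, 0);

print $sum . "\n"; // 20
```

Passing an explicit initial value to array_reduce() avoids relying on the default of NULL, which matters when the input array may be empty.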
    19. The Map/Reduce Model: Programming model for processing large data sets. map(k1, v1) → list(k2, v2): processes a key/value pair to generate a set of intermediate key/value pairs. reduce(k2, list(v2)) → list(v2): merges all intermediate values associated with the same intermediate key
    20. The Map/Reduce Model: Programming model for processing large data sets. The Map function processes a key/value pair to generate a set of intermediate key/value pairs. The Reduce function merges all intermediate values associated with the same intermediate key. Programs written in this functional style can be automatically parallelized
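The two signatures above can be turned into a small in-process driver. This is a sketch under stated assumptions, not an existing library: `map_reduce()` is a hypothetical helper, the temperature readings are made-up sample data, and for brevity the reduce callback returns a single value per key rather than a list.

```php
<?php
// Hypothetical single-process driver for the map/reduce model.
// $map:    function (k1, v1) returning a list of array(k2, v2) pairs
// $reduce: function (k2, array of v2) returning the merged value for k2
function map_reduce(array $input, $map, $reduce)
{
    // Map phase: collect intermediate values, grouped by intermediate key.
    $groups = array();
    foreach ($input as $k1 => $v1) {
        foreach ($map($k1, $v1) as $pair) {
            list($k2, $v2) = $pair;
            $groups[$k2][] = $v2;
        }
    }

    // Reduce phase: merge all values that share an intermediate key.
    $result = array();
    foreach ($groups as $k2 => $values) {
        $result[$k2] = $reduce($k2, $values);
    }

    return $result;
}

// Made-up sample input: one "city temperature" reading per line.
$readings = array('berlin 12', 'munich 9', 'berlin 15', 'munich 11');

$result = map_reduce(
    $readings,
    function ($lineNumber, $line) {
        list($city, $temperature) = explode(' ', $line);
        return array(array($city, (int) $temperature));
    },
    function ($city, array $temperatures) {
        return max($temperatures);
    }
);

print_r($result); // berlin => 15, munich => 11
```

Each map call depends only on its own input pair and each reduce call only on the values of its own key, which is what allows a framework to distribute the calls across machines.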
    21. The Map/Reduce Model Paper by Jeffrey Dean and Sanjay Ghemawat @inproceedings{citeulike:430834, author = {Dean, Jeffrey and Ghemawat, Sanjay}, citeulike-article-id = {430834}, journal = {OSDI '04}, keywords = {distributed-computing, mapreduce}, pages = {137--150}, posted-at = {2007-09-07 10:07:39}, priority = {0}, title = {MapReduce: Simplified Data Processing on Large Clusters}, url = {http://www.usenix.org/events/osdi04/tech/dean.html} }
    22. The Map/Reduce Model (Use Cases): Apache CouchDB: views are defined using JavaScript functions acting as the map part in a Map/Reduce system. Facebook: a Hadoop cluster with more than 2,500 cores, 1 PB of data, and 2 TB of new data every day is used for user analytics. Yahoo!: a Hadoop cluster with more than 10,000 cores and 4 PB of data is used for Webmap. Google: more than 20 PB of data is analyzed using MapReduce every day; BigTable and GFS are built around MapReduce
    23. The Map/Reduce Model (Example: Word Counting): <?php function map($buffer) { $words = preg_split( '/\W/', strtolower($buffer), 0, PREG_SPLIT_NO_EMPTY ); }
    24. The Map/Reduce Model (Example: Word Counting): <?php function map($buffer) { $words = preg_split( '/\W/', strtolower($buffer), 0, PREG_SPLIT_NO_EMPTY ); foreach ($words as $word) { emit_intermediate($word, 1); } }
    25. The Map/Reduce Model (Example: Word Counting): <?php function map($buffer) { $words = preg_split( '/\W/', strtolower($buffer), 0, PREG_SPLIT_NO_EMPTY ); foreach ($words as $word) { emit_intermediate($word, 1); } } function reduce($word, array $values) { emit($word, array_sum($values)); }
    26. The Map/Reduce Model (Example: Word Counting): function emit_intermediate($key, $value) { global $data; if (isset($data[$key])) { $data[$key][] = $value; } else { $data[$key] = array($value); } } function emit($key, $value) { print "$key: $value\n"; }
    27. The Map/Reduce Model (Example: Word Counting): $data = array(); $text = <<<EOT foo bar baz bar bar baz foo EOT;
    28. The Map/Reduce Model (Example: Word Counting): $data = array(); $text = <<<EOT foo bar baz bar bar baz foo EOT; foreach (explode("\n", $text) as $line) { map($line); }
    29. The Map/Reduce Model (Example: Word Counting): $data = array(); $text = <<<EOT foo bar baz bar bar baz foo EOT; foreach (explode("\n", $text) as $line) { map($line); } foreach ($data as $key => $values) { reduce($key, $values); }
    30. The Map/Reduce Model (Example: Word Counting): Calls made by map(): emit_intermediate('foo', 1); emit_intermediate('bar', 1); emit_intermediate('baz', 1); emit_intermediate('bar', 1); emit_intermediate('bar', 1); emit_intermediate('baz', 1); emit_intermediate('foo', 1); Calls to reduce(): reduce('foo', array(1, 1)); reduce('bar', array(1, 1, 1)); reduce('baz', array(1, 1)); Output: foo: 2 bar: 3 baz: 2
    31. Apache Hadoop: Software framework that makes it easy to write programs that process vast amounts of data in parallel, on large clusters of commodity hardware, in a reliable, fault-tolerant manner
    32. Apache Hadoop (Components): Hadoop Core: HDFS and Map/Reduce; HBase: Distributed database; ZooKeeper: Coordination service
    33. Apache Hadoop (Streaming): MapReduce programs for Hadoop are written in Java. Hadoop Streaming allows the use of arbitrary commands, including interpreters such as PHP, as mappers and reducers. STDIN and STDOUT are the interface: one key/value pair per line, key and value separated by TAB
    34. Apache Hadoop (Streaming with PHP: map.php): #!/usr/bin/env php <?php while (($buffer = fgets(STDIN)) !== FALSE) { $words = preg_split( '/\W/', strtolower($buffer), 0, PREG_SPLIT_NO_EMPTY ); foreach ($words as $word) { printf("%s\t1\n", $word); } } ?>
    35. Apache Hadoop (Streaming with PHP: reduce.php): #!/usr/bin/env php <?php $data = array(); while (($buffer = fgets(STDIN)) !== FALSE) { list($word, $count) = explode("\t", $buffer); if (!isset($data[$word])) { $data[$word] = (int)$count; } else { $data[$word] += (int)$count; } } foreach ($data as $word => $count) { printf("%s\t%d\n", $word, $count); } ?>
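The reducer on slide 35 collects every word in an array before printing. Hadoop Streaming sorts the mapper output by key before it reaches the reducer, so a reducer can instead sum each run of identical keys and keep only one counter in memory. The following is a sketch of that variant, not code from the talk: `reduce_sorted()` is a hypothetical helper, it assumes a PHP version with generators (5.5 or newer), and it reads from any list of lines so that it can be tried without a cluster.

```php
<?php
// Hypothetical helper: sums the counts for each run of identical keys.
// Assumes $lines yields "key<TAB>count" lines that are already sorted by key.
function reduce_sorted($lines)
{
    $current = null; // key of the run being summed
    $sum     = 0;

    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue; // skip blank lines
        }

        list($word, $count) = explode("\t", $line, 2);

        // A new key means the previous run is complete: emit it.
        if ($current !== null && $word !== $current) {
            yield sprintf("%s\t%d", $current, $sum);
            $sum = 0;
        }

        $current = $word;
        $sum    += (int) $count;
    }

    // Emit the final run, if there was any input at all.
    if ($current !== null) {
        yield sprintf("%s\t%d", $current, $sum);
    }
}

// Sorted mapper output for the text used in the word counting example.
$sorted = array(
    "bar\t1\n", "bar\t1\n", "bar\t1\n",
    "baz\t1\n", "baz\t1\n",
    "foo\t1\n", "foo\t1\n",
);

foreach (reduce_sorted($sorted) as $out) {
    print $out . "\n"; // bar 3, baz 2, foo 2 (TAB-separated)
}
```

In an actual streaming job the same function could be fed from STDIN, for example with a small generator that yields the result of fgets(STDIN) until it returns FALSE.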
    36. Apache Hadoop (Streaming with PHP):
        $ hadoop dfs -copyFromLocal input input
        $ hadoop jar contrib/streaming/hadoop-0.18.1-streaming.jar \
            -mapper /home/hadoop/map.php \
            -reducer /home/hadoop/reduce.php \
            -input input/* \
            -output output \
        additionalConfSpec_:null
        null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
        packageJobJar: [/tmp/hadoop-hadoop/hadoop-unjar5743/] [] /tmp/streamjob5744.jar tmpDir=null
        08/10/14 12:03:48 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
        08/10/14 12:03:48 INFO mapred.FileInputFormat: Total input paths to process : 3
        08/10/14 12:03:48 INFO mapred.FileInputFormat: Total input paths to process : 3
        08/10/14 12:03:49 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
        08/10/14 12:03:49 INFO streaming.StreamJob: Running job: job_200810141152_0001
        08/10/14 12:03:49 INFO streaming.StreamJob: To kill this job, run:
        08/10/14 12:03:49 INFO streaming.StreamJob: /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_200810141152_0001
        08/10/14 12:03:49 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_200810141152_0001
        08/10/14 12:03:50 INFO streaming.StreamJob: map 0% reduce 0%
        08/10/14 12:04:05 INFO streaming.StreamJob: map 18% reduce 0%
        08/10/14 12:04:10 INFO streaming.StreamJob: map 54% reduce 0%
        08/10/14 12:04:12 INFO streaming.StreamJob: map 67% reduce 0%
        08/10/14 12:04:19 INFO streaming.StreamJob: map 100% reduce 0%
        08/10/14 12:04:28 INFO streaming.StreamJob: map 100% reduce 33%
        08/10/14 12:04:29 INFO streaming.StreamJob: map 100% reduce 93%
        08/10/14 12:04:30 INFO streaming.StreamJob: map 100% reduce 100%
        08/10/14 12:04:30 INFO streaming.StreamJob: Job complete: job_200810141152_0001
        08/10/14 12:04:30 INFO streaming.StreamJob: Output: output
    37. The End: Thank you for your interest! These slides will be posted on http://www.slideshare.net/sebastian_bergmann. You can vote for this talk on http://joind.in/220
    38. License: This presentation material is published under the Attribution-Share Alike 3.0 Unported license. You are free: to Share – to copy, distribute and transmit the work; to Remix – to adapt the work. Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.

    Sebastian Bergmann, 8 months ago
