Map, Filter, Reduce – In the Small and in the Cloud

    Map, Filter, Reduce – In the Small and in the Cloud - Presentation Transcript

    1. Welcome! Map, Filter, Reduce In the Small and in the Cloud Sebastian Bergmann http://sebastian-bergmann.de/ October 29th 2008
    2. Who I am ● Sebastian Bergmann ● Involved in the PHP Project since 2000 ● Developer of PHPUnit ● Author, Consultant, Coach, Trainer
    6. Programming Paradigms A programming paradigm is a fundamental style of computer programming – Declarative vs. Imperative Programming – Procedural vs. Object-Oriented Programming – Functional Programming – ... This slide contains material from Wikipedia
    7. Functional Programming Concepts ● Higher-Order Functions ● Pure Functions ● Recursion ● Non-Strict Evaluation / Lazy Evaluation This slide contains material from Wikipedia
    10. Functional Programming Concepts ● Higher-Order Functions – Function that takes one or more functions as input and/or returns a function – Lambda Calculus only allows unary functions (single input, single output); a function of two variables is expressed as a function of one argument which returns a function of one argument (Currying) ● Pure Functions ● Recursion ● Non-Strict Evaluation / Lazy Evaluation This slide contains material from Wikipedia
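The currying idea on this slide can be sketched with PHP closures. This is a minimal illustration, not code from the talk: the names `$add`, `$curriedAdd` and `$increment` are made up here, and the chained call `$curriedAdd(2)(3)` assumes PHP 7 or later.

```php
<?php
// Uncurried: a function of two arguments.
$add = function ($x, $y) {
    return $x + $y;
};

// Curried: a function of one argument that returns
// another function of one argument.
$curriedAdd = function ($x) {
    return function ($y) use ($x) {
        return $x + $y;
    };
};

// Fixing the first argument gives a reusable unary function.
$increment = $curriedAdd(1);

print $add(2, 3) . "\n";           // 5
print $curriedAdd(2)(3) . "\n";    // 5 (chained call needs PHP 7+)
print_r(array_map($increment, array(1, 2, 3)));
```

The last line passes the partially applied function to array_map(), which is how a curried function typically ends up being used with a higher-order function.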
    11. Functional Programming Concepts ● Higher-Order Functions ● Pure Functions – Function that has no semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices ● Recursion ● Non-Strict Evaluation / Lazy Evaluation This slide contains material from Wikipedia
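To make the distinction concrete, here is a small sketch contrasting a pure function with an impure one. Both functions (`add_tax` and `add_to_total`) are hypothetical examples and do not appear in the talk.

```php
<?php
// Pure: the result depends only on the arguments, and nothing
// outside the function is read or modified.
function add_tax($net, $rate)
{
    return $net * (1 + $rate);
}

// Impure: mutates global state and writes to STDOUT, so calling it
// twice with the same argument has a different observable effect.
$total = 0;

function add_to_total($amount)
{
    global $total;
    $total += $amount;
    print "Running total: $total\n";
}

print add_tax(100, 0.25) . "\n";
add_to_total(5);
add_to_total(5);
```

Only the pure variant can safely be re-run, reordered, or executed on another machine, which is the property the Map/Reduce model later relies on.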
    12. Functional Programming Concepts ● Higher-Order Functions ● Pure Functions ● Recursion – Method of defining functions in which the function being defined is applied within its own definition ● Non-Strict Evaluation / Lazy Evaluation This slide contains material from Wikipedia
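A minimal sketch of recursion in PHP, summing a list by applying the function to the tail of the list. The name `sum_list` is an assumption made for this example.

```php
<?php
// Recursive sum: the function is applied within its own definition.
function sum_list(array $values)
{
    if (count($values) === 0) {
        return 0;    // base case: the empty list sums to zero
    }

    // First element plus the sum of the remaining elements.
    return $values[0] + sum_list(array_slice($values, 1));
}

print sum_list(array(1, 2, 3, 4, 5)) . "\n";    // 15
```

This is for illustration only; for real data, array_sum() or array_reduce() avoids the repeated array copies and the recursion depth.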
    13. Functional Programming Concepts ● Higher-Order Functions ● Pure Functions ● Recursion ● Non-Strict Evaluation / Lazy Evaluation – Technique of delaying a computation until such time as the result of the computation is known to be needed This slide contains material from Wikipedia
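PHP evaluates strictly, but generators (available since PHP 5.5, so newer than the PHP versions used in this talk) can approximate lazy evaluation. The sketch below is an assumption-laden illustration: `naturals`, `lazy_map` and `take` are hypothetical helpers, not PHP built-ins.

```php
<?php
// Lazily yields 1, 2, 3, ...; no value is computed until requested.
function naturals()
{
    $n = 1;
    while (true) {
        yield $n++;
    }
}

// Lazily applies $fn to each element of $items.
function lazy_map(callable $fn, $items)
{
    foreach ($items as $item) {
        yield $fn($item);
    }
}

// Yields at most $count elements, then stops pulling from $items.
function take($count, $items)
{
    if ($count <= 0) {
        return;
    }
    foreach ($items as $item) {
        yield $item;
        if (--$count === 0) {
            return;
        }
    }
}

// An infinite sequence of squares; nothing is computed yet.
$squares = lazy_map(function ($x) { return $x * $x; }, naturals());

// Only the first five squares are ever calculated.
foreach (take(5, $squares) as $square) {
    print $square . "\n";
}
```

Because each value is produced on demand, the infinite sequence never has to be materialized as an array.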
    16. Functional Programming Examples of Higher-Order Functions ● Map Applies a given function to a sequence of elements and returns a sequence of results ● Filter Filters a sequence of elements using a given predicate function ● Reduce Processes a sequence of elements and builds up a single return value This slide contains material from Wikipedia
    17. Higher-Order Functions in PHP array_map() array_map() applies a callback to all elements of an array and returns the resulting array
    19. Higher-Order Functions in PHP array_map()

        <?php
        print_r(
          array_map(
            function ($x) { return $x + 1; },
            array(1, 2, 3, 4, 5)
          )
        );
        ?>

        Array
        (
            [0] => 2
            [1] => 3
            [2] => 4
            [3] => 5
            [4] => 6
        )

        Lambda Function
        ● Anonymous function
        ● New syntax construct (with support for Closures) in PHP 5.3

        This slide contains material by Stuart Langridge
    20. Higher-Order Functions in PHP array_filter()

        <?php
        print_r(
          array_filter(
            array(1, 2, 3, 4, 5),
            function ($x) { return $x % 2 == 0; }
          )
        );
        ?>

        Array
        (
            [1] => 2
            [3] => 4
        )
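The output on this slide shows that array_filter() keeps the original keys (1 and 3). When a zero-based list is needed, wrapping the result in array_values() is one common way to reindex it. A short sketch, with variable names chosen only for this example:

```php
<?php
$even = array_filter(
    array(1, 2, 3, 4, 5),
    function ($x) { return $x % 2 == 0; }
);

// $even still carries the original keys: array(1 => 2, 3 => 4).
// array_values() discards them and reindexes from zero.
$reindexed = array_values($even);

print_r($reindexed);
```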
    21. Higher-Order Functions in PHP array_reduce() array_reduce() iteratively reduces an array to a single value using a callback
    22. Higher-Order Functions in PHP array_reduce()

        <?php
        print array_reduce(
          array(1, 2, 3, 4, 5),
          function ($x, $y) { return $x + $y; }
        );
        ?>

        15
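The three functions compose naturally. The sketch below, which is not from the talk, squares a list, keeps the even squares, and sums them. It also passes the optional third argument of array_reduce(): without an initial value, reducing an empty array returns NULL rather than 0.

```php
<?php
$numbers = array(1, 2, 3, 4, 5);

$sumOfEvenSquares = array_reduce(
    array_filter(
        // map: 1, 4, 9, 16, 25
        array_map(function ($x) { return $x * $x; }, $numbers),
        // filter: 4, 16
        function ($x) { return $x % 2 == 0; }
    ),
    // reduce: add each element to the running total
    function ($carry, $x) { return $carry + $x; },
    0    // initial value, also the result for an empty array
);

print $sumOfEvenSquares . "\n";    // 20
```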
    25. The Map/Reduce Model ● Programming model for processing large data sets – map(k1, v1) → list(k2, v2) Processes a key/value pair to generate a set of intermediate key/value pairs – reduce(k2, list(v2)) → list(v2) Merges all intermediate values associated with the same intermediate key
    26. The Map/Reduce Model ● Programming model for processing large data sets – Map function processes a key/value pair to generate a set of intermediate key/value pairs – Reduce function merges all intermediate values associated with the same intermediate key ● Programs written in this functional style can be automatically parallelized A run-time system takes care of, among other things, partitioning the input data, scheduling the program's execution across a cluster, and fault tolerance
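The model can be sketched as a single-process driver to show how the two user-supplied functions fit together. This is an illustration under stated assumptions, not how Hadoop or any real runtime is implemented: `map_reduce` is a hypothetical helper, it runs sequentially in memory, and it does none of the partitioning, scheduling, or fault handling mentioned on the slide. The example job builds a small inverted index from made-up input.

```php
<?php
// Hypothetical in-memory driver: applies $map to every (k1, v1) pair,
// groups the emitted (k2, v2) pairs by k2, then applies $reduce per key.
function map_reduce(array $input, callable $map, callable $reduce)
{
    // Map phase plus grouping ("shuffle") of intermediate values by key.
    $groups = array();
    foreach ($input as $k1 => $v1) {
        foreach ($map($k1, $v1) as $pair) {
            list($k2, $v2) = $pair;
            $groups[$k2][] = $v2;
        }
    }

    // Reduce phase: merge all values that share an intermediate key.
    $result = array();
    foreach ($groups as $k2 => $values) {
        $result[$k2] = $reduce($k2, $values);
    }

    return $result;
}

// Example job: for each word, list the line numbers it appears on.
$index = map_reduce(
    array(1 => 'foo bar', 2 => 'bar baz'),
    function ($lineNumber, $line) {
        $pairs = array();
        foreach (explode(' ', $line) as $word) {
            $pairs[] = array($word, $lineNumber);
        }
        return $pairs;
    },
    function ($word, array $lineNumbers) {
        return array_values(array_unique($lineNumbers));
    }
);

print_r($index);
```

Because the map function touches one input pair at a time and the reduce function touches one key at a time, a real runtime is free to run many of these calls in parallel on different machines.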
    27. The Map/Reduce Model Paper by Jeffrey Dean and Sanjay Ghemawat

        @inproceedings{citeulike:430834,
          author = {Dean, Jeffrey and Ghemawat, Sanjay},
          citeulike-article-id = {430834},
          journal = {OSDI '04},
          keywords = {distributed-computing, mapreduce},
          pages = {137--150},
          posted-at = {2007-09-07 10:07:39},
          priority = {0},
          title = {MapReduce: Simplified Data Processing on Large Clusters},
          url = {http://www.usenix.org/events/osdi04/tech/dean.html}
        }
    31. The Map/Reduce Model Use Cases
        ● Apache CouchDB: Views are defined using JavaScript functions acting as the map part in a Map/Reduce system
        ● Facebook: Hadoop cluster with more than 2,500 cores, 1 PB of data, and 2 TB of new data every day is used for user analytics
        ● Yahoo!: Hadoop cluster with more than 10,000 cores and 4 PB of data is used for Webmap
        ● Google: More than 20 PB of data is analyzed using MapReduce every day; BigTable and GFS are built around MapReduce
    34. The Map/Reduce Model Example: Word Counting

        <?php
        function map($buffer)
        {
            $words = preg_split(
              '/\W/', strtolower($buffer), 0, PREG_SPLIT_NO_EMPTY
            );

            foreach ($words as $word) {
                emit_intermediate($word, 1);
            }
        }

        function reduce($word, array $values)
        {
            emit($word, array_sum($values));
        }
    35. The Map/Reduce Model Example: Word Counting

        function emit_intermediate($key, $value)
        {
            global $data;

            if (isset($data[$key])) {
                $data[$key][] = $value;
            } else {
                $data[$key] = array($value);
            }
        }

        function emit($key, $value)
        {
            print "$key: $value\n";
        }
    38. The Map/Reduce Model Example: Word Counting

        $data = array();

        $text = <<<EOT
        foo bar baz bar bar baz foo
        EOT;

        foreach (explode("\n", $text) as $line) {
            map($line);
        }

        foreach ($data as $key => $values) {
            reduce($key, $values);
        }
    39. The Map/Reduce Model Example: Word Counting

        emit_intermediate('foo', 1);
        emit_intermediate('bar', 1);
        emit_intermediate('baz', 1);
        emit_intermediate('bar', 1);
        emit_intermediate('bar', 1);
        emit_intermediate('baz', 1);
        emit_intermediate('foo', 1);

        reduce('foo', array(1, 1));
        reduce('bar', array(1, 1, 1));
        reduce('baz', array(1, 1));

        foo: 2
        bar: 3
        baz: 2
    40. Apache Hadoop http://hadoop.apache.org/ Apache Hadoop is a software framework for easily writing applications which process vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner.
    41. Apache Hadoop http://hadoop.apache.org/ ● Hadoop Core – Distributed File System (HDFS) – Map / Reduce ● HBase – Distributed, scalable database ● ZooKeeper – Coordination service for distributed applications
    42. Apache Hadoop Streaming ● Normally, MapReduce programs for Hadoop are written in Java ● Using Hadoop Streaming, one can use arbitrary commands as mapper and reducer, including interpreters such as PHP ● STDIN and STDOUT are the interface ● One key/value pair per line ● Key and value separated by TAB
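The line format described on this slide can be sketched as a pair of helpers. `format_pair` and `parse_pair` are hypothetical names, not part of Hadoop or PHP. The sketch assumes that everything up to the first TAB is the key and the rest of the line is the value, and it treats a line without a TAB as a key with an empty value.

```php
<?php
// Hypothetical helper: serializes one key/value pair as "key<TAB>value\n".
function format_pair($key, $value)
{
    return $key . "\t" . $value . "\n";
}

// Hypothetical helper: parses one line back into array(key, value).
function parse_pair($line)
{
    // Split on the first TAB only, so the value may itself contain TABs.
    $parts = explode("\t", rtrim($line, "\r\n"), 2);

    $key   = $parts[0];
    $value = isset($parts[1]) ? $parts[1] : '';

    return array($key, $value);
}

print format_pair('foo', 1);
print_r(parse_pair("bar\t3\n"));
```

Note that values always arrive as strings, so a reducer that sums counts has to cast them to integers.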
    43. Apache Hadoop Streaming with PHP: map.php

        #!/usr/local/php-5.2/bin/php
        <?php
        while (($buffer = fgets(STDIN)) !== FALSE) {
            $words = preg_split(
              '/\W/', strtolower($buffer), 0, PREG_SPLIT_NO_EMPTY
            );

            foreach ($words as $word) {
                printf("%s\t1\n", $word);
            }
        }
        ?>
    44. Apache Hadoop Streaming with PHP: reduce.php

        #!/usr/local/php-5.2/bin/php
        <?php
        $data = array();

        while (($buffer = fgets(STDIN)) !== FALSE) {
            list($word, $count) = explode("\t", $buffer);

            if (!isset($data[$word])) {
                $data[$word] = (int)$count;
            } else {
                $data[$word] += (int)$count;
            }
        }

        foreach ($data as $word => $count) {
            printf("%s\t%d\n", $word, $count);
        }
        ?>
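The reducer on this slide collects every word in an array before printing. Hadoop Streaming hands the reducer its input sorted by key, so an alternative is to sum each run of identical keys and emit it as soon as the key changes. The sketch below is a variation for illustration, not the code from the talk: `reduce_sorted` is a hypothetical function, it uses a generator (PHP 5.5 or later, unlike the PHP 5.2 scripts above), and it assumes every line has the form word, TAB, count.

```php
<?php
// Hypothetical helper: sums the counts of consecutive lines that share
// a key. Assumes $lines is sorted by key, one "word<TAB>count" per line.
function reduce_sorted($lines)
{
    $current = null;
    $sum     = 0;

    foreach ($lines as $line) {
        list($word, $count) = explode("\t", rtrim($line, "\n"));

        // A new key means the previous run is complete.
        if ($current !== null && $word !== $current) {
            yield $current => $sum;
            $sum = 0;
        }

        $current = $word;
        $sum    += (int) $count;
    }

    // Emit the final run, if there was any input at all.
    if ($current !== null) {
        yield $current => $sum;
    }
}

// Sample input as a mapper might produce it, after sorting by key.
$sorted = array(
    "bar\t1\n", "bar\t1\n", "bar\t1\n",
    "baz\t1\n", "baz\t1\n",
    "foo\t1\n", "foo\t1\n",
);

foreach (reduce_sorted($sorted) as $word => $count) {
    printf("%s\t%d\n", $word, $count);
}
```

Only one running total is held in memory at a time, which matters once the number of distinct words no longer fits in a single PHP array. In an actual streaming script the lines would be read from STDIN instead of the sample array.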
    45. Apache Hadoop Streaming with PHP

        $ hadoop dfs -copyFromLocal input input
        $ hadoop jar contrib/streaming/hadoop-0.18.1-streaming.jar \
            -mapper /home/hadoop/map.php \
            -reducer /home/hadoop/reduce.php \
            -input input/* \
            -output output

        additionalConfSpec_:null
        null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
        packageJobJar: [/tmp/hadoop-hadoop/hadoop-unjar5743/] [] /tmp/streamjob5744.jar tmpDir=null
        08/10/14 12:03:48 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
        08/10/14 12:03:48 INFO mapred.FileInputFormat: Total input paths to process : 3
        08/10/14 12:03:48 INFO mapred.FileInputFormat: Total input paths to process : 3
        08/10/14 12:03:49 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
        08/10/14 12:03:49 INFO streaming.StreamJob: Running job: job_200810141152_0001
        08/10/14 12:03:49 INFO streaming.StreamJob: To kill this job, run:
        08/10/14 12:03:49 INFO streaming.StreamJob: /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_200810141152_0001
        08/10/14 12:03:49 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_200810141152_0001
        08/10/14 12:03:50 INFO streaming.StreamJob: map 0% reduce 0%
        08/10/14 12:04:05 INFO streaming.StreamJob: map 18% reduce 0%
        08/10/14 12:04:10 INFO streaming.StreamJob: map 54% reduce 0%
        08/10/14 12:04:12 INFO streaming.StreamJob: map 67% reduce 0%
        08/10/14 12:04:19 INFO streaming.StreamJob: map 100% reduce 0%
        08/10/14 12:04:28 INFO streaming.StreamJob: map 100% reduce 33%
        08/10/14 12:04:29 INFO streaming.StreamJob: map 100% reduce 93%
        08/10/14 12:04:30 INFO streaming.StreamJob: map 100% reduce 100%
        08/10/14 12:04:30 INFO streaming.StreamJob: Job complete: job_200810141152_0001
        08/10/14 12:04:30 INFO streaming.StreamJob: Output: output
    46. The End ● Thank you for your interest! ● These slides will be available shortly on http://sebastian-bergmann.de/talks/.
    47. License   This presentation material is published under the Attribution-Share Alike 3.0 Unported license.   You are free: ✔ to Share – to copy, distribute and transmit the work. ✔ to Remix – to adapt the work.   Under the following conditions: ● Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). ● Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.   For any reuse or distribution, you must make clear to others the license terms of this work.   Any of the above conditions can be waived if you get permission from the copyright holder.   Nothing in this license impairs or restricts the author's moral rights.

    Sebastian Bergmann, 2 years ago

    PHP is not a full-fledged functional language, but more
