
Clojure Reducers / clj-syd Aug 2012


Talk given at the Sydney Clojure User group, August 2012


Transcript of "Clojure Reducers / clj-syd Aug 2012"

  1-2. Reducers: A library and model for collection processing in Clojure (in 20 mins or less...). Leonardo Borges, @leonardo_borges, http://www.leonardoborges.com, http://www.thoughtworks.com. Thursday, 30 August 12
  3-6. Reducers huh? Here’s the gist: you get parallel versions of reduce, map and filter. Ta-da! I’m done! And well under my 20 min limit :)
  7. Alright, alright, I’m kidding
  8-9. How do reducers make parallelism possible?
       • JVM’s Fork/Join framework
       • Reduction Transformers
  10. Before we start - this is bleeding edge stuff
       Java requirements
       • Fork/Join framework
         • Java 7 [1] or
         • Java 6 + the JSR166 jar [2]
       Clojure requirements
       • 1.5.0-* (this is still MASTER on github [3] as of 30/08/2012)
       [1] - http://jdk7.java.net/
       [2] - http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166.jar
       [3] - https://github.com/clojure/clojure
  11-18. The Fork/Join Framework
       • Based on divide and conquer
       • Work-stealing algorithm
       • Uses deques (double-ended queues)
       • Progressively divides the workload into tasks, up to a threshold
       • Once a worker finishes one task, it pops another one from its deque
       • After at least two tasks have finished, results can be combined/joined
       • Idle workers can pop tasks from the deques of workers which fall behind
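The divide, fork and join steps listed for the Fork/Join framework can be sketched with plain Java interop. This is only an illustration under stated assumptions, not how the reducers library is implemented: `sum-task`, `pool` and the threshold of 100 are made up for the example, and it assumes a JVM that ships `java.util.concurrent.ForkJoinPool` (Java 7 or later).

```clojure
(import '(java.util.concurrent ForkJoinPool ForkJoinTask RecursiveTask))

;; Hypothetical helper: builds a task that sums the elements v[lo, hi).
(defn sum-task [v lo hi threshold]
  (proxy [RecursiveTask] []
    (compute []
      (if (<= (- hi lo) threshold)
        ;; Small enough: do the work sequentially.
        (reduce + 0 (subvec v lo hi))
        ;; Otherwise halve the range into two subtasks.
        (let [mid (quot (+ lo hi) 2)
              ^ForkJoinTask left  (sum-task v lo mid threshold)
              ^ForkJoinTask right (sum-task v mid hi threshold)]
          ;; Queue the right half so another worker may steal it,
          ;; run the left half in this thread, then combine both results.
          (.fork right)
          (+ (.invoke left) (.join right)))))))

(def ^ForkJoinPool pool (ForkJoinPool.))

(def total
  (.invoke pool (sum-task (vec (range 1000)) 0 1000 100)))
;; total is 499500
```

The threshold plays the same role as the "configured threshold" in the talk: below it, splitting further costs more than it gains.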
  19. Text is boring
  20-42. Fork/Join algorithm - simplified view (animated diagram with two workers, Worker 1 and Worker 2)
       • Workload is put in “deques”
       • ...and progressively halved
       • ...up to a configured threshold
       • Worker 1 and Worker 2 process tasks; finished results are combined (“Combine”)
       • Idle workers can “steal” items from other workers
       • Combining repeats until the final result
  43-45. Let’s talk about Reducers
       Motivations
       • Performance
         • via less allocation
         • via parallelism (leverage Fork/Join)
       Issues
       • Lists and Seqs are sequential
       • map / filter implies order
  46-53. A closer look at what map does
       ;; a naive map implementation
       (defn map [f coll]
         (if (seq coll)
           (cons (f (first coll)) (map f (rest coll)))
           ()))
       • Recursion
       • Order
       • Laziness (not shown)
       • Consumes List
       • Builds List
       Oh, and it also applies the function to each item before putting the result into the new list. This is what mapping means!
  54-57. Reduction Transformers
       • Idea is to build map / filter on top of reduce to break from sequentiality
       • map / filter then builds nothing and consumes nothing
       • It changes what reduce means to the collection by transforming the reducing functions
  58. What map is really all about
       (defn mapping [f]
         (fn [f1]
           (fn [result input]
             (f1 result (f input)))))
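The same shape works for filter. The sketch below repeats `mapping` so it runs on its own and adds a `filtering` counterpart; `filtering` is not shown on the slides and is written here only to illustrate the idea of transforming a reducing function.

```clojure
;; mapping, as on the slide: transforms the input before handing it to f1.
(defn mapping [f]
  (fn [f1]
    (fn [result input]
      (f1 result (f input)))))

;; filtering (illustrative): only hands the input to f1 when pred holds,
;; otherwise returns the accumulated result untouched.
(defn filtering [pred]
  (fn [f1]
    (fn [result input]
      (if (pred input)
        (f1 result input)
        result))))

(reduce ((filtering even?) +) 0 [1 2 3 4])
;; 6

;; Transformers compose: keep the even inputs, then increment them.
(reduce ((comp (filtering even?) (mapping inc)) conj) [] [1 2 3 4])
;; [3 5]
```

Neither function builds or consumes a collection; each only wraps the reducing function it is given.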
  59. But wait! If map doesn’t consume the list any longer, who does?
       • reduce does!
       • Since Clojure 1.4 reduce lets the collection reduce itself (through the CollReduce / CollFold protocols)
       • Think of what this means for tree-like structures such as vectors
       • This is key to leveraging the Fork/Join framework
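To make "the collection reduces itself" concrete, here is a small sketch that extends `clojure.core.protocols/CollReduce` to a custom type. `Countdown` is a made-up type for illustration, and the sketch ignores early termination via `reduced` to stay short.

```clojure
(require '[clojure.core.protocols :as p])

;; Hypothetical type: yields n, n-1, ..., 1 without ever building a seq.
(deftype Countdown [n]
  p/CollReduce
  ;; No initial value: seed the reduction with (f).
  (coll-reduce [this f]
    (p/coll-reduce this f (f)))
  ;; The type decides how to walk its own contents.
  (coll-reduce [this f init]
    (loop [acc init
           i   n]
      (if (pos? i)
        (recur (f acc i) (dec i))
        acc))))

(reduce + 0 (Countdown. 4))
;; 10

(reduce conj [] (Countdown. 3))
;; [3 2 1]
```

`reduce` never asks `Countdown` for a first or rest; it simply hands over the reducing function, which is what lets tree-shaped collections pick a traversal that suits them.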
  60-64. Now we can use mapping to create reducing functions
       (reduce ((mapping inc) +) 0 [1 2 3 4])
       ;; 14
       ;; ((mapping inc) +) is equivalent to:
       (fn [result input] (+ result (inc input)))
       (reduce ((mapping inc) conj) [] [1 2 3 4])
       ;; [2 3 4 5]
       ;; ((mapping inc) conj) is equivalent to:
       (fn [result input] (conj result (inc input)))
       But it feels awkward to use it in this form
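For comparison, the reducers library wraps this pattern so the collection comes last again. A short sketch, assuming Clojure 1.5 or later with `clojure.core.reducers` on the classpath:

```clojure
(require '[clojure.core.reducers :as r])

;; r/map returns a reducible "recipe", not a sequence;
;; nothing happens until reduce (or into) runs it.
(reduce + 0 (r/map inc [1 2 3 4]))
;; 14

(into [] (r/map inc [1 2 3 4]))
;; [2 3 4 5]

;; Transformations stack without building intermediate collections.
(into [] (r/filter even? (r/map inc [1 2 3 4])))
;; [2 4]
```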
  65. What do we have so far?
       • Performance has been improved due to fewer allocations
       • No intermediary lists need to be built (see Haskell’s StreamFusion [4])
       • However reduce is still sequential
       [4] - http://bit.ly/streamFusion
  66-73. Enter fold
       • Takes the sequentiality out of foldl, foldr and reduce
       • Potentially parallel (falls back to standard reduce otherwise)
       • Reduce/Combine strategy (think Fork/Join Framework)
       • Segments the collection
       • Runs multiple reduces in parallel
       • Uses a combining function to join/reduce results
       (defn fold [combinef reducef coll] ...)
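A few calls showing how the arguments of `r/fold` line up with the reduce/combine strategy. The vector `v` and the group size of 100 are arbitrary choices for the example; the arities used are those of `clojure.core.reducers/fold`.

```clojure
(require '[clojure.core.reducers :as r])

(def v (vec (range 1000)))

;; One function used for both reducing and combining.
(r/fold + v)
;; 499500

;; Separate combinef and reducef: count the elements.
;; (combinef) with no arguments seeds each segment with 0,
;; reducef counts within a segment, combinef adds the segment counts.
(r/fold + (fn [acc _] (inc acc)) v)
;; 1000

;; Explicit segment size: split until segments hold at most 100 elements.
(r/fold 100 + + v)
;; 499500
```

When the group size is omitted, the library picks a default, so small collections are typically reduced in a single segment.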
  74-77. The combining function is a monoid
       • An associative binary function with an identity element
       • All the following functions are equivalent monoids
       ;; the built-in +
       (+ 2 3) ; 5
       (+)     ; 0
       ;; a hand-written equivalent
       (defn my-+ ([] 0) ([a b] (+ a b)))
       (my-+ 2 3) ; 5
       (my-+)     ; 0
       ;; built with r/monoid
       (require '[clojure.core.reducers :as r])
       (def my-+ (r/monoid + (fn [] 0)))
       (my-+ 2 3) ; 5
       (my-+)     ; 0
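Monoids are not limited to numbers. As a further sketch, `into` with `vector` as the identity constructor gives a combining function that collects results into a vector; `into-vector` is just a name chosen for this example.

```clojure
(require '[clojure.core.reducers :as r])

;; Identity: (vector) is []. Operation: (into a b) appends b to a.
(def into-vector (r/monoid into vector))

(into-vector)
;; []

(into-vector [1 2] [3 4])
;; [1 2 3 4]

;; Each segment is reduced with conj starting from [], and the
;; per-segment vectors are then joined left to right with into.
(def incremented
  (r/fold into-vector conj (r/map inc (vec (range 2000)))))
;; incremented equals (vec (range 1 2001))
```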
  78. fold by examples
       ;; all examples assume the reducers library is available as r
       (ns reducers-playground.core
         (:require [clojure.core.reducers :as r]))
  79-87. fold by examples: increment all even positive integers up to 10 million and add them all up
       ;; these were taken from Rich’s reducers talk
       (def my-vector (into [] (range 10000000)))
       (time (reduce + (map inc (filter even? my-vector))))
       ;; 500msecs
       (time (reduce + (r/map inc (r/filter even? my-vector))))
       ;; 260msecs
       (time (r/fold + (r/map inc (r/filter even? my-vector))))
       ;; 130msecs
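Before comparing timings it is worth confirming that the three variants agree. A quick check on a smaller vector (10,000 elements, an arbitrary size picked so the check runs instantly) might look like this:

```clojure
(require '[clojure.core.reducers :as r])

(def small-vector (into [] (range 10000)))

;; Lazy sequences, sequential reduce.
(def seq-result
  (reduce + (map inc (filter even? small-vector))))

;; Reducers, still a sequential reduce.
(def reducer-result
  (reduce + (r/map inc (r/filter even? small-vector))))

;; Reducers with a potentially parallel fold.
(def fold-result
  (r/fold + (r/map inc (r/filter even? small-vector))))

(= seq-result reducer-result fold-result)
;; true, all three are 25000000
```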
  88-89. fold by examples: standard word count
       (def wiki-dump (slurp "subset-wiki-dump50")) ; 50 MB
       (defn count-words [text]
         (reduce
           (fn [memo word]
             (assoc memo word (inc (get memo word 0))))
           {}
           (map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))
       (time (count-words wiki-dump))
       ;; 45 secs
  90-95. fold by examples: parallel word count
       (def wiki-dump (slurp "subset-wiki-dump50")) ; 50 MB
       (defn p-count-words [text]
         (r/fold
           ;; Combining fn. Will be called at the leaves to merge the partial
           ;; computations, and with no arguments to provide a seed value.
           (r/monoid (partial merge-with +) hash-map)
           (fn [memo word]
             (assoc memo word (inc (get memo word 0))))
           (r/map #(.toLowerCase %) (into [] (re-seq #"\w+" text)))))
       (time (p-count-words wiki-dump))
       ;; 30 secs
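The wiki dump is not available here, so the following sketch runs both word counts on a small in-memory text to check that they agree. The `words` helper and the sample sentence are made up for the example, and `clojure.string/lower-case` stands in for the `.toLowerCase` interop call.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; Hypothetical helper: tokenise into a vector so fold can split it.
(defn words [text]
  (into [] (re-seq #"\w+" text)))

(defn count-words [text]
  (reduce (fn [memo word] (assoc memo word (inc (get memo word 0))))
          {}
          (map str/lower-case (words text))))

(defn p-count-words [text]
  (r/fold (r/monoid (partial merge-with +) hash-map)
          (fn [memo word] (assoc memo word (inc (get memo word 0))))
          (r/map str/lower-case (words text))))

;; 11 words per sentence, repeated 500 times so the fold has
;; enough elements to split into several segments.
(def sample
  (str/join " " (repeat 500 "The quick brown fox jumps over the lazy dog. The end.")))

(= (count-words sample) (p-count-words sample))
;; true

(get (p-count-words sample) "the")
;; 1500
```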
  96-98. fold by examples: Load 100k records into PostgreSQL
       (import '(java.io BufferedReader FileReader))
       (def records
         (into [] (line-seq (BufferedReader. (FileReader. "dump.txt")))))
       (time
         (doseq [record records]
           (let [tokens (clojure.string/split record #"\t")]
             (insert users/users
               (values {:account-id (nth tokens 0) ...})))))
       ;; 90 secs
  99-100. fold by examples: Load 100k records into PostgreSQL in parallel
       (time
         (r/fold
           +
           (r/map
             (fn [record]
               (let [tokens (clojure.string/split record #"\t")]
                 (do
                   (insert users/users
                     (values {:account-id (nth tokens 0) ...}))
                   1)))
             records)))
       ;; 50 secs
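The same pattern can be tried without a database. In the sketch below an atom stands in for the PostgreSQL table; `fake-table`, `insert-record!` and the generated records are all hypothetical and only mimic the shape of the slide's code.

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; Stand-in for the database table: an atom holding the "inserted" rows.
(def fake-table (atom []))

;; Hypothetical replacement for the insert call on the slide.
;; Returns 1 so that folding with + counts the inserted records.
(defn insert-record! [record]
  (let [tokens (str/split record #"\t")]
    (swap! fake-table conj {:account-id (nth tokens 0)})
    1))

;; 2000 tab-separated lines such as "42\tuser-42".
(def records (mapv #(str % "\tuser-" %) (range 2000)))

(def inserted
  (r/fold + (r/map insert-record! records)))
;; inserted is 2000, and (count @fake-table) is 2000 as well
```

Because segments may run on different workers, the rows in `fake-table` are not guaranteed to be in input order; the same would hold for rows written to a real database.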
  101-106. When to use it
       • Exploring decision trees
       • Image processing
       • As a building block for bigger, distributed systems such as Datomic and Cascalog (maybe around parallel aggregators)
       • Basically any list-intensive program
       But the tools are available to anyone, so be creative!
  107. Resources
       • The Anatomy of a Reducer - http://bit.ly/anatomyReducers
       • Rich’s announcement post on Reducers - http://bit.ly/reducersANN
       • Rich Hickey - Reducers - EuroClojure 2012 - http://bit.ly/reducersVideo (this presentation was heavily inspired by this video)
       • The Source on github - http://bit.ly/reducersCore
       Leonardo Borges, @leonardo_borges, http://www.leonardoborges.com, http://www.thoughtworks.com
  108. Thanks! Questions? Leonardo Borges, @leonardo_borges, http://www.leonardoborges.com, http://www.thoughtworks.com