Map, Flatmap and Reduce are Your New Best Friends: Simpler Collections, Concurrency, and Big Data (#oscon)

Higher-order functions such as map(), flatMap(), filter() and reduce() have their origins in mathematics and ancient functional programming languages such as Lisp. But today they have entered the mainstream and are available in languages such as JavaScript, Scala and Java 8. They are well on their way to becoming an essential part of every developer’s toolbox.

In this talk you will learn how these and other higher-order functions enable you to write simple, expressive and concise code that solves problems in a diverse set of domains. We will describe how you use them to process collections in Java and Scala. You will learn how functional Futures and Rx (Reactive Extensions) Observables simplify concurrent code. We will even talk about how to write big data applications in a functional style using libraries such as Scalding.

Published in: Software
Transcript of "Map, Flatmap and Reduce are Your New Best Friends: Simpler Collections, Concurrency, and Big Data (#oscon)"

  1. Map(), flatMap() and reduce() are your new best friends: simpler collections, concurrency, and big data
      - Chris Richardson, author of POJOs in Action, founder of the original CloudFoundry.com
      - @crichardson chris@chrisrichardson.net http://plainoldobjects.com
  2. Presentation goal
      - How functional programming simplifies your code
      - Show that map(), flatMap() and reduce() are remarkably versatile functions
  3. About Chris
  4. About Chris
      - Founder of a buzzword compliant (stealthy, social, mobile, big data, machine learning, ...) startup
      - Consultant helping organizations improve how they architect and deploy applications using cloud, micro services, polyglot applications, NoSQL, ...
  5. Agenda
      - Why functional programming?
      - Simplifying collection processing
      - Simplifying concurrency with Futures and Rx Observables
      - Tackling big data problems with functional programming
  6. Functional programming is a programming paradigm
      - Functions are the building blocks of the application
      - Best done in a functional programming language
  7. Functions as first class citizens
      - Assign functions to variables
      - Store functions in fields
      - Use and write higher-order functions: pass functions as arguments, return functions as values
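The three bullets on the slide above can be sketched in Java 8 roughly as follows. The class name `FirstClassFunctions` and the helpers `applyTwice` and `adder` are hypothetical names chosen for illustration; only `java.util.function.Function` comes from the JDK.

```java
import java.util.function.Function;

final class FirstClassFunctions {
    // A function stored in a field.
    static final Function<Integer, Integer> SQUARE = x -> x * x;

    // Higher-order function: takes a function as an argument.
    static int applyTwice(Function<Integer, Integer> f, int x) {
        return f.apply(f.apply(x));
    }

    // Higher-order function: returns a function as its value.
    static Function<Integer, Integer> adder(int n) {
        return x -> x + n;
    }
}
```

A function can also be assigned to a local variable, e.g. `Function<Integer, Integer> addFive = FirstClassFunctions.adder(5);`, and then passed around like any other value.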
  8. Avoids mutable state
      - Use immutable data structures and single assignment variables
      - Some functional languages such as Haskell don’t allow side-effects
  9. Why functional programming?
      - "the highest goal of programming-language design to enable good ideas to be elegantly expressed"
      - http://en.wikipedia.org/wiki/Tony_Hoare
  10. Why functional programming?
      - More expressive
      - More intuitive - declarative code matches problem definition
      - Functional code is usually much more composable
      - Immutable state: less error-prone, easy parallelization and concurrency
      - But be pragmatic
  11. An ancient idea that has recently become popular
  12. Mathematical foundation: λ-calculus, introduced by Alonzo Church in the 1930s
  13. Lisp = an early functional language invented in 1958
      - http://en.wikipedia.org/wiki/Lisp_(programming_language)
      - [Timeline figure, 1940 to 2010]
      - Callouts: garbage collection, dynamic typing, self-hosting compiler, tree data structures
        (defun factorial (n)
          (if (<= n 1)
              1
              (* n (factorial (- n 1)))))
  14. My final year project in 1985: Implementing SASL
        sieve (p:xs) = p : sieve [x | x <- xs, rem x p > 0];
        primes = sieve [2..]
      - [2..]: a list of integers starting with 2
      - The list comprehension filters out multiples of p
  15. Mostly an Ivory Tower technology
      - Lisp was used for AI
      - FP languages: Miranda, ML, Haskell, ...
      - “Side-effects kills kittens and puppies”
  16. http://steve-yegge.blogspot.com/2010/12/haskell-researchers-announce-discovery.html
  17. But today FP is mainstream
      - Clojure - a dialect of Lisp
      - A hybrid OO/functional language
      - A hybrid OO/FP language for .NET
      - Java 8 has lambda expressions
  18. Java 8 lambda expressions are functions
        x -> x * x

        x -> {
          // <= rather than <, so that perfect squares such as 4 and 9 are not reported as prime
          for (int i = 2; i <= Math.sqrt(x); i = i + 1) {
            if (x % i == 0) return false;
          }
          return true;
        };

        (x, y) -> x * x + y * y
      - A lambda is an instance of an anonymous inner class that implements a functional interface (kinda)
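A lambda only compiles once it has a target type. As a minimal sketch, the three lambdas on the slide above could be bound to functional interfaces from `java.util.function`; the choice of `IntUnaryOperator`, `IntPredicate` and `IntBinaryOperator`, the class name `LambdaExamples`, and the extra guard for values below 2 are assumptions, since the slide does not show the surrounding declarations.

```java
import java.util.function.IntBinaryOperator;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

final class LambdaExamples {
    // Expression lambda: int -> int.
    static final IntUnaryOperator SQUARE = x -> x * x;

    // Block lambda: int -> boolean, a trial-division primality check.
    static final IntPredicate IS_PRIME = x -> {
        if (x < 2) return false;                      // 0, 1 and negatives are not prime
        for (int i = 2; (long) i * i <= x; i = i + 1) {
            if (x % i == 0) return false;             // found a divisor
        }
        return true;
    };

    // Two-argument lambda: (int, int) -> int.
    static final IntBinaryOperator SUM_OF_SQUARES = (x, y) -> x * x + y * y;
}
```

Each field holds an ordinary object, so `LambdaExamples.IS_PRIME.test(13)` is a plain method call on the functional interface.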
  19. Agenda
      - Why functional programming?
      - Simplifying collection processing
      - Simplifying concurrency with Futures and Rx Observables
      - Tackling big data problems with functional programming
  20. Lots of application code = collection processing: mapping, filtering, and reducing
  21. Social network example
        public class Person {
          enum Gender { MALE, FEMALE }
          private Name name;
          private LocalDate birthday;
          private Gender gender;
          private Hometown hometown;
          private Set<Friend> friends = new HashSet<Friend>();
          ....

        public class Friend {
          private Person friend;
          private LocalDate becameFriends;
          ...
        }

        public class SocialNetwork {
          private Set<Person> people;
          ...
  22. Typical iterative code - e.g. filtering
        public class SocialNetwork {
          private Set<Person> people;
          ...
          public Set<Person> lonelyPeople() {
            Set<Person> result = new HashSet<Person>();  // declare result variable
            for (Person p : people) {                    // iterate
              if (p.getFriends().isEmpty())
                result.add(p);                           // modify result
            }
            return result;                               // return result
          }
  23. Problems with this style of programming
      - Low level
      - Imperative (how to do it) NOT declarative (what to do)
      - Verbose
      - Mutable variables are potentially error prone
      - Difficult to parallelize
  24. Java 8 streams to the rescue
      - A sequence of elements
      - “Wrapper” around a collection (and other types: e.g. JarFile.stream(), Files.lines())
      - Streams can also be infinite
      - Provides a functional/lambda-based API for transforming, filtering and aggregating elements
      - Much simpler, cleaner and declarative code
  25. Using Java 8 streams - filtering
      - Iterative version:
        public class SocialNetwork {
          private Set<Person> people;
          ...
          public Set<Person> peopleWithNoFriends() {
            Set<Person> result = new HashSet<Person>();
            for (Person p : people) {
              if (p.getFriends().isEmpty())
                result.add(p);
            }
            return result;
          }
      - Streams version:
        public class SocialNetwork {
          private Set<Person> people;
          ...
          public Set<Person> lonelyPeople() {
            return people.stream()
              .filter(p -> p.getFriends().isEmpty())  // predicate lambda expression
              .collect(Collectors.toSet());
          }
  26. The filter() function: s2 = s1.filter(f)
        s1: a b c d e ...
        s2: a c d ...      (elements that satisfy predicate f)
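The filter() diagram can be tried out on a plain list. This is a minimal sketch; `FilterExample` and `shortWords` are hypothetical names, and the predicate (word length) is chosen only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

final class FilterExample {
    // Keeps only the elements that satisfy the predicate, preserving encounter order.
    static List<String> shortWords(List<String> words, int maxLength) {
        return words.stream()
                .filter(w -> w.length() <= maxLength)
                .collect(Collectors.toList());
    }
}
```

For the input `four, score, and, seven` with `maxLength = 4`, only `four` and `and` pass the predicate.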
  27. Using Java 8 streams - mapping
        class Person ..
          private Set<Friend> friends = ...;

          public Set<Hometown> hometownsOfFriends() {
            return friends.stream()
              .map(f -> f.getPerson().getHometown())
              .collect(Collectors.toSet());
          }
  28. The map() function: s2 = s1.map(f)
        s1: a    b    c    d    e    ...
        s2: f(a) f(b) f(c) f(d) f(e) ...
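As a small illustration of the map() diagram, each element is replaced by the result of applying a function to it. The names `MapExample` and `lengths` are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MapExample {
    // Applies String::length to every element: one output element per input element.
    static List<Integer> lengths(List<String> words) {
        return words.stream()
                .map(String::length)
                .collect(Collectors.toList());
    }
}
```

The output always has the same number of elements as the input, which is the main difference from filter() and flatMap().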
  29. Using Java 8 streams - friend of friends using flatMap
        class Person ..
          public Set<Person> friendOfFriends() {
            return friends.stream()
              .flatMap(friend -> friend.getPerson().friends.stream())  // maps and flattens
              .map(Friend::getPerson)
              .filter(f -> f != this)
              .collect(Collectors.toSet());
          }
  30. The flatMap() function: s2 = s1.flatMap(f)
        s1: a               b                       ...
        s2: f(a)[0] f(a)[1] f(b)[0] f(b)[1] f(b)[2] ...
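A minimal sketch of the flatMap() diagram: every input element maps to a stream of zero or more outputs, and the per-element streams are concatenated into one. `FlatMapExample` and `words` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FlatMapExample {
    // Each line becomes a stream of words; flatMap concatenates those streams.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }
}
```

Using map() here instead would produce a stream of arrays (one per line) rather than a flat stream of words.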
  31. Using Java 8 streams - reducing
        public class SocialNetwork {
          private Set<Person> people;
          ...
          public long averageNumberOfFriends() {
            return people.stream()
              .map(p -> p.getFriends().size())
              .reduce(0, (x, y) -> x + y) / people.size();
          }
      - The reduce(0, (x, y) -> x + y) call corresponds to this loop:
        int x = 0;
        for (int y : inputStream)
          x = x + y;
        return x;
  32. The reduce() function: x = s1.reduce(initial, f)
        s1: a b c d e ...
        x = f(f(f(f(f(f(initial, a), b), c), d), e), ...)
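The fold order in the reduce() diagram can be made visible with a small sketch. `ReduceExample` and its methods are hypothetical names. The `trace` helper deliberately uses a non-associative function, which is only safe on a sequential stream and is meant purely to print the nesting. The `average` helper is one possible variant of the slide's calculation that divides as a double and guards against an empty collection; the slide's own version uses integer division.

```java
import java.util.List;

final class ReduceExample {
    // Sum via reduce: starts from 0 and combines elements left to right.
    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, (x, y) -> x + y);
    }

    // Average as a double; returns 0.0 for an empty list instead of dividing by zero.
    static double average(List<Integer> xs) {
        if (xs.isEmpty()) return 0.0;
        return (double) sum(xs) / xs.size();
    }

    // Illustration only: shows how a sequential reduce nests its calls.
    static String trace(List<String> xs) {
        return xs.stream().reduce("initial", (acc, x) -> "f(" + acc + ", " + x + ")");
    }
}
```

Calling `ReduceExample.trace` on `a, b, c` yields `f(f(f(initial, a), b), c)`, matching the shape on the slide. For parallel streams the combining function should be associative and the initial value a true identity.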
  33. Adopting FP with Java 8 is straightforward
      - Simply start using streams and lambdas
      - Eclipse can refactor anonymous inner classes to lambdas
  34. Agenda
      - Why functional programming?
      - Simplifying collection processing
      - Simplifying concurrency with Futures and Rx Observables
      - Tackling big data problems with functional programming
  35. Let’s imagine that you are writing code to display the products in a user’s wish list
  36. The need for concurrency
      - Step #1: web service request to get the user profile, including the wish list (list of product IDs)
      - Step #2: for each productId, web service request to get product info
      - But getting products sequentially = terrible response time, so we need to fetch product info concurrently
      - Composing sequential + scatter/gather-style operations is very common
  37. Futures are a great abstraction for composing concurrent operations
      - http://en.wikipedia.org/wiki/Futures_and_promises
  38. Composition with futures
      - [Diagram: a client on the main thread initiates asynchronous operations 1 and 2, which run on worker threads or event-driven code; each operation sets the outcome of its future (Future 1, Future 2) and the client gets it]
  39. But composition with basic futures is difficult
      - Java 7 future.get([timeout]): blocking API, so the client blocks a thread; difficult to compose multiple concurrent operations
      - Futures with callbacks (e.g. Guava ListenableFutures, Spring 4 ListenableFuture): attach callbacks to all futures and asynchronously consume outcomes, but callback-based code = messy code
      - See http://techblog.netflix.com/2013/02/rxjava-netflix-api.html
      - We need functional futures!
  40. Functional futures - Scala, Java 8 CompletableFuture
        def asyncPlus(x : Int, y : Int) : Future[Int] = ... x + y ...

        // map asynchronously transforms the future
        val future2 = asyncPlus(4, 5).map{ _ * 3 }
        assertEquals(27, Await.result(future2, 1 second))

        def asyncSquare(x : Int) : Future[Int] = ... x * x ...

        // flatMap calls asyncSquare() with the eventual outcome of asyncPlus()
        val f2 = asyncPlus(5, 8).flatMap { x => asyncSquare(x) }
        assertEquals(169, Await.result(f2, 1 second))
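The slide title mentions Java 8's CompletableFuture, so here is a rough Java counterpart of the Scala snippet. In CompletableFuture, `thenApply` plays the role of map() and `thenCompose` the role of flatMap(). The class name `FunctionalFutures` and the use of `supplyAsync` on the common pool to implement `asyncPlus` and `asyncSquare` are assumptions; the slide elides how those operations run.

```java
import java.util.concurrent.CompletableFuture;

final class FunctionalFutures {
    // Assumed implementation: run the addition on the common fork-join pool.
    static CompletableFuture<Integer> asyncPlus(int x, int y) {
        return CompletableFuture.supplyAsync(() -> x + y);
    }

    static CompletableFuture<Integer> asyncSquare(int x) {
        return CompletableFuture.supplyAsync(() -> x * x);
    }

    // thenApply ~ map: transform the eventual outcome with a plain function.
    static CompletableFuture<Integer> plusThenTriple() {
        return asyncPlus(4, 5).thenApply(sum -> sum * 3);
    }

    // thenCompose ~ flatMap: feed the eventual outcome into another async operation.
    static CompletableFuture<Integer> plusThenSquare() {
        return asyncPlus(5, 8).thenCompose(FunctionalFutures::asyncSquare);
    }
}
```

Neither method blocks; a caller that needs the value can use `join()` at the edge of the program, much as the Scala test uses `Await.result`.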
  41. Functions like map() are asynchronous
      - f2 = f1 map (someFn): when f1 completes with outcome1, f2 completes with someFn(outcome1)
      - Implemented using callbacks
  42. Scala wish list service
        class WishListService(...) {
          def getWishList(userId : Long) : Future[WishList] = {
            userService.getUserProfile(userId).                       // Future[UserProfile]
              map { userProfile => userProfile.wishListProductIds }.  // Future[List[Long]]
              flatMap { productIds =>
                val listOfProductFutures =
                  productIds map productInfoService.getProductInfo    // List[Future[ProductInfo]]
                Future.sequence(listOfProductFutures)                 // Future[List[ProductInfo]]
              }.
              map { products => WishList(products) }                  // Future[WishList]
          }
        }
      - Java 8 Completable Futures let you write similar code
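As a sketch of what "similar code" might look like with Java 8 CompletableFuture: the services below are hypothetical in-memory stand-ins (the real `userService` and `productInfoService` are not shown in the deck), product info is simplified to a String, and `sequence` is a hand-written helper because CompletableFuture has no direct equivalent of Scala's `Future.sequence`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class WishListSketch {
    // Hypothetical stand-in for the user profile service: returns wish list product ids.
    static CompletableFuture<List<Long>> getWishListProductIds(long userId) {
        return CompletableFuture.supplyAsync(() -> Arrays.asList(1L, 2L, 3L));
    }

    // Hypothetical stand-in for the product info service.
    static CompletableFuture<String> getProductInfo(long productId) {
        return CompletableFuture.supplyAsync(() -> "product-" + productId);
    }

    // List<Future<T>> -> Future<List<T>>: completes when all inputs have completed.
    static <T> CompletableFuture<List<T>> sequence(List<CompletableFuture<T>> futures) {
        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join)   // does not block: all are already done
                        .collect(Collectors.toList()));
    }

    // Sequential step (get ids) followed by a scatter/gather step (get each product).
    static CompletableFuture<List<String>> getWishList(long userId) {
        return getWishListProductIds(userId).thenCompose(ids ->
                sequence(ids.stream()
                        .map(WishListSketch::getProductInfo)
                        .collect(Collectors.toList())));
    }
}
```

The product requests are all started before any of them is awaited, which is the scatter/gather behaviour the talk asks for.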
  43. Your mouse is your database
      - Erik Meijer, http://queue.acm.org/detail.cfm?id=2169076
  44. Introducing Reactive Extensions (Rx)
      - "The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators. Using Rx, developers represent asynchronous data streams with Observables, query asynchronous data streams using LINQ operators, and ....."
      - https://rx.codeplex.com/
  45. About RxJava
      - Reactive Extensions (Rx) for the JVM
      - Original motivation for Netflix was to provide rich Futures
      - Implemented in Java
      - Adaptors for Scala, Groovy and Clojure
      - Embraced by Akka and Spring Reactor: http://www.reactive-streams.org/
      - https://github.com/Netflix/RxJava
  46. RxJava core concepts
        trait Observable[T] {                                    // an asynchronous stream of items
          def subscribe(observer : Observer[T]) : Subscription   // Subscription is used to unsubscribe
          ...
        }

        trait Observer[T] {                                      // notified by the Observable
          def onNext(value : T)
          def onCompleted()
          def onError(e : Throwable)
        }
  47. Comparing Observable to...
      - Observer pattern: similar, but adds Observer.onCompleted() and Observer.onError()
      - Iterator pattern: mirror image, push rather than pull
      - Futures: similar, and Observables can be used as Futures, but Observables = a stream of multiple values
      - Collections and Streams: similar functional API supporting map(), flatMap(), ..., but Observables are asynchronous
  48. Fun with observables
        val every10Seconds = Observable.interval(10 seconds)
        val oneItem = Observable.items(-1L)
        val ticker = oneItem ++ every10Seconds
        // ticker emits:  -1    0     1     ...
        //           at:  t=0   t=10  t=20  ...

        val subscription = ticker.subscribe { (value: Long) => println("value=" + value) }
        ...
        subscription.unsubscribe()
  49. Connecting observables to the outside world
        def getTableStatus(tableName: String) : Observable[DynamoDbStatus] =
          Observable { subscriber: Subscriber[DynamoDbStatus] =>   // called once per subscriber
            // asynchronously gets information about the DynamoDB table
            amazonDynamoDBAsyncClient.describeTableAsync(
              new DescribeTableRequest(tableName),
              new AsyncHandler[DescribeTableRequest, DescribeTableResult] {
                override def onSuccess(request: DescribeTableRequest, result: DescribeTableResult) = {
                  subscriber.onNext(DynamoDbStatus(result.getTable.getTableStatus))
                  subscriber.onCompleted()
                }
                override def onError(exception: Exception) = exception match {
                  case t: ResourceNotFoundException =>
                    subscriber.onNext(DynamoDbStatus("NOT_FOUND"))
                    subscriber.onCompleted()
                  case _ =>
                    subscriber.onError(exception)
                }
              })
          }
  50. Transforming observables
        val tableStatus : Observable[DynamoDbMessage] = ticker.flatMap { i =>
          logger.info("{}th describe table", i + 1)
          getTableStatus(name)
        }
        // tableStatus emits:  Status1  Status2  Status3  ...
        //                at:  t=0      t=10     t=20     ...
      - Plus the usual collection methods: map(), filter(), take(), drop(), ...
  51. Calculating rolling average
        class AverageTradePriceCalculator {
          def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = {
            ...
          }

        case class Trade(symbol : String, price : Double, quantity : Int ...)

        case class AveragePrice(symbol : String, price : Double, ...)
  52. Calculating average prices
        def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = {
          // create an Observable of per-symbol Observables
          trades.groupBy(_.symbol).map { case (symbol, tradesForSymbol) =>
            val openingEverySecond = Observable.items(-1L) ++ Observable.interval(1 seconds)
            def closingAfterSixSeconds(opening: Any) = Observable.interval(6 seconds).take(1)

            tradesForSymbol.window(openingEverySecond, closingAfterSixSeconds _).map { windowOfTradesForSymbol =>
              windowOfTradesForSymbol.fold((0.0, 0, List[Double]())) { (soFar, trade) =>
                val (sum, count, prices) = soFar
                (sum + trade.price, count + trade.quantity, trade.price +: prices)
              } map { case (sum, length, prices) =>
                AveragePrice(symbol, sum / length, prices)
              }
            }.flatten
          }.flatten
        }
  53. Agenda
      - Why functional programming?
      - Simplifying collection processing
      - Simplifying concurrency with Futures and Rx Observables
      - Tackling big data problems with functional programming
  54. Let’s imagine that you want to count word frequencies
  55. Scala Word Count
        val frequency : Map[String, Int] =
          Source.fromFile("gettysburgaddress.txt").getLines()
            .flatMap { _.split(" ") }.toList   // Map
            .groupBy(identity)
            .mapValues(_.length)               // Reduce

        frequency("THE") should be(11)
        frequency("LIBERTY") should be(1)
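For comparison, a word count in the same style can be sketched with Java 8 streams. `WordCount` and `frequency` are hypothetical names. The input is passed in as a string rather than read from a file, and upper-casing the words is an assumption made so that lookups such as `"THE"` behave like the assertions on the slide.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class WordCount {
    static Map<String, Long> frequency(String text) {
        return Arrays.stream(text.split("\\s+"))      // split on runs of whitespace
                .filter(w -> !w.isEmpty())            // a leading space yields an empty token
                .map(String::toUpperCase)             // assumption: case-insensitive counting
                .collect(Collectors.groupingBy(       // group equal words ...
                        Function.identity(),
                        Collectors.counting()));      // ... and count each group
    }
}
```

`groupingBy` with a `counting` downstream collector does the grouping and the per-group reduction in one pass, which corresponds to the groupBy plus mapValues pair in the Scala version.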
  56. But how to scale to a cluster of machines?
  57. Apache Hadoop
      - Open-source software for reliable, scalable, distributed computing
      - Hadoop Distributed File System (HDFS): efficiently stores very large amounts of data; files are partitioned and replicated across multiple machines
      - Hadoop MapReduce: batch processing system; provides plumbing for writing distributed jobs; handles failures
      - ...
  58. Overview of MapReduce
      - [Diagram: input data is split across three mappers, each of which emits (K,V)* pairs; the shuffle groups the values by key into (K1,V, ....)*, (K2,V, ....)*, (K3,V, ....)* and passes each group to a reducer; the reducers emit (K,V) pairs as the output data]
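The three phases in the overview can be imitated in memory with plain Java collections. This is only a single-process sketch of the data flow, not the Hadoop API; `MiniMapReduce` and its method names are hypothetical, and word count is used as the job.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class MiniMapReduce {
    // Map phase: one input record (a line) -> zero or more (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) pairs.add(new SimpleEntry<>(word, 1));
        }
        return pairs;
    }

    // Shuffle phase: group all values by key, (K, V)* -> (K, [V, ...])*.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: one key with all of its values -> one output value.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    // Runs map over every line, shuffles, then reduces each key group.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) pairs.addAll(map(line));
        Map<String, Integer> output = new TreeMap<>();
        shuffle(pairs).forEach((key, values) -> output.put(key, reduce(key, values)));
        return output;
    }
}
```

In a real cluster the map calls run on different machines and the shuffle moves data over the network, but the shape of the computation is the same.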
  59. MapReduce Word count - mapper
        class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
          private final static IntWritable one = new IntWritable(1);
          private Text word = new Text();

          public void map(LongWritable key, Text value, Context context) {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
              word.set(tokenizer.nextToken());
              context.write(word, one);
            }
          }
        }
      - Input: Four score and seven years
      - Output: (“Four”, 1), (“score”, 1), (“and”, 1), (“seven”, 1), ...
      - http://wiki.apache.org/hadoop/WordCount
  60. Hadoop then shuffles the key-value pairs...
  61. MapReduce Word count - reducer
        class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
          public void reduce(Text key, Iterable<IntWritable> values, Context context) {
            int sum = 0;
            for (IntWritable val : values) {
              sum += val.get();
            }
            context.write(key, new IntWritable(sum));
          }
        }
      - Input: (“the”, (1, 1, 1, 1, 1, 1, ...))
      - Output: (“the”, 11)
      - http://wiki.apache.org/hadoop/WordCount
  62. About MapReduce
      - Very simple programming abstraction, yet incredibly powerful
      - By chaining together multiple map/reduce jobs you can process very large amounts of data in interesting ways, e.g. Apache Mahout for machine learning
      - But Mappers and Reducers = verbose code
      - Development is challenging, e.g. unit testing is difficult
      - It’s disk-based, batch processing = slow
  63. Scalding: Scala DSL for MapReduce
        class WordCountJob(args : Args) extends Job(args) {
          TextLine( args("input") )
            .flatMap('line -> 'word) { line : String => tokenize(line) }
            .groupBy('word) { _.size }
            .write( Tsv( args("output") ) )

          def tokenize(text : String) : Array[String] = {
            text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "")
              .split("\\s+")
          }
        }
      - Each row is a map of named fields
      - Expressive and unit testable
      - https://github.com/twitter/scalding
  64. Apache Spark
      - Part of the Hadoop ecosystem
      - Key abstraction = Resilient Distributed Datasets (RDD)
      - Collection that is partitioned across cluster members
      - Operations are parallelized
      - Created from either a Scala collection or a Hadoop supported datasource - HDFS, S3 etc
      - Can be cached in-memory for super-fast performance
      - Can be replicated for fault-tolerance
      - REPL for executing ad hoc queries
      - http://spark.apache.org
  65. Spark Word Count
        val sc = new SparkContext(...)
        sc.textFile("s3n://mybucket/...")
          .flatMap { _.split(" ") }
          .groupBy(identity)
          .mapValues(_.length)
          .toArray.toMap
      - Expressive, unit testable and very fast
  66. Summary
      - Functional programming enables the elegant expression of good ideas in a wide variety of domains
      - map(), flatMap() and reduce() are remarkably versatile higher-order functions
      - Use FP and OOP together
      - Java 8 has taken a good first step towards supporting FP
  67. Questions?
      - @crichardson chris@chrisrichardson.net http://plainoldobjects.com