
Map, flatmap and reduce are your new best friends (javaone, svcc)

Higher-order functions such as map(), flatMap(), filter() and reduce() have their origins in mathematics and ancient functional programming languages such as Lisp. But today they have entered the mainstream and are available in languages such as JavaScript, Scala and Java 8. They are well on their way to becoming an essential part of every developer’s toolbox. In this talk you will learn how these and other higher-order functions enable you to write simple, expressive and concise code that solves problems in a diverse set of domains. We will describe how you use them to process collections in Java and Scala. You will learn how functional Futures and Rx (Reactive Extensions) Observables simplify concurrent code. We will even talk about how to write big data applications in a functional style using libraries such as Scalding.

Published in: Software

  1. 1. Map(), flatMap() and reduce() are your new best friends: Simpler collections, concurrency, and big data. Chris Richardson, author of POJOs in Action, founder of the original CloudFoundry.com. @crichardson chris@chrisrichardson.net http://plainoldobjects.com
  2. 2. Presentation goal: How functional programming simplifies your code. Show that map(), flatMap() and reduce() are remarkably versatile functions.
  3. 3. @crichardson About Chris
  4. 4. @crichardson About Chris Founder of a buzzword compliant (stealthy, social, mobile, big data, machine learning, ...) startup Consultant helping organizations improve how they architect and deploy applications using cloud, micro services, polyglot applications, NoSQL, ...
  5. 5. @crichardson Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  6. 6. Functional programming is a programming paradigm. Functions are the building blocks of the application. Best done in a functional programming language.
  7. 7. @crichardson Functions as first class citizens Assign functions to variables Store functions in fields Use and write higher-order functions: Take functions as parameters Return functions as values
  8. 8. @crichardson Avoids mutable state Use: Immutable data structures Single assignment variables Some functional languages such as Haskell don’t allow side-effects
  9. 9. Why functional programming? "the highest goal of programming-language design to enable good ideas to be elegantly expressed" http://en.wikipedia.org/wiki/Tony_Hoare
  10. 10. Why functional programming? @crichardson More expressive More concise More intuitive - solution matches problem definition Functional code is usually much more composable Immutable state: Less error-prone Easy parallelization and concurrency But be pragmatic
  11. 11. @crichardson An ancient idea that has recently become popular
  12. 12. @crichardson Mathematical foundation: λ-calculus Introduced by Alonzo Church in the 1930s
  13. 13. Lisp = an early functional language invented in 1958 http://en.wikipedia.org/wiki/Lisp_(programming_language) [Timeline figure: 1940 to 2010] Features shown: garbage collection, dynamic typing, self-hosting compiler, tree data structures. (defun factorial (n) (if (<= n 1) 1 (* n (factorial (- n 1)))))
  14. 14. My final year project in 1985: Implementing SASL in LISP sieve (p:xs) = p : sieve [x | x <- xs, rem x p > 0]; primes = sieve [2..] Annotations: [2..] is a list of integers starting with 2; the list comprehension filters out multiples of p
  15. 15. Mostly an Ivory Tower technology Lisp was used for AI FP languages: Miranda, ML, Haskell, ... “Side-effects kills kittens and puppies”
  16. 16. http://steve-yegge.blogspot.com/2010/12/haskell-researchers-announce-discovery.html
  17. 17. But today FP is mainstream: Clojure - a dialect of Lisp; a hybrid OO/functional language; a hybrid OO/FP language for .NET; Java 8 has lambda expressions
  18. 18. Java 8 lambda expressions are functions x -> x * x x -> { for (int i = 2; i <= Math.sqrt(x); i = i + 1) { if (x % i == 0) return false; } return true; }; (x, y) -> x * x + y * y
  19. 19. @crichardson Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  20. 20. Lots of application code = collection processing: Mapping, filtering, and reducing
  21. 21. @crichardson Social network example public class Person { enum Gender { MALE, FEMALE } private Name name; private LocalDate birthday; private Gender gender; private Hometown hometown; private Set<Friend> friends = new HashSet<Friend>(); .... public class Friend { private Person friend; private LocalDate becameFriends; ... } public class SocialNetwork { private Set<Person> people; ...
  22. 22. @crichardson Mapping, filtering, and reducing public class Person { public Set<Hometown> hometownsOfFriends() { Set<Hometown> result = new HashSet<>(); for (Friend friend : friends) { result.add(friend.getPerson().getHometown()); } return result; } Declare result variable Modify result Return result Iterate
  23. 23. @crichardson Mapping, filtering, and reducing public class SocialNetwork { private Set<Person> people; ... public Set<Person> lonelyPeople() { Set<Person> result = new HashSet<Person>(); for (Person p : people) { if (p.getFriends().isEmpty()) result.add(p); } return result; } Declare result variable Modify result Return result Iterate
  24. 24. Mapping, filtering, and reducing public class SocialNetwork { private Set<Person> people; ... public int averageNumberOfFriends() { int sum = 0; for (Person p : people) { sum += p.getFriends().size(); } return sum / people.size(); } Annotations: Declare scalar result variable; Iterate; Modify result; Return result
  25. 25. @crichardson Problems with this style of programming Lots of verbose boilerplate - basic operations require 5+ LOC Imperative (how to do it) NOT declarative (what to do) Mutable variables are potentially error prone Difficult to parallelize
  26. 26. Java 8 streams to the rescue A sequence of elements “Wrapper” around a collection Streams are lazy, i.e. can be infinite Provides a functional/lambda-based API for transforming, filtering and aggregating elements Much simpler, cleaner and declarative code
  27. 27. @crichardson Using Java 8 streams - mapping class Person .. private Set<Friend> friends = ...; public Set<Hometown> hometownsOfFriends() { return friends.stream() .map(f -> f.getPerson().getHometown()) .collect(Collectors.toSet()); } transforming lambda expression
  28. 28. @crichardson The map() function s1 a b c d e ... s2 = s1.map(f) s2 f(a) f(b) f(c) f(d) f(e) ...
  29. 29. @crichardson Using Java 8 streams - filtering public class SocialNetwork { private Set<Person> people; ... public Set<Person> lonelyPeople() { return people.stream() .filter(p -> p.getFriends().isEmpty()) .collect(Collectors.toSet()); } predicate lambda expression
  30. 30. Using Java 8 streams - friend of friends V1 class Person .. public Set<Person> friendOfFriends() { Set<Set<Friend>> fof = friends.stream() .map(friend -> friend.getPerson().friends) .collect(Collectors.toSet()); ... } Using map() => Set of Sets :-( Somehow we need to flatten
  31. 31. @crichardson Using Java 8 streams - mapping class Person .. public Set<Person> friendOfFriends() { return friends.stream() .flatMap(friend -> friend.getPerson().friends.stream()) .map(Friend::getPerson) .filter(person -> person != this) .collect(Collectors.toSet()); } maps and flattens
  32. 32. Chaining with flatMap() s1: a b ... s2 = s1.flatMap(f) s2: f(a)_0 f(a)_1 f(b)_0 f(b)_1 f(b)_2 ...
  33. 33. Using Java 8 streams - reducing public class SocialNetwork { private Set<Person> people; ... public long averageNumberOfFriends() { return people.stream() .map ( p -> p.getFriends().size() ) .reduce(0, (x, y) -> x + y) / people.size(); } Annotation: reduce(0, (x, y) -> x + y) is equivalent to the loop int x = 0; for (int y : inputStream) x = x + y; return x;
  34. 34. @crichardson The reduce() function s1 a b c d e ... x = s1.reduce(initial, f) f(f(f(f(f(f(initial, a), b), c), d), e), ...)
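The map(), flatMap() and reduce() diagrams above can be tried out on plain lists. The following is a minimal sketch, not code from the talk: the class name StreamBasics and its methods are made up for illustration. The average() method assumes you want a fractional result and an explicit "no value" for an empty input, which the integer division in the averageNumberOfFriends() slides does not give you.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
final class StreamBasics {

    // map(): applies the lambda to every element.
    static List<Integer> squares(List<Integer> xs) {
        return xs.stream().map(x -> x * x).collect(Collectors.toList());
    }

    // flatMap(): maps each line to a stream of words, then flattens.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    // reduce(): folds the elements into one value, starting from 0.
    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, (x, y) -> x + y);
    }

    // average(): empty for an empty list, fractional otherwise.
    static OptionalDouble average(List<Integer> xs) {
        return xs.stream().mapToInt(Integer::intValue).average();
    }
}
```

For example, StreamBasics.average(Arrays.asList(1, 2)) yields an OptionalDouble holding 1.5, whereas the int arithmetic sum / size would yield 1.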
  35. 35. @crichardson Newton's method for calculating sqrt(x) It’s an iterative algorithm initial value = guess betterValue = value - (value * value - x) / (2 * value) Iterate until |value - betterValue| < precision
  36. 36. Functional square root in Scala package net.chrisrichardson.fp.scala.squareroot object SquareRootCalculator { def squareRoot(x: Double, precision: Double) : Double = Stream.iterate(x / 2)( value => value - (value * value - x) / (2 * value) ). sliding(2).map( s => (s.head, s.last)). find { case (value , newValue) => Math.abs(value - newValue) < precision}. get._2 } Annotations: Stream.iterate creates an infinite stream: seed, f(seed), f(f(seed)), ...; sliding(2) turns a, b, c, ... into (a, b), (b, c), (c, ...), ...; find returns the first convergent approximation
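The same idea can be approximated with Java 8 streams. This is a sketch under assumptions rather than a translation from the talk: Java 8 streams have no sliding(), so each stream element here carries a (value, betterValue) pair in a two-element array. The names SquareRoot and improve are hypothetical, and x is assumed to be positive (x = 0 would never converge).

```java
import java.util.stream.Stream;

// Hypothetical class, for illustration only. Assumes x > 0.
final class SquareRoot {

    // One Newton step: value - (value^2 - x) / (2 * value)
    static double improve(double value, double x) {
        return value - (value * value - x) / (2 * value);
    }

    static double squareRoot(double x, double precision) {
        double seed = x / 2;
        // Infinite lazy stream of {value, betterValue} pairs.
        return Stream.iterate(
                        new double[] {seed, improve(seed, x)},
                        p -> new double[] {p[1], improve(p[1], x)})
                // Keep only pairs whose two approximations have converged.
                .filter(p -> Math.abs(p[0] - p[1]) < precision)
                .findFirst()
                .get()[1];
    }
}
```

Because the stream is lazy, findFirst() stops the iteration as soon as one pair passes the filter.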
  37. 37. @crichardson Adopting FP with Java 8 is straightforward Switch your application to Java 8 Start using streams and lambdas Eclipse can refactor anonymous inner classes to lambdas Or write modules in Scala: more expressive and runs on older JVMs
  38. 38. @crichardson Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  39. 39. Tony’s $1B mistake “I call it my billion-dollar mistake. It was the invention of the null reference in 1965....But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement...” http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake
  40. 40. Coding with null pointers class Person public Friend longestFriendship() { Friend result = null; for (Friend friend : friends) { if (result == null || friend.getBecameFriends() .isBefore(result.getBecameFriends())) result = friend; } return result; } Friend oldestFriend = person.longestFriendship(); if (oldestFriend != null) { ... } else { ... } Annotations: longestFriendship() returns null if no friends; the null check is essential yet easily forgotten
  41. 41. Java 8 Optional<T> A wrapper for nullable references. It has two states: empty ⇒ throws an exception if you try to get the reference; non-empty ⇒ contains a non-null reference. Provides methods for: testing whether it has a value, getting the value, ... Use an Optional<T> parameter if the caller can pass in null. Return the reference wrapped in an instance of this type instead of null. Uses the type system to explicitly represent nullability
  42. 42. Coding with optionals class Person public Optional<Friend> longestFriendship() { Friend result = null; for (Friend friend : friends) { if (result == null || friend.getBecameFriends().isBefore(result.getBecameFriends())) result = friend; } return Optional.ofNullable(result); } Optional<Friend> oldestFriend = person.longestFriendship(); // Might throw java.util.NoSuchElementException: No value present // Person dangerous = popularPerson.get(); if (oldestFriend.isPresent()) { ...oldestFriend.get() } else { ... }
  43. 43. Using Optionals - better Optional<Friend> oldestFriendship = ...; Friend whoToCall1 = oldestFriendship.orElse(mother); Friend whoToCall2 = oldestFriendship.orElseGet(() -> lazilyFindSomeoneElse()); Friend whoToCall3 = oldestFriendship.orElseThrow( () -> new LonelyPersonException()); Avoid calling isPresent() and get()
  44. 44. Transforming with map() public class Person { public Optional<Friend> longestFriendship() { return ...; } public Optional<Long> ageDifferenceWithOldestFriend() { Optional<Friend> oldestFriend = longestFriendship(); return oldestFriend.map ( of -> Math.abs(of.getPerson().getAge() - getAge()) ); } Eliminates messy conditional logic
  45. 45. @crichardson Chaining with flatMap() class Person public Optional<Friend> longestFriendship() {...} public Optional<Friend> longestFriendshipOfLongestFriend() { return longestFriendship() .flatMap(friend -> friend.getPerson().longestFriendship()); } not always a symmetric relationship. :-)
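To make the Optional slides concrete, here is a small self-contained sketch that combines map(), flatMap() and orElse() without calling isPresent() or get(). Everything in it is hypothetical: the class OptionalChaining, the bestFriendOf() lookup and the names in it stand in for the talk's Person and Friend classes, whose full definitions are not shown.

```java
import java.util.Optional;

// Hypothetical stand-in for the Person/Friend model, for illustration only.
final class OptionalChaining {

    // A lookup that may have no answer: empty instead of null.
    static Optional<String> bestFriendOf(String name) {
        switch (name) {
            case "alice": return Optional.of("bob");
            case "bob":   return Optional.of("carol");
            default:      return Optional.empty();
        }
    }

    // map(): transform the value if present, stay empty otherwise.
    static Optional<Integer> bestFriendNameLength(String name) {
        return bestFriendOf(name).map(String::length);
    }

    // flatMap(): chain a second lookup that itself returns an Optional.
    static Optional<String> bestFriendOfBestFriend(String name) {
        return bestFriendOf(name).flatMap(OptionalChaining::bestFriendOf);
    }

    // orElse(): supply a default at the very end of the chain.
    static String whoToCall(String name) {
        return bestFriendOfBestFriend(name).orElse("mother");
    }
}
```

With this data, whoToCall("alice") follows alice to bob to carol, while whoToCall("bob") falls back to the default because carol has no recorded best friend.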
  46. 46. @crichardson Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  47. 47. Let’s imagine you are performing a CPU intensive operation @crichardson class Person .. public Set<Hometown> hometownsOfFriends() { return friends.stream() .map(f -> cpuIntensiveOperation(f)) .collect(Collectors.toSet()); }
  48. 48. Parallel streams = simple concurrency class Person .. public Set<Hometown> hometownsOfFriends() { return friends.parallelStream() .map(f -> cpuIntensiveOperation(f)) .collect(Collectors.toSet()); } Potentially uses N cores ⇒ Nx speed up. Perhaps this will be faster. Perhaps not
  49. 49. Let’s imagine that you are writing code to display the products in a user’s wish list @crichardson
  50. 50. The need for concurrency Step #1 Web service request to get the user profile including wish list (list of product Ids) Step #2 For each productId: web service request to get product info Sequentially ⇒ terrible response time Need to fetch productInfo concurrently Composing sequential + scatter/gather-style operations is very common
  51. 51. @crichardson Futures are a great concurrency abstraction http://en.wikipedia.org/wiki/Futures_and_promises
  52. 52. Composition with futures [Diagram: a client on the main thread initiates asynchronous operations 1 and 2, which run on a worker thread or in event-driven code; each operation sets its outcome on a Future (Future 1, Future 2), and the client gets the outcome from that Future]
  53. 53. @crichardson Benefits Simple way for multiple concurrent activities to communicate safely Abstraction: Client does not know how the asynchronous operation is implemented, e.g. thread pool, event-driven, .... Easy to implement scatter/gather: Scatter: Client can invoke multiple asynchronous operations and gets a Future for each one. Gather: Get values from the futures
  54. 54. @crichardson But composition with basic futures is difficult Java 7 future.get([timeout]): Blocking API ⇒ client blocks thread ⇒ poor scalability Difficult to compose multiple concurrent operations Futures with callbacks: e.g. Guava ListenableFutures, Spring 4 ListenableFuture Attach callbacks to all futures and asynchronously consume outcomes But callback-based code = messy code See http://techblog.netflix.com/2013/02/rxjava-netflix-api.html We need functional futures!
  55. 55. Functional futures - Scala, Java 8 CompletableFuture def asyncPlus(x : Int, y :Int): Future[Int] = ... x + y ... val future2 = asyncPlus(4, 5).map{ _ * 3 } assertEquals(27, Await.result(future2, 1 second)) def asyncSquare(x : Int) : Future[Int] = ... x * x ... val f2 = asyncPlus(5, 8).flatMap { x => asyncSquare(x) } assertEquals(169, Await.result(f2, 1 second)) Annotations: map() asynchronously transforms the future; flatMap() calls asyncSquare() with the eventual outcome of asyncPlus(), i.e. chaining
  56. 56. map() etc are asynchronous f2 = f1 map (someFn) [Diagram: when f1 completes with outcome1, f2 completes with outcome2 = someFn(outcome1)] Implemented using callbacks
  57. 57. Scala wish list service class WishListService(...) { def getWishList(userId : Long) : Future[WishList] = { userService.getUserProfile(userId). map { userProfile => userProfile.wishListProductIds}. flatMap { productIds => val listOfProductFutures = productIds map productInfoService.getProductInfo Future.sequence(listOfProductFutures) }. map { products => WishList(products) } } } Types at each step: Future[UserProfile], then Future[List[Long]], then List[Future[ProductInfo]] inside flatMap, which Future.sequence turns into Future[List[ProductInfo]], and finally Future[WishList]
  58. 58. Using Java 8 CompletableFutures public CompletableFuture<Wishlist> getWishlistDetails(long userId) { return userService.getUserProfile(userId).thenComposeAsync(userProfile -> { Stream<CompletableFuture<ProductInfo>> s1 = userProfile.getWishListProductIds() .stream() .map(productInfoService::getProductInfo); Stream<CompletableFuture<List<ProductInfo>>> s2 = s1.map(fOfPi -> fOfPi.thenApplyAsync(pi -> Arrays.asList(pi))); CompletableFuture<List<ProductInfo>> productInfos = s2 .reduce((f1, f2) -> f1.thenCombine(f2, ListUtils::union)) .orElse(CompletableFuture.completedFuture(Collections.emptyList())); return productInfos.thenApply(list -> new Wishlist()); }); } Annotations: flatMap()! (thenComposeAsync), map()! (thenApply); Java 8 is missing Future.sequence()
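Since the slide notes that Java 8 has no Future.sequence(), one common workaround is to build it from CompletableFuture.allOf(). The sketch below is an assumption about how such a helper could look, not the author's code: the class name Futures and the method sequence() are hypothetical, and it swaps the reduce/thenCombine approach on the slide for allOf() plus join().

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class Futures {

    // Turns a List of futures into a future of a List, preserving order.
    static <T> CompletableFuture<List<T>> sequence(List<CompletableFuture<T>> futures) {
        // allOf() completes when every input future has completed.
        CompletableFuture<Void> allDone =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]));
        // At that point join() returns immediately for each future.
        return allDone.thenApply(ignored ->
                futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }
}
```

If any input future fails, the future returned by allOf() completes exceptionally, and so does the result of sequence().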
  59. 59. Your mouse is your database @crichardson Erik Meijer http://queue.acm.org/detail.cfm?id=2169076
  60. 60. @crichardson Introducing Reactive Extensions (Rx) The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs .... Using Rx, developers represent asynchronous data streams with Observables , query asynchronous data streams using LINQ operators , and ..... https://rx.codeplex.com/
  61. 61. About RxJava Reactive Extensions (Rx) for the JVM Developed by Netflix Original motivation was to provide rich, functional Futures Implemented in Java Adaptors for Scala, Groovy and Clojure Embraced by Akka and Spring Reactor: http://www.reactive-streams.org/ https://github.com/Netflix/RxJava
  62. 62. RxJava core concepts trait Observable[T] { def subscribe(observer : Observer[T]) : Subscription ... } trait Observer[T] { def onNext(value : T) def onCompleted() def onError(e : Throwable) } Annotations: an Observable is an asynchronous stream of items; it notifies its Observers; the Subscription is used to unsubscribe
  63. 63. Comparing Observable to... Observer pattern - similar but adds Observer.onCompleted() and Observer.onError(). Iterator pattern - mirror image: push rather than pull. Futures - similar: can be used as Futures, but Observables = a stream of multiple values. Collections and Streams - similar: functional API supporting map(), flatMap(), ..., but Observables are asynchronous
  64. 64. Fun with observables val oneItem = Observable.items(-1L) val every10Seconds = Observable.interval(10 seconds) val ticker = oneItem ++ every10Seconds val subscription = ticker.subscribe { (value: Long) => println("value=" + value) } ... subscription.unsubscribe() Emits: -1 at t=0, 0 at t=10, 1 at t=20, ...
  65. 65. Observables as the result of an asynchronous operation def getTableStatus(tableName: String) : Observable[DynamoDbStatus]= Observable { subscriber: Subscriber[DynamoDbStatus] => amazonDynamoDBAsyncClient.describeTableAsync( new DescribeTableRequest(tableName), new AsyncHandler[DescribeTableRequest, DescribeTableResult] { override def onSuccess(request: DescribeTableRequest, result: DescribeTableResult) = { subscriber.onNext(DynamoDbStatus(result.getTable.getTableStatus)) subscriber.onCompleted() } override def onError(exception: Exception) = exception match { case t: ResourceNotFoundException => subscriber.onNext(DynamoDbStatus("NOT_FOUND")) subscriber.onCompleted() case _ => subscriber.onError(exception) } }) }
  66. 66. @crichardson Transforming/chaining observables with flatMap() val tableStatus = ticker.flatMap { i => logger.info("{}th describe table", i + 1) getTableStatus(name) } Status1 Status2 Status3 ... t=0 t=10 t=20 ... + Usual collection methods: map(), filter(), take(), drop(), ...
  67. 67. @crichardson Calculating rolling average class AverageTradePriceCalculator { def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = { ... } case class Trade( symbol : String, price : Double, quantity : Int ... ) case class AveragePrice( symbol : String, price : Double, ...)
  68. 68. @crichardson Calculating average prices def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = { trades.groupBy(_.symbol).map { symbolAndTrades => val (symbol, tradesForSymbol) = symbolAndTrades val openingEverySecond = Observable.items(-1L) ++ Observable.interval(1 seconds) def closingAfterSixSeconds(opening: Any) = Observable.interval(6 seconds).take(1) tradesForSymbol.window(...).map { windowOfTradesForSymbol => windowOfTradesForSymbol.fold((0.0, 0, List[Double]())) { (soFar, trade) => val (sum, count, prices) = soFar (sum + trade.price, count + trade.quantity, trade.price +: prices) } map { x => val (sum, length, prices) = x AveragePrice(symbol, sum / length, prices) } }.flatten }.flatten }
  69. 69. @crichardson Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  70. 70. Let’s imagine that you want to count word frequencies @crichardson
  71. 71. Scala Word Count val frequency : Map[String, Int] = Source.fromFile("gettysburgaddress.txt").getLines() .flatMap { _.split(" ") }.toList .groupBy(identity) .mapValues(_.length) frequency("THE") should be(11) frequency("LIBERTY") should be(1) Annotations: Map, Reduce
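The same word count can be sketched with Java 8 streams, which shows the flatMap-then-group shape outside Scala. This is an illustration under assumptions, not code from the talk: the class WordCount and method frequencies() are hypothetical, the input is a list of lines rather than a file, words are split on whitespace, and they are upper-cased so that lookups such as "THE" work regardless of the input's case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class WordCount {

    static Map<String, Long> frequencies(List<String> lines) {
        return lines.stream()
                // "Map" step: one line becomes many words.
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .map(String::toUpperCase)
                // "Reduce" step: group identical words and count each group.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

Collectors.counting() produces Long values, so the result type is Map<String, Long> rather than the Map[String, Int] of the Scala version.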
  72. 72. But how to scale to a cluster of machines?
  73. 73. @crichardson Apache Hadoop Open-source ecosystem for reliable, scalable, distributed computing Hadoop Distributed File System (HDFS) Efficiently stores very large amounts of data Files are partitioned and replicated across multiple machines Hadoop MapReduce Batch processing system Provides plumbing for writing distributed jobs Handles failures And, much, much more...
  74. 74. Overview of MapReduce [Diagram: input data is split across Mappers, each emitting (K,V) pairs; the shuffle groups the pairs by key into (K1,V, ....)*, (K2,V, ....)*, (K3,V, ....)*; Reducers consume each group and emit (K,V) pairs as output data]
  75. 75. MapReduce Word count - mapper class Map extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, one); } } } Four score and seven years ⇒ (“Four”, 1), (“score”, 1), (“and”, 1), (“seven”, 1), ... http://wiki.apache.org/hadoop/WordCount
  76. 76. @crichardson Hadoop then shuffles the key-value pairs...
  77. 77. MapReduce Word count - reducer class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } } (“the”, (1, 1, 1, 1, 1, 1, ...)) ⇒ (“the”, 11) http://wiki.apache.org/hadoop/WordCount
  78. 78. @crichardson About MapReduce Very simple programming abstraction yet incredibly powerful By chaining together multiple map/reduce jobs you can process very large amounts of data in interesting ways e.g. Apache Mahout for machine learning But Mappers and Reducers = verbose code Development is challenging, e.g. unit testing is difficult It’s disk-based, batch processing ⇒ slow
  79. 79. Scalding: Scala DSL for MapReduce class WordCountJob(args : Args) extends Job(args) { TextLine( args("input") ) .flatMap('line -> 'word) { line : String => tokenize(line) } .groupBy('word) { _.size } .write( Tsv( args("output") ) ) def tokenize(text : String) : Array[String] = { text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "") .split("\\s+") } } Each row is a map of named fields. Expressive and unit testable https://github.com/twitter/scalding
  80. 80. @crichardson Apache Spark Created at UC Berkeley and now part of the Hadoop ecosystem Key abstraction = Resilient Distributed Datasets (RDD) Collection that is partitioned across cluster members Operations are parallelized Created from either a collection or a Hadoop supported datasource - HDFS, S3 etc Can be cached in-memory for super-fast performance Can be replicated for fault-tolerance Scala, Java, and Python APIs http://spark.apache.org
  81. 81. Spark Word Count val sc = new SparkContext(...) sc.textFile("s3n://mybucket/...") .flatMap { _.split(" ")} .groupBy(identity) .mapValues(_.length) .toArray.toMap Very similar to Scala collection code!! Expressive, unit testable and very fast
  82. 82. @crichardson Summary Functional programming enables the elegant expression of good ideas in a wide variety of domains map(), flatMap() and reduce() are remarkably versatile higher-order functions Use FP and OOP together Java 8 has taken a good first step towards supporting FP Go write some functional code!
  83. 83. Questions? @crichardson chris@chrisrichardson.net http://plainoldobjects.com
