• Like
Map(), flatmap() and reduce() are your new best friends: simpler collections, concurrency, and big data (jax, jax2014)
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

Map(), flatmap() and reduce() are your new best friends: simpler collections, concurrency, and big data (jax, jax2014)

  • 5,817 views
Published

Higher-order functions such as map(), flatmap(), filter() and reduce() have their origins in mathematics and ancient functional programming languages such as Lisp. But today they have entered the …

Higher-order functions such as map(), flatmap(), filter() and reduce() have their origins in mathematics and ancient functional programming languages such as Lisp. But today they have entered the mainstream and are available in languages such as JavaScript, Scala and Java 8. They are well on their way to becoming an essential part of every developer’s toolbox.

In this talk you will learn how these and other higher-order functions enable you to write simple, expressive and concise code that solve problems in a diverse set of domains. We will describe how you use them to process collections in Java and Scala. You will learn how functional Futures and Rx (Reactive Extensions) Observables simplify concurrent code. We will even talk about how to write big data applications in a functional style using libraries such as Scalding.

Published in Technology
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
5,817
On SlideShare
0
From Embeds
0
Number of Embeds
3

Actions

Shares
Downloads
102
Comments
0
Likes
23

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. @crichardson Map(), flatMap() and reduce() are your new best friends: Simpler collections, concurrency, and big data Chris Richardson Author of POJOs in Action Founder of the original CloudFoundry.com @crichardson chris@chrisrichardson.net http://plainoldobjects.com
  • 2. @crichardson Presentation goal How functional programming simplifies your code Show that map(), flatMap() and reduce() are remarkably versatile functions
  • 3. @crichardson About Chris
  • 4. @crichardson About Chris Founder of a buzzword compliant (stealthy, social, mobile, big data, machine learning, ...) startup Consultant helping organizations improve how they architect and deploy applications using cloud, micro services, polyglot applications, NoSQL, ...
  • 5. @crichardson Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  • 6. @crichardson What’s functional programming?
  • 7. @crichardson It’s a programming paradigm
  • 8. @crichardson It’s a kind of programming language
  • 9. @crichardson Functions as the building blocks of the application
  • 10. @crichardson Functions as first class citizens Assign functions to variables Store functions in fields Use and write higher-order functions: Pass functions as arguments Return functions as values
  • 11. @crichardson Avoids mutable state Use: Immutable data structures Single assignment variables Some functional languages such as Haskell don’t side-effects There are benefits to immutability Easier concurrency More reliable code But be pragmatic
  • 12. @crichardson Why functional programming? "the highest goal of programming-language design to enable good ideas to be elegantly expressed" http://en.wikipedia.org/wiki/Tony_Hoare
  • 13. @crichardson Why functional programming? More expressive More concise More intuitive - solution matches problem definition Elimination of error-prone mutable state Easy parallelization
  • 14. @crichardson An ancient idea that has recently become popular
  • 15. @crichardson Mathematical foundation: λ-calculus Introduced by Alonzo Church in the 1930s
  • 16. @crichardson Lisp = an early functional language invented in 1958 http://en.wikipedia.org/wiki/ Lisp_(programming_language) 1940 1950 1960 1970 1980 1990 2000 2010 garbage collection dynamic typing self-hosting compiler tree data structures (defun factorial (n) (if (<= n 1) 1 (* n (factorial (- n 1)))))
  • 17. @crichardson My final year project in 1985: Implementing SASL sieve (p:xs) = p : sieve [x | x <- xs, rem x p > 0]; primes = sieve [2..] A list of integers starting with 2 Filter out multiples of p
  • 18. Mostly an Ivory Tower technology Lisp was used for AI FP languages: Miranda, ML, Haskell, ... “Side-effects kills kittens and puppies”
  • 19. @crichardson http://steve-yegge.blogspot.com/2010/12/haskell-researchers-announce-discovery.html !* !* !*
  • 20. @crichardson But today FP is mainstream Clojure - a dialect of Lisp A hybrid OO/functional language A hybrid OO/FP language for .NET Java 8 has lambda expressions
  • 21. @crichardson Java 8 lambda expressions are functions x -> x * x x -> { for (int i = 2; i < Math.sqrt(x); i = i + 1) { if (x % i == 0) return false; } return true; }; (x, y) -> x * x + y * y
  • 22. @crichardson Java 8 lambdas are a shorthand* for an anonymous inner class * not exactly. See http://programmers.stackexchange.com/questions/ 177879/type-inference-in-java-8
  • 23. @crichardson Java 8 functional interfaces Interface with a single abstract method e.g. Runnable, Callable, Spring’s TransactionCallback A lambda expression is an instance of a functional interface. You can use a lambda wherever a function interface “value” is expected The type of the lambda expression is determined from it’s context
  • 24. @crichardson Example Functional Interface Function<Integer, Integer> square = x -> x * x; BiFunction<Integer, Integer, Integer> sumSquares = (x, y) -> x * x + y * y; Predicate<Integer> makeIsDivisibleBy(int y) { return x -> x % y == 0; } Predicate<Integer> isEven = makeIsDivisibleBy(2); Assert.assertTrue(isEven.test(8)); Assert.assertFalse(isEven.test(11));
  • 25. @crichardson Example Functional Interface ExecutorService executor = ...; final int x = 999 Future<Boolean> outcome = executor.submit(() -> { for (int i = 2; i < Math.sqrt(x); i = i + 1) { if (x % i == 0) return false; } return true; } This lambda is a Callable
  • 26. @crichardson Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  • 27. @crichardson Lot’s of application code = collection processing: Mapping, filtering, and reducing
  • 28. @crichardson Social network example public class Person { enum Gender { MALE, FEMALE } private Name name; private LocalDate birthday; private Gender gender; private Hometown hometown; private Set<Friend> friends = new HashSet<Friend>(); .... public class Friend { private Person friend; private LocalDate becameFriends; ... } public class SocialNetwork { private Set<Person> people; ...
  • 29. @crichardson Mapping, filtering, and reducing public class Person { public Set<Hometown> hometownsOfFriends() { Set<Hometown> result = new HashSet<>(); for (Friend friend : friends) { result.add(friend.getPerson().getHometown()); } return result; }
  • 30. @crichardson Mapping, filtering, and reducing public class Person { public Set<Person> friendOfFriends() { Set<Person> result = new HashSet(); for (Friend friend : friends) for (Friend friendOfFriend : friend.getPerson().friends) if (friendOfFriend.getPerson() != this) result.add(friendOfFriend.getPerson()); return result; }
  • 31. @crichardson Mapping, filtering, and reducing public class SocialNetwork { private Set<Person> people; ... public Set<Person> lonelyPeople() { Set<Person> result = new HashSet<Person>(); for (Person p : people) { if (p.getFriends().isEmpty()) result.add(p); } return result; }
  • 32. @crichardson Mapping, filtering, and reducing public class SocialNetwork { private Set<Person> people; ... public int averageNumberOfFriends() { int sum = 0; for (Person p : people) { sum += p.getFriends().size(); } return sum / people.size(); }
  • 33. @crichardson Problems with this style of programming Low level Imperative (how to do it) NOT declarative (what to do) Verbose Mutable variables are potentially error prone Difficult to parallelize
  • 34. @crichardson Java 8 streams to the rescue A sequence of elements “Wrapper” around a collection Streams can also be infinite Provides a functional/lambda-based API for transforming, filtering and aggregating elements Much simpler, cleaner code
  • 35. @crichardson Using Java 8 streams - mapping class Person .. private Set<Friend> friends = ...; public Set<Hometown> hometownsOfFriends() { return friends.stream() .map(f -> f.getPerson().getHometown()) .collect(Collectors.toSet()); }
  • 36. @crichardson The map() function s1 a b c d e ... s2 f(a) f(b) f(c) f(d) f(e) ... s2 = s1.map(f)
  • 37. @crichardson public class SocialNetwork { private Set<Person> people; ... public Set<Person> peopleWithNoFriends() { Set<Person> result = new HashSet<Person>(); for (Person p : people) { if (p.getFriends().isEmpty()) result.add(p); } return result; } Using Java 8 streams - filtering public class SocialNetwork { private Set<Person> people; ... public Set<Person> lonelyPeople() { return people.stream() .filter(p -> p.getFriends().isEmpty()) .collect(Collectors.toSet()); }
  • 38. @crichardson Using Java 8 streams - friend of friends V1 class Person .. public Set<Person> friendOfFriends() { Set<Set<Friend>> fof = friends.stream() .map(friend -> friend.getPerson().friends) .collect(Collectors.toSet()); ... } Using map() => Set of Sets :-( Somehow we need to flatten
  • 39. @crichardson Using Java 8 streams - mapping class Person .. public Set<Person> friendOfFriends() { return friends.stream() .flatMap(friend -> friend.getPerson().friends.stream()) .map(Friend::getPerson) .filter(f -> f != this) .collect(Collectors.toSet()); } maps and flattens
  • 40. @crichardson The flatMap() function s1 a b ... s2 f(a)0 f(a)1 f(b)0 f(b)1 f(b)2 ... s2 = s1.flatMap(f)
  • 41. @crichardson Using Java 8 streams - reducing public class SocialNetwork { private Set<Person> people; ... public long averageNumberOfFriends() { return people.stream() .map ( p -> p.getFriends().size() ) .reduce(0, (x, y) -> x + y) / people.size(); } int x = 0; for (int y : inputStream) x = x + y return x;
  • 42. @crichardson The reduce() function s1 a b c d e ... x = s1.reduce(initial, f) f(f(f(f(f(f(initial, a), b), c), d), e), ...)
  • 43. @crichardson Newton's method for finding square roots public class SquareRootCalculator { public double squareRoot(double input, double precision) { return Stream.iterate( new Result(1.0), current -> refine(current, input, precision)) .filter(r -> r.done) .findFirst().get().value; } private static Result refine(Result current, double input, double precision) { double value = current.value; double newCurrent = value - (value * value - input) / (2 * value); boolean done = Math.abs(value - newCurrent) < precision; return new Result(newCurrent, done); } class Result { boolean done; double value; } Creates an infinite stream: seed, f(seed), f(f(seed)), ..... Don’t panic! Streams are lazy
  • 44. @crichardson Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  • 45. @crichardson Tony’s $1B mistake “I call it my billion-dollar mistake. It was the invention of the null reference in 1965....But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement...” http://qconlondon.com/london-2009/presentation/ Null+References:+The+Billion+Dollar+Mistake
  • 46. @crichardson Coding with null pointers class Person public Friend longestFriendship() { Friend result = null; for (Friend friend : friends) { if (result == null || friend.getBecameFriends() .isBefore(result.getBecameFriends())) result = friend; } return result; } Friend oldestFriend = person.longestFriendship(); if (oldestFriend != null) { ... } else { ... } Null check is essential yet easily forgotten
  • 47. @crichardson Java 8 Optional<T> A wrapper for nullable references It has two states: empty throws an exception if you try to get the reference non-empty contain a non-null reference Provides methods for: testing whether it has a value getting the value ... Return reference wrapped in an instance of this type instead of null
  • 48. @crichardson Coding with optionals class Person public Optional<Friend> longestFriendship() { Friend result = null; for (Friend friend : friends) { if (result == null || friend.getBecameFriends().isBefore(result.getBecameFriends())) result = friend; } return Optional.ofNullable(result); } Optional<Friend> oldestFriend = person.longestFriendship(); // Might throw java.util.NoSuchElementException: No value present // Person dangerous = popularPerson.get(); if (oldestFriend.isPresent) { ...oldestFriend.get() } else { ... }
  • 49. @crichardson Using Optionals - better Optional<Friend> oldestFriendship = ...; Friend whoToCall1 = oldestFriendship.orElse(mother); Avoid calling isPresent() and get() Friend whoToCall3 = oldestFriendship.orElseThrow( () -> new LonelyPersonException()); Friend whoToCall2 = oldestFriendship.orElseGet(() -> lazilyFindSomeoneElse());
  • 50. @crichardson Using Optional.map() public class Person { public Optional<Friend> longestFriendship() { return ...; } public Optional<Long> ageDifferenceWithOldestFriend() { Optional<Friend> oldestFriend = longestFriendship(); return oldestFriend.map ( of -> Math.abs(of.getPerson().getAge() - getAge())) ); } Eliminates messy conditional logic
  • 51. @crichardson Using flatMap() class Person public Optional<Friend> longestFriendship() {...} public Optional<Friend> longestFriendshipOfLongestFriend() { return longestFriendship() .flatMap(friend -> friend.getPerson().longestFriendship()); } not always a symmetric relationship. :-)
  • 52. @crichardson Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  • 53. @crichardson Let’s imagine you are performing a CPU intensive operation class Person .. public Set<Hometown> hometownsOfFriends() { return friends.stream() .map(f -> cpuIntensiveOperation()) .collect(Collectors.toSet()); }
  • 54. @crichardson class Person .. public Set<Hometown> hometownsOfFriends() { return friends.parallelStream() .map(f -> cpuIntensiveOperation()) .collect(Collectors.toSet()); } Parallel streams = simple concurrency Potentially uses N cores Nx speed up
  • 55. @crichardson Let’s imagine that you are writing code to display the products in a user’s wish list
  • 56. @crichardson The need for concurrency Step #1 Web service request to get the user profile including wish list (list of product Ids) Step #2 For each productId: web service request to get product info Sequentially terrible response time Need fetch productInfo concurrently
  • 57. @crichardson Futures are a great concurrency abstraction http://en.wikipedia.org/wiki/Futures_and_promises
  • 58. @crichardson Worker thread or event-driven Main thread How futures work Outcome Future Client get Asynchronous operation set initiates
  • 59. @crichardson Benefits Simple way for two concurrent activities to communicate safely Abstraction: Client does not know how the asynchronous operation is implemented Easy to implement scatter/gather: Scatter: Client can invoke multiple asynchronous operations and gets a Future for each one. Gather: Get values from the futures
  • 60. @crichardson Example wish list service public interface UserService { Future<UserProfile> getUserProfile(long userId); } public class UserServiceProxy implements UserService { private ExecutorService executorService; @Override public Future<UserProfile> getUserProfile(long userId) { return executorService.submit(() -> restfulGet("http://uservice/user/" + userId, UserProfile.class)); } ... } public interface ProductInfoService { Future<ProductInfo> getProductInfo(long productId); }
  • 61. @crichardson public class WishlistService { private UserService userService; private ProductInfoService productInfoService; public Wishlist getWishlistDetails(long userId) throws Exception { Future<UserProfile> userProfileFuture = userService.getUserProfile(userId); UserProfile userProfile = userProfileFuture.get(300, TimeUnit.MILLISECONDS); Example wish list service get user info List<Future<ProductInfo>> productInfoFutures = userProfile.getWishListProductIds().stream() .map(productInfoService::getProductInfo) .collect(Collectors.toList()); long deadline = System.currentTimeMillis() + 300; List<ProductInfo> products = new ArrayList<ProductInfo>(); for (Future<ProductInfo> pif : productInfoFutures) { long timeout = deadline - System.currentTimeMillis(); if (timeout <= 0) throw new TimeoutException(...); products.add(pif.get(timeout, TimeUnit.MILLISECONDS)); } ... return new Wishlist(products); } asynchronously get all products wait for product info
  • 62. @crichardson It works BUT Code is very low-level and messy And, it’s blocking
  • 63. @crichardson Better: Futures with callbacks no blocking! def asyncSquare(x : Int) : Future[Int] = ... x * x... val f = asyncSquare(25) Guava ListenableFutures, Spring 4 ListenableFuture Java 8 CompletableFuture, Scala Futures f onSuccess { case x : Int => println(x) } f onFailure { case e : Exception => println("exception thrown") } Partial function applied to successful outcome Applied to failed outcome
  • 64. @crichardson But callback-based scatter/gather Messy, tangled code (aka. callback hell)
  • 65. @crichardson Functional futures - map def asyncPlus(x : Int, y : Int) = ... x + y ... val future2 = asyncPlus(4, 5).map{ _ * 3 } assertEquals(27, Await.result(future2, 1 second)) Scala, Java 8 CompletableFuture Asynchronously transforms future
  • 66. @crichardson Functional futures - flatMap() val f2 = asyncPlus(5, 8).flatMap { x => asyncSquare(x) } assertEquals(169, Await.result(f2, 1 second)) Scala, Java 8 CompletableFuture (partially) Calls asyncSquare() with the eventual outcome of asyncPlus()
  • 67. @crichardson flatMap() is asynchronous Outcome3f3 Outcome3 f2 f2 = f1 flatMap (someFn) Outcome1 f1 Implemented using callbacks someFn(outcome1)
  • 68. @crichardson class WishListService(...) { def getWishList(userId : Long) : Future[WishList] = { userService.getUserProfile(userId) flatMap { userProfile => Scala wishlist service val futureOfProductsList : Future[List[ProductInfo]] = Future.sequence(listOfProductFutures) val timeoutFuture = ... Future.firstCompletedOf(Seq(wishlist, timeoutFuture)) } } val wishlist = futureOfProductsList.map { products => WishList(products) } val listOfProductFutures : List[Future[ProductInfo]] = userProfile.wishListProductIds .map { productInfoService.getProductInfo }
  • 69. @crichardson Using Java 8 CompletableFutures public class UserServiceImpl implements UserService { @Override public CompletableFuture<UserInfo> getUserInfo(long userId) { return CompletableFuture.supplyAsync( () -> httpGetRequest("http://myuservice/user" + userId, UserInfo.class)); } Runs in ExecutorService
  • 70. @crichardson Using Java 8 CompletableFutures public CompletableFuture<Wishlist> getWishlistDetails(long userId) { return userService.getUserProfile(userId).thenComposeAsync(userProfile -> { Stream<CompletableFuture<ProductInfo>> s1 = userProfile.getWishListProductIds() .stream() .map(productInfoService::getProductInfo); Stream<CompletableFuture<List<ProductInfo>>> s2 = s1.map(fOfPi -> fOfPi.thenApplyAsync(pi -> Arrays.asList(pi))); CompletableFuture<List<ProductInfo>> productInfos = s2 .reduce((f1, f2) -> f1.thenCombine(f2, ListUtils::union)) .orElse(CompletableFuture.completedFuture(Collections.emptyList())); return productInfos.thenApply(list -> new Wishlist()); }); } Java 8 is missing Future.sequence() flatMap()! map()!
  • 71. @crichardson Introducing Reactive Extensions (Rx) The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators. Using Rx, developers represent asynchronous data streams with Observables , query asynchronous data streams using LINQ operators , and ..... https://rx.codeplex.com/
  • 72. @crichardson About RxJava Reactive Extensions (Rx) for the JVM Original motivation for Netflix was to provide rich Futures Implemented in Java Adaptors for Scala, Groovy and Clojure https://github.com/Netflix/RxJava
  • 73. @crichardson RxJava core concepts trait Observable[T] { def subscribe(observer : Observer[T]) : Subscription ... } trait Observer[T] { def onNext(value : T) def onCompleted() def onError(e : Throwable) } Notifies An asynchronous stream of items Used to unsubscribe
  • 74. Comparing Observable to... Observer pattern - similar but adds Observer.onComplete() Observer.onError() Iterator pattern - mirror image Push rather than pull Futures - similar Can be used as Futures But Observables = a stream of multiple values Collections and Streams - similar Functional API supporting map(), flatMap(), ... But Observables are asynchronous
  • 75. @crichardson Fun with observables val every10Seconds = Observable.interval(10 seconds) -1 0 1 ... t=0 t=10 t=20 ... val oneItem = Observable.items(-1L) val ticker = oneItem ++ every10Seconds val subscription = ticker.subscribe { (value: Long) => println("value=" + value) } ... subscription.unsubscribe()
  • 76. @crichardson def getTableStatus(tableName: String) : Observable[DynamoDbStatus]= Observable { subscriber: Subscriber[DynamoDbMessage] => } Connecting observables to the outside world amazonDynamoDBAsyncClient.describeTableAsync( new DescribeTableRequest(tableName), new AsyncHandler[DescribeTableRequest, DescribeTableResult] { override def onSuccess(request: DescribeTableRequest, result: DescribeTableResult) = { subscriber.onNext(DynamoDbStatus(result.getTable.getTableStatus)) subscriber.onCompleted() } override def onError(exception: Exception) = exception match { case t: ResourceNotFoundException => subscriber.onNext(DynamoDbStatus("NOT_FOUND")) subscriber.onCompleted() case _ => subscriber.onError(exception) } }) }
  • 77. @crichardson Transforming observables val tableStatus = ticker.flatMap { i => logger.info("{}th describe table", i + 1) getTableStatus(name) } Status1 Status2 Status3 ... t=0 t=10 t=20 ... + Usual collection methods: map(), filter(), take(), drop(), ...
  • 78. @crichardson Calculating rolling average class AverageTradePriceCalculator { def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = { ... } case class Trade( symbol : String, price : Double, quantity : Int ... ) case class AveragePrice( symbol : String, price : Double, ...)
  • 79. @crichardson Calculating average pricesdef calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = { trades.groupBy(_.symbol).map { symbolAndTrades => val (symbol, tradesForSymbol) = symbolAndTrades val openingEverySecond = Observable.items(-1L) ++ Observable.interval(1 seconds) def closingAfterSixSeconds(opening: Any) = Observable.interval(6 seconds).take(1) tradesForSymbol.window(...).map { windowOfTradesForSymbol => windowOfTradesForSymbol.fold((0.0, 0, List[Double]())) { (soFar, trade) => val (sum, count, prices) = soFar (sum + trade.price, count + trade.quantity, trade.price +: prices) } map { x => val (sum, length, prices) = x AveragePrice(symbol, sum / length, prices) } }.flatten }.flatten }
  • 80. @crichardson Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  • 81. @crichardson Let’s imagine that you want to count word frequencies
  • 82. @crichardson Scala Word Count val frequency : Map[String, Int] = Source.fromFile("gettysburgaddress.txt").getLines() .flatMap { _.split(" ") }.toList frequency("THE") should be(11) frequency("LIBERTY") should be(1) .groupBy(identity) .mapValues(_.length)) Map Reduce
  • 83. @crichardson But how to scale to a cluster of machines?
  • 84. @crichardson Apache Hadoop Open-source software for reliable, scalable, distributed computing Hadoop Distributed File System (HDFS) Efficiently stores very large amounts of data Files are partitioned and replicated across multiple machines Hadoop MapReduce Batch processing system Provides plumbing for writing distributed jobs Handles failures ...
  • 85. @crichardson Overview of MapReduce Input Data Mapper Mapper Mapper Reducer Reducer Reducer Out put Data Shuffle (K,V) (K,V) (K,V) (K,V)* (K,V)* (K,V)* (K1,V, ....)* (K2,V, ....)* (K3,V, ....)* (K,V) (K,V) (K,V)
  • 86. @crichardson MapReduce Word count - mapperclass Map extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, one); } } } (“Four”, 1), (“score”, 1), (“and”, 1), (“seven”, 1), ... Four score and seven years http://wiki.apache.org/hadoop/WordCount
  • 87. @crichardson Hadoop then shuffles the key-value pairs...
  • 88. @crichardson MapReduce Word count - reducer class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterable<IntWritable> values, Context context) { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } } (“the”, 11) (“the”, (1, 1, 1, 1, 1, 1, ...)) http://wiki.apache.org/hadoop/WordCount
  • 89. @crichardson About MapReduce Very simple programming abstract yet incredibly powerful By chaining together multiple map/reduce jobs you can process very large amounts of data e.g. Apache Mahout for machine learning But Mappers and Reducers = verbose code Development is challenging, e.g. unit testing is difficult It’s disk-based, batch processing slow
  • 90. @crichardson Scalding: Scala DSL for MapReduce class WordCountJob(args : Args) extends Job(args) { TextLine( args("input") ) .flatMap('line -> 'word) { line : String => tokenize(line) } .groupBy('word) { _.size } .write( Tsv( args("output") ) ) def tokenize(text : String) : Array[String] = { text.toLowerCase.replaceAll("[^a-zA-Z0-9s]", "") .split("s+") } } https://github.com/twitter/scalding Expressive and unit testable Each row is a map of named fields
  • 91. @crichardson Apache Spark Part of the Hadoop ecosystem Key abstraction = Resilient Distributed Datasets (RDD) Collection that is partitioned across cluster members Operations are parallelized Created from either a Scala collection or a Hadoop supported datasource - HDFS, S3 etc Can be cached in-memory for super-fast performance Can be replicated for fault-tolerance http://spark.apache.org
  • 92. @crichardson Spark Word Count val sc = new SparkContext(...) sc.textFile(“s3n://mybucket/...”) .flatMap { _.split(" ")} .groupBy(identity) .mapValues(_.length) .toArray.toMap } } Expressive, unit testable and very fast
  • 93. @crichardson Summary Functional programming enables the elegant expression of good ideas in a wide variety of domains map(), flatMap() and reduce() are remarkably versatile higher-order functions Use FP and OOP together Java 8 has taken a good first step towards supporting FP
  • 94. @crichardson Questions? @crichardson chris@chrisrichardson.net http://plainoldobjects.com