Map(), flatMap() and reduce() are your new best friends:
Simpler collections, concurrency, and big data
Chris Richardson 
Author of POJOs in Action 
Founder of the original CloudFoundry.com 
@crichardson 
chris@chrisrichardson.net 
http://plainoldobjects.com
Presentation goal
How functional programming simplifies your code
Show that 
map(), flatMap() and reduce() 
are remarkably versatile functions
About Chris 
Founder of a buzzword compliant (stealthy, social, mobile, 
big data, machine learning, ...) startup 
Consultant helping organizations improve how they 
architect and deploy applications using cloud, micro 
services, polyglot applications, NoSQL, ...
Agenda 
Why functional programming? 
Simplifying collection processing 
Eliminating NullPointerExceptions 
Simplifying concurrency with Futures and Rx Observables 
Tackling big data problems with functional programming
Functional programming is a programming paradigm
Functions are the building blocks of the 
application 
Best done in a functional 
programming language
Functions as first class 
citizens 
Assign functions to variables 
Store functions in fields 
Use and write higher-order functions: 
Take functions as parameters 
Return functions as values
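A minimal Java 8 sketch of the points above. The class and method names (FirstClassFunctions, applyTwice, adder) are made up for illustration and are not from the talk.

```java
import java.util.function.Function;

final class FirstClassFunctions {
    // A function stored in a field.
    static final Function<Integer, Integer> SQUARE = x -> x * x;

    // Higher-order function: takes a function as a parameter and applies it twice.
    static <T> T applyTwice(Function<T, T> f, T value) {
        return f.apply(f.apply(value));
    }

    // Higher-order function: returns a new function as its value.
    static Function<Integer, Integer> adder(int n) {
        return x -> x + n;
    }

    public static void main(String[] args) {
        // A function assigned to a local variable.
        Function<Integer, Integer> addTen = adder(10);
        System.out.println(applyTwice(SQUARE, 3));           // prints 81
        System.out.println(addTen.andThen(SQUARE).apply(2)); // prints 144
    }
}
```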
Avoids mutable state 
Use: 
Immutable data structures 
Single assignment variables 
Some functional languages such as Haskell don’t allow 
side-effects
Why functional programming?
"the highest goal of programming-language design to enable good ideas to be elegantly expressed"
http://en.wikipedia.org/wiki/Tony_Hoare
Why functional programming? 
More expressive 
More concise 
More intuitive - solution matches problem definition 
Functional code is usually much more composable 
Immutable state: 
Less error-prone 
Easy parallelization and concurrency 
But be pragmatic
An ancient idea that has 
recently become popular
Mathematical foundation: 
λ-calculus 
Introduced by 
Alonzo Church in the 1930s
Lisp = an early functional
language invented in 1958
http://en.wikipedia.org/wiki/Lisp_(programming_language)
[Timeline: 1940 to 2010]
garbage collection
dynamic typing
self-hosting compiler
tree data structures
(defun factorial (n) 
(if (<= n 1) 
1 
(* n (factorial (- n 1)))))
My final year project in 1985: 
Implementing SASL in LISP 
Filter out multiples of p 
sieve (p:xs) = 
p : sieve [x | x <- xs, rem x p > 0]; 
primes = sieve [2..] 
A list of integers starting with 2
Mostly an Ivory Tower 
technology 
Lisp was used for AI 
FP languages: Miranda, 
ML, Haskell, ... 
“Side-effects 
kills kittens and 
puppies”
http://steve-yegge.blogspot.com/2010/12/haskell-researchers-announce-discovery.html 
But today FP is mainstream 
Clojure - a dialect of Lisp 
Scala - a hybrid OO/functional language
F# - a hybrid OO/FP language for .NET
Java 8 has lambda expressions
Java 8 lambda expressions 
are functions 
x -> x * x 
x -> { 
for (int i = 2; i <= Math.sqrt(x); i = i + 1) {
if (x % i == 0) 
return false; 
} 
return true; 
}; 
(x, y) -> x * x + y * y
Agenda 
Why functional programming? 
Simplifying collection processing 
Eliminating NullPointerExceptions 
Simplifying concurrency with Futures and Rx Observables 
Tackling big data problems with functional programming
Lots of application code
= 
collection processing: 
Mapping, filtering, and reducing
Social network example 
public class Person { 
enum Gender { MALE, FEMALE } 
private Name name; 
private LocalDate birthday; 
private Gender gender; 
private Hometown hometown; 
private Set<Friend> friends = new HashSet<Friend>(); 
.... 
public class Friend { 
private Person friend; 
private LocalDate becameFriends; 
... 
} 
public class SocialNetwork { 
private Set<Person> people; 
...
Mapping, filtering, and 
reducing 
public class Person { 
public Set<Hometown> hometownsOfFriends() { 
Set<Hometown> result = new HashSet<>(); 
for (Friend friend : friends) { 
result.add(friend.getPerson().getHometown()); 
} 
return result; 
} 
Declare result variable 
Modify result 
Return result 
Iterate
Mapping, filtering, and 
reducing 
public class SocialNetwork { 
private Set<Person> people; 
... 
public Set<Person> lonelyPeople() { 
Set<Person> result = new HashSet<Person>(); 
for (Person p : people) { 
if (p.getFriends().isEmpty()) 
result.add(p); 
} 
return result; 
} 
Declare result variable 
Modify result 
Return result 
Iterate
Iterate 
Mapping, filtering, and 
reducing 
public class SocialNetwork { 
private Set<Person> people; 
... 
public int averageNumberOfFriends() { 
int sum = 0; 
for (Person p : people) { 
sum += p.getFriends().size(); 
} 
return sum / people.size(); 
} 
Declare scalar result 
variable 
Modify result 
Return result
Problems with this style of 
programming 
Lots of verbose boilerplate - basic operations require 5+ 
LOC 
Imperative (how to do it) NOT declarative (what to do) 
Mutable variables are potentially error prone 
Difficult to parallelize
Java 8 streams to the rescue 
A sequence of elements 
“Wrapper” around a collection 
Streams are lazy, i.e. can be infinite 
Provides a functional/lambda-based API for transforming, 
filtering and aggregating elements 
Much simpler, cleaner and declarative code
Using Java 8 streams - 
mapping 
class Person .. 
private Set<Friend> friends = ...; 
public Set<Hometown> hometownsOfFriends() { 
return friends.stream() 
.map(f -> f.getPerson().getHometown()) 
.collect(Collectors.toSet()); 
} 
transforming 
lambda expression
The map() function 
s1 a b c d e ... 
s2 = s1.map(f) 
s2 f(a) f(b) f(c) f(d) f(e) ...
Using Java 8 streams - 
filtering 
public class SocialNetwork { 
private Set<Person> people; 
... 
public Set<Person> lonelyPeople() { 
return people.stream() 
.filter(p -> p.getFriends().isEmpty()) 
.collect(Collectors.toSet()); 
} 
predicate 
lambda expression
Using Java 8 streams - friend of friends V1
class Person .. 
public Set<Person> friendOfFriends() { 
Set<Set<Friend>> fof = friends.stream() 
.map(friend -> friend.getPerson().friends) 
.collect(Collectors.toSet()); 
... 
} 
Using map() 
=> Set of Sets :-( 
Somehow we need to flatten
Using Java 8 streams - 
mapping 
class Person .. 
public Set<Person> friendOfFriends() { 
return friends.stream() 
.flatMap(friend -> friend.getPerson().friends.stream()) 
.map(Friend::getPerson) 
.filter(person -> person != this) 
.collect(Collectors.toSet()); 
} 
maps and flattens
Chaining with flatMap() 
s1 a b ... 
s2 = s1.flatMap(f) 
s2 f(a)0 f(a)1 f(b)0 f(b)1 f(b)2 ...
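The diagram above can be tried out with a small, self-contained sketch. It assumes nothing from the talk's domain model: FlatMapExample and words() are illustrative names, and each input line plays the role of an element whose f(...) yields several results.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class FlatMapExample {
    // map() would produce a stream of arrays (one per line);
    // flatMap() concatenates the per-line streams into one flat stream.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(words(Arrays.asList("a b", "c d e"))); // prints [a, b, c, d, e]
    }
}
```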
Using Java 8 streams - 
reducing 
public class SocialNetwork { 
private Set<Person> people; 
... 
public long averageNumberOfFriends() { 
return people.stream() 
.map ( p -> p.getFriends().size() ) 
.reduce(0, (x, y) -> x + y) 
/ people.size(); 
}
Equivalent imperative loop for reduce(0, (x, y) -> x + y):
int x = 0;
for (int y : inputStream)
x = x + y;
return x;
The reduce() function 
s1 a b c d e ... 
x = s1.reduce(initial, f) 
f(f(f(f(f(f(initial, a), b), c), d), e), ...)
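A short sketch of the nested application shown above, next to the loop it replaces. ReduceExample and its method names are illustrative. Note that Java's Stream.reduce() expects the initial value to be an identity for f and f to be associative, which holds for the sum and product used here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

final class ReduceExample {
    // Stream version: conceptually f(f(f(initial, a), b), c) ...
    static int reduce(List<Integer> items, int initial, BinaryOperator<Integer> f) {
        return items.stream().reduce(initial, f);
    }

    // The same left fold written as an explicit loop with a mutable accumulator.
    static int reduceWithLoop(List<Integer> items, int initial, BinaryOperator<Integer> f) {
        int result = initial;
        for (int item : items) {
            result = f.apply(result, item);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4);
        System.out.println(reduce(items, 0, (x, y) -> x + y));         // prints 10
        System.out.println(reduceWithLoop(items, 1, (x, y) -> x * y)); // prints 24
    }
}
```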
Newton's method for 
calculating sqrt(x) 
It’s an iterative algorithm 
initial value = guess 
betterValue = value - (value * value - x) / (2 * value) 
Iterate until |value - betterValue| < precision
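The same algorithm can be sketched with Java 8 streams. This is an illustration, not code from the talk: Java 8 streams have no sliding(), so the sketch iterates over (value, betterValue) pairs instead. NewtonSquareRoot and improve() are made-up names, x / 2 is used as the initial guess, and x is assumed to be positive.

```java
import java.util.stream.Stream;

final class NewtonSquareRoot {
    // One Newton step: betterValue = value - (value * value - x) / (2 * value)
    static double improve(double value, double x) {
        return value - (value * value - x) / (2 * value);
    }

    // Lazily iterates over (value, betterValue) pairs and stops at the first
    // pair whose members differ by less than the requested precision.
    static double squareRoot(double x, double precision) {
        double guess = x / 2;
        return Stream.iterate(new double[] {guess, improve(guess, x)},
                        pair -> new double[] {pair[1], improve(pair[1], x)})
                .filter(pair -> Math.abs(pair[0] - pair[1]) < precision)
                .findFirst()
                .get()[1];
    }

    public static void main(String[] args) {
        System.out.println(squareRoot(2.0, 1e-9));
    }
}
```

The stream is infinite, but findFirst() short-circuits, so only as many steps are computed as convergence requires.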
Functional square root in Scala
package net.chrisrichardson.fp.scala.squareroot
object SquareRootCalculator {
  def squareRoot(x: Double, precision: Double) : Double =
    // Stream.iterate creates an infinite stream: seed, f(seed), f(f(seed)), .....
    Stream.iterate(x / 2)(
      value => value - (value * value - x) / (2 * value) ).
      // a, b, c, ... => (a, b), (b, c), (c, ...), ...
      sliding(2).map( s => (s.head, s.last)).
      // Find the first convergent approximation
      find { case (value , newValue) =>
        Math.abs(value - newValue) < precision}.
      get._2
}
Adopting FP with Java 8 is 
straightforward 
Switch your application to Java 8 
Start using streams and lambdas 
Eclipse can refactor anonymous inner 
classes to lambdas 
Or write modules in Scala: more 
expressive and runs on older JVMs
Agenda 
Why functional programming? 
Simplifying collection processing 
Eliminating NullPointerExceptions 
Simplifying concurrency with Futures and Rx Observables 
Tackling big data problems with functional programming
Tony’s $1B mistake 
“I call it my billion-dollar mistake. 
It was the invention of the null 
reference in 1965....But I couldn't 
resist the temptation to put in a 
null reference, simply because it 
was so easy to implement...” 
http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake
Coding with null pointers
Return null if no friends
class Person 
public Friend longestFriendship() { 
Friend result = null; 
for (Friend friend : friends) { 
if (result == null || 
friend.getBecameFriends() 
.isBefore(result.getBecameFriends())) 
result = friend; 
} 
return result; 
} 
Friend oldestFriend = person.longestFriendship(); 
if (oldestFriend != null) { 
... 
} else { 
... 
} 
Null check is essential yet 
easily forgotten
Java 8 Optional<T> 
A wrapper for nullable references 
It has two states: 
empty ⇒ throws an exception if you try to get the reference 
non-empty ⇒ contains a non-null reference
Provides methods for: testing whether it has a value, getting the 
value, ... 
Use an Optional<T> parameter if caller can pass in null 
Return reference wrapped in an instance of this type instead of null 
Uses the type system to explicitly represent 
nullability
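A small sketch of the two states described above. The lookup method findNickname() and the sample data are hypothetical, and only standard java.util.Optional calls are used.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

final class OptionalStates {
    // Hypothetical lookup: Map.get() may return null, so the result is wrapped
    // at the boundary and the return type makes the "no value" case explicit.
    static Optional<String> findNickname(Map<String, String> nicknames, String name) {
        return Optional.ofNullable(nicknames.get(name));
    }

    public static void main(String[] args) {
        Map<String, String> nicknames = Collections.singletonMap("Christopher", "Chris");

        Optional<String> present = findNickname(nicknames, "Christopher");
        Optional<String> empty = findNickname(nicknames, "Tony");

        System.out.println(present.isPresent()); // prints true
        System.out.println(empty.isPresent());   // prints false
        try {
            empty.get(); // the empty state throws instead of handing back null
        } catch (NoSuchElementException e) {
            System.out.println("empty Optional: get() threw NoSuchElementException");
        }
    }
}
```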
Coding with optionals 
class Person 
public Optional<Friend> longestFriendship() { 
Friend result = null; 
for (Friend friend : friends) { 
if (result == null || 
friend.getBecameFriends().isBefore(result.getBecameFriends())) 
result = friend; 
} 
return Optional.ofNullable(result); 
} 
Optional<Friend> oldestFriend = person.longestFriendship(); 
// Might throw java.util.NoSuchElementException: No value present
// Friend dangerous = oldestFriend.get();
if (oldestFriend.isPresent()) {
...oldestFriend.get() 
} else { 
... 
}
Using Optionals - better
Optional<Friend> oldestFriendship = ...;
Friend whoToCall1 = oldestFriendship.orElse(mother);
Friend whoToCall2 =
oldestFriendship.orElseGet(() -> lazilyFindSomeoneElse());
Friend whoToCall3 =
oldestFriendship.orElseThrow(
() -> new LonelyPersonException());
Avoid calling isPresent() and get()
Transforming with map() 
public class Person { 
public Optional<Friend> longestFriendship() { 
return ...; 
} 
public Optional<Long> ageDifferenceWithOldestFriend() { 
Optional<Friend> oldestFriend = longestFriendship(); 
return oldestFriend.map ( of ->
Math.abs(of.getPerson().getAge() - getAge()) );
} 
Eliminates messy conditional logic
Chaining with flatMap() 
class Person 
public Optional<Friend> longestFriendship() {...} 
public Optional<Friend> longestFriendshipOfLongestFriend() { 
return 
longestFriendship() 
.flatMap(friend -> 
friend.getPerson().longestFriendship()); 
} 
not always a symmetric 
relationship. :-)
Agenda 
Why functional programming? 
Simplifying collection processing 
Eliminating NullPointerExceptions 
Simplifying concurrency with Futures and Rx Observables 
Tackling big data problems with functional programming
Let’s imagine you are performing a CPU intensive operation
class Person .. 
public Set<Hometown> hometownsOfFriends() { 
return friends.stream() 
.map(f -> cpuIntensiveOperation(f)) 
.collect(Collectors.toSet()); 
}
Parallel streams = simple concurrency
Potentially uses N cores
class Person .. 
public Set<Hometown> hometownsOfFriends() { 
return friends.parallelStream() 
.map(f -> cpuIntensiveOperation(f)) 
.collect(Collectors.toSet()); 
} 
⇒ 
Nx speed up 
Perhaps this will be faster. 
Perhaps not
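A self-contained way to experiment with this. The naive primality test below is only a stand-in for the slide's cpuIntensiveOperation(), and ParallelStreamExample and countPrimes() are illustrative names. Both variants must return the same answer; whether the parallel one is faster depends on the workload and the machine, so it should be measured.

```java
import java.util.stream.LongStream;

final class ParallelStreamExample {
    // Stand-in for a CPU intensive operation: naive trial-division primality test.
    static boolean isPrime(long n) {
        if (n < 2) {
            return false;
        }
        for (long i = 2; i * i <= n; i++) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }

    // Counts primes in [1, limit], sequentially or on the common fork/join pool.
    static long countPrimes(long limit, boolean parallel) {
        LongStream range = LongStream.rangeClosed(1, limit);
        return (parallel ? range.parallel() : range)
                .filter(ParallelStreamExample::isPrime)
                .count();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100, false)); // prints 25
        System.out.println(countPrimes(100, true));  // prints 25
    }
}
```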
Let’s imagine that you are 
writing code to display the 
products in a user’s wish list 
The need for concurrency 
Step #1 
Web service request to get the user profile including wish 
list (list of product Ids) 
Step #2 
For each productId: web service request to get product info 
Sequentially ⇒ terrible response time 
Need to fetch productInfo concurrently
Composing sequential + scatter/gather-style 
operations is very common
Futures are a great 
concurrency abstraction 
http://en.wikipedia.org/wiki/Futures_and_promises
Composition with futures
[Diagram: a client on the main thread initiates asynchronous operations 1 and 2, which run on a worker thread or in event-driven code. Each operation sets its outcome on a future (Future 1, Future 2), and the client gets the outcome from that future.]
Benefits 
Simple way for multiple concurrent activities to communicate 
safely 
Abstraction: 
Client does not know how the asynchronous operation is 
implemented, e.g. thread pool, event-driven, .... 
Easy to implement scatter/gather: 
Scatter: Client can invoke multiple asynchronous operations 
and gets a Future for each one. 
Gather: Get values from the futures
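Scatter/gather can be sketched with Java 8's CompletableFuture. getProductInfo() here is a hypothetical stand-in for a web service call that just builds a string on a pool thread. The gather step uses the blocking join(), which keeps the sketch short but ties up the calling thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class ScatterGather {
    // Hypothetical stand-in for a remote call; runs on the common fork/join pool.
    static CompletableFuture<String> getProductInfo(long productId) {
        return CompletableFuture.supplyAsync(() -> "product-" + productId);
    }

    static List<String> fetchAll(List<Long> productIds) {
        // Scatter: start every request and keep a future for each one.
        List<CompletableFuture<String>> futures = productIds.stream()
                .map(ScatterGather::getProductInfo)
                .collect(Collectors.toList());
        // Gather: join() blocks until each outcome is available.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(Arrays.asList(1L, 2L, 3L))); // prints [product-1, product-2, product-3]
    }
}
```

Collecting the futures into a list before joining matters: joining inside the first stream pipeline would start and wait for each request one at a time.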
But composition with basic 
futures is difficult 
Java 7 future.get([timeout]): 
Blocking API ⇒ client blocks thread ⇒ poor scalability 
Difficult to compose multiple concurrent operations 
Futures with callbacks: 
e.g. Guava ListenableFutures, Spring 4 ListenableFuture 
Attach callbacks to all futures and asynchronously consume outcomes 
But callback-based code = messy code 
See http://techblog.netflix.com/2013/02/rxjava-netflix-api.html 
We need functional futures!
Functional futures - Scala, Java 8
CompletableFuture
def asyncPlus(x : Int, y :Int): Future[Int] = ... x + y ...
// map() asynchronously transforms the future
val future2 = asyncPlus(4, 5).map{ _ * 3 }
assertEquals(27, Await.result(future2, 1 second))
def asyncSquare(x : Int) : Future[Int] = ... x * x ...
// flatMap() calls asyncSquare() with the eventual outcome of asyncPlus(), i.e. chaining
val f2 = asyncPlus(5, 8).flatMap { x => asyncSquare(x) }
assertEquals(169, Await.result(f2, 1 second))
map() etc are asynchronous
f2 = f1 map (someFn)
[Diagram: when f1 completes with outcome1, f2 completes with outcome2 = someFn(outcome1)]
Implemented using callbacks
Scala wish list service 
class WishListService(...) {
  def getWishList(userId : Long) : Future[WishList] = {
    userService.getUserProfile(userId).                       // Future[UserProfile]
      map { userProfile => userProfile.wishListProductIds}.   // Future[List[Long]]
      flatMap { productIds =>
        val listOfProductFutures =                            // List[Future[ProductInfo]]
          productIds map productInfoService.getProductInfo
        Future.sequence(listOfProductFutures)                 // Future[List[ProductInfo]]
      }.
      map { products => WishList(products) }                  // Future[WishList]
  }
}
Using Java 8 CompletableFutures
thenComposeAsync() = flatMap()!
thenApply() = map()!
public CompletableFuture<Wishlist> getWishlistDetails(long userId) { 
return userService.getUserProfile(userId).thenComposeAsync(userProfile -> { 
Stream<CompletableFuture<ProductInfo>> s1 = 
userProfile.getWishListProductIds() 
.stream() 
.map(productInfoService::getProductInfo); 
Stream<CompletableFuture<List<ProductInfo>>> s2 = 
s1.map(fOfPi -> fOfPi.thenApplyAsync(pi -> Arrays.asList(pi))); 
CompletableFuture<List<ProductInfo>> productInfos = s2 
.reduce((f1, f2) -> f1.thenCombine(f2, ListUtils::union)) 
.orElse(CompletableFuture.completedFuture(Collections.emptyList())); 
return productInfos.thenApply(list -> new Wishlist()); 
}); 
} 
Java 8 is missing Future.sequence()
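One common way to fill that gap is a small helper built on CompletableFuture.allOf(). This is a sketch, not part of the JDK or of the talk's code, and the names Futures and sequence() are assumptions chosen to mirror Scala's Future.sequence.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class Futures {
    // Turns a list of futures into a future of a list, preserving order.
    // The result completes when every input has completed; if any input
    // fails, the result completes exceptionally.
    static <T> CompletableFuture<List<T>> sequence(List<CompletableFuture<T>> futures) {
        CompletableFuture<Void> allDone =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        // By the time this callback runs every future is complete, so join() does not block.
        return allDone.thenApply(ignored -> futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<CompletableFuture<Integer>> futures = Arrays.asList(
                CompletableFuture.supplyAsync(() -> 1),
                CompletableFuture.supplyAsync(() -> 2));
        System.out.println(sequence(futures).join()); // prints [1, 2]
    }
}
```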
Your mouse is your database
Erik Meijer
http://queue.acm.org/detail.cfm?id=2169076
Introducing Reactive 
Extensions (Rx) 
The Reactive Extensions (Rx) is a library for composing 
asynchronous and event-based programs .... 
Using Rx, developers represent asynchronous data 
streams with Observables , query asynchronous 
data streams using LINQ operators , and ..... 
https://rx.codeplex.com/
About RxJava 
Reactive Extensions (Rx) for the JVM 
Developed by Netflix 
Original motivation was to provide rich, functional Futures 
Implemented in Java 
Adaptors for Scala, Groovy and Clojure 
Embraced by Akka and Spring Reactor: http://www.reactive-streams.org/
https://github.com/Netflix/RxJava
RxJava core concepts
An Observable is an asynchronous stream of items:
trait Observable[T] {
  def subscribe(observer : Observer[T]) : Subscription  // Subscription is used to unsubscribe
  ...
}
The Observable notifies its Observer:
trait Observer[T] {
  def onNext(value : T)
  def onCompleted()
  def onError(e : Throwable)
}
Comparing Observable to... 
Observer pattern - similar but 
adds 
Observer.onCompleted()
Observer.onError() 
Iterator pattern - mirror image 
Push rather than pull 
Futures - similar 
Can be used as Futures 
But Observables = a stream 
of multiple values 
Collections and Streams - 
similar 
Functional API supporting 
map(), flatMap(), ... 
But Observables are 
asynchronous
Fun with observables
val oneItem = Observable.items(-1L)
val every10Seconds = Observable.interval(10 seconds)
val ticker = oneItem ++ every10Seconds
-1 0 1 ...
t=0 t=10 t=20 ...
val subscription = ticker.subscribe { (value: Long) => println("value=" + value) }
...
subscription.unsubscribe()
Observables as the result of an 
asynchronous operation 
def getTableStatus(tableName: String) : Observable[DynamoDbStatus]= 
Observable { subscriber: Subscriber[DynamoDbStatus] =>
amazonDynamoDBAsyncClient.describeTableAsync(
new DescribeTableRequest(tableName), 
new AsyncHandler[DescribeTableRequest, DescribeTableResult] { 
override def onSuccess(request: DescribeTableRequest, 
result: DescribeTableResult) = { 
subscriber.onNext(DynamoDbStatus(result.getTable.getTableStatus)) 
subscriber.onCompleted() 
} 
override def onError(exception: Exception) = exception match { 
case t: ResourceNotFoundException => 
subscriber.onNext(DynamoDbStatus("NOT_FOUND")) 
subscriber.onCompleted() 
case _ => 
subscriber.onError(exception) 
} 
}) 
}
Transforming/chaining 
observables with flatMap() 
val tableStatus = ticker.flatMap { i => 
logger.info("{}th describe table", i + 1) 
getTableStatus(name) 
} 
Status1 Status2 Status3 ... 
t=0 t=10 t=20 ... 
+ Usual collection methods: map(), filter(), take(), drop(), ...
Calculating rolling average 
class AverageTradePriceCalculator { 
def calculateAverages(trades: Observable[Trade]): 
Observable[AveragePrice] = { 
... 
} 
case class Trade( 
symbol : String, 
price : Double, 
quantity : Int 
... 
) 
case class AveragePrice( 
symbol : String, 
price : Double, 
...)
Calculating average prices 
def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = { 
trades.groupBy(_.symbol).map { symbolAndTrades => 
val (symbol, tradesForSymbol) = symbolAndTrades 
val openingEverySecond = 
Observable.items(-1L) ++ Observable.interval(1 seconds) 
def closingAfterSixSeconds(opening: Any) = 
Observable.interval(6 seconds).take(1) 
tradesForSymbol.window(...).map { 
windowOfTradesForSymbol => 
windowOfTradesForSymbol.fold((0.0, 0, List[Double]())) { (soFar, trade) => 
val (sum, count, prices) = soFar 
(sum + trade.price, count + trade.quantity, trade.price +: prices) 
} map { x => 
val (sum, length, prices) = x 
AveragePrice(symbol, sum / length, prices) 
} 
}.flatten 
}.flatten 
}
Agenda 
Why functional programming? 
Simplifying collection processing 
Eliminating NullPointerExceptions 
Simplifying concurrency with Futures and Rx Observables 
Tackling big data problems with functional programming
Let’s imagine that you want to count word frequencies
Scala Word Count 
val frequency : Map[String, Int] = 
Source.fromFile("gettysburgaddress.txt").getLines() 
.flatMap { _.split(" ") }.toList 
.groupBy(identity) 
.mapValues(_.length)
frequency("THE") should be(11) 
frequency("LIBERTY") should be(1) 
Map 
Reduce
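For comparison, roughly the same pipeline in Java 8 streams. This is a sketch that takes lines from an in-memory list instead of reading gettysburgaddress.txt. WordCount and frequency() are illustrative names, and groupingBy with counting() stands in for Scala's groupBy plus mapValues.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class WordCount {
    static Map<String, Long> frequency(List<String> lines) {
        return lines.stream()
                // Map: split every line into words and flatten into one stream.
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // Reduce: group equal words and count the members of each group.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> frequency = frequency(Arrays.asList("THE CAT AND THE HAT", "THE END"));
        System.out.println(frequency.get("THE")); // prints 3
        System.out.println(frequency.get("END")); // prints 1
    }
}
```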
But how to scale to a cluster of machines?
Apache Hadoop 
Open-source ecosystem for reliable, scalable, distributed computing 
Hadoop Distributed File System (HDFS) 
Efficiently stores very large amounts of data 
Files are partitioned and replicated across multiple machines 
Hadoop MapReduce 
Batch processing system 
Provides plumbing for writing distributed jobs 
Handles failures 
And, much, much more...
Overview of MapReduce 
[Diagram: Input Data is split into (K,V) pairs and fed to three Mappers; each Mapper emits (K,V)* pairs; the Shuffle groups values by key into (K1,V, ....)*, (K2,V, ....)*, (K3,V, ....)*; three Reducers each emit (K,V) pairs as Output Data.]
http://wiki.apache.org/hadoop/WordCount 
MapReduce Word count - 
mapper 
class Map extends Mapper<LongWritable, Text, Text, IntWritable> { 
private final static IntWritable one = new IntWritable(1); 
private Text word = new Text(); 
public void map(LongWritable key, Text value, Context context) { 
String line = value.toString(); 
StringTokenizer tokenizer = new StringTokenizer(line); 
while (tokenizer.hasMoreTokens()) { 
word.set(tokenizer.nextToken()); 
context.write(word, one); 
} 
} 
} 
Four score and seven years 
⇒ 
(“Four”, 1), (“score”, 1), (“and”, 1), (“seven”, 1), ...
Hadoop then shuffles the 
key-value pairs...
MapReduce Word count - 
reducer 
class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { 
public void reduce(Text key, 
Iterable<IntWritable> values, Context context) { 
int sum = 0; 
for (IntWritable val : values) { 
sum += val.get(); 
} 
context.write(key, new IntWritable(sum)); 
} 
} 
(“the”, (1, 1, 1, 1, 1, 1, ...)) 
⇒ 
(“the”, 11) 
http://wiki.apache.org/hadoop/WordCount
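To see how the mapper, the shuffle and the reducer fit together, the three phases can be simulated in memory with plain Java. This sketch uses no Hadoop APIs at all: InMemoryMapReduce and its methods are illustrative names, and a real job would distribute each phase across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class InMemoryMapReduce {
    // Map phase: one input line in, a list of (word, 1) pairs out.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), key -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: one key and all of its values in, one total out.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = lines.stream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("the cat", "the dog"))); // prints {cat=1, dog=1, the=2}
    }
}
```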
About MapReduce 
Very simple programming abstraction yet incredibly powerful 
By chaining together multiple map/reduce jobs you can process 
very large amounts of data in interesting ways 
e.g. Apache Mahout for machine learning 
But 
Mappers and Reducers = verbose code 
Development is challenging, e.g. unit testing is difficult 
It’s disk-based, batch processing ⇒ slow
Scalding: Scala DSL for
MapReduce
class WordCountJob(args : Args) extends Job(args) {
  TextLine( args("input") )
    .flatMap('line -> 'word) { line : String => tokenize(line) }
    .groupBy('word) { _.size }
    .write( Tsv( args("output") ) )
  def tokenize(text : String) : Array[String] = {
    text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "")
      .split("\\s+")
  }
}
Each row is a map of named fields
Expressive and unit testable
https://github.com/twitter/scalding
Apache Spark 
Created at UC Berkeley and now part of the Hadoop ecosystem 
Key abstraction = Resilient Distributed Datasets (RDD) 
Collection that is partitioned across cluster members 
Operations are parallelized 
Created from either a collection or a Hadoop supported datasource - 
HDFS, S3 etc 
Can be cached in-memory for super-fast performance 
Can be replicated for fault-tolerance 
Scala, Java, and Python APIs 
http://spark.apache.org
Spark Word Count
val sc = new SparkContext(...)
sc.textFile("s3n://mybucket/...")
.flatMap { _.split(" ")}
.groupBy(identity)
.mapValues(_.length)
.toArray.toMap
Very similar to Scala collection code!!
Expressive, unit testable and very fast
Summary 
Functional programming enables the elegant expression of 
good ideas in a wide variety of domains 
map(), flatMap() and reduce() are remarkably versatile 
higher-order functions 
Use FP and OOP together 
Java 8 has taken a good first step towards supporting FP 
Go write some functional code!
@crichardson chris@chrisrichardson.net 
Questions? 
http://plainoldobjects.com

Map, flatmap and reduce are your new best friends (javaone, svcc)

  • 1.
    Map(), flatMap() andreduce() are your @crichardson new best friends: Simpler collections, concurrency, and big data Chris Richardson Author of POJOs in Action Founder of the original CloudFoundry.com @crichardson chris@chrisrichardson.net http://plainoldobjects.com
  • 2.
    Presentation goal Howfunctional programming simplifies @crichardson your code Show that map(), flatMap() and reduce() are remarkably versatile functions
  • 3.
  • 4.
    @crichardson About Chris Founder of a buzzword compliant (stealthy, social, mobile, big data, machine learning, ...) startup Consultant helping organizations improve how they architect and deploy applications using cloud, micro services, polyglot applications, NoSQL, ...
  • 5.
    @crichardson Agenda Whyfunctional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  • 6.
    Functional programming isa programming @crichardson paradigm Functions are the building blocks of the application Best done in a functional programming language
  • 7.
    @crichardson Functions asfirst class citizens Assign functions to variables Store functions in fields Use and write higher-order functions: Take functions as parameters Return functions as values
  • 8.
    @crichardson Avoids mutablestate Use: Immutable data structures Single assignment variables Some functional languages such as Haskell don’t allow side-effects
  • 9.
    Why functional programming? "the highest goal of programming-language design to enable good ideas to be elegantly @crichardson expressed" http://en.wikipedia.org/wiki/Tony_Hoare
  • 10.
    Why functional programming? @crichardson More expressive More concise More intuitive - solution matches problem definition Functional code is usually much more composable Immutable state: Less error-prone Easy parallelization and concurrency But be pragmatic
  • 11.
    @crichardson An ancientidea that has recently become popular
  • 12.
    @crichardson Mathematical foundation: λ-calculus Introduced by Alonzo Church in the 1930s
  • 13.
    @crichardson Lisp =an early functional language invented in 1958 http://en.wikipedia.org/wiki/ Lisp_(programming_language) 2010 2000 1990 1980 1970 1960 1950 1940 garbage collection dynamic typing self-hosting compiler tree data structures (defun factorial (n) (if (<= n 1) 1 (* n (factorial (- n 1)))))
  • 14.
    My final yearproject in 1985: Implementing SASL in LISP Filter out multiples of p sieve (p:xs) = p : sieve [x | x <- xs, rem x p > 0]; primes = sieve [2..] @crichardson A list of integers starting with 2
  • 15.
    Mostly an IvoryTower technology Lisp was used for AI FP languages: Miranda, ML, Haskell, ... “Side-effects kills kittens and puppies”
  • 16.
  • 17.
    But today FPis mainstream @crichardson Clojure - a dialect of Lisp A hybrid OO/functional language A hybrid OO/FP language for .NET Java 8 has lambda expressions
  • 18.
    @crichardson Java 8lambda expressions are functions x -> x * x x -> { for (int i = 2; i < Math.sqrt(x); i = i + 1) { if (x % i == 0) return false; } return true; }; (x, y) -> x * x + y * y
  • 19.
    @crichardson Agenda Whyfunctional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  • 20.
    @crichardson Lot’s ofapplication code = collection processing: Mapping, filtering, and reducing
  • 21.
    @crichardson Social networkexample public class Person { enum Gender { MALE, FEMALE } private Name name; private LocalDate birthday; private Gender gender; private Hometown hometown; private Set<Friend> friends = new HashSet<Friend>(); .... public class Friend { private Person friend; private LocalDate becameFriends; ... } public class SocialNetwork { private Set<Person> people; ...
  • 22.
    @crichardson Mapping, filtering,and reducing public class Person { public Set<Hometown> hometownsOfFriends() { Set<Hometown> result = new HashSet<>(); for (Friend friend : friends) { result.add(friend.getPerson().getHometown()); } return result; } Declare result variable Modify result Return result Iterate
  • 23.
    @crichardson Mapping, filtering,and reducing public class SocialNetwork { private Set<Person> people; ... public Set<Person> lonelyPeople() { Set<Person> result = new HashSet<Person>(); for (Person p : people) { if (p.getFriends().isEmpty()) result.add(p); } return result; } Declare result variable Modify result Return result Iterate
  • 24.
    Iterate @crichardson Mapping,filtering, and reducing public class SocialNetwork { private Set<Person> people; ... public int averageNumberOfFriends() { int sum = 0; for (Person p : people) { sum += p.getFriends().size(); } return sum / people.size(); } Declare scalar result variable Modify result Return result
  • 25.
    @crichardson Problems withthis style of programming Lots of verbose boilerplate - basic operations require 5+ LOC Imperative (how to do it) NOT declarative (what to do) Mutable variables are potentially error prone Difficult to parallelize
  • 26.
    Java 8 streamsto the rescue A sequence of elements “Wrapper” around a collection Streams are lazy, i.e. can be infinite Provides a functional/lambda-based API for transforming, filtering and aggregating elements Much simpler, cleaner and @crichardson declarative code
  • 27.
    @crichardson Using Java8 streams - mapping class Person .. private Set<Friend> friends = ...; public Set<Hometown> hometownsOfFriends() { return friends.stream() .map(f -> f.getPerson().getHometown()) .collect(Collectors.toSet()); } transforming lambda expression
  • 28.
    @crichardson The map()function s1 a b c d e ... s2 = s1.map(f) s2 f(a) f(b) f(c) f(d) f(e) ...
  • 29.
    @crichardson Using Java8 streams - filtering public class SocialNetwork { private Set<Person> people; ... public Set<Person> lonelyPeople() { return people.stream() .filter(p -> p.getFriends().isEmpty()) .collect(Collectors.toSet()); } predicate lambda expression
  • 30.
    Using Java 8streams - friend of friends V1 @crichardson class Person .. public Set<Person> friendOfFriends() { Set<Set<Friend>> fof = friends.stream() .map(friend -> friend.getPerson().friends) .collect(Collectors.toSet()); ... } Using map() => Set of Sets :-( Somehow we need to flatten
  • 31.
    @crichardson Using Java8 streams - mapping class Person .. public Set<Person> friendOfFriends() { return friends.stream() .flatMap(friend -> friend.getPerson().friends.stream()) .map(Friend::getPerson) .filter(person -> person != this) .collect(Collectors.toSet()); } maps and flattens
  • 32.
    @crichardson Chaining withflatMap() s1 a b ... s2 = s1.flatMap(f) s2 f(a)0 f(a)1 f(b)0 f(b)1 f(b)2 ...
  • 33.
    @crichardson Using Java8 streams - reducing public class SocialNetwork { private Set<Person> people; ... public long averageNumberOfFriends() { return people.stream() .map ( p -> p.getFriends().size() ) .reduce(0, (x, y) -> x + y) / people.size(); } int x = 0; for (int y : inputStream) x = x + y return x;
  • 34.
    @crichardson The reduce()function s1 a b c d e ... x = s1.reduce(initial, f) f(f(f(f(f(f(initial, a), b), c), d), e), ...)
  • 35.
    @crichardson Newton's methodfor calculating sqrt(x) It’s an iterative algorithm initial value = guess betterValue = value - (value * value - x) / (2 * value) Iterate until |value - betterValue| < precision
  • 36.
    Functional square rootin Scala Creates an infinite stream: seed, f(seed), f(f(seed)), ..... @crichardson package net.chrisrichardson.fp.scala.squareroot object SquareRootCalculator { def squareRoot(x: Double, precision: Double) : Double = Stream.iterate(x / 2)( value => value - (value * value - x) / (2 * value) ). sliding(2).map( s => (s.head, s.last)). find { case (value , newValue) => Math.abs(value - newValue) < precision}. get._2 } a, b, c, ... => (a, b), (b, c), (c, ...), ... Find the first convergent approximation
  • 37.
    @crichardson Adopting FPwith Java 8 is straightforward Switch your application to Java 8 Start using streams and lambdas Eclipse can refactor anonymous inner classes to lambdas Or write modules in Scala: more expressive and runs on older JVMs
  • 38.
    Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  • 39.
    Tony’s $1B mistake “I call it my billion-dollar mistake. It was the invention of the null reference in 1965....But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement...” http://qconlondon.com/london-2009/presentation/Null+References:+The+Billion+Dollar+Mistake
  • 40.
    Coding with null pointers class Person public Friend longestFriendship() { Friend result = null; for (Friend friend : friends) { if (result == null || friend.getBecameFriends() .isBefore(result.getBecameFriends())) result = friend; } return result; } Returns null if no friends Friend oldestFriend = person.longestFriendship(); if (oldestFriend != null) { ... } else { ... } Null check is essential yet easily forgotten
  • 41.
    Java 8 Optional<T> A wrapper for nullable references It has two states: empty ⇒ throws an exception if you try to get the reference; non-empty ⇒ contains a non-null reference Provides methods for: testing whether it has a value, getting the value, ... Use an Optional<T> parameter if caller can pass in null Return reference wrapped in an instance of this type instead of null Uses the type system to explicitly represent nullability
  • 42.
    Coding with optionals class Person public Optional<Friend> longestFriendship() { Friend result = null; for (Friend friend : friends) { if (result == null || friend.getBecameFriends().isBefore(result.getBecameFriends())) result = friend; } return Optional.ofNullable(result); } Optional<Friend> oldestFriend = person.longestFriendship(); // Might throw java.util.NoSuchElementException: No value present // Person dangerous = popularPerson.get(); if (oldestFriend.isPresent()) { ...oldestFriend.get() } else { ... }
  • 43.
    Using Optionals - better Optional<Friend> oldestFriendship = ...; Friend whoToCall1 = oldestFriendship.orElse(mother); Friend whoToCall2 = oldestFriendship.orElseGet(() -> lazilyFindSomeoneElse()); Friend whoToCall3 = oldestFriendship.orElseThrow( () -> new LonelyPersonException()); Avoid calling isPresent() and get()
  • 44.
    Transforming with map() public class Person { public Optional<Friend> longestFriendship() { return ...; } public Optional<Long> ageDifferenceWithOldestFriend() { Optional<Friend> oldestFriend = longestFriendship(); return oldestFriend.map ( of -> Math.abs(of.getPerson().getAge() - getAge()) ); } Eliminates messy conditional logic
  • 45.
    Chaining with flatMap() class Person public Optional<Friend> longestFriendship() {...} public Optional<Friend> longestFriendshipOfLongestFriend() { return longestFriendship() .flatMap(friend -> friend.getPerson().longestFriendship()); } not always a symmetric relationship. :-)
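The Optional slides above can be condensed into a small runnable sketch. It is a simplification under stated assumptions: the hypothetical `OptionalDemo.Person` stores its oldest friend directly as a nullable field instead of using the deck's `Friend` class and `getBecameFriends()` comparison.

```java
import java.util.Optional;

class OptionalDemo {
    static class Person {
        final String name;
        final int age;
        final Person oldestFriend; // null when the person has no friends

        Person(String name, int age, Person oldestFriend) {
            this.name = name;
            this.age = age;
            this.oldestFriend = oldestFriend;
        }

        // Wraps the nullable field so callers cannot forget the empty case.
        Optional<Person> longestFriendship() {
            return Optional.ofNullable(oldestFriend);
        }

        // map(): transforms the value if present, otherwise stays empty.
        Optional<Integer> ageDifferenceWithOldestFriend() {
            return longestFriendship().map(friend -> Math.abs(friend.age - age));
        }

        // flatMap(): chains a second Optional-returning call without nesting.
        Optional<Person> longestFriendshipOfLongestFriend() {
            return longestFriendship().flatMap(Person::longestFriendship);
        }
    }
}
```

A caller can finish the chain with `orElse(...)`, for example `person.longestFriendship().map(p -> p.name).orElse("nobody")`, without ever calling `isPresent()` or `get()`.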
  • 46.
    Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  • 47.
    Let’s imagine you are performing a CPU intensive operation class Person .. public Set<Hometown> hometownsOfFriends() { return friends.stream() .map(f -> cpuIntensiveOperation(f)) .collect(Collectors.toSet()); }
  • 48.
    Parallel streams = simple concurrency class Person .. public Set<Hometown> hometownsOfFriends() { return friends.parallelStream() .map(f -> cpuIntensiveOperation(f)) .collect(Collectors.toSet()); } Potentially uses N cores ⇒ Nx speed up Perhaps this will be faster. Perhaps not
  • 49.
    Let’s imagine that you are writing code to display the products in a user’s wish list
  • 50.
    The need for concurrency Step #1: Web service request to get the user profile including wish list (list of product Ids) Step #2: For each productId: web service request to get product info Sequentially ⇒ terrible response time Need to fetch productInfo concurrently Composing sequential + scatter/gather-style operations is very common
  • 51.
    Futures are a great concurrency abstraction http://en.wikipedia.org/wiki/Futures_and_promises
  • 52.
    Composition with futures [Diagram: on the main thread, the Client initiates Asynchronous operation 1 and Asynchronous operation 2, which run on a worker thread or in event-driven code; each operation sets the Outcome of its Future (Future 1, Future 2) and the Client gets it]
  • 53.
    Benefits Simple way for multiple concurrent activities to communicate safely Abstraction: Client does not know how the asynchronous operation is implemented, e.g. thread pool, event-driven, .... Easy to implement scatter/gather: Scatter: Client can invoke multiple asynchronous operations and gets a Future for each one. Gather: Get values from the futures
  • 54.
    But composition with basic futures is difficult Java 7 future.get([timeout]): Blocking API ⇒ client blocks thread ⇒ poor scalability Difficult to compose multiple concurrent operations Futures with callbacks: e.g. Guava ListenableFutures, Spring 4 ListenableFuture Attach callbacks to all futures and asynchronously consume outcomes But callback-based code = messy code See http://techblog.netflix.com/2013/02/rxjava-netflix-api.html We need functional futures!
  • 55.
    Functional futures - Scala, Java 8 CompletableFuture def asyncPlus(x : Int, y : Int): Future[Int] = ... x + y ... val future2 = asyncPlus(4, 5).map{ _ * 3 } assertEquals(27, Await.result(future2, 1 second)) map() asynchronously transforms the future def asyncSquare(x : Int) : Future[Int] = ... x * x ... val f2 = asyncPlus(5, 8).flatMap { x => asyncSquare(x) } assertEquals(169, Await.result(f2, 1 second)) flatMap() calls asyncSquare() with the eventual outcome of asyncPlus(), i.e. chaining
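The slide title also mentions Java 8's CompletableFuture, where `thenApply()` plays the role of map() and `thenCompose()` the role of flatMap(). A minimal sketch follows; the class name and the `supplyAsync`-based bodies of `asyncPlus` and `asyncSquare` are assumptions, since the slide elides how those operations are implemented.

```java
import java.util.concurrent.CompletableFuture;

class FunctionalFutures {
    // Hypothetical asynchronous operations running on the common fork/join pool.
    static CompletableFuture<Integer> asyncPlus(int x, int y) {
        return CompletableFuture.supplyAsync(() -> x + y);
    }

    static CompletableFuture<Integer> asyncSquare(int x) {
        return CompletableFuture.supplyAsync(() -> x * x);
    }

    // thenApply() corresponds to map(): transform the eventual outcome.
    static int mapExample() {
        return asyncPlus(4, 5).thenApply(sum -> sum * 3).join();
    }

    // thenCompose() corresponds to flatMap(): chain a second asynchronous call.
    static int flatMapExample() {
        return asyncPlus(5, 8).thenCompose(FunctionalFutures::asyncSquare).join();
    }
}
```

The `join()` calls block and are here only so the sketch can return plain values, much like `Await.result` in the Scala version.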
  • 56.
    map() etc are asynchronous f2 = f1 map (someFn) outcome2 = someFn(outcome1) Implemented using callbacks [Diagram: f1 holds Outcome1; f2 holds outcome2]
  • 57.
    Scala wishlist service class WishListService(...) { def getWishList(userId : Long) : Future[WishList] = { userService.getUserProfile(userId). map { userProfile => userProfile.wishListProductIds}. flatMap { productIds => val listOfProductFutures = productIds map productInfoService.getProductInfo Future.sequence(listOfProductFutures) }. map { products => WishList(products) } } } Types at each step: Future[UserProfile] => Future[List[Long]] => List[Future[ProductInfo]] => Future[List[ProductInfo]] => Future[WishList]
  • 58.
    Using Java 8 CompletableFutures public CompletableFuture<Wishlist> getWishlistDetails(long userId) { return userService.getUserProfile(userId).thenComposeAsync(userProfile -> { Stream<CompletableFuture<ProductInfo>> s1 = userProfile.getWishListProductIds() .stream() .map(productInfoService::getProductInfo); Stream<CompletableFuture<List<ProductInfo>>> s2 = s1.map(fOfPi -> fOfPi.thenApplyAsync(pi -> Arrays.asList(pi))); CompletableFuture<List<ProductInfo>> productInfos = s2 .reduce((f1, f2) -> f1.thenCombine(f2, ListUtils::union)) .orElse(CompletableFuture.completedFuture(Collections.emptyList())); return productInfos.thenApply(list -> new Wishlist()); }); } thenComposeAsync() is flatMap()! thenApply() is map()! Java 8 is missing Future.sequence()
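Since Java 8 has no equivalent of Scala's `Future.sequence()`, one common workaround is to build it on `CompletableFuture.allOf()`. This is a sketch of that alternative, not the approach on the slide, which reduces with `thenCombine`. The class name `FutureSequence` and the `getProductInfo` stand-in are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class FutureSequence {
    // Turns a list of futures into a future of a list, preserving order.
    static <T> CompletableFuture<List<T>> sequence(List<CompletableFuture<T>> futures) {
        CompletableFuture<Void> allDone =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        // Once allDone completes, every join() returns immediately.
        return allDone.thenApply(ignored ->
                futures.stream().map(CompletableFuture::join).collect(Collectors.toList()));
    }

    // Hypothetical stand-in for productInfoService.getProductInfo(productId).
    static CompletableFuture<String> getProductInfo(long productId) {
        return CompletableFuture.supplyAsync(() -> "product-" + productId);
    }

    // Scatter: start one request per id. Gather: sequence() the futures.
    static List<String> demo(List<Long> productIds) {
        List<CompletableFuture<String>> scattered = productIds.stream()
                .map(FutureSequence::getProductInfo)
                .collect(Collectors.toList());
        return sequence(scattered).join();
    }
}
```

If any input future fails, the future returned by `allOf()` completes exceptionally and so does the sequenced result.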
  • 59.
    Your mouse is your database Erik Meijer http://queue.acm.org/detail.cfm?id=2169076
  • 60.
    Introducing Reactive Extensions (Rx) The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs .... Using Rx, developers represent asynchronous data streams with Observables, query asynchronous data streams using LINQ operators, and ..... https://rx.codeplex.com/
  • 61.
    About RxJava Reactive Extensions (Rx) for the JVM Developed by Netflix Original motivation was to provide rich, functional Futures Implemented in Java Adaptors for Scala, Groovy and Clojure Embraced by Akka and Spring Reactor: http://www.reactive-streams.org/ https://github.com/Netflix/RxJava
  • 62.
    RxJava core concepts trait Observable[T] { def subscribe(observer : Observer[T]) : Subscription ... } trait Observer[T] { def onNext(value : T) def onCompleted() def onError(e : Throwable) } Observable: an asynchronous stream of items The Observable notifies the Observer Subscription: used to unsubscribe
  • 63.
    Comparing Observable to... Observer pattern - similar but adds Observer.onCompleted() and Observer.onError() Iterator pattern - mirror image: push rather than pull Futures - similar: can be used as Futures, but Observables = a stream of multiple values Collections and Streams - similar: functional API supporting map(), flatMap(), ..., but Observables are asynchronous
  • 64.
    Fun with observables val oneItem = Observable.items(-1L) val every10Seconds = Observable.interval(10 seconds) val ticker = oneItem ++ every10Seconds val subscription = ticker.subscribe { (value: Long) => println("value=" + value) } ... subscription.unsubscribe() ticker emits: -1 at t=0, 0 at t=10, 1 at t=20, ...
  • 65.
    Observables as the result of an asynchronous operation def getTableStatus(tableName: String) : Observable[DynamoDbStatus]= Observable { subscriber: Subscriber[DynamoDbStatus] => amazonDynamoDBAsyncClient.describeTableAsync( new DescribeTableRequest(tableName), new AsyncHandler[DescribeTableRequest, DescribeTableResult] { override def onSuccess(request: DescribeTableRequest, result: DescribeTableResult) = { subscriber.onNext(DynamoDbStatus(result.getTable.getTableStatus)) subscriber.onCompleted() } override def onError(exception: Exception) = exception match { case t: ResourceNotFoundException => subscriber.onNext(DynamoDbStatus("NOT_FOUND")) subscriber.onCompleted() case _ => subscriber.onError(exception) } }) }
  • 66.
    Transforming/chaining observables with flatMap() val tableStatus = ticker.flatMap { i => logger.info("{}th describe table", i + 1) getTableStatus(name) } tableStatus emits: Status1 at t=0, Status2 at t=10, Status3 at t=20, ... + Usual collection methods: map(), filter(), take(), drop(), ...
  • 67.
    Calculating rolling average class AverageTradePriceCalculator { def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = { ... } } case class Trade( symbol : String, price : Double, quantity : Int ... ) case class AveragePrice( symbol : String, price : Double, ...)
  • 68.
    Calculating average prices def calculateAverages(trades: Observable[Trade]): Observable[AveragePrice] = { trades.groupBy(_.symbol).map { symbolAndTrades => val (symbol, tradesForSymbol) = symbolAndTrades val openingEverySecond = Observable.items(-1L) ++ Observable.interval(1 seconds) def closingAfterSixSeconds(opening: Any) = Observable.interval(6 seconds).take(1) tradesForSymbol.window(...).map { windowOfTradesForSymbol => windowOfTradesForSymbol.fold((0.0, 0, List[Double]())) { (soFar, trade) => val (sum, count, prices) = soFar (sum + trade.price, count + trade.quantity, trade.price +: prices) } map { x => val (sum, length, prices) = x AveragePrice(symbol, sum / length, prices) } }.flatten }.flatten }
  • 69.
    Agenda Why functional programming? Simplifying collection processing Eliminating NullPointerExceptions Simplifying concurrency with Futures and Rx Observables Tackling big data problems with functional programming
  • 70.
    Let’s imagine that you want to count word frequencies
  • 71.
    Scala WordCount val frequency : Map[String, Int] = Source.fromFile("gettysburgaddress.txt").getLines() .flatMap { _.split(" ") }.toList .groupBy(identity) .mapValues(_.length) frequency("THE") should be(11) frequency("LIBERTY") should be(1) flatMap() = Map; groupBy() + mapValues() = Reduce
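The same word count can be sketched with Java 8 streams. The class name `StreamWordCount` is hypothetical. Splitting on whitespace runs and upper-casing each word are assumptions of this sketch; the Scala slide splits on single spaces and does not show any case conversion.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamWordCount {
    static Map<String, Long> frequencies(Stream<String> lines) {
        return lines
                // "Map" step: one line becomes many words.
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> !word.isEmpty())
                .map(String::toUpperCase)
                // "Reduce" step: group identical words and count each group.
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }
}
```

To count a file, the lines could come from `java.nio.file.Files.lines(path)` instead of an in-memory stream.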
  • 72.
    But how to scale to a cluster of machines?
  • 73.
    Apache Hadoop Open-source ecosystem for reliable, scalable, distributed computing Hadoop Distributed File System (HDFS) Efficiently stores very large amounts of data Files are partitioned and replicated across multiple machines Hadoop MapReduce Batch processing system Provides plumbing for writing distributed jobs Handles failures And, much, much more...
  • 74.
    Overview of MapReduce [Diagram: Input Data => three Mappers, each reading (K,V) and emitting (K,V)* => Shuffle groups the pairs by key into (K1,V, ....)*, (K2,V, ....)*, (K3,V, ....)* => three Reducers, each emitting (K,V) => Output Data]
  • 75.
    MapReduce Word count - mapper class Map extends Mapper<LongWritable, Text, Text, IntWritable> { private final static IntWritable one = new IntWritable(1); private Text word = new Text(); public void map(LongWritable key, Text value, Context context) { String line = value.toString(); StringTokenizer tokenizer = new StringTokenizer(line); while (tokenizer.hasMoreTokens()) { word.set(tokenizer.nextToken()); context.write(word, one); } } } Four score and seven years ⇒ (“Four”, 1), (“score”, 1), (“and”, 1), (“seven”, 1), ... http://wiki.apache.org/hadoop/WordCount
  • 76.
    Hadoop then shuffles the key-value pairs...
  • 77.
    MapReduce Word count - reducer class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> { public void reduce(Text key, Iterable<IntWritable> values, Context context) { int sum = 0; for (IntWritable val : values) { sum += val.get(); } context.write(key, new IntWritable(sum)); } } (“the”, (1, 1, 1, 1, 1, 1, ...)) ⇒ (“the”, 11) http://wiki.apache.org/hadoop/WordCount
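The three phases (map, shuffle, reduce) can be imitated in memory with plain Java collections, which may help when reasoning about what the mapper and reducer above each see. This is a single-process illustration only and does not use the Hadoop API; every name in it is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class MiniMapReduce {
    // Map phase: one input line becomes a list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> mapper(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: collect all values that share a key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values for one key.
    static int reducer(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three phases in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(mapper(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reducer(group.getValue()));
        }
        return counts;
    }
}
```

In a real cluster the mapper and reducer calls run on different machines and the shuffle moves data over the network; here everything happens in one loop.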
  • 78.
    About MapReduce Very simple programming abstraction yet incredibly powerful By chaining together multiple map/reduce jobs you can process very large amounts of data in interesting ways e.g. Apache Mahout for machine learning But Mappers and Reducers = verbose code Development is challenging, e.g. unit testing is difficult It’s disk-based, batch processing ⇒ slow
  • 79.
    Scalding: Scala DSL for MapReduce class WordCountJob(args : Args) extends Job(args) { TextLine( args("input") ) .flatMap('line -> 'word) { line : String => tokenize(line) } .groupBy('word) { _.size } .write( Tsv( args("output") ) ) def tokenize(text : String) : Array[String] = { text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "") .split("\\s+") } } Each row is a map of named fields Expressive and unit testable https://github.com/twitter/scalding
  • 80.
    Apache Spark Created at UC Berkeley and now part of the Hadoop ecosystem Key abstraction = Resilient Distributed Datasets (RDD) Collection that is partitioned across cluster members Operations are parallelized Created from either a collection or a Hadoop supported datasource - HDFS, S3 etc Can be cached in-memory for super-fast performance Can be replicated for fault-tolerance Scala, Java, and Python APIs http://spark.apache.org
  • 81.
    Spark Word Count val sc = new SparkContext(...) sc.textFile("s3n://mybucket/...") .flatMap { _.split(" ")} .groupBy(identity) .mapValues(_.length) .toArray.toMap Very similar to Scala collection code!! Expressive, unit testable and very fast
  • 82.
    Summary Functional programming enables the elegant expression of good ideas in a wide variety of domains map(), flatMap() and reduce() are remarkably versatile higher-order functions Use FP and OOP together Java 8 has taken a good first step towards supporting FP Go write some functional code!
  • 83.
    Questions? @crichardson chris@chrisrichardson.net http://plainoldobjects.com