Collectors in the
Wild
@JosePaumard
Collectors?
Why should we be interested in collectors?
▪ They are part of the Stream API
▪ And kind of left aside…
Collectors?
YouTube:
▪ Stream tutorials ~700k
▪ Collectors tutorials < 5k
And it’s a pity because it is a very powerful API
@JosePaumard
Microsoft Virtual Academy
Questions?
#ColJ8
movies.stream()
.flatMap(movie -> movie.actors().stream())
.collect(
Collectors.groupingBy(
Function.identity(),
Collectors.counting()
)
)
.entrySet().stream()
.max(Map.Entry.comparingByValue())
.get();
movies.stream()
.collect(
Collectors.groupingBy(
movie -> movie.releaseYear(),
Collector.of(
() -> new HashMap<Actor, AtomicLong>(),
(map, movie) -> {
movie.actors().forEach(
actor -> map.computeIfAbsent(actor, a -> new AtomicLong()).incrementAndGet()
) ;
},
(map1, map2) -> {
map2.entrySet().stream().forEach(
entry -> map1.computeIfAbsent(entry.getKey(), a -> new AtomicLong()).addAndGet(entry.getValue().get())
) ;
return map1 ;
},
new Collector.Characteristics [] {
Collector.Characteristics.CONCURRENT.CONCURRENT
}
)
)
)
.entrySet().stream()
.collect(
Collectors.toMap(
entry5 -> entry5.getKey(),
entry5 -> entry5.getValue()
.entrySet().stream()
.max(Map.Entry.comparingByValue(Comparator.comparing(l -> l.get())))
.get()
)
)
.entrySet()
.stream()
.max(Comparator.comparing(entry -> entry.getValue().getValue().get()))
.get();
Do not give bugs a place to
hide!
Brian Goetz
Collectors?
Why should we be interested in collectors?
▪ They are part of the Stream API
▪ And kind of left aside…
And it’s a pity because it is a very powerful API
▪ Even if we can also write unreadable code with it!
Agenda
Quick overview about streams
About collectors
Extending existing collectors
Making a collector readable
Creating new collectors
Composing Collectors
A Few Words on Streams
About Streams
A Stream:
▪ Is an object that connects to a source
▪ Has intermediate & terminal operations
▪ Some of the terminal operations can be collectors
▪ A collector can take more collectors as
parameters
A Stream is…
An object that connects to a source of data and
watches it flow
There is no data « in » a stream: a stream ≠ a collection
About Streams
On a stream:
▪ Any operation can be modeled with a collector
▪ Why is it interesting?
stream.collect(collector);
Intermediate Operations
1st operation: mapping = changing the type
2nd operation: filtering = removing some objects
3rd operation: flattening
Map, Filter, FlatMap
Three operations that do not need any buffer
to work
Not the case of all the operations…
Sorting elements using a comparator
The stream needs to see all the elements
before beginning to transmit them
Distinct
The stream needs to remember all the elements
already seen, to decide whether to transmit the next one (or not)
Distinct, sorted
Both operations need a buffer to store all the
elements from the source
Intermediate Operations
2 categories:
- Stateless operations = do not need to
remember anything
- Stateful operations = do need a buffer
Limit and Skip
Two methods that rely on the order of the
elements:
- Limit = keeps the first n elements
- Skip = skips the first n elements
Both need to keep track of the index of the
elements and to process them in order
Terminal Operations
Intermediate vs Terminal
Only a terminal operation triggers the
consuming of the data from the source
movies.stream()
.filter(movie -> movie.releaseYear() == 2007)
.flatMap(movie -> movie.actors().stream())
.map(actor -> actor.toString()); // after flatMap the elements are actors, not movies
Intermediate vs Terminal
Only a terminal operation triggers the
consuming of the data from the source
movies.stream()
.filter(movie -> movie.releaseYear() == 2007)
.flatMap(movie -> movie.actors().stream())
.map(actor -> actor.toString()) // after flatMap the elements are actors, not movies
.forEach(actor -> System.out.println(actor));
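The laziness described above can be observed with a small sketch. The class name, the sample titles and the `seen` list are assumptions made for illustration; they are not part of the talk. The pipeline records each element that really flows through it, and nothing is recorded until a terminal operation is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class LazyStreamDemo {

    // Builds a pipeline made of intermediate operations only.
    // Every element that really flows through it is recorded in 'seen'.
    static Stream<String> pipeline(List<String> titles, List<String> seen) {
        return titles.stream()
                .filter(title -> title.length() > 3)
                .peek(seen::add);
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        Stream<String> stream =
                pipeline(Arrays.asList("Up", "Juno", "Ratatouille"), seen);

        // No terminal operation yet: the source has not been consumed.
        System.out.println("Before the terminal operation: " + seen);

        // forEach is a terminal operation: it triggers the processing.
        stream.forEach(title -> { });
        System.out.println("After the terminal operation: " + seen);
    }
}
```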
Terminal Operations
First batch:
- forEach
- count
- max, min
- reduce
- toArray
Will consume all the data
Terminal Operations
Second Batch:
- allMatch
- anyMatch
- noneMatch
- findFirst
- findAny
Do not need to consume
all the data = short-circuit
operations
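A minimal sketch of short-circuiting, assuming an infinite source built with `Stream.iterate` (the class and method names are made up for this example). Both calls return even though the source never ends, because `anyMatch` and `findFirst` stop as soon as they have their answer.

```java
import java.util.stream.Stream;

class ShortCircuitDemo {

    // Infinite source 1, 2, 3, ...: anyMatch stops at the first match.
    static boolean hasMultipleOfSeven() {
        return Stream.iterate(1, i -> i + 1)
                .anyMatch(i -> i % 7 == 0);
    }

    // Infinite source 1, 2, 4, 8, ...: findFirst stops at the first element
    // that passes the filter. The Optional cannot be empty here.
    static int firstPowerOfTwoAbove(int threshold) {
        return Stream.iterate(1, i -> i * 2)
                .filter(i -> i > threshold)
                .findFirst()
                .get();
    }

    public static void main(String[] args) {
        System.out.println(hasMultipleOfSeven());
        System.out.println(firstPowerOfTwoAbove(100));
    }
}
```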
Terminal Operations
Special cases:
- max
- min
- reduce
Returns an Optional (to handle empty streams)
https://www.youtube.com/watch?v=Ej0sss6cq14
@StuartMarks
A First Collector
And then there is collect!
The most seen:
Takes a collector as a parameter
List<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(Collectors.toList());
A First Collector (bis)
And then there is collect!
The most seen:
Takes a collector as a parameter
Set<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(Collectors.toSet());
A Second Collector
And then there is collect!
Maybe less well known:
Takes a collector as a parameter
String names =
authors.stream()
.map(Author::getName)
.collect(Collectors.joining(", "));
Demo Time
A Third Collector
Creating a Map
Map<Integer, List<String>> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(
Collectors.groupingBy(
s -> s.length()
)
);
one, two, three, four, five, six, seven, eight, nine, ten
groupingBy(String::length)
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
Map<Integer, List<String>>
one, two, three, four, five, six, seven, eight, nine, ten
groupingBy(String::length, downstream)
3 -> one, two, six, ten -> .stream().collect(downstream)
4 -> four, five, nine -> .stream().collect(downstream)
5 -> three, seven, eight -> .stream().collect(downstream)
one, two, three, four, five, six, seven, eight, nine, ten
groupingBy(String::length, Collectors.counting())
3 -> one, two, six, ten -> 4L
4 -> four, five, nine -> 3L
5 -> three, seven, eight -> 3L
Map<Integer, Long>
A Third Collector (bis)
Creating a Map
Map<Integer, Long> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(
Collectors.groupingBy(
s -> s.length(), Collectors.counting()
)
);
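The histogram shown on the previous slides can be reproduced with the following sketch. The class and method names are assumptions; the ten strings are the ones used in the diagrams.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingByLengthDemo {

    // Groups the non-empty strings by length and counts each group.
    static Map<Integer, Long> countByLength(List<String> strings) {
        return strings.stream()
                .filter(s -> !s.isEmpty())
                .collect(
                        Collectors.groupingBy(
                                String::length,         // key: the length
                                Collectors.counting()   // downstream: count per key
                        )
                );
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList(
                "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten");
        // Expected content: 3 -> 4, 4 -> 3, 5 -> 3
        System.out.println(countByLength(strings));
    }
}
```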
Demo Time
A Collector that Counts
Number of articles per author
A1: Gent & Walsh, Beyond NP: The QSAT Phase Transition
A2: Gent & Hoos & Prosser & Walsh, Morphing: Combining…
flatMap(Article::getAuthors)
A1 -> Gent, Walsh
A2 -> Gent, Hoos, Prosser, Walsh
Gent & Walsh, Beyond NP: The QSAT Phase Transition
Gent & Hoos & Prosser & Walsh, Morphing: Combining…
flatMap(Article::getAuthors)
Gent, Walsh, Gent, Hoos, Prosser, Walsh
groupingBy(
identity(),
counting()
)
Gent -> 2L
Walsh -> 2L
Hoos -> 1L
Prosser -> 1L
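A sketch of the "number of articles per author" processing. The `Article` class below is a hypothetical, minimal stand-in (the real class used in the demo is not shown in the slides); only `getAuthors()` returning a stream is taken from the slide.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ArticlesPerAuthorDemo {

    // Hypothetical minimal Article type, for illustration only.
    static class Article {
        private final String title;
        private final List<String> authors;

        Article(String title, String... authors) {
            this.title = title;
            this.authors = Arrays.asList(authors);
        }

        String getTitle() { return title; }

        Stream<String> getAuthors() { return authors.stream(); }
    }

    // The two articles shown on the slides.
    static List<Article> sample() {
        return Arrays.asList(
                new Article("Beyond NP: The QSAT Phase Transition",
                        "Gent", "Walsh"),
                new Article("Morphing: Combining…",
                        "Gent", "Hoos", "Prosser", "Walsh"));
    }

    // Flattens the articles into authors, then counts each author.
    static Map<String, Long> articlesPerAuthor(List<Article> articles) {
        return articles.stream()
                .flatMap(Article::getAuthors)
                .collect(
                        Collectors.groupingBy(
                                Function.identity(),
                                Collectors.counting()
                        )
                );
    }

    public static void main(String[] args) {
        System.out.println(articlesPerAuthor(sample()));
    }
}
```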
Demo Time
Supply, Accumulate and
Combine
Creating Lists
A closer look at that code:
List<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(Collectors.toList());
stream: a b c
collector:
1) Build the list
2) Add elements one by one
ArrayList: a b c
Creating Lists
1) Building the list: supplier
2) Adding an element to that list: accumulator
Supplier<List<E>> supplier = () -> new ArrayList<>();
BiConsumer<List<E>, E> accumulator = (list, e) -> list.add(e);
In parallel
CPU 1: Stream -> Collector
CPU 2: Stream -> Collector
collector
1) Build a list
2) Add elements one by one
3) Merge the lists
Creating Lists
1) Building the list: supplier
2) Adding an element to that list: accumulator
3) Combining two lists
Supplier<List<E>> supplier = ArrayList::new;
BiConsumer<List<E>, E> accumulator = List::add;
BiConsumer<List<E>, List<E>> combiner = List::addAll;
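The three operations can be assembled into a real `Collector` with the `Collector.of` factory. This is a sketch under assumptions: the class and method names are made up, and note that `Collector.of` expects the combiner as a `BinaryOperator` that returns the merged container, whereas the three-argument `collect` method takes a `BiConsumer`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class ToListCollectorDemo {

    static <E> Collector<E, List<E>, List<E>> toArrayList() {
        return Collector.of(
                () -> new ArrayList<>(),         // 1) supplier: build the list
                (list, e) -> list.add(e),        // 2) accumulator: add one element
                (left, right) -> {               // 3) combiner: merge two lists
                    left.addAll(right);
                    return left;
                }
        );
    }

    public static void main(String[] args) {
        // In parallel, each thread fills its own list; the combiner merges them.
        List<Integer> result = IntStream.range(0, 1_000).boxed()
                .parallel()
                .collect(toArrayList());
        System.out.println(result.size());
    }
}
```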
Creating Lists
So we have:
List<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(ArrayList::new,
List::add,
List::addAll);
Creating Lists
So we have:
List<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(ArrayList::new,
Collection::add,
Collection::addAll);
Creating Sets
Almost the same:
Set<String> result =
strings.stream()
.filter(s -> !s.isEmpty())
.collect(HashSet::new,
Collection::add,
Collection::addAll);
String Concatenation
Now we need to create a String by
concatenating the elements using a separator:
« one, two, six »
Works with Streams of Strings
String Concatenation
Let us collect
strings.stream()
.filter(s -> s.length() == 3)
.collect(() -> new String(),
(finalString, s) -> finalString.concat(s),
(s1, s2) -> s1.concat(s2));
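String Concatenation
Let us collect
This does not work: a String is immutable,
concat() returns a new String that is discarded,
so the container stays empty
strings.stream()
.filter(s -> s.length() == 3)
.collect(() -> new String(),
(finalString, s) -> finalString.concat(s),
(s1, s2) -> s1.concat(s2));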
String Concatenation
Let us collect
strings.stream()
.filter(s -> s.length() == 3)
.collect(() -> new StringBuilder(),
(sb, s) -> sb.append(s),
(sb1, sb2) -> sb1.append(sb2));
String Concatenation
Let us collect
strings.stream()
.filter(s -> s.length() == 3)
.collect(StringBuilder::new,
StringBuilder::append,
StringBuilder::append);
String Concatenation
Let us collect
StringBuilder stringBuilder =
strings.stream()
.filter(s -> s.length() == 3)
.collect(StringBuilder::new,
StringBuilder::append,
StringBuilder::append);
String Concatenation
Let us collect
String string =
strings.stream()
.filter(s -> s.length() == 3)
.collect(StringBuilder::new,
StringBuilder::append,
StringBuilder::append)
.toString();
A Collector is…
3 Operations
- Supplier: creates the mutable container
- Accumulator
- Combiner
A Collector is…
3 + 1 Operations
- Supplier: creates the mutable container
- Accumulator
- Combiner
- Finisher, that can be the identity function
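A sketch of a collector with the four operations, applied to the string concatenation problem with a separator. It uses `java.util.StringJoiner` as the mutable container; this is a choice made for the example, not necessarily what was shown in the demo, and the class and method names are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collector;

class JoiningCollectorDemo {

    static Collector<String, StringJoiner, String> joinWith(String separator) {
        return Collector.of(
                () -> new StringJoiner(separator),  // supplier: mutable container
                StringJoiner::add,                  // accumulator: add one string
                StringJoiner::merge,                // combiner: merge two joiners
                StringJoiner::toString              // finisher: container -> String
        );
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("one", "two", "three", "six");
        String result = strings.stream()
                .filter(s -> s.length() == 3)
                .collect(joinWith(", "));
        System.out.println(result);
    }
}
```

Here A (StringJoiner) differs from R (String), so the finisher cannot be the identity function.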
Collecting and Then
And we have a collector for that!
strings.stream()
.filter(s -> s.length() == 3)
.collect(
Collectors.collectingAndThen(
collector,
finisher // Function
)
);
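A minimal sketch of `collectingAndThen`, assuming we want the collected list wrapped in an unmodifiable view as the final touch. The class and method names are made up for the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class CollectingAndThenDemo {

    static List<String> threeLetterWords(List<String> strings) {
        return strings.stream()
                .filter(s -> s.length() == 3)
                .collect(
                        Collectors.collectingAndThen(
                                Collectors.toList(),            // collector
                                Collections::unmodifiableList   // finisher
                        )
                );
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList(
                "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten");
        System.out.println(threeLetterWords(strings));
    }
}
```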
Demo Time
7634L {2004, 7634L}
Map<Long, List<Entry<Integer, Long>>>
Entry<Integer, Long> -> Integer = mapping
Function<Entry<Integer, Long>, Integer> mapper = entry -> entry.getKey();
Collectors.mapping(mapper, toList());
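The mapping collector above can be sketched as follows: a map from year to count is inverted into a map from count to the list of years. Only the pair 2004 / 7634 comes from the slide; the other values, the class name and the method name are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MappingCollectorDemo {

    // Groups the entries by value, and keeps only the key of each entry.
    static Map<Long, List<Integer>> yearsByCount(Map<Integer, Long> countPerYear) {
        return countPerYear.entrySet().stream()
                .collect(
                        Collectors.groupingBy(
                                Map.Entry::getValue,
                                Collectors.mapping(
                                        Map.Entry::getKey,     // Entry -> Integer
                                        Collectors.toList()    // downstream
                                )
                        )
                );
    }

    public static void main(String[] args) {
        // TreeMap: the entries are visited in ascending year order.
        Map<Integer, Long> countPerYear = new TreeMap<>();
        countPerYear.put(2003, 5000L);   // made-up value
        countPerYear.put(2004, 7634L);   // value from the slide
        countPerYear.put(2005, 7634L);   // made-up value
        System.out.println(yearsByCount(countPerYear));
    }
}
```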
Demo Time
Collect toMap
Useful for remapping maps
Do not generate duplicate keys: toMap throws an IllegalStateException on them, unless a merge function is provided
map.entrySet().stream()
.collect(
Collectors.toMap(
entry -> entry.getKey(),
entry -> // create a new value
)
);
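A sketch of both situations with `toMap`: remapping the values of a map (the keys stay unique), and building a map whose keys may collide, which needs the three-argument form with a merge function. All names and sample data are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDemo {

    // Same keys, new values: no duplicate key can appear.
    static Map<String, Long> doubled(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .collect(
                        Collectors.toMap(
                                Map.Entry::getKey,
                                entry -> entry.getValue() * 2
                        )
                );
    }

    // Keys computed from the elements may collide:
    // the merge function decides what to do with the two values.
    static Map<Integer, String> wordsByLength(List<String> words) {
        return words.stream()
                .collect(
                        Collectors.toMap(
                                String::length,
                                Function.identity(),
                                (first, second) -> first + "|" + second
                        )
                );
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("a", 1L);
        counts.put("b", 2L);
        System.out.println(doubled(counts));
        System.out.println(wordsByLength(Arrays.asList("one", "two", "four")));
    }
}
```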
Custom Collectors:
1) Filter, Flat Map
2) Joins
3) Composition
Coffee break!
About Types
The Collector Interface
public interface Collector<T, A, R> {
public Supplier<A> supplier(); // A: mutable container
public BiConsumer<A, T> accumulator(); // T: processed elements
public BinaryOperator<A> combiner(); // Often the type returned
public Function<A, R> finisher(); // Final touch
}
The Collector Interface
public interface Collector<T, A, R> {
public Supplier<A> supplier(); // A: mutable container
public BiConsumer<A, T> accumulator(); // T: processed elements
public BinaryOperator<A> combiner(); // Often the type returned
public Function<A, R> finisher(); // Final touch
public Set<Characteristics> characteristics();
}
Type of a Collector
In a nutshell:
- T: type of the elements of the stream
- A: type of the mutable container
- R: type of the final container
We often have A = R
The finisher may be the identity function
≠
one, two, three, four, five, six, seven, eight, nine, ten
groupingBy(String::length)
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
one, two, three, four, five, six, seven, eight, nine, ten
Collector<String, ?, Map<Integer, List<String>>> c =
groupingBy(String::length)
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
one, two, three, four, five, six, seven, eight, nine, ten
Collector<String, ?, Map<Integer, List<String>>> c =
groupingBy(
String::length,
?
)
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
one, two, three, four, five, six, seven, eight, nine, ten
Collector<String, ?, Map<Integer, List<String>>> c =
groupingBy(
String::length,
Collector<String, ?, >
)
3 -> one, two, six, ten
4 -> four, five, nine
5 -> three, seven, eight
one, two, three, four, five, six, seven, eight, nine, ten
Collector<String, ?, Map<Integer, Value>> c =
groupingBy(
String::length,
Collector<String, ?, Value>
)
counting() : Collector<T, ?, Long>
3 -> 4L
4 -> 3L
5 -> 3L
Intermediate Operations
Intermediate Collectors
Back to the mapping collector
This collector takes a downstream collector
stream.collect(mapping(function, downstream));
Intermediate Collectors
The mapping Collector provides an
intermediate operation
stream.collect(mapping(function, downstream));
Intermediate Collectors
The mapping Collector provides an
intermediate operation
Why is it interesting?
To create downstream collectors!
So what about integrating all our stream
processing as a collector?
stream.collect(mapping(function, downstream));
Intermediate Collectors
If collectors can map, why wouldn't they filter,
or flatMap?
…in fact they can in Java 9 ☺
Intermediate Collectors
The mapping Collector provides an
intermediate operation
We have a Stream<T>
So predicate is a Predicate<T>
Downstream is a Collector<T, ?, R>
stream.collect(mapping(function, downstream));
stream.collect(filtering(predicate, downstream));
Intermediate Collectors
The mapping Collector provides an
intermediate operation
We have a Stream<T>
So flatMapper is a Function<T, Stream<TT>>
And downstream is a Collector<TT, ?, R>
stream.collect(mapping(function, downstream));
stream.collect(flatMapping(flatMapper, downstream));
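A sketch of the two Java 9 collectors used as downstream collectors of a `groupingBy`. It requires Java 9 or later; the class name, method names and sample words are assumptions. Note that `filtering` keeps a group even when all its elements are rejected, which a `filter` placed before the `collect` would not do.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class IntermediateCollectorsDemo {

    // Per length, keeps only the words starting with the given prefix.
    static Map<Integer, List<String>> startingWith(String prefix, List<String> words) {
        return words.stream()
                .collect(
                        Collectors.groupingBy(
                                String::length,
                                Collectors.filtering(
                                        s -> s.startsWith(prefix),  // Predicate<String>
                                        Collectors.toList()         // downstream
                                )
                        )
                );
    }

    // Per length, the set of letters used by the words of that length.
    static Map<Integer, Set<Character>> lettersByLength(List<String> words) {
        return words.stream()
                .collect(
                        Collectors.groupingBy(
                                String::length,
                                Collectors.flatMapping(
                                        s -> s.chars().mapToObj(c -> (char) c),  // String -> Stream<Character>
                                        Collectors.toSet()                       // downstream
                                )
                        )
                );
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("one", "two", "three", "four");
        System.out.println(startingWith("t", words));
        System.out.println(lettersByLength(words));
    }
}
```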
Demo Time
Characteristics
Three characteristics for the collectors:
- IDENTITY_FINISH: the finisher is the identity
function
- UNORDERED: the collector does not preserve
the order of the elements
- CONCURRENT: the collector is thread safe
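The characteristics of a collector can be inspected through its `characteristics()` method. The sketch below checks two collectors whose characteristics are stated in the JDK documentation: `toSet()` is unordered, `groupingByConcurrent()` is concurrent. The class and helper names are assumptions.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CharacteristicsDemo {

    // Tells whether a collector declares the given characteristic.
    static boolean has(Collector<?, ?, ?> collector,
                       Collector.Characteristics characteristic) {
        return collector.characteristics().contains(characteristic);
    }

    static Collector<String, ?, ConcurrentMap<Integer, List<String>>> concurrentGrouping() {
        return Collectors.groupingByConcurrent(String::length);
    }

    public static void main(String[] args) {
        System.out.println("toSet: "
                + Collectors.toSet().characteristics());
        System.out.println("groupingByConcurrent: "
                + concurrentGrouping().characteristics());
    }
}
```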
Handling Empty Optionals
Two things:
- Make an Optional a Stream
- Remove the empty Streams with flatMap
Map<K, Optional<V>> // with empty Optionals...
-> Map<K, Stream<V>> // with empty Streams
-> Stream<Map.Entry<K, V>> // the empty are gone
-> Map<K, V> // using a toMap
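The steps above can be sketched as follows. This version assumes Java 9 or later for `Optional.stream()`, and uses `AbstractMap.SimpleEntry` to carry the key along with the value; the class and method names are made up for the example.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class EmptyOptionalsDemo {

    static <K, V> Map<K, V> removeEmpty(Map<K, Optional<V>> map) {
        return map.entrySet().stream()
                // Optional -> Stream: an empty Optional gives an empty Stream,
                // which flatMap silently removes.
                .flatMap(entry -> entry.getValue().stream()
                        .map(value -> new AbstractMap.SimpleEntry<K, V>(entry.getKey(), value)))
                // Back to a map, using toMap.
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        Map<String, Optional<Integer>> map = new HashMap<>();
        map.put("a", Optional.of(1));
        map.put("b", Optional.empty());
        map.put("c", Optional.of(3));
        System.out.println(removeEmpty(map));
    }
}
```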
Joins
1) The authors that published the most
together
2) The authors that published the most
together in a year
StreamsUtils to the rescue!
Gent & Walsh, Beyond NP: The QSAT Phase Transition
-> Gent, Walsh
-> {Gent, Walsh}
Gent & Hoos & Prosser & Walsh, Morphing: Combining…
-> Gent, Hoos, Prosser, Walsh
-> {Gent, Hoos} {Gent, Prosser} {Gent, Walsh} {Hoos, Prosser} {Hoos, Walsh} {Prosser, Walsh}
flatMap()
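A sketch of the pair generation with plain JDK streams. It does not use the StreamsUtils library mentioned in the talk; representing a pair as a string such as "Gent & Walsh" and each article as a list of author names are simplifications made for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class AuthorPairsDemo {

    // All the pairs of authors of one article, each pair in alphabetical order.
    static Stream<String> pairs(List<String> authors) {
        List<String> sorted = authors.stream().sorted().collect(Collectors.toList());
        return IntStream.range(0, sorted.size()).boxed()
                .flatMap(i -> IntStream.range(i + 1, sorted.size())
                        .mapToObj(j -> sorted.get(i) + " & " + sorted.get(j)));
    }

    // Number of articles published together, per pair of authors.
    static Map<String, Long> articlesPerPair(List<List<String>> articles) {
        return articles.stream()
                .flatMap(AuthorPairsDemo::pairs)
                .collect(
                        Collectors.groupingBy(
                                Function.identity(),
                                Collectors.counting()
                        )
                );
    }

    public static void main(String[] args) {
        List<List<String>> articles = Arrays.asList(
                Arrays.asList("Gent", "Walsh"),
                Arrays.asList("Gent", "Hoos", "Prosser", "Walsh"));
        System.out.println(articlesPerPair(articles));
    }
}
```

The pair that published the most together is then the entry with the maximum value, which may not exist if no article has at least two authors.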
Demo Time
Application
Why is it interesting to model a processing
step as a collector?
We can reuse this collector as a downstream
collector in other pipelines
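A minimal sketch of this reuse, assuming a word histogram as the processing to model. The same collector is used once directly, and once as the downstream collector of another `groupingBy`. All names are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class ReusableCollectorDemo {

    // A whole processing modeled as a collector: word -> number of occurrences.
    static Collector<String, ?, Map<String, Long>> histogram() {
        return Collectors.groupingBy(Function.identity(), Collectors.counting());
    }

    // Used directly on a stream.
    static Map<String, Long> histogramOf(List<String> words) {
        return words.stream().collect(histogram());
    }

    // Reused as a downstream collector: one histogram per word length.
    static Map<Integer, Map<String, Long>> histogramPerLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length, histogram()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("one", "two", "one", "three");
        System.out.println(histogramOf(words));
        System.out.println(histogramPerLength(words));
    }
}
```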
What About Readability?
Creating composable Collectors
Demo Time
Dealing with Issues
The main issue is the empty stream
A whole stream may have elements
But when we build a histogram, a given
substream may become empty…
Conclusion
API Collector
A very rich API indeed
Quite complex…
One needs to have a very precise idea of the
data processing pipeline
Can be extended!
API Collector
A collector can model a whole processing
Once it is written, it can be passed as a
downstream to another processing pipeline
Can be made composable to improve
readability
https://github.com/JosePaumard
Thank you for your
attention!
Questions?
@JosePaumard
https://github.com/JosePaumard
https://www.slideshare.net/jpaumard
https://www.youtube.com/user/JPaumard
