This document discusses Java 8 parallel streams and lambda expressions. It begins with an introduction to the author and their expertise in performance tuning. It then provides examples of using lambda expressions to simplify code by removing inner classes. Finally, it discusses using parallel streams to process collections of data concurrently, and when parallel streams may or may not provide benefits over imperative single-threaded code.
JVM Mechanics: When Does the JVM JIT & Deoptimize? - Doug Hawkins
HotSpot promises to do the "right" thing for us by identifying our hot code and compiling "just-in-time", but how does HotSpot make those decisions?
This presentation aims to detail how HotSpot makes those decisions and how it corrects its mistakes through a series of demos that you run yourself.
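HotSpot's decisions can be watched directly. Below is a minimal sketch, not taken from the talk's demos (the class and method names are mine): run it with the real diagnostic flag -XX:+PrintCompilation and a line for HotLoop::sum typically appears once the method has been called enough times to cross the invocation threshold.

```java
// Run with: java -XX:+PrintCompilation HotLoop
// While sum() is still cold it runs in the interpreter; after enough
// invocations HotSpot's JIT compiles it, which PrintCompilation reports.
public class HotLoop {
    static long sum(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += i;
        return s;
    }

    public static void main(String[] args) {
        long total = 0;
        // Enough calls to make sum() hot on a typical HotSpot configuration.
        for (int i = 0; i < 20_000; i++) total += sum(1_000);
        System.out.println(total);
    }
}
```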
Leveraging Completable Futures to Handle Your Query Results Asynchronously - David Gómez García
The document discusses leveraging CompletableFutures to handle asynchronous query results in Java. It begins with an overview of concurrency vs parallelism and tools in Java for concurrency. It then covers using ExecutorServices and Futures to run tasks asynchronously before introducing CompletableFutures, which allow fluent chaining and composition of dependent asynchronous tasks. Examples are provided of using CompletableFutures to run database queries asynchronously and maximize performance. Considerations for using CompletableFutures in APIs are also discussed.
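As a rough illustration of the fluent chaining style described above, here is a minimal sketch; fetchUser and fetchOrders are hypothetical stand-ins for real database calls, not APIs from the talk.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncQueries {
    // Hypothetical stand-ins for real database queries.
    static String fetchUser(int id)     { return "user-" + id; }
    static String fetchOrders(String u) { return u + ":3 orders"; }

    public static void main(String[] args) {
        // Chain dependent asynchronous tasks fluently instead of
        // blocking on Future.get() between each step.
        CompletableFuture<String> report =
            CompletableFuture.supplyAsync(() -> fetchUser(42))
                             .thenApply(AsyncQueries::fetchOrders);

        System.out.println(report.join()); // user-42:3 orders
    }
}
```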
The document discusses various techniques for improving Java application performance, including:
1. Using tools like JVisualVM and JConsole to analyze performance bottlenecks and determine where to focus optimization efforts.
2. Customizing the Java runtime environment through JVM options and garbage collection settings.
3. Following programming tips like using object scopes and final modifiers efficiently, choosing appropriate collection types, leveraging concurrency constructs properly.
4. Reading further on techniques involving Java I/O, NIO, locks, and lock-free programming.
How Green are Java Best Coding Practices? - GreenDays @ Rennes - 2014-07-01 - Jérôme Rocheteau
This work investigates whether best coding practices in Java can serve as eco-design rules, since they deal with software performance. It focuses on validating this hypothesis for consumed energy, execution time and peak allocated memory, and reaches a quiet conclusion: there is no need to carry out many measurements.
The document discusses CompletableFuture in Java. It begins with an introduction to java.util.Future and its limitations. It then discusses how CompletableFuture allows asynchronous and parallel operations through callbacks and chaining of tasks. This improves performance over serial execution. It provides examples of CompletableFuture methods like supplyAsync, thenApply, thenCombine, and allOf. Finally, it discusses how CompletableFuture can be used to build powerful and scalable libraries.
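The methods named above (supplyAsync, thenApply, thenCombine, allOf) are real CompletableFuture APIs; the small sketch below shows thenCombine joining two independent results and allOf waiting on a group. The price/tax scenario is made up for illustration.

```java
import java.util.concurrent.CompletableFuture;

public class CombineDemo {
    static int asyncTotal(int price, int tax) {
        CompletableFuture<Integer> p = CompletableFuture.supplyAsync(() -> price);
        CompletableFuture<Integer> t = CompletableFuture.supplyAsync(() -> tax);

        // thenCombine merges two independent futures once both complete.
        CompletableFuture<Integer> total = p.thenCombine(t, Integer::sum);

        // allOf waits for the whole group; results are then read individually.
        CompletableFuture.allOf(p, t, total).join();
        return total.join();
    }

    public static void main(String[] args) {
        System.out.println(asyncTotal(100, 19)); // 119
    }
}
```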
Lambda expressions were introduced in JDK 8 as a simpler way to represent behaviour. This session looks at usage details and performance compared to anonymous inner classes before diving into Lambda calculus.
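The contrast the session draws fits in a few lines. A minimal sketch (the word list is invented): the same Comparator behaviour written first as an anonymous inner class, then as a lambda.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LambdaVsInner {
    // Pre-JDK-8 style: an anonymous inner class implementing Comparator.
    static List<String> sortOldStyle(List<String> words) {
        words.sort(new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return words;
    }

    // JDK 8 style: the same behaviour as a lambda expression
    // (compiled via invokedynamic rather than as an extra .class file).
    static List<String> sortWithLambda(List<String> words) {
        words.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return words;
    }

    public static void main(String[] args) {
        List<String> w = new ArrayList<>(Arrays.asList("stream", "jit", "lambda"));
        System.out.println(sortWithLambda(w)); // [jit, stream, lambda]
    }
}
```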
Proxy Deep Dive - JUG Saxony Day 2015-10-02 - Sven Ruppert
A little about proxies. For anyone who wants more background, I recommend the book by Dr. Kabutz and myself:
http://www.amazon.de/Dynamic-Proxies-Dr-Heinz-Kabutz/dp/3868021531/ref=asap_bc?ie=UTF8
This document discusses building an "Internet of Trains" project using LEGO trains controlled over WiFi. It describes connecting Raspberry Pis to LEGO trains using infrared transmitters and switches. An Android/web application allows controlling the trains remotely. Challenges included coordinating the distributed system, which the authors addressed using Scala and Akka actors for remote communication between devices. Performance tests showed remote actors had faster response times than HTTP for this real-time control application.
A dive into the world of Lambda expressions in JDK 8, covering the fundamental ideas, some gotchas and a discussion of performance. Finally, just how far can you take Lambdas in Java?
The document discusses Java 8 streams and stream performance. It provides background on streams and why they were introduced in Java 8. It discusses sequential and parallel streams, how to visualize them, and practical benefits. It covers microbenchmarking and a case study comparing a sequential grep implementation to a parallelized version. Key points are that streams can improve readability but performance must be tested, parallelism helps if the workload is large enough to outweigh overhead, and stream sources need to be splittable for parallelism.
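The sequential-versus-parallel comparison described above can be sketched in a few lines. This is not the talk's grep case study; the log lines are synthetic, and as the summary stresses, whether parallelStream() actually wins must be measured, not assumed.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ParallelGrep {
    // Synthetic log: every 100th line is an ERROR line.
    static List<String> sampleLines(int n) {
        return IntStream.range(0, n)
                .mapToObj(i -> (i % 100 == 0 ? "ERROR line " : "ok line ") + i)
                .collect(Collectors.toList());
    }

    // An ArrayList is a splittable source, so the parallel version can
    // divide the work; both versions must produce the same count.
    static long countErrors(List<String> lines, boolean parallel) {
        Stream<String> s = parallel ? lines.parallelStream() : lines.stream();
        return s.filter(l -> l.startsWith("ERROR")).count();
    }

    public static void main(String[] args) {
        List<String> lines = sampleLines(10_000);
        System.out.println(countErrors(lines, false)); // 100
        System.out.println(countErrors(lines, true));  // 100
    }
}
```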
SPLAT is a Python tool that generates unit tests lazily and automatically. It uses lazy instantiation to only generate test data for variables that are actively used, improving performance. SPLAT works by wrapping classes and functions with proxy metaclasses that generate test inputs and assertions dynamically during execution based on the bytecode. An evaluation of SPLAT found that it was able to achieve 100% code coverage for sample Python packages and modules in under 500 iterations on average. Future work could improve the tool's sophistication in handling different programming constructs and optimize performance further.
Receiver types allow functions to be declared that operate on a specific receiver type. This includes function types with receivers, which define interfaces for functions, and function literals with receivers, which implement those interfaces. Examples show how Bundle.apply uses a function literal with Bundle as the receiver to configure a Bundle instance. This is a common pattern in Kotlin for building objects through fluent APIs.
The document provides an overview of key differences between Python and Scala. Some key points summarized:
1. Python is a dynamically typed, interpreted language while Scala is statically typed and compiles to bytecode. Scala supports both object-oriented and functional programming paradigms.
2. Scala has features like case classes, traits, and pattern matching that Python lacks. Scala also has features like type parameters, implicit conversions, and tail call optimization that Python does not support natively.
3. Common data structures like lists and maps are implemented differently between the languages - Scala uses immutable Lists while Python uses mutable lists. Scala also has features like lazy vals.
4. Control
The document discusses Alteryx SDKs and APIs. It describes how Alteryx is based on a separation of the Designer GUI and Engine. This allows for various SDKs to create custom tools and functions. The SDKs include ones for custom functions, custom tools with different frontend (HTML, WinForms) and backend (C++, C#, Python) options. It provides examples of XML and C++ based custom functions. It also outlines the processes for creating custom tools, including the interfaces for GUI plugins and engine plugins.
JAX-RS and CDI Bike the (Reactive) Bridge - José Paumard
This session explains how JAX-RS and CDI became reactive capable in Java EE 8. We put some new features of JAX-RS 2.1 and CDI 2.0 into perspective and show some reactive patterns to improve your application. Add Java 8 CompletionStage to the mix and this API trio becomes your best bet to easily go reactive without leaving the Java EE train.
Tene’s new workshop on ‘StructuredArray – Contained Objects in the Heap’ will examine the key implementation options for deriving speed and memory layout benefits of ObjectLayout’s StructuredArray constructs in future JVM versions, including simple options for adding contained object layout capability to the various existing Garbage Collection mechanisms used by various JVMs.
This document discusses scaling R to large datasets using Scala and Akka. It describes how R has limitations for parallelism and handling large data in memory. The author demonstrates reading a large CSV file of 100 million doubles (1.7GB) in parallel using Scala, Akka actors and Rserve. Producer and Worker actors divide the file and sum parts in Rserve. This allows scaling R computations to large data beyond a single machine's memory. Potential applications mentioned include optimization, distributed linear algebra, machine learning and statistics.
Ibis: Seamless Transition Between Pandas and Apache Spark - Databricks
Pandas is the de facto standard (single-node) Data Frame implementation in Python. However, as data grows larger, pandas no longer works well for performance reasons.
This document provides an overview of lambda expressions and streams in Java. It covers lambda syntax, method references, the Stream class and common stream operations like map, filter and reduce. Examples are given to find the highest test score or longest line of text from a stream. The document recommends getting started with the provided LambdasHOL project to complete exercises on lambda and stream operations.
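The "highest test score" and "longest line" exercises mentioned above can be sketched with the standard stream operations; the scores and lines below are made up, and the helper names are mine, not from the LambdasHOL project.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class StreamOps {
    // max is a reduction; orElse handles the empty-stream case.
    static int highestScore(List<Integer> scores) {
        return scores.stream().max(Integer::compare).orElse(0);
    }

    // A method reference (String::length) supplies the comparison key.
    static String longestLine(List<String> lines) {
        return lines.stream()
                    .max(Comparator.comparingInt(String::length))
                    .orElse("");
    }

    public static void main(String[] args) {
        System.out.println(highestScore(Arrays.asList(72, 95, 88)));      // 95
        System.out.println(longestLine(Arrays.asList("a", "abc", "ab"))); // abc
    }
}
```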
Java Performance: Speedup Your Application with Hardware Counters - Sergey Kuksenko
A look under the cover of a modern CPU to understand what drives performance. What does the hardware mean for Java application performance? Let's understand how the Performance Monitoring Unit works, what hardware counters are, and how to use them to speed up Java applications.
The document discusses using aspect oriented programming (AOP) in Python to design APIs. It describes how AOP can help separate concerns like security, logging, and serialization into distinct aspects to avoid scattering code across multiple functions. Decorators are proposed as a way to implement aspects for a bioenergy application API. Specific decorator aspects are presented for security, statistics, serialization, and dispatching API calls to core functions. The implementation applies the aspects as decorators to API functions to cleanly separate the concerns.
Graal and Truffle: Modularity and Separation of Concerns as Cornerstones for ... - Thomas Wuerthinger
Multi-language runtimes providing simultaneously high performance for several programming languages still remain an illusion. Industrial-strength managed language runtimes are built with a focus on one language (e.g., Java or C#). Other languages may compile to the bytecode formats of those managed language runtimes. However, the performance characteristics of the bytecode generation approach are often lagging behind compared to language runtimes specialized for a specific language. The performance of JavaScript is for example still orders of magnitude better on specialized runtimes (e.g., V8 or SpiderMonkey).
We present a solution to this problem by providing guest languages with a new way of interfacing with the host runtime. The semantics of the guest language is communicated to the host runtime not via generating bytecodes, but via an interpreter written in the host language. This gives guest languages a simple way to express the semantics of their operations including language-specific mechanisms for collecting profiling feedback. The efficient machine code is derived from the interpreter via automatic partial evaluation. The main components reused from the underlying runtime are the compiler and the garbage collector. They are both agnostic to the executed guest languages.
The host compiler derives the optimized machine code for hot parts of the guest language application via partial evaluation of the guest language interpreter. The interpreter definition can guide the host compiler to generate deoptimization points, i.e., exits from the compiled code. This allows guest language operations to use speculations: An operation could for example speculate that the type of an incoming parameter is constant. Furthermore, the guest language interpreter can use global assumptions about the system state that are registered with the compiled code. Finally, part of the interpreter's code can be excluded from the partial evaluation and remain shared across the system. This is useful for avoiding code explosion and appropriate for infrequently executed paths of an operation. These basic mechanisms are provided by the underlying language-agnostic host runtime and allow separation of concerns between guest and host runtime.
We implemented Truffle, the guest language runtime framework, on top of the Graal compiler and the HotSpot virtual machine. So far, there are prototypes for C, J, Python, JavaScript, R, Ruby, and Smalltalk running on top of the Truffle framework. The prototypes are still incomplete with respect to language semantics. However, most of them can run non-trivial benchmarks to demonstrate the core promise of the Truffle system: Multiple languages within one runtime system at competitive performance.
A comparison of apache spark supervised machine learning algorithms for dna s... - Valerio Morfino
In this work, we deal with the splicing site prediction problem in DNA sequences by using supervised machine learning algorithms included in the MLlib library of Apache Spark. We show the implementation details and the performance of those algorithms on two publicly available datasets, in both local and cloud environments. We compare the performance of the algorithms with U-BRAIN, a general-purpose learning algorithm originally designed for DNA splicing site prediction. Results show that, among the Spark algorithms, all have good prediction accuracy (>0.9) – comparable with that of U-BRAIN – and much lower execution time. Therefore, we can state that Apache Spark machine learning algorithms are promising candidates for dealing with the DNA splicing site prediction problem.
Tech Talks Annual 2015, William Louth: Software Memories - Simulated Machines - TechTalks
The document discusses software mirroring and simulation technologies that can record and replay the execution of software systems. It describes how these technologies can mirror running applications, simulate their behavior, and integrate the simulations with monitoring, testing, security and other systems. The simulations allow replaying past executions, diagnosing issues, and gaining insights into software and system behavior.
Tech Talks Annual 2015, Izzet Mustafayev: Performance Testing - The Way to Make... - TechTalks
This document discusses performance testing for web applications. It introduces why performance testing is important, what it is, and how to conduct it. Performance testing determines how a system performs under different workloads by measuring aspects like availability, response time, and throughput. The document recommends testing at various load levels, monitoring applications during tests, using the right tools to minimize impact, and automating testing processes. It promotes continuous performance testing with sufficient data and time.
Model-based Research in Human-Computer Interaction (HCI): Keynote at Mensch u... - Ed Chi
- The document discusses human-computer interaction (HCI) research conducted at Xerox PARC in the 1970s and 1980s.
- Early contributions came from computer scientists interested in changing how people interact with information and psychologists studying the implications.
- Research established HCI as a science by adopting psychological methods and building a science of HCI techniques.
(In Swedish) Peter Mackhé of the Swedish research bureau Dagens Analys walks you through the fast-paced evolution in customer behavior and how to capture and create insights from the data points created when interacting with your brand.
BirchBox the Future Business Model of E-Commerce - Yelena Starikova
Table of Contents
Executive Summary
Table of Figures
Introduction
E-Commerce
-Current Statistics
-History of E-Commerce
-Mobile Commerce
-Tablets and the Future of E-Commerce
-Advantages & Disadvantages of E-Commerce
Subscription Business Model
-What is a Subscription Business Model?
Birchbox: the Mother of all Boxes
-BirchBox Overview
-BirchBox Healthy Growth
-BirchBox Business Model
-BirchBox Challenges
Other Good Box Examples
-Lacquerous
-Manpacks
-Nature Box
-Why are some subscription boxes successful?
Failed Boxes
-Ellie
-Dollar Shave Club
-Rocksbox
-Why do Boxes Fail?
Current Waves that are Driving the Market
1. Social Commerce
2. Private Sales
3. Mobile / Location
Future E-Commerce Trends
1. Data is the beating heart of e-commerce
2. Cutting Out the Middle Man
3. Always-on shopping experiences
4. Social Shopping: Video experience in E-commerce
Conclusion
Recommendations for BirchBox
Bibliography
Appendix
NOTE: Guys, due to the high demand and time that I spend on sending out this report as a favor, I am starting to charge a fee of $25 if you would like to have my thesis. No more favors!
If interested, email me at starikovayelena@gmail.com
Conventional and non-conventional energy sources: Hydro-electric, Thermal, Nuclear, Wind, Solar (with block diagrams)
Boiler - Water tube and Fire tube; Internal Combustion Engines - Two stroke and four stroke (spark ignition and compression ignition); Turbines - Impulse & Reaction
Introduction to Manufacturing Processes and their Applications (Casting, Forging, Sheet metal working and Metal joining processes); Description of the Casting process: Sand casting (Cope & Drag); Sheet metal Forming (shearing, bending, drawing); Forging (Hot working and cold working comparison); Electric Arc welding; Comparison of Welding, Soldering, Brazing
This document provides an overview of mechanical drives and power transmission maintenance. It discusses various types of drives including gear drives, belt drives, v-belt drives, chain drives, and discusses maintenance of sprockets and roller chains. The document provides information on components, operating principles, installation, alignment and preventative maintenance of these various mechanical drive systems.
This document provides an overview of the state of robotics in 2015. It discusses major trends seen that year, including faster/cheaper components, more startups and funding, and the growing robot-as-a-service model. Specific areas covered include drones becoming more advanced and popular, progress in autonomous vehicles, the expansion of service robots into new industries, lifestyle robots entering homes, and continued development of social robots.
Finagle is an asynchronous RPC framework from Twitter that provides client/server abstractions over various protocols like HTTP and Thrift. It uses Futures to handle asynchronous operations and provides methods like map, flatmap, and handle to transform Futures. The Java Service Framework builds on Finagle to add features like metrics, logging, and rate limiting for Java services. It allows configuring options like enabling specific logs and metrics through a Proxy builder.
The document summarizes a presentation about using Java 8 streams effectively. It discusses converting imperative code that processes a log file into stream-based code. It shows how streams can be parallelized to improve performance but notes there are many factors that determine whether parallelization actually helps or hurts. Key points include how streams lazily process data in pipelines, splitting data sources efficiently, and bottlenecks like I/O that prevent parallelism from scaling well.
1. Copyright 2015 Kirk Pepperdine. All rights reserved
Ripping Apart Java 8 Parallel Streams
Kirk Pepperdine
About Me
Specialize in performance tuning
speak frequently about performance
author of a performance tuning workshop
Co-founder of jClarity
performance diagnostic tooling
Java Champion (since 2006)
Java 8 Lambda expressions
Syntactic sugar for a specific type of inner class
(parameters) -> body;
() -> 42;
(x,y) -> x * y;
Java 8 Lambda expressions
Pattern applicationTimePattern =
Pattern.compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match = new Function<String,Matcher>() {
@Override
public Matcher apply(String logEntry) {
return applicationTimePattern.matcher(logEntry);
}
};
String example = "2.869: Application time: 1.0001540 seconds";
Matcher matcher = match.apply(example);
Java 8 Lambda expressions
Pattern applicationTimePattern =
Pattern.compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match =
logEntry -> applicationTimePattern.matcher(logEntry);
String example = "2.869: Application time: 1.0001540 seconds";
Matcher matcher = match.apply(example);
Java 8 Lambda expressions
Pattern applicationTimePattern =
Pattern.compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match =
logEntry -> applicationTimePattern.matcher(logEntry);
Predicate<Matcher> matches = new Predicate<Matcher>() {
@Override
public boolean test(Matcher matcher) {
return matcher.find();
}
};
String example = "2.869: Application time: 1.0001540 seconds";
Java 8 Lambda expressions
Pattern applicationTimePattern =
Pattern.compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match =
logEntry -> applicationTimePattern.matcher(logEntry);
Predicate<Matcher> matches = matcher -> matcher.find();
String example = "2.869: Application time: 1.0001540 seconds";
Boolean b = matches.test(match.apply(example));
Java 8 Lambda expressions
Pattern applicationTimePattern =
Pattern.compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");
Function<String,Matcher> match =
logEntry -> applicationTimePattern.matcher(logEntry);
Predicate<Matcher> matches = matcher -> matcher.find();
Function<Matcher,String> extract = matcher -> {
if ( matches.test( matcher))
return matcher.group(2);
else return "";
};
String example = "2.869: Application time: 1.0001540 seconds";
String value = extract.apply(match.apply(example));
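The three lambdas built up above can also be glued together with `Function.andThen`. A minimal runnable sketch of that composition; the class name `LambdaComposition` and the `extract` helper are mine, while the regex (with its backslashes restored) and the sample line come from the slides.

```java
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LambdaComposition {
    static final Pattern APPLICATION_TIME =
            Pattern.compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");

    static String extract(String logEntry) {
        Function<String, Matcher> match = APPLICATION_TIME::matcher;
        Predicate<Matcher> matches = Matcher::find;
        Function<Matcher, String> extractGroup =
                matcher -> matches.test(matcher) ? matcher.group(2) : "";
        // andThen chains the two Functions: build the Matcher, then pull out group 2
        return match.andThen(extractGroup).apply(logEntry);
    }

    public static void main(String[] args) {
        System.out.println(extract("2.869: Application time: 1.0001540 seconds"));
    }
}
```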
Lazy Execution
for ( int i = 0; i < 5000; i++) {
    LOGGER.fine (() -> "Trace value: " + getValue());   // 0ms — the supplier is never evaluated when FINE is disabled
}
for ( int i = 0; i < 5000; i++) {
    LOGGER.fine ("Trace value: " + getValue());         // 3200ms — the message is built eagerly on every iteration
}
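The 0ms figure comes from the `Supplier` overload of `Logger.fine`: when FINE is disabled, the lambda body never runs. A sketch making that visible with an invented counter (the class name, counter, and explicit log level are mine):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLogging {
    static final Logger LOGGER = Logger.getLogger(LazyLogging.class.getName());
    static final AtomicInteger calls = new AtomicInteger();

    static String getValue() {
        calls.incrementAndGet();   // counts how often the "expensive" call actually runs
        return "expensive";
    }

    public static void main(String[] args) {
        LOGGER.setLevel(Level.INFO);   // FINE is disabled
        for (int i = 0; i < 5000; i++) {
            LOGGER.fine(() -> "Trace value: " + getValue());  // supplier never invoked
        }
        System.out.println(calls.get());   // prints 0
    }
}
```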
Language changes in Java 8
Method references
String::length
aString::length
ClassName::new
(aString) -> aString.length();
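The three forms listed above — unbound, bound, and constructor references — can be sketched side by side with their lambda equivalents (class and field names here are mine):

```java
import java.util.function.Function;
import java.util.function.Supplier;

public class MethodRefs {
    static final String A_STRING = "parallel";

    // unbound reference: same as s -> s.length()
    static final Function<String, Integer> UNBOUND = String::length;
    // bound reference: same as () -> A_STRING.length()
    static final Supplier<Integer> BOUND = A_STRING::length;
    // constructor reference: same as () -> new StringBuilder()
    static final Supplier<StringBuilder> CTOR = StringBuilder::new;

    public static void main(String[] args) {
        System.out.println(UNBOUND.apply("stream"));  // 6
        System.out.println(BOUND.get());              // 8
        System.out.println(CTOR.get().length());      // 0
    }
}
```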
Stream
An abstraction that allows you to apply a set of functions over the elements in a collection
functions are combined to form a stream pipeline
sequential or parallel
operations are divided into intermediate and terminal operations
Stream
Defined in interface Collection::stream()
many classes implement stream()
Arrays.stream(Object[])
Stream.of(Object[]), Stream.iterate(Object, UnaryOperator)
Files.lines(), BufferedReader.lines(), Random.ints(), JarFile.stream()
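A runnable sketch exercising several of the stream factories named above (the class name and the sample data are mine):

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.Stream;

public class StreamSources {
    static long demo() {
        long fromCollection = Arrays.asList("a", "b", "c").stream().count(); // Collection::stream
        long fromArray = Arrays.stream(new String[] {"x", "y"}).count();     // Arrays.stream
        long fromOf = Stream.of("p", "q", "r", "s").count();                 // Stream.of
        long iterated = Stream.iterate(1, n -> n * 2).limit(5).count();      // Stream.iterate
        long randomInts = new Random().ints(10).count();                     // Random.ints
        return fromCollection + fromArray + fromOf + iterated + randomInts;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3 + 2 + 4 + 5 + 10 = 24
    }
}
```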
Old School Code
public void imperative() throws IOException {
Population pop = new Population();
String logRecord;
double value = 0.0d;
BufferedReader reader = new BufferedReader(new FileReader(gcLogFileName));
while ( ( logRecord = reader.readLine()) != null) {
Matcher matcher = applicationStoppedTimePattern.matcher(logRecord);
if ( matcher.find()) {
pop.add(Double.parseDouble( matcher.group(2)));
}
}
System.out.println(pop.toString());
}
gcLogEntries.stream()                              // data source; start streaming
    .map(applicationStoppedTimePattern::matcher)   // map to Matcher
    .filter(Matcher::find)                         // filter out uninteresting bits
    .map(matcher -> matcher.group(2))              // extract group
    .mapToDouble(Double::parseDouble)              // map to double
    .summaryStatistics();                          // aggregate results
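The stream version of the log parser can be made self-contained by running it over an in-memory list. A sketch under that assumption — the class name and sample lines are invented, and the earlier "Application time" pattern stands in for `applicationStoppedTimePattern`, whose exact regex the slides do not show:

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamPipeline {
    static final Pattern APPLICATION_TIME =
            Pattern.compile("(\\d+\\.\\d+): Application time: (\\d+\\.\\d+)");

    static DoubleSummaryStatistics summarize(List<String> gcLogEntries) {
        return gcLogEntries.stream()
                .map(APPLICATION_TIME::matcher)    // map to Matcher
                .filter(Matcher::find)             // drop uninteresting entries
                .map(matcher -> matcher.group(2))  // extract the seconds group
                .mapToDouble(Double::parseDouble)  // map to double
                .summaryStatistics();              // aggregate
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList(
                "2.869: Application time: 1.0001540 seconds",
                "some unrelated GC log line",
                "5.123: Application time: 2.0000000 seconds");
        System.out.println(summarize(entries));
    }
}
```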
Parallel Streams
If the collection supports Spliterator, operations can be performed in parallel
Spliterator is like an internal iterator
splits the source into several spliterators
traverses the source
Many of the same rules as Iterator
however it may work (non-deterministically) with I/O
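What "split source into several spliterators" means can be shown with one `trySplit()` call, the same operation a parallel stream performs internally (class name and data are mine):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    // splits a source once, the way a parallel stream does internally
    static long[] split() {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        Spliterator<Integer> right = data.spliterator();
        Spliterator<Integer> left = right.trySplit();  // left takes the first half
        return new long[] {left.estimateSize(), right.estimateSize()};
    }

    public static void main(String[] args) {
        long[] sizes = split();
        System.out.println(sizes[0] + " + " + sizes[1]); // 4 + 4
    }
}
```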
gcLogEntries.stream().parallel()                   // parallelize the stream
    .map(applicationStoppedTimePattern::matcher)
    .filter(Matcher::find)
    .map(matcher -> matcher.group(2))
    .mapToDouble(Double::parseDouble)
    .summaryStatistics();
Raw Profiler Output
Flat profile of 8.70 secs (638 total ticks): ForkJoinPool.commonPool-worker-3
Compiled + native Method
60.8% 118 + 0 java.util.concurrent.ForkJoinTask.doExec
Learned 2 things very quickly
Parallel streams use Fork-Join
Fork-Join comes with some significant overhead
Predicted that applications using parallel streams would run very, very slowly
For other reasons….
Be a good neighbour
You have a finite amount of hardware
it might be in your best interest to grab it all
if everyone behaves the same way….
Handoff Costs
It costs about ~80,000 clocks to hand off data between threads
overhead = 80000 * handoff rate / cycles
10% = 80000 * handoff rate / 2000000000
~2500 handoffs/sec
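The slide's arithmetic can be checked directly: with a 10% overhead budget on a 2 GHz core at ~80,000 clocks per handoff, the budget allows about 2,500 handoffs per second (the class and method names are mine):

```java
public class HandoffBudget {
    // max handoff rate = overhead budget * clock rate / clocks per handoff
    static double maxHandoffRate(double clocksPerHandoff, double cyclesPerSec, double budget) {
        return budget * cyclesPerSec / clocksPerHandoff;
    }

    public static void main(String[] args) {
        // 10% budget on a 2 GHz core at ~80,000 clocks per handoff
        System.out.println(maxHandoffRate(80_000, 2_000_000_000d, 0.10));
    }
}
```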
CPNQ Performance Model
C - number of submitters
P - number of CPUs
N - number of elements
Q - cost of the operation
C/P/N/Q
Need to amortize setup costs
NQ needs to be large
Q can often only be estimated
N often should be > 10,000 elements
P assumes CPU is the limiting resource
Fork-Join
Support for Fork-Join added in Java 7
difficult coding idiom to master
Streams make Fork-Join more approachable
how Fork-Join works and performs is important to your latency picture
Implementation
Used internally by a parallel stream
breaks the stream up into chunks and submits each chunk as a ForkJoinTask
applies filter().map().reduce() to each ForkJoinTask
calls ForkJoinTask invoke() and join() to retrieve results
Common Thread Pool
Fork-Join by default uses a common thread pool
default number of worker threads == number of logical cores - 1
Always contains at least one thread
Little's Law
Fork-Join uses a work queue
work queue behavior can be modeled using Little's Law
Number of tasks in the system = arrival rate * average service time
Little's Law Example
A system sees 400 TPS and takes 100 ms to process a request
Number of tasks in system = 0.100 * 400 = 40
On an 8-core machine
implies 39 Fork-Join tasks are sitting in the queue accumulating dead time
40 * 100 = 4000 ms, of which 3900 ms is dead time
97.5% of service time is spent waiting
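The slide's numbers drop straight out of Little's Law, L = λW. A one-method sketch of the calculation (names are mine, the figures are the slide's):

```java
public class LittlesLaw {
    // L = lambda * W: tasks in system = arrival rate * average service time
    static double tasksInSystem(double arrivalsPerSec, double serviceTimeSec) {
        return arrivalsPerSec * serviceTimeSec;
    }

    public static void main(String[] args) {
        double inFlight = tasksInSystem(400, 0.100);  // the slide's 400 TPS at 100 ms
        System.out.println(inFlight + " tasks in the system");
    }
}
```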
Normal Reaction
if there is sufficient capacity then
make the pool bigger
else
add capacity
or
tune to reduce strength of the dependency
Configuring Common Pool
Size of the common ForkJoinPool is Runtime.getRuntime().availableProcessors() - 1
Can configure it with:
-Djava.util.concurrent.ForkJoinPool.common.parallelism=N
-Djava.util.concurrent.ForkJoinPool.common.threadFactory
-Djava.util.concurrent.ForkJoinPool.common.exceptionHandler
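These properties only take effect if set before the common pool is first touched, which is why they are normally passed as -D flags on the command line. A sketch of setting the parallelism programmatically instead — it should print 4 only on a fresh JVM where nothing has yet initialized the common pool:

```java
import java.util.concurrent.ForkJoinPool;

// The common-pool properties are read once, when ForkJoinPool's static
// initializer runs. Setting them programmatically works only if done
// before any code touches ForkJoinPool.commonPool().
public class ConfigureCommonPool {
    public static void main(String[] args) {
        System.setProperty(
            "java.util.concurrent.ForkJoinPool.common.parallelism", "4");
        System.out.println(ForkJoinPool.commonPool().getParallelism());
    }
}
```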
41.
All Hands on Deck!!!
Everyone is working together to get it done!
42.
Resource Dependency
If the task is CPU bound, you can’t increase the size of the thread pool
CPU may not be the limiting factor
throughput is limited by another dependent resource
I/O, shared variable (lock)
Adding threads == increased latency
predicted by Little’s Law
43.
ForkJoinPool invoke
ForkJoinPool.invoke(ForkJoinTask) uses the submitting thread as a worker
If 100 threads all call invoke(), we would have 100+ ForkJoinThreads exhausting the limiting resource, e.g. CPUs, I/O, etc.
44.
ForkJoinPool submit/get
ForkJoinPool.submit(Callable).get() suspends the submitting thread
If 100 threads all call submit(), the work queue can become very long, thus adding latency
45.
Our own ForkJoinPool
When called from inside a ForkJoinPool, the ForkJoinTask.fork() method uses the current pool
ForkJoinPool ourOwnPool = new ForkJoinPool(10);
ourOwnPool.invoke(() -> stream.parallel(). …
46.
ForkJoinPool
public String parallel() throws Exception {
    ForkJoinPool forkJoinPool = new ForkJoinPool(10);
    try (Stream<String> stream = Files.lines(
            new File(gcLogFileName).toPath())) {
        // submit(Callable) returns a Future; get() blocks until the
        // pipeline, running in our own pool, completes
        return forkJoinPool.submit(() ->
            stream.parallel()
                  .map(applicationStoppedTimePattern::matcher)
                  .filter(Matcher::find)
                  .map(matcher -> matcher.group(2))
                  .mapToDouble(Double::parseDouble)
                  .summaryStatistics().toString()
        ).get();
    }
}
47.
ForkJoinPool Observability
ForkJoinPool comes with no visibility
need to instrument ForkJoinTask.invoke()
gather the statistics needed from the ForkJoinPool to feed into Little’s Law
48.
Instrumenting ForkJoinPool
We can get the statistics needed for Little’s Law from the ForkJoinPool
need to instrument ForkJoinTask::invoke()
public final V invoke() {
    ForkJoinPool.common.getMonitor().submitTask(this);  // added instrumentation
    int s;
    if ((s = doInvoke() & DONE_MASK) != NORMAL) reportException(s);
    ForkJoinPool.common.getMonitor().retireTask(this);  // added instrumentation
    return getRawResult();
}
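The getMonitor() calls above refer to instrumentation the talk adds, not a JDK API. A hypothetical monitor of that shape might track in-flight and completed task counts, from which arrival rate and tasks-in-system can be derived for Little's Law:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical monitor that submitTask()/retireTask() hooks could call:
// counts tasks currently in the system and total completions, the raw
// inputs needed to apply Little's Law to the pool.
public class ForkJoinMonitor {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicLong completed = new AtomicLong();

    public void submitTask(Object task) {
        inFlight.incrementAndGet();       // task arrives in the system
    }

    public void retireTask(Object task) {
        inFlight.decrementAndGet();       // task leaves the system
        completed.incrementAndGet();
    }

    public int tasksInSystem()   { return inFlight.get(); }
    public long tasksCompleted() { return completed.get(); }
}
```

Sampling tasksInSystem() periodically gives the L term directly; completions over a time window give the arrival rate λ.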
49.
In an application where you have many parallel stream operations all running concurrently, performance will be limited by the size of the common thread pool
51.
Performance
Submit log parsing to our own ForkJoinPool
new ForkJoinPool(16).submit(() -> ……… ).get()
new ForkJoinPool(8).submit(() -> ……… ).get()
new ForkJoinPool(4).submit(() -> ……… ).get()
52.
[Charts: log parsing timings with 16, 8, and 4 worker threads]
53.
[Bar chart, y-axis 0–300,000: Stream, Parallel, Flood Stream, Flood Parallel]
54.
Going parallel might not give you the expected gains
55.
You may not know this until you hit production!
56.
Monitoring the internals of the JDK is important to understanding where bottlenecks are
57.
The JDK is not all that well instrumented
58.
APIs have changed so you need to re-read the javadocs
even for your old familiar classes
59.
Performance Seminar
www.kodewerk.com
Java Performance Tuning, May 26-29, Chania, Greece