Shooting the Rapids: Getting the Best from Java 8 Streams

Conference talk at JavaOne, San Francisco, October 2015.
Shooting the Rapids: Getting the Best from Java 8 Streams
Kirk Pepperdine @kcpeppe
Maurice Naftalin @mauricenaftalin
About Kirk
• Specialises in performance tuning
• speaks frequently about performance
• author of a performance tuning workshop
• Co-founder of jClarity – performance diagnostic tooling
• Java Champion (since 2006)
About Maurice
• Co-author / Author
• Java Champion
• JavaOne Rock Star
Agenda
• Introduction – lambdas, streams, and a logfile processing problem
• Optimizing stream sources
• Tragedy of the Commons
• Justifying the Overhead
Example: Processing GC Logfile
⋮
2.869: Application time: 1.0001540 seconds
5.342: Application time: 0.0801231 seconds
8.382: Application time: 1.1013574 seconds
⋮
Desired result:
DoubleSummaryStatistics {count=3, sum=2.181635, min=0.080123, average=0.727212, max=1.101357}
Example: Processing GC Logfile
Regex: Application time: (\d+\.\d+)

Pattern stoppedTimePattern =
    Pattern.compile("Application time: (\\d+\\.\\d+)");
⋮
Matcher matcher = stoppedTimePattern.matcher(logRecord);
String value = matcher.group(1);
Processing GC Logfile: Old School Code

Pattern stoppedTimePattern =
    Pattern.compile("Application time: (\\d+\\.\\d+)");
String logRecord;
double value = 0;
while ((logRecord = logFileReader.readLine()) != null) {
    Matcher matcher = stoppedTimePattern.matcher(logRecord);
    if (matcher.find()) {
        value += Double.parseDouble(matcher.group(1));
    }
}
What is a Lambda?

Predicate<Matcher> matches = new Predicate<Matcher>() {
    @Override
    public boolean test(Matcher matcher) {
        return matcher.find();
    }
};

becomes the lambda

Predicate<Matcher> matches = matcher -> matcher.find();

A lambda is a function from arguments to result.
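The step-by-step transformation on the slides can be sketched as a runnable example; the class name and the sample log line are illustrative, not from the talk. All three forms (anonymous class, lambda, method reference) are interchangeable as a `Predicate<Matcher>`:

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LambdaForms {
    static final Pattern STOPPED_TIME =
            Pattern.compile("Application time: (\\d+\\.\\d+)");

    // Anonymous inner class: the pre-Java-8 way to supply a Predicate.
    static final Predicate<Matcher> ANON = new Predicate<Matcher>() {
        @Override
        public boolean test(Matcher matcher) {
            return matcher.find();
        }
    };

    // Lambda: the same function, written as arguments -> result.
    static final Predicate<Matcher> LAMBDA = matcher -> matcher.find();

    // Method reference: shortest form when the body just calls one method.
    static final Predicate<Matcher> METHOD_REF = Matcher::find;

    // All three forms agree on whether a line matches.
    // A fresh Matcher is created per predicate because find() advances state.
    public static boolean allAgree(String line) {
        boolean a = ANON.test(STOPPED_TIME.matcher(line));
        boolean b = LAMBDA.test(STOPPED_TIME.matcher(line));
        boolean c = METHOD_REF.test(STOPPED_TIME.matcher(line));
        return a == b && b == c;
    }

    public static void main(String[] args) {
        System.out.println(allAgree("2.869: Application time: 1.0001540 seconds"));
    }
}
```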
Processing Logfile: Stream Code

DoubleSummaryStatistics summaryStatistics =
    logFileReader.lines()                                // data source; start streaming
        .map(input -> stoppedTimePattern.matcher(input)) // map to Matcher
        .filter(matcher -> matcher.find())               // filter out uninteresting bits
        .map(matcher -> matcher.group(1))                // extract group
        .mapToDouble(s -> Double.parseDouble(s))         // map String to double
        .summaryStatistics();                            // aggregate results
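On the slides, logFileReader is a BufferedReader over the log file; the sketch below substitutes an in-memory list of the sample lines so the same pipeline runs standalone (the class name is illustrative):

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamPipeline {
    static final Pattern STOPPED_TIME =
            Pattern.compile("Application time: (\\d+\\.\\d+)");

    public static DoubleSummaryStatistics summarize(List<String> lines) {
        return lines.stream()
                .map(STOPPED_TIME::matcher)          // line -> Matcher
                .filter(Matcher::find)               // keep only matching lines
                .map(matcher -> matcher.group(1))    // extract the seconds value
                .mapToDouble(Double::parseDouble)    // String -> double
                .summaryStatistics();                // count/sum/min/average/max
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "2.869: Application time: 1.0001540 seconds",
                "5.342: Application time: 0.0801231 seconds",
                "8.382: Application time: 1.1013574 seconds");
        // Produces count=3, sum≈2.181635, as on the earlier slide.
        System.out.println(summarize(log));
    }
}
```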
What is a Stream?
• A sequence of values
• Source and intermediate operations set the stream up lazily:

DoubleStream groupStream =
    logFileReader.lines()
        .map(stoppedTimePattern::matcher)
        .filter(Matcher::find)
        .map(matcher -> matcher.group(1))
        .mapToDouble(Double::parseDouble);

• The terminal operation pulls the values down the stream:

DoubleSummaryStatistics statistics =
    logFileReader.lines()
        .map(stoppedTimePattern::matcher)
        .filter(Matcher::find)
        .map(matcher -> matcher.group(1))
        .mapToDouble(Double::parseDouble)
        .summaryStatistics();
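The laziness claim can be observed directly: the intermediate operations run only when a terminal operation pulls values through. A minimal sketch (the counter and class name are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

public class LazyStreams {
    // Counts how many elements actually pass through the map stage.
    public static int mappedCountAfterTerminalOp() {
        AtomicInteger mapped = new AtomicInteger();
        Stream<String> pipeline = Stream.of("a", "b", "c")
                .map(s -> { mapped.incrementAndGet(); return s.toUpperCase(); });

        // No terminal operation yet: the map has not run at all.
        if (mapped.get() != 0) throw new IllegalStateException("map ran eagerly");

        pipeline.forEach(s -> { });   // terminal op pulls values down the stream
        return mapped.get();          // now every element has been mapped
    }

    public static void main(String[] args) {
        System.out.println(mappedCountAfterTerminalOp());
    }
}
```

forEach is used as the terminal operation here rather than count(), because on newer JDKs count() may compute the size without traversing the pipeline at all.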
Visualising Sequential Streams – “Values in Motion”
Source → Map → Filter → Reduction (intermediate operations, then the terminal operation)
[Animation: values x0…x3 move one at a time through the pipeline; values passing the filter (✔) reach the reduction, the rest (❌) are dropped.]
How Does That Perform?
Old School: 80200ms
Sequential stream: 25800ms
(>9m lines, MacBook Pro, Haswell i7, 4 cores, hyperthreaded)
Stream code is faster because operations are fused.
Can We Do Better?
Parallel streams make use of multiple cores:
• split the data into segments
• each segment processed by its own thread – on its own core, if possible
Splitting the Data
Implemented by a Spliterator. [Animation: trySplit() repeatedly divides the data into segments for independent processing.]
Visualizing Parallel Streams
[Animation: the source is split into segments; values x0…x3 are mapped, filtered (✔/❌) and reduced concurrently, one segment per thread.]
Stream Code

DoubleSummaryStatistics summaryStatistics =
    logFileReader.lines().parallel()
        .map(stoppedTimePattern::matcher)
        .filter(Matcher::find)
        .map(matcher -> matcher.group(1))
        .mapToDouble(Double::parseDouble)
        .summaryStatistics();
Results of Going Parallel
• No benefit from using parallel streams while streaming data
Agenda
• Introduction – lambdas, streams, and a logfile processing problem
• Optimizing stream sources
• Tragedy of the Commons
• Justifying the Overhead
Poorly Splitting Sources
• Some sources split much worse than others – LinkedList vs. ArrayList
• Streaming I/O is bad – it kills the advantage of going parallel
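The LinkedList vs. ArrayList difference can be seen from the spliterators themselves: an ArrayList spliterator knows its size and splits cleanly in half, while a LinkedList spliterator must walk nodes and splits off small, uneven batches. A small probe (illustrative, not from the talk; the exact LinkedList batch size is a JDK implementation detail):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

public class SplitQuality {
    // After one trySplit(), returns {size of the split-off part, size remaining}.
    public static long[] firstSplitSizes(List<Integer> list) {
        Spliterator<Integer> right = list.spliterator();
        Spliterator<Integer> left = right.trySplit();
        return new long[] { left.estimateSize(), right.estimateSize() };
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) arrayList.add(i);
        List<Integer> linkedList = new LinkedList<>(arrayList);

        long[] a = firstSplitSizes(arrayList);
        long[] l = firstSplitSizes(linkedList);
        // ArrayList: an even half/half split, computed in O(1).
        System.out.println("ArrayList:  " + a[0] + " / " + a[1]);
        // LinkedList: a small first batch vs. everything else, and producing
        // it required walking that many nodes and copying them to an array.
        System.out.println("LinkedList: " + l[0] + " / " + l[1]);
    }
}
```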
Streaming I/O Bottleneck
[Animation: values arrive one at a time from the sequential I/O source, so the parallel stages downstream sit idle waiting for input.]
LineSpliterator
[Diagram: the log file in a MappedByteBuffer; trySplit() takes the midpoint of the spliterator’s coverage and advances it to the next newline, so each new spliterator covers only whole lines.]
Included in JDK9 as FileChannelLinesSpliterator
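From the application side, this means that on JDK 9 and later Files.lines(Path) can split the file by position for common charsets, so a parallel stream over file lines is no longer throttled by sequential reads. A self-contained sketch (the temp-file setup and class name are just for the demo; the sample lines are the ones from the earlier slides):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class ParallelFileLines {
    static final Pattern STOPPED_TIME =
            Pattern.compile("Application time: (\\d+\\.\\d+)");

    // Summarise stopped times from a log file; the stream from Files.lines
    // is well-splitting on JDK 9+, so .parallel() can use all cores.
    public static DoubleSummaryStatistics summarize(Path logFile) {
        try (Stream<String> lines = Files.lines(logFile)) {
            return lines.parallel()
                    .map(STOPPED_TIME::matcher)
                    .filter(Matcher::find)
                    .map(m -> m.group(1))
                    .mapToDouble(Double::parseDouble)
                    .summaryStatistics();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo on a temporary file holding the three sample lines from the talk.
    public static DoubleSummaryStatistics demo() {
        try {
            Path tmp = Files.createTempFile("gc", ".log");
            Files.write(tmp, Arrays.asList(
                    "2.869: Application time: 1.0001540 seconds",
                    "5.342: Application time: 0.0801231 seconds",
                    "8.382: Application time: 1.1013574 seconds"));
            DoubleSummaryStatistics stats = summarize(tmp);
            Files.delete(tmp);
            return stats;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```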
LineSpliterator – results
Streaming I/O: 56s
Spliterator: 88s
(>9m lines, MacBook Pro, Haswell i7, 4 cores, hyperthreaded)
When to Use Parallel Streams?
• Task must be recursively decomposable
– subtasks for each data segment must be independent
• Source must be well-splitting
• Enough hardware to support all VM needs
– there may be other business afoot
• Overhead of splitting must be justified
– intermediate operations need to be expensive – and CPU-bound
http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html
Agenda
• Introduction – lambdas, streams, and a logfile processing problem
• Optimizing stream sources
• Tragedy of the Commons
• Justifying the Overhead
Tragedy of the Commons
You have a finite amount of hardware:
– it might be in your best interest to grab it all
– but if everyone behaves the same way…
Agenda
• Introduction – lambdas, streams, and a logfile processing problem
• Optimizing stream sources
• Tragedy of the Commons
• Justifying the Overhead
Justifying the Overhead
CPNQ performance model:
C - number of submitters
P - number of CPUs
N - number of elements
Q - cost of the operation
Justifying the Overhead
Need to amortize setup costs:
– N*Q needs to be large
– Q can often only be estimated
– N often should be >10,000 elements
The formula takes P as the number of processors and assumes that intermediate tasks are CPU-bound.
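One way to read the N*Q rule is as a simple go/no-go check before calling .parallel(). The 10,000-element baseline follows the slide, but treating Q as a unitless relative cost per element and using it as a multiplier on that baseline is an illustrative assumption, not a calibrated model:

```java
public class ParallelHeuristic {
    // Rough guideline from the CPNQ model: parallelism pays off only when the
    // total work N*Q is large. Q is an estimated relative cost per element
    // (1 = trivial, e.g. a comparison); the 10_000 threshold is the slide's
    // rule of thumb for cheap elements, used here as a stand-in constant.
    public static boolean worthGoingParallel(long n, long estimatedQ) {
        return n * estimatedQ >= 10_000;
    }

    public static void main(String[] args) {
        // 10,000 cheap elements: just reaches the rule of thumb.
        System.out.println(worthGoingParallel(10_000, 1));
        // 500 elements with expensive (Q ~ 1000) operations: worthwhile.
        System.out.println(worthGoingParallel(500, 1_000));
        // 100 cheap elements: setup cost would dominate.
        System.out.println(worthGoingParallel(100, 1));
    }
}
```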
Don’t Have Too Many Threads!
• Too many threads cause frequent handoffs
• It costs ~80,000 cycles to hand off data between threads
• You can do a lot of processing in 80,000 cycles!
Fork/Join
• Parallel streams are implemented by the Fork/Join framework
– added in Java 7, but difficult to code against; parallel streams are more usable
• Each segment of data is submitted as a ForkJoinTask
– ForkJoinTask.invoke() spawns a new task
– ForkJoinTask.join() retrieves the result
• How Fork/Join works and performs is important to your latency picture
Common Fork/Join Pool
Fork/Join by default uses a common thread pool:
– default number of worker threads == number of logical cores - 1 (the submitting thread is pressed into service)
– the pool can be configured via system properties:
    java.util.concurrent.ForkJoinPool.common.parallelism
    java.util.concurrent.ForkJoinPool.common.threadFactory
    java.util.concurrent.ForkJoinPool.common.exceptionHandler
– or we can create our own pool…
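The default and any property override can be inspected at runtime; a minimal sketch (the class name is illustrative). Note the property must be set before the common pool is first touched, which is why the command-line -D form is the reliable way to use it:

```java
import java.util.concurrent.ForkJoinPool;

public class CommonPoolConfig {
    // Parallelism of the common pool; by default this is
    // Runtime.getRuntime().availableProcessors() - 1 (minimum 1).
    public static int commonParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        // To override, set the property BEFORE the common pool is first used,
        // e.g. on the command line:
        //   java -Djava.util.concurrent.ForkJoinPool.common.parallelism=2 CommonPoolConfig
        System.out.println("common pool parallelism = " + commonParallelism());
    }
}
```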
Custom Fork/Join Pool
When used inside a ForkJoinPool, the ForkJoinTask.fork() method uses the current pool:

ForkJoinPool ourOwnPool = new ForkJoinPool(10);
ourOwnPool.invoke( () -> stream.parallel(). ⋮
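A compilable version of this trick uses submit(Callable) and get(); the pool size of 10 follows the slide, and the stream here is a stand-in. Note this relies on the stream's fork/join tasks running in the pool they were submitted from, which is observed behaviour rather than a documented guarantee:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolStream {
    // Runs a parallel stream inside our own pool instead of the common pool.
    public static long sumInPool(int poolSize, int n) {
        ForkJoinPool ourOwnPool = new ForkJoinPool(poolSize);
        try {
            return ourOwnPool.submit(() ->
                    IntStream.rangeClosed(1, n).parallel().asLongStream().sum()
            ).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            ourOwnPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumInPool(10, 1_000_000)); // 500000500000
    }
}
```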
Don’t Have Too Few Threads!
• Fork/Join pool uses a work queue
• If tasks are CPU-bound, there is no point increasing the size of the thread pool
• But if they are not CPU-bound, tasks sit in the queue accumulating dead time
• Making the thread pool bigger can reduce dead time
• Little’s Law tells us:
    Number of tasks in the system = Arrival rate * Average service time
Little’s Law Example
A system receives 400 Tx/s and it takes 100ms to clear a request:
– Number of tasks in system = 0.100 * 400 = 40
On an 8-core machine with a CPU-bound task:
– 32 tasks are sitting in the queue accumulating dead time
– average response time is 600ms, of which 500ms is dead time
– ~83% of service time is spent waiting
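The slide's arithmetic can be checked mechanically. The queueDepth helper below assumes, as the slide does, that all 8 cores are fully busy with CPU-bound tasks (names are illustrative):

```java
public class LittlesLaw {
    // L = lambda * W: tasks in system = arrival rate * average service time.
    public static double tasksInSystem(double arrivalRatePerSec, double serviceTimeSec) {
        return arrivalRatePerSec * serviceTimeSec;
    }

    // Tasks waiting rather than running, assuming every core is busy.
    public static double queueDepth(double tasksInSystem, int cores) {
        return Math.max(0, tasksInSystem - cores);
    }

    public static void main(String[] args) {
        double inSystem = tasksInSystem(400, 0.100);   // 40, as on the slide
        double waiting = queueDepth(inSystem, 8);      // 32 queued on 8 cores
        System.out.println(inSystem + " in system, " + waiting + " waiting");
    }
}
```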
ForkJoinPool Observability
ForkJoinPool comes with no visibility:
– need to instrument ForkJoinTask.invoke()
– gather data from ForkJoinPool to feed into Little’s Law

public final V invoke() {
    ForkJoinPool.common.getMonitor().submitTask(this);
    int s;
    if ((s = doInvoke() & DONE_MASK) != NORMAL)
        reportException(s);
    ForkJoinPool.common.getMonitor().retireTask(this);
    return getRawResult();
}
Conclusions
• Sequential stream performance is comparable to imperative code
• Going parallel is worthwhile IF
– the task is suitable
– the data source is suitable
– the environment is suitable
• Need to monitor the JDK to understand bottlenecks
– the Fork/Join pool is not well instrumented
Questions?