Java 8 Parallel Streams
MASUD HASAN
What is a stream
• A stream is not a collection, a sequence, or a stream of objects
• A stream is an abstraction that holds zero or more values
• Not (necessarily) a collection: values might not be stored anywhere
• Not (necessarily) a sequence: order might not matter
• Values, not objects: avoids mutation and side effects
Pipelines
• A stream source
• Zero or more intermediate operations
• A terminal operation
collection.stream()
    .filter(…)
    .map(…)
    .collect(…);
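The pipeline shape above (source, intermediate operations, terminal operation) can be sketched with concrete lambdas. This is a minimal illustration only; the class name, method name, and sample words are made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class PipelineSketch {
    // Source: words.stream(); intermediate ops: filter, map; terminal op: collect.
    static List<Integer> lengthsOfLongWords(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 3)   // keep words longer than 3 characters
                .map(String::length)           // replace each word by its length
                .collect(Collectors.toList()); // gather the results into a list
    }

    public static void main(String[] args) {
        // prints [4, 7]
        System.out.println(lengthsOfLongWords(Arrays.asList("java", "is", "fun", "streams")));
    }
}
```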
Parallel Streams
• Sources start with stream(), parallelStream(), or another stream factory
• Can be switched using parallel() and sequential()
• Parallel vs sequential is a property of the entire pipeline
• Can’t switch between parallel and sequential in the middle
• Last one wins
• Does parallel make it auto-magically go faster? NO
collection.stream()
    .filter(...)
    .parallel()
    .map(...)
    .sequential()
    .collect(...)
The entire pipeline runs sequentially, because sequential() was called last
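The "last one wins" rule can be checked with isParallel(), which reports the mode the whole pipeline would run in. A small sketch, with illustrative class and method names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

final class LastOneWinsSketch {
    // parallel() is called in the middle, but sequential() is called last.
    static boolean parallelThenSequential(List<Integer> source) {
        Stream<Integer> pipeline = source.stream()
                .filter(n -> n % 2 == 0)
                .parallel()
                .map(n -> n * 10)
                .sequential();
        return pipeline.isParallel();
    }

    // Same operations, but parallel() is the last mode switch.
    static boolean sequentialThenParallel(List<Integer> source) {
        Stream<Integer> pipeline = source.stream()
                .filter(n -> n % 2 == 0)
                .sequential()
                .map(n -> n * 10)
                .parallel();
        return pipeline.isParallel();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println(parallelThenSequential(numbers)); // prints false
        System.out.println(sequentialThenParallel(numbers)); // prints true
    }
}
```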
Parallel stream considerations
• Parallel and sequential streams should give the same result
• Parallelism can lead to nondeterminism, which is bad
• Encounter order vs processing order
• Stateful vs stateless: side effects
• Accumulation vs reduction
• Reduction: identity and associativity
• Explicit nondeterminism can speed things up
• Parallel has overhead and might also slow things down
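To illustrate the stateful vs stateless point, here is a sketch that avoids side effects by letting the collector do the combining. The class and method names are made up for the example; the commented-out variant shows the kind of shared-state mutation to avoid.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class SideEffectSketch {
    // Stateless lambda: each element is mapped independently, and the
    // collector merges per-thread partial results in encounter order.
    static List<Integer> squares(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    // Avoid: mutating a shared, non-thread-safe list from a parallel stream.
    //   List<Integer> shared = new ArrayList<>();
    //   IntStream.rangeClosed(1, n).parallel().forEach(i -> shared.add(i * i));
    // Elements can be lost or reordered, and the result can change from run to run.

    public static void main(String[] args) {
        System.out.println(squares(5)); // prints [1, 4, 9, 16, 25]
    }
}
```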
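Parallel vs sequential
[Diagram: in a sequential pipeline the source feeds op1, then op2, then the terminal op on a single path. In a parallel pipeline the source is split into parts, op1 and op2 run on each part, and the terminal op combines the results.]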
Encounter order vs processing order
• Encounter order: the ordering of the source determines the ordering of the result
• Processing order: the order in which elements are actually handled is nondeterministic
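The difference shows up when comparing forEach with forEachOrdered on a parallel stream. A minimal sketch, with illustrative names; forEachOrdered is documented to process elements in encounter order, one after another, so appending to the StringBuilder is safe here.

```java
import java.util.Arrays;
import java.util.List;

final class OrderSketch {
    // The elements may be upper-cased in any order on any thread (processing order),
    // but forEachOrdered delivers them in the order of the source (encounter order).
    static String joinInEncounterOrder(List<String> words) {
        StringBuilder result = new StringBuilder();
        words.parallelStream()
                .map(String::toUpperCase)
                .forEachOrdered(result::append);
        return result.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c", "d");
        System.out.println(joinInEncounterOrder(words)); // prints ABCD

        // With plain forEach the print order can differ from run to run.
        words.parallelStream().forEach(System.out::print);
        System.out.println();
    }
}
```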
Accumulation vs Reduction
long sum = 0L;
for (long i = 0; i <= 1_000_000L; i++) {
    sum += i; // every step mutates the same variable, so the loop cannot be split across threads
}
A better way: reduction
1 + 2 + 3 + 4 + 5 + 6 + 7 + 8
{1 + 2} + {3 + 4} + {5 + 6} + {7 + 8}
{3 + 7} + {11 + 15}
{10 + 26}
36
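The same sum written as a reduction can be split into partial sums that are combined at the end, as in the tree above. A sketch using LongStream; the class and method names are illustrative.

```java
import java.util.stream.LongStream;

final class ReductionSketch {
    // Sums 0..n without a shared mutable accumulator: each partition is reduced
    // on its own, then the partial sums are combined with the same operation.
    static long sumTo(long n) {
        return LongStream.rangeClosed(0, n)
                .parallel()
                .reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        System.out.println(sumTo(8));          // prints 36
        System.out.println(sumTo(1_000_000L)); // prints 500000500000
    }
}
```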
Identity Value
• The starting value of each partition in a parallel stream
• Becomes the result if the stream is empty
• The value must be correct for the operation
• Must really be a VALUE (immutable)
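As a sketch of these rules: 0 is the identity for addition and 1 for multiplication, and each is what an empty stream reduces to. The names below are illustrative. A value that is not a true identity (say, 10 for addition) could be applied once per partition in a parallel run, so the result would depend on how the source was split.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class IdentitySketch {
    // 0 + x == x, so 0 is a correct identity for addition.
    static int sum(List<Integer> values) {
        return values.parallelStream().reduce(0, Integer::sum);
    }

    // 1 * x == x, so 1 is a correct identity for multiplication.
    static int product(List<Integer> values) {
        return values.parallelStream().reduce(1, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8))); // prints 36
        System.out.println(sum(Collections.<Integer>emptyList()));     // prints 0
        System.out.println(product(Arrays.asList(1, 2, 3, 4)));        // prints 24
        System.out.println(product(Collections.<Integer>emptyList())); // prints 1
    }
}
```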
Associativity
• The reduction operation must be associative in a parallel stream: (a op b) op c must equal a op (b op c)
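A quick way to see why this matters is to group the same operands in two ways. Addition gives the same answer either way; subtraction does not, so a parallel reduce with subtraction can return different results depending on how the source is partitioned. The helper names below are made up for the example.

```java
import java.util.function.IntBinaryOperator;
import java.util.stream.IntStream;

final class AssociativitySketch {
    // (a op b) op c
    static int leftGrouped(IntBinaryOperator op, int a, int b, int c) {
        return op.applyAsInt(op.applyAsInt(a, b), c);
    }

    // a op (b op c)
    static int rightGrouped(IntBinaryOperator op, int a, int b, int c) {
        return op.applyAsInt(a, op.applyAsInt(b, c));
    }

    public static void main(String[] args) {
        IntBinaryOperator plus = (a, b) -> a + b;
        IntBinaryOperator minus = (a, b) -> a - b;

        System.out.println(leftGrouped(plus, 1, 2, 3) + " " + rightGrouped(plus, 1, 2, 3));   // prints 6 6
        System.out.println(leftGrouped(minus, 1, 2, 3) + " " + rightGrouped(minus, 1, 2, 3)); // prints -4 2

        // Sequentially this is ((0 - 1) - 2) ... - 8 = -36.
        System.out.println(IntStream.rangeClosed(1, 8).reduce(0, minus)); // prints -36
        // In parallel the grouping depends on the split, so the result is not reliable.
        System.out.println(IntStream.rangeClosed(1, 8).parallel().reduce(0, minus));
    }
}
```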
Where are the threads
• The stream workload is split and dispatched to the common fork-join pool
• Control over concurrency is deliberately opaque in the API
• The common pool is controlled by system properties
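The common pool can be inspected from code. The sketch below reads its target parallelism and prints the threads a parallel stream actually ran on; the class name is illustrative. One of the system properties in question is java.util.concurrent.ForkJoinPool.common.parallelism, which has to be set before the common pool is first used (for example on the command line with -D).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

final class CommonPoolSketch {
    // Target parallelism of the common fork-join pool used by parallel streams.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Common pool parallelism: " + commonPoolParallelism());

        // The calling thread usually takes part too, so "main" may appear
        // next to the ForkJoinPool.commonPool-worker threads.
        IntStream.range(0, 8)
                .parallel()
                .forEach(i -> System.out.println(i + " on " + Thread.currentThread().getName()));
    }
}
```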
When to go parallel
• A parallel stream has startup overhead
• Typically 1000 microseconds
• If your computation is shorter than that, do not even bother
• Consider parallel if N * Q >= 10,000
  • N = number of elements
  • Q = cost per element
• Assumptions
  • Element processing is independent and the source is splittable
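The N * Q rule of thumb can be written as a small check. This is only a sketch of the heuristic as stated above: the helper name is hypothetical, and the unit of Q is not given in the slides, so callers have to pick their own cost estimate. Passing the check means parallel is worth measuring, not that it will be faster.

```java
final class ParallelHeuristicSketch {
    static final long THRESHOLD = 10_000L; // from the rule N * Q >= 10,000

    // n = number of elements, q = estimated cost per element (unit is an assumption).
    // Written as n >= ceil(THRESHOLD / q) so that n * q cannot overflow.
    static boolean worthConsideringParallel(long n, long q) {
        if (n <= 0 || q <= 0) {
            return false;
        }
        return n >= (THRESHOLD + q - 1) / q;
    }

    public static void main(String[] args) {
        System.out.println(worthConsideringParallel(10_000, 1)); // prints true
        System.out.println(worthConsideringParallel(100, 10));   // prints false
        System.out.println(worthConsideringParallel(100, 100));  // prints true
    }
}
```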
END
