Shooting the Rapids:
Getting the Best from Java 8 Streams
Kirk Pepperdine @kcpeppe
Maurice Naftalin @mauricenaftalin
Devoxx Belgium, Nov. 2015
About Kirk
• Specialises in performance tuning
• speaks frequently about performance
• author of performance tuning workshop
• Co-founder
• performance diagnostic tooling
• Java Champion (since 2006)
About Maurice
Co-author, Author
Java Champion
JavaOne Rock Star
Subjects Covered in this Talk
• Background – lambdas and streams
• Performance of our example
• Effect of parallelizing
• Splitting input data efficiently
• When to go parallel
• Parallel streams in the real world
Benchmark Alert
What is a Lambda?
Predicate<Matcher> matches = new Predicate<Matcher>() {
    @Override
    public boolean test(Matcher matcher) {
        return matcher.find();
    }
};
Predicate<Matcher> matches = matcher -> matcher.find();
A lambda is a function from arguments to result:
matcher -> matcher.find()
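The equivalence shown on this slide can be tried directly. The sketch below is illustrative only: the class name LambdaForms and the sample pattern are assumptions, not part of the talk. It also adds the method-reference form that the later stream code uses.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LambdaForms {
    // Anonymous inner class, as on the slide.
    static final Predicate<Matcher> ANONYMOUS = new Predicate<Matcher>() {
        @Override
        public boolean test(Matcher matcher) {
            return matcher.find();
        }
    };

    // The same function written as a lambda: argument -> result.
    static final Predicate<Matcher> LAMBDA = matcher -> matcher.find();

    // The same function again, as a method reference.
    static final Predicate<Matcher> METHOD_REF = Matcher::find;

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+"); // sample pattern, chosen for illustration
        System.out.println(ANONYMOUS.test(digits.matcher("abc 123")));
        System.out.println(LAMBDA.test(digits.matcher("abc 123")));
        System.out.println(METHOD_REF.test(digits.matcher("no digits")));
    }
}
```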
Example: Processing GC Logfile
⋮
2.869: Application time: 1.0001540 seconds
5.342: Application time: 0.0801231 seconds
8.382: Application time: 1.1013574 seconds
⋮
DoubleSummaryStatistics
{count=3, sum=2.181635, min=0.080123, average=0.727212, max=1.101357}
Old School Code
DoubleSummaryStatistics summary = new DoubleSummaryStatistics();
Pattern stoppedTimePattern =
    Pattern.compile("Application time: (\\d+\\.\\d+)");

String logRecord;
// read the log line by line
while ((logRecord = logFileReader.readLine()) != null) {
    Matcher matcher = stoppedTimePattern.matcher(logRecord);
    // keep only the lines that report application time
    if (matcher.find()) {
        double value = Double.parseDouble(matcher.group(1));
        summary.accept(value);
    }
}
Let’s look at the features in this code
Data Source
logFileReader.readLine()
Map to Matcher
stoppedTimePattern.matcher(logRecord)
Filter
matcher.find()
Map to Double
Double.parseDouble(matcher.group(1))
Collect Results (Reduce)
summary.accept(value)
Java 8 Streams
• A sequence of values, “in motion”
• source and intermediate operations set the stream up lazily
• a terminal operation “pulls” values eagerly down the stream
collection.stream()
.intermediateOp
⋮
.intermediateOp
.terminalOp
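The lazy set-up and eager pull can be observed by counting how often an intermediate operation runs. This is a small sketch; the class LazyPipeline and its sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyPipeline {
    /** Returns {calls to map before the terminal op, calls after it}. */
    static int[] countMapCalls() {
        List<String> values = Arrays.asList("a", "bb", "ccc");
        AtomicInteger calls = new AtomicInteger();

        // Source plus intermediate operation: the pipeline is only set up here.
        Stream<Integer> lengths = values.stream()
                .map(s -> { calls.incrementAndGet(); return s.length(); });
        int before = calls.get();

        // The terminal operation pulls every value through the pipeline.
        lengths.forEach(n -> { });
        int after = calls.get();

        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] counts = countMapCalls();
        System.out.println("before terminal op: " + counts[0] + ", after: " + counts[1]);
    }
}
```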
Stream Sources
• New method Collection.stream()
• Many other sources:
• Arrays.stream(Object[])
• Stream.of(Object...)
• Stream.iterate(Object,UnaryOperator)
• Files.lines()
• BufferedReader.lines()
• Random.ints()
• JarFile.stream()
• …
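A few of these sources in use, as a minimal sketch. The class StreamSources, its method names and the sample data are illustrative assumptions; the library calls are the ones listed above.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamSources {
    static List<String> fromArray() {
        return Arrays.stream(new String[] { "a", "b" }).collect(Collectors.toList());
    }

    static List<Integer> fromValues() {
        return Stream.of(1, 2, 3).collect(Collectors.toList());
    }

    static List<Integer> fromIterate() {
        // Stream.iterate is unbounded, so limit() is needed to terminate.
        return Stream.iterate(1, n -> n * 2).limit(5).collect(Collectors.toList());
    }

    static long fromReader() {
        BufferedReader reader = new BufferedReader(new StringReader("line 1\nline 2\n"));
        return reader.lines().count();
    }

    static long fromRandom() {
        return new Random(42).ints(10).count();
    }

    public static void main(String[] args) {
        System.out.println(fromArray());
        System.out.println(fromValues());
        System.out.println(fromIterate());
        System.out.println(fromReader() + " lines, " + fromRandom() + " random ints");
    }
}
```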
Imperative to Stream
DoubleSummaryStatistics statistics =
    Files.lines(new File("gc.log").toPath())
        .map(stoppedTimePattern::matcher)
        .filter(Matcher::find)
        .map(matcher -> matcher.group(1))
        .mapToDouble(Double::parseDouble)
        .summaryStatistics();
Stream Source
Files.lines(new File("gc.log").toPath())
Intermediate Operations
.map(stoppedTimePattern::matcher)
.filter(Matcher::find)
.map(matcher -> matcher.group(1))
.mapToDouble(Double::parseDouble)
Method References
stoppedTimePattern::matcher
Matcher::find
Double::parseDouble
Terminal Operation
.summaryStatistics()
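A self-contained version of the pipeline, fed from in-memory lines instead of gc.log so it runs without a log file. This is a sketch: the class StoppedTimeSummary, the method summarize and the extra non-matching sample line are illustrative assumptions.

```java
import java.util.DoubleSummaryStatistics;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

class StoppedTimeSummary {
    private static final Pattern STOPPED_TIME =
            Pattern.compile("Application time: (\\d+\\.\\d+)");

    static DoubleSummaryStatistics summarize(Stream<String> lines) {
        return lines
                .map(STOPPED_TIME::matcher)          // map to Matcher
                .filter(Matcher::find)               // keep matching lines only
                .map(matcher -> matcher.group(1))    // extract the captured number
                .mapToDouble(Double::parseDouble)    // map to double
                .summaryStatistics();                // terminal operation (reduce)
    }

    public static void main(String[] args) {
        Stream<String> sample = Stream.of(
                "2.869: Application time: 1.0001540 seconds",
                "4.100: some other log record",      // made-up line that the filter drops
                "5.342: Application time: 0.0801231 seconds",
                "8.382: Application time: 1.1013574 seconds");
        System.out.println(summarize(sample));
    }
}
```

Passing the result of Files.lines(...) to summarize gives the file-based variant from the slide.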
Visualising Sequential Streams
[Diagram: “Values in Motion”. Values x0 to x3 flow one at a time from the Source through Map and Filter (intermediate operations) to the Reduction (terminal operation); each value either passes the filter (✔) or is dropped (❌)]
How Does It Perform?
Old School: 13.3 secs
Sequential: 13.8 secs
- Should be the same workload
- Stream code is cleaner, easier to read
24M line file, MacBook Pro, Haswell i7, 4 cores, hyperthreaded, Java 9.0
Can We Do Better?
• We might be able to if the workload is parallelizable
• split stream into many segments
• process each segment
• combine results
• Requirements exactly match Fork/Join workflow
Visualizing Parallel Streams
[Diagram: the source values x0 to x3 are split into segments that flow through the pipeline concurrently; each value either passes the filter (✔) or is dropped (❌)]
Splitting Stream Sources
• Stream source is a Spliterator
• can both iterate over data and – where possible – split it
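Both abilities can be seen by splitting a list's Spliterator once and draining the two parts. A minimal sketch; the class SpliteratorDemo and the helper splitOnce are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class SpliteratorDemo {
    /** Splits the source's spliterator once and drains both parts in order. */
    static List<List<String>> splitOnce(List<String> source) {
        Spliterator<String> rest = source.spliterator();
        Spliterator<String> prefix = rest.trySplit(); // null when the source cannot be split

        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        if (prefix != null) {
            prefix.forEachRemaining(first::add);      // iterate over the split-off part
        }
        rest.forEachRemaining(second::add);           // iterate over what is left
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>(Arrays.asList("x0", "x1", "x2", "x3"));
        System.out.println(splitOnce(values));
    }
}
```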
Splitting the Data
Parallel Streams
DoubleSummaryStatistics statistics =
    Files.lines(new File("gc.log").toPath())
        .parallel()
        .map(stoppedTimePattern::matcher)
        .filter(Matcher::find)
        .map(matcher -> matcher.group(1))
        .mapToDouble(Double::parseDouble)
        .summaryStatistics();
About Fork/Join
• Introduced in Java 7
• draws from a common pool of ForkJoinWorkerThread
• default pool size == HW cores – 1
• assumes workload will be CPU bound
• On its own, not an easy coding idiom
• parallel streams provide an abstraction layer
• Spliterator defines how to split stream
• framework code submits sub-tasks to the common Fork/Join pool
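To show what the abstraction layer saves, here is a hand-written Fork/Join sum next to the parallel-stream equivalent. This is only a sketch: the class ForkJoinSum and the threshold of 1,000 elements are illustrative choices, not figures from the talk.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.DoubleStream;

class ForkJoinSum extends RecursiveTask<Double> {
    private static final int THRESHOLD = 1_000; // illustrative cut-off for sequential work

    private final double[] values;
    private final int lo;
    private final int hi;

    ForkJoinSum(double[] values, int lo, int hi) {
        this.values = values;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Double compute() {
        if (hi - lo <= THRESHOLD) {               // small enough: process the segment
            double sum = 0;
            for (int i = lo; i < hi; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = (lo + hi) >>> 1;                // split into two segments
        ForkJoinSum left = new ForkJoinSum(values, lo, mid);
        ForkJoinSum right = new ForkJoinSum(values, mid, hi);
        left.fork();                              // hand the left half to the pool
        double rightSum = right.compute();        // work on the right half in this thread
        return left.join() + rightSum;            // combine results
    }

    static double sumWithForkJoin(double[] values) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinSum(values, 0, values.length));
    }

    static double sumWithParallelStream(double[] values) {
        return DoubleStream.of(values).parallel().sum();
    }

    public static void main(String[] args) {
        double[] ones = new double[10_000];
        java.util.Arrays.fill(ones, 1.0);
        System.out.println("common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println(sumWithForkJoin(ones) + " " + sumWithParallelStream(ones));
    }
}
```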
How Does That Perform?
Old School: 13.3 secs
Sequential: 13.8 secs
Parallel: 9.5 secs
- 1.45x faster
- but not 8x faster (????)
24M lines, 2.8GHz 8-core i7, 16GB, OS X, Java 9.0
In Fact!!!!
• Different benchmarks yield a mixed bag of results
• some were better
• some were the same
• some were worse!
Open Questions
• Under what conditions are things better
• or worse
• When should we parallelize
• and when is serial better
Answer depends upon where the bottleneck is
Where is Our Bottleneck?
• I/O operations
• not a surprise, we’re reading from a file
• Java 9 uses FileChannelLinesSpliterator
• 2x better than Java 8’s implementation
76.0% 0 + 5941 sun.nio.ch.FileDispatcherImpl.pread0
Poorly Splitting Sources
• Some sources split worse than others
• LinkedList vs ArrayList
• Streaming I/O is problematic
• more threads == more pressure on contended resource
• thrashing and other ill effects
• Workload size doesn’t cover the overheads
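The LinkedList vs ArrayList difference can be observed by asking each source how much it hands off on its first split. A sketch with illustrative names (SplitBalance, firstSplitSize); the exact LinkedList batch size is an implementation detail, so the code only reports it.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

class SplitBalance {
    /** Estimated size of the part handed off by the first trySplit(), or 0 if the source will not split. */
    static long firstSplitSize(Collection<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        return prefix == null ? 0 : prefix.estimateSize();
    }

    static void fill(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>();
        List<Integer> linked = new LinkedList<>();
        fill(array, 10_000);
        fill(linked, 10_000);
        // An array-backed list can split at its midpoint by index arithmetic;
        // a linked list has to walk its nodes, so it hands off a smaller batch.
        System.out.println("ArrayList first split:  " + firstSplitSize(array));
        System.out.println("LinkedList first split: " + firstSplitSize(linked));
    }
}
```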
Streaming I/O Bottleneck
[Diagram: values x0 to x3 flowing from a streaming I/O source through the pipeline; each value either passes the filter (✔) or is dropped (❌)]
LineSpliterator
[Diagram: a MappedByteBuffer holding the newline-terminated log lines “2.869: Applicati … seconds”, “5.342: … nds”, “8.382: … nds”, “9.337: App … nds”. The spliterator's coverage is divided at a line break near the midpoint (mid), giving a new spliterator coverage and the remaining spliterator coverage]
Included in JDK9 as FileChannelLinesSpliterator
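The splitting idea in the diagram can be sketched over a plain byte array standing in for the MappedByteBuffer. This is a hypothetical illustration, not the JDK's FileChannelLinesSpliterator: the class LineSpliteratorSketch and its helper splitOnce are made up here, and the split point is simply the first line break at or after the midpoint.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;

class LineSpliteratorSketch implements Spliterator<String> {
    private final byte[] buffer; // stands in for the MappedByteBuffer on the slide
    private int lo;              // start of the next unread line
    private final int hi;        // end (exclusive) of this spliterator's coverage

    LineSpliteratorSketch(byte[] buffer, int lo, int hi) {
        this.buffer = buffer;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        if (lo >= hi) {
            return false;
        }
        int end = lo;
        while (end < hi && buffer[end] != '\n') {
            end++;
        }
        action.accept(new String(buffer, lo, end - lo, StandardCharsets.UTF_8));
        lo = end + 1;            // skip the line terminator
        return true;
    }

    @Override
    public Spliterator<String> trySplit() {
        int mid = (lo + hi) >>> 1;
        // Move forward from the midpoint to the end of the line it falls in.
        while (mid < hi && buffer[mid] != '\n') {
            mid++;
        }
        if (mid >= hi - 1) {
            return null;         // nothing would be left for the second part
        }
        Spliterator<String> prefix = new LineSpliteratorSketch(buffer, lo, mid + 1);
        lo = mid + 1;            // this spliterator keeps the lines after the break
        return prefix;
    }

    @Override
    public long estimateSize() {
        return Math.max(0, hi - lo); // remaining bytes, used as a rough estimate
    }

    @Override
    public int characteristics() {
        return ORDERED | NONNULL;
    }

    /** Splits the text once and returns the lines covered by each part. */
    static List<List<String>> splitOnce(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        LineSpliteratorSketch rest = new LineSpliteratorSketch(bytes, 0, bytes.length);
        Spliterator<String> prefix = rest.trySplit();
        List<String> first = new ArrayList<>();
        List<String> second = new ArrayList<>();
        if (prefix != null) {
            prefix.forEachRemaining(first::add);
        }
        rest.forEachRemaining(second::add);
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) {
        System.out.println(splitOnce("2.869: a\n5.342: b\n8.382: c\n9.337: d\n"));
    }
}
```

Because every split lands on a line break, no line is ever divided between two threads.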
In-memory Comparison
• Read GC log into an ArrayList prior to processing
Old School: 9.4 secs
Sequential: 9.9 secs
Parallel: 2.7 secs
- 4.25x faster
- better but still not 8x faster
24M lines, 2.8GHz 8 core i7, 16GB, OS X, JDK 9.0
Justifying the Overhead
CPNQ performance model:
C - number of submitters
P - number of CPUs
N - number of elements
Q - cost of the operation
cost of intermediate operations is N * Q
overhead of setting up F/J framework is ~100µs
Amortizing Setup Costs
• N*Q needs to be large
• Q can often only be estimated
• N may only be known at run time
• Rule of thumb, N > 10,000
• P is the number of processors
• P == number of cores for CPU bound
• P < number of cores otherwise
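One way to turn the model into a quick check is to compare the estimated sequential cost N * Q with the roughly 100µs set-up overhead quoted above. The sketch below is a rough heuristic under stated assumptions: the class ParallelWorthiness, the safety factor of 10 and the idea of expressing Q in nanoseconds are all illustrative, not part of the talk.

```java
class ParallelWorthiness {
    // Fork/Join set-up overhead of about 100µs, as quoted on the slide.
    static final long SETUP_OVERHEAD_NANOS = 100_000;

    // Assumption: the sequential work should exceed the overhead by this factor.
    static final int SAFETY_FACTOR = 10;

    /**
     * @param n                    number of elements (N)
     * @param costPerElementNanos  estimated cost of the operations per element (Q), in nanoseconds
     */
    static boolean worthParallelizing(long n, long costPerElementNanos) {
        long sequentialCostNanos = Math.multiplyExact(n, costPerElementNanos); // N * Q
        return sequentialCostNanos > SAFETY_FACTOR * SETUP_OVERHEAD_NANOS;
    }

    public static void main(String[] args) {
        // 24M log lines at an assumed 500ns each: easily worth it.
        System.out.println(worthParallelizing(24_000_000, 500));
        // 100 elements at an assumed 50ns each: the overhead dominates.
        System.out.println(worthParallelizing(100, 50));
    }
}
```

This only covers the N * Q side of the model; it says nothing about how well the source splits or how many other submitters (C) share the pool.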
Other Gotchas
• Frequent hand-offs place pressure on thread schedulers
• effect is magnified when a hypervisor is involved
• estimated 80,000 cycles to handoff data between threads
• you can do a lot of processing in 80,000 cycles
• Too many threads place pressure on thread schedulers
• responsible for other ill effects, such as longer time to safepoint (TTSP)
• too few threads may leave hardware under-utilized
Simulated Server Environment
ExecutorService threadPool = Executors.newFixedThreadPool(10);
threadPool.execute(() -> {
    try {
        long timer = System.currentTimeMillis();  // start time of this task
        // value must be a field: a lambda cannot assign to a local variable
        value = Files.lines(new File("gc.log").toPath()).parallel()
            .map(applicationStoppedTimePattern::matcher)
            .filter(Matcher::find)
            .map(matcher -> matcher.group(2))
            .mapToDouble(Double::parseDouble)
            .summaryStatistics().getSum();
    } catch (Exception ex) {
        ex.printStackTrace();  // report failures instead of silently swallowing them
    }
});
Work Flow and Results
• First task to arrive will consume all ForkJoinWorkerThread
• downstream tasks wait for a ForkJoinWorkerThread
• downstream tasks start intermixing with initial task
• Initial task collects dead time as it competes for threads
• all other tasks collect dead time as they either compete or wait for a ForkJoinWorkerThread
System is stressed beyond capacity
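The sharing behind this contention can be made visible by recording which threads execute the parallel streams started from an executor. A sketch with illustrative names (SharedCommonPool, runTasks) and a small arithmetic workload standing in for the log processing:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

class SharedCommonPool {
    /** Starts one parallel stream per task from a fixed thread pool and returns each task's result. */
    static List<Long> runTasks(int tasks, Set<String> threadNames) throws Exception {
        ExecutorService threadPool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                Callable<Long> job = () -> IntStream.range(0, 100_000)
                        .parallel()
                        // record which thread processes each element
                        .peek(n -> threadNames.add(Thread.currentThread().getName()))
                        .asLongStream()
                        .sum();
                futures.add(threadPool.submit(job));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            threadPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Set<String> names = ConcurrentHashMap.newKeySet();
        runTasks(10, names);
        System.out.println("common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        names.stream().sorted().forEach(System.out::println);
    }
}
```

However many executor threads submit work, the worker threads listed all come from the one common Fork/Join pool, which is why the tasks end up competing with each other.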
Intermediate Operation Bottleneck
• Bottleneck is in pattern matching
• but, streaming infrastructure isn’t far behind!
68.6% 1384 + 0 java.util.regex.Pattern$Curly.match
26.6% 521 + 15 java.util.stream.ReferencePipeline$3$1.accept
Tragedy of the Commons
Garrett Hardin, ecologist (1968):
Imagine the grazing of animals on a common ground. Each
flock owner gains if they add to their own flock. But
every animal added to the total degrades the commons a
small amount.
Tragedy of the Commons
You have a finite amount of hardware
– it might be in your best interest to grab it all
– but if everyone behaves the same way…
Simulated Server Environment
• Submit 10 tasks to Fork-Join (via Executor thread-pool)
• first result comes out in 32 seconds
• compared to 9.5 seconds for individually submitted task
• high system time shows that the task is I/O bound
In-Memory Variation
• Preload log file
• Submit 10 tasks to Fork-Join (via Executor thread-pool)
• first result comes out in 23 seconds
• compared to 4.5 seconds for individually submitted task
• task is CPU bound
Conclusions
Sequential stream performance comparable to imperative code
Going parallel is worthwhile IF
- task is suitable
- expensive enough to amortize setup costs
- no inter-task communication needed
- data source is suitable
- environment is suitable
Need to monitor JDK to understand bottlenecks
- Fork/Join pool is not well instrumented
Resources
http://gee.cs.oswego.edu/dl/html/StreamParallelGuidance.html

More Related Content

What's hot

Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015Holden Karau
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascadingjohnynek
 
HBase RowKey design for Akka Persistence
HBase RowKey design for Akka PersistenceHBase RowKey design for Akka Persistence
HBase RowKey design for Akka PersistenceKonrad Malawski
 
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Holden Karau
 
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Sunghyouk Bae
 
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoWeaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoTaro L. Saito
 
Beyond parallelize and collect - Spark Summit East 2016
Beyond parallelize and collect - Spark Summit East 2016Beyond parallelize and collect - Spark Summit East 2016
Beyond parallelize and collect - Spark Summit East 2016Holden Karau
 
Ge aviation spark application experience porting analytics into py spark ml p...
Ge aviation spark application experience porting analytics into py spark ml p...Ge aviation spark application experience porting analytics into py spark ml p...
Ge aviation spark application experience porting analytics into py spark ml p...Databricks
 
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...Vyacheslav Lapin
 
Reactive Streams / Akka Streams - GeeCON Prague 2014
Reactive Streams / Akka Streams - GeeCON Prague 2014Reactive Streams / Akka Streams - GeeCON Prague 2014
Reactive Streams / Akka Streams - GeeCON Prague 2014Konrad Malawski
 
Algebird : Abstract Algebra for big data analytics. Devoxx 2014
Algebird : Abstract Algebra for big data analytics. Devoxx 2014Algebird : Abstract Algebra for big data analytics. Devoxx 2014
Algebird : Abstract Algebra for big data analytics. Devoxx 2014Samir Bessalah
 
2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japanese2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japaneseKonrad Malawski
 
Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Sunghyouk Bae
 
Introduction of failsafe
Introduction of failsafeIntroduction of failsafe
Introduction of failsafeSunghyouk Bae
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applicationsKnoldus Inc.
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Dan Lynn
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法x1 ichi
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012Dan Lynn
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Stormthe100rabh
 
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016Holden Karau
 

What's hot (20)

Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascading
 
HBase RowKey design for Akka Persistence
HBase RowKey design for Akka PersistenceHBase RowKey design for Akka Persistence
HBase RowKey design for Akka Persistence
 
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
Apache Spark Structured Streaming for Machine Learning - StrataConf 2016
 
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018Kotlin @ Coupang Backed - JetBrains Day seoul 2018
Kotlin @ Coupang Backed - JetBrains Day seoul 2018
 
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoWeaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
 
Beyond parallelize and collect - Spark Summit East 2016
Beyond parallelize and collect - Spark Summit East 2016Beyond parallelize and collect - Spark Summit East 2016
Beyond parallelize and collect - Spark Summit East 2016
 
Ge aviation spark application experience porting analytics into py spark ml p...
Ge aviation spark application experience porting analytics into py spark ml p...Ge aviation spark application experience porting analytics into py spark ml p...
Ge aviation spark application experience porting analytics into py spark ml p...
 
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
ITSubbotik - как скрестить ежа с ужом или подводные камни внедрения функциона...
 
Reactive Streams / Akka Streams - GeeCON Prague 2014
Reactive Streams / Akka Streams - GeeCON Prague 2014Reactive Streams / Akka Streams - GeeCON Prague 2014
Reactive Streams / Akka Streams - GeeCON Prague 2014
 
Algebird : Abstract Algebra for big data analytics. Devoxx 2014
Algebird : Abstract Algebra for big data analytics. Devoxx 2014Algebird : Abstract Algebra for big data analytics. Devoxx 2014
Algebird : Abstract Algebra for big data analytics. Devoxx 2014
 
2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japanese2014 akka-streams-tokyo-japanese
2014 akka-streams-tokyo-japanese
 
Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017Kotlin @ Coupang Backend 2017
Kotlin @ Coupang Backend 2017
 
Introduction of failsafe
Introduction of failsafeIntroduction of failsafe
Introduction of failsafe
 
Unit testing of spark applications
Unit testing of spark applicationsUnit testing of spark applications
Unit testing of spark applications
 
Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.Storm - As deep into real-time data processing as you can get in 30 minutes.
Storm - As deep into real-time data processing as you can get in 30 minutes.
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Storm
 
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
Improving PySpark Performance - Spark Beyond the JVM @ PyData DC 2016
 

Viewers also liked

erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09Bhasker Kode
 
Metasepi team meeting #6: "Snatch-driven development"
Metasepi team meeting #6: "Snatch-driven development"Metasepi team meeting #6: "Snatch-driven development"
Metasepi team meeting #6: "Snatch-driven development"Kiwamu Okabe
 
Présentation Kivy (et projets associés) à Pycon-fr 2013
Présentation Kivy (et projets associés) à Pycon-fr 2013Présentation Kivy (et projets associés) à Pycon-fr 2013
Présentation Kivy (et projets associés) à Pycon-fr 2013Gabriel Pettier
 
Managing gang of chaotic developers is complex at Agile Tour Riga 2012
Managing gang of chaotic developers is complex at Agile Tour Riga 2012Managing gang of chaotic developers is complex at Agile Tour Riga 2012
Managing gang of chaotic developers is complex at Agile Tour Riga 2012Piotr Burdylo
 
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009Alexandre Morgaut
 
Vert.x - JDD 2013 (English)
Vert.x - JDD 2013 (English)Vert.x - JDD 2013 (English)
Vert.x - JDD 2013 (English)Bartek Zdanowski
 
Agile Management 2013 - Nie tylko it
Agile Management 2013 - Nie tylko itAgile Management 2013 - Nie tylko it
Agile Management 2013 - Nie tylko itPiotr Burdylo
 
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...thegdb
 
A Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCA Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCJohan Tibell
 
O'Reilly ETech Conference: Laszlo RIA
O'Reilly ETech Conference: Laszlo RIAO'Reilly ETech Conference: Laszlo RIA
O'Reilly ETech Conference: Laszlo RIAOliver Steele
 
Federated CDNs: What every service provider should know
Federated CDNs: What every service provider should knowFederated CDNs: What every service provider should know
Federated CDNs: What every service provider should knowPatrick Hurley
 
Jensimmons html5live-responsivedesign
Jensimmons html5live-responsivedesignJensimmons html5live-responsivedesign
Jensimmons html5live-responsivedesignJen Simmons
 
Mendeley presentation
Mendeley presentationMendeley presentation
Mendeley presentationDiogo Provete
 
Monadologie
MonadologieMonadologie
Monadologieleague
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Romain Francois
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing UpDavid Padbury
 
Be careful when entering a casino (Agile by Example 2012)
Be careful when entering a casino (Agile by Example 2012)Be careful when entering a casino (Agile by Example 2012)
Be careful when entering a casino (Agile by Example 2012)Piotr Burdylo
 
Sneaking Scala through the Back Door
Sneaking Scala through the Back DoorSneaking Scala through the Back Door
Sneaking Scala through the Back DoorDianne Marsh
 

Viewers also liked (20)

erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09erlang at hover.in , Devcamp Blr 09
erlang at hover.in , Devcamp Blr 09
 
Metasepi team meeting #6: "Snatch-driven development"
Metasepi team meeting #6: "Snatch-driven development"Metasepi team meeting #6: "Snatch-driven development"
Metasepi team meeting #6: "Snatch-driven development"
 
Présentation Kivy (et projets associés) à Pycon-fr 2013
Présentation Kivy (et projets associés) à Pycon-fr 2013Présentation Kivy (et projets associés) à Pycon-fr 2013
Présentation Kivy (et projets associés) à Pycon-fr 2013
 
Managing gang of chaotic developers is complex at Agile Tour Riga 2012
Managing gang of chaotic developers is complex at Agile Tour Riga 2012Managing gang of chaotic developers is complex at Agile Tour Riga 2012
Managing gang of chaotic developers is complex at Agile Tour Riga 2012
 
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009
Wakanda: a new end-to-end JavaScript platform - JSConf Berlin 2009
 
Vert.x - JDD 2013 (English)
Vert.x - JDD 2013 (English)Vert.x - JDD 2013 (English)
Vert.x - JDD 2013 (English)
 
Agile Management 2013 - Nie tylko it
Agile Management 2013 - Nie tylko itAgile Management 2013 - Nie tylko it
Agile Management 2013 - Nie tylko it
 
Laszlo PyCon 2005
Laszlo PyCon 2005Laszlo PyCon 2005
Laszlo PyCon 2005
 
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...
There's a Monster in My Closet: Architecture of a MongoDB-powered Event Proce...
 
A Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCA Scalable I/O Manager for GHC
A Scalable I/O Manager for GHC
 
O'Reilly ETech Conference: Laszlo RIA
O'Reilly ETech Conference: Laszlo RIAO'Reilly ETech Conference: Laszlo RIA
O'Reilly ETech Conference: Laszlo RIA
 
Federated CDNs: What every service provider should know
Federated CDNs: What every service provider should knowFederated CDNs: What every service provider should know
Federated CDNs: What every service provider should know
 
Jensimmons html5live-responsivedesign
Jensimmons html5live-responsivedesignJensimmons html5live-responsivedesign
Jensimmons html5live-responsivedesign
 
Mendeley presentation
Mendeley presentationMendeley presentation
Mendeley presentation
 
Monadologie
MonadologieMonadologie
Monadologie
 
Masters Defense 2013
Masters Defense 2013Masters Defense 2013
Masters Defense 2013
 
Rcpp: Seemless R and C++
Rcpp: Seemless R and C++Rcpp: Seemless R and C++
Rcpp: Seemless R and C++
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing Up
 
Be careful when entering a casino (Agile by Example 2012)
Be careful when entering a casino (Agile by Example 2012)Be careful when entering a casino (Agile by Example 2012)
Be careful when entering a casino (Agile by Example 2012)
 
Sneaking Scala through the Back Door
Sneaking Scala through the Back DoorSneaking Scala through the Back Door
Sneaking Scala through the Back Door
 


Shooting the Rapids

  • 1. Shooting the Rapids: Getting the Best from Java 8 Streams Kirk Pepperdine @kcpeppe Maurice Naftalin @mauricenaftalin Devoxx Belgium, Nov. 2015
  • 2. • Specialises in performance tuning • speaks frequently about performance • author of performance tuning workshop • Co-founder • performance diagnostic tooling • Java Champion (since 2006) About Kirk
  • 8. Subjects Covered in this Talk • Background – lambdas and streams • Performance of our example • Effect of parallelizing • Splitting input data efficiently • When to go parallel • Parallel streams in the real world
  • 10-16. What is a Lambda? Predicate<Matcher> matches = new Predicate<Matcher>() {
 @Override
 public boolean test(Matcher matcher) {
 return matcher.find();
 }
 };
 Keeping only the argument and the result of the anonymous class gives the lambda: Predicate<Matcher> matches = matcher -> matcher.find();
 A lambda is a function from arguments to result.
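The reduction shown on these slides can be tried out with a small sketch like the one below. It is only an illustration: the class name `LambdaForms`, the helper `matches`, and the sample log lines are assumptions made for the example, and the method-reference form is added for comparison because it appears later in the talk.

```java
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LambdaForms {
    // Anonymous inner class, as on the slide.
    static final Predicate<Matcher> ANONYMOUS = new Predicate<Matcher>() {
        @Override
        public boolean test(Matcher matcher) {
            return matcher.find();
        }
    };

    // The same behaviour as a lambda: argument -> result.
    static final Predicate<Matcher> LAMBDA = matcher -> matcher.find();

    // The same behaviour again, as a method reference.
    static final Predicate<Matcher> METHOD_REF = Matcher::find;

    // Hypothetical helper: applies a predicate to a fresh Matcher for one log line.
    static boolean matches(Predicate<Matcher> predicate, String line) {
        Pattern pattern = Pattern.compile("Application time: (\\d+\\.\\d+)");
        return predicate.test(pattern.matcher(line));
    }

    public static void main(String[] args) {
        String line = "2.869: Application time: 1.0001540 seconds";
        System.out.println(matches(ANONYMOUS, line));
        System.out.println(matches(LAMBDA, line));
        System.out.println(matches(METHOD_REF, line));
    }
}
```

All three forms implement the same single abstract method, `Predicate.test`, so they can be used interchangeably wherever a `Predicate<Matcher>` is expected.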
  • 18. Example: Processing GC Logfile ⋮ 2.869: Application time: 1.0001540 seconds 5.342: Application time: 0.0801231 seconds 8.382: Application time: 1.1013574 seconds ⋮ DoubleSummaryStatistics {count=3, sum=2.181635, min=0.080123, average=0.727212, max=1.101357}
  • 19-25. Old School Code DoubleSummaryStatistics summary = new DoubleSummaryStatistics(); Pattern stoppedTimePattern = Pattern.compile("Application time: (\\d+\\.\\d+)");
 String logRecord;
 while ((logRecord = logFileReader.readLine()) != null) {
 Matcher matcher = stoppedTimePattern.matcher(logRecord);
 if (matcher.find()) {
 double value = Double.parseDouble(matcher.group(1));
 summary.accept(value);
 }
 }
 Let’s look at the features in this code. The slides highlight them in turn: Data Source, Map to Matcher, Filter, Map to Double, Collect Results (Reduce).
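For readers who want to run the imperative version, here is a minimal self-contained sketch. It assumes the log text is supplied as a string (read through a `StringReader`) rather than from the talk's `gc.log` file; the class name `OldSchoolSummary` and the method `summarise` are hypothetical. The comments mark the five features the slides highlight.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.DoubleSummaryStatistics;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OldSchoolSummary {
    // Summarises the "Application time" values found in the given log text.
    static DoubleSummaryStatistics summarise(String logText) {
        DoubleSummaryStatistics summary = new DoubleSummaryStatistics();
        Pattern stoppedTimePattern = Pattern.compile("Application time: (\\d+\\.\\d+)");
        try (BufferedReader logFileReader = new BufferedReader(new StringReader(logText))) {
            String logRecord;
            while ((logRecord = logFileReader.readLine()) != null) {       // data source
                Matcher matcher = stoppedTimePattern.matcher(logRecord);   // map to Matcher
                if (matcher.find()) {                                      // filter
                    double value = Double.parseDouble(matcher.group(1));   // map to double
                    summary.accept(value);                                 // collect results (reduce)
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return summary;
    }

    public static void main(String[] args) {
        String log = "2.869: Application time: 1.0001540 seconds\n"
                   + "5.342: Application time: 0.0801231 seconds\n"
                   + "8.382: Application time: 1.1013574 seconds\n";
        System.out.println(summarise(log));
    }
}
```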
  • 26. Java 8 Streams • A sequence of values, “in motion” • source and intermediate operations set the stream up lazily • a terminal operation “pulls” values eagerly down the stream collection.stream() .intermediateOp ⋮ .intermediateOp .terminalOp
  • 27. Stream Sources • New method Collection.stream() • Many other sources: • Arrays.stream(Object[]) • Stream.of(Object...) • Stream.iterate(Object,UnaryOperator) • Files.lines() • BufferedReader.lines() • Random.ints() • JarFile.stream() • …
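A few of the sources listed above can be exercised without any external files, as in the sketch below. The class name `StreamSources`, its method names, and the sample values are assumptions made for illustration only.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamSources {
    // Collection.stream()
    static long fromCollection() {
        return Arrays.asList("a", "b", "c").stream().count();
    }

    // Arrays.stream(Object[])
    static long fromArray() {
        return Arrays.stream(new String[] {"a", "b"}).count();
    }

    // Stream.of(Object...)
    static long fromValues() {
        return Stream.of("x", "y", "z", "w").count();
    }

    // Stream.iterate(seed, UnaryOperator): infinite, so it must be limited.
    static List<Integer> fromIterate() {
        return Stream.iterate(1, n -> n * 2).limit(5).collect(Collectors.toList());
    }

    // BufferedReader.lines()
    static long fromReader() {
        return new BufferedReader(new StringReader("l1\nl2\nl3")).lines().count();
    }

    public static void main(String[] args) {
        System.out.println(fromCollection());
        System.out.println(fromArray());
        System.out.println(fromValues());
        System.out.println(fromIterate());
        System.out.println(fromReader());
    }
}
```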
  • 28-32. Imperative to Stream DoubleSummaryStatistics statistics =
 Files.lines(new File("gc.log").toPath())
 .map(stoppedTimePattern::matcher)
 .filter(Matcher::find)
 .map(matcher -> matcher.group(1))
 .mapToDouble(Double::parseDouble)
 .summaryStatistics();
 The slides highlight in turn: Stream Source (Files.lines), Intermediate Operations (map, filter, mapToDouble), Method References (stoppedTimePattern::matcher, Matcher::find, Double::parseDouble), Terminal Operation (summaryStatistics).
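The pipeline can be run without a log file by swapping the source for an in-memory list, as in this sketch. The substitution of `List.stream()` for `Files.lines(...)` is an assumption made to keep the example self-contained; the class name `GcLogSummary` and the method `summarise` are hypothetical, and the sample lines are the three shown on the "Example: Processing GC Logfile" slide.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogSummary {
    static final Pattern STOPPED_TIME = Pattern.compile("Application time: (\\d+\\.\\d+)");

    static DoubleSummaryStatistics summarise(List<String> lines) {
        return lines.stream()                          // stream source (in-memory stand-in)
                .map(STOPPED_TIME::matcher)            // map each line to a Matcher
                .filter(Matcher::find)                 // keep only the lines that match
                .map(matcher -> matcher.group(1))      // extract the captured number
                .mapToDouble(Double::parseDouble)      // map to a primitive double
                .summaryStatistics();                  // terminal operation
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2.869: Application time: 1.0001540 seconds",
                "5.342: Application time: 0.0801231 seconds",
                "8.382: Application time: 1.1013574 seconds");
        System.out.println(summarise(lines));
    }
}
```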
  • 33-36. Visualising Sequential Streams [Diagram: values x0 to x3 move from the Source through Map and Filter (Intermediate Operations) to the Reduction (Terminal Operation); the filter passes some values (✔) and rejects others (❌).] “Values in Motion”
  • 37. How Does It Perform? Old School: 13.3 secs Sequential: 13.8 secs - Should be the same workload - Stream code is cleaner, easier to read (24M line file, MacBook Pro, Haswell i7, 4 cores, hyperthreaded, Java 9.0)
  • 38. Can We Do Better? • We might be able to if the workload is parallelizable • split stream into many segments • process each segment • combine results • Requirements exactly match Fork/Join workflow
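  • 42. Visualizing Parallel Streams [Diagram: the stream's values are divided into segments that are processed independently, each value passing (✔) or failing (❌) the filter.]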
  • 43. Splitting Stream Sources • Stream source is a Spliterator • can both iterate over data and – where possible – split it
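Both halves of that description, iterating and splitting, can be seen by driving a `Spliterator` by hand. The sketch below is an illustration only: the class name `SplitDemo` and the method `sumHalves` are hypothetical, and it assumes an `ArrayList` source, whose spliterator hands off the first half of its range on `trySplit()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class SplitDemo {
    // Splits the source once, then iterates over each part separately.
    // Returns {sum of the split-off part, sum of what the original spliterator kept}.
    static long[] sumHalves(List<Integer> data) {
        Spliterator<Integer> rest = data.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();   // null if the source cannot split
        long[] sums = new long[2];
        if (prefix != null) {
            prefix.forEachRemaining(value -> sums[0] += value);
        }
        rest.forEachRemaining(value -> sums[1] += value);
        return sums;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            data.add(i);
        }
        long[] sums = sumHalves(data);
        System.out.println(sums[0] + " + " + sums[1]);
    }
}
```

A parallel stream does the same thing recursively: it keeps splitting until the pieces are small enough, processes each piece, and combines the partial results.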
  • 53. Parallel Streams DoubleSummaryStatistics statistics =
 Files.lines(new File("gc.log").toPath())
 .parallel()
 .map(stoppedTimePattern::matcher)
 .filter(Matcher::find)
 .map(matcher -> matcher.group(1))
 .mapToDouble(Double::parseDouble)
 .summaryStatistics();
  • 54. About Fork/Join • Introduced in Java 7 • draws from a common pool of ForkJoinWorkerThread • default pool size == HW cores – 1 • assumes workload will be CPU bound • On its own, not an easy coding idiom • parallel streams provide an abstraction layer • Spliterator defines how to split stream • framework code submits sub-tasks to the common Fork/Join pool
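The common pool mentioned above can be inspected directly, and any parallel stream submits its sub-tasks to it. The sketch below is a minimal illustration; the class name `CommonPoolInfo` and its methods are hypothetical, and the summing workload is a stand-in chosen because its result is easy to check.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

public class CommonPoolInfo {
    // Parallelism of the shared pool that parallel streams use by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    // A CPU-bound parallel stream; its sub-tasks run on the common pool
    // (the calling thread also takes part in the work).
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + commonPoolParallelism());
        System.out.println("sum 1..1000: " + parallelSum(1_000));
    }
}
```

On a typical machine the reported parallelism is one less than the number of available processors, which matches the "HW cores – 1" default on the slide; the value can differ if the pool has been configured through system properties.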
  • 55. How Does That Perform? Old School: 13.3 secs Sequential: 13.8 secs Parallel: 9.5 secs - 1.45x faster - but not 8x faster (????) (24M lines, 2.8GHz 8-core i7, 16GB, OS X, Java 9.0)
  • 56. In Fact!!!! • Different benchmarks yield a mixed bag of results • some were better • some were the same • some were worse!
  • 58. Open Questions • Under what conditions are things better • or worse • When should we parallelize • and when is serial better Answer depends upon where the bottleneck is
  • 59. Where is Our Bottleneck? • I/O operations • not a surprise, we’re reading from a file • Java 9 uses FileChannelLinesSpliterator • 2x better than Java 8’s implementation Profile: 76.0% 0 + 5941 sun.nio.ch.FileDispatcherImpl.pread0
  • 60. Poorly Splitting Sources • Some sources split worse than others • LinkedList vs ArrayList • Streaming I/O is problematic • more threads == more pressure on contended resource • thrashing and other ill effects • Workload size doesn’t cover the overheads
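The LinkedList versus ArrayList point can be observed by splitting each once and comparing the estimated sizes of the two parts. This is a rough sketch: the class name `SplitQuality` and its methods are hypothetical, and the uneven LinkedList split it reports is OpenJDK implementation behaviour (a linked list cannot jump to its midpoint, so it copies off a batch from the front) rather than something the API guarantees.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

public class SplitQuality {
    // Fills the given list with 0..n-1 and returns it.
    static List<Integer> fill(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
        return target;
    }

    // Returns {estimated size of the split-off part, estimated size of the remainder}.
    static long[] firstSplit(List<Integer> data) {
        Spliterator<Integer> rest = data.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, rest.estimateSize() };
    }

    public static void main(String[] args) {
        long[] array = firstSplit(fill(new ArrayList<>(), 100_000));
        long[] linked = firstSplit(fill(new LinkedList<>(), 100_000));
        System.out.println("ArrayList:  " + array[0] + " / " + array[1]);
        System.out.println("LinkedList: " + linked[0] + " / " + linked[1]);
    }
}
```

An even split gives every worker a similar amount of work; a lopsided split leaves most of the data behind a single spliterator, so the extra threads have little to do.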
  • 61. Streaming I/O Bottleneck [Diagram: values x0 to x3 queued at the streaming source.]
  • 63-68. LineSpliterator [Diagram: the log file held in a MappedByteBuffer as newline-terminated records (2.869: Applicati … seconds \n, 5.342: … nds \n, 8.382: … nds \n, 9.337: App … nds \n). The spliterator's coverage is divided at a midpoint (mid), and a new spliterator takes over part of the buffer.] Included in JDK9 as FileChannelLinesSpliterator
  • 69. In-memory Comparison • Read GC log into an ArrayList prior to processing
  • 70. In-memory Comparison Old School: 9.4 secs Sequential: 9.9 secs Parallel: 2.7 secs - 4.25x faster - better but still not 8x faster (24M lines, 2.8GHz 8-core i7, 16GB, OS X, JDK 9.0)
  • 71. Justifying the Overhead CPNQ performance model: C - number of submitters; P - number of CPUs; N - number of elements; Q - cost of the operation. Cost of intermediate operations is N * Q. Overhead of setting up F/J framework is ~100µs.
  • 72. Amortizing Setup Costs • N*Q needs to be large • Q can often only be estimated • N may only be known at run time • Rule of thumb, N > 10,000 • P is the number of processors • P == number of cores for CPU bound • P < number of cores otherwise
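The N*Q reasoning can be turned into a rough back-of-the-envelope check, sketched below. Only the ~100µs setup overhead and the N*Q cost come from the slides; the class name `ParallelHeuristic`, the method names, and in particular the safety factor of 10 are assumptions made for illustration, not a rule from the talk. Measuring the real workload remains the only reliable test.

```java
public class ParallelHeuristic {
    // Setup overhead of the Fork/Join framework quoted in the talk (~100µs), in nanoseconds.
    static final double SETUP_OVERHEAD_NANOS = 100_000.0;

    // Assumption for this sketch: require the work to outweigh the overhead tenfold.
    static final double SAFETY_FACTOR = 10.0;

    // Estimated cost of the intermediate operations: N * Q.
    static double estimatedWorkNanos(long n, double qNanos) {
        return n * qNanos;
    }

    // True if the estimated work comfortably exceeds the setup overhead.
    static boolean worthTryingParallel(long n, double qNanos) {
        return estimatedWorkNanos(n, qNanos) > SAFETY_FACTOR * SETUP_OVERHEAD_NANOS;
    }

    public static void main(String[] args) {
        // 10,000 elements at an estimated 1µs each: 10ms of work.
        System.out.println(worthTryingParallel(10_000, 1_000.0));
        // 100 elements at an estimated 10ns each: 1µs of work.
        System.out.println(worthTryingParallel(100, 10.0));
    }
}
```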
  • 73. Other Gotchas • Frequent hand-offs place pressure on thread schedulers • effect is magnified when a hypervisor is involved • estimated 80,000 cycles to hand off data between threads • you can do a lot of processing in 80,000 cycles • Too many threads place pressure on thread schedulers • responsible for other ill effects (TTSP, time to safepoint) • too few threads may leave hardware under-utilized
  • 74. Simulated Server Environment ExecutorService threadPool = Executors.newFixedThreadPool(10);
 threadPool.execute(() -> {
 try {
 long timer = System.currentTimeMillis();
 value = Files.lines(new File("gc.log").toPath()).parallel()
 .map(applicationStoppedTimePattern::matcher)
 .filter(Matcher::find)
 .map(matcher -> matcher.group(2))
 .mapToDouble(Double::parseDouble)
 .summaryStatistics().getSum();
 } catch (Exception ex) {}
 });
  • 75-76. Work Flow and Results • First task to arrive will consume all ForkJoinWorkerThreads • downstream tasks wait for a ForkJoinWorkerThread • downstream tasks start intermixing with initial task • Initial task collects dead time as it competes for threads • all other tasks collect dead time as they either compete or wait for a ForkJoinWorkerThread • System is stressed beyond capacity
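A scaled-down version of the simulated server can be run entirely in memory, as sketched below. It assumes an in-memory list in place of the talk's `gc.log` file and uses a single-group pattern; the class name `SimulatedServer` and the methods `sumStoppedTime` and `runTasks` are hypothetical. Every submitted task starts its own parallel stream, and all of those streams share the one common Fork/Join pool, which is the contention the slides describe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimulatedServer {
    static final Pattern STOPPED_TIME = Pattern.compile("Application time: (\\d+\\.\\d+)");

    // One "request": a parallel stream whose sub-tasks go to the common pool.
    static double sumStoppedTime(List<String> lines) {
        return lines.parallelStream()
                .map(STOPPED_TIME::matcher)
                .filter(Matcher::find)
                .map(matcher -> matcher.group(1))
                .mapToDouble(Double::parseDouble)
                .sum();
    }

    // Submits the same request 'tasks' times from a fixed thread pool and waits for all results.
    static List<Double> runTasks(int tasks, List<String> lines) {
        ExecutorService threadPool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                Callable<Double> request = () -> sumStoppedTime(lines);
                futures.add(threadPool.submit(request));
            }
            List<Double> results = new ArrayList<>();
            for (Future<Double> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            threadPool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = Collections.nCopies(100_000, "1.000: Application time: 0.5 seconds");
        long start = System.nanoTime();
        List<Double> results = runTasks(10, lines);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " results in " + elapsedMillis + " ms");
    }
}
```

Timing one task on its own and then ten together, on the same machine, gives a feel for how much dead time the competing submitters add.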
  • 78. Intermediate Operation Bottleneck • Bottleneck is in pattern matching • but, streaming infrastructure isn’t far behind! 68.6% 1384 + 0 java.util.regex.Pattern$Curly.match 26.6% 521 + 15 java.util.stream.ReferencePipeline$3$1.accept
  • 79. Tragedy of the Commons Garrett Hardin, ecologist (1968): Imagine the grazing of animals on a common ground. Each flock owner gains if they add to their own flock. But every animal added to the total degrades the commons a small amount.
  • 81. Tragedy of the Commons You have a finite amount of hardware – it might be in your best interest to grab it all – but if everyone behaves the same way…
  • 83. Simulated Server Environment • Submit 10 tasks to Fork-Join (via Executor thread-pool) • first result comes out in 32 seconds • compared to 9.5 seconds for an individually submitted task • high system time reflects that the task is I/O bound
  • 86. In-Memory Variation • Preload log file • Submit 10 tasks to Fork-Join (via Executor thread-pool) • first result comes out in 23 seconds • compared to 4.5 seconds for an individually submitted task • task is CPU bound
  • 87. Conclusions • Sequential stream performance is comparable to imperative code • Going parallel is worthwhile IF - task is suitable - expensive enough to amortize setup costs - no inter-task communication needed - data source is suitable - environment is suitable • Need to monitor the JDK to understand bottlenecks - Fork/Join pool is not well instrumented