An Empirical Study on the Use and Misuse of Java 8 Streams

Streaming APIs allow for big data processing of native data structures by providing MapReduce-like operations over these structures. However, unlike traditional big data systems, these data structures typically reside in shared memory accessed by multiple cores. Although popular, this emerging hybrid paradigm opens the door to possibly detrimental behavior, such as thread contention and bugs related to non-execution and non-determinism. This study explores the use and misuse of a popular streaming API, namely, Java 8 Streams. The focus is on how developers decide whether to run these operations sequentially or in parallel, and on bugs both specific and tangential to this paradigm. Our study involved analyzing 34 Java projects and 5.53 million lines of code, along with 719 manually examined code patches. Various automated methodologies, including interprocedural static analysis, and manual methodologies were employed. The results indicate that streams are pervasive, stream parallelization is not widely used, and performance is a crosscutting concern that accounted for the majority of fixes. We also present coincidences that both confirm and contradict the results of related studies. The study advances our understanding of streams and benefits practitioners, programming language and API designers, tool developers, and educators alike.

  1. An Empirical Study on the Use and Misuse of Java 8 Streams. Raffi Khatchadourian (1, 2), Yiming Tang (2), Mehdi Bagherzadeh (3), Baishakhi Ray (4). Affiliations: (1) City University of New York (CUNY) Hunter College, (2) City University of New York (CUNY) Graduate Center, (3) Oakland University, (4) Columbia University. CUNY Graduate Center, April 2, 2020. To appear in the International Conference on Fundamental Approaches to Software Engineering, FASE 2020. ETAPS, Springer, April 2020. Preprint available at: http://academicworks.cuny.edu/hc_pubs/610. Outline: Introduction, Motivation, Study, Conclusion.
  2. Streaming APIs. Streaming APIs are widely available in today’s mainstream object-oriented programming languages [Biboudis et al., 2015]. They allow for big data processing of native data structures (maps and lists), provide MapReduce-like operations over these structures, and can make writing parallel code easier and less error-prone (avoiding data races and thread contention).
  3. Streaming API Example in Java >= 8. Consider this simple “widget” class consisting of a “color” and a “weight”:

         // Widget class:
         public class Widget {

             // enumeration:
             public enum Color {
                 RED,
                 BLUE,
                 GREEN
             }

             // instance fields:
             private Color color;
             private double weight;

             // constructor:
             Widget(Color c, double w) {
                 this.color = c;
                 this.weight = w;
             }

             // accessors/mutators:
             public Color getColor() {
                 return this.color;
             }

             public double getWeight() {
                 return this.weight;
             }
             // ...
         }
  4. Streaming API Example in Java >= 8. Consider the following Widget client code:

         // an "unordered" collection of widgets.
         Collection<Widget> unorderedWidgets = new HashSet<>();
         // populate the collection ...
  5. Streaming API Example in Java >= 8. Now suppose we would like to sort the collection by weight using the Java 8 Streaming API:

         // sort widgets by weight.
         List<Widget> sortedWidgets = unorderedWidgets
             .stream()
             .sorted(Comparator.comparing(Widget::getWeight))
             .collect(Collectors.toList());
  6. Streaming API Example in Java >= 8. Now suppose we would like to sort the collection by weight using the Java 8 Streaming API in parallel:

         // sort widgets by weight.
         List<Widget> sortedWidgets = unorderedWidgets
             .parallelStream()
             .sorted(Comparator.comparing(Widget::getWeight))
             .collect(Collectors.toList());
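The sequential and parallel variants from the preceding slides can be put side by side in one small runnable program. This is a minimal sketch, not code from the talk: the wrapper class `WidgetSortDemo`, the two helper methods, and the sample weights are assumptions made for illustration, and `Widget` is nested here only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical wrapper class; only the pipelines mirror the slides.
class WidgetSortDemo {

    enum Color { RED, BLUE, GREEN }

    static final class Widget {
        private final Color color;
        private final double weight;

        Widget(Color c, double w) {
            this.color = c;
            this.weight = w;
        }

        Color getColor() { return color; }
        double getWeight() { return weight; }
    }

    // Sequential variant: the pipeline is created with stream().
    static List<Widget> sortSequentially(Collection<Widget> widgets) {
        return widgets.stream()
                .sorted(Comparator.comparing(Widget::getWeight))
                .collect(Collectors.toList());
    }

    // Parallel variant: only the stream-creation call differs.
    static List<Widget> sortInParallel(Collection<Widget> widgets) {
        return widgets.parallelStream()
                .sorted(Comparator.comparing(Widget::getWeight))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // An "unordered" collection, as on the slides (sample data is made up).
        Collection<Widget> unorderedWidgets = new HashSet<>(Arrays.asList(
                new Widget(Color.RED, 43.2),
                new Widget(Color.GREEN, 1.5),
                new Widget(Color.BLUE, 12.25)));

        for (Widget w : sortInParallel(unorderedWidgets)) {
            System.out.println(w.getColor() + " " + w.getWeight());
        }
    }
}
```

Both helpers return the widgets in ascending weight order; the only source-level difference is the call that creates the stream.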
  7. Streaming API Example in Java >= 8. Without the Streaming API, running this code in parallel would have required explicit threads. The parallelizable operation (e.g., sorted()) would need to be isolated, placed into a thread object, forked, and then joined. Example:

         Thread t = new Thread( /* your code here */ );
         t.start(); // fork; calling run() directly would execute on the current thread
         // ...
         t.join();
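To make the isolate, fork, and join steps concrete, the sketch below sorts a list of weights by hand with two explicit threads and compares that with the one-line parallel stream. It is an illustration under assumptions, not the authors' code: the class `ExplicitThreadSort`, both method names, and the two-way split are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; names are not from the talk.
class ExplicitThreadSort {

    // Manual approach: isolate the work, fork two threads, join, then merge.
    static List<Double> sortWithThreads(List<Double> input) {
        int mid = input.size() / 2;
        List<Double> left = new ArrayList<>(input.subList(0, mid));
        List<Double> right = new ArrayList<>(input.subList(mid, input.size()));

        Thread t1 = new Thread(() -> left.sort(null));   // null = natural ordering
        Thread t2 = new Thread(() -> right.sort(null));
        t1.start();  // fork
        t2.start();
        try {
            t1.join();  // join; also makes the sorted halves visible to this thread
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Merge the two sorted halves.
        List<Double> merged = new ArrayList<>(input.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            merged.add(left.get(i) <= right.get(j) ? left.get(i++) : right.get(j++));
        }
        while (i < left.size()) {
            merged.add(left.get(i++));
        }
        while (j < right.size()) {
            merged.add(right.get(j++));
        }
        return merged;
    }

    // Streaming API approach: the library handles splitting and merging.
    static List<Double> sortWithStream(List<Double> input) {
        return input.parallelStream().sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Double> weights = Arrays.asList(43.2, 1.5, 99.0, 12.25, 7.0);
        System.out.println(sortWithThreads(weights));
        System.out.println(sortWithStream(weights));
    }
}
```

The manual version has to manage splitting, thread lifecycle, interruption, and merging itself, which is the bookkeeping the Streaming API hides.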
  8. Problem. Big data processing traditionally runs in highly distributed environments with no shared memory. Streaming APIs, by contrast, typically execute on a single node under multiple threads or cores in a shared memory space, and collections reside in local memory. Issues may arise from the close ties between shared memory and the operations: thread contention, non-execution, non-determinism, operation sequencing, and data ordering.
  13. Stream Example.

          List<Widget> sortedWidgets =
              unorderedWidgets
                  .stream()
                  .sorted(Comparator.comparing(Widget::getWeight))
                  .collect(Collectors.toList());

      The operations do not access shared memory, i.e., there are no side-effects. Had the stream been ordered, however, running in parallel may result in worse performance because sorted() requires multiple passes and data buffering. Such operations are called stateful intermediate operations (SIOs). Maintaining data ordering is detrimental to parallel performance.
  16. Stream Example.

          // collect weights over 43.2 into a set in parallel.
          Set<Double> heavyWidgetWeightSet =
              orderedWidgets
                  .parallelStream()
                  .map(Widget::getWeight)
                  .filter(w -> w > 43.2)
                  .collect(Collectors.toSet());

      There is no SIO, so there is no performance degradation.
  19. Stream Example.

          // sequentially collect into a list, skipping first 1,000.
          List<Widget> skippedWidgetList =
              orderedWidgets
                  .stream()
                  .skip(1000)
                  .collect(Collectors.toList());

      Like sorted(), skip() is an SIO. Because the stream is ordered, running it in parallel would be counterproductive.
  24. Stream Example.

          // collect the first green widgets into a list.
          List<Widget> firstGreenList =
              orderedWidgets
                  .stream()
                  .filter(w -> w.getColor() == Color.GREEN)
                  .unordered()
                  .limit(5)
                  .collect(Collectors.toList());

      limit() is an SIO and the stream’s source is ordered. However, the stream is unordered by the time it reaches limit(). A stream’s ordering does not depend only on its source.
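The effect of calling unordered() ahead of limit() is easiest to see on a parallel stream. The following is a small sketch under assumptions: the class `UnorderedLimitDemo`, the method `anyGreen`, and the sample data are hypothetical, and the stream is made parallel here (unlike the slide) so that dropping the ordering constraint actually matters.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class; not taken from the talk.
class UnorderedLimitDemo {

    enum Color { RED, BLUE, GREEN }

    // Returns up to n green entries. Because of unordered(), limit() may
    // return any n matches rather than the first n in encounter order.
    static List<Color> anyGreen(List<Color> colors, int n) {
        return colors.parallelStream()
                .filter(c -> c == Color.GREEN)
                .unordered()   // drop the encounter-order constraint
                .limit(n)      // stateful intermediate operation
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Color> colors = Arrays.asList(
                Color.GREEN, Color.RED, Color.GREEN, Color.BLUE, Color.GREEN);
        System.out.println(anyGreen(colors, 2).size());
    }
}
```

Callers that need the first n matches specifically should leave the stream ordered; this variant only promises some n matches.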
  29. Stream Example.

          1  // collect distinct widget weights into a TreeSet.
          2  Set<Double> distinctWeightSet =
          3      orderedWidgets
          4          .stream()
          5          .parallel()
          6          .map(Widget::getWeight)
          7          .distinct()
          8          .collect(Collectors.toCollection(TreeSet::new));

      Computation is in parallel (line 5), although it does not start that way (line 4). distinct() is an SIO. The stream is ordered (line 3).
  34. Stream Example.

          1  // collect distinct widget colors into a HashSet.
          2  Set<Color> distinctColorSet =
          3      orderedWidgets
          4          .parallelStream()
          5          .map(Widget::getColor)
          6          .distinct()
          7          .collect(HashSet::new, Set::add, Set::addAll);

      Computation is in parallel (line 4). The direct form of collect() is used (line 7). The reduction is to an unordered collection, so ordering is unnecessarily (and detrimentally) maintained.
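One way to avoid maintaining ordering that the result never uses is to call unordered() before distinct(). The sketch below illustrates that idea under assumptions: the class `UnorderedDistinctDemo`, the method name, and the sample data are hypothetical, and this is one possible fix rather than the specific transformation applied by the authors' tooling.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example class; not taken from the talk.
class UnorderedDistinctDemo {

    enum Color { RED, BLUE, GREEN }

    // The target is a HashSet, so encounter order carries no meaning here.
    // Calling unordered() first lets distinct() skip order preservation.
    static Set<Color> distinctColors(List<Color> colors) {
        return colors.parallelStream()
                .unordered()
                .distinct()
                .collect(HashSet::new, Set::add, Set::addAll);
    }

    public static void main(String[] args) {
        List<Color> colors = Arrays.asList(
                Color.RED, Color.GREEN, Color.RED, Color.GREEN);
        System.out.println(distinctColors(colors).size());
    }
}
```

The resulting set is the same with or without unordered(); only the work done inside the pipeline changes.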
  35. Goals. End goal: explore the use and misuse of a popular streaming API, namely, Java 8 Streams. How do developers use streams? How do developers decide whether to run operations sequentially or in parallel? What are the common operations on streams? What are the common attributes of streams, and are they amenable to parallelization? Which bugs specific to this new hybrid paradigm do developers tackle? How often do developers choose incorrect stream APIs? How often do developers correctly use stream APIs?
  36. Benefits. Advance our understanding of this emerging hybrid paradigm. Provide feedback to language and API designers for future API versions. Help tool designers understand the struggles developers have with streaming APIs. Identify best practices and anti-patterns so that practitioners can use streaming APIs effectively. Assist educators in teaching students how to use streaming APIs.
  37. Evidence of Streaming API Misuse. We used our automated tool [Khatchadourian, Tang, et al., 2018] to refactor streams for improved parallelism: transform sequential streams to parallel, transform parallel streams to sequential, and unorder parallel streams. The analysis is conservative (only ∼36% refactorable). The resulting speedup was 3.49 [Khatchadourian, Tang, et al., 2019].
  38. What Makes Streaming APIs Different? What makes streaming APIs different from other APIs or language features? They are more than an API: a mixed paradigm [Bloch, 2018], prone to concurrency bugs as succinct parallel execution is one of their main benefits. They make heavy use of λ-expressions [Mazinanian et al., 2017], and mainstream object-oriented developers may not be familiar with functional programming. They are pervasive: used for many different kinds of tasks (not just, e.g., security, like OpenSSL). We found 14,536 calls to methods operating on streams across 34 projects.
  39. Subject Selection. Selected up to 34 open source Java projects that use streams, of various sizes, domains, and popularity, from GitHub. The number of subjects depends on the experiment, since larger subjects are harder to analyze depending on the analysis. The subjects have been used in our other studies [Khatchadourian and Masuhara, 2017, 2018; Khatchadourian, Tang, et al., 2019].
  40. DFA for Determining Stream Execution Mode. States: ⊥ (start), seq, para. Transitions:

          ⊥ → seq:      Col.stream(), BufferedReader.lines(), Files.lines(Path), JarFile.stream(), Pattern.splitAsStream(), Random.ints()
          ⊥ → para:     Col.parallelStream()
          seq → seq:    BaseStream.sequential()
          seq → para:   BaseStream.parallel()
          para → seq:   BaseStream.sequential()
          para → para:  BaseStream.parallel()

      Figure: A subset of the relation E_→ in E = (E_S, E_Λ, E_→).
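The execution-mode automaton can be sketched as a small transition function. This is a toy illustration only: the study infers these states with a static typestate analysis, whereas the class `ExecutionModeDfa`, its `Mode` enum, and the use of plain method-name strings as the input alphabet are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of the execution-mode DFA; not the authors' analysis.
class ExecutionModeDfa {

    enum Mode { BOTTOM, SEQUENTIAL, PARALLEL }

    // One transition. Calls are identified by simple method names.
    static Mode step(Mode current, String call) {
        if (current == Mode.BOTTOM) {
            // Only stream-creating calls leave the start state.
            switch (call) {
                case "stream":          // e.g., Collection.stream()
                case "lines":           // e.g., Files.lines(Path)
                    return Mode.SEQUENTIAL;
                case "parallelStream":  // Collection.parallelStream()
                    return Mode.PARALLEL;
                default:
                    return Mode.BOTTOM;
            }
        }
        switch (call) {
            case "sequential":          // BaseStream.sequential()
                return Mode.SEQUENTIAL;
            case "parallel":            // BaseStream.parallel()
                return Mode.PARALLEL;
            default:
                return current;         // other operations keep the mode
        }
    }

    // Runs the automaton over a whole call chain.
    static Mode run(List<String> calls) {
        Mode mode = Mode.BOTTOM;
        for (String call : calls) {
            mode = step(mode, call);
        }
        return mode;
    }

    public static void main(String[] args) {
        // Mirrors the chain stream().parallel().map(...).distinct().
        System.out.println(run(Arrays.asList("stream", "parallel", "map", "distinct")));
    }
}
```

A chain that starts with stream() and later calls parallel() ends in the parallel state, matching the earlier example where the computation "does not start that way".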
  41. DFA for Determining Stream Ordering. States: ⊥ (start), ord, unord. Transitions:

          ⊥ → ord:        Arrays.stream(T[]), Stream.of(T...), IntStream.range(), Stream.iterate(), BitSet.stream(), Col.parallelStream()
          ⊥ → unord:      Stream.generate(), HashSet.stream(), PriorityQueue.stream(), CopyOnWrite.parallelStream(), BeanContextSupport.stream(), Random.ints()
          ord → ord:      Stream.sorted(), Stream.concat(ordered)
          ord → unord:    BaseStream.unordered(), Stream.concat(unordered)
          unord → ord:    Stream.sorted()
          unord → unord:  BaseStream.unordered(), Stream.concat(unordered), Stream.concat(ordered)

      Figure: A subset of the relation O_→ in O = (O_S, O_Λ, O_→).
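A few of these ordering transitions can be spot-checked at run time by asking a stream's spliterator whether it reports the ORDERED characteristic. This is only a sanity check under assumptions, not the static analysis used in the study: the class `OrderingCheck` and the helper `isOrdered` are hypothetical, and the check consumes the stream because spliterator() is a terminal operation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.Spliterator;
import java.util.stream.Stream;

// Hypothetical helper for observing stream ordering at run time.
class OrderingCheck {

    // Reports whether the pipeline currently has a defined encounter order.
    // The stream cannot be reused afterwards.
    static boolean isOrdered(Stream<?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        Integer[] array = {3, 1, 2};
        Set<Integer> hashSet = new HashSet<>(Arrays.asList(array));

        // Array source.
        System.out.println(isOrdered(Arrays.stream(array)));
        // HashSet source.
        System.out.println(isOrdered(hashSet.stream()));
        // HashSet source followed by sorted().
        System.out.println(isOrdered(hashSet.stream().sorted()));
        // Array source followed by unordered().
        System.out.println(isOrdered(Arrays.stream(array).unordered()));
    }
}
```

The four calls in main correspond to the ⊥ → ord, ⊥ → unord, unord → ord, and ord → unord edges, respectively.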
  42. Static Analyses. Typestate analysis [Fink et al., 2008; Strom and Yemini, 1986] (SAFE) was used to infer stream execution modes and ordering. It augments the type system with “state,” is traditionally used for preventing resource usage errors, and requires interprocedural and alias analyses. This is a novel adaptation for possibly immutable objects (streams). ModRef analysis (WALA) was used to identify possible side-effects.
  43. 43. Results. Table: Stream characteristics.
      subject        KLOC      age     eps    k  str    seq    para  ord  unord  se  SIO
      bootique       4.91      4.18    362    4  14     14     0     11   3      4   0
      cryptomator    7.99      6.05    148    3  12     12     0     11   1      2   0
      dari           64.86     5.43    3      2  18     18     0     15   3      0   0
      elasticsearch  585.71    10.03   78     6  210    210    0     165  45     10  0
      htm.java       41.14     4.53    21     4  190    188    2     189  1      22  5
      JabRef         138.83    16.36   3,064  2  301    290    11    239  62     9   0
      JacpFX         23.79     4.71    195    4  12     12     0     9    3      1   0
      jdp*           19.96     5.53    25     4  38     38     0     35   3      11  1
      jdk8-exp*      3.43      6.35    34     4  49     49     0     47   2      5   0
      jetty          354.48    10.93   106    4  57     57     0     47   10     8   0
      JetUML         20.95     5.09    660    2  7      7      0     4    3      0   0
      jOOQ           154.01    8.58    43     4  23     23     0     22   1      2   0
      koral          7.13      3.47    51     3  8      8      0     8    0      0   0
      monads         1.01      0.01    47     2  3      3      0     3    0      0   0
      retrolambda    5.14      6.52    1      4  11     11     0     8    3      0   0
      spring*        188.46    11.62   5,981  4  61     61     0     60   1      21  0
      streamql       4.01      0.01    92     2  22     22     0     22   0      2   18
      threeten*      27.53     7.01    36     2  2      2      0     2    0      0   0
      Total          1,653.35  116.40  11,047 6  1,038  1,025  13    897  141    97  24
      * jdp is java-design-patterns, jdk8-exp is jdk8-experiments, spring is a portion of spring-framework, and threeten is threeten-extra.
  44. 44. Discussion: Few subjects use SIOs extensively; streamql is the exception. Many streams are ordered, which possibly hinders parallelism. About 10% of streams have side-effects. Streams are mostly sequential, which may coincide with the findings of Lu et al. [2008] that many developers think “sequentially.”
  45. 45. Configuration: Examined 34 projects that use streams. The projects vary in domain and application, as well as in size and popularity. Their sources are publicly available on GitHub. They include popular libraries, frameworks, and applications that contain at least some Java code.
  46. 46. Finding Calls: Parsed Abstract Syntax Trees (ASTs) with source-symbol bindings using the Eclipse Java Developer Tools (JDT; http://eclipse.org/jdt). Method invocations whose compile-time targets are declared in types residing in the java.util.stream package were counted. This includes types such as Stream and Collectors.
  47. 47. Figure: Stream method calls.
  48. 48. Discussion I: The four most used stream methods are toList(), collect(), map(), and filter(). collect() is a specialized reduction. toList() is a static method of the Collectors class, whose methods provide premade reductions. That collect() and toList(), along with other terminal operations such as forEach(), iterator(), toSet(), and toArray(), appear towards the top of the list suggests that, although developers are writing functional-style code to process data in a “big data” processing style, they are not staying there. Instead, they “bridge” back to imperative-style code by processing the data further iteratively, as is the case with forEach() and iterator(). Developers tend to favor simpler reductions over specialized (advanced) ones. The advanced reductions, such as those that return maps (e.g., toMap(), groupingBy()), are used less frequently, even though they are highly expressive operations that can save substantial amounts of imperative-style code.
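To illustrate how an advanced reduction can replace imperative code, the sketch below groups words by length twice: once with a hand-written loop and once with `Collectors.groupingBy()`. The class name `GroupingDemo` and the sample data are hypothetical and are not taken from the studied projects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingDemo {
    // Imperative style: build the map by hand.
    static Map<Integer, List<String>> byLengthImperative(List<String> words) {
        Map<Integer, List<String>> result = new HashMap<>();
        for (String word : words) {
            result.computeIfAbsent(word.length(), k -> new ArrayList<>()).add(word);
        }
        return result;
    }

    // Stream style: a single map-returning reduction.
    static Map<Integer, List<String>> byLengthStream(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "filter", "collect", "set");
        System.out.println(byLengthImperative(words).equals(byLengthStream(words)));
    }
}
```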
  49. 49. Discussion II: There are few non-deterministic calls. findAny() has only 62 calls, while its deterministic counterpart, findFirst(), has 270. There are also few calls to parallel stream APIs: only 4 calls to groupingByConcurrent(), as opposed to 87 calls to groupingBy().
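The difference between these API pairs can be sketched as follows. On a parallel stream, `findFirst()` respects encounter order while `findAny()` may return any matching element, and `groupingByConcurrent()` accumulates into a single `ConcurrentMap` rather than merging per-thread maps. The class `DeterminismDemo` and its data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

class DeterminismDemo {
    // Deterministic: always the first even number in encounter order.
    static Optional<Integer> firstEven(List<Integer> xs) {
        return xs.parallelStream().filter(x -> x % 2 == 0).findFirst();
    }

    // Non-deterministic: some even number, not necessarily the first.
    static Optional<Integer> anyEven(List<Integer> xs) {
        return xs.parallelStream().filter(x -> x % 2 == 0).findAny();
    }

    static Map<Boolean, List<Integer>> parity(List<Integer> xs) {
        return xs.parallelStream().collect(Collectors.groupingBy(x -> x % 2 == 0));
    }

    // Concurrent variant: element order within each group is not guaranteed.
    static ConcurrentMap<Boolean, List<Integer>> parityConcurrent(List<Integer> xs) {
        return xs.parallelStream().collect(Collectors.groupingByConcurrent(x -> x % 2 == 0));
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(1, 2, 3, 4, 6);
        System.out.println(firstEven(xs));
        System.out.println(anyEven(xs));
        System.out.println(parity(xs));
        System.out.println(parityConcurrent(xs));
    }
}
```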
  50. 50. Evolution Study Methodology: Gathering stream characteristics over multiple versions would have been too difficult. Instead, we fed the following API keywords into gitcproc [Casalnuovo et al., 2017], a tool for processing and classifying git commits that was used in previous work [Gharbi et al., 2019]. Evolution keywords: parallel(), parallelStream(), sequential(). The keywords are designed to catch execution mode changes based on the DFA (Fig. 1).
  51. 51. Table: Stream execution mode evolution.
      subject                        vers     commits  kws  s→p  p→s
      advantageous/qbit              533b367  1,717    0    0    0
      bootique/bootique              457bcd8  1,106    0    0    0
      criscris/koral                 8a8cbf3  109      14   2    0
      cryptomator/cryptomator        59cd3de  1,443    3    0    0
      deephacks/streamql             46b5835  27       0    0    0
      eclipse/che                    39bac9e  8,093    2    1    0
      eclipse/eclipse.jdt.core       3635add  24,104   0    0    0
      eclipse/eclipse.jdt.ui         a1061a6  28,154   0    0    0
      eclipse/jetty.project          1a0f08b  17,130   8    0    0
      edalorzo/jdk8-experiments      3c7d717  8        0    0    0
      google/binnavi                 a4616f9  286      0    0    0
      google/error-prone             e43a577  3,897    0    0    0
      google/guava                   794a10a  5,034    9    0    0
      iluwatar/java-design-patterns  11c0550  2,196    0    0    0
      JacpFX/JacpFX                  14c2a4c  365      75   0    14
      jenkinsci/blueocean-plugin     fb1580e  4,044    1    0    0
      jOOQ/jOOQ                      66a4566  7,554    15   0    0
      numenta/htm.java               c874f02  1,507    0    0    0
      orfjackal/retrolambda          3f1cdde  522      0    0    0
      perfectsense/dari              9f28f38  2,466    0    0    0
      RutledgePaulV/monads           a7fc04b  16       0    0    0
      SeleniumHQ/selenium            72f9d42  24,157   4    0    0
      ThreeTen/threeten-extra        5aff57d  559      0    0    0
      wala/WALA                      15aa46b  6,279    1    0    0
      Total                                   140,773  132  3    14
  52. 52. Discussion I: JacpFX had fourteen change sets that made parallel streams sequential. A common commit message for these change sets was “fixed stream handling,” indicating that it was somehow erroneous to run these streams in parallel. In koral, two streams were altered from sequential to parallel. The commit messages include “table parallel read” and “csv parallel read.” These changes provided a parameter or overloaded method to optionally run the stream in parallel.
  53. 53. Discussion II: Change set from JacpFX, where a stream was de-parallelized:
      commit e5cb21ab2776cbcad8c6f3c7ba5d85598a49c26f
      Author: amoAHCP <amo.ahcp@gmail.com>
      Date: Tue Jul 15 22:58:56 2014 +0200
          fixed stream handling
      diff --git a/ClassRegistry.java b/ClassRegistry.java
      index 4916391..1d3963c 100644
      --- a/ClassRegistry.java
      +++ b/ClassRegistry.java
      @@ -46,7 +46,7 @@ public class ClassRegistry {
         public static Class getComponentClassById(final String id) {
      -    final List<Class> result = allClasses.parallelStream()
      +    final List<Class> result = allClasses.stream()
             .filter(ClassRegistry::checkForAnntotation)
             .filter(component -> checkIdMatch(component,id))
             .collect(Collectors.toList());
  54. 54. Discussion III: Change set from koral, where a stream was parallelized:
      commit da943ed1ab11d04bc89caa3ccc0e143657a3ff4b
      Author: criscris <christianschulz.net@gmail.com>
      Date: Fri Jun 23 19:37:54 2017 +0200
          csv parallel read
      diff --git a/IO.java b/IO.java
      index 2b73c61..4ca9ee8 100644
      --- a/IO.java
      +++ b/IO.java
      @@ -252,11 +252,21 @@ public interface IO
         static Stream<List<String>> readCSV(Reader reader_) {
           return readLines(reader_).map(line -> splitCSVLine(line));
         }
      +  static Stream<List<String>> readCSVParallel(Reader reader_) {
      +    return readLines(reader_).parallel().map(line ->
      +      splitCSVLine(line));
      +  }
  55. 55. Bug Study Methodology: Explored 22 projects that use streams, comprising ∼4.68 million lines of source code, 140,446 git commits, and 18 years of project change history. Again, we used gitcproc. We compiled 140 keywords from the API documentation and looked at commits after the Java 8 release date of March 18, 2014. gitcproc uses heuristics based on commit messages to identify commits that are bug fixes. Many commits reference bug reports or provide more details about the fix. We used these to understand the fixes more fully.
  56. 56. Table: Studied subjects.
      subject               KLOC      studied periods           cmts     kws    exe
      binnavi               328.28    2015-08-19 to 2019-07-17  286      4      4
      blueocean-plugin      49.70     2016-01-23 to 2019-07-24  4,043    118    25
      bootique              15.47     2015-12-10 to 2019-08-08  1,106    5      5
      che                   189.24    2016-02-11 to 2019-08-19  8,093    75     75
      cryptomator           9.83      2014-02-01 to 2019-08-08  1,443    50     10
      dari                  72.46     2012-09-26 to 2018-03-02  2,466    18     6
      eclipse.jdt.core      1,527.89  2001-06-05 to 2019-08-07  24,085   234    106
      eclipse.jdt.ui        712.91    2001-05-02 to 2019-08-09  28,136   149    32
      error-prone           165.85    2011-09-14 to 2019-08-15  3,893    71     71
      guava                 393.47    2009-06-18 to 2019-08-15  5,031    36     36
      htm.java              41.63     2014-08-09 to 2019-02-19  1,507    40     1
      JacpFX                24.06     2013-08-12 to 2018-04-27  365      37     14
      jdk8-experiments      3.47      2013-08-03 to 2018-03-10  8        1      1
      java-design-patterns  33.52     2014-08-09 to 2019-07-31  2,192    37     12
      jetty                 400.26    2009-03-16 to 2019-08-02  17,051   835    219
      jOOQ                  184.25    2011-07-24 to 2019-07-31  7,508    94     4
      qbit                  52.27     2014-08-25 to 2018-01-18  1,717    65     9
      retrolambda           5.10      2013-07-20 to 2018-11-30  522      17     4
      selenium              234.12    2004-11-03 to 2019-08-09  24,145   114    57
      streamql              4.26      2014-04-27 to 2014-04-29  27       2      2
      threeten-extra        31.26     2012-11-17 to 2019-07-14  559      28     2
      WALA                  203.84    2006-11-22 to 2019-07-24  6,263    52     24
      Total                 4,683.12                            140,446  2,082  719
  57. 57. Quantitative Analysis: Manually examined 719 commits. Found 61 stream client code bug fixes. Devised a set of bug categories.
  58. 58. Table: Stream bug/patch category legend.
      name                     description                                      acronym
      Bounds                   Incorrect/Missing Bounds Check                   BC
      Exceptions               Incorrect/Missing Exception Handling             EH
      Other                    Other change (e.g., syntax, refactoring)         Other
      Perf                     Poor Performance                                 PP
      Concur                   Concurrency Issue                                CI
      Stream Source            Incorrect/Missing Stream Source                  SS
      Intermediate Operations  Incorrect/Missing Intermediate Operations        IO
      Data Ordering            Incorrect Data Ordering                          DO
      Operation Sequencing     Incorrect Operation Sequencing                   OS
      Filter Operations        Incorrect/Missing Filter Operations              FO
      Map Operations           Incorrect/Missing Map Operations                 MO
      Terminal Operations      Incorrect/Missing Terminal Operations            TO
      Reduction Operations     Incorrect Reduction Operations                   RO
      Collector Operations     Incorrect/Missing Collector Operations           CO
      Incorrect Action         Incorrect Action (e.g., λ-expression)            IA
  59. 59. Figure: Studied stream bugs and patches (hierarchical).
  60. 60. Table: Studied stream bugs and patches (nonhierarchical). subject BC CI CO DO EH FO IA IO MO OS PP RO SS Other Total binnavi 1 1 blueocean-plugin 1 1 bootique 1 1 che 1 1 1 1 4 cryptomator 1 2 1 2 6 dari 2 2 eclipse.jdt.core 1 1 eclipse.jdt.ui 1 1 error-prone 2 1 1 3 1 1 2 1 12 guava 1 1 JacpFX 1 1 2 4 jdp 1 1 jetty 1 2 1 3 7 jOOQ 1 1 selenium 2 1 2 5 1 2 2 1 1 17 threeten-extra 1 1 Total 2 1 3 4 7 7 2 2 6 2 9 3 3 10 61
  61. 61. Findings Summary: Bugs such as poor performance are crosscutting concerns: they affect multiple categories and are associated with streams both specifically and tangentially. Although streams feature performance-improving parallelism, developers tend to struggle with using streams efficiently. Concurrency issues were the least common stream bugs. However, concurrent variable access can cause thread contention. This motivates future refactoring approaches that may promote more parallel streams.
  62. 62. Discussion I: Poor performance (PP) dominated the non-other bug categories, accounting for 14.75% (9/61) of the bugs found. While some fixes were more of a cleanup (e.g., removing superfluous operations), others affected central parts of the system and were found during performance regression testing [Wilkins, 2019].
  63. 63. Discussion II: In commit 4f0d62d of JacpFX, as part of a cleanup, a pair of superfluous operations is removed:
      index fd12564..39aca60 100644
      --- a/JACP.JavaFX/src/main/java/org/jacpfx/rcp/util/WorkbenchUtil.java
      +++ b/JACP.JavaFX/src/main/java/org/jacpfx/rcp/util/WorkbenchUtil.java
      @@ -58,8 +58,7 @@ public class WorkbenchUtil {
         final Stream<String> componentIds = CommonUtil
           .getStringStreamFromArray(annotation.perspectives());
         final Stream<Injectable> perspectiveHandlerList =
      -    componentIds.parallel().sequential().map(this::mapToInjectable);
      +    componentIds.map(this::mapToInjectable);
  64. 64. Discussion III: We also found cases where developers “bridged” back to imperative-style programming from streams, performed an operation, then switched back to streams to continue more functional-style programming.
  65. 65. Discussion IV: Commit 91e9e7b (abbreviated) of jetty fixes this issue as follows:
      1 index 3fee880205..91f8a15f46 100644
      2 --- a/Modules.java
      3 +++ b/Modules.java
      4 @@ -57,38 +61,71 @@ public class Modules
      5 - List<String> ordered = _modules.stream()
      6 -   .map(m->{return m.getName();}).collect(Collectors.toList());
      7 - Collections.sort(ordered);
      8 - ordered.stream().map(n->{return get(n);}).forEach(module->
      9 + _modules.stream().filter(m->...).sorted().forEach(module->
      In the old version, each module is mapped to its name and collected into a list. The list is then sorted via the Collections API, and another stream is derived from ordered. On line 9, the bridge to a collection and the subsequent sort operation are removed. Computation remains within the confines of the streaming API by using sorted(). The new form is more amenable to efficient and effective parallelization.
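The bridging pattern and its stream-only replacement can be compared in a small standalone sketch. This is a simplified illustration using strings rather than jetty's module objects; `BridgeDemo`, `bridged`, and `streamed` are hypothetical names, and the `trim` and `toUpperCase` steps merely stand in for the mapping operations in the commit.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class BridgeDemo {
    // Leaves the stream to sort a list, then re-enters a second stream.
    static List<String> bridged(List<String> names) {
        List<String> ordered = names.stream()
                .map(String::trim)
                .collect(Collectors.toList());
        Collections.sort(ordered);
        return ordered.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Stays within one pipeline by sorting with sorted().
    static List<String> streamed(List<String> names) {
        return names.stream()
                .map(String::trim)
                .sorted()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(" b", "a ", "c");
        System.out.println(bridged(names));
        System.out.println(streamed(names));
    }
}
```

Both methods return the same list; the second keeps the whole computation in a single pipeline, so one `parallel()` call would apply to all of it.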
  66. 66. Best Practices. Best practice: Ensure concatenated streams have distinct elements. Call distinct() on a concatenated stream to ensure that no duplicates are created as a result of the concatenation. The following snippet is from selenium commit eb7d9bf:
      - return Stream.concat(fromOss, fromW3c);
      + return Stream.concat(fromOss, fromW3c).distinct();
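A minimal sketch of this best practice follows. The two input lists are made-up stand-ins for the `fromOss` and `fromW3c` streams in the commit, and `ConcatDemo` is a hypothetical class name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ConcatDemo {
    // Concatenates both sources and drops elements that appear in both.
    static List<String> merged(List<String> first, List<String> second) {
        return Stream.concat(first.stream(), second.stream())
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("click", "sendKeys");
        List<String> second = Arrays.asList("sendKeys", "status");
        System.out.println(merged(first, second)); // [click, sendKeys, status]
    }
}
```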
  67. 67. Anti-Patterns I. Anti-pattern: Using too many operations within a single map. Using a long λ-expression in a single map() operation may make stream client code difficult to read and maintain. It also makes the code less “functional,” as data transformation may be neither easily tracked nor modular.
  68. 68. Anti-Patterns II: Consider the following abbreviated snippet from commit b691e37 of cryptomator that returns the occupied drive letters on Windows systems by collecting the first uppercased character of the path:
      diff --git a/WindowsDriveLetters.java b/WindowsDriveLetters.java
      index ef4447ab..03b01e44 100644
      --- a/WindowsDriveLetters.java
      +++ b/WindowsDriveLetters.java
      @@ -24,8 +27,11 @@ public final class WindowsDriveLetters {
      - return rootDirs.stream().map(path -> path.toString().toUpperCase()
      -   .charAt(0)).collect(toSet());
      + return rootDirs.stream().map(Path::toString).map(CharUtils::toChar)
      +   .map(Character::toUpperCase).collect(toSet());
      The λ-expression has been replaced with method references. CharUtils.toChar() returns the first character of a String. There is a small performance improvement, as the entire string is no longer turned to uppercase but rather only the first character.
  69. 69. Anti-Patterns III: The new version is more “functional.” The single λ-expression passed to map() is replaced with multiple map() operations, so it is easier to see how the data is transformed in the pipeline. Future data transformations can be integrated by simply adding operations rather than modifying a single, ever more complex λ-expression.
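The two styles can be compared side by side in a self-contained sketch. It works on plain strings instead of `Path` objects, and the local helper `firstChar` is a hypothetical stand-in for `CharUtils.toChar()` so that no external library is needed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class DriveLettersDemo {
    // Hypothetical stand-in for CharUtils.toChar(): first character of a string.
    static char firstChar(String s) {
        return s.charAt(0);
    }

    // One map() with a longer lambda; uppercases the whole string first.
    static Set<Character> singleMap(List<String> roots) {
        return roots.stream()
                .map(path -> path.toUpperCase().charAt(0))
                .collect(Collectors.toSet());
    }

    // One small step per map(); only the first character is uppercased.
    static Set<Character> chainedMaps(List<String> roots) {
        return roots.stream()
                .map(DriveLettersDemo::firstChar)
                .map(Character::toUpperCase)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<String> roots = Arrays.asList("c:\\", "d:\\");
        System.out.println(singleMap(roots).equals(chainedMaps(roots)));
    }
}
```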
  71. 71. Conclusion: Studied various facets of stream use and misuse: (1) method calls on streams and collectors; (2) stream characteristics (execution mode, ordering); (3) execution mode evolution (sequential to parallel and vice versa); (4) bug fixes and other stream changes. Future work: explore stream creation; use the findings to devise automated error checkers and IDE completions; mine topics that interest stream developers; investigate applicability to other streaming frameworks and languages.
  72. 72. For Further Reading I:
      Biboudis, Aggelos et al. (2015). “Streams a la carte: Extensible Pipelines with Object Algebras”. In: European Conference on Object-Oriented Programming, pp. 591–613. doi: 10.4230/LIPIcs.ECOOP.2015.591.
      Bloch, Joshua (Jan. 2018). Effective Java. 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall. isbn: 978-0134685991.
      Casalnuovo, Casey et al. (2017). “GitcProc: A Tool for Processing and Classifying GitHub Commits”. In: International Symposium on Software Testing and Analysis. ISSTA 2017. Santa Barbara, CA, USA: ACM, pp. 396–399. isbn: 978-1-4503-5076-1. doi: 10.1145/3092703.3098230.
      Fink, Stephen J. et al. (May 2008). “Effective Typestate Verification in the Presence of Aliasing”. In: ACM Transactions on Software Engineering and Methodology 17.2, 9:1–9:34. doi: 10.1145/1348250.1348255.
      Gharbi, Sirine et al. (2019). “On the Classification of Software Change Messages Using Multi-label Active Learning”. In: Symposium on Applied Computing. SAC ’19. ACM/SIGAPP. Limassol, Cyprus: ACM, pp. 1760–1767. isbn: 978-1-4503-5933-7. doi: 10.1145/3297280.3297452.
      Khatchadourian, Raffi and Hidehiko Masuhara (May 2017). “Automated Refactoring of Legacy Java Software to Default Methods”. In: International Conference on Software Engineering, pp. 82–93. doi: 10.1109/ICSE.2017.16.
      Khatchadourian, Raffi and Hidehiko Masuhara (2018). “Proactive Empirical Assessment of New Language Feature Adoption via Automated Refactoring: The Case of Java 8 Default Methods”. In: International Conference on the Art, Science, and Engineering of Programming, 6:1–6:30. doi: 10.22152/programming-journal.org/2018/2/6.
      Khatchadourian, Raffi, Yiming Tang, et al. (Sept. 2018). “A Tool for Optimizing Java 8 Stream Software via Automated Refactoring”. In: International Working Conference on Source Code Analysis and Manipulation. IEEE, pp. 34–39. doi: 10.1109/SCAM.2018.00011.
      Khatchadourian, Raffi, Yiming Tang, et al. (2019). “Safe Automated Refactoring for Intelligent Parallelization of Java 8 Streams”. In: International Conference on Software Engineering. ICSE ’19. IEEE Press, pp. 619–630. doi: 10.1109/ICSE.2019.00072.
  73. 73. For Further Reading II:
      Lu, Shan et al. (2008). “Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics”. In: International Conference on Architectural Support for Programming Languages and Operating Systems. ACM, pp. 329–339. doi: 10.1145/1346281.1346323.
      Mazinanian, Davood et al. (Oct. 2017). “Understanding the Use of Lambda Expressions in Java”. In: Proc. ACM Program. Lang. 1.OOPSLA, 85:1–85:31. issn: 2475-1421. doi: 10.1145/3133909.
      Strom, Robert E and Shaula Yemini (Jan. 1986). “Typestate: A programming language concept for enhancing software reliability”. In: IEEE Transactions on Software Engineering SE-12.1, pp. 157–171. doi: 10.1109/tse.1986.6312929.
      Wilkins, Greg (May 21, 2019). Jetty 9.4.x 3681 http fields optimize by gregw • Pull Request #3682 • eclipse/jetty.project. In reference to commit 70311fe. Eclipse Foundation. url: http://github.com/eclipse/jetty.project/pull/3682 (visited on 09/18/2019).
  • ssuserc51a6f

    Oct. 13, 2020

Streaming APIs allow for big data processing of native data structures by providing MapReduce-like operations over these structures. However, unlike traditional big data systems, these data structures typically reside in shared memory accessed by multiple cores. Although popular, this emerging hybrid paradigm opens the door to possibly detrimental behavior, such as thread contention and bugs related to non-execution and non-determinism. This study explores the use and misuse of a popular streaming API, namely, Java 8 Streams. The focus is on how developers decide whether or not to run these operations sequentially or in parallel and bugs both specific and tangential to this paradigm. Our study involved analyzing 34 Java projects and 5.53 million lines of code, along with 719 manually examined code patches. Various automated, including interprocedural static analysis, and manual methodologies were employed. The results indicate that streams are pervasive, stream parallelization is not widely used, and performance is a crosscutting concern that accounted for the majority of fixes. We also present coincidences that both confirm and contradict the results of related studies. The study advances our understanding of streams, as well as benefits practitioners, programming language and API designers, tool developers, and educators alike.
