Bots for continuous performance assessment are gaining use as a productivity tool. We discuss how and why open source projects use them and present an in-depth case study of the Nanosoldier bot used by the team behind the Julia programming language. Based on an analysis of the history of bot usage and on interviews with developers, we identify the lack of a shared platform for performance measurement as an obstacle to wider adoption of performance measurement bots. To address this, we propose PerfBot, a prototype implementation of such a platform.
Joint work with Florian Markusse and Philipp Leitner, presented at the 5th International Workshop on Bots in Software Engineering, co-located with ICSE 2023, Melbourne, Australia.
Towards Continuous Performance Assessment of Java Applications With PerfBot
1. USING BENCHMARKING BOTS FOR CONTINUOUS PERFORMANCE ASSESSMENT
FLORIAN MARKUSSE
PHILIPP LEITNER
ALEXANDER SEREBRENIK
Image: kjpargeter via Freepik, https://nl.freepik.com/vrije-photo/3d-geef-van-een-robot-met-een-traditionele-chroom-stopwatch_958078.htm
8. RQ1: How often are benchmarking bots adopted by Open Source Software projects?
RQ2: How does adoption of benchmarking bots affect the Open Source Software project?
9. RQ1: How often are benchmarking bots adopted by Open Source Software projects?
GitHub Search: "performance regression", "benchmark report", "performance result"
Projects with ≥ 10 pull requests
13. [Bar chart: number of developers (log scale) vs. number of interactions with The Nanosoldier. Counts for 0–9 interactions: 1004, 80, 44, 34, 31, 29, 26, 25, 21, 21.]
14.
15. In the PRs with the bot:
                          Julia  Rust        Swift       Fluent UI
    #participants higher  large  medium      small       medium
    #comments higher      large  medium      small       medium
    comments longer       large  medium      medium      small
    #reviews higher       small  small       negligible  n.s.
    #commits higher       small  negligible  small       negligible
Image: The Spruce / Steven Merkel, https://www.thespruce.com/collect-clean-and-store-chicken-eggs-3016828
Thanks for the opportunity to give this talk.
Florian, Alexander - Eindhoven; Philipp - Chalmers/Gothenburg
Paper
Bots are omnipresent in software development: they flag issues where the discussion has finished (stale bots), check whether pull requests or issues adhere to the rules defined in the project, and they can even contribute to a warm and welcoming community by detecting toxicity or onboarding newcomers.
However, so far, much less attention has been paid to the usage of bots for continuous performance assessment, even though software performance is a critical quality concern for many projects. This is understandable given that even experienced developers often struggle to correctly benchmark their systems, or judge whether an observed slowdown is more likely to be statistical noise or an actual performance regression.
This is why we ask the following research questions.
To answer the first research question, we searched for a series of performance-related strings to identify relevant pull requests. We manually checked that these pull requests indeed involved performance bots. We say that a project has adopted a benchmarking bot if more than 10 pull requests in a year use it.
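The adoption criterion above can be sketched as a small script (the function name and the sample data are hypothetical illustrations; the actual mining pipeline is not described here):

```python
from collections import Counter
from datetime import date

# Adoption criterion from the study: a project "adopts" a benchmarking
# bot if more than 10 pull requests in a single year use the bot.
ADOPTION_THRESHOLD = 10

def adopting_projects(bot_prs):
    """bot_prs: iterable of (repo, pr_date) pairs for pull requests in
    which a benchmarking bot was involved. Returns the set of repos
    that cross the threshold in at least one calendar year."""
    per_repo_year = Counter((repo, d.year) for repo, d in bot_prs)
    return {repo for (repo, year), n in per_repo_year.items()
            if n > ADOPTION_THRESHOLD}

# Hypothetical sample: 12 bot PRs in JuliaLang/julia during 2016,
# but only 2 in example/other during 2020.
sample = ([("JuliaLang/julia", date(2016, m, 1)) for m in range(1, 13)]
          + [("example/other", date(2020, 1, 1)),
             ("example/other", date(2020, 2, 1))])
print(adopting_projects(sample))  # {'JuliaLang/julia'}
```

Counting per repository *and* year matters: a project with a trickle of bot PRs spread over many years should not count as an adopter under this definition.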
In this way we identified 11 projects. Projects that adopt benchmarking bots tend to be large and, unsurprisingly, in performance-sensitive domains, such as programming languages (Julia, Swift, Rust) or frameworks for Web development (Salesforce LWC, Hexo, or Microsoft's Fluent UI). Four of the bots run as GitHub Actions. All are custom-built or heavily customized for use in their projects; as of today, no "standard benchmarking bot" has emerged.
We also observe that many projects in our list adopted their benchmarking bot as recently as 2020, with the notable exception of the Julia programming language, which has been using a benchmarking bot (The Nanosoldier) since early 2016. This is why we have decided to conduct a case study on Julia to answer our second research question, i.e., to understand the impact of adoption of the bot (The Nanosoldier) on the project itself (Julia).
In this way we have identified 11 projects. Projects that adopt benchmarking bots tend to be large, and, unsurprisingly, in performance-sensitive domains, such as programming languages (Julia, Swift, Rust) or frameworks for Web development (Salesforce LWC, Hexo, or Microsoft’s Fluent UI). We also observe that many projects in our list adopted their benchmarking bot as recently as 2020, with the notable exception of the Julia programming language, which has been using a benchmarking bot (The Nanosoldier) since early 2016. Finally, the benchmarking bots we observed in our study are all custom-built or heavily customized for usage in their projects. As of today, no “standard benchmarking bot” has emerged that sees usage across the open source ecosystem, akin to build or dependency management bots.
Firstly, there are two different ways to surface benchmarking results to developers: most bots post comments that link to detailed reports in pull request discussion threads; only three projects (simdjson, hexo, and ibis) surface benchmark results as GitHub Actions "checkmarks". This is surprising given that similar badges are commonly used for other project quality checks [4].
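The "checkmark" route can be sketched as follows. This is a minimal illustration, not any project's actual bot: the function name and the 5% regression threshold are assumptions, though the payload shape follows GitHub's commit-status API (`POST /repos/{owner}/{repo}/statuses/{sha}`).

```python
# Sketch: turn a benchmark comparison into a commit-status payload
# ("checkmark"). A real bot would POST this payload to the GitHub
# statuses endpoint with an authentication token.
REGRESSION_THRESHOLD = 1.05  # assumed: flag slowdowns of more than 5%

def benchmark_status(baseline_ns, candidate_ns):
    """Compare a candidate's runtime against the baseline and build a
    commit-status payload: red cross on regression, green check otherwise."""
    ratio = candidate_ns / baseline_ns
    state = "failure" if ratio > REGRESSION_THRESHOLD else "success"
    return {
        "state": state,
        "context": "benchmarks",
        "description": f"runtime ratio vs. baseline: {ratio:.2f}x",
    }

print(benchmark_status(100.0, 112.0)["state"])  # failure (12% slower)
print(benchmark_status(100.0, 101.0)["state"])  # success (within the assumed noise budget)
```

The threshold is the hard part in practice: set it too tight and measurement noise turns the checkmark red on harmless changes, which is exactly the judgement problem mentioned earlier.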
Secondly, despite being integrated into the CI pipeline, four projects elect not to execute their benchmarking bots automatically on every pull request. Instead, the bots have to be triggered manually by a developer, for instance by tagging the bot in the code review discussion of a pull request. Given that system benchmarking is well known to be time-consuming and resource-intensive [5], these projects are selective about how often they run their benchmarking bots. For example, in the Julia project, only 5% of pull requests are actually benchmarked, and only 8% of contributors have ever interacted with the benchmarking bot.
When we collected the data in June 2021, we observed that while most projects in our study had adopted their bots in 2020, Julia had adopted its benchmarking bot, The Nanosoldier, already in 2016. This is why we decided to conduct a case study on Julia to answer our second research question, i.e., to understand the impact of the adoption of the bot (The Nanosoldier) on the project itself (Julia). Ca. 7% of the PRs involve the bot.
Pull requests where The Nanosoldier is invoked involve more participants and more comments. Our research also indicates that discussions involving the bot require more commits and lead to longer comment threads, as measured by the total length of all comments taken together. The differences are statistically significant (p < 0.0001), with Cliff's δ effect sizes between small (0.28; number of commits) and large (0.5; number of participants). Hence, it appears that The Nanosoldier is predominantly involved in the discussion of complex issues.
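Cliff's δ, the effect size used above, can be computed straight from its definition; a self-contained sketch (the sample data below is made up purely for illustration):

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: (#{x > y} - #{x < y}) / (len(xs) * len(ys)).
    Ranges from -1 to 1; |delta| below roughly 0.15 is often read as
    negligible, up to ~0.33 as small, ~0.47 as medium, above as large."""
    gt = sum(1 for x in xs for y in ys if x > y)
    lt = sum(1 for x in xs for y in ys if x < y)
    return (gt - lt) / (len(xs) * len(ys))

# Made-up example: comment counts in bot PRs vs. other PRs.
bot_prs   = [12, 15, 9, 20]
other_prs = [3, 5, 4, 6]
print(cliffs_delta(bot_prs, other_prs))  # 1.0 -- every bot PR exceeds every other PR
```

Unlike a t-test statistic, Cliff's δ is non-parametric: it only compares ranks across all pairs, which suits skewed count data like comments and commits.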
Still, the number of developers having interacted with the Nanosoldier remains limited; benchmarking/performance seems to be a concern for a relatively small group.
Of course, the investigation so far has focused on Julia, while ideally we would like to generalise these findings to the other bots in our dataset. Unfortunately, this was not possible for most projects, as the number of data points was too low for statistical analysis. Still, we had enough data points for two other programming languages, Rust and Apple Swift, and for the Fluent UI framework.
We see that the hypotheses we have formulated based on the Julia case tend to hold for other projects (with one minor exception). At the same time, the effect size for other projects is usually smaller than for Julia.
However, it is unclear whether the bot causes long, complex discussions, or whether it is commonly brought into issues that are simply inherently more complex: this is a kind of chicken-and-egg problem. So we went back to Julia. To get further insight into the ways The Nanosoldier is used, we (a) manually analysed the pull requests using card sorting, and (b) conducted semi-structured interviews with two active and experienced Julia developers who have had ample interaction with The Nanosoldier.
Card sorting the pull requests yielded the process model on the slide. Arrows should be read as "followed by" rather than "causes". The process model is not particularly surprising, so I would like to highlight one element, namely the presence of backporting requests. We do not think The Nanosoldier causes backporting requests. Rather, we speculate that when a PR is in a state where it looks promising, a backport is requested by a contributor. This reasoning is similar to why The Nanosoldier is invoked in the first place; the difference is that The Nanosoldier tends to be invoked sooner than a backport is requested. Thus, the backport request follows the benchmark invocation.
Then we moved to the interviews.
Interviewees confirmed our findings that PRs with a contribution from The Nanosoldier have a greater number of interactions, particularly because the report from the bot explicitly triggers discussion.
In a way, developers see The Nanosoldier as a kind of dishwasher, something they are so used to, that one has to turn it off to really appreciate it. One of the interviewees said: “You think, oh, most things will stay mostly stable, what could go wrong [when it’s offline]? And then you turn it back on, and there’s like ten bugs.”
Our study indicates that developers appreciate the peace of mind that using a performance benchmarking bot gives them.
Why only those??
However, if The Nanosoldier is as magnificent as a dishwasher, why would developers not use it on every pull request? Only ca. 7% of the pull requests involve The Nanosoldier…
Unfortunately, having the bot profile a pull request takes a lot of time. "Our benchmarks just take too long. If they were really fast, we would just run them on CI. But a combination of the fact that we're using wall clock time and that we need to run it on a machine that's closer to the actual hardware and less susceptible to interrupts."
Thus, invoking The Nanosoldier is reserved for pull requests that are judged to be potentially high-impact.
But at the same time, The Nanosoldier is beneficial for the project even if it is invoked only on a limited number of pull requests.
RQ1: Rare; GitHub Actions are promising; dependabot4benchmarking?
RQ2: Significant positive impact, even if the use is limited due to speed