The document surveys research on helping software engineers comprehend source code changes. It outlines techniques for differencing code at the text, syntactic, and semantic levels. It also covers summarizing code changes, explaining changes through rules and control-flow analysis, and leveraging related documentation. Future work opportunities include detecting work-item-specific changes, decomposing and aggregating changes, and explaining changes through differential execution of co-changed test cases.
1. Source Code Comprehension on Evolving Software:
A Literature Survey
Yida Tao
Supervisor: Sunghun Kim
1
2. Motivation
Code Change Comprehension
Tao et al., FSE’12
Code change comprehension is
• Frequently required
• Part of major development activities, in particular the code-review process
• How do software engineers understand code changes? An exploratory study in industry. Tao et al., FSE’12
• Expectations, outcomes, and challenges of modern code review. Bacchelli and Bird, ICSE’13
Bacchelli & Bird, ICSE’13
• “…review and understand code they have not seen before may be more common than a developer working on new code”
• “From interviews, no other code review challenge emerged as clearly as understanding the submitted change”
2
5. Text Differencing
Flat representation of a program
Sequence of strings
Unix diff
Only outputs added/deleted lines; cannot detect modified lines
Hard to determine when a code fragment is moved upward or downward
Ldiff (Canfora et al., ICSE’09)
An enhanced line differencing tool
Limitations
Differences are reported as changes to *characters*
No syntactic-structure information
5
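To make the line-level limitation concrete, here is a minimal sketch (my own, not from the survey) using Python's standard difflib: a one-word edit to a line shows up as a full deletion plus a full addition, with no notion that the line was merely modified.

```python
import difflib

old = [
    "public int total(int x) {",
    "    return tot + x;",
    "}",
]
new = [
    "public int total(int x) {",
    "    return tot + x + bonus;",  # the line was edited, not replaced
    "}",
]

# A plain line diff reports the edit as one deleted line and one added line;
# it carries no information that the two lines correspond to each other.
for line in difflib.unified_diff(old, new, fromfile="Old.java", tofile="New.java", lineterm=""):
    print(line)
```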
6. Syntactic Differencing
Structured representation of a program
Abstract syntax tree; XML
ChangeDistiller (Fluri et al., TSE’07)
Tree differencing
Node: bigram string similarity
Control structure: subtree similarity
Output: tree edit script (insert, delete, move, update)
XML differencing
srcML (Maletic & Collard, ICSM’04): embeds abstract syntax and structure within the source code
diffX (Al-Ekram et al., CASCON ’05)
Limitation
Cannot describe how the behavior of a program is changed
Still reports differences for behavior-preserving changes
6
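A small illustration of the limitation named above (my own sketch, in Python rather than Java): swapping the branches of an if/else while negating the condition preserves behavior, yet a purely syntactic comparison of the two ASTs still reports a difference.

```python
import ast

old_src = """
if x != HI:
    tot = tot + x
else:
    tot = tot + DEF
"""

new_src = """
if x == HI:
    tot = tot + DEF
else:
    tot = tot + x
"""

# Both snippets compute the same result, but their syntax trees differ,
# so any purely syntactic differencing approach will flag a change here.
old_tree = ast.dump(ast.parse(old_src))
new_tree = ast.dump(ast.parse(new_src))
print("Syntactically identical?", old_tree == new_tree)  # -> False
```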
11. Code Change Summarization
LSdiff (Kim and Notkin, ICSE’09)
Group related changes
Detect potential inconsistencies in a code change
11
12. Code Change Summarization (cont.)
DeltaDoc (Buse and Weimer, ASE’10)
Symbolic execution: obtain path predicates for each statement in both versions
Identify statements that are added, deleted, or have changed predicates
Summarization
12
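As a rough intuition for DeltaDoc's first two steps, here is a toy sketch (my own; not Buse and Weimer's actual symbolic execution): associate each statement with the condition guarding it, then report statements whose guard changed between versions.

```python
import ast  # ast.unparse requires Python 3.9+


def guards(src):
    """Map each non-if statement to the textual condition guarding it.

    A toy stand-in for path predicates: only enclosing if/else tests are tracked.
    """
    result = {}

    def walk(stmts, conds):
        for stmt in stmts:
            if isinstance(stmt, ast.If):
                test = ast.unparse(stmt.test)
                walk(stmt.body, conds + [test])
                walk(stmt.orelse, conds + [f"not ({test})"])
            else:
                result[ast.unparse(stmt)] = " and ".join(conds) or "True"

    walk(ast.parse(src).body, [])
    return result


old = "if size > LIMIT:\n    flush()\nprocess(item)\n"
new = "if size >= LIMIT:\n    flush()\nprocess(item)\n"

g_old, g_new = guards(old), guards(new)
# Statements whose guard changed (or that exist in only one version) are the
# raw material for a DeltaDoc-style summary such as
# "When size == LIMIT, flush() is now called".
for stmt in sorted(g_old.keys() | g_new.keys()):
    if g_old.get(stmt) != g_new.get(stmt):
        print(f"{stmt}: {g_old.get(stmt)} -> {g_new.get(stmt)}")
```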
13. Code Change Summarization (cont.)
Multi-document summarization (Rastkar and Murphy, ICSE’13)
Linking evolutionary documents (commit log, issue tracking entries)
Finding the most informative sentences to extract to form a summary
Similarity between a sentence and the title of the enclosing document
Overlap between a sentence and the adjacent document
13
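A minimal sketch of the kind of scoring such extractive approaches rely on (my own simplification, not Rastkar and Murphy's model): rank sentences by word overlap with the title of their enclosing document.

```python
def overlap_score(sentence, title):
    """Fraction of title words that also occur in the sentence (toy similarity)."""
    s_words = set(sentence.lower().split())
    t_words = set(title.lower().split())
    return len(s_words & t_words) / len(t_words) if t_words else 0.0


# Hypothetical issue title and candidate sentences from linked documents.
title = "NullPointerException when saving an empty chart"
sentences = [
    "The fix adds a null check before the chart is serialized.",
    "Thanks for the quick review, merging now.",
    "Saving an empty chart no longer throws a NullPointerException.",
]

# Pick the most title-like sentences as candidates for the summary.
for s in sorted(sentences, key=lambda s: overlap_score(s, title), reverse=True):
    print(f"{overlap_score(s, title):.2f}  {s}")
```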
14. Code Change Summarization (cont.)
Challenges
Evolutionary documents
Linkage might not be found (Bachmann et al., FSE’10; Wu et al., FSE’11)
Human-written documents may be unavailable or uninformative (Buse and Weimer, ASE’10; Tao et al., FSE’12)
Automatically generated documents
Verbosity
Uninteresting changes are identified, e.g., “all types that declared toString() added constructors” (Kim and Notkin, ICSE’09)
14
(Figure on slide: example output from LSdiff and DeltaDoc)
15. Outline
Program Differencing
Text Differencing
Syntactic differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Code Change Comprehension
Querying and Filtering
Customization
15
16. Querying and Filtering
Specifying and detecting meaningful changes (Yu et al., ASE’11)
Normalize the program (user-specified) before differencing
Non-trivial to construct the query
16
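As an informal illustration of normalizing before differencing (my own sketch; Yu et al. work on Java with richer, user-specified normalization rules), one simple normalization is to parse both versions and re-print them from the AST, which discards comment and formatting differences before the textual diff is computed.

```python
import ast  # ast.unparse requires Python 3.9+
import difflib


def normalize(src):
    """Round-trip through the AST: comments and formatting disappear,
    so only structural differences survive the normalization."""
    return ast.unparse(ast.parse(src)).splitlines()


old = "def area(r):\n    # old comment\n    return 3.14159*r*r\n"
new = "def area(r):\n    # reworded comment, extra blank line below\n\n    return 3.14159 * r * r\n"

raw_diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
norm_diff = list(difflib.unified_diff(normalize(old), normalize(new), lineterm=""))

print("raw diff lines:", len(raw_diff))          # several changed lines
print("normalized diff lines:", len(norm_diff))  # 0: nothing meaningful changed
```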
17. Querying and Filtering (cont.)
Filtering non-essential changes (Kawrykow and Robillard, ICSE’11)
Non-essential changes: rename-induced modifications, local variable extraction, trivial keyword modification, whitespace and documentation updates
ChangeDistiller (Fluri et al., TSE’07) + Partial program analysis (Dagenais and Robillard, ICSE’08)
Goal: improving mining and recommendation accuracy instead of developers’ comprehension
17
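A very small sketch of the filtering idea (my own; Kawrykow and Robillard detect far richer categories via AST analysis): drop changed line pairs whose old and new sides differ only in whitespace or comments.

```python
import re


def essential(old_line, new_line):
    """Treat a change as non-essential if both sides are identical once
    whitespace and trailing // comments are stripped (a crude heuristic)."""
    def strip(line):
        line = re.sub(r"//.*$", "", line)  # drop line comments
        return re.sub(r"\s+", "", line)    # drop all whitespace
    return strip(old_line) != strip(new_line)


changes = [
    ("int total = a+b;", "int total = a + b;   // reformatted"),
    ("return total;", "return total * factor;"),
]

for old_line, new_line in changes:
    label = "ESSENTIAL" if essential(old_line, new_line) else "non-essential"
    print(f"{label:13}  {old_line!r} -> {new_line!r}")
```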
18. Outline
Program Differencing
Text Differencing
Syntactic differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Code Change Comprehension
18
19. Research Directions
Program Differencing
Text Differencing
Syntactic differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Source Code Changes
Work-item-based changes?
19
20. Work-item-based Changes
Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
Very difficult to understand (Tao et al., FSE’12)
(Slide example: JFreeChart revision 1083, which mixes a trivial keyword removal, a bug fix, and formatting changes)
20
21. Work-item-based Change Detection
Multiple work-items in a single code change (e.g., a bug fix + code cleanup + a new feature)
Very difficult to understand (Tao et al., FSE’12)
Change decomposition
Program slicing (entity dependencies)
Pattern matching (similarities)
A single work-item spreads across multiple code changes (e.g., 5 changes to finally fix a bug completely)
Change aggregation
Linkage to the same issue
Heuristics like time duration, commit authors, program dependencies, etc.
21
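To make the aggregation heuristics concrete, here is a small sketch (my own, with made-up commit data) that groups commits by the issue ID mentioned in their messages, the simplest of the linkage heuristics listed above.

```python
import re
from collections import defaultdict

# Hypothetical commit history: (short hash, commit message).
commits = [
    ("a1f3", "Fix NPE when chart is empty (issue #421)"),
    ("b7c2", "Cleanup: remove unused imports"),
    ("c9d8", "issue #421: second attempt, also guard the legend renderer"),
    ("d044", "Add axis label option (issue #430)"),
]

# Aggregate commits that reference the same issue; commits without an issue
# reference would need further heuristics (time windows, authors, dependencies).
groups = defaultdict(list)
for sha, message in commits:
    match = re.search(r"#(\d+)", message)
    key = f"issue #{match.group(1)}" if match else "unlinked"
    groups[key].append(sha)

for key, shas in groups.items():
    print(key, "->", shas)
```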
22. Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Code Change Comprehension
Work-item change detection
Change decomposition
Change aggregation
22
23. Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Work-item-specific
changes
Code Change Comprehension
Work-item change detection
Change decomposition
Change aggregation
23
24. Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Work-item-specific
changes
Code Change Comprehension
Concrete Execution
Work-item change detection
Change decomposition
Change aggregation
24
25. Explaining code changes with executions of co-changed test cases
25
Test cases
Best documentation for source code
Test cases co-changed with source code
Documentation for code changes?
Mostly synchronous co-evolution of production and test code (Zaidman et al., Empirical Software Engineering’11)
Differential test executions
Co-changed test cases T
Executing T on the old version P and the new version P’
Comparing the two executions to explain the changed behavior
From StackExchange
http://programmers.stackexchange.com/questions/154439/quality-of-code-in-unit-tests?newsletter=1&nlcode=67628%7c1a35
• “Unit tests are one of the best sources of documentation for your system, and arguably the most reliable form”
• “Unit tests are often the first thing you look at when trying to grasp what some piece of code does”
• “They can also serve as a starting point for people new to the code base”
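A minimal sketch of what differential test execution could look like (my own illustration of the proposed direction, not an existing tool): run the same co-changed test inputs against the old and new versions of a function and report where the observed behavior diverges.

```python
def discount_old(price, customer):
    # old version: every registered customer gets 5% off
    return price * 0.95 if customer.get("registered") else price


def discount_new(price, customer):
    # new version: the discount now also requires a minimum purchase
    if customer.get("registered") and price >= 100:
        return price * 0.95
    return price


# Inputs taken from (hypothetical) co-changed test cases.
test_inputs = [
    (50, {"registered": True}),
    (200, {"registered": True}),
    (80, {"registered": False}),
]

# Differential execution: same inputs, both versions, compare the outcomes.
for price, customer in test_inputs:
    old_out, new_out = discount_old(price, customer), discount_new(price, customer)
    status = "SAME" if old_out == new_out else "CHANGED"
    print(f"{status:7} price={price:<4} registered={customer['registered']!s:5} old={old_out} new={new_out}")
```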
26. Research Directions
Program Differencing
Text Differencing
Syntax differencing
Semantic differencing
Code Change Summarization
Rules and exceptions
Control-flow changes
Evolutionary documentation
Querying and Filtering
Meaningful changes
Non-essential changes
Work-item-specific changes
Code Change Comprehension
Concrete Execution
• Co-changed test cases
• Differential test execution
Work-item change detection
Change decomposition
Change aggregation
26
Editor's Notes
We know that software is continuously evolving, since developers change source code practically all the time. One consequence is that developers also have to understand these code changes, which I refer to as CCC (code change comprehension) throughout this talk. Last year, we conducted an exploratory study at Microsoft, where we sent surveys to and conducted interviews with Microsoft developers about their CCC practices. This work is published at FSE. In this work we found, first, that CCC is frequently required: the majority of developers understand code changes several times each day.
In this year’s ICSE, B, in their empirical study on modern code review, also reported similar findings: CCC is more common than understanding the entire program, but it is also the most challenging part.
These findings motivate our work, since CCC is a challenging activity but also fundamental to developers’ daily practices.
So in the literature survey, I identify 3 major categories related to CCC.
First is program differencing. This line of work tries to help developers by describing code changes.
Second is code change summarization. Studies in this category take a step further and try to reason about and explain code changes.
Third is querying and filtering. This is a sort of “customized” CCC.
Unix diff is the most well-known example in this category. But it’s also well-recognized for two major limitations.
Ldiff:
diff: Longest common subsequence
All possible hunk pairs -> similarity (vector-space cosine similarity) -> pick the topmost pairs
Line matching -> Levenshtein edit distance -> pairs above the threshold are marked as changed
Unmatched lines form new hunks -> iterate step 2 (a line-matching sketch follows)
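A minimal sketch of the line-matching step, assuming a plain Levenshtein similarity and an arbitrary threshold of 0.6; this only illustrates the idea and is not ldiff's actual implementation.

```python
# A toy sketch of ldiff-style line matching (not the actual ldiff implementation).
# Lines from a pair of matched hunks are compared by Levenshtein similarity;
# pairs above an assumed threshold are reported as changed lines rather than
# as an unrelated deletion plus addition.

def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    return 1.0 if a == b else 1.0 - levenshtein(a, b) / max(len(a), len(b))

THRESHOLD = 0.6  # assumed value, not taken from the ldiff paper

old_hunk = ["int total = 0;", "total = total + x;"]
new_hunk = ["int total = 0;", "total = total + x + offset;", "log(total);"]

for o in old_hunk:
    best = max(new_hunk, key=lambda n: similarity(o, n))
    s = similarity(o, best)
    if s == 1.0:
        print(f"UNCHANGED: {o!r}")
    elif s >= THRESHOLD:
        print(f"CHANGED:   {o!r} -> {best!r}")
    else:
        print(f"DELETED:   {o!r}")
```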
Since these techniques treat a program as plain text, they report program differences as changes to lines and characters. From a developer’s point of view, the syntactic, or structural, information about the source code is lost. This motivates another line of work, which we call “syntax differencing”.
This line of work uses structured representations of a program.
One example is ChangeDistiller, which represents a program as an abstract syntax tree (AST) and applies a tree differencing algorithm.
In addition to ASTs, studies also represent code in XML, which can also embed … Then we can apply XML differencing algorithms, such as diffX proposed in …, to compute program differences. (A rough AST-comparison sketch follows.)
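As a rough illustration of tree-based differencing, the sketch below parses two versions of a tiny function with Python's own ast module and reports node kinds that appear in only one version. This is a deliberately coarse stand-in for ChangeDistiller's tree-matching algorithm, which works on Java ASTs and produces a proper edit script; the two source snippets are assumptions for illustration.

```python
# A minimal illustration of comparing structured (tree) representations of two
# program versions, using Python's ast module as a stand-in.

import ast

old_src = "def total(x):\n    return x + 1\n"
new_src = "def total(x):\n    if x > 0:\n        return x + 1\n    return 0\n"

old_nodes = [type(n).__name__ for n in ast.walk(ast.parse(old_src))]
new_nodes = [type(n).__name__ for n in ast.walk(ast.parse(new_src))]

# Report node kinds present in only one version -- a very coarse stand-in for a
# real tree edit script (insert/delete/move/update operations).
print("Inserted node kinds:", sorted(set(new_nodes) - set(old_nodes)))
print("Deleted node kinds: ", sorted(set(old_nodes) - set(new_nodes)))
```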
In cases where developers perform behavior-preserving modifications, such as switching the order of an if-else, syntactic differencing will still report the differences, although from the developer’s perspective this might not be an important change.
Therefore, the next line of work focuses on semantic differencing of two program versions. Semantic diff operates at the method level and compares variable dependencies to derive behavioral changes.
In the old version of the method add, if x is not equal to HI, x is added to TOT; otherwise, DEF is added to the total. From this code, we can derive a list of dependencies, for example, …
In the new version, the developer simply wants to switch the order of the if-else branches but mistakenly uses an assignment instead of an equality test. Therefore, when the technique computes the variable dependencies and compares them to the previous ones, it will report that …
These behavioral differences are certainly not expected, because when x is assigned HI, the initial value of x is always lost. In such cases, semantic diff is better than syntactic diff, since it can draw developers’ attention to the program’s unexpected behavioral change. (A toy reconstruction of this dependency comparison follows.)
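A toy reconstruction of the dependency comparison described above, not the actual tool. The dependency pairs (target, source) are hand-derived from the old version of add and from the new version containing the accidental assignment; a real semantic-diff tool would compute them automatically.

```python
# Diff two hand-derived sets of variable dependencies, as a semantic-diff tool
# would report them for the add example above.

old_deps = {
    ("TOT", "TOT"), ("TOT", "x"), ("TOT", "DEF"), ("TOT", "HI"),
}
new_deps = {
    ("TOT", "TOT"), ("TOT", "DEF"), ("TOT", "HI"),
    ("x", "HI"),   # introduced by writing `x = HI` instead of `x == HI`
}

print("New dependencies: ", sorted(new_deps - old_deps))   # x now depends on HI
print("Lost dependencies:", sorted(old_deps - new_deps))   # TOT no longer uses the input x
```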
Another work, JDiff, which is published in …, addresses semantic differencing for object-oriented programs.
By simply applying syntactic differencing, we would only know that m1 is added, and … But developers may be more interested in how the behavior of the program has changed.
If the dynamic type of a is B, the call a.m1 in the new version actually invokes m1 in B.
The exception thrown will be caught by different catch blocks after the change.
JDiff extends the control-flow graph (CFG) to combine … The resulting ECFG considers dynamic binding and exception handling for the previous example, and a graph differencing algorithm can then be applied to reveal the difference. (A small dynamic-binding illustration follows.)
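A small illustration of why dynamic binding matters for change comprehension, following the a.m1 example above. The class and method names mirror that example; the Python code itself is an assumption for illustration and is unrelated to JDiff's actual ECFG construction.

```python
# Adding an override is reported by syntactic diff only as "m1 added", yet it
# changes the behavior of an *unchanged* call site through dynamic binding.

class A:
    def m1(self):
        return "A.m1"

# Old version: B inherits m1 from A.
class B_old(A):
    pass

# New version: B now overrides m1.
class B_new(A):
    def m1(self):
        return "B.m1"

def client(a: A) -> str:
    # The call site is identical in both versions; its behavior depends on the
    # dynamic type of `a`.
    return a.m1()

print(client(B_old()))  # "A.m1" before the change
print(client(B_new()))  # "B.m1" after the change: same call site, new behavior
```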
Some studies also use symbolic execution to characterize a program’s behavior. This technique … uses symbolic values instead of actual values. For example, the symbolic execution of this code fragment reads: if this condition is satisfied, return …; otherwise, if …, return ….
XXX proposed differential symbolic execution, which compares the symbolic execution of two program versions. The output has this form: under which condition do the two versions produce different results? (A solver-based sketch of this question follows.)
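As a concrete stand-in for that kind of output, the sketch below encodes two assumed versions of a small computation as Z3 expressions (requires the z3-solver package) and asks the solver for a condition under which they return different results. Real differential symbolic execution derives such summaries from the programs themselves; here both versions are made-up examples.

```python
# Ask a solver under which condition two program versions differ.
# pip install z3-solver

from z3 import Int, If, Solver, sat

x = Int("x")

old_result = If(x > 10, x - 10, x)     # old version
new_result = If(x >= 10, x - 10, x)    # new version: the boundary case changed

s = Solver()
s.add(old_result != new_result)        # "under which condition do they differ?"

if s.check() == sat:
    print("Versions differ, e.g. when x =", s.model()[x])   # expect x = 10
else:
    print("Versions are behaviorally equivalent")
```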
Now I’ve covered the three categories of program differencing. These works basically try to help CCC by describing what the code change is. The next line of work, which I call CCS (code change summarization), takes a further step and tries to explain code changes.
A program is represented as a set of predicates that describe code elements, containment relationships, and structural dependencies; these predicates are called “facts”. LSdiff then computes the changed facts between two program versions.
It infers rules from the list of changed facts,
and also infers exceptions to those rules. Example: the start methods of all of Car’s subtypes added calls to the Key.chk method, except for the subtype Kia. (A toy fact-diffing sketch follows.)
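A toy sketch of the fact-diffing and rule-with-exception idea, using the Kia example above. Honda and Ford are made-up sibling subtypes added only so the candidate rule has something to cover; this is not LSdiff's actual inference algorithm.

```python
# Diff two fact sets and check a candidate rule with exceptions.

old_facts = {
    ("calls", "Honda.start", "Engine.init"),
    ("calls", "Ford.start",  "Engine.init"),
    ("calls", "Kia.start",   "Engine.init"),
}
new_facts = {
    ("calls", "Honda.start", "Engine.init"), ("calls", "Honda.start", "Key.chk"),
    ("calls", "Ford.start",  "Engine.init"), ("calls", "Ford.start",  "Key.chk"),
    ("calls", "Kia.start",   "Engine.init"),              # Kia is the exception
}

added = new_facts - old_facts
subtypes = {"Honda", "Ford", "Kia"}          # assumed subtypes of Car

# Candidate rule: every subtype's start method added a call to Key.chk.
covered    = {s for s in subtypes if ("calls", f"{s}.start", "Key.chk") in added}
exceptions = subtypes - covered

print("Added facts:", sorted(added))
print("Rule: all Car subtypes' start() added a call to Key.chk,"
      " except", sorted(exceptions))   # -> ['Kia']
```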
Finally, DeltaDoc uses transformation heuristics to summarize these statements’ differences into human-readable documentation.
The studies we’ve seen so far all extract information from the source code itself. However, other software artifacts, such as commit logs, can also be helpful for understanding code changes, since in these artifacts we might find useful natural-language sentences related to the code changes. Motivated by this observation, … proposed …
Each sentence has some features, for example …. To locate the most informative or relevant sentences, they are ranked by their feature values.
Here is an example of their output: for this change, the summary contains a list of relevant sentences extracted from its evolutionary documents. (A toy ranking sketch follows.)
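A toy sketch of ranking candidate sentences by feature values. The two features, their weights, and the example sentences and identifiers are all assumptions for illustration, not the feature set used by the actual technique.

```python
# Rank candidate sentences from evolutionary documents (commit logs, issue
# reports) by a simple assumed feature score.

CHANGED_IDENTIFIERS = {"renderTitle", "LegendTitle"}   # assumed changed code elements

def score(sentence: str) -> float:
    words = sentence.split()
    mentions = sum(w.strip(".,()") in CHANGED_IDENTIFIERS for w in words)
    # Assumed weighting: reward mentions of changed identifiers, prefer brevity.
    return 2.0 * mentions - 0.05 * len(words)

candidates = [
    "Fixed a null check in renderTitle so empty legends no longer crash.",
    "Minor formatting cleanup across the module.",
    "LegendTitle now validates its arguments before drawing.",
]

for s in sorted(candidates, key=score, reverse=True):
    print(f"{score(s):5.2f}  {s}")
```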
The major challenges of using evolutionary documents are, first, that linkage between these documents may not exist, so we may not even be able to find documents relevant to a code change. This problem is known as the “missing link” problem and has been studied recently.
In addition, documents may not … In such cases, we cannot rely on them to extract informative change summaries.
As I introduced before, the biggest problem is verbosity. These are the rules and exceptions generated by LSdiff to describe a code change, and this is the number of lines in the change documentation. Compared to the human-written commit log (the black bar), the documentation generated by DeltaDoc is still very long.
Another challenge is whether changes that developers are not interested in can be identified automatically. For example, a rule reported by LSdiff says …, and in the user study participants complained that such a rule is not useful.
Therefore, there are studies that customize CCC so that developers can query the changes they are interested in and filter out irrelevant changes.
Non-essential changes include …, which are less likely to be of interest to developers.
They use ChangeDistiller to detect changes and apply partial program analysis (PPA) to resolve type bindings for partial programs (i.e., code changes).
However, the goal of this work is to…
In general, studies in this category focus on querying meaningful changes and filtering out non-essential changes. (A toy filtering sketch follows.)
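To close, a toy sketch of filtering one kind of non-essential change: formatting-only edits, detected by comparing whitespace-normalized text. A real approach, as noted above, works on resolved ASTs (e.g., via ChangeDistiller and PPA) rather than on raw text; the hunks below are illustrative assumptions.

```python
# Flag hunks whose old and new text differ only in whitespace/formatting.

import re

def normalize(code: str) -> str:
    # Collapse all whitespace so purely cosmetic edits compare equal.
    return re.sub(r"\s+", " ", code).strip()

def is_non_essential(old_hunk: str, new_hunk: str) -> bool:
    return normalize(old_hunk) == normalize(new_hunk)

old = "if (x != HI) {\n    tot = tot + x;\n}"
new = "if (x != HI) { tot = tot + x; }"      # formatting-only change

print(is_non_essential(old, new))  # True -> can be filtered out of the review
```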