The document describes an approach called LEarning ANnotations (Lean) that aims to automatically annotate program variables with semantic concepts by combining program analysis, runtime monitoring, and machine learning. Lean takes as input source code along with an initial set of annotations provided by developers and a concept diagram describing relationships between concepts. It instruments the code to collect runtime profiles, trains machine learning models to classify variables, and uses program analysis to validate predicted annotations. The authors evaluate Lean on open source projects and find it can correctly annotate an additional 47% of variables on average when given initial annotations for 6% of variables.
Automating and Validating Semantic Annotations.pdf (Kathryn Patel)
This document describes an approach called LEarning ANnotations (Lean) that uses program analysis, runtime monitoring, and machine learning to automatically propagate semantic annotations from a small set of initial annotations provided by developers to additional program entities. The authors evaluate their Lean prototype on open-source Java projects and find that after developers annotate around 6% of entities, Lean can correctly annotate an additional 69% on average, taking less than an hour to analyze a project with over 20,000 lines of code.
This document discusses software maintenance and metrics used to measure software complexity. It notes that maintenance makes up 70% of a software's lifecycle costs. Common maintenance activities include enhancements, adaptations, and corrections. Two important source code metrics discussed are Halstead's effort equation and McCabe's cyclomatic complexity, which measure properties of source code like tokens and control flow to evaluate complexity. Maintaining complexity metrics is important during software evolution and maintenance.
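McCabe's cyclomatic complexity for a single method is the number of decision points plus one. The sketch below approximates it by counting branching keywords in a method body; this is purely illustrative (a real tool would parse the AST, and the keyword list here is an assumption, not a complete one):

```java
public class CyclomaticSketch {
    // Rough approximation of V(G) = decision points + 1 for one method.
    // Counting substrings is naive (identifiers containing "if" would be
    // miscounted); real metric tools work on the parsed syntax tree.
    static int approximateComplexity(String methodBody) {
        String[] decisions = {"if", "for", "while", "case", "&&", "||", "catch", "?"};
        int count = 0;
        for (String d : decisions) {
            int idx = 0;
            while ((idx = methodBody.indexOf(d, idx)) != -1) {
                count++;
                idx += d.length();
            }
        }
        return count + 1;
    }

    public static void main(String[] args) {
        String body = "if (x > 0) { while (x > 10) { x--; } }";
        System.out.println(approximateComplexity(body)); // 3: one if, one while, plus one
    }
}
```

A method with no branches scores 1, the minimum; each added decision point adds one independent path to test.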
This document summarizes the Dafny language and program verifier. Dafny is an imperative programming language that supports features like classes, generics, algebraic datatypes, and specification constructs like preconditions and postconditions. The Dafny program verifier translates Dafny programs into the Boogie intermediate verification language and uses the Z3 SMT solver to automatically verify that programs meet their specifications. Dafny has been used to specify and verify challenging algorithms like the Schorr-Waite graph marking algorithm.
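Dafny syntax is not reproduced here; as a loose analogue only, the pre/postcondition idea can be emulated in Java with runtime checks. Note the essential difference: Dafny discharges these conditions statically via Boogie and Z3, whereas the Java sketch below (hypothetical `intSqrt` example) only checks them at run time:

```java
public class ContractSketch {
    // requires: n >= 0                                  (precondition)
    // ensures:  r*r <= n < (r+1)*(r+1)                  (postcondition)
    static int intSqrt(int n) {
        if (n < 0) throw new IllegalArgumentException("requires n >= 0");
        int r = 0;
        while ((r + 1) * (r + 1) <= n) r++;
        // Runtime stand-in for what Dafny would prove at compile time.
        assert r * r <= n && n < (r + 1) * (r + 1) : "postcondition violated";
        return r;
    }

    public static void main(String[] args) {
        System.out.println(intSqrt(10)); // 3
    }
}
```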
The document discusses frameworks for software development. It describes the typical stages in a software development framework: investigation, design, development, and evaluation. Investigation involves problem description and requirements definition. Design includes data structure and algorithm design. Development consists of coding, debugging, and testing. Evaluation comprises user acceptance testing and retrospectives. The document also outlines good programming practices like input validation, meaningful naming, using constants, commenting code, proper indentation, and using control structures appropriately. It emphasizes single responsibility per module and meaningful module naming.
The article describes 7 types of metrics and more than 50 individual metrics belonging to them, providing detailed descriptions and the calculation algorithms used. It also touches on the role of metrics in software development.
This document proposes developing an extended maintainability estimation model for object-oriented software designs that incorporates reliability and portability metrics. It begins by introducing maintainability and discussing how estimating maintainability during design can help reduce maintenance costs. It then reviews related work on maintainability models and metrics. The proposed work section outlines developing a model that calculates maintainability based on reliability and portability factors. It defines the key aspects of reliability and portability and describes a methodology for inheriting these factors into an existing maintainability model called MOOD. The methodology would use a MATLAB GUI to demonstrate how replacing buggy components with reliable, portable ones could lower maintenance costs.
This document proposes developing an extended maintainability estimation model for object-oriented software design that incorporates reliability and portability metrics. It begins by introducing maintainability and discussing how estimating maintainability during design can help reduce maintenance costs. Next, it reviews related work on maintainability models and metrics. The proposed work section then describes incorporating reliability and portability factors into the existing MOOD maintainability model to potentially further lower maintenance costs. It defines reliability and portability in the context of the model. The methodology section outlines developing a tool to demonstrate the model and evaluate how the new factors may enhance maintainability estimation. Formulas for implementing the inheritance of factors into MOOD metrics are also provided.
IMPLEMENTATION OF DYNAMIC COUPLING MEASUREMENT OF DISTRIBUTED OBJECT ORIENTED... (IJCSEA Journal)
This document summarizes a research paper that proposes a method for dynamically measuring coupling in distributed object-oriented software systems. The method involves three steps: instrumentation of the Java Virtual Machine to trace method calls, post-processing of the trace files to merge information, and calculation of coupling metrics based on the dynamic traces. The implementation results show that the proposed approach can effectively measure coupling metrics dynamically by accounting for polymorphism and dynamic binding, overcoming limitations of traditional static coupling analysis.
Software metrics are increasingly playing a central role in the planning and control of software development projects. Coupling measures have important applications in software development and maintenance. Existing literature on software metrics focuses mainly on centralized systems, while work on distributed systems, particularly service-oriented systems, is scarce. Distributed systems with service-oriented components run in even more heterogeneous networking and execution environments. Traditional coupling measures take into account only “static” couplings. They do not account for “dynamic” couplings due to polymorphism and may significantly underestimate the complexity of software and misjudge the need for code inspection, testing, and debugging. This is expected to result in poor predictive accuracy for quality models of distributed object-oriented systems that rely on static coupling measurements. To overcome these issues, we propose a hybrid model for measuring coupling dynamically in distributed object-oriented software. The proposed method has three steps: instrumentation, post-processing, and coupling measurement. First, the instrumentation process is performed: the JVM is instrumented (modified) to trace method calls, producing three trace files, named .prf, .clp, and .svp. In the second step, the information in these files is merged. At the end of this step, the merged detailed trace of each JVM contains pointers to the merged trace files of the other JVMs, so that the path of every remote call from the client to the server can be uniquely identified. Finally, the coupling metrics are measured dynamically. The implementation results show that the proposed system effectively measures the coupling metrics dynamically.
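The final measurement step can be sketched as follows. The `Call` record and the import-coupling definition used here (distinct callee classes per caller, observed at runtime) are simplifications assumed for illustration, not the paper's exact metric or trace format:

```java
import java.util.*;

public class DynamicCouplingSketch {
    // One record from a (hypothetical) merged trace: one dynamic
    // dispatch from a caller class to a callee class.
    record Call(String caller, String callee) {}

    // Dynamic import coupling: for each class, the number of distinct
    // other classes it actually invoked at runtime. Because the data
    // comes from execution traces, calls dispatched polymorphically are
    // attributed to the concrete receiver class.
    static Map<String, Integer> importCoupling(List<Call> trace) {
        Map<String, Set<String>> callees = new HashMap<>();
        for (Call c : trace) {
            if (!c.caller().equals(c.callee())) {
                callees.computeIfAbsent(c.caller(), k -> new HashSet<>()).add(c.callee());
            }
        }
        Map<String, Integer> result = new HashMap<>();
        callees.forEach((k, v) -> result.put(k, v.size()));
        return result;
    }

    public static void main(String[] args) {
        List<Call> trace = List.of(
            new Call("Client", "StubA"), new Call("Client", "StubB"),
            new Call("StubA", "Server"), new Call("Client", "StubA"));
        Map<String, Integer> m = importCoupling(trace);
        System.out.println("Client -> " + m.get("Client")); // Client -> 2
        System.out.println("StubA  -> " + m.get("StubA"));  // StubA  -> 1
    }
}
```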
Software Refactoring Under Uncertainty: A Robust Multi-Objective Approach (Wiem Mkaouer)
This document describes a multi-objective robust optimization approach for software refactoring that accounts for uncertainty in code smell severity levels and class importance. The approach formulates refactoring as a multi-objective problem to find solutions that maximize both quality, by correcting code smells, and robustness to changes in severity levels and importance. An evaluation on six open source projects found the approach generates refactoring solutions comparable in quality to existing approaches but with significantly better robustness across different scenarios.
Improving Consistency of UML Diagrams and Its Implementation Using Reverse En... (journalBEEI)
This document summarizes a research paper that describes the development of a tool called the UML-Code Consistency Checker Tool (UCCCT) to improve consistency between UML design models and their implementation in C# source code using reverse engineering. The tool detects both vertical inconsistencies between UML diagrams (e.g. class diagrams) and the implemented code, as well as horizontal inconsistencies between different UML diagrams. It extracts information from UML diagrams in XMI format and from compiled C# code to generate tree views. Predefined consistency rules are then used to check for inconsistencies between the UML models and code. Any inconsistencies found are highlighted in the tree views. An evaluation of UCCCT found it
This document summarizes three types of program slicing: static slicing, thin slicing, and dynamic slicing. Static slicing finds all statements that might affect a variable's value for any input, which can result in large slices. Thin slicing improves on static slicing by only including "producer statements" that directly compute or copy a value to the variable of interest. Dynamic slicing considers the actual input used and only includes statements that were executed and actually affected the variable's value. The document provides definitions, algorithms, examples, and comparisons of the three slicing techniques.
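The three slice types can be contrasted on a toy program; the statement labels below are ours, added for illustration:

```java
public class SlicingExample {
    // Slicing criterion: the value of sum returned at S6.
    static int compute(int n) {
        int sum = 0;                        // S1
        int prod = 1;                       // S2
        for (int i = 1; i <= n; i++) {      // S3
            sum += i;                       // S4
            prod *= i;                      // S5
        }
        return sum;                         // S6 (criterion)
    }
    // Static slice on sum at S6:  S1, S3, S4 — for any input; S2 and S5
    //   never affect sum, so they are excluded.
    // Thin slice on sum at S6:    S1, S4 — only the producer statements
    //   that compute or copy sum's value (loop control S3 is dropped).
    // Dynamic slice for n = 0:    roughly just S1 — the loop body never
    //   executed, so sum's value came only from its initialization.

    public static void main(String[] args) {
        System.out.println(compute(5)); // 15
    }
}
```

The example shows why static slices tend to be the largest and thin slices the smallest: each technique keeps a different subset of the statements that could, or did, influence the criterion.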
USING CATEGORICAL FEATURES IN MINING BUG TRACKING SYSTEMS TO ASSIGN BUG REPORTS (ijseajournal)
This paper investigates using categorical features of bug reports, such as the component a bug belongs to, to build a classification model for bug assignment. The model is trained to predict the developer assigned to a bug report based on its categorical fields rather than textual content. An evaluation on three projects found that using both categorical features and textual content improved accuracy over using textual content alone. Using only categorical features provided some improvement over prior approaches but was less accurate than using both data types.
The document outlines the steps involved in program design and problem solving techniques, including defining the problem, outlining the solution, developing an algorithm using pseudocode, testing the algorithm, coding the algorithm, running and documenting the program. It also discusses algorithmic problem solving, the structure theorem, meaningful naming conventions, communication between modules through variables and parameters, module cohesion and coupling, and sequential file updates.
This document discusses structured programming, functional programming, programming style, coding guidelines, software documentation, and challenges in software implementation. Structured programming breaks problems down into smaller pieces and uses modular programming and structured coding. Functional programming uses mathematical functions and avoids side effects. Good programming style and coding guidelines make code more readable and maintainable. Software documentation includes requirements, design, technical, and user documentation. Challenges include code reuse and compatibility issues.
This document discusses structured programming, functional programming, programming style, coding guidelines, software documentation, and challenges in software implementation. Structured programming breaks problems down into smaller pieces and uses modular programming and structured coding. Functional programming uses mathematical functions and concepts like recursion. Good programming style and coding guidelines make code readable and understandable. Software documentation includes requirements, design, technical, and user documentation. Challenges include code reuse, version management, and designing for target hosts.
Static white-box testing involves carefully reviewing software design, architecture, or code without executing it to find bugs. It provides access to internal code to find bugs early that may be difficult to discover with black-box testing alone. Formal reviews are the primary method, ranging from peer reviews between two programmers to inspections with multiple trained reviewers following strict roles and procedures to thoroughly check for problems from different perspectives. Checklists cover common errors like uninitialized variables, out-of-bounds array indexing, data type mismatches, computation overflows, and incorrect control flow or parameter handling.
The document provides an overview of software cost estimation. It discusses factors that influence software cost such as programmer ability, product size, available time, required reliability, and level of technology. It also covers different software cost estimation techniques including expert judgment, Delphi estimation, work breakdown structures, algorithmic models like COCOMO, and estimating maintenance costs. The document emphasizes that accurate cost estimation is difficult during planning due to many unknown factors, so organizations may use iterative estimates to reduce cost overruns.
IRJET- A Novel Approach on Computation Intelligence Technique for Softwar... (IRJET Journal)
This document describes a study that uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict software defects early in the development process. The study uses metrics data from NASA software projects to train and test the ANFIS model. The results show that the ANFIS model is able to accurately predict defects, with low root mean square error values for both the training and testing data, indicating the model was able to generalize without overfitting. The study concludes ANFIS is an effective technique for software defect prediction that can help improve quality and reduce costs.
A metrics suite for variable categorization to support program invariants (IJCSEA Journal)
Invariants are generally implicit. Explicitly stating program invariants helps programmers identify program properties that must be preserved while modifying the code. Existing dynamic techniques detect invariants over both relevant and irrelevant/unused variables, and thereby report both relevant and irrelevant invariants in the program. The presence of irrelevant variables and irrelevant invariants hurts the speed and efficiency of these techniques. Moreover, displaying properties of irrelevant variables and irrelevant invariants distracts the user from concentrating on the properties of relevant variables. To overcome these deficiencies, only relevant variables are considered and irrelevant variables are ignored. Further, relevant variables are categorized as design variables and non-design variables; for this purpose a metrics suite is proposed. These metrics are validated against Weyuker's principles and applied to the RFV and JLex open-source software. Similarly, relevant invariants are categorized as design invariants, non-design invariants, and hybrid invariants; for this purpose a set of rules is proposed. This entire process greatly improves the speed and efficiency of dynamic invariant detection techniques.
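Stating an invariant explicitly, as the abstract advocates, might look like the sketch below. The `balance`/`debugCounter` names and the relevant/irrelevant labels are illustrative assumptions, not taken from the paper:

```java
public class InvariantSketch {
    private int balance = 0;       // relevant (design) variable: the invariant ranges over it
    private int debugCounter = 0;  // irrelevant variable: no invariant of interest involves it

    // Class invariant, stated explicitly: balance >= 0.
    // A dynamic detector such as Daikon could infer this from runs;
    // writing it down guards the property during later modifications.
    void deposit(int amount) {
        if (amount < 0) throw new IllegalArgumentException("amount must be non-negative");
        balance += amount;
        debugCounter++;
        assert balance >= 0 : "invariant violated";
    }

    int balance() { return balance; }

    public static void main(String[] args) {
        InvariantSketch acct = new InvariantSketch();
        acct.deposit(40);
        System.out.println(acct.balance()); // 40
    }
}
```

Reporting a (true but useless) invariant such as `debugCounter >= 0` is exactly the noise the proposed variable categorization aims to filter out.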
A review of slicing techniques in software engineering (Salam Shah)
This document summarizes various techniques for program slicing that have been developed since it was first introduced in 1979. It discusses static slicing, dynamic slicing, backward slicing, forward slicing, inter-procedural slicing, and model-based slicing. It also reviews related work on slicing algorithms and techniques that have been proposed to compute slices more efficiently and effectively for software engineering tasks like debugging, testing and maintenance.
The document discusses software cost estimation and introduces the COCOMO model. It describes that COCOMO takes into account project attributes, product attributes, personnel attributes, and hardware attributes to predict development effort. It also explains that algorithmic cost models can be used to quantitatively analyze options by comparing the costs of different development strategies. Finally, it notes that project duration is independent of team size, and adding people too quickly can actually lead to schedule delays.
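The algorithmic-model idea can be made concrete with basic COCOMO. The organic-mode constants below (a = 2.4, b = 1.05 for effort; c = 2.5, d = 0.38 for duration) are Boehm's published values, used here purely for illustration; note that duration is computed from effort, not from team size, which is the basis of the independence claim above:

```java
public class CocomoSketch {
    // Basic COCOMO, organic mode: effort in person-months from size in KLOC.
    static double effortPersonMonths(double kloc) {
        return 2.4 * Math.pow(kloc, 1.05);
    }

    // Nominal schedule in months, derived from effort alone.
    static double durationMonths(double effort) {
        return 2.5 * Math.pow(effort, 0.38);
    }

    public static void main(String[] args) {
        double e = effortPersonMonths(32); // a 32 KLOC project
        System.out.printf("effort=%.1f PM, duration=%.1f months%n",
                e, durationMonths(e));
    }
}
```

Dividing the effort by the nominal duration gives an implied average team size; compressing the schedule by simply adding people is what the model (and Brooks's law) warns against.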
This document discusses various types of static white box testing techniques including formal reviews, peer reviews, walkthroughs, and inspections. Formal reviews involve following rules and writing a report. Peer reviews involve two programmers reviewing each other's code. Walkthroughs involve a programmer presenting code to reviewers. Inspections are the most formal with trained roles and perspectives. Checklists are provided to review for errors in data declarations, computations, comparisons, control flow, subroutine parameters, and input/output. White box testing finds bugs early by reviewing design and code without executing it.
Modeling Object Oriented Applications by Using Dynamic Information for the I... (IOSR Journals)
This paper presents an iterative approach to recover collaborations from object-oriented applications using dynamic information. It develops a prototype tool called the Collaboration Browser to demonstrate this approach. The Collaboration Browser allows developers to query runtime information to understand collaborations between classes. It presents dynamic information through panels listing sender classes, receiver classes, invoked methods, and collaboration patterns. Developers can iteratively filter information and focus on specific collaborations. The paper tests the Collaboration Browser on a sample Java application, recovering 19 collaboration patterns by posing 18 queries. It concludes the approach effectively discovers important collaborations and roles classes play within them.
The document discusses techniques for analyzing unstructured text data from software repositories. It describes using textual analysis on code identifiers, comments, commit messages, issue trackers, emails, and forums to perform tasks like traceability link recovery, feature location, clone detection, and bug prediction. Different techniques are discussed, including pattern matching, island parsers, information retrieval methods, and natural language parsing. Choosing the right technique depends on the type of unstructured data and needs of the analysis.
Overlapping optimization with parsing through metagrammars (IAEME Publication)
This document describes techniques for improving the performance of a meta framework developed by combining C++ and Java language segments. The meta framework identifies and parses source code containing C++ and Java statements using a metagrammar. Bytecodes are generated from the abstract syntax tree and optimized using techniques associated with the metagrammar, such as constant propagation. Constant propagation identifies constant values and replaces variables with those values to simplify expressions and reduce unnecessary computations. Other optimizations discussed include function inlining, exception handling, and eliminating unreachable code through constant folding. The goal is to develop an optimized meta framework that generates efficient bytecodes for hybrid C++ and Java source code.
Defaultification Refactoring: A Tool for Automatically Converting Java Method... (Raffi Khatchadourian)
By enabling interfaces to declare (instance) method implementations, Java 8 default methods can be used as a substitute for the ubiquitous skeletal implementation software design pattern. Performing this transformation on legacy software manually, though, may be non-trivial. The refactoring requires analyzing complex type hierarchies, resolving multiple implementation inheritance issues, reconciling differences between class and interface methods, and analyzing tie-breakers (dispatch precedence) with overriding class methods. All of this is necessary to preserve type-correctness and confirm semantics preservation. We demonstrate an automated refactoring tool called MIGRATE SKELETAL IMPLEMENTATION TO INTERFACE for transforming legacy Java code to use the new default construct. The tool, implemented as an Eclipse plug-in, is driven by an efficient, fully-automated, type constraint-based refactoring approach. It features an extensive ruleset covering various corner cases where default methods cannot be used. The resulting code is semantically equivalent to the original, more succinct, easier to comprehend, less complex, and exhibits increased modularity. A demonstration can be found at http://youtu.be/YZHIy0yePh8.
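The transformation the tool automates can be illustrated by hand (the `Greeter` names below are hypothetical, not from the tool's own examples). Before Java 8, the shared logic would sit in an abstract "skeletal implementation" class such as `AbstractGreeter`; a default method lets it live in the interface itself, freeing implementors to extend some other class:

```java
// After defaultification: no abstract skeletal class is needed.
interface Greeter {
    String name();

    // Shared behavior that previously required an AbstractGreeter class.
    default String greet() {
        return "Hello, " + name() + "!";
    }
}

public class DefaultMethodDemo implements Greeter {
    public String name() { return "world"; }

    public static void main(String[] args) {
        System.out.println(new DefaultMethodDemo().greet()); // Hello, world!
    }
}
```

The corner cases the abstract mentions arise when, for example, a class inherits the same default method from two interfaces (forcing an explicit override) or when the skeletal class held state, which interfaces cannot.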
IRJET- A Novel Approach on Computation Intelligence Technique for Softwar...IRJET Journal
This document describes a study that uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict software defects early in the development process. The study uses metrics data from NASA software projects to train and test the ANFIS model. The results show that the ANFIS model is able to accurately predict defects, with low root mean square error values for both the training and testing data, indicating the model was able to generalize without overfitting. The study concludes ANFIS is an effective technique for software defect prediction that can help improve quality and reduce costs.
A metrics suite for variable categorizationt to support program invariants[IJCSEA Journal
Invariants are generally implicit. Explicitly stating program invariants, help programmers to identify
program properties that must be preserved while modifying the code. Existing dynamic techniques detect
invariants which includes both relevant and irrelevant/unused variables and thereby relevant and
irrelevant invariants involved in the program. Due to the presence of irrelevant variables and irrelevant
invariants, speed and efficiency of techniques are affected. Also, displaying properties about irrelevant
variables and irrelevant invariants distracts the user from concentrating on properties of relevant
variables. To overcome these deficiencies only relevant variables are considered by ignoring irrelevant
variables. Further, relevant variables are categorized as design variables and non-design variables. For
this purpose a metrics suite is proposed. These metrics are validated against Weyuker’s principles and
applied on RFV and JLex open source software. Similarly, relevant invariants are categorized as design
invariants, non-design invariants and hybrid invariants. For this purpose a set of rules are proposed. This
entire process enormously improves the speed and efficiency of dynamic invariant detection techniques
A review of slicing techniques in software engineeringSalam Shah
This document summarizes various techniques for program slicing that have been developed since it was first introduced in 1979. It discusses static slicing, dynamic slicing, backward slicing, forward slicing, inter-procedural slicing, and model-based slicing. It also reviews related work on slicing algorithms and techniques that have been proposed to compute slices more efficiently and effectively for software engineering tasks like debugging, testing and maintenance.
The document discusses software cost estimation and introduces the COCOMO model. It describes that COCOMO takes into account project attributes, product attributes, personnel attributes, and hardware attributes to predict development effort. It also explains that algorithmic cost models can be used to quantitatively analyze options by comparing the costs of different development strategies. Finally, it notes that project duration is independent of team size, and adding people too quickly can actually lead to schedule delays.
This document discusses various types of static white box testing techniques including formal reviews, peer reviews, walkthroughs, and inspections. Formal reviews involve following rules and writing a report. Peer reviews involve two programmers reviewing each other's code. Walkthroughs involve a programmer presenting code to reviewers. Inspections are the most formal with trained roles and perspectives. Checklists are provided to review for errors in data declarations, computations, comparisons, control flow, subroutine parameters, and input/output. White box testing finds bugs early by reviewing design and code without executing it.
Modeling Object Oriented Applications by Using Dynamic Information for the I...IOSR Journals
This paper presents an iterative approach to recover collaborations from object-oriented applications using dynamic information. It develops a prototype tool called the Collaboration Browser to demonstrate this approach. The Collaboration Browser allows developers to query runtime information to understand collaborations between classes. It presents dynamic information through panels listing sender classes, receiver classes, invoked methods, and collaboration patterns. Developers can iteratively filter information and focus on specific collaborations. The paper tests the Collaboration Browser on a sample Java application, recovering 19 collaboration patterns by posing 18 queries. It concludes the approach effectively discovers important collaborations and roles classes play within them.
The document discusses techniques for analyzing unstructured text data from software repositories. It describes using textual analysis on code identifiers, comments, commit messages, issue trackers, emails, and forums to perform tasks like traceability link recovery, feature location, clone detection, and bug prediction. Different techniques are discussed, including pattern matching, island parsers, information retrieval methods, and natural language parsing. Choosing the right technique depends on the type of unstructured data and needs of the analysis.
Overlapping optimization with parsing through metagrammarsIAEME Publication
This document describes techniques for improving the performance of a meta framework developed by combining C++ and Java language segments. The meta framework identifies and parses source code containing C++ and Java statements using a metagrammar. Bytecodes are generated from the abstract syntax tree and optimized using techniques associated with the metagrammar, such as constant propagation. Constant propagation identifies constant values and replaces variables with those values to simplify expressions and reduce unnecessary computations. Other optimizations discussed include function inlining, exception handling, and eliminating unreachable code through constant folding. The goal is to develop an optimized meta framework that generates efficient bytecodes for hybrid C++ and Java source code.
Defaultification Refactoring: A Tool for Automatically Converting Java Method...Raffi Khatchadourian
Enabling interfaces to declare (instance) method
implementations, Java 8 default methods can be used as a
substitute for the ubiquitous skeletal implementation software design
pattern. Performing this transformation on legacy software
manually, though, may be non-trivial. The refactoring requires
analyzing complex type hierarchies, resolving multiple implementation
inheritance issues, reconciling differences between class
and interface methods, and analyzing tie-breakers (dispatch
precedence) with overriding class methods. All of this is necessary
to preserve type-correctness and confirm semantics preservation.
We demonstrate an automated refactoring tool called MIGRATE
SKELETAL IMPLEMENTATION TO INTERFACE for transforming
legacy Java code to use the new default construct. The tool,
implemented as an Eclipse plug-in, is driven by an efficient,
fully-automated, type constraint-based refactoring approach. It
features an extensive ruleset covering various corner-cases
where default methods cannot be used. The resulting code is
semantically equivalent to the original, more succinct, easier to
comprehend, less complex, and exhibits increased modularity. A
demonstration can be found at http://youtu.be/YZHIy0yePh8.
Automating and Validating Program Annotations
Mark Grechanik, Kathryn S. McKinley
Department of Computer Sciences
The University of Texas at Austin
Austin, Texas 78712
{gmark, mckinley}@cs.utexas.edu
Dewayne E. Perry
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, Texas 78712
perry@ece.utexas.edu
ABSTRACT
Program annotations help to catch errors, improve program under-
standing, and specify invariants. Adding annotations, however, is
often a manual, laborious, tedious, and error prone process espe-
cially when programs are large. We offer a novel approach for
automating a part of this process. Developers first specify an ini-
tial set of annotations for a few variables and types. Our LEarning
ANnotations (Lean) system combines these annotations with
run-time monitoring, program analysis, and machine-learning ap-
proaches to discover and validate annotations on unannotated vari-
ables. We evaluate our prototype implementation on open-source
software projects and our results suggest that Lean can general-
ize from a small set of annotated variables to annotate many other
variables. Our experiments show that after users annotate approx-
imately 6% of the program variables and types, Lean correctly an-
notates an additional 69% of variables in the best case, 47% on
average, and 12% in the worst case, taking less than one hour to
run on an application with over 20,000 lines of code.
Categories and Subject Descriptors
D.2.2 [Software Engineering]: Design Tools and Techniques–
Computer-aided software engineering (CASE); D.2.7 [Software
Engineering]: Distribution, Maintenance, and Enhancement; I.2.1
[Artificial Intelligence]: Applications and Expert Systems; I.5.4
[Pattern Recognition]: Applications
General Terms
Experimentation, Documentation, Algorithms
Keywords
Annotations, semantic concepts, machine learning, program analy-
sis, feature diagrams, runtime monitoring
1. INTRODUCTION
Program annotations assert facts about programs. They appear as
comments in the source code or within special language statements.
An annotation may assert that values of program
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
Submission to XXXXXXXXXX, XXXXXXXX
Copyright 2006 ACM 0-89791-88-6/97/05 ...$5.00.
variables stay within certain ranges. Annotations may be used to
describe program types, values, or identifiers. The importance of
annotations for program checking and verification is emphasized
in the proposal for a verifying compiler [17]. Program annotations
help to catch errors, improve program understanding, recover soft-
ware architecture, and specify invariants.
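For instance, a range annotation of this kind might be written as a JML-style specification comment. The sketch below is hypothetical (the paper does not commit to a particular annotation language); the `Odometer` class and its bounds are illustrative only, with the methods written to preserve the stated invariant.

```java
// Sketch: a JML-style range annotation, expressed as a specially
// formatted comment, asserting that a variable stays within bounds.
class Odometer {
    //@ invariant 0 <= miles && miles <= 999999;
    private int miles = 0;

    // Mutators are written to preserve the invariant stated above.
    void add(int delta) { miles = Math.min(999999, Math.max(0, miles + delta)); }

    int reading() { return miles; }
}
```

A checker that understands the annotation can verify that every mutation of `miles` respects the stated range.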
One of the major uses of program annotations is to help program-
mers understand legacy systems. A Bell Labs study shows that
up to 80% of a programmer's time is spent discovering the meaning
of legacy code when trying to evolve it [10]. Thus, the extra
work required to annotate programs is likely to reduce develop-
ment and maintenance time, as well as to improve software quality.
However, annotating programs is often a manual, tedious, and error
prone process especially for large programs. Although some pro-
gramming languages (e.g., C#, Java) have support for annotations,
many programmers do not annotate their code at all, or annotate it
insufficiently. A fundamental question for creating more robust and
extensible software is how to annotate program source code with a
high degree of automation and precision.
In this work, we focus on deriving semantic concepts for unan-
notated variables from an initial set of annotated variables. Simply
stated, semantic concept annotations are nouns with well-accepted
meanings in public or domain-specific knowledge. For example,
the noun Address is a semantic concept meaning a place where
a person or an institution is located. Programmers may introduce
variables named Address, Add, or S[1], all for the Address
concept.1 The name of the variable S[1] does not match Address,
and relating this variable to the Address concept is challenging
because of the lack of information that helps programmers to iden-
tify this relation. While the variable named Add partially matches
Address, it is ambiguous whether the program also uses a Summation
concept for adding numbers.
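In a language with annotation support, such semantic concepts could be attached directly to declarations. The sketch below uses a hypothetical Java annotation type named `Concept` (the paper does not prescribe a concrete syntax); the `ConceptDemo` helper is likewise illustrative.

```java
import java.lang.annotation.*;

// Hypothetical annotation type for attaching a semantic concept
// to a program declaration.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.TYPE, ElementType.LOCAL_VARIABLE})
@interface Concept {
    String value();
}

// Vendor fields from the motivating example, tagged with concepts
// despite their dissimilar names.
class Vendors {
    @Concept("Address") private String Add;
    @Concept("Phone")   private String Pho;
}

class ConceptDemo {
    // Look up the concept attached to a field, or null if none.
    static String conceptOf(Class<?> cls, String field) {
        try {
            Concept c = cls.getDeclaredField(field).getAnnotation(Concept.class);
            return c == null ? null : c.value();
        } catch (NoSuchFieldException e) {
            return null;
        }
    }
}
```

Because the annotation is retained at run time, a tool can recover the concept of each declaration by reflection rather than by guessing from variable names.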
Our solution, called LEarning ANnotations (Lean), combines
program analysis, run-time monitoring, and machine learning to
automatically apply a small set of initial semantic annotations to
additional unannotated types and variables. The input to Lean is
program source code and a concept diagram describing relations
between semantic concepts. Concept diagrams are graphs whose
nodes represent semantic concepts and edges represent relation-
ships between these concepts. The core idea of Lean is that af-
ter programmers provide a few initial annotations of some vari-
ables and types with these semantic concepts, the system will glean
enough information from these annotations to annotate much of the
rest of the program automatically. We define annotation rules that
guide the assignment of semantic concepts to program variables
and types, and resolve conflicts when they arise.
1These names are taken from the open-source program Vehicle
Maintenance Tracker, whose code fragments are shown in Figure 1.
Lean works as follows. After programmers specify initial anno-
tations, Lean instruments a program to perform run-time monitor-
ing of program variables. Lean executes this program and collects
a profile of the values of the instrumented variables. Lean uses this
profile to train its learners to identify variables with similar pro-
files. Lean’s learners then classify the rest of program variables by
matching them with semantic concept annotations. Once a match
is determined for a variable, Lean annotates it with the matching
semantic concept.
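The classification step can be pictured as follows. This is a deliberately crude sketch, not Lean's actual learner: it summarizes each variable's runtime values with two hand-picked features (average length and digit fraction) and assigns the concept whose training profile is nearest, whereas Lean trains machine-learning classifiers on richer profiles.

```java
import java.util.*;

// Sketch: classify an unannotated variable by comparing its runtime
// value profile with the profiles of already-annotated variables.
class ProfileClassifier {
    // A crude profile: average string length and fraction of digit characters.
    static double[] profile(List<String> values) {
        double len = 0, digits = 0, chars = 0;
        for (String v : values) {
            len += v.length();
            for (char c : v.toCharArray()) {
                chars++;
                if (Character.isDigit(c)) digits++;
            }
        }
        return new double[] { len / values.size(), chars == 0 ? 0 : digits / chars };
    }

    static double distance(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(d);
    }

    // Assign the concept whose training profile is nearest to the observed one.
    static String classify(Map<String, double[]> trained, List<String> observed) {
        double[] p = profile(observed);
        String best = null;
        double bestD = Double.MAX_VALUE;
        for (Map.Entry<String, double[]> e : trained.entrySet()) {
            double d = distance(e.getValue(), p);
            if (d < bestD) { bestD = d; best = e.getKey(); }
        }
        return best;
    }
}
```

Under this toy model, values collected for a variable like Pho (mostly digits) land far nearer a Phone profile than an Address profile, so the variable would receive the Phone annotation.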
Annotating 100% of variables automatically is not realistic. Many
reasons exist: machine learning approaches do not guarantee ab-
solute success in solving problems; concept diagrams representing
program design specifications may not match programs; and some
concepts may be difficult to relate to program variables due to lack
of modularity. Consequently, Lean makes mistakes when learning
annotations. In order to improve its precision, Lean uses program
analysis to determine relations among annotated variables. Then
Lean compares these relations with corresponding relations in con-
cept diagrams. If a relation is present between two variables in
the program code and there is no relation between concepts with
which these variables are annotated, then Lean flags it as a possible
annotation error.
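The validation idea can be sketched as follows, under simplifying assumptions: the concept diagram is reduced to an undirected edge set, and program analysis is assumed to have already produced pairs of related variables (e.g., one assigned from the other). The class and method names are hypothetical.

```java
import java.util.*;

// Sketch of Lean's validation step: if two variables are related in the
// code but their predicted concepts are unrelated in the concept diagram,
// flag the pair as a possible annotation error.
class AnnotationValidator {
    // The concept diagram, encoded as undirected edges "ConceptA|ConceptB".
    static boolean related(Set<String> diagramEdges, String c1, String c2) {
        return c1.equals(c2)
            || diagramEdges.contains(c1 + "|" + c2)
            || diagramEdges.contains(c2 + "|" + c1);
    }

    // codeRelations: variable pairs related by program analysis.
    // annotations: variable name -> predicted concept.
    static List<String[]> flagSuspicious(List<String[]> codeRelations,
                                         Map<String, String> annotations,
                                         Set<String> diagramEdges) {
        List<String[]> flagged = new ArrayList<>();
        for (String[] pair : codeRelations) {
            String ca = annotations.get(pair[0]);
            String cb = annotations.get(pair[1]);
            if (ca != null && cb != null && !related(diagramEdges, ca, cb))
                flagged.add(pair);  // possible annotation error
        }
        return flagged;
    }
}
```

For example, if a variable annotated Summation flows into one annotated Address, and the diagram records no relation between those concepts, the pair is reported for the developer to inspect.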
We evaluate our approach on open-source software projects and
obtain results that suggest it is effective. Our results show that af-
ter users annotate approximately 6% of the program variables and
types, Lean correctly annotates an additional 69% of variables in
the best case, 47% on average, and 12% in the worst case, taking
less than one hour to run on an application with over 20,000
lines of code.
2. A MOTIVATING EXAMPLE
The Vehicle Maintenance Tracker (VMT) is an open source Java ap-
plication that records the maintenance of vehicles [4]. Fragments of
the VMT code from three different files are shown in Figures 1(a)–
1(c), and a concept diagram is shown in Figure 1(d).
A fragment of code from the file vendors.java shown in
Figure 1(a) contains the declaration of the class vendors whose
member variables of type String are Name, Add, Pho, Email,
and Web. These variables stand for the vendor’s name, address,
phone number, email, and web site concepts respectively. A frag-
ment of the code from the file VendorEdit.java shown in
Figure 1(b) contains the declaration of the class VendorEdit
whose member variables of types Text and TextArea represent
the same concepts. Even though the names of these variables in
the class VendorEdit are different from the names of the cor-
responding variables in the class vendors, these names partially
match. For example, the variable name Pho in the class vendors
matches the variable PhoneText in the class VendorEdit more
than any other variable of this class when counting the number of
consecutive matching letters.
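This counting of consecutive matching letters can be sketched as follows (an illustrative helper with hypothetical names, not Lean's actual matcher): score each candidate field name by the longest run of letters it shares with the short name, case-insensitively, and pick the highest-scoring candidate.

```java
public class NameMatch {
    // Length of the longest common substring of a and b, ignoring case.
    static int longestCommonRun(String a, String b) {
        a = a.toLowerCase(); b = b.toLowerCase();
        int best = 0;
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                    best = Math.max(best, dp[i][j]);
                }
        return best;
    }

    // Pick the candidate that shares the longest run with the given name.
    static String bestMatch(String name, String[] candidates) {
        String best = null; int bestScore = -1;
        for (String c : candidates) {
            int s = longestCommonRun(name, c);
            if (s > bestScore) { bestScore = s; best = c; }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] vendorEditFields =
            {"NameText", "AddressText", "PhoneText", "EmailText", "WebText"};
        System.out.println(bestMatch("Pho", vendorEditFields)); // prints PhoneText
    }
}
```

With this scoring, Pho matches PhoneText (three consecutive letters) better than any other field of VendorEdit.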
This informal matching procedure does not work for the frag-
ment of code shown in Figure 1(c). To what semantic concept does
the variable S, which is the parameter to the method addMaintenanceEditor,
correspond? It turns out that the variable S is an array of Strings,
and its elements S[1], S[2], S[3], S[4], and S[5] hold val-
ues of vendor’s identifier, address, email, phone number, and web
site concepts respectively. No VMT documentation mentions this
information, and programmers have to run the program and observe
the values of these variables in order to discover their meanings.
Lean can automate the process of annotating classes and vari-
ables shown in Figures 1(a)– 1(c) with concepts from the diagram
shown in Figure 1(d). This diagram is a graph whose nodes desig-
nate semantic concepts and edges specify relations between these
public class vendors {
private String Name, Add, Pho, Email, Web;
…………….
}
(a) File vendors.java.
public class VendorEdit extends InternalFrame {
private Text NameText;
private TextArea AddressText;
private TextArea PhoneText;
private Text EmailText;
private Text WebText; }
(b) File VendorEdit.java.
public void addMaintenanceEditor(String[] S) {
addMaintenanceServices(new String[]{
((MaintenanceEdit)Desktop.getSelectedFrame()).
getName(), S[4], S[5]});
}
};
String s = S[1];
if (s.equalsIgnoreCase(""))
s = "New";
String residence = S[3];
(c) File VMT.java.
[Concept diagram: Vendor linked to Name, Address, Phone, Email, and WebSite.]
(d) A concept diagram for the VMT application.
Figure 1: Code fragments from programs of the VMT project
and its concept diagram.
concepts. We observe that the spellings of some variable names
are similar to the names of corresponding concepts, i.e., Pho –
Phone, Add – Address, Web – WebSite, Name – Name, and
Email – Email. Lean uses these similarities to match names of
variables and concepts, and subsequently to annotate variables and
types with matching semantic concepts.
Variable names residence and Address are spelled differ-
ently, but they are synonyms. Extended with a vocabulary linking
synonymous words, Lean can hypothesize about similarities between
words that are spelled differently but have the same meaning. Such
vocabularies can link domain-specific concepts used by different
programmers, thereby establishing common meanings across different
programs.
By observing patterns in values of program variables Lean can
also determine whether they should be annotated with certain con-
cepts. To observe patterns, Lean instruments source code to collect
run-time values of the program variables. After running the instru-
mented program, Lean creates a table containing sample data for
each variable. A sample table for the VMT application is shown in
Table 1. Each column contains a variable's name and the val-
ues it held. Some values have distinct structures. The variable Pho
contains only numbers and dashes in the format xxx-xxx-xxxx,
where x stands for a digit and the dash is a separator. Values held
by the variable Email have a distinct structure with the @ symbol
and dots used as separators. Lean learns the structures of values
for annotated variables using machine-learning algorithms, and it
then assigns the appropriate semantic concepts to variables whose
values match the learnt structures.
Email Pho Add
tc@abc.com 512-342-8434 Tamara Circle, Austin
mcn@jump.net 512-232-3432 McNeil Drive, Austin
sims@su.edu 512-232-6453 Sims Road, Dallas
Table 1: Values of some variables from the VMT program.
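One simple way to capture such value structures (a simplified stand-in for the learners' feature extraction, not Lean's actual encoding) is to abstract every character into a class symbol, so all phone numbers collapse to the same pattern:

```java
public class ValueShape {
    // Map each character to a class symbol: x = digit, a = letter, others unchanged.
    static String shape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (Character.isDigit(c)) sb.append('x');
            else if (Character.isLetter(c)) sb.append('a');
            else sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(shape("512-342-8434")); // prints xxx-xxx-xxxx
        System.out.println(shape("tc@abc.com"));   // prints aa@aaa.aaa
    }
}
```

Every value of Pho in Table 1 abstracts to xxx-xxx-xxxx, while every value of Email contains the @ symbol and dot separators, which is the kind of regularity a classifier can learn.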
We also observe that if concepts are related in a diagram, then
types and variables that are annotated with these concepts are re-
lated in the code too. The relation between concepts in a diagram
means that instances of data described by these concepts are linked
in some way. For example, the concept Name is related to the
concept Vendor in the concept diagram shown in Figure 1(d). This
relation can be expressed as “Vendor has a Name.” The variable
Name which is annotated with the concept Name is defined in the
class vendors which is annotated with the concept Vendor. The
containment relation in the source code corresponds to the “has-
a” relation in the concept diagram. Lean explores program source
code to analyze relations between program variables and types, and
then compares them with relations among corresponding concepts
in diagrams in order to infer and validate annotations. We explain
this process in Section 6.
3. A PROBLEM STATEMENT
This section defines the problem of automating and validating pro-
gram annotations formally. We define the structure of concept di-
agrams and types of relations between concepts in diagrams and
variables and types in program code. These relations are used in
our algorithm for inferring and validating annotations. Then, we
present rules for annotating program variables and types, and give
a formal definition of the problem.
3.1 Definitions
Feature modeling is a technique for modeling software with con-
cept diagrams, where a feature is an end-user-visible characteris-
tic of a system describing a unit of functionality relevant to some
stakeholder [5, 9]. For example, for the code shown in Figure 1
features are Vendor, Name, Email, Phone, Website, and
Address. Concept diagrams used in feature modeling are called
feature diagrams (FD). Feature diagrams are graphs whose nodes
designate features and edges (also called variation points) specify
how two features are attached to each other.
Three basic types of diagrams are shown in Figure 2. Features f1
and f2 are optional if one of them or both or none can be attached
to the base feature P. Mandatory are features f3 and f4 since both
of them must be attached to the base feature P. Finally, features
f5 and f6 are alternative if either f5 or f6 are attached to the base
feature P.
We view the process of annotating program variables and types
with semantic concepts as mapping these variables and types to
concepts. We limit our approach to mapping program variables
and types to features; we do not consider mapping fragments of
code or selected statements or lines of code to features. The latter
is important in certain cases, and it is the subject of our future work.
We formalize the mapping between features and program vari-
ables and types in the definition of the Annotation Function.
P
f1
f2
P P
f3 f4 f5 f6
Optional Mandatory Alternative
Figure 2: Basic types of feature diagrams.
DEFINITION 1 (ANNOTATION FUNCTION). The annotation
function αo : o → 2^F maps a program object (variable), o, to a sub-
set of features, F, in a feature diagram, and the annotation function
ατ : τ → 2^F maps a type (basic type, class, interface, or method),
τ, to a subset of features, F.
The dependency between a type τ and an object o is expressed by
the function Type(o) = τ, which maps the object o to its type τ ∈ T,
where T is the set of types. Without loss of generality we use
the annotation function α : π → 2^F that maps a program entity π
(i.e., a variable or type) to a subset of features F in some feature
diagram, where program entity π ∈ {o, τ}.
Relations between program entities are used to create and vali-
date mappings between these entities and features. These relations
are given in Definitions 2 – 10.
DEFINITION 2 (EXPRESSION). The n-ary relation
Expression(v1,...,vn) specifies a program expression, where
v1,...,vn are the variables or methods used in this expression.
DEFINITION 3 (NAVIGATE). The navigation relation
Navigate(p,q) ⊆ Expression(p, q) is the expression p.q
where q is a member (e.g., field or method) of object p.
DEFINITION 4 (ASSIGN). The assignment relation Assign
⊆ p × Expression specifies the assignment expression p =
Expression, where p is the variable on the left-hand side and
the Expression relation stands for some expression on the right-
hand side.
DEFINITION 5 (CAST). The cast relation Cast(p, q) ⊆
Expression(p, q) is the cast expression (q)p where p ∈ o
and q ∈ T i.e., casting an object p to some type q.
DEFINITION 6 (SUBTYPE). The relation SubType(p, q)
specifies that a type p is a subtype of some type q. In Java this
function is implemented via the implement interface clause. That
is, a class p that implements some interface q is related to this
interface via the SubType relation.
DEFINITION 7 (INHERIT). The Inherit(p, q) relation
specifies that a class p inherits (or extends in Java terminology)
some class q.
DEFINITION 8 (CONTAINS). The Contains(p, q) rela-
tion specifies that program entity q is contained, or defined within
the scope of some type p.
We call interfaces, classes, and methods containment types because
they contain other types, fields and variables as part of their defini-
tions. That is, interfaces contain method declarations, classes con-
tain definitions of fields and methods, and methods contain uses of
fields and declarations and uses of local variables.
DEFINITION 9 (δ–RELATION). The relation δ(πk, πn) stands
for "programming entity πk is related to the programming entity πn
via the Expression, Navigate, Assign, or Cast relations."
This relation is irreflexive, antisymmetric, and transitive. For ex-
ample, from the expression x = y + z we can build four δ rela-
tions: δ(x, y), δ(x, z), δ(y, z), and δ(z, y). If vari-
ables are used in the same expression and their values are not
changed after this expression is evaluated, then their order is not
relevant in the δ–relation. However, if the value of a variable is
changed as a result of the evaluation of the expression, then this
variable is the first component of the corresponding δ–relation.
DEFINITION 10 (γ–RELATION). The relation γ( fp, fq)
stands for “feature fq is connected to feature fp via a variation
point.”
This relation is irreflexive, antisymmetric, and transitive. For ex-
ample, from the mandatory feature diagram shown in Figure 2, we
can build two γ relations: γ(P, f3) and γ(P, f4). If a feature
fr is attached to a feature fs, then feature fs is the first component
of the corresponding γ relation.
The annotation function α maps pairs from the relation δ to pairs
from the relation γ. Suppose that (a, b) ∈ δ and (c, d) ∈ γ.
Then the element (a, b) is annotated with the element (c, d)
if and only if c ∈ α(a) and d ∈ α(b). As a shorthand we write (c,
d) ∈ α((a, b)).
3.2 Rules of Program Annotations
When programmers map types (i.e., interfaces, classes, and meth-
ods) and their variables (i.e., fields and objects) to features, these
mappings may conflict with one another. For example, if a class
is mapped to one feature and it implements an interface that is
mapped to a different feature, then what default mapping would
be valid for an instance of this class? This section offers heuristic
rules to resolve ambiguities when annotating programs.
Direct mapping: γ ∈ α(δ) expresses the fact that for a δ–relation
between program entities that are annotated with feature labels
there is a corresponding γ–relation between features that annotate
these program entities.
Instance mapping: instances of a type that is mapped to some
set of features are automatically annotated with these feature labels.
We write this rule as (Type(o) = τ ∧ f ∈ α(τ)) ⇒ f ∈ α(o).
Member annotation: If the container type is mapped to some
feature, then all of its members are automatically mapped to the
same feature, i.e., (Contain(p, q) ∧ f ∈ α(p)) ⇒ f ∈ α(q).
Expression annotation: if the variables in an expression defined
by the relation Expression(v1,...,vn) are annotated with
some set of features, that is, without loss of generality f1 ∈
α(v1),..., fn ∈ α(vn), then Lean annotates this expression with the
set of features f1 ∪ f2 ∪...∪ fn.
Assignment annotation: given the relation Assign(p, expr),
if the expression expr is annotated with a set of features f, then
the variable p is annotated with the same set of features, and the
converse holds, i.e., Assign(p, expr) ∧ (f ∈ α(p) ⇔ f
∈ α(expr)). For example, the variable s in the fragment of
code shown in Figure 1(c) is assigned the value of the variable
S[2], which is mapped to the concept Address. According to
the assignment annotation rule, the variable s maps to the concept
Address.
Cast annotation: casting an object p to some type q automati-
cally remaps this object p to the feature to which this casting type
is mapped. If the type q is not mapped to any feature, then the
original mapping of the object p is not changed. That is, (Cast(p,
q) ∧ f ∈ α(q)) ⇒ f ∈ α(p) and (Cast(p, q) ∧ α(q) =
∅ ∧ f ∈ α(p)) ⇒ f ∈ α(p).
Sometimes a default mapping should be overwritten. For exam-
ple, a class may be mapped to one feature, but its instances should
be mapped to some other features. The following rule handles this
condition.
Instance overriding: default feature labels assigned to an in-
stance by the instance mapping rule can be overwritten: (Type(o)
= τ ∧ f ∈ α(τ) ∧ Overwrite(o, g)) ⇒ (f ∈ α(τ) ∧ g
∈ α(o)).
Containment: if an object q is a member of type p that is anno-
tated with some feature label f, then the object q is also annotated
with the feature label f: (Contain(p, q) ∧ f ∈ α(p)) ⇒ f
∈ α(q).
Member overriding: the mappings for members of the contain-
ment types can be overwritten: (Contain(p, q) ∧ f ∈ α(p)
∧ Overwrite(q, g)) ⇒ (f ∈ α(p) ∧ g ∈ α(q)).
Precedence: if the containment type is mapped to one feature
and the type of its member variable is mapped to a different fea-
ture, then this variable is mapped to the same feature to which its
type is mapped: (Contain(p, o) ∧ f ∈ α(p) ∧ g ∈ α(q) ∧
Type(o) = q) ⇒ g ∈ α(o).
Interface: when programmers map interfaces to features, these
mappings are preserved in classes that implement mapped inter-
faces: (Subtype(p, q) ∧ f ∈ α(q) ∧ Contains(q, z))
⇒ (f ∈ α(z) ∧ Contains(p, z)). That is, if an interface is
mapped to some features and is implemented by some class that is
mapped to different features, then the interface fields and methods
implemented by this class remain mapped to the interface features.
Inheritance: if a class extends some other class that is mapped
to some feature, then the extended class is automatically mapped
to the same feature: (Inherit(p, q) ∧ f ∈ α(q) ∧ α(p) =
∅) ⇒ f ∈ α(p). This rule is dictated by the Java idiom of
inheritance stating that a class may implement many interfaces, but
it can extend only one class. The extended class can be explic-
itly remapped to a different feature without affecting the mapping
defined for the parent class.
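Two of these rules can be sketched as straightforward set propagation (an illustrative model, not Lean's implementation; here α is represented as a map from entity access paths to feature-label sets, and the entity names are hypothetical):

```java
import java.util.*;

public class AnnotationRules {
    static Map<String, Set<String>> alpha = new HashMap<>();    // α : entity → features
    static Map<String, String> typeOf = new HashMap<>();        // Type(o) = τ
    static Map<String, List<String>> members = new HashMap<>(); // Contain(p, q)

    static Set<String> features(String entity) {
        return alpha.computeIfAbsent(entity, k -> new HashSet<>());
    }

    // Instance mapping: (Type(o) = τ ∧ f ∈ α(τ)) ⇒ f ∈ α(o)
    static void applyInstanceMapping() {
        for (Map.Entry<String, String> e : typeOf.entrySet())
            features(e.getKey()).addAll(features(e.getValue()));
    }

    // Containment: (Contain(p, q) ∧ f ∈ α(p)) ⇒ f ∈ α(q)
    static void applyContainment() {
        for (Map.Entry<String, List<String>> e : members.entrySet())
            for (String member : e.getValue())
                features(member).addAll(features(e.getKey()));
    }

    public static void main(String[] args) {
        features("vendors").add("Vendor");              // user-supplied annotation
        typeOf.put("aVendor", "vendors");               // aVendor is an instance of vendors
        members.put("vendors", List.of("vendors.Pho")); // Pho is a member of vendors
        applyInstanceMapping();
        applyContainment();
        System.out.println(features("aVendor"));     // prints [Vendor]
        System.out.println(features("vendors.Pho")); // prints [Vendor]
    }
}
```

The overriding rules would then be implemented as explicit exceptions consulted before these defaults are propagated.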
3.3 The Formal Problem Statement
When programmers specify an initial mapping between program
entities and features, they define a partial function α0 over the do-
main of program entities π0 ⊆ π and the range of features F0 ⊆ F.
The goal of Lean is to compute the partial function α1 over the
subset of the domain of program entities π1 ⊆ π and the range of
features F1 ⊆ F, abiding by the rules specified in Section 3.2, such
that π0 ⊆ π1, F0 ⊆ F1, and ∀(a, b) ∈ δ with c ∈ α1(a) ∧
d ∈ α1(b), ∃(c, d) ∈ γ. That is,
if program entities a and b are annotated with feature labels c and
d respectively, and these entities are related to each other via some
δ–relation, then the features labeled c and d in some FD should be
related via some γ–relation.
4. LEAN ARCHITECTURE AND PROCESS
The architecture for Lean and its process description are shown in
Figure 3. The main elements of the Lean architecture are the Map-
per, the Learner, and the Validator shown in Figure 3(a). Solid
arrows show the process of annotating program entities with fea-
ture labels, and dashed arrows specify the process of training the
Learner.
The inputs to the system are program source code and an FD.
The Mapper is a Graphical User Interface (GUI) tool whose com-
ponents are Java and XML parsers, program and FD analysis rou-
tines, and an instrumenter. A Java parser produces a tree represent-
ing program entities. FDs are represented in XML format, and the
Mapper uses an XML parser to produce a tree representing features
and variation points in the FD. For a detailed explanation of the
architecture components and formats, see our technical report [16].

[Figure 3(a), Lean architecture: components Mapper, Learner, and
Validator; inputs Source Code and Feature Diagram (FD); inter-
mediate artifacts Instrumented Program Code, Program Data Table
(PDT), Domain-Specific Dictionary (DSD), Learnt Annotations
(LA), and Annotations and Relations.]

(b) Lean process description:
1) Programmers annotate the Source Code with concepts from the
FD using the Mapper
2) The Mapper expands and outputs initial mappings and relations
3) The Validator validates initial mappings
4) The Validator outputs annotations for training
5) The Mapper instruments and compiles the source code
6) The program runs and its instrumented code outputs the Program
Data Table (PDT)
7) Annotated variables and their values from the PDT are supplied
to the Learner for training
8) The Learner classifies program variables from the PDT and pro-
duces learnt annotations (LA) with the help of the Domain-Specific
Dictionary (DSD)
9) The Validator validates LAs and uses negative examples to re-
train the Learner

Figure 3: Lean architecture and its process.
Programmers use the Mapper GUI to specify initial mappings
between features from the FD and program entities from the source
code. The Mapper GUI presents both the FD and the programming
entities using tree widgets. The Mapper generates initial annota-
tions from specified mappings in the XML format. Each variable
is described by its access path, and features are specified by their
names. For example, if a variable named var is declared in the
method M of the class C which is defined in the package P, then the
access path to this variable is P.C.M.var.
The user can also specify what entities should be excluded from
the annotation process. For example, using Lean to annotate an
integer variable counting the number of iterations in a loop con-
sumes computing resources while there may not be an appropriate
feature for annotating this variable, or annotating it does not war-
rant the amount of work required. Variables whose values contain
binary (nonprintable) characters may also be excluded since some
machine learning algorithms may not classify these variables cor-
rectly.
The Mapper builds relations defined in Section 3.1 using the
Mapper’s program analysis routines. Then the Mapper’s rule en-
gine analyzes these relations and initial annotation links, and ex-
pands these initial annotations to other program entities using the
rules from Section 3.2. The Mapper outputs user annotations and
relations in the XML file, which is then passed to the Validator.
The Validator checks the correctness of annotations by explor-
ing relations between variables in the source code and features in
the FD. Recall the direct mapping rule stating that for a δ–relation
between annotated entities in the source code there is a γ–relation
between the features with whose labels these program entities are
annotated. The output of the Validator is a list of rejected annota-
tions which the Learner may use to improve its predictive capabili-
ties.
The Mapper instruments the source code to record run-time val-
ues of program variables. Runtime logging is added after the state-
ments and expressions in which the monitored variables are as-
signed values. Lean’s data flow analysis framework locates vari-
able definitions and traces the uses of these variables until either
the end of the scope for the definitions, or the definition of new
variables that shadow previous definitions. Only distinct values of
the monitored variables are collected. Values of annotated variables
are used to train the Learner, which uses the learnt information to
classify unannotated variables.
After instrumenting the source code, the Mapper calls a Java
compiler to produce an executable program. Then the program
runs storing names and the values of program variables in the Pro-
gram Data Table (PDT). Both the Validator and the runtime logger
code output the PDT in the Attribute-Relation File Format (ARFF).
ARFF serves as an input to the Learner, which is based on
a machine-learning Java-based open source system WEKA [25].
Once the Learner is trained, it classifies unannotated program vari-
ables that are supplied to the Learner as the columns of the PDT.
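For illustration, a PDT fragment for the variables of Table 1 might look as follows in ARFF (a sketch assuming string attributes; the exact attribute layout Lean emits is not shown here):

```text
@relation PDT

@attribute Email string
@attribute Pho string
@attribute Add string

@data
'tc@abc.com','512-342-8434','Tamara Circle, Austin'
'mcn@jump.net','512-232-3432','McNeil Drive, Austin'
'sims@su.edu','512-232-6453','Sims Road, Dallas'
```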
Lean classifies unannotated variables by obtaining their runtime
values and their names, and supplying them to the Learner which
emits predictions for feature labels with which these variables should
be annotated. In addition, domain-specific dictionaries (DSDs) in-
crease the precision of the classification. The output of the Learner
is a set of learnt annotations (LAs). These LAs are sent to the
Validator to check their correctness. The Validator sends corrected
annotations to the Learner for training to improve its accuracy. This
continuing process of annotating, validating annotations, and learn-
ing from the validated annotations makes Lean effective for long-
term evolution and maintenance of software systems.
Lean produces Java annotations using the annotation type facility
of Java 1.5. An example of a Java annotation type declaration and
its use is shown in Figure 4. In Java, annotation type declarations
are similar to interface declarations [2]. An @ sign precedes the
interface keyword, which is followed by the Concept anno-
tation type name that defines fields FD and Label, for the name of
feature diagram and a feature label respectively. Annotations con-
sist of the @ sign followed by the annotation type Concept and a
parenthesized list of element-value pairs.
5. LEARNING ANNOTATIONS
This section shows how Lean learns and validates annotations. We
explain the learning approach and describe the learners used in
Lean. Then we show how to extend the Learner to adapt to
different domains.

public @interface Concept {
    String FD;
    String Label;
}

@Concept(FD="VMT", Label="Vendor")
public class vendors {...}

Figure 4: Example of the declaration of a Java annotation type
and its use by Lean.
5.1 The Learning Approach
We treat the automation of the program annotation process as a
classification problem: given n features and a program variable,
which feature matches this variable the best? Statistical measures
of matching between variables and features are probabilistic. The
Learner classifies program entities with the probabilities that cer-
tain feature labels can be assigned to them. A Lean classifier is
trained to classify an unannotated variable based on the informa-
tion learnt from the annotated variables.
Lean has its roots in the Learning Source Descriptions (LSD)
system developed at the University of Washington for reconciling
schemas of disparate XML data sources [12, 13]. The purpose of
LSD is to learn one-to-one correspondences between elements in
XML schemas. Lean employs the LSD multistrategy learning ap-
proach [19, 13], which organizes multiple learners in layers. The
learners located at the bottom layer are called base learners, and
their predictions are combined by metalearners located at the up-
per layers.
In the multistrategy learning approach, each base learner issues
predictions that a program variable matches each feature with some
probability. A metalearner combines these predictions by multiplying
these probabilities by the weights assigned to each base learner and
taking the average of the products for the corresponding labels of
the predictions for the same program variable.
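This combination step can be sketched as follows (the weights and probabilities are hypothetical numbers; in Lean the weights come from training):

```java
public class MetaLearner {
    // Combine base-learner predictions for one feature label of one variable:
    // multiply each learner's probability by its weight and average the products.
    static double combined(double[] weights, double[] probs) {
        double sum = 0;
        for (int i = 0; i < weights.length; i++) sum += weights[i] * probs[i];
        return sum / weights.length;
    }

    public static void main(String[] args) {
        double[] weights = {0.9, 0.6, 0.8};    // name, content, and Bayes learners
        double[] probsPhone = {0.8, 0.9, 0.7}; // each learner's P(label = Phone)
        double[] probsEmail = {0.1, 0.05, 0.2}; // each learner's P(label = Email)
        System.out.printf("Phone: %.3f%n", combined(weights, probsPhone));
        System.out.printf("Email: %.3f%n", combined(weights, probsEmail));
    }
}
```

The label with the highest combined score (here Phone) is the metalearner's prediction for the variable.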
The Lean learning algorithm consists of two phases: the training
phase and the annotating (classifying) phase. The training phase
improves the ability of the learners to predict correct annotations
for program variables. Trained learners classify program variables
with feature labels, and based on these classifications, Lean anno-
tates programs. The accuracy of the classification process depends
upon successful training of the Learner.
During the training phase, weights of the base learners are ad-
justed and probabilities are computed for each learner using the
runtime data for annotated variables. Then, during the classifica-
tion step the previously computed weights are used to predict fea-
ture labels for unannotated variables.
Each base learner is assigned a weight (a real number between
0 and 1) that characterizes its accuracy in predicting annotations
of program variables. Initially, all weights are the same. For each
classified program entity the weights of the learner are modified
to minimize the squared error between the predefined value (i.e.,
1 if the prediction is correct, or 0 otherwise) and the computed
probability multiplied by the weight assigned to the learner. This
approach is called regression [20].
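One simple way to realize such a weight update is a clipped gradient step on the squared error (a sketch under assumed hyperparameters, not necessarily the regression variant Lean uses):

```java
public class WeightFit {
    // One gradient step minimizing (target − w·p)², clipped to keep w in [0, 1]:
    // w ← w − η · ∂/∂w (target − w·p)² = w + 2η · (target − w·p) · p
    static double step(double w, double p, double target, double eta) {
        double updated = w + 2 * eta * (target - w * p) * p;
        return Math.max(0.0, Math.min(1.0, updated));
    }

    public static void main(String[] args) {
        double w = 0.5;
        // Repeated correct predictions (target 1) at probability 0.8 pull the weight up.
        for (int i = 0; i < 100; i++) w = step(w, 0.8, 1.0, 0.1);
        System.out.printf("%.3f%n", w);
    }
}
```

An incorrect prediction (target 0) would pull the weight toward zero in the same way.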
5.2 The Learners
There are three types of base learners used in Lean: a name matcher,
a content matcher, and a Bayes learner [12, 20]. Even though many
different types of learners can be used with the multistrategy learn-
ing approach, we limit our study to these three types of learners
since they proved to give good results when used in LSD.
Name matchers match the names of program entities with feature
labels. The name matching is based on Whirl, a text classification
algorithm based on nearest-neighbor classification [8]. This algorithm
computes the similarity distance between the name of a program
entity and a feature label. This distance should be within some
threshold whose value is determined experimentally.
Whirl-based name matchers work well for meaningful names es-
pecially if large parts of them coincide. They do not perform well
when names are meaningless or consist of combinations of num-
bers, digits, and some special characters (e.g., underscore or caret).
Content matchers work on the same principles and use the same
algorithm (Whirl) as name matchers. The difference is that con-
tent matchers operate on the values of variables rather than their
names. Content matchers work especially well on string variables
that contain long textual elements, and they perform poorly on bi-
nary (nonprintable) and numeric types of variables.
Finally, Bayes learners, particularly the Naïve Bayes classifier,
are among the most practical and competitive approaches to clas-
sification problems [20]. Naïve Bayes classifiers are studied ex-
tensively [14, 20], so we only state what they do in the context of
the problem that we are solving here. For each program variable
the Naïve Bayes classifier assigns the feature label fk such that
the probability that this variable belongs to the feature fk is
maximized.
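In this context, a Naïve Bayes classifier over tokens drawn from a variable's values can be sketched as follows (a toy illustration with add-one smoothing and uniform priors, not WEKA's implementation):

```java
import java.util.*;

public class NaiveBayesSketch {
    // counts.get(label).get(token): occurrences of token in training values of label
    static Map<String, Map<String, Integer>> counts = new HashMap<>();
    static Map<String, Integer> totals = new HashMap<>();

    static void train(String label, String... tokens) {
        Map<String, Integer> m = counts.computeIfAbsent(label, k -> new HashMap<>());
        for (String t : tokens) {
            m.merge(t, 1, Integer::sum);
            totals.merge(label, 1, Integer::sum);
        }
    }

    // Pick the label fk maximizing P(tokens | fk), with add-one smoothing.
    static String classify(String... tokens) {
        String best = null;
        double bestLog = Double.NEGATIVE_INFINITY;
        for (String label : counts.keySet()) {
            double logP = 0;
            for (String t : tokens) {
                int c = counts.get(label).getOrDefault(t, 0);
                logP += Math.log((c + 1.0) / (totals.get(label) + 1.0));
            }
            if (logP > bestLog) { bestLog = logP; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        train("Email", "@", ".com", ".net");
        train("Phone", "512", "-", "-");
        System.out.println(classify("@", ".com")); // prints Email
    }
}
```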
5.3 Extending the Learner
Domains use special terminologies whose dictionary words mean
specific things. Programmers use domain dictionaries to name vari-
ables and types in programs written for these domains. For exam-
ple, when word “wafer” is encountered in a value of some variable
of a program written for a semiconductor domain, this variable may
be annotated with the wafer concept. Many domains have dictio-
naries listing special words, their synonyms, and explaining their
meanings.
Lean incorporates the knowledge supplied by these dictionar-
ies. Each concept in these dictionaries has a number of words that
are characteristic of this concept. If a word from the dictionary is
encountered in a value or the name of a variable, then this vari-
able may be classified and subsequently annotated by some con-
cept. We use a simple heuristic to change the probabilities that
variables should be annotated with certain feature labels. If no dic-
tionary word is encountered among the name or the values of a
variable, then the probabilities computed by Lean's learners for this
variable remain unchanged. Otherwise, if a word belongs to some
concept, then the probability that the given variable, v, belongs to
this concept is incremented by some small real number ∆p, i.e.,
p_concept(v) ← p_concept(v) + ∆p. We choose this number experimen-
tally as 1/W, where W is the number of words in the DSD. If the
sum would exceed 1.0 after adding ∆p, then the probability is
capped at 1.0.
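The heuristic itself is a one-line update (the cap at 1.0 keeps the value a valid probability):

```java
public class DsdBoost {
    // When a dictionary word of the concept is seen in a variable's name or
    // values, increment the concept probability by ∆p = 1/W, capped at 1.0.
    static double boost(double p, int dictionarySize) {
        return Math.min(1.0, p + 1.0 / dictionarySize);
    }

    public static void main(String[] args) {
        System.out.println(boost(0.40, 100));  // roughly 0.41
        System.out.println(boost(0.995, 100)); // capped at 1.0
    }
}
```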
6. VALIDATING ANNOTATIONS
This section describes how we infer and validate program annota-
tions using program analysis. First, we state the rationale for in-
ferring and validating annotations. Then, we give the algorithm for
inferring annotations for program variables that are components of
δ–relations. Finally, we show how to validate program annotations.
6.1 The Rationale
Lean cannot annotate all variables due to a number of factors. Ma-
chine learning approaches are only as good as the training data,
and they do not guarantee 100% classification accuracy. Some
variables cannot be classified because they assume hard-to-analyze
values. Examples are variables whose values are binary (nonprintable)
strings. It is difficult to train classifiers on such variables since pat-
terns in binary data are inherently complex. For these and other
reasons Lean may assign feature labels to variables incorrectly.
We exploit the direct mapping rule defined in Section
3.2 to infer annotations for partially annotated δ–relations (those
that contain annotated and unannotated components) in order to
increase the annotation coverage. Further, we use this rule to validate
the correctness of annotations for fully annotated δ–relations. This
is accomplished by the InferAnnotations and Validate
algorithms described below.
6.2 The Inference Algorithm
The algorithm InferAnnotations for inferring annotations is given
in Algorithm 1. Its input is the set of δ–relations and mappings α.
The algorithm iterates through all δ–relations in the main for-loop
to find partially annotated ones. Then, for each found δ–relation
the annotation function is applied to the annotated component to
obtain the set of features with which this component is annotated.
Then, for each feature in this set all γ–relations are found whose
components match this feature. The other components of the ob-
tained γ–relations make it into the set of possible annotations for
the unannotated component of the δ–relation. The last if condi-
tion in the algorithm deals with the same variable used in two or
more expressions in the same scope. Since Lean may annotate this
variable differently, an intersection is taken of the feature sets with
which the uses of this variable are annotated to reject incorrect an-
notations.
Algorithm 1 The InferAnnotations procedure
InferAnnotations( δ, α )
for all (a, b) ∈ δ do
    α(a) ↦ fk
    α(b) ↦ fm
    if fk = ∅ then
        AddFeatureLabel( fk, fm )
    else if fm = ∅ then
        AddFeatureLabel( fm, fk )
    end if
    if ∃ (u, v) ∈ δ s.t. u = a ∨ v = a then
        if u = a then
            α(u) ↦ fs
            α(u) ↦ fs ∩ fk
        else if v = a then
            α(v) ↦ fs
            α(v) ↦ fs ∩ fk
        end if
        α(a) ↦ fs ∩ fk
    end if
end for

AddFeatureLabel( fk, fm )
for all (c, d) ∈ γ do
    if d ∈ fm then
        fk ↦ fk ∪ c
    else if c ∈ fm then
        fk ↦ fk ∪ d
    end if
end for
6.3 The Validation Algorithm
The Validate algorithm shown in Algorithm 2 checks the cor-
rectness of assigned annotations. The key criterion is expressed by
the direct mapping rule. The input to this algorithm is the set of
δ–relations, γ–relations, and mappings α. Each δ–relation has a
color associated with it, which is initially set to red. The red
color means that a given δ–relation is not correctly annotated, and
the green color means that all components of a given δ–relation
are either annotated correctly, or not annotated at all. For this pro-
cedure to work, initial labeling should be valid.
Algorithm 2 The validation procedure
Validate( δ, γ, α )
  for all (a, b) ∈ δ do
    color((a, b)) ↦ red
    α(a) ↦ fk
    α(b) ↦ fm
    if fk ≠ ∅ ∧ fm ≠ ∅ then
      for all (c, d) ∈ γ do
        if (c ∈ fm ∧ d ∈ fk) ∨ (c ∈ fk ∧ d ∈ fm) then
          color((a, b)) ↦ green
          break
        end if
      end for
    else
      color((a, b)) ↦ green
    end if
  end for
  for all (a, b) ∈ δ do
    if color((a, b)) ≠ green then
      print error
    end if
  end for
The outer for-loop iterates through all δ–relations, and the annotations fk and fm for the components of each relation are obtained. If
one or both components of a given δ–relation are not annotated,
then this relation is colored green. Otherwise, the inner for-
loop searches through γ–relations to find one whose components
are members of fk or fm annotation sets. If such a γ–relation exists,
then the corresponding δ–relation is colored green and the inner
loop is exited. Otherwise, if the inner loop exits without finding a
γ–relation whose components are members of fk or fm annotation
sets, then the δ–relation is red and may not be valid.
This algorithm does not specify what the correct annotation of a program variable is or what caused the error in program annotation. In fact, annotation errors may be caused by an incorrect feature diagram, by errors in the program source code, or by both. The last for-loop iterates through all δ–relations, checks the colors, and prints red relations as potentially incorrect.
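A C++ sketch of this validation pass, under the same hypothetical data model as the inference sketch above (entity-name pairs for δ, feature-name pairs for γ, a map from entities to feature sets), could look as follows; it returns the red relations instead of printing them:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using FeatureSet = std::set<std::string>;
using Relation   = std::pair<std::string, std::string>;
using Alpha      = std::map<std::string, FeatureSet>;

// Validate: a delta-relation is green when at least one side is
// unannotated, or when some gamma-relation joins a feature of one
// side with a feature of the other (the direct mapping rule);
// otherwise it stays red and is reported as potentially invalid.
std::vector<Relation> validate(const std::vector<Relation>& delta,
                               const std::vector<Relation>& gamma,
                               Alpha& alpha) {
    std::vector<Relation> red;
    for (const auto& [a, b] : delta) {
        const FeatureSet& fk = alpha[a];
        const FeatureSet& fm = alpha[b];
        bool green = fk.empty() || fm.empty();  // vacuously valid
        for (const auto& [c, d] : gamma) {
            if (green) break;
            if ((fm.count(c) && fk.count(d)) ||
                (fk.count(c) && fm.count(d)))
                green = true;  // a matching gamma-relation exists
        }
        if (!green) red.push_back({a, b});
    }
    return red;
}
```

With γ = {(Fuel, Gallon)}, a δ–relation whose sides are annotated {Gallon} and {Fuel} is colored green, while one annotated {Gallon} and {Mile} is reported red.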
6.4 The Computational Complexity
Suppose a program has n variables and a feature diagram has m
features. Then it is possible to build n(n−1)/2 δ–relations and m(m−1)/2 γ–relations. Thus, the space complexity is O(n² + m²). The time
complexity is determined by two for–loops in the Validate
and InferAnnotations algorithms. The external for–loops
iterate over all δ–relations and the internal for–loops iterate over
all γ–relations. Assuming all other operations in the algorithms take O(1) time, the time complexity is O(n²m²).
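As a concrete instance, using Megamek's figures from Table 2 (n = 328 variables, m = 25 features) as a worked example, the worst-case relation counts can be computed directly:

```cpp
#include <cassert>

// Worst-case number of unordered pairs among k items: k(k-1)/2.
long long pairs(long long k) { return k * (k - 1) / 2; }

// For Megamek: 328 variables allow 53,628 delta-relations,
// and 25 features allow 300 gamma-relations.
```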
7. THE PROTOTYPE IMPLEMENTATION
Our prototype implementation included the Mapper, the Validator,
and domain-specific dictionaries. The Mapper is a GUI tool written
in C++ that includes the EDG Java front end [1] and an MS XML
parser. The Mapper contains less than 2,000 lines of code. Its pro-
gram analysis routines recover relations between program entities
as specified in Section 3.1, and expand initial annotations to unan-
notated program entities using the rules specified in Section 3.2.
The Mapper contains the instrumenter routine that adds the logging
code to the original program outputting runtime values of variables
into the PDT in ARFF format. At this point, ARFF files are input
into WEKA to train Learners and classify unannotated variables.
The Validator is written in C++ and contains less than 1,000 lines
of code. Its routines implement the InferAnnotations and
Validate algorithms as described in Section 6. The Validator
takes its input in XML format and outputs a PDT in ARFF format.
The input XML file contains annotations specified by users and
expanded by the Mapper, along with excluded program entities and
δ and γ relations. The output ARFF file contains variable names
and concepts assigned to them.
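For illustration, a PDT in ARFF format could look like the sketch below; the attribute names, concepts, and values here are hypothetical, not taken from the paper (an ARFF file consists of @relation, @attribute, and @data sections):

```
@relation pdt
@attribute variable string
@attribute value numeric
@attribute concept {Fuel, Distance, Cost}
@data
'gallonsUsed', 12.4, Fuel
'milesDriven', 310.0, Distance
```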
8. EXPERIMENTAL EVALUATION
In this section we describe the methodology and provide the results
of experimental evaluation of Lean on open-source Java programs.
8.1 Subject Programs
We experiment with a total of seven Java programs that belong to
different domains. MegaMek is a networked Java clone of Battle-
Tech, a turn-based sci-fi boardgame for two or more players. PMD
is a Java source code analyzer which, among other things, finds
unused variables and empty catch blocks. FreeCol is an open
version of a Civilization game in which players conquer new
worlds. Jetty is an Open Source HTTP Server and Servlet con-
tainer. The Vehicle Maintenance Tracker (VMT) tracks
the maintenance of vehicles. The Animal Shelter Manager (ASM) is an application for animal sanctuaries and shelters that includes document generation, full reporting, charts, internet publishing, a pet search engine, and a web interface. Finally, Integrated
Hospital Information System (IHIS) is a program for
maintaining health information records.
8.2 Methodology
To evaluate Lean, we carried out two experiments to explore how
effectively Lean annotates programs and how its training affects the
accuracy of predicting annotations.
In the first experiment, we create a DSD and an FD for each sub-
ject program. Then we annotate a small subset of variables for each
program manually using the Mapper, and then run Lean to annotate
the rest of the program. The goal of this experiment is to determine
how effective Lean is in annotating variables for programs of dif-
ferent sizes that belong to different domains. Each annotation ex-
periment is run with and without a DSD in order to study the effect
of the presence of DSDs on the quality of Lean annotations.
We measure the number of variables annotated by Lean as well
as the number of annotations rejected by the validating algorithm.
The number of variables that Lean can possibly annotate, vars, is
vars = total - (excluded + initial), where total
is the total number of variables in a program, excluded is the
number of variables excluded from the annotation process by the
user, and initial is the number of variables annotated by the
user. Lean’s accuracy ratio is computed as accuracy = (vars
- rejected)/vars, where rejected is the number of anno-
tations produced by Lean and rejected by the validating algorithm.
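These two measures translate directly into code; the sketch below uses our own helper names and hypothetical counts:

```cpp
#include <cassert>

struct Counts {
    int total;     // total number of variables in the program
    int excluded;  // variables excluded from annotation by the user
    int initial;   // variables annotated by the user
    int rejected;  // Lean annotations rejected by the validator
};

// vars = total - (excluded + initial)
int annotatable(const Counts& c) {
    return c.total - (c.excluded + c.initial);
}

// accuracy = (vars - rejected) / vars
double accuracy(const Counts& c) {
    int vars = annotatable(c);
    return static_cast<double>(vars - c.rejected) / vars;
}
```

For example, 200 total variables with 20 excluded, 12 initially annotated, and 42 rejected annotations yield vars = 168 and an accuracy of 0.75.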
The goal of the second experiment is to evaluate the effect of
training on the Lean’s classification accuracy. Specifically, it is im-
portant to see the amount of training involved to increase the accu-
racy of annotating programs. Training the Learner is accomplished
by running instrumented programs on input data. Each training run
is done with a distinct input data set. Depending on the number of training runs, Lean achieves a certain accuracy in classifying data on which it was not trained. If the Learner must be trained continuously to maintain even low accuracy, then performance-demanding applications may be excluded from our approach. Conversely, if running a program a reasonable number of times with distinct data sets suffices to achieve good classification accuracy, then our approach is practical and can be used in an industrial setting.
8.3 Results
Table 2 contains results of the experimental evaluation of Lean
on the subject programs. Its columns contain the name of a pro-
gram, the size of the DSD, the number of lines of code in the
subject, the number of features in the FD, the number of vari-
ables that Lean could potentially annotate, the Lean running time
in minutes, the percentage of initial annotations computed as ratio
initial/total, where total is the total number of variables
in a program, and initial is the number of variables annotated
by users. The next two columns compare the percentage of total an-
notations without and with the DSD. The last column of this table
shows Lean’s accuracy when used with DSDs.
The highest accuracy is achieved with programs that access and manipulate domain-specific data rather than general information not strongly tied to any domain terminology. The lowest accuracy was obtained with PMD, which analyzes Java programs and whose code does not use terminology from any
specific domain. The highest level of accuracy was achieved with
the programs ASM and VMT which are written for specific do-
mains with well-defined terminologies, and whose variable names
are relatively easy to interpret and classify.
The next experiment evaluates the accuracy of the Learner. For
each subject application we collected up to 600 distinct input data
sets. We trained the Learner for each subject application on a subset of the input data, and used Lean to annotate program variables using the rest of the input data. Figure 5 shows how classification accuracy depends on the number of distinct training samples used to train the Learner. When annotating the
ASM application, the Learner achieved the highest accuracy, close
to 90%. This accuracy was achieved when the number of distinct
training samples reached 500. The results of this experiment show
that applications need to be run only a few hundred times with distinct input data in order to train the Learner to achieve good accuracy. Since most applications are run at least several thousand
times during their testing, using Lean as a part of application test-
ing to annotate program source code is practical. Potentially, if a
low-cost mechanism [6] is applied to collect training samples over
the life time of applications, then Lean can maintain and evolve
program annotations with evolving programs.
Finally, we used the Learner trained for the VMT application
to annotate variables in other applications. This methodology is called true-advice, in contrast to self-advice, which uses the same program for both training and evaluation. Figure 6 shows the percentage of variables that the Lean Learner annotates with self-advice (left bar) versus true-advice (right bar) when the Learner is trained on the VMT application. This experiment shows that Lean
can be trained on one application and used to annotate other programs if they operate on the same domain-specific concepts. ASM and IHIS share common concepts with the VMT application, which allows learners to be trained and used interchangeably, thus achieving a high degree of automation in annotating program variables.

Program    Size of DSD, words    Lines of code    Number of features    Num of vars    Running time, min    User annots, %    Lean annots w/o DSD    Lean annots with DSD    Accuracy, %
Megamek    60    23,782    25    328    56    10%      58%    64%    64%
PMD        20    3,419     12    176    28    7.4%     23%    34%    35%
FreeCol    30    6,855     17    527    39    4.7%     56%    73%    79%
Jetty      30    4,613     6     96     32    12.5%    42%    81%    52%
VMT        80    2,926     8     143    25    5.6%     65%    72%    83%
ASM        60    12,294    23    218    43    5.5%     57%    79%    87%
IHIS       80    1,883     14    225    18    8%       53%    66%    68%

Table 2: Results of the experimental evaluation of Lean on open source programs.
Algorithms for inferring and validating annotations predict an-
notations for partially annotated δ–relations (i.e., when one com-
ponent of a δ–relation is annotated, and the other is not) and de-
tect when incorrect annotations are assigned in certain situations.
We found that these algorithms are not sound, since they can miss incorrect annotations, and they cannot pinpoint the source of the fault that led to an incorrect annotation. However, as our experiments
show, these algorithms perform well in practice for the majority of
cases by discarding up to 83% of incorrectly assigned annotations.
9. RELATED WORK
Related work on program annotations falls into two major cate-
gories: systems that derive annotations as invariants or assertions
from program source code, and systems that automate the anno-
tation process for software-unrelated artifacts. One of the earliest
papers that belongs to the first category describes a technique for
annotating Algol-like programs automatically given their specifica-
tions [11]. The annotation techniques are based on the Hoare-like
inference rules which derive invariants from the assignment state-
ments, or from the control structures of programs. Like the Lean
approach, the annotation process is guided by a set of rules. The
program is incrementally annotated with invariant relationships that
hold between program variables at intermediate points. Unlike our approach, the annotation process is viewed there strictly as the discovery of invariants by applying inference rules.
A comprehensive study of program annotations presents a classi-
fication of the assertions that were most effective at detecting faults
and describes experience using an assertion processing tool that ad-
dresses ease-of-use and effectiveness of program annotations [23].
Unlike our approach, this tool does not automate the annotation
Figure 5: The classification accuracy of the Lean Learner as a function of the number of distinct training samples (0–600), for the ASM, Jetty, and PMD applications.
process.
A technique for annotating source code with XML tags describ-
ing grammar productions is based on modified compilers for C,
Objective C, C++ and Java programs [22]. Like our research, this
parse tree approach uses grammars as external semantic relations
to guide the automatic annotation of program code. However, this
approach is tightly linked to grammars that do not express domain-
specific concepts and relations among them. By contrast, our solu-
tion operates on semantic relations and diagrams that are not linked
to program source code or language grammars.
Various tools and a language for creating framework annotations
allow programmers to generate annotations using frameworks’ and
example applications’ source code, automate the annotation process
with dedicated wizards, and introduce coding conventions for frame-
work annotations languages [24]. Like our research, concepts from
framework description diagrams are used to annotate program source
code. By contrast, Lean’s goal is to automate the annotation process rather than introduce a language that allows programmers to enter annotations manually using dedicated tools.
Calpa is a system that generates annotations automatically for
the DyC dynamic compiler by combining execution frequency and
value profile information with a model of dynamic compilation cost
to choose run-time constants and other dynamic compilation strate-
gies [21]. Calpa is shown to generate annotations of the same or
better quality as those found by a human, but in a fraction of the
time.
Daikon is a system for automatic inferences of program invari-
ants that is based on recording values of program variables at run-
time with their following analysis [15]. Typically, print statements
Figure 6: Percentage of variables that the Lean Learner annotates with self-advice (left bar) versus true-advice (right bar) when the Learner is trained on the VMT application, for Megamek, PMD, FreeCol, Jetty, ASM, and IHIS.
are inserted in C source code to record the values of parameters to
functions and other variables before and after functions are called.
Then, these values are analyzed to find variables whose values are
not changed throughout the execution of certain functions. These
variables constitute invariants that annotate respective functions.
Like our research, the Calpa and Daikon systems automate the generation of annotations, relieving the user of a task that can be quite difficult yet highly critical. Rather than identifying
run-time constants and low-level code properties that are extracted
from the source code, Lean enables programmers to automate the
process of annotating programs with arbitrary semantic concepts.
A number of systems automate the annotation process for software-
unrelated artifacts. Techniques used in these systems are similar to
ones that Lean uses. The OpenText.org project presents an interesting approach to automating text annotations [3]. It is a web-based initiative for annotating Greek texts with various linguistic concepts. Similar to Lean, the result of the annotation is kept in an XML format, which is later converted into the ARFF format required by WEKA. As in Lean, machine learning algorithms are used
to classify text and assign annotations based on the results of the
classification. The major difference between our approach and the
OpenText.org is that the latter is used to annotate texts while
the former annotates program code.
An automated annotation system for bioinformatics analysis is applied to existing genome sequences to generate annotations; these are compared with existing annotations to reveal not only potential errors but also annotations that are out of date [7]. Unlike
Lean, this system cannot be applied to programs, however, Lean
can use its ideas to further improve the validation of existing anno-
tations as programs evolve.
A semi-automatic method uses information extraction techniques
to generate semantic concept annotations for scientific articles in
the biochip domain [18]. This method is applied to annotate tex-
tual corpus from the biochip domain, and it was shown that adding
semantic annotations can improve the quality of information re-
trieval.
10. CONCLUSION
The contributions of this paper are the following:
• a system called Lean that automates program annotation process
and validates assigned annotations;
• a Lean implementation in C++ that uses open-source machine learning tools and Java and XML parsers;
• a formalization of Lean rules;
• novel algorithms for inferring and validating annotations;
• experimental results showing that, after users annotate approximately 6% of the program variables and types, Lean correctly annotates an additional 69% of variables in the best case, 47% on average, and 12% in the worst case.
In our experiments it took less than one hour to annotate each
subject application. Our experience suggests that Lean is practical
for many applications, and its algorithms are efficient and effective.
Acknowledgments
We warmly thank Don Batory for his comments and suggestions.
11. REFERENCES
[1] Edison Design Group. http://www.edg.com.
[2] JSR 175: A metadata facility for the Java programming language. http://jcp.org/en/jsr/detail?id=175.
[3] The OpenText.org project. http://www.opentext.org.
[4] Vehicle Maintenance Tracker. http://vmt.sourceforge.net/.
[5] Feature-Oriented Domain Analysis (FODA) feasibility study. Technical Report CMU/SEI-90-TR-21, Software
Engineering Institute at Carnegie Mellon University, 1990.
[6] M. Arnold and B. G. Ryder. A framework for reducing the
cost of instrumented code. In PLDI, pages 168–179, 2001.
[7] K. Carter and A. Oka. Bioinformatics issues for automating
the annotation of genomic sequences. Genome Informatics,
12:204–211, 2001.
[8] W. W. Cohen and H. Hirsh. Joins that generalize: Text classification using WHIRL. In KDD, pages 169–173, 1998.
[9] K. Czarnecki and U. W. Eisenecker. Generative
Programming: Methods, Tools, and Applications. Addison
Wesley, June 2000.
[10] J. W. Davison, D. Mancl, and W. F. Opdyke. Understanding
and addressing the essential costs of evolving systems. Bell
Labs Technical Journal, 5(2):44–54, 2000.
[11] N. Dershowitz and Z. Manna. Inference rules for program
annotation. IEEE Trans. Software Eng., 7(2):207–222, 1981.
[12] A. Doan, P. Domingos, and A. Y. Halevy. Reconciling
schemas of disparate data sources: A machine-learning
approach. In SIGMOD, 2001.
[13] A. Doan, P. Domingos, and A. Y. Halevy. Learning to match
the schemas of data sources: A multistrategy approach.
Machine Learning, 50(3):279–301, 2003.
[14] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern
Classification. Wiley-Interscience, October 2000.
[15] M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin.
Dynamically discovering likely program invariants to
support program evolution. In ICSE, pages 213–224, 1999.
[16] M. Grechanik, K. S. McKinley, and D. E. Perry. Automating
and validating program annotations. Technical Report
TR-05-39, Department of Computer Sciences, The
University of Texas at Austin, 2005.
[17] C. A. R. Hoare. The verifying compiler: A grand challenge
for computing research. In CC, pages 262–272, 2003.
[18] K. Khelif and R. Dieng-Kuntz. Ontology-based semantic
annotations for biochip domain. In EKAW, 2004.
[19] R. Michalski and G. Tecuci. Machine Learning: A
Multistrategy Approach. Morgan Kaufmann, February 1994.
[20] T. M. Mitchell. Machine Learning. McGraw-Hill, March
1997.
[21] M. Mock, C. Chambers, and S. J. Eggers. Calpa: a tool for
automating selective dynamic compilation. In MICRO, pages
291–302, 2000.
[22] J. F. Power and B. A. Malloy. Program annotation in XML: A parse-tree based approach. In WCRE, pages 190–200, 2002.
[23] D. S. Rosenblum. A practical approach to programming with
assertions. IEEE Trans. Software Eng., 21(1):19–31, 1995.
[24] A. Viljamaa and J. Viljamaa. Creating framework
specialization instructions for tool environments. In The
Tenth Nordic Workshop on Programming and Software
Development Tools and Techniques, 2002.
[25] I. H. Witten and E. Frank. Data Mining: Practical Machine
Learning Tools and Techniques. Morgan Kaufmann, 2005.