In mining software repositories (MSR), we analyze the rich data created over the whole evolution of one or more software projects. A major obstacle in MSR is the heterogeneity and complexity of source code as a data source. With model-based technology in general and reverse engineering in particular, we can use abstraction to overcome this obstacle. But this raises a new question: can we apply existing reverse engineering frameworks, designed to create models from a single revision of a software system, to analyze all revisions of such a system at once? This paper presents a framework that combines EMF, the reverse engineering framework MoDisco, a NoSQL-based model persistence framework, and OCL-like expressions to create and analyze fully resolved AST-level model representations of whole source code repositories. We evaluated the feasibility of this approach with a series of experiments on the Eclipse codebase.
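The kind of OCL-like query over per-revision AST models that the abstract describes can be sketched as follows. This is a minimal illustration only: the `Revision`/`ClassDecl`/`Method` structures and the query are invented for this example and are not the framework's actual API.

```python
# Hypothetical sketch: evaluating one OCL-style "collect/size" query over
# AST-level models of every revision in a repository's history.
from dataclasses import dataclass

@dataclass
class Method:
    name: str

@dataclass
class ClassDecl:
    name: str
    methods: list

@dataclass
class Revision:
    commit: str
    classes: list

def avg_methods_per_class(rev):
    """Average number of methods per class in one revision's model."""
    if not rev.classes:
        return 0.0
    return sum(len(c.methods) for c in rev.classes) / len(rev.classes)

history = [
    Revision("r1", [ClassDecl("A", [Method("m")])]),
    Revision("r2", [ClassDecl("A", [Method("m"), Method("n")]),
                    ClassDecl("B", [])]),
]

# One metric value per revision, i.e. the query applied to the whole history.
trend = [avg_methods_per_class(r) for r in history]
print(trend)  # [1.0, 1.0]
```

The point of the approach is that the same query expression runs unchanged over every revision's fully resolved model, rather than over raw text diffs.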
Presentation about an Eclipse framework for generating Ecore model instances as input for tests and benchmarks. Held at the 3rd BigMDE workshop at STAF in L'Aquila, Italy, in July 2015.
The document discusses model comparison approaches for delta-compression. It describes comparing models at the element level by matching elements between models and identifying differences. It also discusses representation of differences for compression purposes and experiments comparing EMF Compare and EMF Compress on reverse engineered models from Git repositories.
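The element-level matching and difference representation described above can be illustrated with a toy delta-compression round trip. The dict-based model encoding and the delta format are assumptions made for this sketch; they are not the formats of EMF Compare or EMF Compress.

```python
# Illustrative sketch of element-level model matching for delta-compression:
# elements of two model versions are matched by identifier, and only the
# differences are stored instead of the full second version.

def diff(old, new):
    """Return a compact delta between two {id: attributes} models."""
    delta = {"added": {}, "removed": [], "changed": {}}
    for eid, attrs in new.items():
        if eid not in old:
            delta["added"][eid] = attrs
        elif old[eid] != attrs:
            delta["changed"][eid] = attrs
    delta["removed"] = [eid for eid in old if eid not in new]
    return delta

def patch(old, delta):
    """Reconstruct the new model from the old one plus the delta."""
    new = {eid: attrs for eid, attrs in old.items()
           if eid not in delta["removed"]}
    new.update(delta["added"])
    new.update(delta["changed"])
    return new

v1 = {"c1": {"name": "Foo"}, "c2": {"name": "Bar"}}
v2 = {"c1": {"name": "Foo"}, "c3": {"name": "Baz"}}
d = diff(v1, v2)
assert patch(v1, d) == v2  # the old model plus the delta restores v2
```

For reverse-engineered models of consecutive Git revisions, most elements are unchanged, so such deltas are far smaller than the models themselves.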
Model-based Analysis of Large Scale Software Repositories - Markus Scheidgen
1) The document discusses a model-based framework for analyzing large scale software repositories. It involves reverse engineering software from version control systems to create abstract syntax tree models, applying transformations and queries to derive metrics and insights, and using Scala for flexible queries and transformations.
2) Two example analyses are described: calculating design structure matrices and propagation costs, and detecting cross-cutting concerns by analyzing co-changed methods within commits.
3) The goal is to enable scalable, language-independent analysis of ultra-large repositories through model-based techniques instead of analyzing raw code directly. This allows different languages and repositories to be abstracted behind common models and analyses.
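The first example analysis, design structure matrices and propagation cost, can be sketched briefly. The metric definition follows the usual DSM formulation (density of the transitive closure of the dependency matrix); the 3-element system is invented for illustration.

```python
# Sketch of the propagation-cost metric: build a design structure matrix
# (DSM) of dependencies, compute its transitive closure (the visibility
# matrix), and measure that matrix's density.

def propagation_cost(dsm):
    """dsm[i][j] == 1 iff element i depends on element j."""
    n = len(dsm)
    # Visibility matrix: reachability, each element visible to itself.
    vis = [[1 if i == j else dsm[i][j] for j in range(n)] for i in range(n)]
    # Floyd-Warshall-style boolean closure.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if vis[i][k] and vis[k][j]:
                    vis[i][j] = 1
    return sum(map(sum, vis)) / (n * n)

# A depends on B, B depends on C; transitively A also reaches C.
dsm = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
print(propagation_cost(dsm))  # 6/9, roughly 0.667
```

A high propagation cost means a change to one element can, on average, affect a large fraction of the system.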
Metamodeling vs Metaprogramming, A Case Study on Developing Client Libraries ... - Markus Scheidgen
This document discusses metamodeling and metaprogramming approaches for developing client libraries for REST APIs. It presents a metamodel for describing REST APIs and shows how annotations in the Xtend language can be used as an alternative internal DSL (domain-specific language) to generate code for a REST client library from descriptions of API requests and resource types. Active annotations are processed by custom compilers and can generate platform-specific code while providing static type safety. Both metamodeling and metaprogramming through active annotations are presented as model-driven approaches for developing web applications and REST client code.
Reference Representation in Large Metamodel-based Datasets - Markus Scheidgen
This document discusses different model representations for large meta-model based datasets. It compares object-by-object representation to fragmentation strategies. Fragmentation breaks models into multiple fragments stored separately. The document evaluates different fragmentation strategies through theoretical analysis and implementation tests. It also compares part-of-source and relational representations and discusses applications of model fragmentation including software engineering and scientific data analysis.
This document describes an approach for supporting software change tasks using automated query reformulations. It begins with an example of a software change request between Alex and Bob. It then discusses using techniques like TextRank and POSRank, adapted from PageRank, to identify important terms from change requests for querying a codebase. The approach was evaluated on a dataset of over 1,900 change tasks from eight open source projects. Experimental results found the proposed approach improved 57.84% of queries and outperformed existing state-of-the-art methods based on measures like query effectiveness and retrieval performance.
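The TextRank idea mentioned above can be sketched in a few lines: terms from a change request form a co-occurrence graph, and a PageRank-style iteration scores them. The window size, damping factor, and sample terms are illustrative defaults for this sketch, not the paper's exact configuration.

```python
# Rough sketch of TextRank-style term ranking for query reformulation:
# build an undirected co-occurrence graph over a sliding window, then run
# a PageRank-style score iteration and return terms by descending score.
from collections import defaultdict
from itertools import combinations

def textrank(terms, window=2, damping=0.85, iters=50):
    nbrs = defaultdict(set)
    for i in range(len(terms) - window + 1):
        for a, b in combinations(terms[i:i + window], 2):
            if a != b:
                nbrs[a].add(b)
                nbrs[b].add(a)
    score = {t: 1.0 for t in nbrs}
    for _ in range(iters):
        score = {t: (1 - damping) + damping *
                    sum(score[u] / len(nbrs[u]) for u in nbrs[t])
                 for t in nbrs}
    return sorted(score, key=score.get, reverse=True)

words = ["query", "reformulation", "query", "codebase",
         "search", "codebase", "query"]
ranking = textrank(words)
print(ranking)  # well-connected terms rank highest
```

The top-ranked terms then serve as the reformulated query against the codebase's retrieval index.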
Vulnerability analysis involves discovering parts of a program's input that can be exploited by malicious users to drive the program into an insecure state. Potential vulnerabilities exist in locations with known weaknesses that are dependent on or influenced by user input and can be reached during program execution. Vulnerability analysis aims to identify exploitable vulnerabilities by examining the paths in a program's control flow graph that connect points where untrusted data can enter and vulnerable functions can be reached.
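The path-based view described above can be illustrated as reachability in a control flow graph: a vulnerable function is a potential exploit target if some path connects an untrusted-input entry point to it. The toy graph, node names, and source/sink sets below are invented for this sketch.

```python
# Sketch: identify vulnerable functions reachable from taint sources by
# breadth-first search over a control flow graph.
from collections import deque

def reachable_sinks(cfg, sources, sinks):
    """Return the sink nodes reachable from any untrusted-input source."""
    seen = set()
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(cfg.get(node, []))
    return seen & set(sinks)

cfg = {
    "read_input": ["parse"],           # untrusted data enters here
    "parse": ["sanitize", "exec_sql"],
    "sanitize": ["exec_sql"],
    "init": ["log"],                   # not influenced by user input
}
vuln = reachable_sinks(cfg, ["read_input"], ["exec_sql", "log"])
print(vuln)  # {'exec_sql'}: 'log' lies on no path from untrusted input
```

Real analyses additionally track whether the data is still tainted along each path (e.g. whether `sanitize` neutralizes it); plain reachability over-approximates that.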
This document describes ExSchema, a tool that discovers schemas from the source code of polyglot persistence applications. It analyzes project structure, update operations, repositories, and annotations to understand implicit schemas across different databases. The tool represents schemas as a uniform metamodel and outputs results as PDF files. It is demonstrated on applications using Neo4j, Spring Data Neo4j, MongoDB, HBase, CouchDB, relational databases, and combinations of these. Limitations and future work are discussed. The implementation leverages Eclipse JDT and AST to analyze Java code and output results through Spring Data and as PDF files.
The document discusses object-oriented programming and C++. It begins with an introduction to OOP compared to procedural programming. Key concepts of OOP like classes, objects, inheritance and polymorphism are explained. The document then discusses C++ programming, including the structure of C++ programs, basic input/output functions, data types in C++, variables and constants. Pointers in C++ and how computer memory works with pointers are also summarized.
GenePattern is a platform for integrative genomics that provides tools for biological data analysis and visualization. It allows users to design analytic workflows by connecting modules, supports various programming languages and computing environments, and aims to improve reproducibility of research through versioning of modules, pipelines and results. The document discusses using GenePattern for microarray gene expression analysis and discusses opportunities for better supporting MAGE-TAB format data.
This document describes the Hermes tool for creating representative benchmark corpora for evaluating code analyses. Hermes uses feature queries to identify the relevant technical features present in projects, and then selects a minimal subset of projects that maximally covers these features. This optimized corpus is more manageable for development and testing while still being comprehensive. The document provides examples of feature queries and how Hermes was used to reduce the corpus size for one evaluation from 100 to 5 projects, speeding up the evaluation significantly with only a small loss of coverage.
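Hermes-style corpus reduction can be approximated with a greedy set cover: repeatedly pick the project that covers the most still-uncovered features. This is a sketch of the general technique, not Hermes's actual algorithm; the project and feature names are invented.

```python
# Greedy set-cover sketch of feature-based corpus reduction: select a small
# subset of projects that together cover all observed technical features.

def select_corpus(projects):
    """projects: {name: set(features)} -> small covering subset of names."""
    uncovered = set().union(*projects.values())
    chosen = []
    while uncovered:
        best = max(projects, key=lambda p: len(projects[p] & uncovered))
        if not projects[best] & uncovered:
            break  # remaining features cannot be covered
        chosen.append(best)
        uncovered -= projects[best]
    return chosen

projects = {
    "p1": {"reflection", "jni"},
    "p2": {"lambdas"},
    "p3": {"reflection", "lambdas", "serialization"},
    "p4": {"jni"},
}
print(select_corpus(projects))  # two projects suffice for all four features
```

Greedy set cover is not always optimal, but it gives the kind of large reduction reported above (100 projects down to 5) with only a small coverage loss.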
An Execution-Semantic and Content-and-Context-Based Code-Clone Detection and ... - Kamiya Toshihiro
Toshihiro Kamiya: An Execution-Semantic and Content-and-Context-Based Code-Clone Detection and Analysis,
Proceedings of the 9th IEEE International Workshop on Software Clones (IWSC'15), pp. 1-7 (2015).
This document discusses requirements for flexible modelling tools that can accommodate different modelling purposes and stages of development. It proposes a meta-modelling language and explicit modelling process to provide flexibility in modelling. The meta-modelling language supports multiple classification levels and types, as well as extensibility through untyped elements. An explicit modelling process defines phases, conformance rules, and transitions between phases to provide guidance for model development. Process-aware assistance filters model checks and fixes according to the current process intent.
An Empirical Study of Refactorings and Technical Debt in Machine Learning Sys... - Raffi Khatchadourian
Machine Learning (ML), including Deep Learning (DL), systems, i.e., those with ML capabilities, are pervasive in today's data-driven society. Such systems are complex; they are comprised of ML models and many subsystems that support learning processes. As with other complex systems, ML systems are prone to classic technical debt issues, especially when such systems are long-lived, but they also exhibit debt specific to these systems. Unfortunately, there is a gap of knowledge in how ML systems actually evolve and are maintained. In this paper, we fill this gap by studying refactorings, i.e., source-to-source semantics-preserving program transformations, performed in real-world, open-source software, and the technical debt issues they alleviate. We analyzed 26 projects, consisting of 4.2 MLOC, along with 327 manually examined code patches. The results indicate that developers refactor these systems for a variety of reasons, both specific and tangential to ML; that some refactorings correspond to established technical debt categories while others do not; and that code duplication is a major cross-cutting theme that particularly involved ML configuration and model code, which was also the most refactored. We also introduce 14 and 7 new ML-specific refactorings and technical debt categories, respectively, and put forth several recommendations, best practices, and anti-patterns. The results can potentially assist practitioners, tool developers, and educators in facilitating long-term ML system usefulness.
Model-Based Co-Evolution of Production Systems and their Libraries with Auto... - Luca Berardinelli
System models are essential in planning, designing, realizing, and maintaining production systems. AutomationML (AML) is an emerging standard to represent and exchange heterogeneous artifacts throughout the complete system life cycle and is increasingly used as a modeling language. AML is designed as a flexible, prototype-based language able to represent the full spectrum of different artifacts. It may be utilized to build reusable libraries containing prototypical elements from which production systems are built up by using clones. However, libraries have to evolve over time, e.g., to reflect bug fixes, new features, or refactorings, and so system models have to co-evolve to reflect the changes in the libraries.
To tackle this co-evolution challenge, we specify in this paper the relationship between library elements, i.e., prototypes, and system elements, i.e., clones, by establishing a formal model for prototype-based modeling languages. Based on this formalization, we introduce several levels of consistency rigor one may want to achieve when modeling with prototype-based languages. These levels are also the main input to reason about the impact of library changes on the concrete system models, for which we provide semi-automated co-evolution propagation strategies. We apply the established theory to the concrete AML case and present concrete tool support for evolving AML models based on Eclipse, which demonstrates that consistency between system models and libraries may be maintained semi-automatically.
TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow
Vellvm - Verifying the LLVM
Steve Zdancewic (Professor, University of Pennsylvania, USA)
For video follow the link: https://youtu.be/jDPAtUfnoBU
Would you like to know more?
Visit our website:
www.tmpaconf.org
www.exactprosystems.com/events/tmpa
Follow us:
https://www.linkedin.com/company/exactpro-systems-llc?trk=biz-companies-cym
https://twitter.com/exactpro
This document contains the table of contents for a course on Object Oriented Programming in C++, Data Structures, Database Management Systems, Boolean Algebra, and Networking. The course is divided into 5 units covering these topics. Unit I focuses on Object Oriented Programming in C++ and covers chapters on classes, objects, inheritance, and file handling. Unit II covers data structures like arrays, stacks, queues and linked lists. Unit III is about database management systems and SQL. Unit IV covers Boolean algebra. Unit V is about networking and communication technology.
DeepAM: Migrate APIs with Multi-modal Sequence to Sequence LearningSung Kim
This document describes DeepAM, an approach for migrating APIs between programming languages using multi-modal sequence-to-sequence learning. DeepAM collects a parallel corpus of API sequences and natural language descriptions from large codebases. It learns semantic representations of API sequences using a deep neural network and aligns equivalent sequences between languages. DeepAM is evaluated on migrating Java APIs to C# and achieves higher accuracy than existing techniques in mining common API mappings from aligned sequences.
TMPA-2017: Tools and Methods of Program Analysis
3-4 March, 2017, Hotel Holiday Inn Moscow Vinogradovo, Moscow
5W+1H Static Analysis Report Quality Measure
Maxim Menshchikov, Timur Lepikhin, Oktetlabs
For video follow the link: https://youtu.be/bjW6_rMCZB8
FLOSSMETRICS: The main objective of FLOSSMETRICS is to construct, publish and analyse a large scale database with information and metrics about libre software development coming from several thousands of software projects, using existing methodologies, and tools already developed. The project will also provide a public platform for validation and industrial exploitation of results.
Applying the Scientific Method to Simulation Experiments - Frank Bergmann
In this talk I would like to explore how to apply the scientific method to in silico experiments. How can we design these experiments so that they are independent of the software tool that gave rise to them? Over the past decade we have seen the rise of model exchange formats such as the Systems Biology Markup Language (SBML), which enable us to share models readily with colleagues and between applications.
Here I present the Simulation Experiment Description Markup Language (SED-ML) that aims to do the same thing for in silico experiments. After detailing its history, and where it currently stands, I will give a short overview of the growing tool support.
Analyzing Changes in Software Systems: From ChangeDistiller to FMDiff - Martin Pinzger
Software systems continuously change, and developers spend a large portion of their time keeping track of and understanding changes and their effects. Current development tools provide only limited support: most of all, they track changes in source files only on the level of textual lines, lacking semantic and context information on changes. Developers frequently need to reconstruct this information manually, which is a time-consuming and error-prone task. In this talk, I present three techniques to address this problem by extracting detailed syntactical information from changes in various source files. I start by introducing ChangeDistiller, a tool and approach to extract information on source code changes on the level of ASTs. Next, I present the WSDLDiff approach to extract information on changes in web service interface description files. Finally, I present FMDiff, an approach to extract changes from feature models defined with the Linux Kconfig language. For each approach I report on case studies and experiments to highlight the benefits of our techniques. I also point out several research opportunities opened by our techniques and tools, and by the detailed change data they extract.
Semantic Scaffolds for Pseudocode-to-Code Generation (2020) - Minhazul Arefin
The authors propose a method for program generation based on semantic scaffolds, lightweight structures representing the high-level semantic and syntactic composition of a program. By first searching over plausible scaffolds and then using these as constraints for a beam search over programs, they achieve better coverage of the search space than existing techniques. They apply this hierarchical search method to the SPoC dataset for pseudocode-to-code generation, in which line-level natural language pseudocode annotations are given and the aim is to produce a program satisfying execution-based test cases. By using semantic scaffolds during inference, they achieve a 10% absolute improvement in top-100 accuracy over the previous state of the art. Additionally, they require only 11 candidates to reach the top-3000 performance of the previous best approach when tested against unseen problems, demonstrating a substantial improvement in efficiency.
fUML-Driven Performance Analysis through the MOSES Model Library - Luca Berardinelli
The growing demand for high-quality applications for embedded systems calls for model-driven approaches that facilitate their design as well as their verification and validation activities.
In this paper we present MOSES, a model-driven performance analysis methodology based on Foundational UML (fUML). Implemented as an executable model library, MOSES provides data structures, as Classes, and algorithms, as Activities, which can be imported to instrument fUML models and then to carry out performance analysis of the modeled system through fUML model simulation. An industrial case study shows MOSES at work, its achievements, and its future challenges.
A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl... - Lionel Briand
The document describes a search-based testing approach for detecting XML injection vulnerabilities in web applications. The approach uses genetic algorithms to search the input space to generate malicious XML outputs defined as test objectives (TOs). The approach was evaluated on four subjects and found to be highly effective at detecting vulnerabilities, achieving 100% TO coverage. Random search was not effective, covering zero TOs. The approach was efficient, taking 5-32 minutes per TO. Input validation decreased coverage while fewer inputs and a restricted alphabet increased efficiency.
and aim to produce a program satisfying execution-based test cases. By using semantic scaffolds during inference, we achieve a 10% absolute improvement in top-100 accuracy
over the previous state-of-the-art. Additionally, we require only 11 candidates to reach the top-3000 performance of the previous best approach when tested against unseen problems, demonstrating a substantial improvement in efficiency.
fUML-Driven Performance Analysisthrough the MOSES Model LibraryLuca Berardinelli
The growing request for high-quality applications for em- bedded systems demands model-driven approaches that facilitate their design as well as the verification and validation activities.
In this paper we present MOSES, a model-driven performance analysis methodology based on Foundational UML (fUML). Implemented as an executable model library, MOSES provides data structures, as Classes, and algorithms, as Activities, which can be imported to instrument fUML models and then to carry out the performance analysis of the modeled system through fUML model simulation. An industrial case study is provided to show MOSES at work, its achievements and its future challenges.
A Search-based Testing Approach for XML Injection Vulnerabilities in Web Appl...Lionel Briand
The document describes a search-based testing approach for detecting XML injection vulnerabilities in web applications. The approach uses genetic algorithms to search the input space to generate malicious XML outputs defined as test objectives (TOs). The approach was evaluated on four subjects and found to be highly effective at detecting vulnerabilities, achieving 100% TO coverage. Random search was not effective, covering zero TOs. The approach was efficient, taking 5-32 minutes per TO. Input validation decreased coverage while fewer inputs and a restricted alphabet increased efficiency.
Graspskills is one of the largest providers of professional certification courses in the classroom, online and also virtual training. Our main mission is to satisfy the participants through excellent training. We offer more than 100 courses in 130+ countries. Graspskills courses are recognized by more than 50 global companies including Infosys, Wipro, Cognizant, TATA Consultancy Services and Accenture.
This document discusses two degree of freedom systems and provides equations of motion for a two degree of freedom spring-mass system with damping. It presents the matrix form of the equations of motion and defines the mass, damping and stiffness matrices. It then analyzes the free vibration of an undamped two degree of freedom system, determining the natural frequencies and normal modes of vibration. The normal modes allow expressing the motion as a superposition of the individual mode shapes.
La Guerra Fría fue un período de tensión geopolítica entre 1945 y 1991 caracterizado por el enfrentamiento entre las dos superpotencias de la época, Estados Unidos y la Unión Soviética. Ambas naciones lideraban bloques antagónicos y competían por expandir su influencia a nivel mundial mediante tácticas políticas, económicas y militares indirectas, sin llegar a un conflicto armado directo debido a su capacidad nuclear mutua.
Building material is any material which is used for construction purposes. Many naturally occurring substances, such as clay, rocks, sand, and wood, even twigs and leaves, have been used to construct buildings. Apart from naturally occurring materials, many man-made products are in use, some more and some less synthetic. The manufacture of building materials is an established industry in many countries and the use of these materials is typically segmented into specific specialty trades, such as carpentry, insulation, plumbing, and roofing work. They provide the make-up of habitats and structures including homes. Construction industry is the largest consumer of material resources, of both the natural ones (like stone, sand, clay, lime) and the processed and synthetic ones. Each material which is used in the construction, in one form or the other is known as construction material (engineering material). No material, existing in the universe is useless; every material has its own field of application. Stone, bricks, timber, steel, lime, cement, metals etc. are some commonly used materials by civil engineers.
Market outlook
The global market for construction materials is projected to exceed US $ 1 trillion by 2020, driven by improving world economic outlook and recovering construction activity in residential, commercial and infrastructure sector. India's construction industry will continue to expand over the forecast period (2016–2020), with investments in residential, infrastructure and energy projects continuing to drive growth. The industry's output value in real terms is expected to rise at a compound annual growth rate (CAGR) of 5.65% over the forecast period; up from 2.95% during the review period (2011–2015). For August 2016, the construction final demand producer price index (PPI) was flat over July 2016, but is 0.7% above last year. The PPI for Inputs to construction industry goods was down -0.2% from last month and -1.7% from last year. After what will likely be 2% growth in 2016, overall construction costs are forecasted for ongoing growth in 2017 in the 2-3% range. These increases will be primarily led by gains in construction labor wages, which is forecasted for 3-4% in 2017.
See more
https://goo.gl/UiiwHK
https://goo.gl/s4J1zA
Contact us
Niir Project Consultancy Services
106-E, Kamla Nagar, Opp. Spark Mall,
New Delhi-110007, India.
Email: npcs.ei@gmail.com , info@entrepreneurindia.co
Tel: +91-11-23843955, 23845654, 23845886, 8800733955
Mobile: +91-9811043595
Website: www.entrepreneurindia.co , www.niir.org
This document provides a list of potential photos from a recent photo shoot that could be used for an album cover, back cover, digipak, and advertisements.
El documento describe las funciones y objetivos de la Fundación para la Innovación Agraria (FIA) en Chile. La FIA busca fomentar una cultura de innovación en el sector agrario para mejorar las condiciones de vida de los agricultores a través de proyectos de innovación, asociaciones público-privadas y el desarrollo de pequeñas y medianas empresas agrícolas. La FIA ofrece varios instrumentos de financiamiento e incentivos para proyectos de innovación en áreas como la agricultura sostenible, alimentos saludables, emp
Communication and media response to the Westminster AttackStephen Waddington
Best practice communication was critical to allaying fear in the immediate response to the Westminster attacks but sensationalist media coverage must be challenged.
There was a terrorist attack on Westminster Bridge and the Palace of Westminster in London yesterday. Five people, including a police office, are dead, and 40 people are injured.
In this deck I've looked at the response from London's Mayor, the police, journalists, media, the government, the public, and others. It is intended for a lecture to public relations students at Newcastle University.
The deck tells the story of how crisis situations have unfolded in media over the last 40 years, as media has changed.
I've included user generated comment from social networks including some examples of hate and propaganda that you may find disturbing.
Thank you to the emergency services, NHS staff and all the professional communicators involved in the incident response.
This document discusses potential areas where lattice QCD calculations could provide input to help constrain parton distribution functions (PDFs) which are currently not well known. It identifies "benchmarks" where PDFs are already well determined, as well as "opportunities" where lattice calculations could have more impact. These include PDFs at large values of Bjorken x, the strange quark content of the proton, and charm content. The document also discusses how lattice data could be included in PDF global fits to help reduce PDF uncertainties.
Metodos de programacion no lineal javier perezjavier peeez
Este documento resume varios métodos de programación no lineal como el algoritmo no lineal restringido, el método de programación separable, el método de programación cuadrática, el método de programación geométrica y el método de programación estocástica. Explica que la programación no lineal optimiza funciones objetivo sujetas a restricciones no lineales y que no existe un único algoritmo para resolver estos problemas.
Este documento fue creado para el interés estudiantil en saber qué es la teoría de los cuatro humores. Estudiantes, nos dedicamos a la creación de esta investigación que es plasmada en esta presentación.
This document provides an overview of software engineering and the evolution of practices in the field. It discusses how software development has progressed from an ad hoc exploratory approach to more systematic approaches utilizing structured programming, data structure design, data flow design, and object-oriented design. Modern practices emphasize prevention over correction of errors through life cycle models, documentation, testing and other techniques.
The document presents an approach called Convolutional Analysis of code Metrics Evolution (CAME) that uses a convolutional neural network to detect anti-patterns by analyzing the historical evolution of source code metrics at the class level. An evaluation on 7 open-source systems shows that considering longer histories of metrics improves detection performance and that CAME outperforms other machine learning and anti-pattern detection techniques in terms of precision, recall, and F-measure.
Architectural Abstractions for Hybrid ProgramsIvan Ruchkin
Modern cyber-physical systems interact closely with continuous physical processes like kinematic movement. Software component frameworks do not provide an explicit way to represent or reason about these processes. Meanwhile, hybrid program models have been successful in proving critical properties of discrete-continuous systems. These programs deal with diverse aspects of a cyber-physical system such as controller decisions, component communication protocols, and mechanical dynamics, requiring several programs to address the variation. However, currently these aspects are often intertwined in mostly monolithic hybrid programs, which are difficult to understand, change, and organize. These issues can be addressed by component-based engineering, making hybrid modeling more practical. This
paper lays the foundation for using architectural models to
provide component-based benefits to developing hybrid pro-
grams. We build formal architectural abstractions of hy-
brid programs and formulas, enabling analysis of hybrid programs at the component level, reusing parts of hybrid programs, and automatic transformation from views into hybrid programs and formulas. Our approach is evaluated in the context of a robotic collision avoidance case study.
Nowadays software systems are essential to the environment of most organizations, and their maintenance is a key point to support business dynamics. Thus, reverse engineering legacy systems for knowledge reuse has become a major concern in software industry. This article, based on a survey about reverse engineering tools, discusses a set of functional and nonfunctional requirements for an effective tool for reverse engineering, and observes that current tools only partly support these requirements. In addition, we define new requirements, based on our group’s experience and industry feedback, and present the architecture and implementation of LIFT: a Legacy InFormation retrieval Tool, developed based on these demands. Furthermore, we discuss the compliance of LIFT with the defined requirements. Finally, we applied the LIFT in a reverse engineering project of a 210KLOC NATURAL/ADABAS system of a financial institution and analyzed its effectiveness and scalability, comparing data with previous similar projects performed by the same institution.
This document summarizes R at Microsoft, including its acquisition of Revolution Analytics. Key points include:
- R is a widely used open-source statistical programming language. Microsoft acquired Revolution Analytics to help customers use advanced analytics within its data platforms.
- Microsoft products like SQL Server will integrate Revolution R Open, an enhanced open-source R distribution, to allow running R scripts directly from SQL queries.
- Microsoft aims to make R and advanced analytics more accessible and scalable through products like the Machine Learning marketplace and by running R on servers to handle large datasets within SQL Server and Azure.
HIS 2017 Mark Batty-Industrial concurrency specification for C/C++jamieayre
Modern computer systems have intricate memory hierarchies that violate the intuition of a global timeline of interleaved memory accesses. In these so-called relaxed-memory systems, it can be dangerous to use intuition, specifications are universally unreliable, and the outcome of testing is both wildly nondeterministic and dependent on hardware that is getting more permissive of odd behaviour with each generation. These complications pervade the whole system, and have led to bugs in language specifications, deployed processors, compilers, and vendor-endorsed programming idioms – it is clear that current engineering practice is severely lacking.
I will describe my part of a sustained effort in the academic community to tackle these problems by applying a range of techniques, from exhaustive testing to mechanised formal specification and proof. I will focus on a vein of work with strong industrial impact: the formalisation, refinement and validation of the 2011/14/17 C and C++ concurrency design, leading to changes in the ratified international standard, and ultimately uncovering fundamental flaws that lead us to the state of the art in concurrent language specification. This work represents an essential enabling step for verification on modern concurrent systems
The document outlines the course content for a Software Engineering course. It is divided into 5 units that cover topics like software process models, requirements engineering, system design, user interface design, testing strategies, risk management, and quality management. The course aims to explain key concepts in software engineering and help students differentiate between functional and non-functional requirements, discuss various system models, and explain testing and risk management techniques.
(Costless) Software Abstractions for Parallel ArchitecturesJoel Falcou
Performing large, intensive or non-trivial computing on array like data structures is one of the most common task in scientific computing, video game development and other fields. This matter of fact is backed up by the large number of tools, languages and libraries to perform such tasks. If we restrict ourselves to C++ based solutions, more than a dozen such libraries exists from BLAS/LAPACK C++ binding to template meta-programming based Blitz++ or Eigen. If all of these libraries provide good performance or good abstraction, none of them seems to fit the need of so many different user types.
Moreover, as parallel system complexity grows, the need to maintain all those components quickly become unwieldy. This talk explores various software design techniques - like Generative Programming, MetaProgramming and Generic Programming - and their application to the implementation of a parallel computing librariy in such a way that:
- abstraction and expressiveness are maximized - cost over efficiency is minimized
We'll skim over various applications and see how they can benefit from such tools. We will conclude by discussing what lessons were learnt from this kind of implementation and how those lessons can translate into new directions for the language itself.
Architecture Centric Development PPT PresentationERPCell
The π-Method is a formal method for architecture-centric software engineering that uses π-calculus, μ-calculus, and rewriting logic. It supports the formal modeling, analysis, refinement, and development of software systems from abstract architectural models to code generation. The π-Method aims to fully support the formal development of software architectures and enable rigorous analysis to guarantee system properties and quality.
GPCE16: Automatic Non-functional Testing of Code Generators FamiliesMohamed BOUSSAA
The intensive use of generative programming techniques provides an elegant engineering solution to deal with the heterogeneity of platforms and technological stacks. The use of domain-specific languages for example, leads to the creation of numerous code generators that automatically translate high-level system specifications into multi-target executable code. Producing correct and efficient code generator is complex and error-prone. Although software designers provide generally high-level test suites to verify the functional outcome of generated code, it remains challenging and tedious to verify the behavior of produced code in terms of non-functional properties. This paper describes a practical approach based on a runtime monitoring infrastructure to automatically check the potential inefficient code generators. This infrastructure, based on system containers as execution platforms, allows code-generator developers to evaluate the generated code performance. We evaluate our approach by analyzing the performance of Haxe, a popular high-level programming language that involves a set of cross-platform code generators. Experimental results show that our approach is able to detect some performance inconsistencies that reveal real issues in Haxe code generators.
The ideology behind the hydrological modelling I do. It is a revisiting of part of a talk I gave at CUAHSI biennial meeting in Boulder (CO) on July 2008. It promotes the modeling-by-components paradigm
QRS16: NOTICE: A Framework for Non-functional Testing of CompilersMohamed BOUSSAA
This talk was presented at the 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS 2016).
To have more details you can visit the tool website: https://noticegcc.wordpress.com/
Paper link: https://hal.archives-ouvertes.fr/hal-01344835/document
Author website: https://mboussaa.wordpress.com/
Finding Resource Manipulation Bugs in Linux CodeAndrzej Wasowski
This document discusses finding bugs in Linux kernel code using type-and-effect abstraction. It describes how type and effect inference can be used to model Linux kernel code and detect bugs related to memory safety, data races, and other issues. The approach has been implemented in a tool called the Effect-Based Analyzer (EBA) and has found several new bugs in Linux driver code. Handling preprocessor directives (#ifdefs) is important for analyzing highly-configurable systems like the Linux kernel and is an area of ongoing work.
Revisiting Code Ownership and Its Relationship with Software Quality in the S...The University of Adelaide
This work was presented at The 38th International Conference on Software Engineering (ICSE2016).
Abstract: Code ownership establishes a chain of responsibility for modules in large software systems. Although prior work uncovers a link between code ownership heuristics and software quality, these heuristics rely solely on the authorship of code changes. In addition to authoring code changes, developers also make important contributions to a module by reviewing code changes. Indeed, recent work shows that reviewers are highly active in modern code review processes, often suggesting alternative solutions or providing updates to the code changes. In this paper, we complement traditional code ownership heuristics using code review activity. Through a case study of six releases of the large Qt and OpenStack systems, we find that: (1) 67%-86% of developers did not author any code changes for a module, but still actively contributed by reviewing 21%-39% of the code changes, (2) code ownership heuristics that are aware of reviewing activity share a relationship with software quality, and (3) the proportion of reviewers without expertise shares a strong, increasing relationship with the likelihood of having post-release defects. Our results suggest that reviewing activity captures an important aspect of code ownership, and should be included in approximations of it in future studies.
Code reviews have been conducted since decades in
software projects, with the aim of improving code quality from
many different points of view. During code reviews, developers are supported by checklists, coding standards and, possibly, by various kinds of static analysis tools. This paper investigates whether warnings highlighted by static analysis tools are taken care of during code reviews and, whether there are kinds of warnings that tend to be removed more than others. Results of a study conducted by mining the Gerrit repository of six Java open source projects indicate that the density of warnings only slightly vary after each review. The overall percentage of warnings removed during reviews is slightly higher than what previous studies found for the overall project evolution history. However, when looking (quantitatively and qualitatively) at specific categories of warnings, we found that during code reviews developers focus on certain kinds of problems. For such
categories of warnings the removal percentage tend to be very high—often above 50% and sometimes 100%. Examples of those are warnings in the imports, regular expression, and type resolution categories. In conclusion, while a broad warning detection might produce way too many false positives, enforcing the removal of certain warnings prior to the patch submission could reduce the
amount of effort provided during the code review process.
Unlock the Secrets to Effortless Video Creation with Invideo: Your Ultimate G...The Third Creative Media
"Navigating Invideo: A Comprehensive Guide" is an essential resource for anyone looking to master Invideo, an AI-powered video creation tool. This guide provides step-by-step instructions, helpful tips, and comparisons with other AI video creators. Whether you're a beginner or an experienced video editor, you'll find valuable insights to enhance your video projects and bring your creative ideas to life.
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...kalichargn70th171
In today's business landscape, digital integration is ubiquitous, demanding swift innovation as a necessity rather than a luxury. In a fiercely competitive market with heightened customer expectations, the timely launch of flawless digital products is crucial for both acquisition and retention—any delay risks ceding market share to competitors.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
8 Best Automated Android App Testing Tool and Framework in 2024.pdfkalichargn70th171
Regarding mobile operating systems, two major players dominate our thoughts: Android and iPhone. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
Consistent toolbox talks are critical for maintaining workplace safety, as they provide regular opportunities to address specific hazards and reinforce safe practices.
These brief, focused sessions ensure that safety is a continual conversation rather than a one-time event, which helps keep safety protocols fresh in employees' minds. Studies have shown that shorter, more frequent training sessions are more effective for retention and behavior change compared to longer, infrequent sessions.
Engaging workers regularly, toolbox talks promote a culture of safety, empower employees to voice concerns, and ultimately reduce the likelihood of accidents and injuries on site.
The traditional method of conducting safety talks with paper documents and lengthy meetings is not only time-consuming but also less effective. Manual tracking of attendance and compliance is prone to errors and inconsistencies, leading to gaps in safety communication and potential non-compliance with OSHA regulations. Switching to a digital solution like Safelyio offers significant advantages.
Safelyio automates the delivery and documentation of safety talks, ensuring consistency and accessibility. The microlearning approach breaks down complex safety protocols into manageable, bite-sized pieces, making it easier for employees to absorb and retain information.
This method minimizes disruptions to work schedules, eliminates the hassle of paperwork, and ensures that all safety communications are tracked and recorded accurately. Ultimately, using a digital platform like Safelyio enhances engagement, compliance, and overall safety performance on site. https://safelyio.com/
Most important New features of Oracle 23c for DBAs and Developers. You can get more idea from my youtube channel video from https://youtu.be/XvL5WtaC20A
WWDC 2024 Keynote Review: For CocoaCoders AustinPatrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
UI5con 2024 - Bring Your Own Design SystemPeter Muessig
How do you combine the OpenUI5/SAPUI5 programming model with a design system that makes its controls available as Web Components? Since OpenUI5/SAPUI5 1.120, the framework supports the integration of any Web Components. This makes it possible, for example, to natively embed own Web Components of your design system which are created with Stencil. The integration embeds the Web Components in a way that they can be used naturally in XMLViews, like with standard UI5 controls, and can be bound with data binding. Learn how you can also make use of the Web Components base class in OpenUI5/SAPUI5 to also integrate your Web Components and get inspired by the solution to generate a custom UI5 library providing the Web Components control wrappers for the native ones.
A neural network is a machine learning program, or model, that makes decisions in a manner similar to the human brain, by using processes that mimic the way biological neurons work together to identify phenomena, weigh options and arrive at conclusions.
E-commerce Development Services- Hornet DynamicsHornet Dynamics
For any business hoping to succeed in the digital age, having a strong online presence is crucial. We offer Ecommerce Development Services that are customized according to your business requirements and client preferences, enabling you to create a dynamic, safe, and user-friendly online store.
How Can Hiring A Mobile App Development Company Help Your Business Grow?ToXSL Technologies
ToXSL Technologies is an award-winning Mobile App Development Company in Dubai that helps businesses reshape their digital possibilities with custom app services. As a top app development company in Dubai, we offer highly engaging iOS & Android app solutions. https://rb.gy/necdnt
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsPeter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended by your needs. This session will showcase various tooling extensions which can boost your development experience by far so that you can really work offline, transpile your code in your project to use even newer versions of EcmaScript (than 2022 which is supported right now by the UI5 tooling), consume any npm package of your choice in your project, using different kind of proxies, and even stitching UI5 projects during development together to mimic your target environment.
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...kalichargn70th171
In today's fiercely competitive mobile app market, the role of the QA team is pivotal for continuous improvement and sustained success. Effective testing strategies are essential to navigate the challenges confidently and precisely. Ensuring the perfection of mobile apps before they reach end-users requires thoughtful decisions in the testing plan.
A Comprehensive Guide on Implementing Real-World Mobile Testing Strategies fo...
Creating and Analyzing Source Code Repository Models - A Model-based Approach to Mining Software Repositories
1. Model-based Analysis of Source Code Repositories
Markus Scheidgen, Martin Schmidt, Joachim Fischer
{scheidge,schmidma,fischer}@informatik.hu-berlin.de
Humboldt-Universität zu Berlin
2. Agenda
▶ Software Evolution, Reverse Engineering, and Mining Software Repositories
▶ Model-based Mining of Software Repositories
▶ srcrepo: a framework for model-based analysis of software repositories
▶ Experiments with Eclipse's software repositories
▶ Conclusions
3. Software Evolution
[Diagram: software engineering leads from requirements via models M and code {C} to a system S serving a user; software maintenance iterates over the repository revisions R0 … RHEAD, e.g. for quality assessment and uncovering implicit dependencies. Reverse engineering, transformation, and forward engineering connect systems, code, and models. Context: mining software repositories¹ and software modernization with OMG’s ADM (Architecture-Driven Modernization): AST Meta-Model (ASTM), Knowledge Discovery Meta-Model (KDM), Software Metrics Meta-Model (SMM).]
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol. 19/No. 2/2007
4. Mining Software Repositories – In General
▶ The term mining software repositories (MSR) has been coined to describe a broad class of investigations into the examination of software repositories.
▶ The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1]
▶ Different scopes, e.g. single software projects vs. many software projects
▶ Different goals, e.g. quality assessments and implicit dependencies vs. generalizations about software evolution
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol. 19/No. 2/2007
5. Model-based Mining of Software Repositories
[Diagram, built up incrementally over slides 5–9: reverse engineering¹ turns the source code S of a revision into a model M of code {C}; applied to all revisions R0 … RHEAD of a repository, it yields a sequence of models M{C0} … M{Cn-1}, M{Cn}.]
1. E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol. 7/No. 1/1990
2. R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symp. on Software Testing and Analysis; 2008
10. Model-based Mining of Software Repositories
▶ Scope
■ depends on the concrete MSR application and its goals
■ number of software projects: single repositories, large repositories, ultra-large repositories
■ sources as text and text-based metrics, e.g. LOC
■ declarations only: packages, classes, methods, but no statements, expressions, etc.
■ full AST with or without cross-references
12. Model-based Mining of Software Repositories
▶ MSR tools are already “model-based”, but in a proprietary manner
▶ Idea: use an existing reverse engineering framework and the corresponding standard meta-models and modeling frameworks instead of proprietary solutions
▶ Goals
■ deal with heterogeneity (different version control systems, different languages)
■ reuse existing meta-models, transformations, and languages
■ interoperability with existing analysis tools
■ retain meaningful scalability
34. Research Questions
▶ Assumptions
■ Development of MSR applications based on models, transformation languages, and standardized meta-models is favorable
■ Some MSR applications need to analyze source code on a deep (AST) level
■ MSR analysis is performed iteratively
▶ Hypotheses
■ Models of source code repositories can be created and persisted
■ Traversing existing persistent models of source code repositories is much faster than traversing transient models that are created from the version control system on the fly
35. srcrepo – A Framework for Model-based MSR
▶ Eclipse’s MoDisco as the reverse engineering framework
■ reverse engineering for Java, based on EMF
■ support for many JRE-based languages and formats: Java, Xtext, JSP, XML
■ creates instances of a Java EMF meta-model that corresponds to the handwritten JDT AST model
■ provides transformations to language-independent artifacts, e.g. KDM
▶ EMF-Fragments¹ to store very large models
■ uses NoSQL databases and stores larger model fragments within database entries
■ in contrast to object-by-object stores such as the ORM-based CDO or the NoSQL-based Morsa or Neo4J
▶ Xtend programming with higher-order functions to mimic OCL-style definitions of software metrics²
1. M. Scheidgen, A. Zubow, J. Fischer, T.H. Kolbe: Automated and Transparent Model Fragmentation for Persisting Large Models; ACM/IEEE 15th International Conference on Model Driven Engineering Languages & Systems (MODELS); Innsbruck; 2012
2. M. Scheidgen, J. Fischer: Model-based Mining of Software Repositories; 8th Systems and Modeling Conference, Valencia, Spain, September 29th, 2014
36. “OCL” to Calculate Metrics of AST-Models
// Weighted number of methods per class (Xtend).
def wmc(AbstractTypeDeclaration type, (Block)=>int weight) {
    type.bodyDeclarations.sum[if (body != null) weight.apply(it.body) else 1]
}
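For readers without an Xtend/MoDisco setup, the same metric can be sketched in plain Python over a toy AST (the classes below are hypothetical stand-ins for the MoDisco Java meta-model, not its real types): sum a weight function over each method body, counting bodiless (abstract) methods as 1.

```python
# WMC sketch in plain Python (toy AST, hypothetical types): weighted
# methods per class, mirroring the Xtend definition above.

class Method:
    def __init__(self, body_statements=None):
        self.body = body_statements  # None for abstract methods

class TypeDecl:
    def __init__(self, methods):
        self.body_declarations = methods

def wmc(type_decl, weight):
    """Sum weight(body) over all methods; methods without a body count 1."""
    return sum(
        weight(m.body) if m.body is not None else 1
        for m in type_decl.body_declarations
    )

# Weight each method by its number of statements.
cls = TypeDecl([Method(["s1", "s2", "s3"]), Method(["s1"]), Method(None)])
print(wmc(cls, len))  # 3 + 1 + 1 = 5
```

Passing `lambda body: 1` as the weight degrades the metric to a plain method count, which is exactly the flexibility the higher-order `weight` parameter buys.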
37. Experiments
▶ Eclipse Foundation sources, i.e. the Eclipse platform and plug-ins (a large-scale software repository)
▶ organized in a couple hundred different projects: jdt, cdt, emf, ...
▶ available via GitHub
▶ Git repositories can be gathered automatically via GitHub’s RESTful API
▶ the 200 largest Eclipse repositories that actually contained Java code: 6.6 GB Git, 400 MLOC, 250 GB model with 4 billion objects
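Gathering the repository list can be sketched as follows (a minimal hypothetical client, not code from the talk; it pages through the public GitHub REST API endpoint for listing an organization’s repositories). The `fetch` parameter is injectable so the pagination logic can be exercised without network access.

```python
import json
import urllib.request

# Sketch: enumerate an organization's Git repositories via GitHub's
# REST API (hypothetical minimal client; endpoint and JSON fields as in
# the public GitHub v3 API). `fetch` is injectable for testing.

def list_repos(org, fetch=None, per_page=100):
    """Yield (name, clone_url) for all repositories of `org`."""
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
    page = 1
    while True:
        url = (f"https://api.github.com/orgs/{org}/repos"
               f"?per_page={per_page}&page={page}")
        batch = fetch(url)
        if not batch:          # an empty page ends the pagination
            return
        for repo in batch:
            yield repo["name"], repo["clone_url"]
        page += 1
```

Each yielded `clone_url` can then be handed to `git clone` to assemble the corpus locally.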
41. Delta-Compression¹
[Charts: compressed size relative to full size for the cdt, webtools, gmf-tooling, rap, and jdt.core repositories, broken down into initial revisions and delta models with named-element matching and meta-class matching (in GB) or delta lines with line matching (in MLines); bar chart of compressed size relative to full size (%) for initial revisions, delta models for meta-class, delta lines, and delta models for named elements; average execution times per revision (ms, linear and logarithmic) for parse, compress, and decompress with named-element and meta-class matching.]
1. M. Scheidgen: Evaluation of Model Comparison for Delta-Compression in Model Persistence; BigMDE 2016 (at STAF 2016), Vienna, Austria, July 6-7, 2016
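The line-matching baseline from these experiments can be sketched with Python’s `difflib` (a stand-in for illustration, not the tooling used in the evaluation): the initial revision is stored in full, and each later revision only as a line delta against its predecessor, which is replayed on decompression.

```python
import difflib

# Sketch of line-based delta compression (illustrative stand-in using
# difflib): store the initial revision in full, later revisions as
# unified diffs against their predecessor.

def compress(revisions):
    """Return (initial revision, list of line deltas)."""
    deltas = []
    for prev, curr in zip(revisions, revisions[1:]):
        deltas.append(list(difflib.unified_diff(
            prev.splitlines(), curr.splitlines(), lineterm="")))
    return revisions[0], deltas

def decompress(initial, deltas):
    """Replay the deltas to restore every revision."""
    revisions = [initial]
    for delta in deltas:
        lines = revisions[-1].splitlines()
        result, i = [], 0
        for d in delta[2:]:                  # skip the ---/+++ header
            if d.startswith("@@"):
                # hunk header "@@ -a,b +c,d @@": jump to source line a-1
                a = int(d.split()[1].split(",")[0][1:])
                result.extend(lines[i:a - 1])
                i = a - 1
            elif d.startswith("-"):
                i += 1                       # line removed
            elif d.startswith("+"):
                result.append(d[1:])         # line added
            else:
                result.append(d[1:])         # context line kept
                i += 1
        result.extend(lines[i:])
        revisions.append("\n".join(result))
    return revisions

revs = ["a\nb\nc", "a\nB\nc", "a\nB\nc\nd"]
initial, deltas = compress(revs)
print(decompress(initial, deltas) == revs)  # True
```

The model-based variants in the charts work analogously, but match model elements (by name or by meta-class) instead of text lines before computing the delta.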
42. Model Creation vs. Analysis with Delta-Compression
[Charts: total time in hours (checkout, parse, compress, save, load, merge/increment, udf) for the jdt.ui, xtext, eclipselink, jdt.core, swt, cdt, ocl, ptp, org.aspectj, and cdo repositories, shown without and with compression.]
43. Conclusions
▶ MSR supports and helps to understand software evolution
▶ Traversing a source code repository to gather information (MSR) is very time consuming, especially with iterative analyses
▶ Most of this time can be saved by persisting the data in its model form, at the cost of comparably large models that need to be stored
▶ The resulting savings in MSR analysis execution time are considerable