"Improving Software Maintenance using Unsupervised Machine Learning techniques": Ph.D. defence presentation.
Unsupervised Machine Learning techniques have been used to face different software maintenance issues such as Software Modularisation and Clone detection.
Machine Learning for Software MaintainabilityValerio Maggio
"Machine Learning for Software Maintainability":
Slides presented at the Joint Workshop on Intelligent Methods for Software System Engineering (JIMSE) 2012
Histolab: an Open Source Python Library for Reproducible Digital PathologyAlessia Marcolini
The histo-pathological analysis of tissue sections is the gold standard to assess the presence of many complex diseases, such as tumors and it is expected to be at the center of the AI revolution in medicine, prevision supported by the increasing success of deep learning applications to digital pathology. The aim of histolab is to provide a tool for Whole Slide Images (WSIs) processing in a reproducible environment to support clinical and scientific research. histolab is designed to handle WSIs, automatically detect the tissue, and retrieve informative tiles.
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago
Join us as Tao Xie, Professor and Willett Faculty Scholar in the Department of Computer Science at the University of Illinois at Urbana-Champaign and ACM Distinguished Speaker, talks about Intelligent Software Engineering: Synergy between AI and Software Engineering. This is a joint meeting hosted by Chicago Chapter ACM / Loyola University Computer Science Department.
Machine Learning for Software MaintainabilityValerio Maggio
"Machine Learning for Software Maintainability":
Slides presented at the Joint Workshop on Intelligent Methods for Software System Engineering (JIMSE) 2012
Histolab: an Open Source Python Library for Reproducible Digital PathologyAlessia Marcolini
The histo-pathological analysis of tissue sections is the gold standard to assess the presence of many complex diseases, such as tumors and it is expected to be at the center of the AI revolution in medicine, prevision supported by the increasing success of deep learning applications to digital pathology. The aim of histolab is to provide a tool for Whole Slide Images (WSIs) processing in a reproducible environment to support clinical and scientific research. histolab is designed to handle WSIs, automatically detect the tissue, and retrieve informative tiles.
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago
Join us as Tao Xie, Professor and Willett Faculty Scholar in the Department of Computer Science at the University of Illinois at Urbana-Champaign and ACM Distinguished Speaker, talks about Intelligent Software Engineering: Synergy between AI and Software Engineering. This is a joint meeting hosted by Chicago Chapter ACM / Loyola University Computer Science Department.
Using electronic laboratory notebooks in the academic life sciences: a group ...SC CTSI at USC and CHLA
Digital technologies are shaping the way experiments are performed, results captured, and findings disseminated. We will explore what an Electronic Laboratory Notebook (ELN) afford a biomedical researcher, what it requires, and how one should go about implementing it? Electronic lab notebooks are a fairly new technology and offer many benefits to the user as well as organizations. For example: electronic lab notebooks are easier to search upon, simplify data copying and backups, and support collaboration amongst many users. ELNs can have fine-grained access controls, and can be more secure than their paper counterparts. They also allow the direct incorporation of data from instruments, replacing the practice of printing out data to be stapled into a paper notebook. ELNs offer significant benefits to researchers by facilitating long-term storage, reproducibility, and enhanced availability of experiment records across multiple devices, ensuring standard operating procedure compliance and providing interfaces to instrumentation, supporting IP protection, collaboration, and open science. ELNs eliminate the need for manual transcription and can be used by distributed groups, facilitate managing notes, and simplify the inclusion and curation of digital resources (e.g. instrument data, analysis results).
Data has always been used in every company irrespective of its domain to improve the operational
efficiency and the products themselves. However, analyzing and extracting information from “Big Data”
is the next revolution in technology, since previously unknown nuggets of information are now made
visible. In fact, over 90% of the data available in the world has been generated in the last two years.
“Big Data” analytics has become the next hot topic for most companies - from financial institutions to
technology companies to service providers. Likewise in software engineering, data collected about the
development of software, the operation of the software in the field, and the users feedback on software
have been used before. However, collecting and analyzing this information across hundreds of thousands
or millions of software projects gives us the unique ability to reason about the ecosystem at large, and
software in general. At no time in history has there been easier access to extremely powerful
computational resources as it is today, thanks to the advances in cloud computing, both from the
technology and business perspectives. Therefore, it is easier today than ever before to analyze big data.
In this technical briefing, we will present the state-of-the-art with respect to the research carried out in the
area of big data analytics in software engineering research. We will present the research along three
dimensions:
1) What are the software engineering problems being solved? Examples of problems include: How
much source code is newly written and how much is reused from past projects? Can we
recommend best practices to developers by observing the development of software among
hundreds of thousands of software projects?
2) What are the datasets that are being used? Examples of my datasets include: all the mobile apps
in the Google Play store, all of the world's Open Source projects, and hundreds of gigabytes of
execution logs. Such large datasets provide us with a unique view into the SE field.
3) What are the tools and techniques available to analyze the large datasets? We intend to present
generic software solutions that have been applied to big datasets in other areas of research, and
the tools and techniques created by software engineering researchers.
In the end we will present the challenges inherently present in large datasets - volume, variety, velocity,
and veracity. Such challenges often complicate the analysis of the data and can invalidate the
interpretation of the results. We will conclude with the future opportunities that are present in big data
analytics for software engineering research.
Towards Mining Software Repositories Research that MattersTao Xie
Towards Mining Software Repositories Research that Matters. Talk slides at Next Generation of Mining Software Repositories '14 (Pre-FSE 2014 Event), Nov 15–16. HKUST, Hong Kong http://ng2014.msrworld.org/
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...inside-BigData.com
In this deck from the HPC User Forum in Milwaukee, Arno Kolster from Providentia Worldwide presents: Machine & Deep Learning: Practical Deployments and Best Practices for the Next Two Years.
Providentia Worldwide is a new venture in technology and solutions consulting which bridges the gap between High Performance Computing and Enterprise Hyperscale computing. We take the best practices from the most demanding compute environments in the world and apply those techniques and design patterns to your business. We believe that a solution isn't a solution until it works for you, the way you and your organization envisions. Whether it's a short term engagement around technology vision and strategy, a new project in deep learning and streaming analytics, or an in-depth, long-term partnership with reference implementations, operational guidance, and technology leadership -- we have the experience and the expertise to bring your best ideas to life.
Watch the video: https://insidehpc.com/2017/09/machine-deep-learning-practical-deployments-best-practices-next-two-years/
Learn more: http://www.providentiaworldwide.com/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This is an introductory workshop for machine learning. Introduced machine learning tasks such as supervised learning, unsupervised learning and reinforcement learning.
Finding Bad Code Smells with Neural Network Models IJECEIAES
Code smell refers to any symptom introduced in design or implementation phases in the source code of a program. Such a code smell can potentially cause deeper and serious problems during software maintenance. The existing approaches to detect bad smells use detection rules or standards using a combination of different object-oriented metrics. Although a variety of software detection tools have been developed, they still have limitations and constraints in their capabilities. In this paper, a code smell detection system is presented with the neural network model that delivers the relationship between bad smells and object-oriented metrics by taking a corpus of Java projects as experimental dataset. The most well-known objectoriented metrics are considered to identify the presence of bad smells. The code smell detection system uses the twenty Java projects which are shared by many users in the GitHub repositories. The dataset of these Java projects is partitioned into mutually exclusive training and test sets. The training dataset is used to learn the network model which will predict smelly classes in this study. The optimized network model will be chosen to be evaluated on the test dataset. The experimental results show when the modelis highly trained with more dataset, the prediction outcomes are improved more and more. In addition, the accuracy of the model increases when it performs with higher epochs and many hidden layers.
Transferring Software Testing Tools to PracticeTao Xie
ACM SIGSOFT Webinar co-presented by Nikolai Tillmann (Microsoft), Judith Bishop (Microsoft Research), Pratap Lakshman (Microsoft), Tao Xie (University of Illinois at Urbana-Champaign) http://www.sigsoft.org/resources/webinars.html
Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this is not always the case. There are two core challenges, which should be addressed: 1) ensuring that scientific publications are credible -- e.g. that claims are not made without supporting evidence, and that all relevant supporting evidence is provided; and 2) that scientific findings are not misrepresented, distorted or outright misreported when communicated by journalists or the general public. I will present some first steps towards addressing these problems and outline remaining challenges.
Unsupervised Machine Learning for clone detectionValerio Maggio
"Unsupervised Machine Learning for clone detection" highlights the main topics of using Unsupervised Machine Learning techniques (Kernel methods and data clustering) for the code clones detection task.
Using electronic laboratory notebooks in the academic life sciences: a group ...SC CTSI at USC and CHLA
Digital technologies are shaping the way experiments are performed, results captured, and findings disseminated. We will explore what an Electronic Laboratory Notebook (ELN) afford a biomedical researcher, what it requires, and how one should go about implementing it? Electronic lab notebooks are a fairly new technology and offer many benefits to the user as well as organizations. For example: electronic lab notebooks are easier to search upon, simplify data copying and backups, and support collaboration amongst many users. ELNs can have fine-grained access controls, and can be more secure than their paper counterparts. They also allow the direct incorporation of data from instruments, replacing the practice of printing out data to be stapled into a paper notebook. ELNs offer significant benefits to researchers by facilitating long-term storage, reproducibility, and enhanced availability of experiment records across multiple devices, ensuring standard operating procedure compliance and providing interfaces to instrumentation, supporting IP protection, collaboration, and open science. ELNs eliminate the need for manual transcription and can be used by distributed groups, facilitate managing notes, and simplify the inclusion and curation of digital resources (e.g. instrument data, analysis results).
Data has always been used in every company irrespective of its domain to improve the operational
efficiency and the products themselves. However, analyzing and extracting information from “Big Data”
is the next revolution in technology, since previously unknown nuggets of information are now made
visible. In fact, over 90% of the data available in the world has been generated in the last two years.
“Big Data” analytics has become the next hot topic for most companies - from financial institutions to
technology companies to service providers. Likewise in software engineering, data collected about the
development of software, the operation of the software in the field, and the users feedback on software
have been used before. However, collecting and analyzing this information across hundreds of thousands
or millions of software projects gives us the unique ability to reason about the ecosystem at large, and
software in general. At no time in history has there been easier access to extremely powerful
computational resources as it is today, thanks to the advances in cloud computing, both from the
technology and business perspectives. Therefore, it is easier today than ever before to analyze big data.
In this technical briefing, we will present the state-of-the-art with respect to the research carried out in the
area of big data analytics in software engineering research. We will present the research along three
dimensions:
1) What are the software engineering problems being solved? Examples of problems include: How
much source code is newly written and how much is reused from past projects? Can we
recommend best practices to developers by observing the development of software among
hundreds of thousands of software projects?
2) What are the datasets that are being used? Examples of my datasets include: all the mobile apps
in the Google Play store, all of the world's Open Source projects, and hundreds of gigabytes of
execution logs. Such large datasets provide us with a unique view into the SE field.
3) What are the tools and techniques available to analyze the large datasets? We intend to present
generic software solutions that have been applied to big datasets in other areas of research, and
the tools and techniques created by software engineering researchers.
In the end we will present the challenges inherently present in large datasets - volume, variety, velocity,
and veracity. Such challenges often complicate the analysis of the data and can invalidate the
interpretation of the results. We will conclude with the future opportunities that are present in big data
analytics for software engineering research.
Towards Mining Software Repositories Research that MattersTao Xie
Towards Mining Software Repositories Research that Matters. Talk slides at Next Generation of Mining Software Repositories '14 (Pre-FSE 2014 Event), Nov 15–16. HKUST, Hong Kong http://ng2014.msrworld.org/
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...inside-BigData.com
In this deck from the HPC User Forum in Milwaukee, Arno Kolster from Providentia Worldwide presents: Machine & Deep Learning: Practical Deployments and Best Practices for the Next Two Years.
Providentia Worldwide is a new venture in technology and solutions consulting which bridges the gap between High Performance Computing and Enterprise Hyperscale computing. We take the best practices from the most demanding compute environments in the world and apply those techniques and design patterns to your business. We believe that a solution isn't a solution until it works for you, the way you and your organization envisions. Whether it's a short term engagement around technology vision and strategy, a new project in deep learning and streaming analytics, or an in-depth, long-term partnership with reference implementations, operational guidance, and technology leadership -- we have the experience and the expertise to bring your best ideas to life.
Watch the video: https://insidehpc.com/2017/09/machine-deep-learning-practical-deployments-best-practices-next-two-years/
Learn more: http://www.providentiaworldwide.com/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
This is an introductory workshop for machine learning. Introduced machine learning tasks such as supervised learning, unsupervised learning and reinforcement learning.
Finding Bad Code Smells with Neural Network Models IJECEIAES
Code smell refers to any symptom introduced in design or implementation phases in the source code of a program. Such a code smell can potentially cause deeper and serious problems during software maintenance. The existing approaches to detect bad smells use detection rules or standards using a combination of different object-oriented metrics. Although a variety of software detection tools have been developed, they still have limitations and constraints in their capabilities. In this paper, a code smell detection system is presented with the neural network model that delivers the relationship between bad smells and object-oriented metrics by taking a corpus of Java projects as experimental dataset. The most well-known objectoriented metrics are considered to identify the presence of bad smells. The code smell detection system uses the twenty Java projects which are shared by many users in the GitHub repositories. The dataset of these Java projects is partitioned into mutually exclusive training and test sets. The training dataset is used to learn the network model which will predict smelly classes in this study. The optimized network model will be chosen to be evaluated on the test dataset. The experimental results show when the modelis highly trained with more dataset, the prediction outcomes are improved more and more. In addition, the accuracy of the model increases when it performs with higher epochs and many hidden layers.
Transferring Software Testing Tools to PracticeTao Xie
ACM SIGSOFT Webinar co-presented by Nikolai Tillmann (Microsoft), Judith Bishop (Microsoft Research), Pratap Lakshman (Microsoft), Tao Xie (University of Illinois at Urbana-Champaign) http://www.sigsoft.org/resources/webinars.html
Most work on scholarly document processing assumes that the information processed is trustworthy and factually correct. However, this is not always the case. There are two core challenges, which should be addressed: 1) ensuring that scientific publications are credible -- e.g. that claims are not made without supporting evidence, and that all relevant supporting evidence is provided; and 2) that scientific findings are not misrepresented, distorted or outright misreported when communicated by journalists or the general public. I will present some first steps towards addressing these problems and outline remaining challenges.
Unsupervised Machine Learning for clone detectionValerio Maggio
"Unsupervised Machine Learning for clone detection" highlights the main topics of using Unsupervised Machine Learning techniques (Kernel methods and data clustering) for the code clones detection task.
LINSEN an efficient approach to split identifiers and expand abbreviationsValerio Maggio
"Linsen an efficient approach to split identifiers and expand abbreviations"
Slides presented at the International Conference of Software Maintenance (ICSM) 2012, Riva del Garda (TN), Italy
Refactoring can either completely disrupt your project or make you go faster. This presentation will help you to avoid some pitfalls.
It also demonstrates refactorings that you could apply straight away to make your code better.
Lecture of Software Engineering II - University of Naples Federico II Main Topics:
- Testing Taxonimes
- Unit Testing with JUnit 4.x
- Testing Scaffolding and Mocking
- JMock
Abstract-Software is ubiquitous in our daily life. It brings us great convenience and a big headache about software reliability as well: Software is never bug-free, and software bugs keep incurring monetary loss of even catastrophes. In the pursuit of better reliability, software engineering researchers found that huge amount of data in various forms can be collected from software systems, and these data, when properly analyzed, can help improve software reliability. Unfortunately, the huge volume of complex data renders the analysis of simple techniques incompetent; consequently, studies have been resorting to data mining for more effective analysis. In the past few years, we have witnessed many studies on mining for software reliability reported in data mining as well as software engineering forums. These studies either develop new or apply existing data mining techniques to tackle reliability problems from different angles. In order to keep data mining researchers abreast of the latest development in this growing research area, we propose this paper on data mining for software reliability. In this paper, we will present a comprehensive overview of this area, examine representative studies, and lay out challenges to data mining researchers.
Improvement of Software Maintenance and Reliability using Data Mining Techniquesijdmtaiir
Software is ubiquitous in our daily life. It brings us
great convenience and a big headache about software reliability
as well: Software is never bug-free, and software bugs keep
incurring monetary loss of even catastrophes. In the pursuit of
better reliability, software engineering researchers found that
huge amount of data in various forms can be collected from
software systems, and these data, when properly analyzed, can
help improve software reliability. Unfortunately, the huge
volume of complex data renders the analysis of simple
techniques incompetent; consequently, studies have been
resorting to data mining for more effective analysis. In the past
few years, we have witnessed many studies on mining for
software reliability reported in data mining as well as software
engineering forums. These studies either develop new or apply
existing data mining techniques to tackle reliability problems
from different angles. In order to keep data mining researchers
abreast of the latest development in this growing research area,
we propose this paper on data mining for software reliability.
In this paper, we will present a comprehensive overview of this
area, examine representative studies, and lay out challenges to
data mining researchers
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Ali Alkan
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi | Automating Machine Learning, Artificial Intelligence, and Data Science | Guided Analytics
Keynote presentation from ECBS conference. The talk is about how to use machine learning and AI in improving software engineering. Experiences from our project in Software Center (www.software-center.se).
Analyzing Big Data's Weakest Link (hint: it might be you)HPCC Systems
Tim Menzies, NC State University, presents at the 2015 HPCC Systems Engineering Summit Community Day.
For Big Data applications, there is a lack of any gold standards for "good analysis" or methods to assess our certification programs. Hence, we are still in the dark about whether or not our human analysts are making the best use possible of the tools of Big Data. While much progress has been made in the systems aspects of Big Data, certain critical human-centered aspects remain an open issue. Regardless of the sophistication of the analysis tools and environment, all that architecture can still be used incorrectly by users. If this issue was confined to a small number of inexperienced users, then it could be addressed via process improvements such as better training. But is it? What do we know about our analysts? Where are the studies that mine the people doing the data miners?
This presentation offers some preliminary results on tools that combine ECL with other methods that recognize the code generated by experienced or inexperienced developers. While the results are preliminary, they do raise the possibility that we can better characterize what it means to be experienced (or inexperienced) at Big Data applications.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Similar to Improving Software Maintenance using Unsupervised Machine Learning techniques (20)
"Number Crunching in Python": slides presented at EuroPython 2012, Florence, Italy
Slides have been authored by me and by Dr. Enrico Franchi.
Scientific and Engineering Computing, Numpy NDArray implementation and some working case studies are reported.
"Clone detection in Python": Slides presented at EuroPython 2012
Clone Detection in Python highlights the topic of code duplication detection using Machine Learning techniques.
Some examples on Python code duplications and C-Python implementation duplications are reported as well.
A Tree Kernel based approach for clone detectionValerio Maggio
"A Tree Kernel based approach for clone detection"
Slides presented at the International Conference of Software Maintenance (ICSM) 2010 - TImsoara, Romania
Refactoring: Improve the design of existing codeValerio Maggio
Refactoring: Improve the design of existing code
Software Engineering class on main refactoring techniques and bad smells reported in the famous Fawler's book on this topic!
Software Engineering Class on (main) Scaffolding techniques using JMock Java Framework.
Slides report some working examples and highlight differences among Mock Objects, Test Double and Test Stubs.
Lecture of Software Engineering II
University of Naples Federico II
Main Topics:
- What is TDD
- TDD and XP
- TDD Mantra
- TDD Principles and Practices
Web frameworks: web development done right.
Lecture of Web Technologies, University of Naples Federico II
Main topics:
- Evolution of Web Technolgies
- Web frameworks and Design Principles
- Django and Google App engine: web frameworks in Python
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Generating a custom Ruby SDK for your web service or Rails API using Smithyg2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
The Art of the Pitch: WordPress Relationships and Sales
Improving Software Maintenance using Unsupervised Machine Learning techniques
1. Dottorato in Scienze Computazionali e Informatiche, XXV Ciclo
Ph.D. Candidate:
Valerio Maggio
Thesis Advisors:
Dr. Sergio Di Martino
Dr. Anna Corazza
June 5th, 2013
UNSUPERVISED
MACHINE LEARNING
FOR SOFTWARE
MAINTENANCE
2. THESIS
G O A L
UNSUPERVISED
MACHINE LEARNING FOR
SOFTWARE MAINTENANCE
These solutions exploit (unsupervised) machine learning
techniques to mine information from the source code
Define and experimentally evaluate solutions (techniques and
prototype tools) for automatic software analysis to support
software maintenance activities.
3. There exist different types of
Software Maintenance.
SOFTWARE MAINTENANCE
“A software system must be continually adapted during its overall life cycle or it
progressively becomes less satisfactory.”
(cit. Lehman’s First Law of Software Evolution)
4. SOFTWARE MAINTENANCE
“A software system must be continually adapted during its overall life cycle or it
progressively becomes less satisfactory.”
(cit. Lehman’s First Law of Software Evolution)
Software Maintenance represents the most
expensive, time consuming and challenging
phase of the whole development process.
5. SOFTWARE MAINTENANCE
“A software system must be continually adapted during its overall life cycle or it
progressively becomes less satisfactory.”
(cit. Lehman’s First Law of Software Evolution)
Software Maintenance represents the most
expensive, time consuming and challenging
phase of the whole development process.
Software Maintenance could account up to the
85-90% of the total software costs.
8. ISSUES
SOFTWARE
MAINTENANCE
The source code (usually) represents the most
reliable source of information about the system
“Software Maintenance is about change!”
(cit. S. Jarzabek)
Change Analysis
Program
Comprehension
The documentation is usually
scarce or not up to date!
9. ISSUES
SOFTWARE
MAINTENANCE
The source code (usually) represents the most
reliable source of information about the system
“Software Maintenance is about change!”
(cit. S. Jarzabek)
Change Analysis
Program
Comprehension
Reverse Engineering
The documentation is usually
scarce or not up to date!
10. REVERSE
E N G I N E
E R I N G
• Definition of tools and techniques to support maintenance activities
• Goal: Build higher-level software models in an automatic fashion
gathering information from the source code or any other document
• Goal: To aid the comprehension of the system
11. REVERSE
E N G I N E
E R I N G
• Definition of tools and techniques to support maintenance activities
• Goal: Build higher-level software models in an automatic fashion
gathering information from the source code or any other document
• Goal: To aid the comprehension of the system
STATIC
ANALYSIS
DYNAMIC
ANALYSIS
12. REVERSE
E N G I N E
E R I N G
• Definition of tools and techniques to support maintenance activities
• Goal: Build higher-level software models in an automatic fashion
gathering information from the source code or any other document
• Goal: To aid the comprehension of the system
STATIC
ANALYSIS
DYNAMIC
ANALYSIS
13. MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
14. MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
15. MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
• Requires many efforts in:
• the definition of the relevant information best suited for the specific task/domain
16. MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
• Requires many efforts in:
• the definition of the relevant information best suited for the specific task/domain
• the application of the learning algorithms to the considered data
19. UNSUPERVISEDLEARNING
• Supervised Learning:
• Learn from labelled samples
• Unsupervised Learning:
• Learn (directly) from the data
Learn by examples
(+) No cost of labeling samples
(-) Trade-off imposed on the quality of the data
21. THESIS
C O N T R
I B U T I
O N S
Contributions to three relevant and related
open issues in Software Maintenance
&
22. THESIS
C O N T R
I B U T I
O N S
Contributions to three relevant and related
open issues in Software Maintenance
1. SOFTWARE
RE-MODULARIZATION
3. CLONE
DETECTION
2. SOURCE CODE
NORMALIZATION
&
24. Software Classes
UI Process
Components
UI
Components
Data Access
Components
Data Helpers /
Utilities
Security
Operational Management
Communications
Business
Components
Application Facade
Buisiness
Workflows
Messages
Interfaces
Service Interfaces
Re-modularization provides a way to
support software maintainers by
automatically grouping together
(clustering) “related” software classes
SOFTWARE
RE-MODULARIZATION
PROBL
EM
S T A T E
M E N T
25. Re-modularization provides a way to
support software maintainers by
automatically grouping together
(clustering) “related” software classes
SOFTWARE
RE-MODULARIZATION
External
Systems
Service Consumers
Services
Service Interfaces
Messages
Interfaces
Cross Cutting
Security
OperationalManagement
Communications
Data
Data Access
Components
Data Helpers /
Utilities
Presentation
UI
Components
UI Process
Components
Business
Application Facade
Buisiness
Workflows
Business
Components
Clusters of Software Classes
PROBL
EM
S T A T E
M E N T
29. SOURCECODE
LEXICALINFORMATION (IDEA): Exploit the lexical information gathered from the source code
to produce clusters of classes that are lexically related.
State of the Art: Information Retrieval (IR) based approaches.
49. CONTRI
BUTION
SOFTWARE RE-
MODULARIZATION
• We propose a source code Zone Indexing
Corazza A., Di Martino, S., Maggio, V., Scanniello, G.
Investigating the use of lexical information for software system clustering
(2011) Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, art. no. 5741257, pp. 35-44.
ISSN: 15345351 ISBN: 978-076954343-7
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
Combining machine learning and information retrieval techniques for software clustering
(2012) Communications in Computer and Information Science, 255 CCIS, pp. 42-60. ISSN: 18650929 ISBN: 978-364228032-0
50. CONTRI
BUTION
SOFTWARE RE-
MODULARIZATION
• We propose a source code Zone Indexing
• An approach to automatically weight the contribution of different terms in
Zones
• Maximum-Likelihood Estimation (MLE) Approach
Corazza A., Di Martino, S., Maggio, V., Scanniello, G.
Investigating the use of lexical information for software system clustering
(2011) Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, art. no. 5741257, pp. 35-44.
ISSN: 15345351 ISBN: 978-076954343-7
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
Combining machine learning and information retrieval techniques for software clustering
(2012) Communications in Computer and Information Science, 255 CCIS, pp. 42-60. ISSN: 18650929 ISBN: 978-364228032-0
51. CONTRI
BUTION
SOFTWARE RE-
MODULARIZATION
• We propose a source code Zone Indexing
• An approach to automatically weight the contribution of different terms in
Zones
• Maximum-Likelihood Estimation (MLE) Approach
• A variant of the K-medoid clustering algorithm
• Tailored to the software clustering task
Corazza A., Di Martino, S., Maggio, V., Scanniello, G.
Investigating the use of lexical information for software system clustering
(2011) Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, art. no. 5741257, pp. 35-44.
ISSN: 15345351 ISBN: 978-076954343-7
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
Combining machine learning and information retrieval techniques for software clustering
(2012) Communications in Computer and Information Science, 255 CCIS, pp. 42-60. ISSN: 18650929 ISBN: 978-364228032-0
57. AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
0
0.2
0.4
0.6
0.8
1.0
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
58. AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
0
0.2
0.4
0.6
0.8
1.0
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
59. AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
0
0.2
0.4
0.6
0.8
1.0
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
60. AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
0
0.2
0.4
0.6
0.8
1.0
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
70. • camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• drawXORRect ==> drawXOR | Rect
• drawxorrect ==> NO SPLIT
Splitting algorithms based on naming conventions
are not robust enough
IDENTIFIERSSPLITTING
72. Splitting algorithms based on naming conventions
are not robust enough
• Heavy use of Abbreviations in the source code
• r as for Rectangle
• rect as for Rectangle
ABBREVIATIONSEXPANSION
73. Splitting algorithms based on naming conventions
are not robust enough
• Heavy use of Abbreviations in the source code
• r as for Rectangle
• rect as for Rectangle
ABBREVIATIONSEXPANSION
76. CONTRI
BUTION
SOURCE CODE
NORMALIZATION
• LINSEN: novel technique for code normalization
• Able to both Split Identifiers and Expand possible occurring
abbreviations
Corazza, A., Di Martino, S., Maggio, V.
LINSEN: An efficient approach to split identifiers and expand abbreviations
(2012) IEEE International Conference on Software Maintenance, ICSM, art. no. 6405277, pp. 233-242. ISBN: 978-146732312-3
77. CONTRI
BUTION
SOURCE CODE
NORMALIZATION
• LINSEN: novel technique for code normalization
• Able to both Split Identifiers and Expand possible occurring
abbreviations
• Based on an efficient String Matching technique:
Baeza-Yates&Perlberg Algorithm (BYP)
• Exploit different Sources of Information to find the matching
words
Corazza, A., Di Martino, S., Maggio, V.
LINSEN: An efficient approach to split identifiers and expand abbreviations
(2012) IEEE International Conference on Software Maintenance, ICSM, art. no. 6405277, pp. 233-242. ISBN: 978-146732312-3
78. SKETCHOFTHEALGORITHM
• NODES correspond to characters of the current identifier
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
5
7
8
2 63
10
79. SKETCHOFTHEALGORITHM
• ARCS corresponds to matchings between identifier substrings
and dictionary words
• NODES correspond to characters of the current identifier
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
5
7
8
2 63
10
80. SKETCHOFTHEALGORITHM
• ARCS corresponds to matchings between identifier substrings
and dictionary words
• Padding Arcs to ensure the Graph always connected
• NODES correspond to characters of the current identifier
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
“r”,C-MAX
“d”,C-MAX
5
7
8
2
“R”,C-MAX
6
“a”,C-MAX
3
“w”,C-MAX
“X”,C-MAX
“O”,C-MAX
“e”,C-MAX
10
“c”,C-MAX“t”,C-MAX
“raw”,c(“raw”)
“draw”,c(“draw”)
“or”,c(“or”)
“xor”,c(“xor”)
“rectangle”,c(“rectangle”)
“R”,C-MAX
81. SKETCHOFTHEALGORITHM
• Every Arc is Labelled with the corresponding dictionary word
• Weights represent the “cost” of each matching
• Cost function [c(“word”)] favors longest words and domain-
related information
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
“r”,C-MAX
“d”,C-MAX
5
7
8
2
“R”,C-MAX
6
“a”,C-MAX
3
“w”,C-MAX
“X”,C-MAX
“O”,C-MAX
“e”,C-MAX
10
“c”,C-MAX“t”,C-MAX
“raw”,c(“raw”)
“draw”,c(“draw”)
“or”,c(“or”)
“xor”,c(“xor”)
“rectangle”,c(“rectangle”)
“R”,C-MAX
82. SKETCHOFTHEALGORITHM
• The final Mapping Solution corresponds to the sequence of labels
in the path with the minimum cost (Djikstra Algorithm)
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
“r”,C-MAX
“d”,C-MAX
5
7
8
2
“R”,C-MAX
6
“a”,C-MAX
3
“w”,C-MAX
“X”,C-MAX
“O”,C-MAX
“e”,C-MAX
10
“c”,C-MAX“t”,C-MAX
“raw”,c(“raw”)
“draw”,c(“draw”)
“or”,c(“or”)
“xor”,c(“xor”)
“rectangle”,c(“rectangle”)
“R”,C-MAX
83. SPLITTING
Accuracy Rates for the
comparison with
DTW (Madani et. al 2010)
0
0.25
0.5
0.75
1
JhotDraw 5.1 Lynx 2.8.5
DTW LINSEN DTW LINSEN
DTW LINSEN
RE
SULTS
# 1
Accuracy Rates for the
comparison with GenTest
(Lawrie and Binkley, 2011)
0
0.25
0.5
0.75
1
which 2.20 a2ps 4.14
GenTest LINSEN
DTW O(n3)
84. Accuracy Rates for the
comparison with
AMAP (Hill and Pollock, 2008)
0
0.25
0.5
0.75
1
CW DL OO AC PR SL TOTAL (Avg)
AMAP
LINSEN
EXPANSION
RE
SULTS
# 2
CW: Combination Words
DL: Dropped Letters
OO: Others
AC: Acronyms
PR: Prefix
SL: Single Letters
95. CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
96. CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
• Structural and Lexical Information
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
97. CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
• Structural and Lexical Information
• Application of Kernel Methods to Source Code Structures
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
98. CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
• Structural and Lexical Information
• Application of Kernel Methods to Source Code Structures
• Typically applied in other field such as NLP or Bioinformatics
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
100. CODE
STRUCTURES
KERNELSFORSTRUCTURES
Abstract Syntax Tree (AST)
Tree structure representing the syntactic structure of
the different instructions of a program (function)
Program Dependencies Graph (PDG)
(Directed) Graph structure representing the relationship
among the different statement of a program
Computation of the dot product between (Graph) Structures
K( ),
101. CODE
STRUCTURES
KERNELSFORSTRUCTURES
Abstract Syntax Tree (AST)
Tree structure representing the syntactic structure of
the different instructions of a program (function)
Program Dependencies Graph (PDG)
(Directed) Graph structure representing the relationship
among the different statement of a program
Computation of the dot product between (Graph) Structures
K( ),
103. <
x y = =
x +
x 1
y -
y 1
while
block
while
block
block
if
>
b a = =
a +
a 1
b -
b 1
>
b 0 =
c 3
CODE AST
KERNELFORCLONES
104. <
x y = =
x +
x 1
y -
y 1
while
block
while
block
block
if
>
b a = =
a +
a 1
b -
b 1
>
b 0 =
c 3
CODE AST AST KERNEL
KERNELFORCLONES
<
block
while
= =
block
=
y -
=
x +
+
x 1
-
y 1
<
x y
>
b 0 =
c 3
if
block
>
b a
-
b 1
<
block
while
+
a 1
=
b -
=
a +
109. while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
Instruction (I)
i.e., FOR, IF, WHILE, RETURN
Context (C)
i.e., Instruction Class of
the closer statement node
Lexemes (Ls)
Lexical information gathered
(recursively) from leaves
110. while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES
IC = Conditional-Expr
I = Less-operator
C = Loop
Ls= [x,y]
IC = Loop
I = while-loop
C = Function-Body
Ls= [x, y]
Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
Instruction (I)
i.e., FOR, IF, WHILE, RETURN
Context (C)
i.e., Instruction Class of
the closer statement node
Lexemes (Ls)
Lexical information gathered
(recursively) from leaves
IC = Block
I = while-body
C = Loop
Ls= [ x ]
111. CLONE DETECTION
• Comparison with another (pure) AST-based clone detector
• Comparison on a system with randomly seeded clones
0
0.25
0.5
0.75
1
Precision Recall F-measure
CloneDigger Tree Kernel Tool
RE
SULTS
Results refer to clones where code
fragments have been modified by adding/
removing or changing code statements
117. CONCLUSIONS
FUTURE RESEARCH DIRECTIONS
• Re-modularization:
• Integrate code normalization algorithm (LINSEN) to lexical analysis
• Code Normalization:
• Investigate the contributions of different sources of information
used for disambiguations
• Clone Detection:
• Combine Static (and Kernel methods) with Dynamic Analysis
118. THANK YOU
June 5th, 2013 - Ph.D. Defense
Valerio Maggio
Ph.D. Student, University of Naples “Federico II”
valerio.maggio@unina.it