Source code summarization is a process of generating summaries that describe software code, the majority of source code summarization usually generated manually, where the summaries are written by software developers. Recently, new automated approaches are becoming more useful. These approaches have been found to be effective in some cases. The main weaknesses of these approaches are that they never exploit code dependencies and summarize either the software classes or methods but not both. This paper proposes a source code summarization approach (Suncode) that produces a short description for each class and method in the software system. To validate the approach, it has been applied to several case studies. Moreover, the generated summaries are compared to summaries that written by human experts and to summaries that written by a state-of-the-art solution. Results of this paper found that Suncode summaries provide better information about code dependencies comparing with other studies. In addition, Suncode summaries can improve and support the current software documentation. The results found that manually written summaries were more precise and short as well.
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT IAEME Publication
The major part of risk the development of software orprograms is existence ofduplicate code that can affect the software maintainability. The main aim of Clone
identification technique is to search and detect the parts of the software code which is
identical. In the passed there are various techniques that are used to identify andreflect the code identity and code fragments.Code cloning reduces the time and effort of the softwaredeveloper but it alsodecreases the quality of the software like readability, changeability and increasesmaintainability. So, code clone has to be detected to reducethe cost of maintenance tosome extent. In this paper, a new Generic technique is purposed to detect code clone
from various input source codes (from web, disk and etc.,) by segmenting the code intonumber of sub-programs or modules or functions. I propose a technique that candetect 1-type,2type, 3-type and 4-type clones efficiently.
Tracing Requirements as a Problem of Machine Learning ijseajournal
Software requirement engineering and evolution essential to software development process, which defines and elaborates what is to be built in a project. Requirements are mostly written in text and will later evolve to fine-grained and actionable artifacts with details about system configurations, technology stacks, etc. Tracing the evolution of requirements enables stakeholders to determine the origin of each requirement and
understand how well the software’s design reflects to its requirements. Reckoning requirements traceability
is not a trivial task, a machine learning approach is used to classify traceability between various associated requirements. In particular, a 2-learner, ontology-based, pseudo-instances-enhanced approach, where two classifiers are trained to separately exploit two types of features, lexical features and features derived from a hand-built ontology, is investigated for such task. The hand-built ontology is also leveraged to generate
pseudo training instances to improve machine learning results. In comparison to a supervised baseline system that uses only lexical features, our approach yields a relative error reduction of 56.0%. Most interestingly, results do not deteriorate when the hand-built ontology is replaced with its automatically
constructed counterpart.
GENERIC CODE CLONING METHOD FOR DETECTION OF CLONE CODE IN SOFTWARE DEVELOPMENT IAEME Publication
The major part of risk the development of software orprograms is existence ofduplicate code that can affect the software maintainability. The main aim of Clone
identification technique is to search and detect the parts of the software code which is
identical. In the passed there are various techniques that are used to identify andreflect the code identity and code fragments.Code cloning reduces the time and effort of the softwaredeveloper but it alsodecreases the quality of the software like readability, changeability and increasesmaintainability. So, code clone has to be detected to reducethe cost of maintenance tosome extent. In this paper, a new Generic technique is purposed to detect code clone
from various input source codes (from web, disk and etc.,) by segmenting the code intonumber of sub-programs or modules or functions. I propose a technique that candetect 1-type,2type, 3-type and 4-type clones efficiently.
Tracing Requirements as a Problem of Machine Learning ijseajournal
Software requirement engineering and evolution essential to software development process, which defines and elaborates what is to be built in a project. Requirements are mostly written in text and will later evolve to fine-grained and actionable artifacts with details about system configurations, technology stacks, etc. Tracing the evolution of requirements enables stakeholders to determine the origin of each requirement and
understand how well the software’s design reflects to its requirements. Reckoning requirements traceability
is not a trivial task, a machine learning approach is used to classify traceability between various associated requirements. In particular, a 2-learner, ontology-based, pseudo-instances-enhanced approach, where two classifiers are trained to separately exploit two types of features, lexical features and features derived from a hand-built ontology, is investigated for such task. The hand-built ontology is also leveraged to generate
pseudo training instances to improve machine learning results. In comparison to a supervised baseline system that uses only lexical features, our approach yields a relative error reduction of 56.0%. Most interestingly, results do not deteriorate when the hand-built ontology is replaced with its automatically
constructed counterpart.
NL based Object Oriented modeling - EJSR 35(1)IT Industry
Imran Sarwar Bajwa, Shahzad Mumtaz, Ali Samad [2009], "Object Oriented Software Modeling using NLP Based Knowledge Extraction", European Journal of Scientific Research, Aug 2009, Vol. 35 No. 01, pp:22-33
The Plagiarism Detection Systems for Higher Education - A Case Study in Saudi...ijseajournal
Plagiarism, cheating, and other types of academic misconduct are critical issues in higher education. In
this study, we conducted two questionnaires, one for Saudi universities and another for Saudi students at
different Saudi universities to investigate their beliefs and perceptions about plagiarism tools.
The first questionnaire was conducted to investigate to which degree the Saudi universities use plagiarism
tools. Four universities responded to our questionnaire (KSU, Immamu, PSAU, and Shaqra University).
The second questionnaire was used to investigate the user perceptions toward plagiarism tools in their
universities. Forty students responded to this questionnaire. Each student was answered 20 questions. Part
of these questions is to measure the confidence of students in terms of referencing; another part is to
measure the overall confidence to the system and evaluation of the students’ experience. The responses
indicate that the respondents believe that some questions reflect their own cases with the plagiarism during
their educational lifetime.
AUTOMATED SQL QUERY GENERATOR BY UNDERSTANDING A NATURAL LANGUAGE STATEMENTijnlc
This project aims to develop a system which converts a natural language statement into MySQL query to retrieve information from respective database. The system mainly focuses on creation of complex queries which includes nested queries with more than two-level depth, queries with aggregate functions, having clause, group by clause and co-related queries which are formed due to constraint on aggregate function. The natural language input statement taken from the user is passed through various OpenNLP natural language processing techniques like Tokenization, Parts of Speech Tagging, Stemming and Lemmatization to get the statement in the desired form. The statement is further processed to extract the type of query, the basic clause, which specifies the required entities from the database and the condition clause, which specifies constraints on the basic clause. The final query is generated by converting the basic and condition clauses to their query form and then concatenating the condition query to the basic query. Currently, the system works only with MySQL database.
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...CSCJournals
Digital text documents have become a significantly important part on the Internet. A large number of users are attracted towards this digital form of text documents. But some security threats also arise concurrently. The digital libraries offer effective ways to access educational materials, government e-documents, financial documents, social media contents and many others. However content authorship and tamper detection of all these digital text documents require special attention. Till now, considerably very few digital watermarking techniques exist for text documents. In this paper, we propose a method for effective watermarking of Hindi language text documents. Hindi stands second among all languages across the world. It has widespread availability of its digital contents of various types. In proposed technique, the watermark is logically embedded in the text using 'swar' (vowel) as a special feature of the Hindi language, supported by suitable encryption. In extraction phase the Certificate Authority (CA) plays an important role in the authorship protection process as a trusted third party. The text is decrypted and watermark is extracted to prove genuine authorship. Our technique has been tested for various types of feasible text attacks with different embedding frequency.
Introduction: The Structure of Complex systems, The Inherent Complexity of Software, Attributes of Complex System, Organized and Disorganized Complexity, Bringing Order to Chaos, Designing Complex Systems
Imran Sarwar Bajwa, [2010], "Markov Logics Based Automated Business Requirements Analysis", in International Journal of Computer and Electrical Engineering (IJCEE) 2(3) pp:481-485, June 2010
Summarization of software artifacts is an ongoing field of research among the software engineering community due to the benefits that summarization provides like saving of time and efforts in various software engineering tasks like code search, duplicate bug reports detection, traceability link recovery, etc. Summarization is to produce short and concise summaries. The paper presents the review of the state of the art of summarization techniques in software engineering context. The paper gives a brief overview to the software artifacts which are mostly used for summarization or have benefits from summarization. The paper briefly describes the general process of summarization. The paper reviews the papers published from 2010 to June 2017 and classifies the works into extractive and abstractive summarization. The paper also reviews the evaluation techniques used for summarizing software artifacts. The paper discusses the open problems and challenges in this field of research. The paper also discusses the future scopes in this area for new researchers.
Domain Specific Terminology Extraction (ICICT 2006)IT Industry
Imran Sarwar Bajwa, M. Imran Siddique, M. Abbas Choudhary, [2006], "Automatic Domain Specific Terminology Extraction using a Decision Support System", in IEEE 4th International Conference on Information and Communication Technology (ICICT 2006), Cairo, Egypt. pp:651-659
Automated Java Code Generation (ICDIM 2006)IT Industry
Imran Sarwar Bajwa, Imran Siddique, M. Abbas Choudhary, [2006], "Rule Based Production System for Automatic Code Generation in Java", in IEEE 1st International Conference on Digital Information Management (ICDIM 2006), Bangalore, India, Dec 2006, pp:300-305
Keyword Extraction Based Summarization of Categorized Kannada Text Documents ijsc
The internet has caused a humongous growth in the number of documents available online. Summaries of documents can help find the right information and are particularly effective when the document base is very large. Keywords are closely associated to a document as they reflect the document's content and act as indices for a given document. In this work, we present a method to produce extractive summaries of documents in the Kannada language, given number of sentences as limitation. The algorithm extracts key words from pre-categorized Kannada documents collected from online resources. We use two feature selection techniques for obtaining features from documents, then we combine scores obtained by GSS (Galavotti, Sebastiani, Simi) coefficients and IDF (Inverse Document Frequency) methods along with TF (Term Frequency) for extracting key words and later use these for summarization based on rank of the sentence. In the current implementation, a document from a given category is selected from our database and depending on the number of sentences given by the user, a summary is generated.
Single document keywords extraction in Bahasa Indonesia using phrase chunkingTELKOMNIKA JOURNAL
Keywords help readers to understand the idea of a document quickly. Unfortunately, considerable time and effort are often needed to come up with a good set of keywords manually. This research focused on generating keywords from a document automatically using phrase chunking. Firstly, we collected part of speech patterns from a collection of documents. Secondly, we used those patterns to extract candidate keywords from the abstract and the content of a document. Finally, keywords are selected from the candidates based on the number of words in the keyword phrases and some scenarios involving candidate reduction and sorting. We evaluated the result of each scenario using precision, recall, and F-measure. The experiment results show: i) shorter-phrase keywords with string reduction extracted from the abstract and sorted by frequency provides the highest score, ii) in every proposed scenario, extracting keywords using the abstract always presents a better result, iii) using shorter-phrase patterns in keywords extraction gives better score in comparison to using all phrase patterns, iv) sorting scenarios based on the multiplication of candidate frequencies and the weight of the phrase patterns offer better results.
Source code summarization is a process of generating summaries that describes software code, the majority of source code summarization usually generated manually, where the summaries are written by software developers. Recently, new automated approaches are becoming more useful. These approaches have been found to be effective in some cases. The main weaknesses of these approaches are that they never exploit code dependencies and summarize either the software classes or methods but not both. This paper proposes a source code summarization approach (Suncode) that produces a short description for each class and method in the software system. To validate the approach, it has been applied on several case studies. Moreover, the generated summaries are compared to summaries that written by human experts and to summaries that written by a state-of-the-art solution. Results of this paper found that Suncode summaries provide better information about code dependencies comparing with other studies. In addition, Suncode summaries can improve and support the current software documentation. The results found that manually written summaries were more precise and short as well.
Programmer Productivity Enhancement Through Controlled Natural Language Inputijseajournal
We have created CABERNET, a Controlled Nature Language (CNL) based approach to program creation. CABERNET allows programmers to use a simple outline-based syntax. This allows increased programmer efficiency and syntax flexibility. CNLs have successfully been used for writing requirements documents. We propose taking this approach well beyond this to fully functional programs. Through the use of heuristics and inference to analyze and determine the programmer’s intent we are able to create fully functional mobile applications. The goal is for programs to be aligned with the way that the humans think rather than the way computers process information. Through the use of templates a CABERNET application can be processed to run on multiple run time environments. Because processing of a CABERNET program file results in native application program performance is maintained.
This paper presents a vision on how the software development process could be a fully unified mechanized
process by getting benefits from the advances of Natural Language Processing and Program Synthesis
fields. The process begins from requirements written in natural language that is translated to sentences in
logical form. A program synthesizer gets those sentences in logical form (the translator's outcome) and
generates a source code. Finally, a compiler produces ready-to-run software. To find out how the building
blocks of our proposed approach works, we conducted an exploratory research on the literature in the
fields of Requirements Engineering, Natural Language Processing, and Program Synthesis. Currently, this
approach is difficult to accomplish in a fully automatic way due to the ambiguities inherent in natural
language, the reasoning context of the software, and the program synthesizer limitations in generating a
source code from logic.
NL based Object Oriented modeling - EJSR 35(1)IT Industry
Imran Sarwar Bajwa, Shahzad Mumtaz, Ali Samad [2009], "Object Oriented Software Modeling using NLP Based Knowledge Extraction", European Journal of Scientific Research, Aug 2009, Vol. 35 No. 01, pp:22-33
The Plagiarism Detection Systems for Higher Education - A Case Study in Saudi...ijseajournal
Plagiarism, cheating, and other types of academic misconduct are critical issues in higher education. In
this study, we conducted two questionnaires, one for Saudi universities and another for Saudi students at
different Saudi universities to investigate their beliefs and perceptions about plagiarism tools.
The first questionnaire was conducted to investigate to which degree the Saudi universities use plagiarism
tools. Four universities responded to our questionnaire (KSU, Immamu, PSAU, and Shaqra University).
The second questionnaire was used to investigate the user perceptions toward plagiarism tools in their
universities. Forty students responded to this questionnaire. Each student was answered 20 questions. Part
of these questions is to measure the confidence of students in terms of referencing; another part is to
measure the overall confidence to the system and evaluation of the students’ experience. The responses
indicate that the respondents believe that some questions reflect their own cases with the plagiarism during
their educational lifetime.
AUTOMATED SQL QUERY GENERATOR BY UNDERSTANDING A NATURAL LANGUAGE STATEMENTijnlc
This project aims to develop a system which converts a natural language statement into MySQL query to retrieve information from respective database. The system mainly focuses on creation of complex queries which includes nested queries with more than two-level depth, queries with aggregate functions, having clause, group by clause and co-related queries which are formed due to constraint on aggregate function. The natural language input statement taken from the user is passed through various OpenNLP natural language processing techniques like Tokenization, Parts of Speech Tagging, Stemming and Lemmatization to get the statement in the desired form. The statement is further processed to extract the type of query, the basic clause, which specifies the required entities from the database and the condition clause, which specifies constraints on the basic clause. The final query is generated by converting the basic and condition clauses to their query form and then concatenating the condition query to the basic query. Currently, the system works only with MySQL database.
Robust Text Watermarking Technique for Authorship Protection of Hindi Languag...CSCJournals
Digital text documents have become a significantly important part on the Internet. A large number of users are attracted towards this digital form of text documents. But some security threats also arise concurrently. The digital libraries offer effective ways to access educational materials, government e-documents, financial documents, social media contents and many others. However content authorship and tamper detection of all these digital text documents require special attention. Till now, considerably very few digital watermarking techniques exist for text documents. In this paper, we propose a method for effective watermarking of Hindi language text documents. Hindi stands second among all languages across the world. It has widespread availability of its digital contents of various types. In proposed technique, the watermark is logically embedded in the text using 'swar' (vowel) as a special feature of the Hindi language, supported by suitable encryption. In extraction phase the Certificate Authority (CA) plays an important role in the authorship protection process as a trusted third party. The text is decrypted and watermark is extracted to prove genuine authorship. Our technique has been tested for various types of feasible text attacks with different embedding frequency.
Introduction: The Structure of Complex systems, The Inherent Complexity of Software, Attributes of Complex System, Organized and Disorganized Complexity, Bringing Order to Chaos, Designing Complex Systems
Imran Sarwar Bajwa, [2010], "Markov Logics Based Automated Business Requirements Analysis", in International Journal of Computer and Electrical Engineering (IJCEE) 2(3) pp:481-485, June 2010
Summarization of software artifacts is an ongoing field of research among the software engineering community due to the benefits that summarization provides like saving of time and efforts in various software engineering tasks like code search, duplicate bug reports detection, traceability link recovery, etc. Summarization is to produce short and concise summaries. The paper presents the review of the state of the art of summarization techniques in software engineering context. The paper gives a brief overview to the software artifacts which are mostly used for summarization or have benefits from summarization. The paper briefly describes the general process of summarization. The paper reviews the papers published from 2010 to June 2017 and classifies the works into extractive and abstractive summarization. The paper also reviews the evaluation techniques used for summarizing software artifacts. The paper discusses the open problems and challenges in this field of research. The paper also discusses the future scopes in this area for new researchers.
Domain Specific Terminology Extraction (ICICT 2006)IT Industry
Imran Sarwar Bajwa, M. Imran Siddique, M. Abbas Choudhary, [2006], "Automatic Domain Specific Terminology Extraction using a Decision Support System", in IEEE 4th International Conference on Information and Communication Technology (ICICT 2006), Cairo, Egypt. pp:651-659
Automated Java Code Generation (ICDIM 2006)IT Industry
Imran Sarwar Bajwa, Imran Siddique, M. Abbas Choudhary, [2006], "Rule Based Production System for Automatic Code Generation in Java", in IEEE 1st International Conference on Digital Information Management (ICDIM 2006), Bangalore, India, Dec 2006, pp:300-305
Keyword Extraction Based Summarization of Categorized Kannada Text Documents ijsc
The internet has caused a humongous growth in the number of documents available online. Summaries of documents can help find the right information and are particularly effective when the document base is very large. Keywords are closely associated to a document as they reflect the document's content and act as indices for a given document. In this work, we present a method to produce extractive summaries of documents in the Kannada language, given number of sentences as limitation. The algorithm extracts key words from pre-categorized Kannada documents collected from online resources. We use two feature selection techniques for obtaining features from documents, then we combine scores obtained by GSS (Galavotti, Sebastiani, Simi) coefficients and IDF (Inverse Document Frequency) methods along with TF (Term Frequency) for extracting key words and later use these for summarization based on rank of the sentence. In the current implementation, a document from a given category is selected from our database and depending on the number of sentences given by the user, a summary is generated.
Single document keywords extraction in Bahasa Indonesia using phrase chunkingTELKOMNIKA JOURNAL
Keywords help readers to understand the idea of a document quickly. Unfortunately, considerable time and effort are often needed to come up with a good set of keywords manually. This research focused on generating keywords from a document automatically using phrase chunking. Firstly, we collected part of speech patterns from a collection of documents. Secondly, we used those patterns to extract candidate keywords from the abstract and the content of a document. Finally, keywords are selected from the candidates based on the number of words in the keyword phrases and some scenarios involving candidate reduction and sorting. We evaluated the result of each scenario using precision, recall, and F-measure. The experiment results show: i) shorter-phrase keywords with string reduction extracted from the abstract and sorted by frequency provides the highest score, ii) in every proposed scenario, extracting keywords using the abstract always presents a better result, iii) using shorter-phrase patterns in keywords extraction gives better score in comparison to using all phrase patterns, iv) sorting scenarios based on the multiplication of candidate frequencies and the weight of the phrase patterns offer better results.
Source code summarization is a process of generating summaries that describes software code, the majority of source code summarization usually generated manually, where the summaries are written by software developers. Recently, new automated approaches are becoming more useful. These approaches have been found to be effective in some cases. The main weaknesses of these approaches are that they never exploit code dependencies and summarize either the software classes or methods but not both. This paper proposes a source code summarization approach (Suncode) that produces a short description for each class and method in the software system. To validate the approach, it has been applied on several case studies. Moreover, the generated summaries are compared to summaries that written by human experts and to summaries that written by a state-of-the-art solution. Results of this paper found that Suncode summaries provide better information about code dependencies comparing with other studies. In addition, Suncode summaries can improve and support the current software documentation. The results found that manually written summaries were more precise and short as well.
Programmer Productivity Enhancement Through Controlled Natural Language Inputijseajournal
We have created CABERNET, a Controlled Nature Language (CNL) based approach to program creation. CABERNET allows programmers to use a simple outline-based syntax. This allows increased programmer efficiency and syntax flexibility. CNLs have successfully been used for writing requirements documents. We propose taking this approach well beyond this to fully functional programs. Through the use of heuristics and inference to analyze and determine the programmer’s intent we are able to create fully functional mobile applications. The goal is for programs to be aligned with the way that the humans think rather than the way computers process information. Through the use of templates a CABERNET application can be processed to run on multiple run time environments. Because processing of a CABERNET program file results in native application program performance is maintained.
This paper presents a vision on how the software development process could be a fully unified mechanized
process by getting benefits from the advances of Natural Language Processing and Program Synthesis
fields. The process begins from requirements written in natural language that is translated to sentences in
logical form. A program synthesizer gets those sentences in logical form (the translator's outcome) and
generates a source code. Finally, a compiler produces ready-to-run software. To find out how the building
blocks of our proposed approach works, we conducted an exploratory research on the literature in the
fields of Requirements Engineering, Natural Language Processing, and Program Synthesis. Currently, this
approach is difficult to accomplish in a fully automatic way due to the ambiguities inherent in natural
language, the reasoning context of the software, and the program synthesizer limitations in generating a
source code from logic.
‘CodeAliker’ - Plagiarism Detection on the Cloud acijjournal
Plagiarism is a burning problem that academics have been facing in all of the varied levels of the educational system. With the advent of digital content, the challenge to ensure the integrity of academic work has been amplified. This paper discusses on defining a precise definition of plagiarized computer code, various solutions available for detecting plagiarism and building a cloud platform for plagiarism disclosure.
‘CodeAliker’, our application thus developed automates the submission of assignments and the review process associated for essay text as well as computer code. It has been made available under the GNU’s General Public License as a Free and Open Source Software.
AGILE SOFTWARE ARCHITECTURE INGLOBAL SOFTWARE DEVELOPMENT ENVIRONMENT:SYSTEMA...ijseajournal
In recent years, software development companies started to adopt Global Software Development (GSD) to
explore the benefits of this approach, mainly cost reduction. However, the GSD environment also brings
more complexity and challenges. Some challenges are related to communication aspects like cultural differences, time zone, and language. This paper is the first step in an extensive study to understand if the
software architecture can ease communication in GSD environments. We conducted a Systematic Literature Mapping (SLM) to catalog relevant studies about software architecture and GSD teams and identify
potential practices for use in the software industry. This paper’s findings contribute to the GSD body of
knowledge by exploring the impact of software architecture strategy on the GSD environment. It presents
hypotheses regarding the relationship between software architecture and GSD challenges, which will guide
future research.
Summarization of software artifacts is an ongoing field of research among the software engineering community due to the benefits that summarization provides like saving of time and efforts in various software engineering tasks like code search, duplicate bug reports detection, traceability link recovery, etc. Summarization is to produce short and concise summaries. The paper presents the review of the state of the art of summarization techniques in software engineering context. The paper gives a brief overview to the software artifacts which are mostly used for summarization or have benefits from summarization. The paper briefly describes the general process of summarization. The paper reviews the papers published from 2010 to June 2017 and classifies the works into extractive and abstractive summarization. The paper also reviews the evaluation techniques used for summarizing software artifacts. The paper discusses the open problems and challenges in this field of research. The paper also discusses the future scopes in this area for new researchers.
SoftCloud: A Tool for Visualizing Software Artifacts as Tag Clouds.pdfRa'Fat Al-Msie'deen
Software artifacts visualization helps software developers to manage the size and complexity of the software system. The tag cloud technique visualizes tags within the cloud according to their frequencies in software artifacts. A font size of the tag within the cloud indicates its frequency within a software artifact, while the color of a tag within the cloud uses just for aesthetic purposes. This paper suggests a new approach (SoftCloud) to visualize software artifacts as a tag cloud. The originality of SoftCloud is visualizing all the artifacts available to the software program as a tag cloud. Experiments have conducted on different software artifacts to validate SoftCloud and demonstrate its strengths. The results showed the ability of SoftCloud to correctly retrieve all tags and their frequencies from available software artifacts.
International Journal of Computational Engineering Research(IJCER)ijceronline
International Journal of Computational Engineering Research(IJCER) is an intentional online Journal in English monthly publishing journal. This Journal publish original research work that contributes significantly to further the scientific knowledge in engineering and Technology.
A Survey on Design Pattern Detection ApproachesCSCJournals
Design patterns play a key role in software development process. The interest in extracting design pattern instances from object-oriented software has increased tremendously in the last two decades. Design patterns enhance program understanding, help to document the systems and capture design trade-offs.
This paper provides the current state of the art in design patterns detection. The selected approaches cover the whole spectrum of the research in design patterns detection. We noticed diverse accuracy values extracted by different detection approaches. The lessons learned are listed at the end of this paper, which can be used for future research directions and guidelines in the area of design patterns detection.
Visualizing Object-oriented Software for Understanding and Documentation Ra'Fat Al-Msie'deen
Understanding or comprehending source code is one of
the core activities of software engineering. Understanding object-oriented source code is essential and required when a programmer maintains, migrates, reuses, documents or enhances source code. The source code that is not comprehended cannot be changed. The comprehension of object-oriented source code is a difficult problem solving process. In order to document object-oriented software system there are needs to understand its source code. To do so, it is necessary to mine source code dependencies in addition to quantitative information in source code such as the
number of classes. This paper proposes an automatic approach, which aims to document object-oriented software by visualizing its source code. The design of the object-oriented source code and its main characteristics are represented in the visualization. Package content, class information, relationships between classes, dependencies between methods and software metrics is displayed. The extracted views are very helpful to understand and document the object-oriented software. The novelty of this approach is the exploiting of code dependencies and quantitative information in source code to document object-oriented software efficiently by means of a set of graphs. To validate the approach, it has been applied to several case studies. The results of this evaluation showed that most of the object-oriented software systems have been documented correctly.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Design of optimal search engine using text summarization through artificial i...TELKOMNIKA JOURNAL
Natural language processing is the trending topic in the latest research areas, which allows the developers to create the human-computer interactions to come into existence. The natural language processing is an integration of artificial intelligence, computer science and computer linguistics. The research towards natural Language Processing is focused on creating innovations towards creating the devices or machines which operates basing on the single command of a human. It allows various Bot creations to innovate the instructions from the mobile devices to control the physical devices by allowing the speech-tagging. In our paper, we design a search engine which not only displays the data according to user query but also performs the detailed display of the content or topic user is interested for using the summarization concept. We find the designed search engine is having optimal response time for the user queries by analyzing with number of transactions as inputs. Also, the result findings in the performance analysis show that the text summarization method has been an efficient way for improving the response time in the search engine optimizations.
Similar to Supporting software documentation with source code summarization (20)
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug Reports.pdfRa'Fat Al-Msie'deen
A Bug Tracking System (BTS), such as Bugzilla, is generally utilized to track submitted Bug Reports (BRs) for a particular software system. Duplicate Bug Report (DBR) retrieval is the process of obtaining a DBR in the BTS. This process is important to avoid needless work from engineers on DBRs. To prevent wasting engineer resources, such as effort and time, on previously submitted (or duplicate) BRs, it is essential to find and retrieve DBRs as soon as they are submitted by software users. Thus, this paper proposes an automatic approach (called BushraDBR) that aims to assist an engineer (called a triager) to retrieve DBRs and stop the duplicates before they start. Where BushraDBR stands for Bushra Duplicate Bug Reports retrieval process. Therefore, when a new BR is sent to the Bug Repository (BRE), an engineer checks whether it is a duplicate of an existing BR in BRE or not via BushraDBR approach. If it is, the engineer marks it as DBR, and the BR is excluded from consideration for any additional work; otherwise, the BR is added to the BRE. BushraDBR approach relies on Textual Similarity (TS) between the newly submitted BR and the rest of the BRs in BRE to retrieve DBRs. BushraDBR exploits unstructured data from BRs to apply Information Retrieval (IR) methods in an efficient way. BushraDBR approach uses two techniques to retrieve DBRs: Latent Semantic Indexing (LSI) and Formal Concept Analysis (FCA). The originality of BushraDBR is to stop DBRs before they occur by comparing the newly reported BR with the rest of the BRs in the BTS, thus saving time and effort during the Software Maintenance (SM) process. BushraDBR also uniquely retrieves DBR through the use of LSI and FCA techniques. BushraDBR approach had been validated and evaluated on several publicly available data sets from Bugzilla. Experiments show the ability of BushraDBR approach to retrieve DBRs in an efficient and accurate manner.
Software evolution understanding: Automatic extraction of software identifier...Ra'Fat Al-Msie'deen
Software companies usually develop a set of product variants within the same family that share certain functions and differ in others. Variations across software variants occur to meet different customer requirements. Thus, software product variants evolve overtime to cope with new requirements. A software engineer who deals with this family may find it difficult to understand the evolution scenarios that have taken place over time. In addition, software identifier names are important resources to understand the evolution scenarios in this family. This paper introduces an automatic approach called Juana’s approach to detect the evolution scenario across two product variants at the source code level and identifies the common and unique software identifier names across software variants source code. Juana’s approach refers to common and unique identifier names as a software identifiers map and computes it by comparing software variants to each other. Juana considers all software identifier names such as package, class, attribute, and method. The novelty of this approach is that it exploits common and unique identifier names across the source code of software variants, to understand the evolution scenarios across software family in an efficient way. For validity, Juana was applied on ArgoUML and Mobile Media software variants. The results of this evaluation validate the relevance and the performance of the approach as all evolution scenarios were correctly detected via a software identifiers map.
BushraDBR: An Automatic Approach to Retrieving Duplicate Bug ReportsRa'Fat Al-Msie'deen
A Bug Tracking System (BTS), such as Bugzilla, is generally utilized to track submitted Bug Reports (BRs) for a particular software system. Duplicate Bug Report (DBR) retrieval is the process of obtaining a DBR in the BTS. This process is important to avoid needless work from engineers on DBRs. To prevent wasting engineer resources, such as effort and time, on previously submitted (or duplicate) BRs, it is essential to find and retrieve DBRs as soon as they are submitted by software users. Thus, this paper proposes an automatic approach (called BushraDBR) that aims to assist an engineer (called a triager) to retrieve DBRs and stop the duplicates before they start. Where BushraDBR stands for Bushra Duplicate Bug Reports retrieval process. Therefore, when a new BR is sent to the Bug Repository (BRE), an engineer checks whether it is a duplicate of an existing BR in BRE or not via BushraDBR approach. If it is, the engineer marks it as DBR, and the BR is excluded from consideration for any additional work; otherwise, the BR is added to the BRE. BushraDBR approach relies on Textual Similarity (TS) between the newly submitted BR and the rest of the BRs in BRE to retrieve DBRs. BushraDBR exploits unstructured data from BRs to apply Information Retrieval (IR) methods in an efficient way. BushraDBR approach uses two techniques to retrieve DBRs: Latent Semantic Indexing (LSI) and Formal Concept Analysis (FCA). The originality of BushraDBR is to stop DBRs before they occur by comparing the newly reported BR with the rest of the BRs in the BTS, thus saving time and effort during the Software Maintenance (SM) process. BushraDBR also uniquely retrieves DBR through the use of LSI and FCA techniques. BushraDBR approach had been validated and evaluated on several publicly available data sets from Bugzilla. Experiments show the ability of BushraDBR approach to retrieve DBRs in an efficient and accurate manner.
FeatureClouds: Naming the Identified Feature Implementation Blocks from Softw...Ra'Fat Al-Msie'deen
Identifying software identifiers that implement a particular feature of a software product is known as feature identification. Feature identification is one of the most critical and popular processes performed by software engineers during software maintenance activity. However, a meaningful name must be assigned to the Identified Feature Implementation Block (IFIB) to complete the feature identification process. The feature naming process remains a challenging task, where the majority of existing approaches manually assign the name of the IFIB. In this paper, the approach called FeatureClouds was proposed, which can be exploited by software developers to name the IFIBs from software code. FeatureClouds approach incorporates word clouds visualization technique to name Feature Blocks (FBs) by using the most frequent words across these blocks. FeatureClouds had evaluated by assessing its added benefit to the current approaches in the literature, where limited tool support was supplied to software developers to distinguish feature names of the IFIBs. For validity, FeatureClouds had applied to draw shapes and ArgoUML software. The findings showed that the proposed approach achieved promising results according to well-known metrics in terms of Precision and Recall.
YamenTrace: Requirements Traceability - Recovering and Visualizing Traceabili...Ra'Fat Al-Msie'deen
YamenTrace - Requirements Traceability: Recovering and Visualizing Traceability Links Between Requirements and Source Code of Object-oriented Software Systems (Presentation 2023)
Automatic Labeling of the Object-oriented Source Code: The Lotus ApproachRa'Fat Al-Msie'deen
Most of open-source software systems become available on the internet today. Thus, we need automatic methods to label software code. Software code can be labeled with a set of keywords. These keywords in this paper referred as software labels. The goal of this paper is to provide a quick view of the software code vocabulary. This paper proposes an automatic approach to document the object-oriented software by labeling its code. The approach exploits all software identifiers to label software code. The paper presents the results of study conducted on the ArgoUML and drawing shapes case studies. Results showed that all code labels were correctly identified.
Constructing a software requirements specification and design for electronic ...Ra'Fat Al-Msie'deen
Requirements engineering process intends to obtain software services and constraints. This process is essential to meet the customer's needs and expectations. This process includes three main activities in general. These are detecting requirements by interacting with software stakeholders, transferring these requirements into a standard document, and examining that the requirements really define the software that the client needs. Functional requirements are services that the software should deliver to the end-user. In addition, functional requirements describe how the software should respond to specific inputs, and how the software should behave in certain circumstances. This paper aims to develop a software requirements specification document of the electronic IT news magazine system. The electronic magazine provides users to post and view up-to-date IT news. Still, there is a lack in the literature of comprehensive studies about the construction of the electronic magazine software specification and design in conformance with the contemporary software development processes. Moreover, there is a need for a suitable research framework to support the requirements engineering process. The novelty of this paper is the construction of software specification and design of the electronic magazine by following the Al-Msie'deen research framework. All the documents of software requirements specification and design have been constructed to conform to the agile usage-centered design technique and the proposed research framework. A requirements specification and design are suggested and followed for the construction of the electronic magazine software. This study proved that involving users extensively in the process of software requirements specification and design will lead to the creation of dependable and acceptable software systems.
Detecting commonality and variability in use-case diagram variantsRa'Fat Al-Msie'deen
The use-case diagram is a software artifact. Thus, as with any software artifact, the use-case diagrams change across time through the software development life cycle. Therefore, several versions of the same diagram are existed at distinct times. Thus, comparing all use-case diagram variants to detect common and variable use-cases becomes one of the main challenges in the product line reengineering field. The contribution of this paper is to suggest an automatic approach to compare a collection of use-case diagram variants and detect both commonality and variability. In our work, every use-case represents a feature. The proposed approach visualizes the detected features using formal concept analysis, where common and variable features are introduced to software engineers. The proposed approach was applied on a mobile media case study to be validated. The findings confirm the importance and the performance of the suggested approach as all common and variable features were precisely detected via formal concept analysis and latent semantic indexing.
Naming the Identified Feature Implementation Blocks from Software Source CodeRa'Fat Al-Msie'deen
Identifying software identifiers that implement a particular feature of a software product is known as feature identification. Feature identification is one of the most critical and popular processes performed by software engineers during software maintenance activity. However, a meaningful name must be assigned to the Identified Feature Implementation Block (IFIB) to complete the feature identification process. The feature naming process remains a challenging task, where the majority of existing approaches manually assign the name of the IFIB. In this paper, the approach called FeatureClouds was proposed, which can be exploited by software developers to name the IFIBs from software code. FeatureClouds approach incorporates word clouds visualization technique to name Feature Blocks (FBs) by using the most frequent words across these blocks. FeatureClouds had evaluated by assessing its added benefit to the current approaches in the literature, where limited tool support was supplied to software developers to distinguish feature names of the IFIBs. For validity, FeatureClouds had applied to draw shapes and ArgoUML software. The findings showed that the proposed approach achieved promising results according to well-known metrics in terms of Precision and Recall.
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxrickgrimesss22
Discover the essential features to incorporate in your Winzo clone app to boost business growth, enhance user engagement, and drive revenue. Learn how to create a compelling gaming experience that stands out in the competitive market.
Do you want Software for your Business? Visit Deuglo
Deuglo has top Software Developers in India. They are experts in software development and help design and create custom Software solutions.
Deuglo follows seven steps methods for delivering their services to their customers. They called it the Software development life cycle process (SDLC).
Requirement — Collecting the Requirements is the first Phase in the SSLC process.
Feasibility Study — after completing the requirement process they move to the design phase.
Design — in this phase, they start designing the software.
Coding — when designing is completed, the developers start coding for the software.
Testing — in this phase when the coding of the software is done the testing team will start testing.
Installation — after completion of testing, the application opens to the live server and launches!
Maintenance — after completing the software development, customers start using the software.
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...Juraj Vysvader
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I didn't get rich from it but it did have 63K downloads (powered possible tens of thousands of websites).
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
In the ever-evolving landscape of technology, enterprise software development is undergoing a significant transformation. Traditional coding methods are being challenged by innovative no-code solutions, which promise to streamline and democratize the software development process.
This shift is particularly impactful for enterprises, which require robust, scalable, and efficient software to manage their operations. In this article, we will explore the various facets of enterprise software development with no-code solutions, examining their benefits, challenges, and the future potential they hold.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Mind IT Systems
Healthcare providers often struggle with the complexities of chronic conditions and remote patient monitoring, as each patient requires personalized care and ongoing monitoring. Off-the-shelf solutions may not meet these diverse needs, leading to inefficiencies and gaps in care. It’s here, custom healthcare software offers a tailored solution, ensuring improved care and effectiveness.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Łukasz Chruściel
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow the speed of any application. Symfony apps are not an exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes on your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
May Marketo Masterclass, London MUG May 22 2024.pdfAdele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
GraphSummit Paris - The art of the possible with Graph TechnologyNeo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Mobile App Development Company In Noida | Drona InfotechDrona Infotech
Looking for a reliable mobile app development company in Noida? Look no further than Drona Infotech. We specialize in creating customized apps for your business needs.
Visit Us For : https://www.dronainfotech.com/mobile-application-development/
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
Quarkus Hidden and Forbidden ExtensionsMax Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
Check out the webinar slides to learn more about how XfilesPro transforms Salesforce document management by leveraging its world-class applications. For more details, please connect with sales@xfilespro.com
If you want to watch the on-demand webinar, please click here: https://www.xfilespro.com/webinars/salesforce-document-management-2-0-smarter-faster-better/
2. Ra'Fat Al-Msie'deen, Anas H. Blasi/International Journal of Advanced and Applied Sciences, 6(1) 2019, Pages: 59-67
60
(Suncode†) to produce summaries of software
classes and methods.
The source code summarization process is a
critical issue in the software engineering domain, so
numerous methods have been suggested to
summarize the software code (cf. Section 2). The
source code summarization process is a tedious job
and costly procedure but the majority of source code
summarization generators are still manual
(McBurney and McMillan, 2014) which produced by
human experts. However, these summaries often
incomplete (Moreno et al., 2013a) and must be
updated constantly (Shi et al., 2011).
Human experts spend a lot of time reading and
browsing the software source code (Ko et al., 2006;
LaToza et al., 2006). Thus, there is a need to propose
Suncode approach to summarize the software code
and overcome the limitations of existing approaches.
In the context of this paper, summary is a text
generated from software source code contains
important information in the original text (i.e.,
source code), moreover, the summary text does not
exceed half of the original text (Radev et al., 2002).
Suncode is a unique approach, where it
summarizes the software classes and methods. Also,
it focuses on the context of the identifier. In this
paper, context means how the software identifier
(e.g., class, attribute and method) interacts with
other ones via code dependencies such as:
inheritance, method invocation and attribute access.
Suncode produces a readable English summary of
the context for each class and method in the
software system.
Several approaches have been proposed to
summarize the software source code. One approach
presented by Haiduc et al. (2010b) returns a list of
keywords from the method. A different method
presented by Sridhara et al. (2010) produces
descriptive summary comments for software
methods. Moreno et al. (2013b) presented an
approach to summarize the software classes.
McBurney and McMillan (McBurney and McMillan,
2014) suggested an approach to generate summaries
of the context of Java methods.
This paper proposes an approach to summarize
the software classes and methods based on their
textual and structural information (Suncode accepts
as input software code and produces as outputs a set
of summaries). Based on the static code analysis,
Suncode extracts the software source code. Then,
Suncode generates rapid summary messages
(McBurney and McMillan, 2014) for each class and
method and, at last, Suncode aggregates all rapid
summary messages into a single document.
Suncode approach is detailed in this paper as
follows: Section 2 discusses the related work
relevant to Suncode contributions. Section 3 shows
an overview of Suncode. Section 4 presents the
source code summarization process step-by-step.
Section 5 describes the experiments that were
† Suncode stands for Supporting Software Documentation with
Source Code Summarization.
conducted to validate Suncode proposal, while
section 6 concludes and provides perspectives for
this work.
2. Related works
This section presents the related work closest to
Suncode approach. McBurney and McMillan (2016b,
2014) presented a source code summarization
method that generates descriptions of the context of
Java methods – that is, where the method is called.
They use static code analysis. In their work, the
context of the method is limited to method
invocation. Suncode presents an automatic approach
to summarize the software classes and methods. The
context of the class is how the class interacts with
other ones. While the context of method is how the
method interacts with other methods and attributes.
Suncode helps software developers understand the
role the class (resp. method) plays in the software.
Suncode uses static code analysis.
A different work was developed by Sridhara et al.
(2010). Sridhara et al. (2010) presented a new
method to automatically produce descriptive
summary comments for software methods. Based on
the method’s signature and body, their comment
generator found the content of the summary and
produced the natural language text that summarizes
the method.
Another method offered by Haiduc et al. (2010b)
produced a list of keywords from the method code
using the vector space model. Haiduc et al. (2010b)
proposed using a vector space model method to
mine important keywords from method and
presented those keywords to software developers. In
fact, their approach helps software developers to get
some contextual information through listing the
main keywords from method code.
Moreno et al. (2013a) suggested a new method to
automatically produce natural language summaries
for software classes by using stereotype of class. In
another paper, Moreno et al. (2013b) presented an
Eclipse plugin named jSummarizer‡ to produce the
natural language summaries for software classes
based on a stereotype of class. In their work, the
produced summaries used to re-document the class
code. Moreover, the extracted summaries used to
help programmers to easier comprehend complex
software classes.
Suncode developed to document legacy software
systems. For each class and method in the software
there is a summary. The produced summary for the
class or method contains textual and structural
information based on its code. Suncode used static
code analysis to parse the software code. Suncode
accepts as inputs the software code and produces a
summary for each class and method in the software.
Al-Msie'deen (2014) approach relied on generating
rapid summary messages for each class and method
in the software. The rapid summary message is a
predefined template. This template is a natural
‡ JSummarizer: http://www.cs.wayne.edu/~severe/jsummarizer/
3. Ra'Fat Al-Msie'deen, Anas H. Blasi/International Journal of Advanced and Applied Sciences, 6(1) 2019, Pages: 59-67
61
language sentence describing a particular kind of
rapid summary message. Each template is filled in
with keywords selected from the extracted code.
Suncode approach constructs summaries by
combining the rapid summary messages.
3. Approach overview
This section presents the main ideas used in
Suncode. It also provides an overview of the source
code summarization process. Then, it introduces the
drawing shapes software that illustrates the
remaining of the paper.
Various papers in the domain of software
understanding show that software developers rely
on good software documentation. In general,
manually-written documentation is incomplete,
time-consuming and it must be updated periodically
(McBurney and McMillan, 2014). Thus, Suncode
comes to overcome these problems regarding
manually-written documentation. Suncode
automatically document software code by producing
a useful summary.
Suncode aims to provide software developers
with meaningful summary for software classes and
methods. It uses static code analysis to parse
software code. The source code summarization
process takes the software's code as its inputs and
generates a set of summaries as its outputs.
Furthermore, the code summaries are important to
document and understand legacy software systems.
Suncode exploits software identifiers and code
dependencies to produce a useful summary. Names
of software identifiers consider as textual
information. On the other hand, the dependencies
between code elements consider as structural
information. Software functionality is presented by
its classes and methods. Thus, Suncode produces two
types of summaries one for class and another for
method.
The basic point that distinguishes Suncode's
approach from other approaches is that it deals with
the context of the class and method, which contains
textual and structural information of the class and
method. The context of the class being summarized
includes the class signature and its body. On the
other hand, the context of the method being
summarized contains the method signature and its
body. The context of class or method is important
because it helps software developers to know the
behavior and the role of the class or method in the
software system.
Fig. 1 shows the source code summarization
process. Suncode approach creates a summary for
given software in three steps: 1) Extract the software
source code using static code parser. Then, 2)
Identify rapid summary messages for software
classes and methods. Finally, 3) Generate English
readable sentences to summarize each class and
method in the software system. The summarization
process is automatic; as long as the software
contains classes and methods, the summaries will
describe the main information of a given software
class or method.
As an illustrative example, this paper considers
the drawing shapes software§. However, this
software system uses to draw several types of
shapes (Al-Msie’deen, 2018; Al-Msie’deen and Blasi,
2018). This toy software used for better explanation
for the remaining of this paper. Suncode only uses
the software code as input for the software code
summarization process.
4. Source code summarization process
This section presents the source code
summarization process step-by-step. Suncode
identifies code summaries in three steps as detailed
below.
4.1. Extracting the software source code
Suncode uses static code analysis to parse
software source code. Suncode parser** produces an
XML file for each software system. The XML file
contains main code elements such as: package, class,
attribute and method. Also, it contains the main
dependencies between code elements such as:
inheritance, method invocation and attribute access.
The extracted XML file contains textual and
structural information about software code. XML
format is readable by the human experts and its
representation is independent of any programming
language (Kanellopoulos et al., 2006).
To document the legacy software system by
summarizing its code there are needs to extract main
code elements and main dependencies between
those elements. Fig. 2 shows the format of the XML
file which uses to express the object-oriented source
code.
4.2. Identifying rapid summary messages for
each class and method
Based on the extracted XML file from the previous
step, Suncode produces different types of messages
that represent information about a class’s/method’s
context. These messages are called rapid summary
messages. Suncode creates six different types of
rapid summary messages that represent information
about a class’s context. These rapid summary
messages are briefly described in Table 1.
The goal of these rapid summary messages is to
determine the content of each class in a software
system. First, a name message gives the name of the
class. The access level message is a message created
to reflect the access level of the class (e.g., public).
Another type of message is the package message.
The idea behind a package message is to give
software developers the package to which the class
belongs. A fourth message type is the inheritance
message. This message presents information about
§ Drawing shapes: https://sites.google.com/site/ralmsideen/tools
** Suncode parser https://sites.google.com/site/ralmsideen/tools
4. Ra'Fat Al-Msie'deen, Anas H. Blasi/International Journal of Advanced and Applied Sciences, 6(1) 2019, Pages: 59-67
62
the class that the class inherits from. Another type of
message is the attribute message. This message
mentions all attributes that belong to the class. The
last message type is the method message. This
message serves to mention all methods that belong
to the class. These messages are briefly shown in Fig.
3.
Fig. 1: Source code summarization process
Fig. 2: XML as a format of expression of object-oriented source code
Moreover, Suncode generates eight different
kinds of rapid summary messages that give
information about a method’s context. These rapid
summary messages are briefly described in Table 2.
5. Ra'Fat Al-Msie'deen, Anas H. Blasi/International Journal of Advanced and Applied Sciences, 6(1) 2019, Pages: 59-67
63
Table 1: Rapid summary messages that Suncode creates for class's context
# Message type Description
1 Name message Gives the name of the class
2 Access level message Gives the access level of the class
3 Package message Gives the name of the package to which the class belongs
4 Inheritance message Gives the name of the class that the class inherits from
5 Attribute message Gives the names of attributes that belong to the class
6 Method message Gives the names of methods that belong to the class
Fig. 3: Rapid summary messages that Suncode creates for class's context
Table 2: Rapid summary messages that Suncode creates for method's context
# Message type Description
1 Name message Gives the name of the method
2 Access level message Gives the access level of the method
3 Return type message Gives the return data type of the method
4 Class message Gives the name of the class to which the method belongs
5 Parameter message Gives the names, data types and order of the parameters
6 Variable message Gives the names and data types of local variables
7 Access message Gives the names of the attributes that the method accesses
8 Invocation message Gives the names of the methods that the method calls
The main objective of these rapid summary
messages is to determine the content of each method
in the software product. The first message is the
name message. This message gives the name of the
method. Then, the access level message aims to
return the access level of method (e.g., public, private
or protected). A third message type is the return
type message. The return type message is a message
created to give the return data type of the method.
Another type of message is the class message. The
idea behind a class message is to give programmers
the class to which the method belongs. The
parameter message is a message created to return
the number of parameters, names, data types and
order of the parameters in the method signature.
While, the variable message is a message conveys
information about the names and data types of local
variables inside the method body. The access
message returns the names of the attributes that the
method accesses and, at last, the final message type
is the invocation message. This message is used to
say what methods a given method calls or invokes.
The method's rapid summary messages are shown in
Fig. 4.
Suncode accepts the software source code. Then,
extracts the software code based on the static code
analysis (Al-Msie’deen, 2015). After that, the
approach identifies the rapid summary messages
that summarize the software classes and methods.
For example, in the drawing shapes software, the
rapid summary messages for myoval class are shown
in Table 3.
Table 3: An example of rapid summary messages produced by Suncode for class's context (e.g., myoval class)
# Message kind Message text
1 Name message The name of this class is MyOval.
2 Access level message The access level for this class is public.
3 Package message The package to which this class belongs is coreElements.
4 Inheritance message This class inherits from the MyShape class.
5 Attribute message This class contains the following attribute: example.
6 Method message This class contains the following methods: MyOval and draw.
6. Ra'Fat Al-Msie'deen, Anas H. Blasi/International Journal of Advanced and Applied Sciences, 6(1) 2019, Pages: 59-67
64
For method’s context, Suncode generates
different message types to give a summary for each
method in the software system. For instance, in the
drawing shapes software, the rapid summary
messages for main method are shown in Table 4.
Fig. 4: Rapid summary messages that Suncode creates for method's context
Table 4: An example of rapid summary messages generated by Suncode for method's context (e.g., main method)
# Message kind Message text
1 Name message The name of this method is main.
2 Access level message The access level for this method is public.
3 Return type message The return data type for this method is void.
4 Class message The class to which this method belongs is drawingShapes.
5 Parameter message
This method contains 1 parameter.
This method consists of the following parameter: args and its data type is string.
6 Variable message This method contains the following local variable: application and its data type is drawingShapes.
7 Access message This method accesses the following attributes: application and exit_on_close.
8 Invocation message This method invokes the following method: setDefaultCloseOperation.
The main goal of the code summarization is to
describe the source code of any software system. The
functionalities of any software implemented through
its classes and methods. Thus, Suncode focuses on
the class and method summaries.
4.3. Generating a readable English description
for each class and method
Suncode creates a summary of a given identifier
(class or method) in three steps: a) extracting
software code, then, b) generating rapid summary
messages and, at last, c) aggregating all rapid
summary messages into a single document. Suncode
summarizes software classes and methods as
readable English text and generates a document for
each class and method, where messages occur in a
predefined order as in Tables 1 and 2. This ordering
was decided based on the importance of the
messages from the author's point of view. The code
summary contains textual and structural
information (Al-Msie’deen et al., 2013). The main
dependencies between source code elements (i.e.,
inheritance, method invocation and attribute access)
represent structural code information, while the
software identifiers and their properties (e.g., name
and access level) consider as textual code
information.
In the last step of the code summarization
process, Suncode aggregates the rapid summary
messages into a single document. For example, the
summary of myoval class (Table 3) is "The name of
this class is MyOval. The access level for this class is
public. The package to which this class belongs is
coreElements. This class inherits from the MyShape
class. This class contains the following attribute:
example. This class contains the following methods:
MyOval and draw".
As another example, consider the main method
from the drawing shapes software. The Suncode
summary is "The name of this method is main. The
access level for this method is public. The return
data type for this method is void. The class to which
this method belongs is drawingShapes. This method
contains 1 parameter. This method consists of the
following parameter: args and its data type is string.
This method contains the following local variable:
application and its data type is drawingShapes. This
method accesses the following attributes:
application and exit_on_close. This method invokes
7. Ra'Fat Al-Msie'deen, Anas H. Blasi/International Journal of Advanced and Applied Sciences, 6(1) 2019, Pages: 59-67
65
the following method: setDefaultCloseOperation".
Moreover, the fully combined summary of draw
method is shown in Fig. 5 (McBurney and McMillan,
2016a).
Software developers need an efficient method to
help them to understand the legacy software code.
Reading the complete code of software takes too
long time. In addition, the reading of method (or
class) signature does not tell us enough information
about the purpose of the code. Thus, Suncode aims to
create a brief description for software code by using
textual and structural information in software
classes and methods.
Fig. 5: An example of a summary generated by Suncode approach (e.g., draw method)
5. Experimentation
To validate Suncode, the experiments were
conducted on two Java open-source software
systems: i.e., NanoXML†† and ArgoUML‡‡. NanoXML
software is a Java program for parsing XML file.
ArgoUML is an open source tool for UML diagrams.
The two software systems show different sizes:
NanoXML (medium system) and ArgoUML (large
system).
The different complexity levels display the
scalability of Suncode to dealing with such case
studies. NanoXML and ArgoUML software is well
documented and their code summaries are available
for comparison to Suncode summaries and
validation of its proposal.
Suncode summaries are compared to summaries
written by human experts such as Javadocs. Javadoc
is a type of software documentation which is a
summary written by the software developers. Table
5 shows the obtained summary for getResult method
through Suncode approach and Javadoc.
Moreover, Suncode summaries are compared to
summaries written by a state-of-the-art solution
(McBurney and McMillan, 2014; 2016a). Table 6
shows the obtained summary for read method
through Suncode approach and McBurney tool
(McBurney and McMillan, 2016a).
Table 7 presents the extracted summary of the
ArgoStatusEvent class from ArgoUML software using
Suncode prototype§§. In addition to the summary
from JavaDocs.
Results note that there is a difference between
the summaries written by software developers, and
summaries generated by Suncode approach.
Comparing to manually written summaries and the
state-of-the-art summaries, results found that
Suncode summaries provide better contextual
information. Suncode summaries can improve the
†† http://nanoxml.sourceforge.net/orig/index.html
‡‡ http://argouml-downloads.tigris.org/argouml-0.28.1/
§§ https://sites.google.com/site/ralmsideen/tools
current software documentation by combining
Suncode summaries with other summaries. In
contrast, manually written summaries were more
precise and short.
Based on the Suncode results, it appears the
summary is generated based on the class (resp.
method) signature and its body. The extracted
summary is clear, precise and brief. The extracted
summary tells programmers, where the method is
called. The produced summary tells software
developers where the class is inherited. The
summary contains too many details and helps
programmers to understand what the class and
method do. In addition, summaries provide helpful
contextual information about the software classes
and methods. The generated summaries can be used
to improve existing documentation.
One threat to the validity of Suncode approach is
that it considers only the Java software systems. This
represents a threat to prototype validity (Al-
Msie’deen, 2014) that limits Suncode
implementation ability to deal only with systems
that developed based on the Java language. In
addition, Suncode does not include the class and
method comments in the summarization process.
Moreover, Suncode does not split identifier name
into their constituent words where it appears as it is
in the summary (should be improved by using
identifier splitting algorithms).
6. Conclusion and future directions
This paper has presented a new approach for
automatically generating summaries of software
classes and methods. Suncode approach differs from
other approaches in that it summarizes the software
classes and methods. Moreover, Suncode exploits
textual and structural information (Haiduc et al.,
2010a) to summarize software classes and methods.
The only input of the approach is the software code
and the output is a set of English paragraphs
describing software classes and methods. Suncode
8. Ra'Fat Al-Msie'deen, Anas H. Blasi/International Journal of Advanced and Applied Sciences, 6(1) 2019, Pages: 59-67
66
used rapid summary messages to identify the most
important information in the class and method
context. Then, Suncode aggregated messages to
create an English paragraph of this context. The
authors have implemented Suncode and evaluated
its generated results on three case studies.
Table 5: The extracted summary from the getResult method in NanoXML software
Software NanoXML
Class StdXMLBuilder Method getResult ( )
Method signature public Object getResult ( ) Method body return this.root;
Javadoc summary
“This method returns the result of the building process” (NanoXML Javadocs:
http://nanoxml.sourceforge.net/orig/NanoXML-2-JavaDoc/index.html)
Suncode summary
The name of this method is getResult. The access level for this method is public. The return data type for this
method is object. The class to which this method belongs is StdXMLBuilder. This method accesses the
following attribute: root.
Results showed that all summaries were
identified. The authors have compared the
summaries produced from Suncode to summaries
written by human specialists (e.g., Javadocs) and to
summaries written by a state-of-the-art approach.
The authors have found that Suncode provided
better contextual information than manually written
and the state-of-the-art summaries.
Table 6: The mined summary from the read method in NanoXML software
Software NanoXML
Class StdXMLReader Method read ( )
McBurney
tool
summary
“This method reads a character and returns the character. That character is used in methods that add child XML elements
and attributes of XML elements. This method calls a method that skips the whitespace. This method can be used in an
assignment statement; for example: char ch = reader.read();” (McBurney and McMillan, 2016a).
Suncode
summary
The name of this method is read. The access level for this method is public. The return data type for this method is character.
The class to which this method belongs is StdXMLReader. This method contains the following local variable: ch and its data
type is int. This method accesses the following attributes: currentReader, pbReader and readers. This method invokes the
following methods: read, empty, close, pop and read.
Furthermore, authors can improve the current
software documentation by combining Suncode
summaries with the manually written summaries or
the state-of-the-art summaries. In contrast, manually
written summaries were more precise and short. For
future work, Suncode plans to split the names of
software identifiers (e.g., package, class, attribute
and method) into words by using the camel-case
splitting algorithm (Al-Msie’deen et al., 2014a). It
also plans to use the class and method comments in
the summarization process.
Table 7: The extracted summary from the ArgoStatusEvent class of ArgoUML software
Software ArgoUML
Class ArgoStatusEvent
Javadoc
summary
“The status event is used to notify interested parties of a status change” (ArgoUML Javadocs: http://argouml-
stats.tigris.org/nonav/javadocs/javadocs-0.28/)
Suncode
summary
The name of this class is ArgoStatusEvent. The access level for this class is public. The package to which this class belongs is
org.argouml.application.events. This class inherits from the ArgoEvent class. This class contains the following attribute: text.
This class contains the following methods: ArgoStatusEvent, getEventStartRange and getText.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict
of interest.
References
Al-Msie’deen R (2014). Reverse engineering feature models from
software variants to build software product lines: REVPLINE
approach. Ph.D. Dissertation, Universite Montpellier II-
Sciences et Techniques du Languedoc, France.
Al-Msie’deen R (2015). Visualizing object-oriented software for
understanding and documentation. International Journal of
Computer Science and Information Security, 13(5): 18–27.
Al-Msie’deen R (2018). Automatic labeling of the object-oriented
source code: The lotus approach. Science International-
Lahore, 30(1): 45–48.
Al-Msie’deen R and Blasi A (2018). The impact of the object-
oriented software evolution on software metrics: The iris
approach. Indian Journal of Science and Technology, 11(8): 1–
8.
https://doi.org/10.17485/ijst/2018/v11i8/121148
Al-Msie’deen R, Huchard M, Seriai A, Urtado C, and Vauttier S
(2014a). Automatic documentation of [mined] feature
implementations from source code elements and use- case
diagrams with the REVPLINE approach. International Journal
of Software Engineering and Knowledge Engineering, 24(10):
1413–1438.
https://doi.org/10.1142/S0218194014400142
Al-Msie’deen R, Huchard M, Seriai A, Urtado C, and Vauttier S
(2014b). Reverse engineering feature models from software
configurations using formal concept analysis. In the Eleventh
International Conference on Concept Lattices and Their
Applications, Košice, Slovakia: 95–106.
Al-Msie’deen R, Seriai A, Huchard M, Urtado C, and Vauttier S
(2013). Mining features from the object-oriented source code
of software variants by combining lexical and structural
similarity. In the IEEE 14th International Conference on
Information Reuse and Integratio, IEEE Computer Society, San
Francisco, CA, USA: 586–593.
https://doi.org/10.1109/IRI.2013.6642522
Al-Msie’deen R, Seriai A, Huchard M, Urtado C, and Vauttier S
(2014c). Documenting the mined feature implementations
from the object-oriented source code of a collection of
9. Ra'Fat Al-Msie'deen, Anas H. Blasi/International Journal of Advanced and Applied Sciences, 6(1) 2019, Pages: 59-67
67
software product variants. In The 26th International
Conference on Software Engineering and Knowledge
Engineering, Vancouver, Canada, 138–143.
Dave N, Davis D, Potts K, and Asuncion HU (2014). Uncovering file
relationships using association mining and topic modeling. In
the 6th International Conference on Information, Process, and
Knowledge Management, Barcelona, Spain: 105–111.
Forward A and Lethbridge TC (2002). The relevance of software
documentation, tools and technologies: A survey. In the 2002
ACM Symposium on Document Engineering, ACM, McLean,
Virginia, USA: 26-33.
https://doi.org/10.1145/585058.585065
Haiduc S, Aponte J, and Marcus A (2010a). Supporting program
comprehension with source code summarization. In the 32nd
ACM/IEEE International Conference on Software Engineering,
ACM, Cape Town, South Africa, 2: 223-226.
https://doi.org/10.1145/1810295.1810335
Haiduc S, Aponte J, Moreno L, and Marcus A (2010b). On the use of
automated text summarization techniques for summarizing
source code. In the 17th Working Conference on Reverse
Engineering (WCRE), IEEE, Beverly, USA: 35-44.
https://doi.org/10.1109/WCRE.2010.13
Kanellopoulos Y, Dimopulos T, Tjortjis C, and Makris C (2006).
Mining source code elements for comprehending object-
oriented systems and evaluating their maintainability.
SIGKDD Explorations, 8(1): 33–40.
https://doi.org/10.1145/1147234.1147240
Ko AJ, Myers BA, Coblenz MJ, and Aung HH (2006). An exploratory
study of how developers seek, relate, and collect relevant
information during software maintenance tasks. IEEE
Transactions on software Engineering, 32(12): 971-987.
https://doi.org/10.1109/TSE.2006.116
LaToza TD, Venolia G, and DeLine R (2006). Maintaining mental
models: A study of developer work habits. In the 28th
International Conference on Software Engineering, ACM,
Shanghai, China, 492–501.
https://doi.org/10.1145/1134285.1134355
McBurney PW and McMillan C (2014). Automatic documentation
generation via source code summarization of method context.
In the 22nd International Conference on Program
Comprehension, ACM, Hyderabad, India: 279-290.
https://doi.org/10.1145/2597008.2597149
McBurney PW and McMillan C (2016a). Automatic source code
summarization of context for java methods. IEEE Transactions
on Software Engineering, 42(2): 103-119.
https://doi.org/10.1109/TSE.2015.2465386
McBurney PW and McMillan C (2016b). An empirical study of the
textual similarity between source code and source code
summaries. Empirical Software Engineering, 21(1): 17-42.
https://doi.org/10.1007/s10664-014-9344-6
Moreno L, Aponte J, Sridhara G, Marcus A, Pollock L, and Vijay-
Shanker K (2013a). Automatic generation of natural language
summaries for java classes. In the IEEE 21st International
Conference on Program Comprehension, IEEE, San Francisco,
USA: 23-32.
https://doi.org/10.1109/ICPC.2013.6613830
Moreno L, Marcus A, Pollock L, and Vijay-Shanker K (2013b).
Jsummarizer: An automatic generator of natural language
summaries for java classes. In IEEE 21st International
Conference on Program Comprehension, IEEE, San Francisco,
USA: 230-232.
https://doi.org/10.1109/ICPC.2013.6613855
Radev DR, Hovy EH, and McKeown KR (2002). Introduction to the
special issue on summarization. Computational Linguistics,
28(4): 399–408.
https://doi.org/10.1162/089120102762671927
Roehm T, Tiarks R, Koschke R, and Maalej W (2012). How do
professional developers comprehend software?. In the 34th
International Conference on Software Engineering, IEEE,
Zurich, Switzerland: 255-265.
https://doi.org/10.1109/ICSE.2012.6227188
Shi L, Zhong H, Xie T, and Li M (2011). An empirical study on
evolution of API documentation. In the International
Conference on Fundamental Approaches to Software
Engineering, Springer, Berlin, Heidelberg, Germany: 416-431.
https://doi.org/10.1007/978-3-642-19811-3_29
Sridhara G, Hill E, Muppaneni D, Pollock L, and Vijay-Shanker K
(2010). Towards automatically generating summary
comments for java methods. In the IEEE/ACM International
Conference on Automated Software Engineering, ACM,
Antwerp, Belgium: 43-52.
https://doi.org/10.1145/1858996.1859006
Yau SS and Collofello JS (1980). Some stability measures for
software maintenance. IEEE Transactions on Software
Engineering, 6(6): 545-552.
https://doi.org/10.1109/TSE.1980.234503