My keynote at the 2012 Workshop on Mining Unstructured Data (co-located with the 10th Working Conference on Reverse Engineering - WCRE'12). Kingston, Ontario, Canada. October 17th, 2012.
International Journals of Management, IT & Engineering (IJMIE) is a refereed research journal which aims to promote the links between engineering and management. The journal focuses on issues related to the development and implementation of new methodologies and technologies, which improve the operational objectives of an organization. These include, among others, product development, human resources management, project management, logistics, production management, e-commerce, quality management, financial planning, risk management, decision support systems, General Management, Banking, Insurance, Economics, IT, Computer Science, Cyber Security and emerging trends in allied subjects. Thus, the journal provides a forum for researchers and practitioners for the publication of innovative scholarly research, which contributes to the adoption of a new holistic managerial approach that ensures a technologically, economically, socially and ecologically acceptable deployment of new technologies in business practice.
This document summarizes and reviews research on combining image compression and encryption techniques. It begins by introducing the topic and noting that compression and encryption are often combined to improve efficiency and security of data transmission. It then categorizes the combinations into three types: encryption followed by compression, compression followed by encryption, and hybrid techniques that combine the two.
The document proceeds to summarize research on each combination type. For encryption followed by compression, it outlines research applying symmetric and asymmetric encryption with both lossless and lossy compression. For compression followed by encryption, it discusses how compression can improve security by removing redundancies before encryption. Finally, it notes emerging research on hybrid techniques that integrate compression and encryption in a single step.
Determining Basis Test Paths Using Genetic Algorithm and J48 - IJECEIAES
Basis path testing represents a program as a graph whose nodes correspond to blocks of code and whose edges correspond to the order in which those blocks execute. Basis test paths can be generated with a Genetic Algorithm, but the drawback is that the number of iterations affects whether all of the appropriate basis paths appear: with too few iterations some paths may never appear, while with too many, all paths have already appeared by the middle of the run. This research aims to optimize the performance of the Genetic Algorithm for basis test path generation by determining the number of iterations appropriate to the characteristics of the code. The code metrics Node, Edge, VG, NBD, and LOC were used as features to determine the number of iterations, and a J48 classifier was employed to predict it. Seventeen methods were selected as training data and 16 methods as test data. The system was able to predict 84.5% of 58 basis paths, and efficiency tests show that it found basis paths 35% faster than the old system.
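A minimal sketch of the prediction step this abstract describes, assuming a decision tree over the five code metrics. J48 is Weka's C4.5 implementation; scikit-learn's CART-based DecisionTreeClassifier is used here as a stand-in, and every metric value and iteration-budget class below is invented for illustration.

```python
# Sketch: predict a GA iteration budget from code metrics with a decision tree.
# The paper uses Weka's J48 (C4.5); DecisionTreeClassifier is a stand-in.
from sklearn.tree import DecisionTreeClassifier

# Features per method: [nodes, edges, cyclomatic complexity VG, NBD, LOC]
# (all values hypothetical)
X_train = [
    [5,  6, 3, 2, 12],
    [9, 11, 4, 3, 30],
    [14, 18, 6, 4, 55],
]
y_train = ["low", "medium", "high"]  # iteration-budget classes (hypothetical)

clf = DecisionTreeClassifier(criterion="entropy")  # entropy ~ C4.5's info gain
clf.fit(X_train, y_train)

print(clf.predict([[8, 10, 4, 3, 25]]))  # predicted budget class for a new method
```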
Randomness evaluation framework of cryptographic algorithms - ijcisjournal
Computer systems are developing rapidly and becoming more and more complex, which makes securing them a necessity. This paper presents software for testing and evaluating cryptographic algorithms. One of the most basic properties expected of block and stream ciphers is that they pass statistical randomness testing, demonstrating their suitability as random number generators. The primary goal of this paper is to propose a new framework for evaluating the randomness of cryptographic algorithms: given only a .dll file exposing the encryption function, the decryption function, and the key schedule function of the cipher under test (block or stream), the application evaluates its randomness and provides an interpretation of the results. To this end, all nine tests used to evaluate the AES candidate block ciphers, plus three NIST statistical tests, are applied to the algorithm being tested. We evaluate the Tiny Encryption Algorithm (block cipher), Camellia (block cipher), and LEX (stream cipher) to determine whether they pass statistical randomness testing.
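As a concrete illustration of what "statistical randomness testing" involves, here is a minimal sketch of the NIST SP 800-22 frequency (monobit) test, the most basic of the tests such frameworks apply; the bit stream below is a toy input, not output from any of the ciphers named above.

```python
# NIST SP 800-22 frequency (monobit) test: a random stream should have roughly
# as many ones as zeros. Passing means p-value >= 0.01 at the usual alpha.
import math

def monobit_test(bits):
    """bits: sequence of 0/1. Returns the test's p-value."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)   # +1/-1 partial sum
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))  # two-sided tail probability

stream = [1, 0, 1, 1, 0, 0, 1, 0] * 125    # 1000 toy bits, perfectly balanced
print(monobit_test(stream) >= 0.01)        # True -> passes at alpha = 0.01
```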
Proceedings of the 50th Hawaii International Conference on System Sciences | 2017
Discovering Malware with Time Series Shapelets
Om P. Patri
University of Southern California
Los Angeles, CA 90089
patri@usc.edu
STUDY OF DISTANCE MEASUREMENT TECHNIQUES IN CONTEXT TO PREDICTION MODEL OF WE... - ijscai
The Internet is a boon of the modern era: every organization uses it to disseminate information and run e-commerce applications. Yet users often experience delays when accessing the Internet despite adequate bandwidth. A prediction model for web caching and prefetching is a natural solution to this delay problem: the model analyses the browsing history of Internet users from raw server log files, determines the likely future sequence of web objects, and places those objects closer to the user, so access latency is reduced and the delay problem mitigated. Determining the future sequence of web objects requires measuring the proximity of one web object to another, which in turn requires a distance metric suited to web caching and prefetching. This paper studies different distance metric techniques and concludes that bioinformatics-based distance metrics are ideal in the context of web caching and web prefetching.
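A minimal sketch of one bioinformatics-style distance the paper's conclusion points toward: Levenshtein edit distance between two users' web-object access sequences. The object names are hypothetical, and a real prediction model would mine such sequences from server logs.

```python
# Classic dynamic-programming Levenshtein distance over access sequences:
# the fewer edits between two sequences, the "closer" the web objects.
def edit_distance(a, b):
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

s1 = ["index.html", "styles.css", "cart.php"]
s2 = ["index.html", "cart.php", "checkout.php"]
print(edit_distance(s1, s2))  # 2: close sequences suggest objects to prefetch together
```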
High Capacity Image Steganography Using Adjunctive Numerical Representations ... - ijcisjournal
LSB steganography is one of the most widely used methods for implementing covert data channels in image file exchanges [1][2]. The algorithm's low computational complexity and implementation simplicity are significant factors in its popularity, with the primary reason being low image distortion. Many attempts have been made to increase the embedding capacity of LSB algorithms by expanding into the second or third binary layers of the image while maintaining a low probability of detection and minimal distortive effects [2][3][4]. In this paper, we introduce an advanced technique for covertly embedding data within images using redundant number system decomposition over non-standard digital bit planes. Both grayscale and bit-mapped images are equally effective as cover files. We show that this steganography method has minimal visual distortive effects while also preserving the cover file statistics, making it less susceptible to most general steganography detection algorithms.
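For context, here is a minimal sketch of the first-bit-plane LSB embedding that the paper generalises with redundant number systems; this is the textbook baseline, not the paper's adjunctive-representation method. It assumes numpy and uses toy pixel values.

```python
# Baseline LSB embedding: overwrite the least significant bit of each pixel
# with one payload bit, changing each touched pixel's value by at most 1.
import numpy as np

def embed_lsb(cover, bits):
    flat = cover.flatten()
    for i, b in enumerate(bits):
        flat[i] = (flat[i] & 0xFE) | b   # clear LSB, then set payload bit
    return flat.reshape(cover.shape)

def extract_lsb(stego, n):
    return [int(p) & 1 for p in stego.flatten()[:n]]

cover = np.array([[120, 57], [200, 33]], dtype=np.uint8)  # toy grayscale image
payload = [1, 0, 1, 1]
stego = embed_lsb(cover, payload)
assert extract_lsb(stego, 4) == payload   # lossless recovery of the payload
```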
Using Cisco Network Components to Improve NIDPS Performance - csandit
Network Intrusion Detection and Prevention Systems (NIDPSs) are used to detect, prevent, and report evidence of attacks and malicious traffic. Our paper presents a study using open source NIDPS software. We show that NIDPS detection performance can be weak under high-speed, high-load traffic, resulting in missed alerts and missed logs. To counteract this problem, we propose and evaluate a solution that uses QoS, queueing, and parallel technologies in a multi-layer Cisco Catalyst switch to increase NIDPS detection performance. Our approach designs a novel QoS architecture to organise and improve the throughput and forwarding of traffic in a layer 3 switch, thereby improving NIDPS performance.
This document discusses a hybrid approach using genetic algorithms and fuzzy logic to improve anomaly and intrusion detection. It begins with an overview of intrusion detection systems and discusses different types of database anomalies. It then describes techniques for intrusion detection including clustering, genetic algorithms, and fuzzy c-means clustering. The document presents the advantages of using a genetic algorithm for intrusion detection systems. It provides results of experiments measuring fit value and time using the hybrid genetic algorithm and fuzzy approach. The experiments showed this approach can accurately detect different attacks. The conclusion is that the hybrid genetic algorithm and fuzzy method was effective at anomaly-based intrusion detection within a network.
(Structural) Feature Interactions for Variability-Intensive Systems Testing - Gilles Perrouin
Presentation given in the "short talks" session of Dagstuhl seminar 14281 on "Feature Interactions - the Next Generation", Schloss Dagstuhl, Germany, July 2014.
The aim of this research paper is to design a new pseudorandom number generator based on FCSR registers without affecting the speed of generation. The main part gives a detailed description of the cascades used in the design and describes the working principle of the GPRN. The statistical characteristics of various generated sequences are analysed with the NIST 800-22 test package, and experiments confirm that the modelled GPRN has a longer repetition period and runs faster.
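A minimal sketch of a single Fibonacci-style FCSR step, the building block of the cascades the paper describes; the register length, tap positions, and seed below are illustrative assumptions, not the paper's parameters.

```python
# FCSR (feedback-with-carry shift register): like an LFSR, but the feedback
# uses ordinary integer addition with a carry cell instead of pure XOR.
def fcsr_stream(state, taps, carry, n):
    """state: list of bits (index 0 is the output end); taps: indices whose
    bits are summed with the carry; returns n output bits."""
    out = []
    for _ in range(n):
        sigma = sum(state[t] for t in taps) + carry
        fb, carry = sigma & 1, sigma >> 1   # feedback bit and new carry
        out.append(state[0])
        state = state[1:] + [fb]            # shift left, insert feedback bit
    return out

print(fcsr_stream([1, 0, 1, 1, 0], taps=[0, 2, 4], carry=0, n=16))
```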
Features of genetic algorithm for plain text encryption - IJECEIAES
Data communication keeps growing, so data encryption has become essential for secure transmission and storage and for protecting data contents from intruders and unauthorized persons. This paper presents a fast genetic-algorithm-based technique for text encryption. Encryption is achieved with the genetic operators crossover and mutation: the plaintext characters are divided into pairs, the crossover operation is applied between them, and the mutation operation then produces the encrypted text. Experimental results show that the proposal delivers a significant improvement in encryption rate with comparatively high-speed processing.
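A minimal sketch of the two genetic operators the abstract names, applied to one pair of plaintext characters; in a real scheme the crossover point and mutation mask would be derived from key material, and the values here are purely illustrative.

```python
# Crossover swaps the low bits of two character codes; mutation flips
# key-selected bits via XOR. Both operations are their own inverses.
def crossover(a, b, point=4):
    mask = (1 << point) - 1
    return (a & ~mask) | (b & mask), (b & ~mask) | (a & mask)

def mutate(c, mask=0b00100101):
    return c ^ mask

pair = ord('h'), ord('i')
x, y = crossover(*pair)
cipher = bytes([mutate(x), mutate(y)])

# Decryption reverses the steps: XOR with the same mask, then crossover again.
x2, y2 = mutate(cipher[0]), mutate(cipher[1])
assert crossover(x2, y2) == pair
```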
The document discusses an improved method for storing feature vectors to detect Android malware. It proposes using a compressed row storage format to efficiently store the statistical features that represent malware families. This involves storing only the non-zero elements of sparse feature matrices in three vectors, which reduces storage needs by 79% compared to conventional methods. This improved storage technique leads to reduced processing time for feature vector generation and malware detection overall. The proposed method aims to enhance Android malware analysis by making feature vector searches and classification faster.
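A minimal sketch of the compressed row storage (CRS/CSR) layout the summary describes: the sparse feature matrix is kept as three vectors holding the non-zero values, their column indices, and the offsets where each row starts. The feature matrix is a toy example.

```python
# CRS/CSR: store only the non-zero entries of a sparse matrix in three vectors.
def to_csr(matrix):
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:                  # only non-zero features are stored
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))     # offset where the next row begins
    return values, col_idx, row_ptr

features = [
    [0, 3, 0, 0],   # sparse per-sample feature counts (toy data)
    [0, 0, 0, 1],
    [2, 0, 0, 5],
]
print(to_csr(features))
# ([3, 1, 2, 5], [1, 3, 0, 3], [0, 1, 2, 4]): 12 cells shrink to 3 short vectors
```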
Abstract: DNA cryptography, the new era of cryptography, has enhanced cryptography in terms of both time complexity and capacity. It uses DNA strands to hide information; the repeated sequences of DNA make it highly difficult for an unintended party to recover the message. The level of security can be increased by using more than one cover medium. This paper discusses a technique that uses two cover media to hide the message: DNA and an image. The message is encrypted and converted to fake DNA, and this fake DNA is then hidden in the image. Keywords: DNA, cryptography, DNA cryptography
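A minimal sketch of the binary-to-DNA conversion step such schemes rely on, using the common 2-bit-per-nucleotide mapping; the paper's exact encoding table and its encryption and image-hiding steps are not reproduced here.

```python
# Each pair of ciphertext bits maps to one nucleotide (a common convention).
ENCODE = {"00": "A", "01": "C", "10": "G", "11": "T"}
DECODE = {v: k for k, v in ENCODE.items()}

def to_fake_dna(data: bytes) -> str:
    bits = "".join(f"{b:08b}" for b in data)
    return "".join(ENCODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def from_fake_dna(dna: str) -> bytes:
    bits = "".join(DECODE[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = to_fake_dna(b"hi")            # 'CGGACGGC'
assert from_fake_dna(strand) == b"hi"  # the strand would then be hidden in an image
```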
The quality of image encryption techniques by reasoned logic - TELKOMNIKA JOURNAL
Digital images are one common form of data, and their widespread, frequent exchange over the Internet makes it necessary to preserve the security and privacy of transmitted images. Many image encryption techniques exist with different security levels, and there are many standards and protocols for testing the quality of encryption security. Cipher images can be evaluated with various quality criteria that quantify particular features of the image. If many methods can be applied to secure images, the question is: which of these methods is the most powerful? This research tries to answer that question by taking three different encryption methods (Rivest Cipher 5 (RC5), chaotic, and permutation) and measuring their quality using peak signal-to-noise ratio (PSNR), correlation, entropy, number of pixels change rate (NPCR), and unified average changing intensity (UACI); the results of these criteria were fed into a fuzzy logic system used to determine the best method among them.
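A minimal sketch of three of the quality criteria named above (PSNR, NPCR, UACI), computed with numpy on two random 8-bit images standing in for a plain/cipher pair.

```python
import numpy as np

def psnr(a, b):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def npcr(c1, c2):
    """Number of pixels change rate: % of positions whose value differs."""
    return np.mean(c1 != c2) * 100

def uaci(c1, c2):
    """Unified average changing intensity: mean normalized difference, in %."""
    return np.mean(np.abs(c1.astype(float) - c2.astype(float)) / 255.0) * 100

rng = np.random.default_rng(0)
img1 = rng.integers(0, 256, (64, 64), dtype=np.uint8)
img2 = rng.integers(0, 256, (64, 64), dtype=np.uint8)
print(psnr(img1, img2), npcr(img1, img2), uaci(img1, img2))
# A strong cipher pushes NPCR toward ~99.6% and UACI toward ~33.5%.
```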
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
A Modified Technique For Performing Data Encryption & Data Decryption - IJERA Editor
In this age of universal electronic connectivity, of viruses and hackers, of electronic eavesdropping and electronic fraud, there is a real need to store information securely. This has led to heightened awareness of the need to protect data and resources from disclosure, to guarantee the authenticity of data and messages, and to protect systems from network-based attacks. Information security via encryption and decryption techniques has been a popular research area for many years. This paper explains the basic concepts of cryptography, especially public-key and private-key cryptography, and reviews some popular encryption and decryption algorithms. A modified method is also proposed, which is fast in comparison to the existing methods.
This document provides a history of the programming language CLU and the development of the concept of data abstraction, which was a foundational idea behind CLU's design. It describes how the idea of data abstraction arose from early work on programming methodology focusing on modularity and encapsulation. CLU was the first implemented language to provide direct support for data abstraction through its features for defining abstract data types with encapsulated representations and operations. The document outlines the origins and development of the data abstraction concept and provides an overview of CLU's design process and influential features such as its exception handling, iterators, and parameterized types.
EXTENDING OUTPUT ATTENTIONS IN RECURRENT NEURAL NETWORKS FOR DIALOG GENERATION - ijaia
Attention mechanisms in neural networks are widely used in natural language processing. In this paper, the research team explores a new mechanism for extending output attention in recurrent neural networks for dialog systems. The new attention method was compared with the current method in generating dialog sentences on a real dataset. Our architecture exhibits several attractive properties: it handles long sequences better, and it generates more reasonable replies in many cases.
USE OF MARKOV CHAIN FOR EARLY DETECTING DDOS ATTACKS - IJNSA Journal
DDoS attacks come in a variety of mixed types, and botnet attackers can chain different types of DDoS attacks to confuse cybersecurity defenders. In this article, each attack type is represented as a state of the model, and the model is used to calculate the final attack probability given the observed attack type. The final attack probability is then converted into a prediction vector, so that incoming attacks can be detected early, before the IDS issues an alert. Experimental results show that the prediction model makes multi-vector DDoS detection and analysis easier.
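A minimal sketch of the idea as summarized: attack types become Markov states, a transition matrix is learned from history, and propagating the current observation yields the prediction vector. The states and probabilities here are invented for illustration.

```python
import numpy as np

states = ["SYN-flood", "UDP-flood", "HTTP-flood"]
# P[i][j] = probability the next attack phase is state j given state i
P = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
])

current = np.array([1.0, 0.0, 0.0])   # a SYN flood is observed now
prediction = current @ P              # distribution over the next phase
likely = states[int(prediction.argmax())]
print(prediction, "-> raise early alert for", likely)  # 'SYN-flood' again here
```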
The document discusses techniques for analyzing unstructured text data from software repositories. It describes using textual analysis on code identifiers, comments, commit messages, issue trackers, emails, and forums to perform tasks like traceability link recovery, feature location, clone detection, and bug prediction. Different techniques are discussed, including pattern matching, island parsers, information retrieval methods, and natural language parsing. Choosing the right technique depends on the type of unstructured data and needs of the analysis.
Finding Bad Code Smells with Neural Network Models - IJECEIAES
Code smell refers to any symptom introduced in the design or implementation phases of a program's source code. A code smell can potentially cause deeper, more serious problems during software maintenance. Existing approaches detect bad smells with detection rules or standards built from combinations of different object-oriented metrics. Although a variety of detection tools have been developed, their capabilities still have limitations and constraints. In this paper, a code smell detection system is presented with a neural network model that captures the relationship between bad smells and object-oriented metrics, using a corpus of Java projects as the experimental dataset. The most well-known object-oriented metrics are considered to identify the presence of bad smells. The system uses twenty Java projects that are widely shared in GitHub repositories. The dataset of these Java projects is partitioned into mutually exclusive training and test sets; the training set is used to learn the network model that predicts smelly classes, and the optimized network model is then evaluated on the test set. The experimental results show that the more data the model is trained on, the more its predictions improve. In addition, the model's accuracy increases when it runs with more epochs and more hidden layers.
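A minimal sketch of the setup this abstract describes, assuming a small feed-forward network over object-oriented metrics; scikit-learn's MLPClassifier stands in for whatever framework the authors used, and the metric rows and labels are toy data, not the GitHub corpus.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Rows: [WMC, DIT, NOC, CBO, LCOM] per class (values illustrative)
X = [[40, 5, 0, 22, 90], [8, 1, 2, 4, 10], [55, 6, 1, 30, 120],
     [12, 2, 0, 6, 15], [70, 4, 3, 35, 140], [9, 1, 1, 5, 12]]
y = [1, 0, 1, 0, 1, 0]   # 1 = smelly class, 0 = clean class

# Mutually exclusive training and test splits, as in the paper's setup.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))   # accuracy on the held-out test split
```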
The document describes a proposed tool called the Class Breakpoint Analyzer (CBA) that evaluates software quality at the class level. The CBA extracts metrics like weighted methods per class (WMC), depth of inheritance tree (DIT), number of children (NOC), and lack of cohesion in methods (LCOM) based on the Chidamber and Kemerer (CK) metrics suite. Threshold values are set for each metric to determine if a class is overloaded. The CBA then generates a scorecard for each class to identify classes that need to be refactored to improve quality and reusability. The goal is to help evaluate code quality, identify areas for improvement, and make off-the-shelf
Class quality evaluation using class quality scorecards - IAEME Publication
The document describes a Class Breakpoint Analyzer tool that evaluates software quality using metrics. The tool extracts metrics like Weighted Methods per Class (WMC), Depth of Inheritance Tree (DIT), Number of Children (NOC), and Lack of Cohesion in Methods (LCOM) from source code. Threshold values for each metric indicate if a class needs restructuring. The tool generates a scorecard to determine if a class is overloaded or saturated. This helps improve reusability of existing software and evaluate code quality for junior programmers. The tool uses metrics from the Chidamber and Kemerer (CK) suite to analyze classes and suggest where to break classes for better design.
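A minimal sketch of the scorecard idea: compare a class's CK metrics against fixed thresholds and flag the class when too many are exceeded. The threshold values and the two-violation rule are assumptions for illustration; the tool's actual cut-offs are not given in the summary.

```python
# Hypothetical per-metric thresholds; a real tool would calibrate these.
THRESHOLDS = {"WMC": 20, "DIT": 5, "NOC": 6, "LCOM": 0.8}

def scorecard(metrics):
    """Return per-metric violation flags and an overall verdict."""
    flags = {name: metrics[name] > limit for name, limit in THRESHOLDS.items()}
    verdict = ("overloaded: consider breaking the class"
               if sum(flags.values()) >= 2 else "acceptable")
    return flags, verdict

print(scorecard({"WMC": 34, "DIT": 2, "NOC": 1, "LCOM": 0.91}))
# ({'WMC': True, 'DIT': False, 'NOC': False, 'LCOM': True}, 'overloaded: ...')
```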
This document summarizes Martin Pinzger's research on predicting buggy methods using software repository mining. The key points are:
1. Pinzger and colleagues conducted experiments on 21 Java projects to predict buggy methods using source code and change metrics. Change metrics like authors and method histories performed best with up to 96% accuracy.
2. Predicting buggy methods at a finer granularity than files can save manual inspection and testing effort. Accuracy decreases as fewer methods are predicted but change metrics maintain higher precision.
3. Case studies on two classes show that method-level prediction achieves over 82% precision compared to only 17-42% at the file level. This demonstrates the benefit of finer-
Fake Reviews Detection using Supervised Machine Learning - IRJET Journal
The document presents a study on detecting fake reviews using supervised machine learning techniques. The study applies various machine learning classifiers to identify fake reviews based on the content of reviews as well as features extracted from reviewers' behaviors.
The proposed approach involves three phases: data preprocessing, feature extraction, and feature engineering. It extracts features representing reviewers' behaviors like capital letters, punctuation, and emojis used. It then compares classifier performance on Yelp restaurant reviews in identifying fake reviews, with and without these extracted behavioral features.
The results show that classifiers generally perform better at detecting fake reviews when the extracted behavioral features are included. For example, the KNN classifier's f-score improved from 82.40% to 83.20%
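A minimal sketch of the behavioral feature extraction described above, counting capital letters, punctuation, and emojis in a review; the emoji set is deliberately tiny and the feature names are illustrative.

```python
import string

EMOJI = set("😀😂😍👍🔥")   # tiny illustrative set; real systems use full tables

def behavioral_features(review: str) -> dict:
    """Counts that proxy reviewer behavior, to augment text content features."""
    return {
        "capitals": sum(c.isupper() for c in review),
        "punctuation": sum(c in string.punctuation for c in review),
        "emojis": sum(c in EMOJI for c in review),
        "length": len(review),
    }

print(behavioral_features("GREAT place!!! Best pizza ever 😍"))
# {'capitals': 6, 'punctuation': 3, 'emojis': 1, 'length': 32}
```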
INVESTIGATING THE EFFECT OF BD-CRAFT TO TEXT DETECTION ALGORITHMS - ijaia
With the rise and development of deep learning, computer vision and document analysis have influenced the area of text detection. Despite significant efforts to improve text detection performance, it remains challenging, as evidenced by the series of Robust Reading Competitions. This study investigates the impact of applying BD-CRAFT, a variant of CRAFT that automatically classifies images using a Laplacian operator and then preprocesses the images classified as blurry with blind deconvolution, to the top-ranked algorithms SenseTime and TextFuseNet. Results reveal that the proposed method significantly enhanced the detection performance of both algorithms: TextFuseNet + BD-CRAFT achieved an outstanding h-mean of 93.55% and improved its precision by over 4% to 95.71%, while SenseTime + BD-CRAFT placed first with a remarkable 95.22% h-mean and likewise exhibited a precision improvement of over 4%.
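A minimal sketch of BD-CRAFT's classification stage as described: blur is judged by the variance of the image's Laplacian, and blurry images are then handed to blind deconvolution. OpenCV supplies the operator; the threshold value is an assumption, as the study's actual cut-off is not given here.

```python
import cv2

BLUR_THRESHOLD = 100.0   # hypothetical variance cut-off

def is_blurry(path: str) -> bool:
    """Low Laplacian variance means few sharp edges, i.e. a blurry image."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    variance = cv2.Laplacian(gray, cv2.CV_64F).var()
    return variance < BLUR_THRESHOLD

# if is_blurry("scene.jpg"): run blind deconvolution before text detection
```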
Investigating the Effect of BD-CRAFT to Text Detection Algorithms - gerogepatton
The document summarizes a study that investigated the effect of applying a blind deconvolution technique called BD-CRAFT to improve the performance of two state-of-the-art text detection algorithms, SenseTime and TextFuseNet. BD-CRAFT automatically classifies images as blurry or non-blurry using a Laplacian operator threshold, and applies blind deconvolution to deblur the blurry images. The study found that combining BD-CRAFT with SenseTime and TextFuseNet significantly improved their text detection performances on the ICDAR 2013 dataset, with TextFuseNet + BD-CRAFT achieving a 93.55% h-mean and SenseTime + BD-CRAFT
This document presents a new model called EQUIRS (Explicitly Query Understanding Information Retrieval System) based on Hidden Markov Models (HMM) to improve natural language processing for text query information retrieval. The proposed EQUIRS system is compared to previous fuzzy clustering methods. Experimental results on a dataset of 900 files across 5 categories show that EQUIRS has higher accuracy than fuzzy clustering, as measured by precision, recall, and F-measure, though it has longer training and searching times. The document concludes that EQUIRS is an effective approach for information retrieval based on HMM.
1) The document discusses various ways that artificial intelligence can be applied to different phases of the software engineering lifecycle, including requirements specification, design, coding, testing, and estimation.
2) It provides examples of using techniques like natural language processing to clarify requirements, knowledge graphs to manage requirements information, and computational intelligence for requirements prioritization.
3) For design, the document discusses using intelligent agents to recommend patterns and designs to satisfy quality attributes from requirements and assist with assigning responsibilities to components.
Abstract - Software is ubiquitous in our daily life. It brings us great convenience, and a big headache about software reliability as well: software is never bug-free, and software bugs keep incurring monetary loss or even catastrophes. In the pursuit of better reliability, software engineering researchers found that huge amounts of data in various forms can be collected from software systems, and these data, when properly analyzed, can help improve software reliability. Unfortunately, the huge volume of complex data renders simple analysis techniques inadequate; consequently, studies have been resorting to data mining for more effective analysis. In the past few years, we have witnessed many studies on mining for software reliability reported in data mining as well as software engineering forums. These studies either develop new or apply existing data mining techniques to tackle reliability problems from different angles. In order to keep data mining researchers abreast of the latest developments in this growing research area, we offer this paper on data mining for software reliability. In it, we present a comprehensive overview of the area, examine representative studies, and lay out challenges for data mining researchers.
Improvement of Software Maintenance and Reliability using Data Mining Techniques - ijdmtaiir
This document discusses using data mining techniques to improve software maintenance and reliability. It provides an overview of applying techniques like classification, association rule mining, and clustering to mine software engineering data from code bases, change histories, and bug reports. Specifically, it describes mining frequent patterns and rules from source code and revision histories to detect bugs as deviations from these patterns. A methodology is presented that involves parsing source code to build an itemset database, applying frequent itemset mining to extract programming patterns and rules, and detecting violations of rules as potential bugs. Challenges and limitations of these approaches are also discussed.
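A minimal sketch of the mining step this summary describes: count API calls that frequently co-occur across functions, treat high-support pairs as programming rules, and flag partial matches as potential bugs. The function and call names are hypothetical and the scale is far below a real code base.

```python
from collections import Counter
from itertools import combinations

functions = {
    "f1": {"open", "read", "close"},
    "f2": {"open", "read", "close"},
    "f3": {"open", "write", "close"},
    "f4": {"open", "read"},            # no 'close': a candidate bug
}

# Count support for every pair of calls that appears together in a function.
pair_support = Counter()
for calls in functions.values():
    for pair in combinations(sorted(calls), 2):
        pair_support[pair] += 1

frequent = [p for p, n in pair_support.items() if n >= 3]   # min support = 3

# A function containing one element of a frequent pair but not the other
# deviates from the mined pattern (this can also yield false positives).
for name, calls in functions.items():
    for a, b in frequent:
        for x, y in ((a, b), (b, a)):
            if x in calls and y not in calls:
                print(f"{name}: uses '{x}' without '{y}' (violates mined pattern)")
```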
The document summarizes text mining techniques in data mining. It discusses common text mining tasks like text categorization, clustering, and entity extraction. It also reviews several text mining algorithms and techniques, including information extraction, clustering, classification, and information visualization. Several literature papers applying these techniques to domains like movie reviews, research proposals, and e-commerce are also summarized. The document concludes that text mining can extract useful patterns from unstructured text through techniques like clustering, classification, and information extraction.
Deep Learning in Text Recognition and Text Detection : A Review - IRJET Journal
The document discusses text detection and recognition using deep learning techniques. It begins with an introduction to deep learning and its use in optical character recognition (OCR). There are two main components of OCR - text detection, which locates text in an image, and text recognition, which identifies the text. Convolutional neural networks (CNNs) are effective for both tasks. The document then outlines the steps for text detection and recognition using deep learning, including data collection, preprocessing, feature extraction using CNNs, training and validating models on datasets, and testing the trained models on new data.
Data mining is knowledge discovery in databases; the goal is to extract patterns and knowledge from large amounts of data. An important branch of data mining is text mining, which extracts high-quality information from text, typically by means of statistical pattern learning. "High quality" in text mining refers to some combination of relevance, novelty, and interestingness. Tasks in text mining include text categorization, text clustering, entity extraction, and sentiment analysis. Applications of natural language processing and analytical methods are highly preferred to turn
Software Refactoring Under Uncertainty: A Robust Multi-Objective Approach - Wiem Mkaouer
This document describes a multi-objective robust optimization approach for software refactoring that accounts for uncertainty in code smell severity levels and class importance. The approach formulates refactoring as a multi-objective problem to find solutions that maximize both quality, by correcting code smells, and robustness to changes in severity levels and importance. An evaluation on six open source projects found the approach generates refactoring solutions comparable in quality to existing approaches but with significantly better robustness across different scenarios.
A Survey on Design Pattern Detection Approaches - CSCJournals
Design patterns play a key role in software development process. The interest in extracting design pattern instances from object-oriented software has increased tremendously in the last two decades. Design patterns enhance program understanding, help to document the systems and capture design trade-offs.
This paper provides the current state of the art in design pattern detection. The selected approaches cover the whole spectrum of research in design pattern detection. We observed that different detection approaches yield widely varying accuracy values. The lessons learned are listed at the end of this paper and can serve as future research directions and guidelines in the area of design pattern detection.
An Implementation on Effective Robot Mission under Critical Environmental Co... - IJERA Editor
Software engineering is a field of engineering concerned with designing and writing programs for computers and other electronic devices: the application of engineering to the design, development, implementation, testing, and maintenance of software in a systematic way. A software engineer, or programmer, writes software (or changes existing software) and compiles it using methods that improve its quality. Robots also play an important role in present-day automation, but several challenges arise when robots operate in critical environments. Motion planning and task planning are two fundamental problems in robotics that have been addressed from different perspectives. Temporal-logic-based approaches that automatically generate controllers have been shown to be useful for mission-level planning of motion, surveillance, and navigation, among others. These approaches rely critically on the validity of the environment models used for synthesis, yet simplifying assumptions are inevitable to reduce complexity and provide mission-level guarantees; no plan can guarantee results in a model of a world in which everything can go wrong. In this paper, we show how our approach, which reduces reliance on a single model by introducing a stack of models, can endow systems with incremental guarantees based on increasingly strengthened assumptions, supporting graceful degradation when the environment does not behave as expected and progressive enhancement when it does.
Similar to Not Only Statements: The Role of Textual Analysis in Software Quality
ATTICUS - Premio FORUM PA Sanità 2019 (Presentation) - Rocco Oliveto
Candidacy of the ATTICUS project for the Premio Forum PA Sanità 2019 award. ATTICUS involves the development of an intelligent hardware/software system capable of constantly monitoring an individual and reporting anomalies both in their state of health, detected through the automatic measurement and analysis of vital parameters, and in their behaviour, detected by monitoring and analysing the movements the person makes while carrying out their activities.
The Computer Science Degree Programme Meets the World of Work - Presentation... - Rocco Oliveto
The contributions of representatives of the world of work, the professions, industry, and public administration are indispensable for strengthening the employability and the personal and professional growth of university students. The first edition of the event "The Computer Science Degree Programme Meets the World of Work" aims to assess, together with representatives of the world of work, the completeness and effectiveness of the Computer Science degree programme's curriculum. The event will also be an opportunity to present a first draft of a Master's Degree in Computer Science focused on topics related to "Cyber Security".
This document summarizes the process used to identify the Most Influential Paper from the 2005 International Workshop on Program Comprehension. It involved a two-step process: 1) Analyzing citations of the 24 papers presented at IWPC 2005 over the past 10 years, identifying papers with the most citations as candidates; 2) Having the program committee members vote on the candidate papers to determine the Most Influential Paper. Through this process, the paper "Concise and Consistent Naming" by Florian Deißenböck and Markus Pizka was identified as having the most citations (158 over 10 years) and the most votes (10) from program committee members, and was thus deemed the
Presentazione CdL in Informatica @UNIMOL - 2014Rocco Oliveto
Presentation of the degree program in Computer Science at the freshman welcome day of 07.10.14, organized at the University of Molise to welcome the incoming students of the 2014/2015 cohort.
SCAM 2014 - A few notes from the Program ChairsRocco Oliveto
The document summarizes the 14th IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM 2014). It provides details on the number of submissions received, reviewing process, accepted papers, program including technical sessions and keynote speaker, awards, and plans for a post-conference special journal issue. Over 110 papers and 20 demos were submitted and reviewed, with 26 papers and 9 demos accepted after a rigorous peer-review process involving over 90 reviewers. The conference program featured 6 technical sessions over two days along with tool demonstrations and an open steering committee meeting.
The document announces the ICPC 2015 conference to be held in Florence, Italy from May 18-19, 2015. It provides information about the location of the conference in Florence including transportation from major cities, venues such as the Congress Center and Villa Vittoria, and sights around the city. It also lists the organizing committee and technical details such as the call for papers, submission deadlines, and a special journal issue for best papers.
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...Sérgio Sacani
Context. With a mass exceeding several 10^4 M⊙ and a rich and dense population of massive stars, supermassive young star clusters
represent the most massive star-forming environment that is dominated by the feedback from massive stars and gravitational interactions
among stars.
Aims. In this paper we present the Extended Westerlund 1 and 2 Open Clusters Survey (EWOCS) project, which aims to investigate
the influence of the starburst environment on the formation of stars and planets, and on the evolution of both low and high mass stars.
The primary targets of this project are Westerlund 1 and 2, the closest supermassive star clusters to the Sun.
Methods. The project is based primarily on recent observations conducted with the Chandra and JWST observatories. Specifically,
the Chandra survey of Westerlund 1 consists of 36 new ACIS-I observations, nearly co-pointed, for a total exposure time of 1 Msec.
Additionally, we included 8 archival Chandra/ACIS-S observations. This paper presents the resulting catalog of X-ray sources within
and around Westerlund 1. Sources were detected by combining various existing methods, and photon extraction and source validation
were carried out using the ACIS-Extract software.
Results. The EWOCS X-ray catalog comprises 5963 validated sources out of the 9420 initially provided to ACIS-Extract, reaching a
photon flux threshold of approximately 2 × 10^-8 photons cm^-2 s^-1. The X-ray sources exhibit a highly concentrated spatial distribution,
with 1075 sources located within the central 1 arcmin. We have successfully detected X-ray emissions from 126 out of the 166 known
massive stars of the cluster, and we have collected over 71 000 photons from the magnetar CXO J164710.20-455217.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization. To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
The binding of cosmological structures by massless topological defectsSérgio Sacani
Assuming spherical symmetry and weak field, it is shown that if one solves the Poisson equation or the Einstein field
equations sourced by a topological defect, i.e. a singularity of a very specific form, the result is a localized gravitational
field capable of driving flat rotation (i.e. Keplerian circular orbits at a constant speed for all radii) of test masses on a thin
spherical shell without any underlying mass. Moreover, a large-scale structure which exploits this solution by assembling
concentrically a number of such topological defects can establish a flat stellar or galactic rotation curve, and can also deflect
light in the same manner as an equipotential (isothermal) sphere. Thus, the need for dark matter or modified gravity theory is
mitigated, at least in part.
What are greenhouse gases and how many gases affect the Earth?moosaasad1975
What greenhouse gases are, how they affect the Earth and its environment, what the future of the environment and the Earth is, and how they influence weather and climate.
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...University of Maribor
Slides from talk:
Aleš Zamuda: Remote Sensing and Computational, Evolutionary, Supercomputing, and Intelligent Systems.
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Inter-Society Networking Panel GRSS/MTT-S/CIS Panel Session: Promoting Connection and Cooperation
https://www.etran.rs/2024/en/home-english/
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills MN
Travis Hills of Minnesota developed a method to convert waste into high-value dry fertilizer, significantly enriching soil quality. By providing farmers with a valuable resource derived from waste, Travis Hills helps enhance farm profitability while promoting environmental stewardship. Travis Hills' sustainable practices lead to cost savings and increased revenue for farmers by improving resource efficiency and reducing waste.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Invited talk at the Journées Nationales du GDR GPL 2024
Nucleophilic Addition of carbonyl compounds.pptxSSR02
Nucleophilic addition is the most important reaction of carbonyls. Not just aldehydes and ketones, but also carboxylic acid derivatives in general.
Carbonyls undergo addition reactions with a large range of nucleophiles.
Comparing the relative basicity of the nucleophile and the product is extremely helpful in determining how reversible the addition reaction is. Reactions with Grignards and hydrides are irreversible. Reactions with weak bases like halides and carboxylates generally don’t happen.
Electronic effects (inductive effects, electron donation) have a large impact on reactivity.
Large groups adjacent to the carbonyl will slow the rate of reaction.
Neutral nucleophiles can also add to carbonyls, although their additions are generally slower and more reversible. Acid catalysis is sometimes employed to increase the rate of addition.
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...Wasswaderrick3
In this book, we use conservation of energy techniques on a fluid element to derive the Modified Bernoulli equation of flow with viscous or friction effects. We derive the general equation of flow/velocity and then from this we derive the Poiseuille flow equation, the transition flow equation and the turbulent flow equation. In situations where there are no viscous effects, the equation reduces to the Bernoulli equation. From experimental results, we are able to include other terms in the Bernoulli equation. We also look at cases where pressure gradients exist. We use the Modified Bernoulli equation to derive equations of flow rate for pipes of different cross sectional areas connected together. We also extend our techniques of energy conservation to a sphere falling in a viscous medium under the effect of gravity. We demonstrate Stokes' equation of terminal velocity and the turbulent flow equation. We look at a way of calculating the time taken for a body to fall in a viscous medium. We also look at the general equation of terminal velocity.
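As a hedged illustration of the kind of equation this abstract describes (not necessarily the book's exact result), Bernoulli's equation between two stations of a pipe can be augmented with a viscous-loss term:

P_1 + \frac{1}{2}\rho v_1^2 + \rho g h_1 = P_2 + \frac{1}{2}\rho v_2^2 + \rho g h_2 + \Delta P_{\mathrm{loss}}

where, for laminar (Poiseuille) flow of a fluid with dynamic viscosity \mu through a pipe of radius r and length L at volumetric flow rate Q,

\Delta P_{\mathrm{loss}} = \frac{8 \mu L Q}{\pi r^4}

With \Delta P_{\mathrm{loss}} = 0, this reduces to the classical Bernoulli equation, consistent with the abstract's statement.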
ESR spectroscopy in liquid food and beverages.pptxPRIYANKA PATEL
With an increasing population, people need to rely on packaged foodstuffs. Packaging of food materials requires the preservation of food. There are various methods for treating food to preserve it, and irradiation treatment is one of them. It is the most common and most harmless method of food preservation, as it does not alter the necessary micronutrients of the food. Although irradiated food does not cause any harm to human health, quality assessment of the food is still required to provide consumers with the necessary information. ESR spectroscopy is the most sophisticated way to investigate the quality of the food and the free radicals induced during its processing. The ESR spin-trapping technique is useful for detecting highly unstable radicals in food. The antioxidant capability of liquid food and beverages is mainly assessed by the spin-trapping technique.
The thematic appreciation test is a psychological assessment tool used to measure an individual's appreciation and understanding of specific themes or topics. This test helps to evaluate an individual's ability to connect different ideas and concepts within a given theme, as well as their overall comprehension and interpretation skills. The results of the test can provide valuable insights into an individual's cognitive abilities, creativity, and critical thinking skills.
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxMAGOTI ERNEST
Although Artemia has been known to man for centuries, its use as a food for the culture of larval organisms apparently began only in the 1930s, when several investigators found that it made an excellent food for newly hatched fish larvae (Litvinenko et al., 2023). As aquaculture developed in the 1960s and ‘70s, the use of Artemia also became more widespread, due both to its convenience and to its nutritional value for larval organisms (Arenas-Pardo et al., 2024). The fact that Artemia dormant cysts can be stored for long periods in cans, and then used as an off-the-shelf food requiring only 24 h of incubation makes them the most convenient, least labor-intensive, live food available for aquaculture (Sorgeloos & Roubach, 2021). The nutritional value of Artemia, especially for marine organisms, is not constant, but varies both geographically and temporally. During the last decade, however, both the causes of Artemia nutritional variability and methods to improve poor-quality Artemia have been identified (Loufi et al., 2024).
Brine shrimp (Artemia spp.) are used in marine aquaculture worldwide. Annually, more than 2,000 metric tons of dry cysts are used for cultivation of fish, crustacean, and shellfish larva. Brine shrimp are important to aquaculture because newly hatched brine shrimp nauplii (larvae) provide a food source for many fish fry (Mozanzadeh et al., 2021). Culture and harvesting of brine shrimp eggs represents another aspect of the aquaculture industry. Nauplii and metanauplii of Artemia, commonly known as brine shrimp, play a crucial role in aquaculture due to their nutritional value and suitability as live feed for many aquatic species, particularly in larval stages (Sorgeloos & Roubach, 2021).
Aziz Sancar, Nobel Prize Winner: From Mardin to Nobel
Not Only Statements: The Role of Textual Analysis in Software Quality
1. Not Only Statements:
The Role of Textual Analysis
in Software Quality
Rocco Oliveto
rocco.oliveto@unimol.it
University of Molise
2nd Workshop on Mining Unstructured Data
October 17th, 2012 - Kingston, Canada
8. Text is Software Too
Alexander Dekhtyar
Dept. Computer Science
University of Kentucky
dekhtyar@cs.uky.edu
Jane Huffman Hayes
Dept. Computer Science
University of Kentucky
hayes@cs.uky.edu
Tim Menzies
Dept. Computer Science,
Portland State University,
tim@menzies.us
Abstract
Software compiles and therefore is characterized by a parseable grammar. Natural language text rarely conforms to prescriptive grammars and therefore is much harder to parse. Mining parseable structures is easier than mining less structured entities. Therefore, most work on mining repositories focuses on software, not natural language text. Here, we report experiments with mining natural language text (requirements documents) suggesting that: (a) mining natural language is not too difficult, so (b) software repositories should routinely be augmented with all the natural language text used to develop that software.
1 Introduction
"I have seen the future of software engineering, and it is......Text?"
Much of the work done in the past has focused on the mining of software repositories that contain structured, easily parseable artifacts. Even when non-structured artifacts existed (or portions of structured artifacts that were non-structured), researchers ignored them. These items tended to be "exclusions from consideration" in research papers.
We argue that these non-structured artifacts are rich in semantic information that cannot be extracted from the nice-to-parse syntactic structures such as source code. Much useful information can be obtained by treating text as software, or at least, as part of the software repository, and by developing techniques for its efficient mining.
To date, we have found that information retrieval (IR) methods can be used to support the processing of textual software artifacts. Specifically, these methods can be used to facilitate the tracing of software artifacts to each other (such as tracing design elements to requirements). We have found that we can generate candidate links in an automated fashion faster than humans; we can retrieve more true links than humans; and we can allow the analyst to participate in the process in a limited way and realize vast results improvements [10,11].
In this paper, we discuss:
• The kinds of text seen in software;
• Problems with using non-textual methods;
• The importance of early life cycle artifacts;
• The mining of software repositories with an emphasis
on natural language text; and
• Results from work that we have performed thus far on
mining of textual artifacts.
2 Text in Software Engineering
Textual artifacts associated with software can roughly be partitioned into two large categories:
1. Text produced during the initial development and then maintained, such as requirements, design specifications, user manuals and comments in the code;
2. Text produced after the software is fielded, such as problem reports, reviews, messages posted to on-line software user group forums, modification requests, etc.
Both categories of artifacts can help us analyze software itself, although different approaches may be employed. In this paper, we discuss how lifecycle development documents can be used to mine traceability information for Independent Validation & Verification (IV&V) analysts and how artifacts (e.g., textual interface requirements) can be used to study and predict software faults.
3 If not text..
One way to assess our proposal would be to assess what can be learned from alternative representations. In the software verification world, reasoning about two representations is common: formal models and static code measures.
A formal model has two parts: a system model and a properties model. The system model describes how the program can change the values of variables, while the properties model describes global invariants that must be maintained when the system executes. Often, a temporal logic¹ is used.
¹Temporal logic is classical logic augmented with some temporal operators such as □X (always X is true); ◇X (eventually X is true); ○X (X is true at the next time point); X U Y (X is true until Y is true).
Non-structured artifacts are rich in semantic information that cannot be extracted from the nice-to-parse syntactic structures such as source code
...TA in SE...
9. traceability recovery (Antoniol et al. TSE 2002, Marcus and Maletic ICSE 2003)
change impact analysis (Canfora et al. Metrics 2005)
feature location (Poshyvanyk et al. TSE 2007)
program comprehension (Haiduc et al. ICSE 2010, Hindle et al. MSR 2011)
bug localization (Lo et al. ICSE 2012)
clone detection (Marcus et al ASE 2001)
...
Textual Analysis
Applications
13. ...process overview...
[Pipeline diagram: source code entities → text normalization → identifier normalization → term weighting → application of NLP/IR → new knowledge]
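A minimal Java sketch of the front of this pipeline, assuming raw term counts stand in for a real term-weighting scheme such as tf-idf (all names here are hypothetical, not from the slides):

import java.util.*;

public class TextPipeline {
    // Text + identifier normalization: split on non-alphanumerics, then split
    // snake_case and camelCase identifiers, and lower-case everything.
    static List<String> normalize(String sourceText) {
        List<String> terms = new ArrayList<>();
        for (String token : sourceText.split("[^A-Za-z0-9_]+"))
            for (String word : token.split("_|(?<=[a-z0-9])(?=[A-Z])"))
                if (!word.isEmpty()) terms.add(word.toLowerCase());
        return terms;
    }

    // Term weighting: raw counts as a stand-in for tf-idf; the resulting
    // vectors are what the NLP/IR technique is then applied to.
    static Map<String, Integer> weigh(List<String> terms) {
        Map<String, Integer> weights = new HashMap<>();
        for (String t : terms) weights.merge(t, 1, Integer::sum);
        return weights;
    }
}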
14. Textual Analysis to...
...measure class cohesion
Given a class
1. compute the textual similarity between all the pairs of methods
2. compute the average textual similarity (value between 0 and 1)
3. the higher the similarity, the higher the cohesion
A. Marcus, D. Poshyvanyk, R. Ferenc: Using the Conceptual Cohesion of Classes for Fault Prediction in Object-Oriented Systems. IEEE Transactions on Software Engineering 34(2): 287-300 (2008)
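To make the three steps concrete, here is a minimal Java sketch, assuming each method has already been reduced to a bag of normalized terms; the cited measure actually works on LSI vectors, so plain cosine similarity over raw term counts is used here only as a stand-in, and all names are hypothetical:

import java.util.*;

public class ConceptualCohesion {
    // Cosine similarity between two bags of terms (stand-in for LSI similarity).
    static double similarity(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer w = b.get(e.getKey());
            if (w != null) dot += e.getValue() * w;
            na += e.getValue() * e.getValue();
        }
        for (int w : b.values()) nb += w * w;
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Steps 1-3: average textual similarity over all unordered pairs of methods.
    static double classCohesion(List<Map<String, Integer>> methods) {
        double sum = 0;
        int pairs = 0;
        for (int i = 0; i < methods.size(); i++)
            for (int j = i + 1; j < methods.size(); j++) {
                sum += similarity(methods.get(i), methods.get(j));
                pairs++;
            }
        return pairs == 0 ? 1.0 : sum / pairs; // in [0, 1]: higher means more cohesive
    }
}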
15. Textual Analysis to...
...measure class coupling
Given two classes A and B
1. compute the textual similarity between all unordered pairs of methods from class A and class B
2. compute the average textual similarity (value between 0 and 1)
3. the higher the similarity, the higher the coupling
D. Poshyvanyk, A. Marcus, R. Ferenc, T. Gyimóthy: Using information retrieval based coupling measures for impact analysis. Empirical Software Engineering 14(1): 5-32 (2009)
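The coupling measure only changes which pairs are averaged; a companion sketch for the same hypothetical class, reusing the similarity() helper from the cohesion sketch above:

    // Average textual similarity over all pairs formed by one method
    // from class A and one method from class B.
    static double classCoupling(List<Map<String, Integer>> methodsOfA,
                                List<Map<String, Integer>> methodsOfB) {
        double sum = 0;
        int pairs = 0;
        for (Map<String, Integer> ma : methodsOfA)
            for (Map<String, Integer> mb : methodsOfB) {
                sum += similarity(ma, mb); // see the cohesion sketch above
                pairs++;
            }
        return pairs == 0 ? 0 : sum / pairs; // in [0, 1]: higher means more coupled
    }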
24. ...the approach...
[Diagram: given a class C with n methods, build an n×n method-by-method matrix whose entries combine three measures: SSM (Structural Similarity between Methods), CIM (Call-based Interaction between Methods), and CSM (Conceptual Similarity between Methods)]
G. Bavota, A. De Lucia, A. Marcus, R. Oliveto: A two-step technique for extract class refactoring. ASE 2010: 151-154
G. Bavota, A. De Lucia, R. Oliveto: Identifying Extract Class refactoring opportunities using structural and semantic cohesion measures. Journal of Systems and Software 84(3): 397-414 (2011)
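A minimal sketch of the combination step, assuming the three n×n method-by-method matrices have already been computed and normalized to [0, 1]; the weights below are the example values shown with the matrices on the following slide (where the call-based weight is labeled wCDM):

    // Weighted combination of the three method-by-method matrices.
    static double[][] combineMatrices(double[][] ssm, double[][] cim, double[][] csm,
                                      double wSSM, double wCIM, double wCSM) {
        int n = ssm.length;
        double[][] combined = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                combined[i][j] = wSSM * ssm[i][j] + wCIM * cim[i][j] + wCSM * csm[i][j];
        return combined; // e.g., wSSM = 0.5, wCIM = 0.2, wCSM = 0.3
    }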
25. public class UserManagement {
//String representing the table user in the database
private static final String TABLE_USER = "user";
//String representing the table teaching in the database
private static final String TABLE_TEACHING = "teaching";
/* Insert a new user in TABLE_USER */
public void insertUser(User pUser){
boolean check = checkMandatoryFieldsUser(pUser);
...
String sql = "INSERT INTO " + UserManagement.TABLE_USER + " ... ";
...
}
/* Update an existing user in TABLE_USER */
public void updateUser(User pUser){
boolean check = checkMandatoryFieldsUser(pUser);
...
String sql = "UPDATE " + UserManagement.TABLE_USER + " ... ";
...
}
/* Delete an existing user in TABLE_USER */
public void deleteUser(User pUser){
...
String sql = "DELETE FROM " + UserManagement.TABLE_USER + " ... ";
...
}
/* Verify if in TABLE_USER exists the user pUser */
public void existsUser(User pUser){
...
String sql = "SELECT FROM " + UserManagement.TABLE_USER + " ... ";
...
}
/* Check the mandatory fields in pUser */
public boolean checkMandatoryFieldsUser(User pUser){
...
}
/* Insert a new teaching in TABLE_TEACHING */
public void insertTeaching(Teaching pTeaching){
boolean check = checkMandatoryFieldsTeaching(pTeaching);
...
String sql = "INSERT INTO " + UserManagement.TABLE_TEACHING + " ... ";
...
}
/* Update an existing teaching in TABLE_TEACHING */
public void updateTeaching(Teaching pTeaching){
boolean check = checkMandatoryFieldsTeaching(pTeaching);
...
String sql = "UPDATE " + UserManagement.TABLE_TEACHING + " ... ";
...
}
/* Delete an existing teaching in TABLE_TEACHING */
public void deleteTeaching(Teaching pTeaching){
...
String sql = "DELETE FROM " + UserManagement.TABLE_TEACHING + " ... ";
...
}
/* Check the mandatory fields in pTeaching */
public boolean checkMandatoryFieldsTeaching(Teaching pTeaching){
...
}
}
[Figure: example CDM, SSM, and CSM method-by-method similarity matrices for the UserManagement class, combined into a single method-by-method matrix with weights wCDM = 0.2, wSSM = 0.5, wCSM = 0.3. Legend: IU = insertUser, UU = updateUser, DU = deleteUser, EU = existsUser, CU = checkMandatoryFieldsUser; IT = insertTeaching, UT = updateTeaching, DT = deleteTeaching, CT = checkMandatoryFieldsTeaching.]
26. ...the approach...
[Figure: the method-by-method matrix after transitive closure, shown as method-by-method relationships before and after filtering, together with the proposed refactoring: candidate chains C1 and C2 and a trivial chain T1; after filtering, the user-related methods (IU, UU, DU, EU, CU) form Candidate Class C1 and the teaching-related methods (IT, UT, DT, CT) form Candidate Class C2.]
27. ...the approach...
[Figure repeated from the previous slide]
Conceptual cohesion plays a crucial role
Refactoring operations make sense for developers
28. The developer point of view...
Do measures reflect the quality perceived by developers?
29. ...the study...
How does class coupling align
with developers’ perception of coupling?
Four types of sources of information
structural
dynamic
semantic
historical
The study involved 90 subjects
G. Bavota, B. Dit, R. Oliveto, M. Di Penta, D. Poshyvanyk, A. De Lucia. An Empirical Study on the Developers' Perception of Software Coupling. Submitted to ICSE 2013.
30. ...take away...
Coupling cannot be captured and measured using only
structural information, such as method calls
Different sources of information are needed
Semantic coupling seems to reflect the developers’ mental
model when identifying interaction between entities
Semantic coupling is able to capture "latent coupling relationships" encapsulated in identifiers and comments
33. ...the study...
QALP Score: the similarity between a module’s
comment and its code
Used to evaluate the quality of source code, but it can also be used to predict faults
[Figure 2: Maximum QALP score per defect count for both programs (Mozilla and MP); y-axis: QALP score from 0.0 to 1.0, x-axis: defect count from 0 to 14]
D. Binkley, H. Feild, D. Lawrie, and M. Pighin, "Software fault prediction using language processing," in Proceedings of the Testing: Academic and Industrial Conference Practice and Research Techniques, 2007, pp. 99-110.
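In the same spirit, the score can be thought of as the textual similarity between a module's comments and its code. A sketch only, not the published QALP implementation (which uses a full IR pipeline with stop-word removal, stemming, and tf-idf); it reuses the hypothetical similarity() helper and imports from the cohesion sketch above:

    // QALP-like score: similarity between a module's comments and its code.
    static double qalpLikeScore(String comments, String code) {
        return similarity(termCounts(comments), termCounts(code));
    }

    // Crude tokenizer standing in for the real preprocessing pipeline.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+"))
            if (!t.isEmpty()) counts.merge(t, 1, Integer::sum);
        return counts;
    }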
34. Inconsistent naming...
path? Is it a relative path or an absolute path?
And what if it is used as both relative and absolute?
35. ...the study...
Term entropy: the physical dispersion of terms in a program. The higher the entropy, the more scattered the terms are across the program
Context coverage: the conceptual dispersion of terms. The higher their context coverage, the more unrelated the methods using them
The use of identical terms in different contexts may increase the risk of faults
V. Arnaoudova, L. M. Eshkevari, R. Oliveto, Y.-G. Guéhéneuc, G. Antoniol: Physical and conceptual identifier dispersion: Measures and relation to fault proneness. ICSM 2010: 1-5
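As an illustration of the idea (not the exact formula of the cited paper), term entropy can be computed from the distribution of a term's occurrences across program entities; the more evenly a term is scattered, the higher its entropy:

    // Entropy of one term, given its occurrence count in each program entity.
    static double termEntropy(int[] occurrencesPerEntity) {
        double total = 0;
        for (int c : occurrencesPerEntity) total += c;
        if (total == 0) return 0;
        double entropy = 0;
        for (int c : occurrencesPerEntity) {
            if (c == 0) continue;
            double p = c / total;
            entropy -= p * Math.log(p); // natural log; the base only rescales the value
        }
        return entropy; // maximal when the term is spread evenly across all entities
    }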
36. ...take away...
Term entropy and context coverage only
partially correlate with size
The number of high entropy and high context coverage
terms contained in a method or attribute helps to explain
the probability of it being faulty
If a Rhino (ArgoUML) method contains an identifier with a term having high entropy and high context coverage, its probability of being faulty is six (two) times higher
see also
S. Lemma Abebe, V. Arnaoudova, P. Tonella, G. Antoniol and Y.-G. Guéhéneuc.
Can Lexicon Bad Smells improve fault prediction? WCRE 2013
39. How to induce developers to use meaningful identifiers?
40. Reverse engineering, used with evolving software development technologies, will provide significant incremental enhancements to our productivity
41. Reverse engineering, used with evolving software development technologies, will provide significant incremental enhancements to our productivity
Continuous Textual Analysis
42. COCONUT...
1. The Administrator activates the add member function in the terminal of the system
and correctly enters his login and password identifying him as an Administrator.
2. The system responds by presenting a form to the Administrator on a terminal
screen. The form includes the first and last name, the address, and contact
information (phone, email and fax) of the customer, as well as the fidelity index.
The fidelity index can be: New Member, Silver Member, and Gold Member. After
50 rentals the member is considered as Silver Member, while after 150 rentals the
member becomes a Gold Member. The system also displays the membership fee
to be paid.
3. The Administrator fills the form and then confirms all the requested form
information is correct.
addmember.txt
45. COCONUT...
1. The Administrator activates the add member function in the terminal
of the system and correctly enters his login and password identifying
him as an Administrator.
2. The system responds by presenting a form to the Administrator on a
terminal screen. The form includes the first and last name, the
address, and contact information (phone, email and fax) of the
customer, as well as the fidelity index. The fidelity index can be: New
Member, Silver Member, and Gold Member. After 50 rentals the
member is considered as Silver Member, while after 150 rentals the
member is a Gold Member. The system also displays the
membership fee to be paid.
3. The Administrator fills the form and then confirms all the requested
form information is correct.
addmember.txt
51. Good Query Bad Query
# | Method     | Class       | Score
1 | insertUser | UserManager | 0.99
2 | deleteUser | UserManager | 0.95
3 | assignUser | RoleManager | 0.88
4 | util       | Utility     | 0.84
5 | getUsers   | UserManager | 0.79
52. Good Query Bad Query
[ranked list as on the previous slide]
Useful results on top of the list
53. Good Query Bad Query
Good Query:
# | Method     | Class       | Score
1 | insertUser | UserManager | 0.99
2 | deleteUser | UserManager | 0.95
3 | assignUser | RoleManager | 0.88
4 | util       | Utility     | 0.84
5 | getUsers   | UserManager | 0.79
Useful results on top of the list
Bad Query:
# | Method     | Class       | Score
1 | util       | Utility     | 0.93
2 | dbConnect  | DbManager   | 0.90
3 | insertUser | UserManager | 0.86
4 | networking | Utility     | 0.76
5 | loadRs     | DbManager   | 0.73
False positives on top of the list
54. How to use query assessment for improving code vocabulary?
60. ...problems...
how to remove the noise in source code?
which elements should be indexed?
identifier splitting and expansion
task-based pre-processing
62. ...problems...
how to set the parameters of some techniques (e.g., LSI)?
do we need customized versions of NLP/IR
techniques?
are the different techniques equivalent?
task-specific techniques?
65. Linguistic antipatterns
Common practices, from the linguistic point of view, in source code that decrease the quality of the software (Arnaoudova WCRE 2010)
How to define linguistic antipatterns?
How to identify them?
What is the impact of linguistic antipatterns on software development and maintenance?
How to prevent linguistic antipatterns?
67.
Software
Can textual analysis be used during test case selection?
Can textual analysis be used to improve search-based test case generation?
Can textual analysis be used to capture testing complexity of source code?