Due to the intangible nature of software, accurate and reliable effort estimation is a long-standing challenge in the software industry. Very accurate estimates of development effort are unlikely because of the inherent uncertainty in software projects and the complex, dynamic interaction of the factors that influence them. Software engineering datasets are heterogeneous because the data come from diverse sources. This heterogeneity can be reduced by defining relationships between data values and classifying them into clusters. This study examines how combining clustering and regression techniques can mitigate the loss of predictive accuracy caused by heterogeneous data: clustering creates subsets with a degree of homogeneity that enhances prediction accuracy. The study also observed that ridge regression outperforms the other regression techniques used in the analysis.
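The cluster-then-regress idea can be sketched in a few lines. The snippet below is an illustrative toy, not the study's actual pipeline: it clusters one-dimensional "project size" values with k-means, fits a closed-form single-feature ridge slope per cluster (no intercept, a simplifying assumption), and predicts with the nearest cluster's model.

```python
import random

def kmeans_1d(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on scalar values."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: (p - centers[j]) ** 2)].append(p)
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def ridge_slope(xs, ys, lam=1.0):
    # closed-form ridge for one feature, no intercept: w = sum(xy) / (sum(x^2) + lambda)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def fit_clustered(xs, ys, k=2, lam=1.0):
    """Cluster the inputs, then fit one ridge model per cluster."""
    centers = kmeans_1d(xs, k)
    models = {}
    for j in range(k):
        member = [(x, y) for x, y in zip(xs, ys)
                  if min(range(k), key=lambda c: (x - centers[c]) ** 2) == j]
        if member:
            models[j] = ridge_slope([x for x, _ in member],
                                    [y for _, y in member], lam)
    def predict(x):
        # route each query to its nearest cluster's model
        j = min(models, key=lambda c: (x - centers[c]) ** 2)
        return models[j] * x
    return predict
```

On heterogeneous data (e.g. small projects with one effort rate mixed with large projects with another), the per-cluster slopes track each subset far better than a single global ridge fit, which is the effect the abstract reports.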
Comparison of Cost Estimation Methods using Hybrid Artificial Intelligence on... (IJERA Editor)
Cost estimating at the schematic design stage, as the basis of project evaluation, engineering design, and cost management, plays an important role in project decisions made under a limited definition of scope, constraints on available information and time, and the presence of uncertainties. The purpose of this study is to compare the performance of cost estimation models built with two different hybrid artificial intelligence approaches: regression analysis with an adaptive neuro-fuzzy inference system (RANFIS) and case-based reasoning with a genetic algorithm (CBR-GA). The models were developed on the same 50 low-cost apartment project datasets in Indonesia. Tested on another five data points, the models proved to perform very well in terms of accuracy. The CBR-GA model was the best performer but had the disadvantage of needing 15 cost drivers, compared to only 4 cost drivers required by RANFIS for on-par performance.
Critical Paths Identification on Fuzzy Network Project (iosrjce)
In this paper, a new approach for identifying the fuzzy critical path is presented. It is based on converting the fuzzy network project into a deterministic one by transforming the parameter set of the fuzzy activities into a time probability density function (PDF) for each fuzzy activity time. A case study is used as a numerical test problem to demonstrate the approach.
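The overall scheme (replace fuzzy durations with crisp ones, then run an ordinary critical-path computation) can be illustrated with a toy stand-in. The paper derives a PDF for each activity; as a rough illustrative substitute, the sketch below defuzzifies triangular fuzzy durations by their centroid and then runs a standard CPM forward pass. The triangular representation and centroid rule are assumptions for the demo, not the paper's method.

```python
def defuzzify(tri):
    # centroid of a triangular fuzzy number (a, m, b)
    a, m, b = tri
    return (a + m + b) / 3.0

def critical_path(activities):
    """activities: name -> (triangular fuzzy duration, list of predecessor names)."""
    dur = {n: defuzzify(t) for n, (t, _) in activities.items()}
    finish = {}
    def earliest_finish(n):
        # CPM forward pass: EF(n) = dur(n) + max EF over predecessors
        if n not in finish:
            _, preds = activities[n]
            finish[n] = dur[n] + max((earliest_finish(p) for p in preds),
                                     default=0.0)
        return finish[n]
    makespan = max(earliest_finish(n) for n in activities)
    # trace the critical path backwards from the last-finishing activity
    path, node = [], max(activities, key=lambda n: finish[n])
    while node is not None:
        path.append(node)
        _, preds = activities[node]
        node = max(preds, key=lambda p: finish[p]) if preds else None
    return list(reversed(path)), makespan
```

For a three-activity network where C depends on A and B, the longest chain through A and C is reported as critical, with the defuzzified makespan.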
Automatic Unsupervised Data Classification Using Jaya Evolutionary Algorithm (aciijournal)
In this paper we attempt to solve the automatic clustering problem by concurrently optimizing multiple objectives, namely automatic determination of k and a set of cluster validity indices (CVIs). The proposed automatic clustering technique uses the recent Jaya optimization algorithm as its underlying strategy. This evolutionary technique aims to reach a global rather than a local best solution on larger datasets. The exploration and exploitation built into the proposed method detect the number of clusters, find an appropriate partitioning of the data, and drive the CVI values toward near-optimal frontiers. Twelve datasets of varying complexity are used to validate the algorithm's performance. The experiments show that the theoretical advantages of multi-objective clustering optimized with evolutionary approaches translate into realistic and scalable performance gains.
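Jaya's defining feature, mentioned above, is that it has no algorithm-specific tuning parameters: each candidate moves toward the current best solution and away from the current worst. The multi-objective CVI machinery of the paper is not reproduced here; this is the textbook single-objective Jaya update on a simple test function.

```python
import random

def jaya(f, dim=2, pop=20, iters=200, lo=-5.0, hi=5.0, seed=1):
    """Minimize f with the basic Jaya update rule."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    for _ in range(iters):
        scores = [f(x) for x in X]
        best = X[scores.index(min(scores))]
        worst = X[scores.index(max(scores))]
        for i, x in enumerate(X):
            cand = []
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Jaya move: toward the best, away from the worst
                v = x[d] + r1 * (best[d] - abs(x[d])) - r2 * (worst[d] - abs(x[d]))
                cand.append(min(hi, max(lo, v)))
            if f(cand) < f(x):        # greedy acceptance
                X[i] = cand
    scores = [f(x) for x in X]
    return X[scores.index(min(scores))]

sphere = lambda x: sum(v * v for v in x)   # minimum 0 at the origin
```

Note there are no inertia weights, crossover rates, or mutation probabilities to tune, which is Jaya's main selling point over PSO and GA.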
Min-based qualitative possibilistic networks are effective tools for the compact representation of decision problems under uncertainty. Exact approaches for computing decisions based on possibilistic networks are limited by the size of the possibility distributions. Generally, these approaches are based on possibilistic propagation algorithms. An important step in computing the decision is the transformation of the DAG into a secondary structure known as a junction tree; this transformation is known to be costly and represents a difficult problem. We propose in this paper a new approximate approach for computing decisions under uncertainty within possibilistic networks. Computing the optimal optimistic decision no longer requires the junction tree construction step. Instead, it is performed by calculating the degree of normalization in the moral graph resulting from merging the possibilistic network encoding the agent's knowledge with the one encoding its preferences.
Using particle swarm optimization to solve test functions problems (riyaniaes)
In this paper, benchmark functions are used to evaluate the particle swarm optimization (PSO) algorithm. The functions used are two-dimensional but were selected with different difficulty levels and different models. To demonstrate PSO's capability, it is compared with a genetic algorithm (GA); the two algorithms are compared in terms of objective function values and standard deviation. Several runs were performed in Matlab with properly chosen parameters to obtain convincing results. The suggested algorithm can solve engineering problems of different dimensions and outperforms the other in terms of accuracy and speed of convergence.
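A minimal global-best PSO on one of the classic two-dimensional benchmarks gives a concrete picture of what is being compared. The inertia and acceleration coefficients below are common textbook defaults, not values taken from the paper.

```python
import random

def pso(f, dim=2, pop=30, iters=150, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=3):
    """Minimize f with global-best particle swarm optimization."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    V = [[0.0] * dim for _ in range(pop)]
    pbest = [x[:] for x in X]              # personal bests
    gbest = min(pbest, key=f)[:]           # global best
    for _ in range(iters):
        for i in range(pop):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] = min(hi, max(lo, X[i][d] + V[i][d]))
            if f(X[i]) < f(pbest[i]):
                pbest[i] = X[i][:]
                if f(X[i]) < f(gbest):
                    gbest = X[i][:]
    return gbest

# Rosenbrock: a classic 2-D benchmark with minimum 0 at (1, 1)
rosenbrock = lambda p: (1 - p[0]) ** 2 + 100 * (p[1] - p[0] ** 2) ** 2
```

Averaging the best objective value and its standard deviation over several seeded runs reproduces the kind of comparison table the abstract describes.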
The pertinent single-attribute-based classifier for small datasets classific... (IJECEIAES)
Classifying a dataset using machine learning algorithms can be a big challenge when the target is a small dataset. The OneR classifier can be used in such cases due to its simplicity and efficiency. In this paper, we reveal the power of a single attribute by introducing the pertinent single-attribute-based heterogeneity-ratio classifier (SAB-HR), which uses a pertinent attribute to classify small datasets. SAB-HR applies a feature selection method based on the Heterogeneity-Ratio (H-Ratio) measure to identify the most homogeneous attribute in the set. Our empirical results on 12 benchmark datasets from the UCI machine learning repository show that the SAB-HR classifier significantly outperforms the classical OneR classifier on small datasets. In addition, using the H-Ratio as the criterion for selecting the single attribute was more effective than traditional criteria such as Information Gain (IG) and Gain Ratio (GR).
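The H-Ratio measure itself is not defined in the abstract, so it cannot be reproduced here; what can be sketched is the classical OneR baseline that SAB-HR builds on: score every attribute by the error of its one-attribute rule table and classify with the single best attribute.

```python
from collections import Counter, defaultdict

def one_r(rows, labels):
    """Fit OneR: keep only the attribute whose value->majority-class rule
    table makes the fewest training errors."""
    best = None
    for a in range(len(rows[0])):
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[a]][label] += 1
        # each attribute value predicts its majority class
        rule = {v: counts.most_common(1)[0][0]
                for v, counts in by_value.items()}
        errors = sum(label != rule[row[a]]
                     for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, a, rule)
    _, attr, rule = best
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen values
    return lambda row: rule.get(row[attr], default)
```

SAB-HR, as described, replaces the error-count selection step with the H-Ratio criterion while keeping the single-attribute prediction scheme.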
Experimental study of Data clustering using k-Means and modified algorithms (IJDKP)
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its simplicity of implementation. Clustering algorithms have broad appeal and usefulness in exploratory data analysis. This paper presents the results of an experimental study of different approaches to k-Means clustering, comparing results on different datasets using the original k-Means and other modified algorithms implemented in MATLAB R2009b. The results are evaluated on performance measures such as the number of iterations, number of points misclassified, accuracy, Silhouette validity index, and execution time.
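The baseline being modified in studies like this is Lloyd's k-Means. The sketch below implements it and also reports the iteration count, one of the performance measures listed above; it is an illustrative reimplementation, not the paper's MATLAB code.

```python
import random

def k_means(points, k, seed=0, max_iters=100):
    """Lloyd's algorithm; returns (centers, labels, iterations until stable)."""
    rng = random.Random(seed)
    centers = list(rng.sample(points, k))
    assign = None
    for iteration in range(1, max_iters + 1):
        # assignment step: nearest center by squared Euclidean distance
        new_assign = [min(range(k),
                          key=lambda j: sum((p[d] - centers[j][d]) ** 2
                                            for d in range(len(p))))
                      for p in points]
        if new_assign == assign:          # converged: labels stable
            return centers, assign, iteration
        assign = new_assign
        # update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members)
                                   for c in zip(*members))
    return centers, assign, max_iters
```

On well-separated data the algorithm typically stabilizes in a handful of iterations, which is exactly the kind of count the paper's comparison tables record.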
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING" (IJDKP)
Clustering is one of the data mining techniques used to discover business intelligence by grouping objects into clusters using a similarity measure. Clustering is an unsupervised learning process with many uses in real-world applications in marketing, biology, libraries, insurance, city planning, earthquake studies, and document clustering. Latent trends and relationships among data objects can be unearthed using clustering algorithms. Many clustering algorithms exist, but the quality of the clusters must be given paramount importance: the objective is the highest similarity between objects of the same cluster and the lowest similarity between objects of different clusters. In this context, we studied two widely used clustering algorithms, K-Means and Fuzzy K-Means. K-Means is an exclusive clustering algorithm, while Fuzzy K-Means is an overlapping clustering algorithm. In this paper we prove the hypothesis "Fuzzy K-Means is better than K-Means for clustering" through both a literature and an empirical study. We built a prototype application to demonstrate the differences between the two algorithms. The experiments were made on a diabetes dataset obtained from the UCI repository. The empirical results reveal that the performance of Fuzzy K-Means is better than that of K-Means in terms of the quality and accuracy of the clusters, supporting the hypothesis.
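The "overlapping" behavior contrasted above comes from fuzzy memberships: instead of a hard label, every point carries a degree of belonging to every cluster. The sketch below implements the standard fuzzy c-means updates (weighted centers, then inverse-distance memberships) with the usual fuzzifier m = 2; it is a generic illustration, not the paper's prototype.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=50, seed=0):
    """Standard FCM: returns (centers, membership matrix u)."""
    rng = random.Random(seed)
    # u[i][j]: degree to which point i belongs to cluster j (rows sum to 1)
    u = [[rng.random() for _ in range(c)] for _ in points]
    u = [[v / sum(row) for v in row] for row in u]
    centers = []
    for _ in range(iters):
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            centers.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / sum(w)
                for d in range(len(points[0]))))
        for i, p in enumerate(points):
            dists = [max(1e-12, sum((p[d] - ctr[d]) ** 2
                                    for d in range(len(p))) ** 0.5)
                     for ctr in centers]
            for j in range(c):
                # standard FCM membership update
                u[i][j] = 1.0 / sum((dists[j] / dk) ** (2 / (m - 1))
                                    for dk in dists)
    return centers, u
```

Taking the argmax of each membership row recovers a hard K-Means-style labeling, while the raw memberships expose the overlap that hard K-Means cannot represent.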
Illustration of Medical Image Segmentation based on Clustering Algorithms (rahulmonikasharma)
Image segmentation is a basic and crucial process whose objective is to facilitate the characterization and representation of structures of interest in medical or other images. Despite intensive research, segmentation remains a challenging problem because of varying image content, cluttered objects, occlusion, non-uniform object surfaces, and other factors. Numerous algorithms and techniques are available for image segmentation, yet an efficient, fast technique for medical image segmentation is still needed. This paper focuses on the K-means and Fuzzy C-means clustering algorithms to segment malaria blood samples more accurately.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Performance Analysis of Genetic Algorithm as a Stochastic Optimization Tool i... (paperpublications3)
Abstract: Engineering design problems are complex by nature because of their critical objective functions involving many variables and constraints. Engineers have to ensure compatibility with the imposed specifications while keeping manufacturing costs low. Moreover, the methodology may vary according to the design problem.
The main issue is to choose the proper tool for optimization. In earlier days, a design problem was optimized by conventional techniques such as gradient search, evolutionary optimization, and random search; these are known as classical methods.
The method must be chosen properly depending on the nature of the problem; an incorrect choice may fail to give the optimal solution, so these methods are less robust.
Nowadays, soft-computing techniques, which are more robust, are widely used for optimizing a function. The genetic algorithm is one such method: an effective tool in the realm of stochastic (non-classical) optimization. The algorithm evolves many strings over generations to reach the optimal point.
The main objective of this paper is to optimize engineering design problems using the genetic algorithm and to analyze how effectively and closely the algorithm reaches the optimum. We choose a mathematical expression for the objective function in terms of the design variables and optimize it under given constraints using GA.
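The selection/crossover/mutation loop described above can be sketched compactly. This is a generic real-coded GA on a hypothetical two-variable objective, not the paper's design problem: tournament selection, one-point crossover, Gaussian mutation, and elitism, with all operator settings being illustrative choices.

```python
import random

def genetic_minimize(f, bounds, dim=2, pop=40, gens=100, pm=0.2, seed=5):
    """Minimal real-coded GA for minimizing f over [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    P = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop)]
    for _ in range(gens):
        P.sort(key=f)
        nxt = P[:2]                                   # elitism: keep two best
        while len(nxt) < pop:
            # tournament selection of two parents
            a = min(rng.sample(P, 3), key=f)
            b = min(rng.sample(P, 3), key=f)
            cut = rng.randrange(1, dim)               # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < pm:                     # Gaussian mutation
                d = rng.randrange(dim)
                child[d] = min(hi, max(lo, child[d] + rng.gauss(0, 0.3)))
            nxt.append(child)
        P = nxt
    return min(P, key=f)
```

Constraints can be folded into f as penalty terms, which is the usual way a GA handles the constrained design formulations the abstract mentions.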
Improving K-NN Internet Traffic Classification Using Clustering and Principle... (journalBEEI)
K-Nearest Neighbour (K-NN) is a popular classification algorithm; in this research it is used to classify internet traffic. K-NN is appropriate for huge amounts of data and gives accurate classification, but its computation is expensive because it calculates the distance from a query to every record in the dataset. Clustering is one solution to this weakness: performed before the K-NN classification step, it groups data with the same characteristics without requiring high computing time. Fuzzy C-Means is the clustering algorithm used in this research; it does not require the number of clusters to be determined first, as clusters form naturally based on the dataset entered. However, Fuzzy C-Means has a weakness: its clustering results often differ between runs on the same input because its initial dataset is not optimal. To optimize the initial dataset, a feature selection algorithm is needed; feature selection produces an optimal initial dataset for Fuzzy C-Means. The feature selection algorithm in this research is Principal Component Analysis (PCA). PCA removes insignificant attributes or features to create an optimal dataset and can improve the performance of clustering and classification algorithms. The result of this research is that the combination of classification, clustering, and feature selection successfully modeled an internet traffic classification method with higher accuracy and faster performance.
Study of relevancy, diversity, and novelty in recommender systems (Chemseddine Berbague)
In the next slides, we present our approach to tackling conflicting recommendation-quality objectives in recommender systems using a genetic-based clustering algorithm. We studied users' tendencies toward diversity and proposed a pairwise similarity measure to quantify them. We then used the new similarity within a fitness function to create overlapping clusters and to produce recommendations balanced between diversity and relevancy.
GRID COMPUTING: STRATEGIC DECISION MAKING IN RESOURCE SELECTION (IJCSEA Journal)
The rapid development of computer networks around the world has generated new areas of study, especially in instruction processing. In grid computing, instruction processing is performed by external processors available to the system. An important topic in this area is scheduling tasks onto available external resources; however, we do not address scheduling here. In this paper we work on strategic decision making for selecting the best alternative resources for processing instructions with respect to criteria under special conditions, where the criteria might be security, politics, technique, cost, and so on. Grid resources should be chosen with respect to the processing objectives of a program's instructions. This paper combines the Analytic Hierarchy Process (AHP) with the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to rank and select available resources according to the relevant criteria when allocating instructions to resources. Our findings will help technical managers of organizations choose and rank candidate alternatives for processing program instructions.
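The TOPSIS half of the AHP-TOPSIS combination is mechanical enough to sketch: normalize the decision matrix, weight it, and score each alternative by its closeness to the ideal solution relative to the anti-ideal one. In the full method the weights would come from AHP pairwise comparisons; here they are simply given, which is an assumption of this sketch.

```python
def topsis(matrix, weights, benefit):
    """matrix[i][j]: score of alternative i on criterion j.
    benefit[j]: True if criterion j is better-when-larger (e.g. speed),
    False if better-when-smaller (e.g. cost)."""
    ncols = len(weights)
    # vector-normalize each column, then apply the criterion weights
    norms = [sum(row[j] ** 2 for row in matrix) ** 0.5 for j in range(ncols)]
    V = [[w * row[j] / norms[j] for j, w in enumerate(weights)]
         for row in matrix]
    ideal = [max(col) if benefit[j] else min(col)
             for j, col in enumerate(zip(*V))]
    anti = [min(col) if benefit[j] else max(col)
            for j, col in enumerate(zip(*V))]
    scores = []
    for row in V:
        d_pos = sum((v - i) ** 2 for v, i in zip(row, ideal)) ** 0.5
        d_neg = sum((v - a) ** 2 for v, a in zip(row, anti)) ** 0.5
        # closeness coefficient in [0, 1]; higher ranks better
        scores.append(d_neg / (d_pos + d_neg))
    return scores
```

An alternative that dominates on every criterion receives a closeness score of 1.0 and the dominated one 0.0, so ranking resources reduces to sorting by score.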
Opinion mining framework using proposed RB-bayes model for text classification (IJECEIAES)
Information mining is a powerful idea with great potential to anticipate future trends and behavior. It refers to the extraction of concealed information from large data sets using procedures such as statistical analysis, machine learning, clustering, neural networks, and genetic algorithms. In naive Bayes, there exists the problem of zero likelihood. This paper proposes the RB-Bayes method, based on Bayes' theorem, to remove the zero-likelihood problem in prediction. We also compare our method with existing methods, namely naive Bayes and SVM. We demonstrate that this technique is better than some current techniques and, specifically, can analyze data sets in a better way. When the proposed approach is tested on real data sets, the outcomes show improved accuracy in most cases; the RB-Bayes calculation achieves a precision of 83.333%.
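The zero-likelihood problem mentioned above arises when an attribute value never co-occurs with a class in training, making the whole product of likelihoods zero. RB-Bayes itself is not specified in the abstract; the sketch below shows the problem and the standard textbook fix, Laplace (add-alpha) smoothing, on a categorical naive Bayes classifier.

```python
from collections import Counter, defaultdict

def nb_train(rows, labels, alpha=1.0):
    """Categorical naive Bayes with add-alpha smoothing.
    alpha=0 reproduces the zero-likelihood failure mode."""
    classes = Counter(labels)
    counts = defaultdict(Counter)       # (attr index, class) -> value counts
    for row, y in zip(rows, labels):
        for a, v in enumerate(row):
            counts[(a, y)][v] += 1
    # number of distinct values per attribute, for the smoothing denominator
    vocab = [len({row[a] for row in rows}) for a in range(len(rows[0]))]

    def predict(row):
        best, best_p = None, -1.0
        for y, ny in classes.items():
            p = ny / len(labels)        # class prior
            for a, v in enumerate(row):
                # with alpha > 0, unseen values get a small non-zero likelihood
                p *= (counts[(a, y)][v] + alpha) / (ny + alpha * vocab[a])
            if p > best_p:
                best, best_p = y, p
        return best
    return predict
```

With alpha = 0 a single unseen attribute/class pair zeroes out an otherwise strong class; with alpha = 1 the same query is classified from the remaining evidence, which is the behavior any zero-likelihood remedy must provide.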
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem... (IJECEIAES)
Data analysis plays a prominent role in interpreting various phenomena. Data mining is the process of extracting useful knowledge from extensive data; based on classical statistical prototypes, the data can be exploited beyond mere storage and management. Cluster analysis, a primary investigation with little or no prior knowledge, spans research and development across a wide variety of communities. Cluster ensembles combine individual solutions obtained from different clusterings to produce a final high-quality clustering, as required in wider applications; the method arises in the context of increasing robustness, scalability, and accuracy. This paper gives a brief overview of the generation methods and consensus functions used in cluster ensembles, surveying the various techniques and cluster ensemble methods.
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS... (ijcsit)
Enterprise financial distress or failure prediction includes bankruptcy prediction, financial distress, corporate performance prediction, and credit risk estimation. The aim of this paper is to use wavelet networks in non-linear combination prediction to address a problem of the ARMA (Auto-Regressive and Moving Average) model: ARMA requires estimating the value of every parameter in the model, which involves a large amount of computation. Toward this aim, the paper provides an extensive review of wavelet networks and logistic regression. It discusses the wavelet neural network structure, the wavelet network training algorithm, and accuracy and error rates (classification accuracy, Type I error, and Type II error). The main research contribution is a proposed business failure prediction model (a wavelet network model and a logistic regression model). In the empirical comparison of the wavelet network and logistic regression on training and forecasting samples, the results show that the wavelet network model is highly accurate: in overall prediction accuracy, Type I error, and Type II error, the wavelet network model is better than the logistic regression model.
Threshold benchmarking for feature ranking techniques (journalBEEI)
In prediction modeling, the choice of features from the original feature set is crucial for accuracy and model interpretability. Feature ranking techniques rank features by importance, but there is no consensus on where to cut the ranking off. It therefore becomes important to identify a threshold value or range for removing the redundant features. In this work, an empirical study is conducted to identify a threshold benchmark for feature ranking algorithms. Experiments are conducted on the Apache Click dataset with six popular ranker techniques and six machine learning techniques to deduce a relationship between the total number of input features (N) and the threshold range. The area-under-the-curve analysis shows that roughly 33-50% of the features are necessary and sufficient to yield a reasonable performance measure, with a variance of 2%, in defect prediction models. Further, we find that log2(N) as the ranker threshold value represents the lower limit of the range.
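The ranker-plus-cutoff workflow is easy to make concrete. The sketch below scores feature columns by absolute Pearson correlation with the target (one of many possible rankers, chosen here purely for illustration) and keeps the top log2(N), the lower-limit threshold reported above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def rank_and_cut(feature_cols, y):
    """Rank feature columns by |correlation| with y; keep the top log2(N)."""
    n = len(feature_cols)
    ranked = sorted(range(n), key=lambda j: -abs(pearson(feature_cols[j], y)))
    keep = max(1, round(math.log2(n)))   # the paper's lower-limit threshold
    return ranked[:keep]
```

Swapping in a different scoring function (information gain, relief, etc.) changes the ranking but not the cutoff logic, which is the point of benchmarking the threshold separately from the ranker.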
An H-K clustering algorithm for high dimensional data using ensemble learning (ijitcs)
Advances to traditional clustering algorithms solve problems such as the curse of dimensionality and the sparsity of data across multiple attributes. The traditional H-K clustering algorithm can resolve the randomness and a priori choice of the initial centers in the K-means clustering algorithm, but when applied to high dimensional data it suffers a dimensional disaster due to high computational complexity. Advanced clustering algorithms such as subspace and ensemble clustering improve the performance of clustering high dimensional datasets from different aspects and to different extents, yet each improves performance from only a single perspective. The objective of the proposed model is to improve the performance of traditional H-K clustering and overcome its limitations, namely high computational complexity and poor accuracy on high dimensional data, by combining three different clustering approaches: subspace clustering and ensemble clustering together with H-K clustering.
Experimental study of Data clustering using k- Means and modified algorithmsIJDKP
The k- Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease
and simplicity of implementation. Clustering algorithm has a broad attraction and usefulness in
exploratory data analysis. This paper presents results of the experimental study of different approaches to
k- Means clustering, thereby comparing results on different datasets using Original k-Means and other
modified algorithms implemented using MATLAB R2009b. The results are calculated on some performance
measures such as no. of iterations, no. of points misclassified, accuracy, Silhouette validity index and
execution time
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"IJDKP
Clustering is one of the data mining techniques that have been around to discover business intelligence by grouping objects into clusters using a similarity measure. Clustering is an unsupervised learning process that has many utilities in real time applications in the fields of marketing, biology, libraries, insurance, city-planning, earthquake studies and document clustering. Latent trends and relationships among data objects can be unearthed using clustering algorithms. Many clustering algorithms came into existence. However, the quality of clusters has to be given paramount importance. The quality objective is to achieve
highest similarity between objects of same cluster and lowest similarity between objects of different clusters. In this context, we studied two widely used clustering algorithms such as the K-Means and Fuzzy K-Means. K-Means is an exclusive clustering algorithm while the Fuzzy K-Means is an overlapping clustering algorithm. In this paper we prove the hypothesis “Fuzzy K-Means is better than K-Means for Clustering” through both literature and empirical study. We built a prototype application to demonstrate the differences between the two clustering algorithms. The experiments are made on diabetes dataset
obtained from the UCI repository. The empirical results reveal that the performance of Fuzzy K-Means is better than that of K-means in terms of quality or accuracy of clusters. Thus, our empirical study proved the hypothesis “Fuzzy K-Means is better than K-Means for Clustering”.
Illustration of Medical Image Segmentation based on Clustering Algorithmsrahulmonikasharma
Image segmentation is the most basic and crucial process remembering the true objective to facilitate the characterization and representation of the structure of excitement for medical or basic images. Despite escalated research, segmentation remains a challenging issue because of the differing image content, cluttered objects, occlusion, non-uniform object surface, and different factors. There are numerous calculations and techniques accessible for image segmentation yet at the same time there requirements to build up an efficient, quick technique of medical image segmentation. This paper has focused on K-means and Fuzzy C means clustering algorithm to segment malaria blood samples in more accurate manner.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Performance Analysis of Genetic Algorithm as a Stochastic Optimization Tool i...paperpublications3
Abstract: Engineering design problems are complex by nature because their objective functions involve many variables and constraints. Engineers have to ensure compatibility with the imposed specifications while keeping manufacturing costs low. Moreover, the methodology may vary according to the design problem.
The main issue is choosing the proper optimization tool. In earlier days, a design problem was optimized by conventional techniques such as gradient search, evolutionary optimization, and random search. These are known as classical methods.
The method must be chosen to suit the nature of the problem: an incorrect choice may fail to give the optimal solution, so these methods are less robust.
Nowadays soft-computing techniques, which are more robust, are widely used for optimizing a function. The genetic algorithm is one such method, an effective tool in the realm of stochastic (non-classical) optimization. The algorithm evolves many strings over generations to reach the optimal point.
The main objective of the paper is to optimize engineering design problems using the Genetic Algorithm and to analyze how effectively and closely the algorithm reaches the optima. We choose a mathematical expression for the objective function in terms of the design variables and optimize it under the given constraints using GA.
Improving K-NN Internet Traffic Classification Using Clustering and Principle...journalBEEI
K-Nearest Neighbour (K-NN) is a popular classification algorithm, and in this research it is used to classify internet traffic: it suits large amounts of data and classifies accurately, but it is computationally expensive because it calculates the distance to every record in the dataset. Clustering is one solution to this weakness; performed before the K-NN classification step, it groups records with similar characteristics without requiring much computing time. Fuzzy C-Means is the clustering algorithm used in this research. It does not require the number of clusters to be fixed in advance; clusters form naturally from the input dataset. However, Fuzzy C-Means often produces different clustering results even for the same input, because its initial dataset is less than optimal; a feature selection algorithm is needed to optimize the initial dataset. The feature selection algorithm in this research is Principal Component Analysis (PCA), which removes non-significant attributes or features to create an optimal dataset and can improve the performance of both the clustering and the classification algorithms. The result of this research is that the combination of classification, clustering, and feature selection successfully modeled an internet traffic classification method with higher accuracy and faster performance.
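The dimensionality-reduction half of that pipeline is easy to show concretely. The sketch below (synthetic stand-in features, not the paper's traffic dataset, and with the Fuzzy C-Means pre-clustering step omitted for brevity) chains PCA into a K-NN classifier with scikit-learn:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Synthetic stand-in for 10-dimensional flow features of two traffic classes.
X = np.vstack([rng.normal(0, 1, (100, 10)), rng.normal(3, 1, (100, 10))])
y = np.array([0] * 100 + [1] * 100)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# PCA reduces the feature space before K-NN measures distances in it.
model = make_pipeline(PCA(n_components=3), KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
```

Reducing to a handful of principal components shrinks the per-query distance computation that makes plain K-NN slow on large datasets.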
Study of relevancy, diversity, and novelty in recommender systemsChemseddine Berbague
In the following slides, we present our approach to tackling conflicting recommendation qualities in recommender systems using a genetic-based clustering algorithm. We studied users' tendencies toward diversity and proposed a pairwise similarity measure to quantify them. We then used the new similarity within a fitness function to create overlapping clusters and to produce recommendations balanced in terms of diversity and relevancy.
GRID COMPUTING: STRATEGIC DECISION MAKING IN RESOURCE SELECTIONIJCSEA Journal
The rapid development of computer networks around the world has generated new areas, especially in instruction processing. In grid computing, instruction processing is performed by external processors available to the system. An important topic in this area is scheduling tasks to the available external resources; however, we do not deal with that topic here. In this paper we work on strategic decision making for selecting the best alternative resources for processing instructions with respect to criteria (security, political, technical, cost, etc.) under special conditions, determined with respect to the processing objectives of a program's instructions. This paper combines the Analytic Hierarchy Process (AHP) and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to rank and select available resources according to the relevant criteria when allocating instructions to resources. Our findings will help technical managers of organizations choose and rank candidate alternatives for processing program instructions.
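The TOPSIS half of that combination is compact enough to sketch. The following is a generic textbook TOPSIS in NumPy, not the paper's model; the resource scores and weights are hypothetical:

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) over criteria (columns).
    benefit[j] is True if higher is better for criterion j."""
    M = np.asarray(matrix, dtype=float)
    # Vector-normalize each criterion column, then apply the criterion weights.
    V = M / np.linalg.norm(M, axis=0) * np.asarray(weights, dtype=float)
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))   # best per criterion
    anti  = np.where(benefit, V.min(axis=0), V.max(axis=0))   # worst per criterion
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    return d_neg / (d_pos + d_neg)   # closeness to ideal; higher is better

# Hypothetical grid resources scored on (throughput: benefit, cost: non-benefit).
scores = topsis([[90, 10], [60, 5], [30, 2]],
                weights=[0.5, 0.5], benefit=[True, False])
best = int(np.argmax(scores))   # index of the preferred resource
```

In the full AHP+TOPSIS scheme, the `weights` vector would come from AHP pairwise comparisons rather than being set by hand.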
Opinion mining framework using proposed RB-bayes model for text classicationIJECEIAES
Data mining is a powerful idea with great potential to anticipate future patterns and behavior. It refers to the extraction of hidden knowledge from large data sets using techniques such as statistical analysis, machine learning, clustering, neural networks, and genetic algorithms. Naive Bayes suffers from the zero-likelihood problem. This paper proposes the RB-Bayes method, based on Bayes' theorem, to remove the zero-likelihood problem in prediction. We also compare our method with a few existing methods, namely naive Bayes and SVM, and demonstrate that the technique outperforms them and can analyze data sets in a better way. When the proposed approach is tested on real data sets, the results show improved accuracy in most cases; the RB-Bayes algorithm achieves an accuracy of 83.33%.
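The zero-likelihood problem the abstract refers to is easy to demonstrate. RB-Bayes itself is not reproduced here; the sketch below shows the classic failure mode and the standard Laplace (additive) smoothing workaround on toy counts:

```python
from collections import Counter

def word_likelihood(word, docs, alpha=0.0):
    """P(word | class) estimated from tokenized docs, with additive smoothing.
    (A fuller treatment would also count unseen vocabulary in the denominator.)"""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * len(counts))

spam = [["win", "cash"], ["win", "prize"]]
# Unsmoothed: an unseen word gets probability 0, which zeroes out the whole
# naive Bayes product for the class.
p_unsmoothed = word_likelihood("refund", spam)
# Laplace smoothing keeps every likelihood strictly positive.
p_smoothed = word_likelihood("refund", spam, alpha=1.0)
```
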
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem...IJECEIAES
Data analysis plays a prominent role in interpreting various phenomena, and data mining is the process of hypothesizing useful knowledge from extensive data. Based on classical statistical prototypes, the data can be exploited beyond its storage and management. Cluster analysis, a primary investigation with little or no prior knowledge, spans research and development across a wide variety of communities. Cluster ensembles are a blend of individual solutions obtained from different clusterings, combined to produce a final high-quality clustering as required in wider applications; the method arises from the need for increased robustness, scalability, and accuracy. This paper gives a brief overview of the generation methods and consensus functions used in cluster ensembles, surveying and analyzing the various techniques and cluster ensemble methods.
COMPARISON OF WAVELET NETWORK AND LOGISTIC REGRESSION IN PREDICTING ENTERPRIS...ijcsit
Enterprise financial distress or failure prediction includes bankruptcy prediction, financial distress, corporate performance prediction, and credit risk estimation. The aim of this paper is to use wavelet networks in non-linear combination prediction to overcome a problem of the ARMA (Auto-Regressive Moving Average) model: ARMA must estimate the values of all parameters in the model, which entails a large amount of computation. To this end, the paper provides an extensive review of wavelet networks and logistic regression, discussing the wavelet neural network structure, the wavelet network training algorithm, and accuracy and error rates (classification accuracy, Type I error, and Type II error). The main research contribution is a proposed business failure prediction model (a wavelet network model and a logistic regression model). The empirical comparison of the wavelet network and logistic regression on training and forecasting samples shows that the wavelet network model is highly accurate: in overall prediction accuracy, Type I error, and Type II error, it outperforms the logistic regression model.
Threshold benchmarking for feature ranking techniquesjournalBEEI
In prediction modeling, the choice of features from the original feature set is crucial for accuracy and model interpretability. Feature ranking techniques rank features by their importance, but there is no consensus on where to cut the ranking off; it therefore becomes important to identify a threshold value or range for removing redundant features. In this work, an empirical study is conducted to identify a threshold benchmark for feature ranking algorithms. Experiments are conducted on the Apache Click dataset with six popular ranker techniques and six machine learning techniques, to deduce a relationship between the total number of input features (N) and the threshold range. The area-under-the-curve analysis shows that roughly 33-50% of the features are necessary and sufficient to yield a reasonable performance measure, with a variance of 2%, in defect prediction models. Further, we find that log2(N) as the ranker threshold value represents the lower limit of the range.
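Applying that lower-limit rule is a one-liner once a ranker is chosen. The sketch below uses a synthetic dataset and the ANOVA F-score ranker from scikit-learn as a stand-in for the paper's six rankers; only the log2(N) cut-off itself comes from the study:

```python
import math

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic defect-prediction stand-in: 32 features, few of them informative.
X, y = make_classification(n_samples=200, n_features=32, n_informative=5,
                           random_state=0)

N = X.shape[1]
k = max(1, round(math.log2(N)))   # log2(N): the study's lower-limit threshold

# Keep only the top-k ranked features (F-score ranker as an example).
selector = SelectKBest(f_classif, k=k).fit(X, y)
X_sel = selector.transform(X)
```

With N = 32, log2(N) = 5, so the reduced matrix keeps 5 of the 32 columns.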
A h k clustering algorithm for high dimensional data using ensemble learningijitcs
Advances made to traditional clustering algorithms solve various problems such as the curse of dimensionality and data sparsity across multiple attributes. The traditional H-K clustering algorithm removes the randomness and a-priori choice of the initial centers in the K-means clustering algorithm, but when applied to high dimensional data it suffers the dimensional-disaster problem due to high computational complexity. Advanced clustering algorithms such as subspace and ensemble clustering improve the performance of clustering high dimensional datasets from different aspects and to different extents, yet each improves performance from only a single perspective. The objective of the proposed model is to improve the performance of traditional H-K clustering and overcome its limitations of high computational complexity and poor accuracy on high dimensional data by combining three different clustering approaches: subspace clustering and ensemble clustering together with H-K clustering.
In the present paper, the applicability and capability of A.I. techniques for effort estimation prediction have been investigated. Neuro-fuzzy models are seen to be very robust, characterized by fast computation, and capable of handling distorted data; given the non-linearity present in the data, they are an efficient quantitative tool for predicting effort estimates. A one-hidden-layer network named OHLANFIS has been developed in the MATLAB simulation environment.
The initial parameters of the OHLANFIS are identified using the subtractive clustering method, and the parameters of the Gaussian membership functions are optimally determined using the hybrid learning algorithm. The analysis shows that the effort estimation prediction model developed with the OHLANFIS technique performs better than the normal ANFIS model.
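The Gaussian membership functions tuned by that hybrid learning step have a simple closed form. A minimal NumPy sketch (illustrative parameter values, not those learned by OHLANFIS):

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function used in ANFIS-style fuzzy premises:
    degree of membership peaks at 1 when x equals the center c."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

# Membership of three inputs in a fuzzy set centered at 1.0 with width 1.0.
mu = gauss_mf(np.array([0.0, 1.0, 2.0]), c=1.0, sigma=1.0)
```

In ANFIS training, the centers `c` and widths `sigma` are exactly the premise parameters the hybrid algorithm adjusts.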
A simplified predictive framework for cost evaluation to fault assessment usi...IJECEIAES
Software engineering is an integral part of any software development scheme and frequently encounters bugs, errors, and faults. Predictive evaluation of software faults mitigates this challenge to a large extent; however, no benchmarked framework has been reported yet. This paper therefore introduces a computational cost-evaluation framework to facilitate a better form of predictive assessment of software faults. Based on lines of code, the proposed scheme adopts a machine-learning approach to the predictive analysis of faults. It presents an analytical framework in which a correlation-based cost model is integrated with multiple standard machine learning (ML) models, e.g., linear regression, support vector regression, and artificial neural networks (ANN). These learning models are trained and executed to predict software faults with higher accuracy. The study assesses the outcomes with error-based performance metrics in detail to determine how well and how accurately each learning model learns, and also examines the factors contributing to the training loss of the neural networks. The validation results demonstrate that, compared to linear regression and support vector regression, the neural network achieves a significantly lower error score for software fault prediction.
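The simplest of the three learners mentioned, a lines-of-code regression, can be sketched directly. The data below is synthetic and the fault rate is an invented illustration, not the paper's dataset or correlation model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical data: module size in lines of code vs. noisy fault counts.
loc = rng.uniform(100, 10_000, 200)
faults = 0.002 * loc + rng.normal(0, 1, 200)   # assumed 2 faults per 1000 LOC

# Fit the LOC -> faults relationship and predict for a 5000-LOC module.
model = LinearRegression().fit(loc.reshape(-1, 1), faults)
pred = float(model.predict(np.array([[5000.0]]))[0])
```

Support vector regression or an ANN would slot into the same fit/predict interface, which is what makes the paper's side-by-side error comparison straightforward.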
Finding Relationships between the Our-NIR Cluster ResultsCSCJournals
The problem of evaluating node importance in clustering has been an active research area, and many methods have been developed. Most clustering algorithms deal with general similarity measures, but in real situations data usually changes over time. Clustering such data naively not only decreases the quality of the clusters but also disregards the expectations of users, who usually require recent clustering results. In this regard we previously proposed the Our-NIR method for calculating node importance, which is very useful in clustering categorical data and was shown, through its node-importance results, to be better than the method proposed by Ming-Syan Chen; it still has a deficiency regarding data labeling and outlier detection. In this paper we modify the Our-NIR method for evaluating node importance by introducing a probability distribution, which gives better results by comparison.
Ensemble based Distributed K-Modes ClusteringIJERD Editor
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches to distributed clustering. A distributed clustering algorithm clusters distributed datasets without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets, but it fails to directly handle datasets with categorical attributes, which generally occur in real-life datasets. Huang proposed the K-Modes clustering algorithm, introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces the means of clusters with a frequency-based method that updates modes in the clustering process to minimize the cost function. Most distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handling categorical data sets and to performing the distributed clustering process asynchronously. The performance of the proposed algorithm is compared with existing distributed K-Means clustering algorithms and a K-Modes based centralized clustering algorithm. The experiments are carried out on various datasets from the UCI machine learning repository.
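The two K-Modes ingredients named above, the matching dissimilarity measure and the frequency-based mode update, fit in a few lines. A plain-Python sketch (toy records, not the proposed distributed ensemble):

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Huang's simple-matching distance: number of mismatched attributes."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Frequency-based mode: the most common value in each attribute position,
    which replaces the arithmetic mean of K-Means for categorical data."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
mode = cluster_mode(cluster)                         # ("red", "small")
d = matching_dissimilarity(("red", "large"), mode)   # one attribute differs
```

K-Modes iterates exactly these two steps: assign each record to the nearest mode, then recompute each cluster's mode.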
AN APPROACH FOR SOFTWARE EFFORT ESTIMATION USING FUZZY NUMBERS AND GENETIC AL...csandit
One of the most critical tasks during the software development life cycle is estimating the effort and time involved in developing the software product. Estimation may be performed in many ways, such as expert judgment, algorithmic effort estimation, machine learning, and analogy-based estimation. Analogy-based software effort estimation is the process of identifying one or more historical projects that are similar to the project being developed and then deriving estimates from them. Here, analogy-based estimation is integrated with fuzzy numbers in order to improve the performance of software project effort estimation during the early stages of a software development lifecycle; because of the uncertainty associated with attribute measurement and data availability, fuzzy logic is introduced into the proposed model. But hardly any historical project is exactly the same as the project being estimated: in most cases even the most similar project has a non-zero similarity distance from the project being estimated, so the reused effort needs to be adjusted. To adjust the reused effort, we build an adjustment mechanism whose algorithm derives the optimal adjustment using a Genetic Algorithm. The proposed model, combining fuzzy logic for estimating software effort in the early stages with the Genetic-Algorithm-based adjustment mechanism, can come close to the correct effort estimate.
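The core analogy step, find the nearest historical project and adjust its effort, can be sketched without the fuzzy-number and GA machinery. The size-ratio adjustment below is a deliberately crude stand-in for the GA-tuned adjustment the paper proposes, and the project data is hypothetical:

```python
import numpy as np

def analogy_estimate(history, target):
    """history: list of (feature_vector, effort) for past projects.
    Picks the most similar project by Euclidean distance and scales its
    effort by a simple size ratio (the paper tunes this adjustment with
    a Genetic Algorithm instead)."""
    feats = np.array([h[0] for h in history], dtype=float)
    efforts = np.array([h[1] for h in history], dtype=float)
    t = np.asarray(target, dtype=float)
    i = int(np.argmin(np.linalg.norm(feats - t, axis=1)))  # nearest analogue
    ratio = t.sum() / feats[i].sum()                       # crude adjustment
    return float(efforts[i] * ratio)

# Hypothetical projects: (features such as [size, team], effort in person-days).
history = [([100, 5], 200.0), ([400, 20], 800.0)]
est = analogy_estimate(history, [110, 5])
```

Because the new project is slightly larger than its nearest analogue, the reused effort of 200 is scaled up rather than copied verbatim, which is exactly the adjustment problem the GA is brought in to optimize.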
Clustering heterogeneous categorical data using enhanced mini batch K-means ...IJECEIAES
Clustering methods in data mining aim to group a set of patterns based on their similarity. In survey data, heterogeneous information arises from various types of data scales such as nominal, ordinal, binary, and Likert scales, and a lack of treatment of such heterogeneous data leads to loss of information and poor decision-making. Although many similarity measures have been established, solutions for heterogeneous data in clustering are still lacking. The recent entropy distance measure seems to provide good results for heterogeneous categorical data but requires many experiments and evaluations. This article presents a proposed framework for heterogeneous categorical data using mini batch k-means with an entropy measure (MBKEM), investigating the effectiveness of the similarity measure when clustering heterogeneous categorical data. Secondary data from a public survey was used. The findings demonstrate that the proposed framework improves clustering quality: MBKEM outperformed other clustering algorithms with accuracy at 0.88, v-measure (VM) at 0.82, adjusted Rand index (ARI) at 0.87, and Fowlkes-Mallows index (FMI) at 0.94. The average minimum elapsed time for cluster generation was observed at 0.26 s. In the future, the proposed solution would be beneficial for improving the quality of clustering for heterogeneous categorical data problems in many domains.
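The mini-batch k-means backbone of that framework is available off the shelf. The sketch below substitutes plain one-hot encoding with Euclidean distance for the paper's entropy measure, and uses toy survey-style records rather than the actual survey data:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import OneHotEncoder

# Toy categorical survey responses with two obvious response profiles.
group_a = [["yes", "high"]] * 50
group_b = [["no", "low"]] * 50
X = np.array(group_a + group_b)

# One-hot encode the categorical answers so k-means can measure distances.
X_enc = OneHotEncoder().fit_transform(X).toarray()

# Mini-batch k-means trades a little accuracy for much faster updates on
# large datasets by fitting on small random batches.
km = MiniBatchKMeans(n_clusters=2, n_init=10, random_state=0).fit(X_enc)
labels = km.labels_
```

Swapping the Euclidean distance on one-hot vectors for an entropy-based dissimilarity is the part MBKEM contributes on top of this baseline.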
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSijdkp
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high dimensional data. Many significant subspace clustering algorithms exist, each with different characteristics arising from the techniques, assumptions, and heuristics used. A comprehensive classification scheme is essential that considers all such characteristics to divide subspace clustering approaches into families; the algorithms belonging to the same family satisfy common characteristics. Such a categorization will help future developers better understand the quality criteria to use and the similar algorithms against which to compare their proposed clustering algorithms. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms' Family), whose characteristics are based on classes such as cluster orientation and overlap of dimensions. As an illustration, we further provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the "axis-parallel, overlapping, density-based" SCAF.
A Defect Prediction Model for Software Product based on ANFISIJSRD
Artificial intelligence techniques are increasingly involved in all classification- and prediction-based processes, such as environmental monitoring, stock exchange analysis, biomedical diagnosis, and software engineering. However, the challenge of selecting training criteria for the design of artificial intelligence prediction models is yet to be simplified. This work focuses on developing a defect prediction mechanism using the KC1 software metric data. We take a subtractive clustering approach to generate a fuzzy inference system (FIS); the FIS rules are generated at different radii of influence of the input attribute vectors, and the developed rules are further tuned by the ANFIS technique to predict the number of defects in a software project using a fuzzy logic system.
Similar to ESTIMATING PROJECT DEVELOPMENT EFFORT USING CLUSTERED REGRESSION APPROACH (20)
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR cscpconf
The progressive development of Synthetic Aperture Radar (SAR) systems has diversified the exploitation of the images they generate across different geoscience applications. The detection and monitoring of surface deformations produced by various phenomena have benefited from this evolution and have been realized with interferometry (InSAR) and differential interferometry (DInSAR) techniques. Nevertheless, spatial and temporal decorrelation of the interferometric pairs used strongly limits the precision of the results these techniques produce. In this context, we propose a methodological approach to detecting and analyzing surface deformation with differential interferograms, in order to show the limits of this technique as a function of noise quality and level. The detectability model is generated from the deformation signatures, by simulating a linear fault merged into image pairs from the ERS1/ERS2 sensors acquired over a region of the Algerian south.
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATIONcscpconf
A novel trajectory-guided, concatenative approach for synthesizing high-quality video rendered from real image samples is proposed. The automated lip-reading seeks, in a library of real image samples, the sequence closest to the HMM-predicted trajectory of the video data. The object trajectory is estimated by projecting the face patterns into a KDA feature space. Speaker-face identification is approached by synthesizing the identity surface of a subject's face from a small sample of patterns that sparsely cover the view sphere. A KDA algorithm is used to discriminate the lip-reading images, after which the low-dimensional fundamental lip feature vector is reduced using the 2D-DCT. The dimensionality of the mouth-area set is then reduced with PCA to obtain the eigen-lips approach proposed in [33]. The subjective performance results of the cost function under the automatic lip-reading model did not illustrate superior performance of the method.
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...cscpconf
Universities offer a software engineering capstone course to simulate a real-world working environment in which students can work in a team for a fixed period to deliver a quality product. The objective of this paper is to report on our experience in moving from the Waterfall process to an Agile process in conducting the software engineering capstone project. We present the capstone course designs for both the Waterfall-driven and Agile-driven methodologies, highlighting the structure, deliverables, and assessment plans. To evaluate the improvement, we surveyed two sections taught by two different instructors to evaluate students' experience in moving from the traditional Waterfall model to an Agile-like process. Twenty-eight students filled in the survey, which consisted of eight multiple-choice questions and an open-ended question to collect feedback. The survey results show that students were able to gain hands-on experience simulating a real-world working environment. The results also show that the Agile approach helped students reach an overall better design and avoid mistakes they had made in the initial design completed in the first phase of the capstone project. In addition, they were able to assess their team's capabilities and training needs, and thus learn the required technologies earlier, which is reflected in the final product quality.
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIEScscpconf
Using social media in education provides learners with an informal channel for communication. Informal communication tends to remove barriers and hence promotes student engagement. This paper presents our experience in using three different social media technologies in teaching a software project management course. We conducted surveys at the end of every semester to evaluate students' satisfaction and engagement. Results show that using social media enhances students' engagement and satisfaction; however, familiarity with the tool is an important factor in student satisfaction.
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICcscpconf
Using a computer to answer questions has been a human dream since the beginning of the digital era. Question-answering systems are referred to as intelligent systems that provide responses to users' questions based on facts or rules stored in a knowledge base, generating answers to questions asked in natural language. One of the first main ideas of fuzzy logic was to work on the problem of computer understanding of natural language. This survey paper therefore provides an overview of what question answering is, its system architecture, and its possible relationship with and differences from fuzzy logic, as well as the previous related research with respect to the approaches that were followed. At the end, the survey provides an analytical discussion of the proposed QA models, alone or combined with fuzzy logic, and their main contributions and limitations.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS cscpconf
Human beings generate different speech waveforms when speaking the same word at different times. Also, different human beings have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various pronunciations, which facilitates the preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses the dynamic programming technique for global alignment and shortest-distance measurement. The DPW algorithm can be used to enhance the pronunciation dictionaries of well-known languages like English or to build pronunciation dictionaries for lesser-known sparse languages. The precision measurement experiments show 88.9% accuracy.
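The dynamic-programming global alignment that underlies DPW is the familiar edit-distance recurrence, applied to phone sequences instead of characters. A sketch with illustrative unit costs (the paper's actual cost scheme and phone set are not reproduced here):

```python
def dpw_distance(p, q, sub_cost=1, indel_cost=1):
    """Edit distance between two phone sequences via dynamic programming
    (global alignment). D[i][j] is the cost of aligning p[:i] with q[:j]."""
    m, n = len(p), len(q)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * indel_cost          # delete all of p[:i]
    for j in range(1, n + 1):
        D[0][j] = j * indel_cost          # insert all of q[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if p[i - 1] == q[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j - 1] + cost,      # match / substitute
                          D[i - 1][j] + indel_cost,    # delete
                          D[i][j - 1] + indel_cost)    # insert
    return D[m][n]

# "tomato" in two accents, written as hypothetical phone sequences:
d = dpw_distance(["t", "ah", "m", "ey", "t", "ow"],
                 ["t", "ah", "m", "aa", "t", "ow"])   # one phone substituted
```
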
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
In education, the use of electronic (E) examination systems is not a novel idea, as E-examination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. The system assesses answers to subjective questions by finding a matching ratio between the keywords in the instructor's and the student's answers; the matching ratio is computed from semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and a case study were used in the research design to validate the proposed system. The examination assessment system will help instructors save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessment.
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICcscpconf
African Buffalo Optimization (ABO) is one of the most recent swarms intelligence based metaheuristics. ABO algorithm is inspired by the buffalo’s behavior and lifestyle. Unfortunately, the standard ABO algorithm is proposed only for continuous optimization problems. In this paper, the authors propose two discrete binary ABO algorithms to deal with binary optimization problems. In the first version (called SBABO) they use the sigmoid function and probability model to generate binary solutions. In the second version (called LBABO) they use some logical operator to operate the binary solutions. Computational results on two knapsack problems (KP and MKP) instances show the effectiveness of the proposed algorithm and their ability to achieve good and promising solutions.
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINcscpconf
In recent years, many malware writers have relied on Dynamic Domain Name Services (DDNS) to maintain their Command and Control (C&C) network infrastructure to ensure a persistence presence on a compromised host. Amongst the various DDNS techniques, Domain Generation Algorithm (DGA) is often perceived as the most difficult to detect using traditional methods. This paper presents an approach for detecting DGA using frequency analysis of the character distribution and the weighted scores of the domain names. The approach’s feasibility is demonstrated using a range of legitimate domains and a number of malicious algorithmicallygenerated domain names. Findings from this study show that domain names made up of English characters “a-z” achieving a weighted score of < 45 are often associated with DGA. When a weighted score of < 45 is applied to the Alexa one million list of domain names, only 15% of the domain names were treated as non-human generated.
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...cscpconf
The amount of piracy in the streaming digital content in general and the music industry in specific is posing a real challenge to digital content owners. This paper presents a DRM solution to monetizing, tracking and controlling online streaming content cross platforms for IP enabled devices. The paper benefits from the current advances in Blockchain and cryptocurrencies. Specifically, the paper presents a Global Music Asset Assurance (GoMAA) digital currency and presents the iMediaStreams Blockchain to enable the secure dissemination and tracking of the streamed content. The proposed solution provides the data owner the ability to control the flow of information even after it has been released by creating a secure, selfinstalled, cross platform reader located on the digital content file header. The proposed system provides the content owners’ options to manage their digital information (audio, video, speech, etc.), including the tracking of the most consumed segments, once it is release. The system benefits from token distribution between the content owner (Music Bands), the content distributer (Online Radio Stations) and the content consumer(Fans) on the system blockchain.
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
This paper discusses the importance of verb suffix mapping in Discourse translation system. In
discourse translation, the crucial step is Anaphora resolution and generation. In Anaphora
resolution, cohesion links like pronouns are identified between portions of text. These binders
make the text cohesive by referring to nouns appearing in the previous sentences or nouns
appearing in sentences after them. In Machine Translation systems, to convert the source
language sentences into meaningful target language sentences the verb suffixes should be
changed as per the cohesion links identified. This step of translation process is emphasized in
the present paper. Specifically, the discussion is on how the verbs change according to the
subjects and anaphors. To explain the concept, English is used as the source language (SL) and
an Indian language Telugu is used as Target language (TL)
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...cscpconf
In this paper, based on the definition of conformable fractional derivative, the functional
variable method (FVM) is proposed to seek the exact traveling wave solutions of two higherdimensional
space-time fractional KdV-type equations in mathematical physics, namely the
(3+1)-dimensional space–time fractional Zakharov-Kuznetsov (ZK) equation and the (2+1)-
dimensional space–time fractional Generalized Zakharov-Kuznetsov-Benjamin-Bona-Mahony
(GZK-BBM) equation. Some new solutions are procured and depicted. These solutions, which
contain kink-shaped, singular kink, bell-shaped soliton, singular soliton and periodic wave
solutions, have many potential applications in mathematical physics and engineering. The
simplicity and reliability of the proposed method is verified.
AUTOMATED PENETRATION TESTING: AN OVERVIEWcscpconf
The using of information technology resources is rapidly increasing in organizations,
businesses, and even governments, that led to arise various attacks, and vulnerabilities in the
field. All resources make it a must to do frequently a penetration test (PT) for the environment
and see what can the attacker gain and what is the current environment's vulnerabilities. This
paper reviews some of the automated penetration testing techniques and presents its
enhancement over the traditional manual approaches. To the best of our knowledge, it is the
first research that takes into consideration the concept of penetration testing and the standards
in the area.This research tackles the comparison between the manual and automated
penetration testing, the main tools used in penetration testing. Additionally, compares between
some methodologies used to build an automated penetration testing platform.
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKcscpconf
Since the mid of 1990s, functional connectivity study using fMRI (fcMRI) has drawn increasing
attention of neuroscientists and computer scientists, since it opens a new window to explore
functional network of human brain with relatively high resolution. BOLD technique provides
almost accurate state of brain. Past researches prove that neuro diseases damage the brain
network interaction, protein- protein interaction and gene-gene interaction. A number of
neurological research paper also analyse the relationship among damaged part. By
computational method especially machine learning technique we can show such classifications.
In this paper we used OASIS fMRI dataset affected with Alzheimer’s disease and normal
patient’s dataset. After proper processing the fMRI data we use the processed data to form
classifier models using SVM (Support Vector Machine), KNN (K- nearest neighbour) & Naïve
Bayes. We also compare the accuracy of our proposed method with existing methods. In future,
we will other combinations of methods for better accuracy.
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
In order to treat and analyze real datasets, fuzzy association rules have been proposed. Several
algorithms have been introduced to extract these rules. However, these algorithms suffer from
the problems of utility, redundancy and large number of extracted fuzzy association rules. The
expert will then be confronted with this huge amount of fuzzy association rules. The task of
validation becomes fastidious. In order to solve these problems, we propose a new validation
method. Our method is based on three steps. (i) We extract a generic base of non redundant
fuzzy association rules by applying EFAR-PN algorithm based on fuzzy formal concept analysis.
(ii) we categorize extracted rules into groups and (iii) we evaluate the relevance of these rules
using structural equation model.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
In many applications of data mining, class imbalance is noticed when examples in one class are
overrepresented. Traditional classifiers result in poor accuracy of the minority class due to the
class imbalance. Further, the presence of within class imbalance where classes are composed of
multiple sub-concepts with different number of examples also affect the performance of
classifier. In this paper, we propose an oversampling technique that handles between class and
within class imbalance simultaneously and also takes into consideration the generalization
ability in data space. The proposed method is based on two steps- performing Model Based
Clustering with respect to classes to identify the sub-concepts; and then computing the
separating hyperplane based on equal posterior probability between the classes. The proposed
method is tested on 10 publicly available data sets and the result shows that the proposed
method is statistically superior to other existing oversampling methods.
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHcscpconf
Data collection is an essential, but manpower intensive procedure in ecological research. An
algorithm was developed by the author which incorporated two important computer vision
techniques to automate data cataloging for butterfly measurements. Optical Character
Recognition is used for character recognition and Contour Detection is used for imageprocessing.
Proper pre-processing is first done on the images to improve accuracy. Although
there are limitations to Tesseract’s detection of certain fonts, overall, it can successfully identify
words of basic fonts. Contour detection is an advanced technique that can be utilized to
measure an image. Shapes and mathematical calculations are crucial in determining the precise
location of the points on which to draw the body and forewing lines of the butterfly. Overall,
92% accuracy were achieved by the program for the set of butterflies measured.
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
Smart cities utilize Internet of Things (IoT) devices and sensors to enhance the quality of the city
services including energy, transportation, health, and much more. They generate massive
volumes of structured and unstructured data on a daily basis. Also, social networks, such as
Twitter, Facebook, and Google+, are becoming a new source of real-time information in smart
cities. Social network users are acting as social sensors. These datasets so large and complex
are difficult to manage with conventional data management tools and methods. To become
valuable, this massive amount of data, known as 'big data,' needs to be processed and
comprehended to hold the promise of supporting a broad range of urban and smart cities
functions, including among others transportation, water, and energy consumption, pollution
surveillance, and smart city governance. In this work, we investigate how social media analytics
help to analyze smart city data collected from various social media sources, such as Twitter and
Facebook, to detect various events taking place in a smart city and identify the importance of
events and concerns of citizens regarding some events. A case scenario analyses the opinions of
users concerning the traffic in three largest cities in the UAE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
The anonymity of social networks makes it attractive for hate speech to mask their criminal
activities online posing a challenge to the world and in particular Ethiopia. With this everincreasing
volume of social media data, hate speech identification becomes a challenge in
aggravating conflict between citizens of nations. The high rate of production, has become
difficult to collect, store and analyze such big data using traditional detection methods. This
paper proposed the application of apache spark in hate speech detection to reduce the
challenges. Authors developed an apache spark based model to classify Amharic Facebook
posts and comments into hate and not hate. Authors employed Random forest and Naïve Bayes
for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold crossvalidation,
the model based on word2vec embedding performed best with 79.83%accuracy. The
proposed method achieve a promising result with unique feature of spark for big data.
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
This article presents Part of Speech tagging for Nepali text using General Regression Neural
Network (GRNN). The corpus is divided into two parts viz. training and testing. The network is
trained and validated on both training and testing data. It is observed that 96.13% words are
correctly being tagged on training set whereas 74.38% words are tagged correctly on testing
data set using GRNN. The result is compared with the traditional Viterbi algorithm based on
Hidden Markov Model. Viterbi algorithm yields 97.2% and 40% classification accuracies on
training and testing data sets respectively. GRNN based POS Tagger is more consistent than the
traditional Viterbi decoding technique.
Synthetic Fiber Construction in lab .pptxPavel ( NSTU)
Synthetic fiber production is a fascinating and complex field that blends chemistry, engineering, and environmental science. By understanding these aspects, students can gain a comprehensive view of synthetic fiber production, its impact on society and the environment, and the potential for future innovations. Synthetic fibers play a crucial role in modern society, impacting various aspects of daily life, industry, and the environment. ynthetic fibers are integral to modern life, offering a range of benefits from cost-effectiveness and versatility to innovative applications and performance characteristics. While they pose environmental challenges, ongoing research and development aim to create more sustainable and eco-friendly alternatives. Understanding the importance of synthetic fibers helps in appreciating their role in the economy, industry, and daily life, while also emphasizing the need for sustainable practices and innovation.
The Indian economy is classified into different sectors to simplify the analysis and understanding of economic activities. For Class 10, it's essential to grasp the sectors of the Indian economy, understand their characteristics, and recognize their importance. This guide will provide detailed notes on the Sectors of the Indian Economy Class 10, using specific long-tail keywords to enhance comprehension.
For more information, visit-www.vavaclasses.com
The Art Pastor's Guide to Sabbath | Steve ThomasonSteve Thomason
What is the purpose of the Sabbath Law in the Torah. It is interesting to compare how the context of the law shifts from Exodus to Deuteronomy. Who gets to rest, and why?
This is a presentation by Dada Robert in a Your Skill Boost masterclass organised by the Excellence Foundation for South Sudan (EFSS) on Saturday, the 25th and Sunday, the 26th of May 2024.
He discussed the concept of quality improvement, emphasizing its applicability to various aspects of life, including personal, project, and program improvements. He defined quality as doing the right thing at the right time in the right way to achieve the best possible results and discussed the concept of the "gap" between what we know and what we do, and how this gap represents the areas we need to improve. He explained the scientific approach to quality improvement, which involves systematic performance analysis, testing and learning, and implementing change ideas. He also highlighted the importance of client focus and a team approach to quality improvement.
Unit 8 - Information and Communication Technology (Paper I).pdfThiyagu K
This slides describes the basic concepts of ICT, basics of Email, Emerging Technology and Digital Initiatives in Education. This presentations aligns with the UGC Paper I syllabus.
We all have good and bad thoughts from time to time and situation to situation. We are bombarded daily with spiraling thoughts(both negative and positive) creating all-consuming feel , making us difficult to manage with associated suffering. Good thoughts are like our Mob Signal (Positive thought) amidst noise(negative thought) in the atmosphere. Negative thoughts like noise outweigh positive thoughts. These thoughts often create unwanted confusion, trouble, stress and frustration in our mind as well as chaos in our physical world. Negative thoughts are also known as “distorted thinking”.
Model Attribute Check Company Auto PropertyCeline George
In Odoo, the multi-company feature allows you to manage multiple companies within a single Odoo database instance. Each company can have its own configurations while still sharing common resources such as products, customers, and suppliers.
Ethnobotany and Ethnopharmacology:
Ethnobotany in herbal drug evaluation,
Impact of Ethnobotany in traditional medicine,
New development in herbals,
Bio-prospecting tools for drug discovery,
Role of Ethnopharmacology in drug evaluation,
Reverse Pharmacology.
Students, digital devices and success - Andreas Schleicher - 27 May 2024..pptxEduSkills OECD
Andreas Schleicher presents at the OECD webinar ‘Digital devices in schools: detrimental distraction or secret to success?’ on 27 May 2024. The presentation was based on findings from PISA 2022 results and the webinar helped launch the PISA in Focus ‘Managing screen time: How to protect and equip students against distraction’ https://www.oecd-ilibrary.org/education/managing-screen-time_7c225af4-en and the OECD Education Policy Perspective ‘Students, digital devices and success’ can be found here - https://oe.cd/il/5yV
The Roman Empire A Historical Colossus.pdfkaushalkr1407
The Roman Empire, a vast and enduring power, stands as one of history's most remarkable civilizations, leaving an indelible imprint on the world. It emerged from the Roman Republic, transitioning into an imperial powerhouse under the leadership of Augustus Caesar in 27 BCE. This transformation marked the beginning of an era defined by unprecedented territorial expansion, architectural marvels, and profound cultural influence.
The empire's roots lie in the city of Rome, founded, according to legend, by Romulus in 753 BCE. Over centuries, Rome evolved from a small settlement to a formidable republic, characterized by a complex political system with elected officials and checks on power. However, internal strife, class conflicts, and military ambitions paved the way for the end of the Republic. Julius Caesar’s dictatorship and subsequent assassination in 44 BCE created a power vacuum, leading to a civil war. Octavian, later Augustus, emerged victorious, heralding the Roman Empire’s birth.
Under Augustus, the empire experienced the Pax Romana, a 200-year period of relative peace and stability. Augustus reformed the military, established efficient administrative systems, and initiated grand construction projects. The empire's borders expanded, encompassing territories from Britain to Egypt and from Spain to the Euphrates. Roman legions, renowned for their discipline and engineering prowess, secured and maintained these vast territories, building roads, fortifications, and cities that facilitated control and integration.
The Roman Empire’s society was hierarchical, with a rigid class system. At the top were the patricians, wealthy elites who held significant political power. Below them were the plebeians, free citizens with limited political influence, and the vast numbers of slaves who formed the backbone of the economy. The family unit was central, governed by the paterfamilias, the male head who held absolute authority.
Culturally, the Romans were eclectic, absorbing and adapting elements from the civilizations they encountered, particularly the Greeks. Roman art, literature, and philosophy reflected this synthesis, creating a rich cultural tapestry. Latin, the Roman language, became the lingua franca of the Western world, influencing numerous modern languages.
Roman architecture and engineering achievements were monumental. They perfected the arch, vault, and dome, constructing enduring structures like the Colosseum, Pantheon, and aqueducts. These engineering marvels not only showcased Roman ingenuity but also served practical purposes, from public entertainment to water supply.
494 Computer Science & Information Technology (CS & IT)
complexity of the projects. These techniques include artificial neural networks, genetic algorithms,
support vector regression, genetic programming, neuro-fuzzy inference systems and case-based
reasoning. They use historical datasets of completed projects as training data and predict the
effort of new projects based on this training. Although soft computing techniques have brought
significant improvement to software cost estimation, some limitations remain due to the
heterogeneity of the datasets.
Soft computing techniques estimate accurately when some relationship exists between the tuples
of a dataset. Because of the heterogeneity that exists amongst software projects, these techniques
are unable to estimate optimally. This heterogeneity can be reduced by clustering the data into
similar groups. The goal of clustering is to create groups of data that share similar
characteristics: clustering divides a data set X into k disjoint subsets that have some
dissimilarity between them.
A clustered regression approach is proposed in this study in order to generate more efficient
estimation sub-models. A feature-weighted grey relational clustering method is integrated with
regression techniques; the clustering algorithm uses grey relational analysis both for weighting
features and for clustering. The results obtained show that clustering can decrease the effect of
irrelevant projects on the accuracy of estimation. Cluster-specific regression models are
generated for four publicly available data sets. Empirical results show that regression applied to
clustered data yields outstanding results, indicating that the methodology has great potential for
software effort estimation. The results are subjected to statistical testing using the Wilcoxon
signed rank test and the Mann-Whitney U test.
The rest of the paper is organized as follows. Section 2 reviews related work on clustering
algorithms and on GRG as a similarity measure for feature weighting. Section 3 introduces the
modeling techniques. Section 4 presents the proposed methodology. Section 5 describes the data
sets used in the study and reports the experimental results that demonstrate the use of the
proposed clustered regression approach in software effort estimation. Finally, conclusions are
drawn in Section 6.
2. REVIEW OF LITERATURE
A number of data clustering techniques have been developed to find optimal subsets of data
within existing datasets [4],[5],[6]. The main aim of clustering is to partition an unlabeled data
set into subsets according to some similarity measure; this is called unsupervised classification.
Clustering algorithms can be categorized into two main families: input clustering and input-output
clustering [7]. In input clustering algorithms all attributes are considered independent; hard
c-means [8] and fuzzy c-means [9] fall into this category. In input-output clustering each
multi-attribute data point is considered a vector of independent attributes with a corresponding
dependent value. Let S = {(x1,y1), (x2,y2), …, (xn,yn)} be a set of unlabelled input-output data
pairs. Each independent input vector xi = [x1i, x2i, …, xki] has a corresponding dependent value
yi. Research work has been done to motivate this category of classification [10],[11].
Kung and Su [12] developed an effective approach to establish an affine Takagi-Sugeno (T-S)
fuzzy model for a nonlinear system from its input-output data. Chunheng, Cui and Wang [13]
proposed the FCM-SLNMM clustering algorithm, consisting of two stages: the FCM algorithm is
applied in the first stage and a supervised learning normal mixture model in the second, with the
clustering results of the first stage used as training data. Experiments on real-world data from
the UCI repository showed that the supervised learning normal mixture
model can improve the performance of the FCM algorithm sharply. Lin and Tsai [14] proposed a
hierarchical grey clustering approach in which the similarity measure is a globalized modified
grey relational grade instead of a traditional distance. Chang and Yeh [15] generalized the
concept of grey relational analysis to develop a technique for analyzing the similarity between
given patterns, and also proposed a clustering algorithm to find the cluster centers of a given
dataset.
In this study, GRA, a technique of grey system theory (GST) that utilizes the concept of absolute
point-to-point distance between features [16],[29], has been applied. GST, a recently developed
system engineering theory, was first established by Deng [18],[19],[20]. It draws out valuable
information by generating and developing partially known information. So far, GST has been
applied in areas including image processing [23], mobile communication [24], machine vision
inspection [25], decision making [26], stock price prediction [27] and system control [28]. The
success of GST motivated us to investigate its application to software effort estimation.
3. MODELING TECHNIQUES
The data available for software cost estimation is inherently nonlinear, which makes accurate
estimation of effort difficult. Efficient estimation can be achieved if this nonlinearity is
treated by tracing relationships among data values. In this study, we reduce the heterogeneity by
applying a feature-weighted grey relational clustering methodology.
3.1 Regression
As discussed, a large number of techniques have been applied to the field of software effort
estimation. The aim of this study is to assess which regression techniques perform best to
estimate software effort. The following techniques are considered.
3.1.1 Ordinary Least Square Regression
It is the most popular and widely applied technique for building software cost estimation models.
According to the principle of least squares, the 'best fitting' line is the line that minimizes
the deviations of the observed data from the line. The regression parameters of the least squares
line are estimates of the unknown regression parameters in the model. This is referred to as
multiple linear regression and is given by:

yi = β0 + β1xi,1 + … + βkxi,k + εi    (1)

where yi is the dependent variable, x1, x2, …, xk are the k independent variables, β0 is the
y-intercept, β1, …, βk are the slopes, and εi is the error term.
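As an illustrative sketch (not part of the original study), equation (1) can be fitted by ordinary least squares with NumPy; the project data below are hypothetical:

```python
import numpy as np

# Hypothetical training data: 6 completed projects with k = 2 cost
# drivers (columns of X) and their measured effort y.
X = np.array([[10.0, 3.0],
              [25.0, 5.0],
              [40.0, 2.0],
              [55.0, 4.0],
              [70.0, 6.0],
              [85.0, 3.0]])
y = np.array([12.0, 30.0, 55.0, 70.0, 88.0, 110.0])

# Prepend a column of ones so that beta_0 (the intercept) is estimated.
X1 = np.column_stack([np.ones(len(X)), X])

# Least squares: choose beta minimizing ||X1 @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta          # fitted efforts
print(beta)                # [beta_0, beta_1, beta_2]
```

At the minimum the residuals are orthogonal to every column of X1, which is the normal-equations condition behind the 'best fitting' line.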
3.1.2 Ridge Regression (RR)
RR is an alternative regression technique that addresses potential problems with OLS arising from
highly correlated attributes. In regression the objective is to "explain" the variation in one or
more "response variables" by associating it with proportional variation in one or more
"explanatory variables"; a problem arises when the explanatory variables vary in similar ways,
reducing their collective power of explanation. This phenomenon is known as near collinearity.
When the explanatory variables are correlated, the matrix X′X will be nearly singular and, as a
result, the estimates will be unstable: a small variation in the error has a large impact on β̂.
Ridge regression reduces this sensitivity by adding a number δ to the elements on the diagonal of
the matrix to be inverted. δ is called the ridge parameter and yields the following estimator of β:

β̂(δ) = (X′X + δIn)⁻¹ X′y    (2)

where In represents the identity matrix of rank n.
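A minimal sketch of the estimator in equation (2) on synthetic near-collinear data; the variable names, sample size and δ values here are illustrative assumptions, not the authors' setup:

```python
import numpy as np

# Synthetic near-collinear data: x2 is almost a copy of x1, so X'X is
# nearly singular and the OLS estimates become unstable.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)
X = np.column_stack([x1, x2])
y = 2.0 * x1 + rng.normal(scale=0.1, size=50)

def ridge(X, y, delta):
    """Ridge estimator of equation (2): (X'X + delta*I)^(-1) X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + delta * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)     # delta = 0 reduces to ordinary least squares
b_ridge = ridge(X, y, 1.0)   # delta > 0 shrinks and stabilizes the estimates
```

Increasing δ never increases the norm of the coefficient vector, which is exactly the stabilizing shrinkage the text describes.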
3.1.3 Forward Stepwise Regression
The purpose of stepwise regression is to generate a regression model in which the most predictive
variables are detected; it is carried out by a series of F tests. The method evaluates the
independent variables at each step, adding or deleting them from the model based on user-specified
criteria. In the first step, each independent variable is evaluated individually, and the variable
with the largest F value greater than or equal to the F-to-enter value is entered into the
regression equation. In subsequent steps, when a variable is added to the model based on its F
value, the method also examines the variables already included in the model against the
F-to-remove criterion, and any such variables found are removed.
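The selection loop can be sketched as follows. This simplified version applies only the F-to-enter test (the F-to-remove re-check described above is omitted), and the threshold and synthetic data are illustrative assumptions:

```python
import numpy as np

def fit_rss(X, y):
    """Least-squares fit; returns the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def forward_stepwise(X, y, f_to_enter=4.0):
    """Greedy forward selection driven by the partial F test."""
    n, k = X.shape
    selected = []
    current = np.ones((n, 1))            # start from the intercept-only model
    rss = fit_rss(current, y)
    while len(selected) < k:
        best = None
        for j in range(k):
            if j in selected:
                continue
            trial = np.column_stack([current, X[:, j]])
            rss_j = fit_rss(trial, y)
            df = n - trial.shape[1]      # residual degrees of freedom
            f = (rss - rss_j) / (rss_j / df)   # partial F for adding x_j
            if f >= f_to_enter and (best is None or f > best[0]):
                best = (f, j, rss_j, trial)
        if best is None:                 # no candidate clears F-to-enter
            break
        _, j, rss, current = best
        selected.append(j)
    return selected

# Synthetic demonstration: only the first of four variables drives y.
rng = np.random.default_rng(1)
Xd = rng.normal(size=(80, 4))
yd = 3.0 * Xd[:, 0] + rng.normal(scale=0.5, size=80)
sel = forward_stepwise(Xd, yd)
```

With this data the informative variable enters first, and pure-noise variables are usually rejected by the F-to-enter threshold.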
3.1.4 Backward Stepwise Regression
The backward stepwise elimination procedure is essentially a series of tests of significance of
the independent variables. The process starts with the maximum model and eliminates the variable
with the highest p-value for the test of significance of that variable, provided the p-value is
larger than some pre-determined level (say, 0.05). In the next step, it fits the reduced model
obtained after removing that variable and again removes from the reduced model the variable with
the highest p-value (if p >= 0.05), and so on. The process ends when no more variables can be
removed from the model at the 5% significance level.
3.1.5 Multivariate Adaptive Regression Splines (MARS)
MARS focuses on the development and deployment of accurate and easy-to-understand regression
models. The MARS model is designed to predict continuous numeric outcomes and to build
high-quality probability models. A MARS model automatically generates non-linearities and
interactions between variables and is thus a promising technique for software effort
estimation [21]. MARS fits the data to the following equation:

ei = b0 + Σk=1..K bk Πl=1..L hl(xi(j))    (3)

where b0 and bk are the intercept and the slope coefficients, and the hl(xi(j)) are hinge
functions. They take the form max(0, xi(j) − b), where b is the knot. MARS thus builds a multiple
piecewise linear regression by adding multiple hinge functions.
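A small sketch of how a mirrored pair of hinge functions at a knot reproduces a piecewise linear trend; the data and knot location are hypothetical:

```python
import numpy as np

def hinge(x, b):
    """MARS hinge function max(0, x - b), where b is the knot."""
    return np.maximum(0.0, x - b)

# Hypothetical one-dimensional trend whose slope changes at x = 4:
# slope 1 below the knot, slope 3 above it.
x = np.linspace(0.0, 10.0, 101)
y = np.where(x < 4.0, x, 4.0 + 3.0 * (x - 4.0))

# Basis: intercept plus the mirrored hinge pair at the knot b = 4.
B = np.column_stack([np.ones_like(x),
                     np.maximum(0.0, 4.0 - x),   # max(0, b - x)
                     hinge(x, 4.0)])             # max(0, x - b)
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
y_hat = B @ coef                                 # reproduces the bent line
```

Because the bent line lies exactly in the span of this basis, the least-squares fit recovers it exactly; MARS searches over knots and variables to build such bases automatically.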
3.2 Grey Relational Analysis
GRA is a comparatively novel technique in software estimation. It is used for analyzing the
relationships that exist between two series. The appeal of GRA for software effort estimation
stems from its flexibility in modeling complex nonlinear relationships between effort and cost
drivers [16].
Grey Relational Grade by Deng’s Method [18],[19],[20]
GRA is used to quantify all the influences of various factors and the relationship among data
series, where a series is a collection of measurements [16]. The three main steps involved in the
process are:
Data Processing: The first step is the standardization of the various attributes. Every attribute
is given the same amount of influence by making the data dimensionless, using techniques such as
upper-bound effectiveness, lower-bound effectiveness or moderate effectiveness.
Upper-bound effectiveness (i.e., larger-the-better) is given by:

xi*(k) = (xi(k) − mink xi(k)) / (maxk xi(k) − mink xi(k))    (4)

where i = 1,2,…,m and k = 1,2,…,n.
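Equation (4) can be sketched as follows; the attribute values are hypothetical:

```python
import numpy as np

def upper_bound_effectiveness(series):
    """Larger-the-better normalization from equation (4): rescales a
    series to [0, 1] so every attribute carries the same influence."""
    s = np.asarray(series, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

kloc = np.array([10.0, 25.0, 40.0, 70.0])   # hypothetical attribute column
r = upper_bound_effectiveness(kloc)         # values in [0, 1]
```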
Difference Series: GRA uses the grey relational coefficient to describe the trend relationship
between an objective series and a reference series at a given point in a system.
( ) ( )( )
( ) max,
maxmin
0 ,
∆+∆
∆+∆
=
ζ
ζ
γ
k
kxkx
io
i
(5)
where;
∆0,i(k) = |x0(k) − xi(k)| is the difference of the absolute value between x0(k) and xi(k);
∆min = minjmink |x0(k) − xj(k)| is the smallest value of ∆0,j ∀j ∈ {1, 2, . . . , n};
∆max = maxjmaxk |x0(k) −xj(k)| is the largest value of ∆0,j∀j ∈ {1, 2, . . . , n}; and
ζ is the distinguishing coefficient, ζ ∈ (0, 1].
The ζ value changes the magnitude of γ(x0(k), xi(k)). In this study the value of ζ has been taken as 0.5 [17].
Grey Relational Grade: The GRG is used to find the overall degree of similarity between the reference tuple x0 and a comparative tuple xi. When the value of GRG approaches 1, the two tuples are more similar; when it approaches 0, they are more dissimilar. The GRG Γ(x0, xi) between an objective series xi and the reference series x0 was defined by Deng as follows:
Γ(x0, xi) = (1/n) ∑_{k=1..n} γ(x0(k), xi(k))   (6)
4. PROPOSED METHODOLOGIES
4.1 Clustered regression using grey relational analysis
In this methodology, in order to reduce the heterogeneity that exists in the datasets, the initial focus is to partition the datasets into subsets according to some similarity measure (unsupervised classification). The proposed clustering algorithm uses grey relational analysis both for feature selection and for clustering: the maximum mean grey relational grade between data points acts as the objective function, instead of the minimum distance used by k-means. The structural flow chart is shown in Figure 1.
The main steps involved in the algorithm are:
Step 1: Use grey relational analysis to find feature weights.
Step 2: Use grey relational analysis to cluster the datasets based on these feature weights.
Step 3: Apply regression techniques to the clustered datasets.
Step 4: Predict effort using the fitted regression models.
The detailed algorithm is described as follows:
Using Grey Relational Analysis for finding feature weights
Feature selection by GRA [16]:
a. Construction of data: The columns in each dataset are treated as series. The effort series xe = {e1, e2, e3, ..., en} is taken as the reference series and the attribute columns are regarded as objective series.
b. Normalization: Each data series is normalized according to equation 4, so that all series have the same degree of influence on the dependent variable "effort".
c. Generation of grey relational grade: The grey relational grade (GRG) is calculated for each series with respect to the reference series according to equation 6.
The GRGs are generated, normalized and used as the corresponding feature weights wk.
Using Grey Relational Analysis for clustering the datasets based on these feature weights
After finding the feature weights in the first step, the clustered approach based on grey relational analysis is applied. The detailed algorithm is as follows:
a. Generate the weight of each feature as described earlier.
b. Normalize the data with larger-the-better as per equation 4.
c. Calculate the distance between data points based on the weighted GRG:
• Consider the i-th data point as the reference series x0.
• Treat all the other data points as the objective series {x1, x2, x3, ..., xn−1}.
• Calculate the grey relational coefficient with ζ = 0.5 [17], then calculate the weighted GRG of each objective series using the feature weights from step 1.
d. Randomly select the desired number of cluster centers ck.
e. Use the GRG distance from data points to cluster centers as the basis for selecting cluster members: each point joins the cluster for which it has the maximum GRG value.
f. Update the cluster centers by selecting new centers based on the maximum mean GRG, then repeat step e.
g. Repeat steps e and f until there is no change in the cluster centers, or the difference between the means is less than some predefined threshold.
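Steps c–g can be sketched as follows. This is a simplified illustration assuming pre-normalized data and precomputed feature weights, with ∆min and ∆max taken per point pair for brevity; names such as `gra_cluster` are hypothetical:

```python
import random

def weighted_grg(ref, obj, weights, zeta=0.5):
    """Feature-weighted grey relational grade between two data points."""
    deltas = [abs(r - o) for r, o in zip(ref, obj)]
    d_min, d_max = min(deltas), max(deltas)
    if d_max == 0:            # identical points: perfect similarity
        return 1.0
    coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in deltas]
    return sum(w * c for w, c in zip(weights, coeffs))

def gra_cluster(points, weights, k, iters=20, seed=0):
    """k-means-style loop that assigns each point to the center with
    MAXIMUM weighted GRG (similarity) instead of minimum distance."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)            # step d: random initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # step e: assign to most similar center
            best = max(range(k), key=lambda c: weighted_grg(centers[c], p, weights))
            clusters[best].append(p)
        new_centers = []
        for c, members in enumerate(clusters): # step f: member with max mean GRG
            if not members:
                new_centers.append(centers[c])
                continue
            new_centers.append(max(members, key=lambda m: sum(
                weighted_grg(m, q, weights) for q in members) / len(members)))
        if new_centers == centers:             # step g: stop when centers stabilize
            break
        centers = new_centers
    return clusters

# Hypothetical 2-feature, pre-normalized data; weights as from step 1
points = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
clusters = gra_cluster(points, [0.5, 0.5], k=2)
```

Because GRG measures similarity rather than distance, the assignment and center-update steps maximize rather than minimize the objective.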
5. EXPERIMENTAL RESULTS
5.1 Dataset
In order to evaluate models based on the proposed methodology, four well-established datasets from the PROMISE repository [22] have been used: Desharnais, Finnish, Albrecht and Maxwell. Descriptive statistics of the datasets are shown in Table 1 below. All the datasets have a varied range of effort values. The datasets have been treated individually as they have distinct features, and the clusters from each dataset have likewise been treated separately. The prediction accuracy of all the models with and without clustering is then compared. To measure the accuracy of the software estimates, we have used the three most popular evaluation criteria in software engineering, i.e., MMRE, MdMRE and Pred(n).
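The three criteria can be sketched as follows, using the standard MRE-based definitions; the effort values in the example are hypothetical:

```python
def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual

def mmre(actuals, preds):
    """Mean MRE over all projects."""
    errs = [mre(a, p) for a, p in zip(actuals, preds)]
    return sum(errs) / len(errs)

def mdmre(actuals, preds):
    """Median MRE over all projects."""
    errs = sorted(mre(a, p) for a, p in zip(actuals, preds))
    n = len(errs)
    return errs[n // 2] if n % 2 else (errs[n // 2 - 1] + errs[n // 2]) / 2

def pred(actuals, preds, n=25):
    """Pred(n): percentage of projects whose MRE is at most n percent."""
    hits = sum(1 for a, p in zip(actuals, preds) if mre(a, p) <= n / 100)
    return 100.0 * hits / len(actuals)

actual = [100.0, 200.0, 400.0]   # hypothetical actual efforts
est    = [110.0, 150.0, 420.0]   # hypothetical estimates; MREs: 0.10, 0.25, 0.05
print(mmre(actual, est))         # ≈ 0.1333
print(mdmre(actual, est))        # 0.1
print(pred(actual, est, 25))     # 100.0
```

Lower MMRE and MdMRE and higher Pred(25) indicate better estimation accuracy, which is the direction of improvement reported in the tables below.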
Figure1. Structural flowchart for feature weighted Grey Relational Clustering
Table 1. Descriptive statistics of the data sets

             Cases   Features   Effort Mean   Minimum (effort)   Maximum (effort)   Effort Std. Dev.
Albrecht        24          8     21.875000               0.50             105.20          28.417895
Desharnais      77         11     4833.9090                546              23940          4188.1851
Finnish         38          8     7678.2894                460              26670          7135.2799
Maxwell         62         23          8223                583              63694          10499.903
5.1.1 Comparison over Desharnais data set
The results obtained suggest that applying regression on clustered data produces more accurate estimation models than applying regression on the entire dataset, as is evident from Table 2. The Pred(25) accuracy improved from 35.06% to 50% using OLS regression, while the MMRE and MdMRE fell from 0.5 to 0.32 and from 0.31 to 0.25 respectively. Similar observations can be made from the table for all the other regression models. The best results were obtained using the proposed feature weighted grey relational clustering.
Table 2. Prediction accuracy results (Desharnais data set)

                 OLS     Ridge    Forward   Backward   MAR Splines
Desharnais
MMRE             0.5     0.47     0.5       0.5        0.51
MdMRE            0.31    0.3      0.31      0.31       0.32
Pred(25)         35.06   41.56    37.66     37.66      35.06
Desharnais (Cluster_1) using Grey Relational Clustering
MMRE             0.32    0.35     0.33      0.39       0.39
MdMRE            0.25    0.23     0.24      0.21       0.20
Pred(25)         50      55.56    50        52.78      58.33

Figure 2. Boxplot of Absolute Residuals for Desharnais

Boxplots of absolute residuals give a good indication of the distribution of the residuals and help in understanding the mean magnitude of relative error and Pred(25). The results obtained were subjected to statistical tests using the Wilcoxon signed rank test and the Mann-Whitney U test.
The box plots of absolute residuals shown in Figure 2 suggest that the medians for all regression techniques applied on Desharnais_Cluster are closer to zero, as is clear from the values on the Y-axis, indicating that the residuals were closer to the minimum value. Outliers are fewer and less extreme for Desharnais_Cluster than for the full Desharnais data set. Since the p-value in all the cases shown in Table 3 is greater than 0.05, we conclude that the residuals obtained in all approaches are not significantly different from the test value zero. As a result, the proposed methods can be used for software effort estimation. The statistical tests were performed using SPSS 19 for Windows.

Table 3. Wilcoxon signed rank test

                    Desharnais                        Desharnais_Cluster_1 using Grey Relational
                    Z       Asymp. Sig. (2-tailed)    Z       Asymp. Sig. (2-tailed)
OLS - Actual        -.419   .675                      -.236   .814
Ridge - Actual      -.551   .582                      -.299   .765
Forward - Actual    -.449   .653                      -.189   .850
Backward - Actual   -.449   .653                      -.314   .753
MARS - Actual       -.566   .571                      -.236   .814

The results of the Mann-Whitney U test are provided in Table 4. Predictions obtained using the clustered approach presented statistically significant estimations.
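The Mann-Whitney U statistic used in these comparisons can be illustrated with a plain-Python sketch that counts pairwise wins (ties count as half); this is not the SPSS procedure used in the study, and the residual values are hypothetical:

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: the number of pairs (a, b) with a > b,
    counting ties as 0.5. The complementary statistic is len(a)*len(b) - U."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical absolute residuals: full-dataset model vs. clustered model
res_full    = [5.0, 7.0, 9.0, 12.0]
res_cluster = [1.0, 2.0, 3.0, 6.0]
print(mann_whitney_u(res_full, res_cluster))  # 15.0
```

A U near the maximum (here 4 × 4 = 16) indicates that one sample's values are systematically larger; SPSS converts this to the Z values reported in the tables.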
Table 4. Results of Mann-Whitney U Test

Desharnais vs. Desharnais_Cluster_1
using Grey Relational          Z
OLS Regression                 -4.018
Ridge Regression               -4.079
Forward Stepwise               -4.252
Backward Stepwise              -4.227
MAR Splines                    -4.240

5.1.2 Comparison over Finnish data set

For the Finnish dataset, significant results (shown in Table 5) were obtained on the clustered data. The Pred(25) accuracy improved from 36.84% to 100% using OLS regression, while the MMRE and MdMRE fell from 0.75 to 0.02 and from 0.36 to 0.02 respectively. Similar observations can be made from the table for all the other regression models.

Table 5. Prediction accuracy results (Finnish data set)

                 OLS     Ridge    Forward   Backward   MAR Splines
Finnish
MMRE             0.75    0.71     1.01      0.76       0.08
MdMRE            0.36    0.32     0.43      0.42       0.07
Pred(25)         36.84   36.84    36.84     36.84      97.37
Finnish (Cluster_1) using Grey Relational Clustering
MMRE             0.02    0.025    0.23      0.022      0.022
MdMRE            0.02    0.024    0.022     0.023      0.023
Pred(25)         100     100      100       100        100

The boxplots of absolute residuals for the Finnish dataset and Finnish_Cluster, shown in Figure 3, suggest that the medians for all regression techniques applied on Finnish_Cluster are very close to zero, as is clear from the values on the Y-axis, indicating that the estimates were closer to the minimum value. Outliers are less extreme in the case of Finnish_Cluster. A one-sample Wilcoxon signed rank test was applied to investigate the significance of the results, setting the level of confidence to 0.05. From the results shown in Table 6, we can conclude that no significant difference exists between the residual median and the hypothetical median.
Predictions based on the clustered regression model presented statistically significant accurate estimations, measured using absolute residuals, confirmed by the boxplots of absolute residuals and also verified using the Mann-Whitney U test (Table 7).

Figure 3. Boxplot of Absolute Residuals for Finnish
Table 6. Wilcoxon signed rank test

                    Finnish                           Finnish_Cluster_1 using Grey Relational
                    Z       Asymp. Sig. (2-tailed)    Z       Asymp. Sig. (2-tailed)
OLS - Actual        -.268   .788                      -.408   .683
Ridge - Actual      -.355   .722                      -.220   .826
Forward - Actual    -.268   .788                      -.157   .875
Backward - Actual   -.152   .879                      -.345   .730
MARS - Actual       -.558   .577                      -.282   .778

Table 7. Results of Mann-Whitney U Test

Finnish vs. Finnish_Cluster_1
using Grey Relational          Z
OLS Regression                 -2.022
Ridge Regression               -2.104
Forward Stepwise               -2.228
Backward Stepwise              -2.022
MAR Splines                    -1.939

5.1.3 Comparison over Albrecht data set

The proposed clustered regression approach again produced more accurate models. This is evident from the Pred(25) accuracy, which improved from 37.5% to 85.71% using OLS, while the MMRE and MdMRE fell from 0.9 to 0.09 and from 0.43 to 0.05 respectively. Similar observations can be made for all the other regression models (Table 8).

Table 8. Prediction accuracy results (Albrecht data set)

                 OLS     Ridge    Forward   Backward   MAR Splines
Albrecht
MMRE             0.9     0.91     0.86      1          1.23
MdMRE            0.43    0.52     0.5       0.49       0.6
Pred(25)         37.5    37.5     41.67     37.5       29.17
Albrecht (Cluster_1) using Grey Relational Clustering
MMRE             0.092   0.21     0.19      0.08       0.225
MdMRE            0.05    0.24     0.22      0.025      0.175
Pred(25)         85.71   50       57.14     85.71      57.14

The box plots of absolute residuals suggest that the medians for all regression techniques applied on Albrecht_Cluster are very close to zero, as is clear from the values on the Y-axis, indicating that the residuals were closer to the minimum value. Outliers are less extreme in the case of Albrecht_Cluster.
Figure 4. Boxplot of Absolute Residuals for Albrecht

The results of the Wilcoxon signed rank test (Table 9) show that no significant difference exists between the residual median and the hypothetical median, indicating good predictions.

Table 9. Wilcoxon signed rank test

                    Albrecht                          Albrecht_Cluster_1 using Grey Relational
                    Z       Asymp. Sig. (2-tailed)    Z       Asymp. Sig. (2-tailed)
OLS - Actual        -.029   .977                      -.031   .975
Ridge - Actual      -.057   .954                      -.031   .975
Forward - Actual    -.057   .954                      -.031   .975
Backward - Actual   -.086   .932                      -.031   .975
MARS - Actual       -.029   .977                      -.031   .975

Table 10. Results of Mann-Whitney U Test

Albrecht vs. Albrecht_Cluster_1
using Grey Relational          Z
OLS Regression                 -1.014
Ridge Regression               -1.014
Forward Stepwise               -.943
Backward Stepwise              -1.155
MAR Splines                    -1.108

The results of the Mann-Whitney U test shown in Table 10, however, did not show a significant difference between the proposed approaches. This is because of the small size of the dataset: it comprises only 24 projects, which were divided into two clusters of 20 and 4 projects, and the clustered regression approach was applied on the 20 projects.

5.1.4 Comparison over Maxwell data set

The results in Table 11 show that the proposed clustered regression approach produced more accurate models for the Maxwell dataset as well. This is evident from the Pred(25) accuracy, which improved from 38.71% to 51.51% using OLS regression, while the MMRE and MdMRE fell from 0.59 to 0.51 and from 0.38 to 0.24 respectively. For ridge regression, the Pred(25) accuracy increased from 43.55% to 60.60%, a significant improvement, while the MMRE and MdMRE dropped from 0.54 to 0.33 and from 0.3 to 0.18 respectively.
Table 11. Prediction accuracy results (Maxwell data set)

                 OLS     Ridge    Forward   Backward   MAR Splines
Maxwell
MMRE             0.59    0.54     0.53      0.59       0.7
MdMRE            0.38    0.3      0.32      0.33       0.46
Pred(25)         38.71   43.55    38.71     37.1       32.26
Maxwell (Cluster_1) using Grey Relational Clustering
MMRE             0.51    0.33     0.46      0.60       0.75
MdMRE            0.24    0.18     0.25      0.25       0.52
Pred(25)         51.51   60.60    48.48     48.48      24.24

The box plots of absolute residuals suggest that the medians for all regression techniques applied on Maxwell_Cluster are closer to zero, as is clear from the values on the Y-axis, indicating that the estimates were closer to the minimum value. The medians are skewed towards the minimum value, indicating good predictions. Outliers are fewer and less extreme for Maxwell_Cluster than for the entire dataset.

Figure 5. Boxplot of Absolute Residuals for Maxwell

The results of the Wilcoxon signed rank test, given in Table 12, suggest that no significant difference exists between the residual median and the hypothetical median.

Table 12. Wilcoxon signed rank test

                    Maxwell                           Maxwell_Cluster_1 using Grey Relational
                    Z       Asymp. Sig. (2-tailed)    Z       Asymp. Sig. (2-tailed)
OLS - Actual        -.249   .803                      -.068   .946
Ridge - Actual      -.508   .611                      -.616   .538
Forward - Actual    -.691   .490                      -.068   .946
Backward - Actual   -.831   .406                      -.023   .982
MARS - Actual       -.109   .913                      -.205   .838

Concerning the statistical test based on the Mann-Whitney U (Table 13), we found no significant difference between the clustered regression approach and the plain regression approach.
Table 13. Results of Mann-Whitney U Test

Maxwell vs. Maxwell_Cluster_1
using Grey Relational          Z
OLS Regression                 -1.431
Ridge Regression               -1.448
Forward Stepwise               -1.482
Backward Stepwise              -1.312
MAR Splines                    -1.806
6. CONCLUSION
This work addresses the heterogeneity problems that exist in software engineering datasets. In order to confirm the effectiveness of the proposed work, four different data sets have been used for software estimation. The simulation results provide a comparison of the clustered regression approach against plain regression. The results confirm that the proposed feature weighted grey relational clustering algorithm performed appreciably for software effort estimation. The statistical test based on the Mann-Whitney U further confirmed that a statistically significant difference exists between the proposed clustered-regression models and the plain regression models for the Desharnais and Finnish datasets.
Further, this work can be extended by combining the clustered approach with different soft computing techniques and different similarity measures for feature selection. GRA can also be analyzed for feature subset selection. For enhanced efficiency in software estimation, the techniques should also be applied to larger data sets with different clustering algorithms.
REFERENCES
[1] Boehm, B (1981) Software Engineering Economics Englewood Cliffs, NJ, Prentice Hall.
[2] Albrecht, A.J. & Gaffney, J.R. (1983) “Software measurement, source lines of code, and development
effort prediction: a software science validation”, IEEE Transactions on Software Engineering, Vol. 9,
No. 6, pp 639-648.
[3] Putnam, Lawrence H. (1978) "A General Empirical Solution to the Macro Software Sizing and
Estimating Problem", IEEE Transactions on Software Engineering, Vol. SE-4, No. 4, pp 345-361.
[4] El-Zaghmouri, B. M. & Abu-Zanona, M. A. (2012) “Fuzzy C-Mean Clustering Algorithm
Modification and Adaption for Application”, World of Computer Science and Information
Technology Journal, ISSN: 2221-0741, Vol.2, No.1, pp 42-45.
[5] Lin, C. T. & Tsai, H. Y. (2005) “Hierarchical Clustering Analysis Based on Grey Relation grade”,
Information and Management Sciences, Vol. 16, No. 1, pp 95-105.
[6] Wong, C.C. & Chen, C.C. (1998) “Data clustering by grey relational analysis”, J. Grey Syst, Vol. 10,
No. 3, pp 281-288.
[7] Hu, Y.C., Chen, R. S., Hsu, Y. T., & Tzebg, G. H. (2002) “Grey self-organizing feature maps”, Neuro
computing, Vol. 48, No.1–4, pp 863-877.
[8] Duda, R.O., & Hart, P.E. (1973) Pattern classification and scene analysis, John Wiley &Sons, Inc.,
New York.
[9] Bezdek, J. C., Ehrlich, R. & Full, W. (1984) “FCM: The Fuzzy c- Means Clustering Algorithm”,
Computers & Geoscience, Vol. 10, No. 2-3, pp 191-203.
[10] Runkler, T.A. & Bezdek, J.C. (1999) “Alternating cluster estimation: a new tool for clustering and
function approximation”, IEEE Trans. Fuzzy Syst., Vol. 7, No. 4, pp 377-393.
[11] Pedrycz, W.(1996) “Conditional fuzzy c-means”, Pattern Recogn. Lett.,Vol. 17, No. 6, pp 625-632.
[12] Kung C. C & Su J. Y. (2007) “Affine Takagi-Sugeno fuzzy modeling algorithm by Fuzzy c-
regression models clustering with a novel cluster validity criterion”, IET Control Theory Appl., pp.
1255 – 1265.
[13] Wang, W., Wang, C., Cui, X. & Wang, A. (2008) “A clustering algorithm combine the FCM
algorithm with supervised learning normal mixture model”, ICPR 2008, pp 1-4.
[14] Lin, C. T. & Tsai, H. Y. (2005) “Hierarchical Clustering Analysis Based on Grey Relation grade”,
Information and Management Sciences, Vol. 16, No. 1, pp 95-105.
[15] Chang, K. C. & Yeh, M. F. (2005) “Grey Relational Based Analysis approach for data clustering”,
IEEE Proc.-Vis. Image Signal Process, Vol.152, No.2.
[16] Song, Q., Shepperd M., Mair C. (2005) “Using Grey Relational Analysis to Predict Software Effort
with Small Data Sets”. Proceedings of the 11th International Symposium on Software Metrics
(METRICS’05), pp 35-45.
[17] Azzeh, M., Neagu, D. & Cowling, P. I., (2010) “Fuzzy grey relational analysis for software effort
estimation”, Journal of Empirical software Engineering, Vol.15, No.1, [doi:10.1007/s10664-009-
9113-0]
[18] Deng, J. L. (1982) “Control problems of grey system”, System and Control Letters, Vol. 1, pp 288-94.
[19] Deng, J. (1989) “Introduction to Grey System theory”, The Journal of Grey System, Vol.1, No.1, pp
1-24.
[20] Deng, J. (1989) “Grey information space”, The Journal of Grey System, Vol.1, No.1, pp 103-117.
[21] MATLAB® Documentation, http://www.mathworks.com/help/techdoc/
[22] PROMISE Repository of empirical software engineering data, http://promisedata.org/repository.
[23] Jou, J. M , Chen, P. Y & Sun, J. M. (1999) “The gray prediction search algorithm for block motion
estimation”, IEEE Transactions on Circuits and Systems for Video Technology, Vol.9, No.6, pp 843-
848.
[24] Su, S. L., Su, Y. C. & Huang, J. F. (2000) “Grey-based power control for DS-CDMA cellular mobile
systems”, IEEE Transactions on Vehicular Technology, Vol.49, No.6, pp 2081-2088.
[25] Jiang, B.C, Tasi, S. L & Wang, C. C. (2002) “Machine vision-based gray relational theory applied to
IC marking inspection”, IEEE Transactions on Semiconductor Manufacturing, Vol.15, No.4, pp 531-
539.
[26] Luo, R. C, Chen, T. M & Su, K. L. (2001) “Target tracking using a hierarchical grey-fuzzy motion
decision making method”, IEEE Transactions on Systems, Man and Cybernetics, Part A, Vol.31,
No.3, pp 179-186.
[27] Wang, Y. F. (2003) “On-demand forecasting of stock prices using a real-time predictor”, IEEE
Transactions on Knowledge and Data Engineering, Vol.15, No.4, pp 1033-1037.
[28] Huang, S. J, Huang, C. L. (2000) “Control of an inverted pendulum using grey prediction model”,
IEEE Transactions on Industry Applications, Vol.36, No.2, pp 452-458.
[29] Li, G, Ruhe, J, Emran, A. Al & Richter, M.M. (2007) “A flexible method for software effort
estimation by analogy”, Empirical Software Engineering, Vol.12, pp 65-106. [doi:10.1007/s10664-
006-7552-4]
Authors
Geeta Nagpal
Geeta Nagpal, Ph D in Computer Science & Engineering from National Institute of
Technology, Jalandhar, INDIA. She completed her Master’s degree in Computer
Science from Punjab Agricultural University, Ludhiana. She is presently working as
Associate Professor in the Department of Computer Science and Engineering at
National Institute of Technology, Jalandhar. Her research interests are Software
Engineering, Databases and Data mining.
Prof. Moin Uddin
Prof. Moin Uddin, Pro Vice Chancellor, Delhi Technological University, Delhi
,INDIA. He obtained his B.Sc. Engineering and M.Sc. Engineering (Electrical)
from AMU, Aligarh in 1972 and 1978 respectively. He obtained his Ph.D. degree
from University of Roorkee, Roorkee in 1994. Before Joining as the Pro Vice
Chancellor of Delhi Technological University, he was the Director of NIT,
Jalandhar for five years. He has worked as Head Electrical Engineering Department
and Dean Faculty of Engineering and Technology at Jamia Millia Islamia (Central
University) New Delhi. He supervised 25 Ph. D thesis and more than 30 M.Tech
dissertations. He has published more than 40 research papers in reputed journals
and conferences. Prof. Moin Uddin holds membership of many professional bodies. He is a Senior Member
of IEEE.
Dr. Arvinder Kaur
Dr. Arvinder Kaur, Professor, University School of IT, Guru Gobind Singh
Indraprastha University, Delhi, India. She completed her master's degree in
Computer Science from Thapar Institute of Engineering and Technology and Ph D
from Guru Gobind Singh Indraprastha University, Delhi. Prior to joining the school,
she worked with Dr. B. R. Ambedkar Regional Engineering College, Jalandhar and
Thapar Institute of Engineering and Technology. Her research interests include
Software Engineering, Object-Oriented Software Engineering, Software Metrics,
Microprocessors, Operating Systems, Artificial Intelligence, and Computer networks. She is a lifetime
member of ISTE and CSI. She is also a member of ACM. She has published 45 research papers in national
and international journals and conferences. Her paper titled “Analysis of object oriented Metrics” was
published as a chapter in the book Innovations in Software Measurement (Shaker-Verlag, Aachen 2005).