Abstract
Polycystic Ovarian Syndrome (PCOS) is a multifaceted, heterogeneous, and complex disorder affecting roughly 5 to 15% of women of reproductive age. It is characterized by polycystic ovaries, chronic anovulation, and hyperandrogenism, and is frequently accompanied by the metabolic traits of insulin resistance, hyperinsulinemia, hypertension, abdominal obesity, and dyslipidemia, together known as the metabolic syndrome. These features lead to anovulatory infertility and to long-term consequences such as endometrial hyperplasia, type 2 diabetes mellitus, and coronary disease. Computer-based information combined with advanced data mining techniques can be used to obtain appropriate results. Classification is a classic data mining task with roots in machine learning; Naïve Bayes, Artificial Neural Networks, Decision Trees, and Support Vector Machines are common classification methods. Feature selection methods involve subset generation, subset evaluation, a stopping criterion for the search, and validation procedures, and the characteristics of the search method used strongly affect the time efficiency of feature selection. PCA (Principal Component Analysis), information-gain subset evaluation, fuzzy rough set evaluation, and Correlation-based Feature Selection (CFS) are common feature selection techniques, while greedy best-first search, ranker, and similar algorithms are used to search the feature space. In this paper, a new algorithm based on fuzzy rough subset evaluation and an artificial neural network is proposed. It combines the two stages so that feature selection and classification no longer need to be performed separately, giving better performance than carrying out the tasks independently.
Keywords: ANN, SVM, PCA, CFS
Propose an Enhanced Framework for Prediction of Heart Disease (IJERA Editor)
Heart disease diagnosis is a complex task that requires considerable experience. Traditionally, the doctor prescribes a number of medical tests, such as a heart MRI, an ECG, and a stress test, to examine the patient for heart disease. Today, the health care industry holds a huge amount of health care data containing hidden information, and effective decisions can be made on the basis of this hidden information. For appropriate results, advanced data mining techniques are applied to this computer-based information. Artificial neural networks (ANNs) are mathematical techniques that can be used for inference and categorization in any empirical science, and also for modelling real neural networks. Mental phenomena such as acting, wanting, knowing, remembering, perceiving, thinking, and inferring can be understood using the theory of ANNs, and ANNs are powerful instruments for the problems of probability and induction that arise in inference and classification. In this paper, classification techniques, namely the Naive Bayes classification algorithm and Artificial Neural Networks, are used to classify the attributes in a given data set, and attribute filtering techniques, namely PCA (Principal Component Analysis) filtering and Information Gain Attribute Subset Evaluation, are used for feature selection in order to predict heart disease symptoms. A new framework based on these techniques is proposed: the input dataset is fed into a feature selection block, which selects whichever technique yields the fewest attributes; classification is then performed with the two algorithms, and the attributes selected by both classification tasks are used for the prediction of heart disease.
This framework reduces the time needed to predict the symptoms of heart disease and lets the user identify the most important attributes.
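The abstract above relies on Information Gain Attribute Subset Evaluation to pick the most telling attributes. As a rough illustration of the underlying measure, the following sketch computes information gain for categorical attributes on a tiny hypothetical table; the attribute names and records are invented for the example and are not from the paper's dataset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Reduction in label entropy obtained by splitting the rows on one attribute."""
    total = len(labels)
    base = entropy(labels)
    groups = {}  # attribute value -> labels of the rows with that value
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr_index], []).append(label)
    remainder = sum((len(g) / total) * entropy(g) for g in groups.values())
    return base - remainder

# Hypothetical records: (chest_pain, high_bp) -> diagnosis label.
rows = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "no")]
labels = ["sick", "sick", "healthy", "healthy"]
print(information_gain(rows, 0, labels))  # 1.0: chest_pain splits the classes perfectly
```

Attributes with higher information gain are kept; low-scoring ones are the candidates the filtering step drops.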
Rapid growth in medical informatics can be observed nowadays. Advances in different medical fields help discover various critical diseases and provide guidelines for their cure. This has been possible only because of well-maintained medical databases and the automation of the data analysis process. This analysis requires a great deal of learning and intelligence, for which data mining techniques provide the basis; various techniques are available, such as decision tree induction, rule-based classification, support vector machines, stochastic classification, logistic regression, Naïve Bayes, artificial neural networks and fuzzy logic, and genetic algorithms. This paper provides the basics of data mining, surveys the techniques available in the medical sciences, and reviews the work done on medical databases using data mining techniques for human disease diagnosis.
The Internet has become the most popular environment for browsing, which keeps increasing the volume of service-oriented data. As the data size grows, finding and retrieving the most similar data from such a large volume becomes more difficult. Various research methods address this problem by attempting to cluster the data. An existing method, the Clustering-based Collaborative Filtering approach (ClubCF), aims to cluster similar kinds of data together so that retrieval time can be reduced considerably. However, existing methods cannot find similar reviews accurately, which must be addressed for an efficient and accurate recommendation system. The proposed method ensures this by introducing a novel technique, Modified Collaborative Filtering and Clustering with Regression (MoCFCR). In this method, the k-means algorithm is first used to cluster similar movie reviewers together so that the recommendation process becomes easier. To handle the large volume of data, the work adopts the MapReduce framework, which divides the entire dataset into subsets assigned to separate nodes with individual key values. After clustering, the clustered outcomes are merged using an inverted-index procedure in which the similarity between movies is calculated. Collaborative filtering is then applied to remove movies that are not relevant to the input. Finally, accurate movie recommendations are made using the logistic regression method. The overall evaluation of the proposed method is carried out in Hadoop, demonstrating that the proposed technique provides better outcomes than existing techniques.
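The MoCFCR pipeline described above begins by clustering similar movie reviewers with k-means. A minimal, self-contained k-means sketch is shown below; the reviewer profiles are hypothetical, and a production system would run this step inside MapReduce as the abstract describes.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster went empty.
        centroids = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical reviewer profiles: (mean action rating, mean romance rating).
reviewers = [(4.8, 1.2), (4.5, 0.9), (1.1, 4.7), (0.8, 4.9)]
centroids, clusters = kmeans(reviewers, k=2)
print(sorted(len(c) for c in clusters))  # [2, 2]: one cluster per taste group
```

Each cluster can then be handed to the inverted-index and collaborative-filtering stages independently.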
An efficient information retrieval ontology system based on indexing for contexts (eSAT Journals)
Abstract: Many research and development projects produce a vast range of artifacts, such as articles, patents, research reports, conference papers, journal papers, experimental data, and so on. Searching for a particular context through keywords in a repository is not easy, because earlier systems suffer from huge recall with low precision. This paper sets out to construct an ontology-based search algorithm to retrieve the relevant contexts; ontologies provide rich knowledge for context retrieval. Here, we utilize the WordNet ontology to retrieve relevant contexts from a document repository. Because it is very difficult to retrieve the relevant context from documents in their original format, we use a pre-processing step, which includes two major stages: stop-word removal and stemming. The outcome of pre-processing is an index consisting of the important keywords and their corresponding documents. When the user enters a keyword, the ontology performs several steps to produce refined keywords; the refined keywords are then matched against the index, and the relevant contexts are retrieved. Experiments are carried out on different sets of contexts, and the performance of the proposed approach is estimated using the evaluation metrics precision, recall, and F-measure. Keywords: ontologies; WordNet; contexts; stemming; indexing.
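The pre-processing pipeline described above (stop-word removal, then stemming, then indexing) can be sketched as follows. The stop-word list and the crude suffix stemmer here are toy stand-ins for illustration, not the WordNet-backed components the paper uses.

```python
# Toy stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "of", "a", "an", "is", "are", "and", "for"}

def stem(word):
    """Very crude suffix stripping; a real system would use a Porter-style stemmer."""
    for suffix in ("ing", "tion", "ies", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def build_index(docs):
    """Inverted index: stemmed keyword -> set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token in STOP_WORDS:
                continue
            index.setdefault(stem(token), set()).add(doc_id)
    return index

docs = {1: "indexing of research papers", 2: "stemming and indexing process"}
index = build_index(docs)
print(sorted(index["index"]))  # [1, 2]: both documents mention "indexing"
```

At query time, the user's refined keywords are stemmed the same way and looked up in this index.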
AN EFFICIENT PSO-BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA... (ijsc)
As biomedical databases grow day by day, finding the essential features for disease prediction has become more complex due to high dimensionality and sparsity. Moreover, given the large number of microarray datasets in biomedical repositories, it is difficult to analyze, predict, and interpret feature information using traditional feature-selection-based classification models, most of which suffer from computational issues such as dimension reduction, uncertainty, and class imbalance on microarray data. The ensemble classifier is a scalable model for the extreme learning machine owing to its high efficiency and fast processing speed in real-time applications. The main objective of feature-selection-based ensemble learning models is to classify high-dimensional data with high computational efficiency and a high true positive rate. In the proposed model, an optimized Particle Swarm Optimization (PSO) based ensemble classification model is developed on high-dimensional microarray datasets. Experimental results show that the proposed model achieves higher computational efficiency than traditional feature-selection-based classification models in terms of accuracy, true positive rate, and error rate.
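A binary particle swarm optimizer for feature selection, as named in the abstract above, can be sketched as follows. The swarm parameters and the toy fitness function are illustrative assumptions for the sketch, not the paper's actual configuration, which wraps the search around an ensemble classifier's accuracy.

```python
import random
from math import exp

def binary_pso(n_features, fitness, swarm=12, iters=40, seed=1):
    """Minimal binary PSO: each particle is a 0/1 mask over the features,
    and velocities pass through a sigmoid to give per-bit flip probabilities."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(swarm)]
    vel = [[0.0] * n_features for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=fitness)[:]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]                        # inertia
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))    # social pull
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + exp(-vel[i][d])) else 0
            if fitness(pos[i]) > fitness(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = max(pbest + [gbest], key=fitness)[:]
    return gbest

# Toy fitness: features 0 and 2 are informative; every selected feature costs 0.5.
def subset_fitness(mask):
    return 2.0 * mask[0] + 2.0 * mask[2] - 0.5 * sum(mask)

best = binary_pso(n_features=5, fitness=subset_fitness)
print(best)  # the best mask found, which favours the two informative features
```

In the paper's setting the fitness would instead score each mask by training the ensemble on the selected microarray features.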
Novel modelling of clustering for enhanced classification performance on gene... (IJECEIAES)
Gene expression data is popular for its capability to disclose various disease conditions. However, the conventional procedure for extracting gene expression data itself introduces various artifacts, which pose challenges for diagnosing and classifying complex disease indications such as cancer. A review of existing research indicates that few classification approaches have proven to be standard with respect to high accuracy and applicability to gene expression data, quite apart from unaddressed problems of computational complexity. Therefore, this manuscript introduces a novel and simplified model that uses the Graph Fourier Transform together with eigenvalues and eigenvectors to offer better classification performance, taking as a case study a microarray database, a typical example of gene expression data. The study outcome shows that the proposed system offers comparatively better accuracy and reduced computational complexity than existing clustering approaches.
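The Graph Fourier Transform mentioned above projects a signal defined on graph nodes onto the eigenvectors of the graph Laplacian, which play the role of Fourier basis functions. A small sketch follows; the graph and signal are invented for illustration, not taken from the paper's microarray data.

```python
import numpy as np

# Adjacency matrix of a small, connected example graph (4 nodes, 4 edges).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)    # eigh: eigenvalues in ascending order

signal = np.ones(4)                     # a constant signal over the nodes
spectrum = eigvecs.T @ signal           # graph Fourier transform of the signal
print(np.round(np.abs(spectrum), 6))    # all energy sits at the zero-frequency mode
```

A constant signal is "smooth" on any connected graph, so its spectrum concentrates entirely on the eigenvector with eigenvalue zero; signals that vary sharply across edges spread energy into the higher eigenvalues.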
Abstract: In this paper, the concept of data mining is summarized and its significance and methodologies are illustrated. Data mining based on neural networks and genetic algorithms is researched in detail, and the key technologies and ways of achieving data mining with neural networks and genetic algorithms are surveyed. The paper also conducts a formal review of rule extraction from ANNs and GAs. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
An efficient feature selection algorithm for health care data analysis (journalBEEI)
Diabetes is a silent killer that will slowly kill a person if it goes undetected. Existing systems that use the F-score method and k-means clustering to check whether a person has diabetes are not 100% accurate, and anything short of 100% is unacceptable in the medical field, as it could cost lives. Our proposed system aims to take some of the best features of the existing algorithms for predicting diabetes and combine them into a novel algorithm that aims at 100% accuracy in its predictions. With the surge in technological advancement, data mining can be used to predict when a person is likely to be diagnosed with diabetes. Specifically, we analyze the best features of the chi-square algorithm and the advanced clustering algorithm (ACA). This work uses the Pima Indian Diabetes dataset provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Using classification methods, we consider factors such as age, BMI, and blood pressure, weigh the overall importance of these attributes, single the most important ones out, and use them for the prediction of diabetes.
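The chi-square ranking referred to above measures how strongly a categorical attribute co-varies with the class label: attributes with higher statistics are better predictors. A minimal sketch on invented records (not the Pima Indian data):

```python
from collections import Counter

def chi_square(feature_values, labels):
    """Chi-square statistic between a categorical feature and the class label."""
    n = len(labels)
    f_counts = Counter(feature_values)
    l_counts = Counter(labels)
    joint = Counter(zip(feature_values, labels))
    stat = 0.0
    for f in f_counts:
        for l in l_counts:
            expected = f_counts[f] * l_counts[l] / n   # under independence
            observed = joint[(f, l)]
            stat += (observed - expected) ** 2 / expected
    return stat

# Invented records: glucose level tracks the label, age group does not.
glucose = ["high", "high", "low", "low"]
age     = ["old", "young", "old", "young"]
label   = ["diabetic", "diabetic", "healthy", "healthy"]
print(chi_square(glucose, label), chi_square(age, label))  # 4.0 0.0
```

Ranking attributes by this statistic and keeping the top scorers is the filtering step the abstract combines with clustering.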
Introduction to feature subset selection method (IJSRD)
Data mining is a computational process for discovering patterns in large data sets. Among its important techniques is classification, which has recently received great attention in the database community. Classification can solve problems in fields such as medicine, industry, business, and science. PSO is an optimization technique based on social behaviour. Feature Selection (FS) involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Rough Set Theory (RST) is a mathematical tool that deals with the uncertainty and vagueness of decision systems.
Delineation of techniques to implement on the enhanced proposed model using d... (ijdms)
In the post-genomic era, with the advent of new technologies, a huge amount of complex molecular data is generated at high throughput. Managing this biological data to discover new knowledge is a challenging task due to its complexity and heterogeneity, and issues such as noisy and incomplete data need to be dealt with. Data mining has been used with notable success in the biological domain, yet discovering new knowledge from biological data remains a major challenge. The novelty of the proposed model lies in its combined use of intelligent techniques to classify protein sequences quickly and efficiently. Using FFT, a fuzzy classifier, a string-weighting algorithm, a gram encoding method, a neural network model, and a rough set classifier together in a single model, each in an appropriate place, can enhance the quality of the classification system. Thus the primary challenge is to identify and classify large protein sequences in a fast, easy, yet intelligent way that decreases both time complexity and space complexity.
Software Bug Detection Algorithm using Data Mining Techniques (AM Publications)
The main aim of software development is to produce high-quality software, and high-quality software is developed using an enormous amount of software engineering data. This data can be used to gain an empirically based understanding of software development, and meaningful information can be extracted from it using various data mining techniques. As data mining for secure software engineering improves software productivity and quality, software engineers increasingly apply data mining algorithms to various software engineering tasks. However, mining software engineering data poses several challenges, requiring algorithms that can effectively mine the sequences, graphs, and text found in such data. Software engineering data includes code bases, execution traces, historical code changes, mailing lists, and bug databases; together they contain a wealth of information about a project's status, progress, and evolution. Using well-established data mining techniques, practitioners and researchers can explore the potential of this valuable data in order to better manage their projects and produce higher-quality software systems that are delivered on time and within budget.
Feature Selection: A Novel Approach for the Prediction of Learning Disabilit... (csandit)
Feature selection is a problem closely related to dimensionality reduction. A commonly used approach is to rank the individual features according to some criterion and then search for an optimal feature subset using an evaluation criterion to test optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed that integrates a popular rough-set-based feature ranking process with a modified backward feature elimination algorithm. The approach first ranks the symptoms of LD according to their importance in the data domain; each symptom's significance or priority value reflects its relative importance for predicting LD across the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage, an optimal feature subset is generated. The experimental results show that the proposed method removes redundant attributes from the LD dataset efficiently without sacrificing classification performance.
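The backward feature elimination loop described above can be sketched generically as follows. The scoring function here is a hypothetical stand-in for the rough-set-based significance values the paper computes; the symptom names are invented.

```python
def backward_eliminate(features, score, min_features=1):
    """Greedy backward elimination: repeatedly drop the feature whose removal
    hurts the subset score the least, stopping when any removal would lower it."""
    current = list(features)
    current_score = score(current)
    while len(current) > min_features:
        candidates = [[f for f in current if f != drop] for drop in current]
        best = max(candidates, key=score)
        if score(best) < current_score:
            break  # every removal degrades the subset; keep what we have
        current, current_score = best, score(best)
    return current

# Hypothetical scoring: symptoms "s1" and "s3" carry all the predictive value.
def score(subset):
    relevance = {"s1": 3.0, "s2": 0.0, "s3": 2.0, "s4": 0.0}
    return sum(relevance[f] for f in subset)

print(backward_eliminate(["s1", "s2", "s3", "s4"], score))  # ['s1', 's3']
```

In the paper's hybrid, the elimination order follows the rough-set ranking rather than trying every drop, but the stopping logic is the same idea.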
Comparative study of various supervised classification methods for analysing def... (eSAT Publishing House)
Unsupervised Feature Selection Based on the Distribution of Features Attribut... (Waqas Tariq)
Since dealing with high-dimensional data is computationally complex and sometimes even intractable, several feature reduction methods have recently been developed to reduce the dimensionality of data and simplify analysis in applications such as text categorization, signal processing, image retrieval, and gene expression analysis. Among feature reduction techniques, feature selection is one of the most popular because it preserves the original features. However, most current feature selection methods do not perform well on imbalanced data sets, which are pervasive in real-world applications. In this paper, we propose a new unsupervised feature selection method suited to imbalanced data sets, which removes redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods were implemented and compared. Experimental results on several imbalanced data sets derived from the UCI repository illustrate the effectiveness of our method in comparison with the others, in terms of both accuracy and the number of selected features.
Lean: Scatter Diagram Tools offers an understanding of scatter diagrams, also known as scatter plots. At the end of this lesson, learners should be able to:
Articulate the usage of a scatter diagram
Explain how to develop a scatter diagram
Demonstrate the development of a scatter diagram
A scatter diagram is a graphical presentation of the relationship between two variables and is one of the seven fundamental tools of quality control. It helps confirm the degree of relationship between cause and effect, where the cause is the independent variable and the effect is the dependent variable. To create a scatter diagram, take the values of the independent variable on the X-axis and of the dependent variable on the Y-axis, plot the (X, Y) points on the graph, draw a best-fit straight line through the points, and analyze the pattern they form. Different degrees of relationship produce different patterns. If Y increases as X increases and the data points lie on a straight line, there is perfect positive correlation; if Y decreases as X increases and the points lie on a straight line, there is perfect negative correlation; but when the data is scattered all over the graph, there is zero correlation.
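The degree of correlation that a scatter diagram shows visually can also be quantified with the Pearson correlation coefficient, which is +1 for perfect positive correlation, -1 for perfect negative correlation, and near 0 when the points are scattered randomly. A small self-contained sketch:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x = [1, 2, 3, 4, 5]
print(round(pearson(x, [2, 4, 6, 8, 10]), 6))   # 1.0: perfect positive correlation
print(round(pearson(x, [10, 8, 6, 4, 2]), 6))   # -1.0: perfect negative correlation
```

The same points plotted as a scatter diagram would fall on a rising line in the first case and a falling line in the second.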
Novel modelling of clustering for enhanced classification performance on gene...IJECEIAES
Gene expression data is popularized for its capability to disclose various disease conditions. However, the conventional procedure to extract gene expression data itself incorporates various artifacts that offer challenges in diagnosis a complex disease indication and classification like cancer. Review of existing research approaches indicates that classification approaches are few to proven to be standard with respect to higher accuracy and applicable to gene expression data apart from unaddresed problems of computational complexity. Therefore, the proposed manuscript introduces a novel and simplified model capable using Graph Fourier Transform, Eigen Value and vector for offering better classification performance considering case study of microarray database, which is one typical example of gene expressiondata. The study outcome shows that proposed system offers comparatively better accuracy and reduced computational complexity with the existing clustering approaches.
Abstract In this paper, the concept of data mining was summarized and its significance towards its methodologies was illustrated. The data mining based on Neural Network and Genetic Algorithm is researched in detail and the key technology and ways to achieve the data mining on Neural Network and Genetic Algorithm are also surveyed. This paper also conducts a formal review of the area of rule extraction from ANN and GA. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
An efficient feature selection algorithm for health care data analysisjournalBEEI
Diabete is a silent killer, which will slowly kill the person if it goes undetected. The existing system which uses F-score method and K-means clustering of checking whether a person has diabetes or not are 100% accurate, and anything which isn't a 100% is not acceptable in the medical field, as it could cost the lives of many people. Our proposed system aims at using some of the best features of the existing algorithms to predict diabetes, and combine these and based on these features; This research work turns them into a novel algorithm, which will be 100% accurate in its prediction. With the surge in technological advancements, we can use data mining to predict when a person would be diagnosed with diabetes. Specifically, we analyze the best features of chi-square algorithm and advanced clustering algorithm (ACA). This research work is done using the Pima Indian Diabetes dataset provided by National Institutes of Diabetes and Digestive and Kidney Diseases. Using classification theorems and methods we can consider different factors like age, BMI, blood pressure and the importance given to these attributes overall, and singles these attributes out, and use them for the prediction of diabetes.
Introduction to feature subset selection method (IJSRD)
Data mining is a computational process to discover patterns in large data sets. It has various important techniques, and one of them is classification, which has been receiving great attention in the database community. Classification techniques can solve several problems in different fields like medicine, industry, business and science. PSO is an optimization technique based on social behaviour. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Rough Set Theory (RST) is a mathematical tool which deals with the uncertainty and vagueness of decision systems.
Delineation of techniques to implement on the enhanced proposed model using d... (ijdms)
In the post-genomic era, with the advent of new technologies, a huge amount of complex molecular data is generated at high throughput. Managing this biological data is a challenging task due to its complexity and heterogeneity when discovering new knowledge, and issues like noisy and incomplete data need to be dealt with. Data mining has had notable success in the biological domain, yet discovering new knowledge from biological data remains a major challenge. The novelty of the proposed model is its combined use of intelligent techniques to classify protein sequences faster and more efficiently. Using FFT, a fuzzy classifier, a string-weighted algorithm, a gram encoding method, a neural network model and a rough set classifier in a single model, each in an appropriate place, can enhance the quality of the classification system. Thus, the primary challenge is to identify and classify large protein sequences in a fast and easy yet intelligent way that decreases the time and space complexity.
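The gram encoding step named in the abstract can be illustrated as a normalized n-gram frequency vector over a sequence; the fragment below is a toy string, not a real protein, and the function is a generic sketch rather than the paper's exact encoder.

```python
from collections import Counter

def gram_encode(seq, n=2):
    """Encode a sequence as a dict of normalized n-gram frequencies."""
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

# "ACACG" yields 2-grams AC, CA, AC, CG -> AC appears in half the positions.
vec = gram_encode("ACACG")
```

The resulting fixed-length frequency vector is what a downstream classifier (neural network, rough set, etc.) would consume.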
Software Bug Detection Algorithm using Data mining Techniques (AM Publications)
The main aim of software development is to develop high quality software, and high quality software is developed using an enormous amount of software engineering data. This data can be used to gain an empirically based understanding of software development, and meaningful information can be extracted from it using various data mining techniques. As data mining for secure software engineering improves software productivity and quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks. However, mining software engineering data poses several challenges, requiring various algorithms to effectively mine sequences, graphs and text from such data. Software engineering data includes code bases, execution traces, historical code changes, mailing lists and bug databases. They contain a wealth of information about a project's status, progress and evolution. Using well established data mining techniques, practitioners and researchers can explore the potential of this valuable data in order to better manage their projects and produce higher-quality software systems that are delivered on time and within budget.
Feature Selection: A Novel Approach for the Prediction of Learning Disabilit... (csandit)
Feature selection is a problem closely related to dimensionality reduction. A commonly used approach in feature selection is to rank the individual features according to some criteria and then search for an optimal feature subset based on an evaluation criterion that tests optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children with a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed by integrating a popular Rough Set based feature ranking process with a modified backward feature elimination algorithm. The approach begins with a ranking of the symptoms of LD according to their importance in the data domain; each symptom's significance or priority value reflects its relative importance in predicting LD among the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. The experimental results show the success of the proposed method in removing redundant attributes efficiently from the LD dataset without sacrificing classification performance.
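The rank-then-backward-eliminate loop described above can be sketched as follows. Here `score` is a placeholder for whatever cross-validated classifier accuracy the evaluation uses, the feature list is assumed pre-ranked with the least significant feature last, and the symptom names are hypothetical.

```python
def backward_eliminate(ranked_features, score, min_size=1):
    """Drop the least significant feature while the subset score does not drop.

    ranked_features: list ordered from most- to least-important.
    score: callable evaluating a feature subset (higher is better).
    """
    subset = list(ranked_features)
    best = score(subset)
    while len(subset) > min_size:
        candidate = subset[:-1]          # remove the current least significant feature
        s = score(candidate)
        if s < best:                     # performance would be sacrificed: stop
            break
        subset, best = candidate, s
    return subset

# Toy evaluator: only the first (hypothetical) symptom actually matters.
toy_score = lambda fs: 1.0 if "attention_deficit" in fs else 0.0
kept = backward_eliminate(["attention_deficit", "handwriting", "spelling"], toy_score)
```

With a real scorer the loop stops as soon as removing a feature would degrade cross-validated accuracy.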
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Comparative study of various supervised classification methods for analysing def... (eSAT Publishing House)
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Unsupervised Feature Selection Based on the Distribution of Features Attribut... (Waqas Tariq)
Since dealing with high dimensional data is computationally complex and sometimes even intractable, several feature reduction methods have recently been developed to reduce the dimensionality of the data and simplify calculation and analysis in applications such as text categorization, signal processing, image retrieval and gene expression analysis. Among feature reduction techniques, feature selection is one of the most popular methods because it preserves the original features. However, most current feature selection methods do not perform well when fed imbalanced data sets, which are pervasive in real world applications. In this paper, we propose a new unsupervised feature selection method tailored to imbalanced data sets, which removes redundant features from the original feature space based on the distribution of features. To show the effectiveness of the proposed method, popular feature selection methods have been implemented and compared. Experimental results on several imbalanced data sets derived from the UCI repository illustrate the effectiveness of our proposed method in comparison with the other compared methods in terms of both accuracy and the number of selected features.
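One simple distribution-based way to remove redundant features, in the spirit described above, is to drop any feature whose absolute Pearson correlation with an already kept feature exceeds a threshold. This is a generic sketch with made-up columns, not the paper's exact criterion.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_redundant(features, threshold=0.9):
    """Keep a feature only if it is not highly correlated with any kept one.

    features: dict mapping name -> list of values (columns of the data set).
    """
    kept = []
    for name, col in features.items():
        if all(abs(pearson(col, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

data = {"f1": [1, 2, 3, 4],
        "f2": [2, 4, 6, 8],      # exact linear copy of f1: redundant
        "f3": [4, 1, 3, 2]}
kept = drop_redundant(data)
```

No class labels are consulted, which is what makes this an unsupervised filter.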
Lean: Scatter Diagram Tools offers an understanding of scatter diagrams, also known as scatter plots. At the end of this lesson, learners should be able to:
Articulate the usage of a scatter diagram
Explain how to develop a scatter diagram
Demonstrate the development of a scatter diagram
A scatter diagram is the graphical presentation of the relationship between two variables and is one of the 7 fundamental tools of quality control. It helps to confirm the degree of relationship between cause and effect, where the cause is the independent variable and the effect is the dependent variable. To create a scatter diagram, take the values of the independent variable on the X-axis and the dependent variable on the Y-axis, plot the (X, Y) points on the graph, draw the line of best fit through the points, and analyze their pattern. Different degrees of relationship produce different patterns. If Y increases as X increases and the data points lie on a straight line, there is perfect positive correlation. If Y decreases as X increases and the data points lie on a straight line, there is perfect negative correlation. But when the data is scattered all over the graph, there is zero correlation.
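The degree-of-relationship reading described above corresponds to the Pearson correlation coefficient; a quick numerical check (pure Python, no plotting) names the pattern the scatter diagram would show. The `pattern` helper and its labels are an illustrative mapping, not part of the lesson.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def pattern(r, tol=1e-9):
    """Name the scatter-diagram pattern implied by correlation r."""
    if abs(r - 1) < tol:
        return "perfect positive correlation"
    if abs(r + 1) < tol:
        return "perfect negative correlation"
    if abs(r) < tol:
        return "zero correlation"
    return "partial correlation"

x = [1, 2, 3, 4, 5]
rising = pattern(pearson(x, [2, 4, 6, 8, 10]))   # Y doubles as X grows
falling = pattern(pearson(x, [10, 8, 6, 4, 2]))  # Y drops as X grows
```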
Go through the seven quality tools training quiz and see how much you have learnt from this online training of 7QC tools. The quiz has 15 multiple choice questions based on the seven quality tools. Choose one answer out of the given choices for every question and write these choices on a paper. After completing the quiz, compare your answers with the answer key at the end and find where you are in learning the 7 QC tools. If you find your performance is not up to the mark, go through the training of the seven QC tools again; you may do it as many times as you want, and improve your performance every time.
This presentation gives an overview of Artificial Intelligence: definition, advantages, disadvantages, benefits and applications.
We hope it is useful.
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY (Editor IJMTER)
Data mining environments produce a large amount of data that needs to be analysed, and patterns have to be extracted from it to gain knowledge. In this new era, with the boom of both structured and unstructured data, it has become difficult to process, manage and analyse patterns using traditional databases and architectures; to gain knowledge about Big Data, a proper architecture should be understood. Classification is an important data mining technique with broad applications, used to classify the various kinds of data found in nearly every field of our life by assigning items to predefined classes according to their features. This paper provides an inclusive survey of different classification algorithms, shedding light on algorithms including J48, C4.5, k-nearest neighbour, Naive Bayes and SVM, using the random concept.
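As a concrete taste of one of the surveyed algorithms, a minimal k-nearest-neighbour classifier fits in a few lines. This is a sketch on a toy two-cluster data set, not tuned for large data.

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to the query
    (squared Euclidean distance)."""
    by_dist = sorted(range(len(train_x)),
                     key=lambda i: sum((a - b) ** 2
                                       for a, b in zip(train_x[i], query)))
    votes = Counter(train_y[i] for i in by_dist[:k])
    return votes.most_common(1)[0][0]

# Two obvious clusters around (0, 0) and (5.5, 5.5).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
near_b = knn_predict(train_x, train_y, (5.5, 5.5))
near_a = knn_predict(train_x, train_y, (0.2, 0.2))
```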
Identification of important features and data mining classification technique... (IJECEIAES)
Employee absenteeism at work costs organizations billions a year. Prediction of employees' absenteeism and the reasons behind their absence help organizations reduce expenses and increase productivity. Data mining turns the vast volume of human resources data into information that can help in decision-making and prediction. Although the selection of features is a critical step in data mining to enhance the efficiency of the final prediction, it is not yet known which method of feature selection is better. Therefore, this paper aims to compare the performance of three well-known feature selection methods in absenteeism prediction: relief-based feature selection, correlation-based feature selection and information-gain feature selection. In addition, this paper aims to find the best combination of feature selection method and data mining technique for enhancing absenteeism prediction accuracy. Seven classification techniques were used as the prediction model, and a cross-validation approach was utilized to assess the applied prediction models for more realistic and reliable results. The dataset used was built at a courier company in Brazil with records of absenteeism at work. Regarding experimental results, correlation-based feature selection surpasses the other methods on the performance measurements. Furthermore, the bagging classifier was the best-performing data mining technique when features were selected using correlation-based feature selection, with an accuracy rate of 92%.
A large number of techniques have been developed so far, reflecting the diversity of machine learning. Machine learning is categorized into supervised, unsupervised and reinforcement learning. Every instance in a given data set used by machine learning algorithms is represented by the same set of features, and instances are divided into categories on the basis of their labels. In this review paper, our main focus is on supervised and unsupervised learning techniques and their performance parameters.
A Survey on Classification of Feature Selection Strategies (ijtsrd)
Feature selection is an important part of machine learning. Feature selection refers to the process of reducing the inputs for processing and analysis, or of finding the most meaningful inputs. A related term, feature engineering (or feature extraction), refers to the process of extracting useful information or features from existing data. Mining of particular information related to a concept is done on the basis of the features of the data, so accessing these features for data retrieval can be termed the feature extraction mechanism. Different types of feature extraction methods are used. In this paper, the different feature selection methodologies are examined in terms of the need for and method adopted for feature selection. Three types of method are mainly available: Shannon's entropy, Bayesian with K2 prior, and Bayesian Dirichlet with uniform prior (default). The objective of this survey paper is to identify the existing contributions made using the above-mentioned algorithms and the results obtained. R. Pradeepa | K. Palanivel, "A Survey on Classification of Feature Selection Strategies", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-2, February 2018. URL: http://www.ijtsrd.com/papers/ijtsrd9632.pdf http://www.ijtsrd.com/computer-science/data-miining/9632/a-survey-on-classification-of-feature-selection-strategies/r-pradeepa
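Of the three ranking criteria listed, Shannon's entropy is the simplest to show concretely: the information gain of a feature is the drop in class entropy after splitting on it. The toy features and labels below are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Entropy reduction from partitioning the labels by feature value."""
    n = len(labels)
    split = 0.0
    for v in set(feature):
        part = [l for f, l in zip(feature, labels) if f == v]
        split += len(part) / n * entropy(part)
    return entropy(labels) - split

labels = [1, 1, 0, 0]
gain_good = info_gain([1, 1, 0, 0], labels)   # feature determines the class
gain_bad = info_gain([0, 1, 0, 1], labels)    # feature independent of class
```

Features would then be ranked by descending gain, so `gain_good`'s feature outranks `gain_bad`'s.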
Abstract Learning Analytics by nature relies on computational information processing activities intended to extract from raw data some interesting aspects that can be used to obtain insights into the behaviors of learners, the design of learning experiences, etc. There is a large variety of computational techniques that can be employed, all with interesting properties, but it is the interpretation of their results that really forms the core of the analytics process. As a rising subject, data mining and business intelligence are playing an increasingly important role in the decision support activity of every walk of life. The Variance Rover System (VRS) mainly focuses on large data sets obtained from online web visits, categorizing them into clusters according to some similarity, and on the process of predicting customer behavior and selecting actions to influence that behavior to benefit the company, so as to take optimized and beneficial decisions on business expansion. Keywords: Analytics, Business intelligence, Clustering, Data Mining, Standard K-means, Optimized K-means
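The standard K-means clustering named in the keywords alternates assignment and centroid-update steps (Lloyd's algorithm). The 1-D sketch below uses fixed initial centroids for reproducibility; the data values are invented, not from the VRS system.

```python
def kmeans_1d(points, centroids, iters=20):
    """Lloyd's algorithm on 1-D data with given initial centroids."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for x in points:
            # Assign each point to its nearest centroid.
            nearest = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Recompute each centroid as the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Toy "visit duration" values forming two obvious groups.
cents, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0.0, 13.0])
```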
Evaluating the efficiency of rule techniques for file classification (eSAT Journals)
Abstract Text mining refers to the process of deriving high quality information from text. Also known as knowledge discovery from text (KDT), it deals with the machine-supported analysis of text and is used in areas such as information retrieval, marketing, information extraction, natural language processing and document similarity. Document similarity is one of the important techniques in text mining, and its first and foremost step is to classify files based on their category. In this research work, various classification rule techniques are used to classify computer files based on their extensions; for example, the extension of a computer file may be pdf, doc, ppt, xls, and so on. There are several rule classifier algorithms, such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. In this research work, three classification algorithms, namely decision table, DTNB and OneR, are used for classifying computer files based on their extension. The results produced by these algorithms are analyzed using the performance factors classification accuracy and error rate. From the experimental results, DTNB proves to be more efficient than the other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
A Threshold fuzzy entropy based feature selection method applied in various b... (IJMER)
Large amounts of data have been stored and manipulated using various database technologies, and processing all the attributes for a particular purpose is a difficult task. To avoid such difficulties, a feature selection process is applied. In this paper, we collect eight benchmark datasets from the UCI repository. Feature selection is carried out using a fuzzy entropy based relevance measure algorithm with three selection strategies: Mean selection, Half selection, and Neural network for threshold selection. After the features are selected, they are evaluated using Radial Basis Function (RBF) network, Stacking, Bagging, AdaBoostM1 and Ant-miner classification methodologies. The test results depict that the Neural network for threshold selection strategy works well in selecting features, and the Ant-miner methodology works best in producing better accuracy with the selected features than processing the original dataset. The experimental results clearly show that Ant-miner is superior to the other classifiers; thus, the proposed Ant-miner algorithm could be a more suitable method for producing good results with fewer features than the original datasets.
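Two of the three selection strategies listed are easy to make concrete: score every feature with some relevance measure and keep either those at or above the mean score (Mean selection) or the top half by score (Half selection). The scores below are made-up placeholders; the fuzzy entropy measure itself is not reproduced here.

```python
def mean_selection(scores):
    """Keep features whose relevance score is >= the mean score."""
    mean = sum(scores.values()) / len(scores)
    return [name for name, s in scores.items() if s >= mean]

def half_selection(scores):
    """Keep the top half of features ranked by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: len(ranked) // 2]

# Hypothetical relevance scores for four attributes.
scores = {"a1": 0.9, "a2": 0.1, "a3": 0.6, "a4": 0.2}
kept_mean = mean_selection(scores)   # mean is 0.45
kept_half = half_selection(scores)
```

The third strategy replaces these fixed thresholds with one learned by a neural network.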
Data Mining Framework for Network Intrusion Detection using Efficient Techniques (IJAEMSJORNAL)
The implementation measures classification accuracy on benchmark datasets after combining SIS and ANNs. In order to quantify the gains made by using SIS as a strategic tool in data mining, extensive experiments and analyses are carried out. The expected results of this investigation have implications for both theoretical and applied settings: predictive models in a wide variety of disciplines may benefit from the enhanced classification accuracy enabled by SIS inside ANNs. An invaluable resource for scholars and practitioners in the fields of AI and data mining, this study adds to the continuing conversation about how to maximize the efficacy of machine learning methods.
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data... (ijsc)
As the size of biomedical databases grows day by day, finding essential features for disease prediction has become more complex due to high dimensionality and sparsity problems. Also, with the availability of a large number of micro-array datasets in biomedical repositories, it is difficult to analyze, predict and interpret feature information using traditional feature selection based classification models, and most such algorithms have computational issues such as dimension reduction, uncertainty and class imbalance on microarray datasets. The ensemble classifier is one of the scalable models for extreme learning machines due to its high efficiency and fast processing speed for real-time applications. The main objective of feature selection based ensemble learning models is to classify high dimensional data with high computational efficiency and a high true positive rate. In this proposed model, an optimized Particle Swarm Optimization (PSO) based ensemble classification model was developed on high dimensional microarray datasets. Experimental results proved that the proposed model has high computational efficiency compared to traditional feature selection based classification models as far as accuracy, true positive rate and error rate are concerned.
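A minimal particle swarm optimizer on a one-dimensional objective shows the mechanics the ensemble model builds on. The real model optimizes feature subsets rather than a scalar, and the inertia/acceleration coefficients below are common textbook defaults, not the paper's settings.

```python
import random

def pso(f, lo, hi, n_particles=10, iters=60, seed=1):
    """Minimize f over [lo, hi] with a basic global-best particle swarm."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                       # each particle's personal best
    gbest = min(pos, key=f)              # swarm-wide best position
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Inertia + cognitive pull (personal best) + social pull (global best).
            vel[i] = (0.7 * vel[i]
                      + 1.5 * r1 * (pbest[i] - pos[i])
                      + 1.5 * r2 * (gbest - pos[i]))
            pos[i] += vel[i]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i]
            if f(pos[i]) < f(gbest):
                gbest = pos[i]
    return gbest

best = pso(lambda x: (x - 3) ** 2, -10, 10)   # minimum is at x = 3
```

For feature selection, positions would encode subset membership and `f` a classifier's validation error.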
A Survey and Comparative Study of Filter and Wrapper Feature Selection Techni... (theijes)
Feature selection is considered a problem of global combinatorial optimization in machine learning, which reduces the number of features and removes irrelevant, noisy and redundant data. However, identification of useful features from hundreds or even thousands of related features is not an easy task. Selecting relevant genes from microarray data becomes even more challenging owing to the high dimensionality of features, the multiclass categories involved and the usually small sample size. In order to improve prediction accuracy and to avoid incomprehensibility due to the number of features, different feature selection techniques can be implemented. This survey classifies and analyzes different approaches, aiming not only to provide a comprehensive presentation but also to discuss challenges and various performance parameters. The techniques are generally classified into three groups: filter, wrapper and hybrid.
Classification problems in high dimensional data with a small number of observations are becoming common, particularly in microarray data. Over the last two decades, many efficient classification models and Feature Selection (FS) algorithms, also referred to as FS techniques, have been proposed for higher prediction accuracy. However, the outcome of an FS algorithm in terms of prediction accuracy can be unstable over variations in the training set, especially in high dimensional data. In this paper, we present a new evaluation measure, the Q-statistic, that incorporates the stability of the selected feature subset in addition to prediction accuracy. We then propose Booster, which boosts the value of the Q-statistic of the FS algorithm applied. Experiments on synthetic data and 14 microarray data sets show that Booster boosts not only the value of the Q-statistic but also the prediction accuracy of the algorithm applied.
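The idea of rewarding selection stability can be illustrated with an average pairwise Jaccard index over the subsets an FS algorithm selects on different training samples. Note this is a simple stand-in for the concept, not the paper's exact Q-statistic.

```python
from itertools import combinations

def selection_stability(subsets):
    """Average pairwise Jaccard similarity of selected feature subsets:
    1.0 when every run selects the same features, 0.0 when runs are disjoint."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0
    total = sum(len(set(a) & set(b)) / len(set(a) | set(b)) for a, b in pairs)
    return total / len(pairs)

# Three runs of a hypothetical FS algorithm on resampled training sets.
stable = selection_stability([{"g1", "g2"}, {"g1", "g2"}, {"g1", "g2"}])
unstable = selection_stability([{"g1", "g2"}, {"g3", "g4"}, {"g5", "g6"}])
```

A combined criterion would trade such a stability score off against cross-validated accuracy.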
Mechanical properties of hybrid fiber reinforced concrete for pavements (eSAT Journals)
Abstract
The effect of the addition of mono fibers and hybrid fibers on the mechanical properties of a concrete mixture is studied in the present investigation. Steel fibers of 1% and polypropylene fibers of 0.036% were added individually to the concrete mixture as mono fibers, and then they were added together to form a hybrid fiber reinforced concrete. Mechanical properties such as compressive, split tensile and flexural strength were determined. The results show that hybrid fibers improve the compressive strength marginally as compared to mono fibers, whereas hybridization improves split tensile strength and flexural strength noticeably.
Keywords: Hybridization, mono fibers, steel fiber, polypropylene fiber, improvement in mechanical properties.
Material management in construction – a case study (eSAT Journals)
Abstract
The objective of the present study is to understand the problems occurring in the company because of improper application of material management. In construction project operation, there is often a project cost variance in terms of material, equipment, manpower, subcontractors, overhead cost and general conditions. Material is the main component in construction projects; therefore, if material management is not properly managed, it will create a project cost variance. Project cost can be controlled by taking corrective actions towards the cost variance. Therefore, a methodology to diagnose and evaluate the procurement process involved in material management and to launch continuous improvement was developed and applied. A thorough study was carried out, along with case studies, surveys and interviews with professionals involved in this area. As a result, a methodology for diagnosis and improvement was proposed and tested in selected projects. The results obtained show that the main problems of procurement are related to schedule delays and lack of specified quality for the project. To prevent this situation it is often necessary to dedicate important resources, such as money, personnel and time, to monitor and control the process. A great potential for improvement was detected if state-of-the-art technologies, such as electronic mail, electronic data interchange (EDI) and analysis, were applied to the procurement process. These helped to eliminate the root causes of many types of problems that were detected.
Managing drought: short term strategies in semi arid regions – a case study (eSAT Journals)
Abstract
Drought management needs multidisciplinary action. Interdisciplinary efforts among experts in various fields in drought-prone areas are helpful to achieve a tangible and permanent solution to this recurring problem. The Gulbarga district has a total area of around 16,240 sq. km and accounts for 8.45 per cent of the area of Karnataka state. The district is situated at latitude 17º 19' 60" North and longitude 76º 49' 60" East, entirely on the Deccan plateau at a height of 300 to 750 m above MSL. With a sub-tropical, semi-arid climate, it is one of the drought-prone districts of Karnataka State, so drought management is very important for a district like Gulbarga. In this paper, various short term strategies are discussed to mitigate the drought condition in the district.
Keywords: Drought, South-West monsoon, Semi-Arid, Rainfall, Strategies etc.
Life cycle cost analysis of overlay for an urban road in Bangalore (eSAT Journals)
Abstract
Pavements are subjected to severe conditions of stress and weathering effects from the day they are constructed and opened to traffic, mainly due to fatigue behavior and environmental effects. Therefore, pavement rehabilitation is one of the most important components of entire road systems. This paper highlights the design of concrete pavement with added mono fibers like polypropylene and steel, and hybrid fibres, for a widened portion of an existing concrete pavement, along with various overlay alternatives for an existing bituminous pavement on an urban road in Bangalore. Life cycle cost analyses of these sections are done by the Net Present Value (NPV) method to identify the most feasible option. The results show that though the initial cost of construction of a concrete overlay is high, over a period of time it proves to be better than the bituminous overlay considering the whole life cycle cost. The economic analysis also indicates that, out of the three fibre options, hybrid reinforced concrete would be economical without compromising the performance of the pavement.
Keywords: Fatigue, Life cycle cost analysis, Net Present Value method, Overlay, Rehabilitation
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials (eSAT Journals)
Abstract
The growing demand on our nation's roadways over the past couple of decades, decreasing budgetary funds, and the need to provide a safe, efficient and cost effective roadway system have led to a dramatic increase in the need to rehabilitate our existing pavements and to build sustainable road infrastructure in India. These needs are today's burning issues and have become the purpose of the study.
In the present study, samples of existing bituminous layer materials were collected from the NH-48 (Devahalli to Hassan) site. The mixtures were designed by the Marshall Method as per the Asphalt Institute (MS-II) at 20% and 30% Reclaimed Asphalt Pavement (RAP). RAP material was blended with virgin aggregate such that all specimens were tested for the Dense Bituminous Macadam-II (DBM-II) gradation as per the Ministry of Roads, Transport, and Highways (MoRT&H), and cost analysis was carried out to determine the economics. Laboratory results and analysis showed that the use of recycled materials produced significant variability in Marshall stability, and the variability increased with the increase in RAP content. Savings can be realized from the utilization of recycled materials as per the methodology: the reduction in total cost is 19% and 30% respectively, compared with the virgin mixes.
Keywords: Reclaimed Asphalt Pavement, Marshall Stability, MS-II, Dense Bituminous Macadam-II
Laboratory investigation of expansive soil stabilized with natural inorganic ... (eSAT Journals)
Abstract
Soil stabilization has proven to be one of the oldest techniques to improve soil properties. The literature review conducted revealed that natural inorganic stabilizers are one of the best options for soil stabilization. In this regard, an attempt has been made to evaluate the influence of the RBI-81 stabilizer on the properties of black cotton soil through laboratory investigations. Black cotton soil with varying percentages of RBI-81, viz. 0, 0.5, 1, 1.5, 2 and 2.5 percent, was studied for moisture-density relationships and soil strength behaviour. The effect of the curing period was also evaluated, as the literature clearly emphasizes the strength gain of soils stabilized with RBI-81 over a period of time. The results obtained show that the unconfined compressive strength of specimens treated with RBI-81 increased by approximately 250% for a curing period of 28 days as compared to virgin soil, while the CBR value improved by approximately 400%. The studies indicated an increasing trend in soil strength behaviour with increasing percentage of RBI-81, suggesting its potential applications in soil stabilization.
Influence of reinforcement on the behavior of hollow concrete block masonry p...eSAT Journals
Abstract
Reinforced masonry was developed to exploit the strength potential of masonry and to solve its lack of tensile strength. Experimental
and analytical studies have been carried out to investigate the effect of reinforcement on the behavior of hollow concrete block
masonry prisms under compression and to predict the ultimate compressive failure strength. In the numerical program, a three-dimensional
non-linear finite element (FE) model based on the micro-modeling approach is developed for both unreinforced and reinforced
masonry prisms using ANSYS (14.5). The proposed FE model uses multi-linear stress-strain relationships to model the non-linear
behavior of hollow concrete block, mortar, and grout. Willam-Warnke’s five parameter failure theory has been adopted to model the
failure of masonry materials. The comparison of the numerical and experimental results indicates that the FE models can successfully
capture the highly nonlinear behavior of the physical specimens and accurately predict their strength and failure mechanisms.
Keywords: Structural masonry, Hollow concrete block prism, grout, Compression failure, Finite element method,
Numerical modeling.
Influence of compaction energy on soil stabilized with chemical stabilizereSAT Journals
Abstract
Increase in traffic along with heavier wheel loads causes rapid deterioration of pavements, so there is a need to improve the density and strength of the soil subgrade and other pavement layers. In this study an attempt is made to improve the properties of locally available loamy soil using a twin approach: i) increasing the compaction of the soil and ii) treating the soil with a chemical stabilizer. Laboratory studies are carried out on both untreated and treated soil samples compacted by different compaction efforts. The studies show that an increase in compaction effort results in an increase in soil density; however, in soil treated with the chemical stabilizer, the rate of increase in density is not significant. The soil treated with the chemical stabilizer exhibits improvement in both strength and performance properties.
Keywords: compaction, density, subgrade stabilization, resilient modulus
Geographical information system (gis) for water resources managementeSAT Journals
Abstract
Water resources projects are inherited with overlapping and at times conflicting objectives. These projects are often of varied sizes
ranging from major projects with command areas of millions of hectares to very small projects implemented at the local level. Thus,
in all these projects there is seldom proper coordination which is essential for ensuring collective sustainability.
Integrated watershed development and management is the accepted answer, but it in turn requires a comprehensive framework that enables a planning process involving all the stakeholders at different levels and scales. Such a unified hydrological framework is essential to evaluate the cause and effect of all the proposed actions within the drainage basins.
The present paper describes a hydrological framework developed in the form of a Hydrologic Information System (HIS) which is
intended to meet the specific information needs of the various line departments of a typical State connected with water related aspects.
The HIS consists of a hydrologic information database coupled with tools for collating primary and secondary data and tools for
analyzing and visualizing the data and information. The HIS also incorporates hydrological model base for indirect assessment of
various entities of water balance in space and time. The framework would be maintained and updated to reflect fully the most
accurate ground truth data and the infrastructure requirements for planning and management.
Keywords: Hydrological Information System (HIS); WebGIS; Data Model; Web Mapping Services
Forest type mapping of bidar forest division, karnataka using geoinformatics ...eSAT Journals
Abstract
The study demonstrates the potential of satellite remote sensing techniques for generating baseline information on forest types, including tree plantation details, in Bidar forest division, Karnataka, covering an area of 5814.60 sq. km. Analysis of the satellite data in the study area reveals that about 84% of the total area is covered by crop land, 1.778% by dry deciduous forest and 1.38% by mixed plantation, which is very threatening to the environmental stability of the forest; future plantation sites have been mapped. With the use of the latest geoinformatics technology, the proper and exact condition of the trees can be observed and necessary precautions taken for future plantation works in an appropriate manner.
Keywords: RS, GIS, GPS, Forest Type, Tree Plantation
Factors influencing compressive strength of geopolymer concreteeSAT Journals
Abstract
This study examines the effects of several factors on the compressive strength of fly ash based geopolymer concrete, together with a cost comparison against normal concrete. The test variables were the molarity of sodium hydroxide (NaOH): 8M, 14M and 16M; the ratio of NaOH to sodium silicate (Na2SiO3): 1, 1.5, 2 and 2.5; the alkaline liquid to fly ash ratio: 0.35 and 0.40; and replacement of the water in the Na2SiO3 solution by 10%, 20% and 30%. The test results indicated that the highest compressive strength, 54 MPa, was observed for 16M NaOH, a NaOH to Na2SiO3 ratio of 2.5 and an alkaline liquid to fly ash ratio of 0.35. The lowest compressive strength, 27 MPa, was observed for 8M NaOH, a NaOH to Na2SiO3 ratio of 1 and an alkaline liquid to fly ash ratio of 0.40. An alkaline liquid to fly ash ratio of 0.35 with water replacement of 10% and 30% for 8 and 16 molarity NaOH resulted in compressive strengths of 36 MPa and 20 MPa respectively. A superplasticiser dosage of 2% by weight of fly ash gave higher strength in all cases.
Keywords: compressive strength, alkaline liquid, fly ash
Experimental investigation on circular hollow steel columns in filled with li...eSAT Journals
Abstract
Composite circular hollow steel tubes with and without GFRP infill, for three different grades of lightweight concrete, were tested for ultimate load capacity and axial shortening under cyclic loading. Steel tubes were compared for different lengths, cross sections and thicknesses. Specimens were tested after adopting Taguchi's L9 (Latin squares) orthogonal array in order to save initial experimental cost on the number of specimens and the experimental duration. Analysis was carried out using the ANN (Artificial Neural Network) technique with the assistance of Minitab, a statistical software tool. Comparisons of predicted, experimental and ANN outputs were obtained from linear regression plots. From this research study it can be concluded that: the cross-sectional area of the steel tube has the most significant effect on ultimate load carrying capacity; as the length of the steel tube increased, the load carrying capacity decreased; and the ANN modeling predicted acceptable results. Thus the ANN tool can be utilized for predicting the ultimate load carrying capacity of composite columns.
Keywords: Light weight concrete, GFRP, Artificial Neural Network, Linear Regression, Back propagation, orthogonal
Array, Latin Squares
Experimental behavior of circular hsscfrc filled steel tubular columns under ...eSAT Journals
Abstract
This paper presents an outlook on experimental behaviour and a comparison with a predictive formula for circular, concentrically loaded self-consolidating fibre reinforced concrete filled steel tube columns (HSSCFRC). Forty-five specimens were tested. The main parameters varied in the tests are: (1) percentage of fibre; (2) tube diameter (or width) to wall thickness ratio, D/t from 15 to 25; (3) L/d ratio from 2.97 to 7.04. The results from these predictions were compared with, and validated against, the experimental data.
Keywords: Self-compacting concrete; Concrete-filled steel tube; axial load behavior; Ultimate capacity.
Evaluation of punching shear in flat slabseSAT Journals
Abstract
Flat-slab construction has been widely used in construction today because of many advantages that it offers. The basic philosophy in
the design of flat slab is to consider only gravity forces; this method ignores the effect of punching shear due to unbalanced moments
at the slab column junction which is critical. An attempt has been made to generate generalized design sheets which accounts both
punching shear due to gravity loads and unbalanced moments for cases (a) interior column; (b) edge column (bending perpendicular
to shorter edge); (c) edge column (bending parallel to shorter edge); (d) corner column. These design sheets are prepared as per
codal provisions of IS 456-2000. These design sheets will be helpful in calculating the shear reinforcement to be provided at the
critical section which is ignored in many design offices. Apart from its usefulness in evaluating punching shear and the necessary
shear reinforcement, the design sheets developed will enable the designer to fix the depth of flat slab during the initial phase of the
design.
Keywords: Flat slabs, punching shear, unbalanced moment.
Evaluation of performance of intake tower dam for recent earthquake in indiaeSAT Journals
Abstract
Intake towers are typically tall, hollow, reinforced concrete structures and form the entrance to reservoir outlet works. A parametric study on the dynamic behaviour of circular cylindrical towers is carried out to examine the effects of depth of submergence, wall thickness and slenderness ratio, as well as the response of the tower under dynamic time-history analysis for different soil conditions, accounting for the added hydrodynamic mass of the surrounding and inside water after Goyal and Chopra.
Key words: Hydrodynamic mass, Depth of submergence, Reservoir, Time history analysis
Evaluation of operational efficiency of urban road network using travel time ...eSAT Journals
Abstract
The efficiency of the road network system is analyzed using travel time reliability measures. The study focuses on important measures of travel time reliability and on prioritizing the Tiruchirappalli road network. Traffic volume and travel time were collected using the license plate matching method. Travel time measures were estimated from the average travel time and the 95th percentile travel time. The effect of non-motorized vehicles on the efficiency of the road system was evaluated, and the relation between buffer time index and traffic volume was established. A travel time model was developed and the travel time measure validated. The service quality of the road sections in the network was then graded based on the travel time reliability measures.
Keywords: Buffer Time Index (BTI); Average Travel Time (ATT); Travel Time Reliability (TTR); Buffer Time (BT).
Estimation of surface runoff in nallur amanikere watershed using scs cn methodeSAT Journals
Abstract
The development of watershed aims at productive utilization of all the available natural resources in the entire area extending from
ridge line to stream outlet. The per capita availability of land for cultivation has been decreasing over the years. Therefore, water and
the related land resources must be developed, utilized and managed in an integrated and comprehensive manner. Remote sensing and
GIS techniques are being increasingly used for planning, management and development of natural resources. The study area, Nallur
Amanikere watershed geographically lies between 11° 38' and 11° 52' N latitude and 76° 30' and 76° 50' E longitude with an area of
415.68 Sq. km. The thematic layers such as land use/land cover and soil maps were derived from remotely sensed data and overlayed
through ArcGIS software to assign the curve number on polygon wise. The daily rainfall data of six rain gauge stations in and around
the watershed (2001-2011) was used to estimate the daily runoff from the watershed using Soil Conservation Service - Curve Number
(SCS-CN) method. The runoff estimated from the SCS-CN model was then used to know the variation of runoff potential with different
land use/land cover and with different soil conditions.
Keywords: Watershed, Nallur watershed, Surface runoff, Rainfall-Runoff, SCS-CN, Remote Sensing, GIS.
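The runoff estimation described above follows the standard closed-form SCS-CN relation with the conventional initial abstraction Ia = 0.2S. A minimal sketch of that relation in metric form (the rainfall and curve number values below are illustrative, not data from the study):

```python
def scs_cn_runoff(p_mm: float, cn: float) -> float:
    """Estimate direct runoff depth (mm) from daily rainfall using the
    SCS Curve Number method, metric form, with Ia = 0.2 * S."""
    if not 0 < cn <= 100:
        raise ValueError("curve number must be in (0, 100]")
    s = 25400.0 / cn - 254.0      # potential maximum retention, mm
    ia = 0.2 * s                  # initial abstraction, mm
    if p_mm <= ia:                # no runoff until Ia is satisfied
        return 0.0
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

# Example: an 80 mm daily rainfall over a catchment with CN = 75
# gives S = 84.67 mm, Ia = 16.93 mm, and roughly 27 mm of runoff.
runoff = scs_cn_runoff(80.0, 75.0)
```

In a study like this one, the curve number would vary polygon by polygon with land use/land cover and soil group, and the function would be applied to each day of the rainfall record.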
Estimation of morphometric parameters and runoff using rs & gis techniqueseSAT Journals
Abstract
Land and water are the two vital natural resources, the optimal management of these resources with minimum adverse environmental
impact are essential not only for sustainable development but also for human survival. Satellite remote sensing with geographic
information system has a pragmatic approach to map and generate spatial input layers of predicting response behavior and yield of
watershed. Hence, in the present study an attempt has been made to understand the hydrological process of the catchment at the
watershed level by drawing the inferences from morphometric analysis and runoff. The study area chosen for the present study is
Yagachi catchment situated in Chickamaglur and Hassan district lies geographically at a longitude 75⁰52’08.77”E and
13⁰10’50.77”N latitude. It covers an area of 559.493 Sq.km. Morphometric analysis is carried out to estimate morphometric
parameters at Micro-watershed to understand the hydrological response of the catchment at the Micro-watershed level. Daily runoff
is estimated using USDA SCS curve number model for a period of 10 years from 2001 to 2010. The rainfall runoff relationship of the
study shows there is a positive correlation.
Keywords: morphometric analysis, runoff, remote sensing and GIS, SCS - method
Effect of variation of plastic hinge length on the results of non linear anal...eSAT Journals
Abstract
The nonlinear static procedure, also well known as pushover analysis, is a method wherein monotonically increasing loads are applied to the structure until it is unable to resist any further load. It is a popular tool for seismic performance evaluation of existing and new structures. In the literature, much research has been carried out on conventional pushover analysis, and after its deficiencies became known, efforts have been made to improve it. However, actual test results with which to verify analytically obtained pushover results are rarely available, and some amount of variation is always expected in the seismic demand prediction of pushover analysis. An initial study is carried out considering user-defined hinge properties and the default hinge length. An attempt is then made to assess the variation of pushover analysis results considering user-defined hinge properties and the various hinge length formulations available in the literature, with results compared against experimental results from a test carried out on a G+2 storied RCC framed structure. For the present study two geometric models, viz. a bare frame and a rigid frame model, are considered, and it is found that the results of pushover analysis are very sensitive to the geometric model and the hinge length adopted.
Keywords: Pushover analysis, Base shear, Displacement, hinge length, moment curvature analysis
Effect of use of recycled materials on indirect tensile strength of asphalt c...eSAT Journals
Abstract
Depletion of natural resources and aggregate quarries makes procuring materials for road construction a serious problem; hence recycling or reuse of material is beneficial. With the emphasis on sustainable construction in the present era, recycling of asphalt pavements is one of the effective and proven rehabilitation processes. For the laboratory investigations, reclaimed asphalt pavement (RAP) from NH-4 and crumb rubber modified binder (CRMB-55) were used, with foundry waste as a replacement for conventional filler. Laboratory tests were conducted on asphalt concrete mixes with 30, 40, 50 and 60 percent replacement with RAP. These test results were compared with conventional mixes and with asphalt concrete mixes using RAP aggregates from which the binder had been completely extracted. Mix design was carried out by the Marshall method. The Marshall tests indicated the highest stability values for asphalt concrete (AC) mixes with 60% RAP. The optimum binder content (OBC) decreased with increase in RAP in the AC mixes. The indirect tensile strength (ITS) for AC mixes with RAP was also found to be higher than that of conventional AC mixes at 30°C.
Keywords: Reclaimed asphalt pavement, Foundry waste, Recycling, Marshall Stability, Indirect tensile strength.
About
Indigenized remote control interface card suitable for MAFI system CCR equipment. Compatible for IDM8000 CCR. Backplane mounted serial and TCP/Ethernet communication module for CCR remote access. IDM 8000 CCR remote control on serial and TCP protocol.
• Remote control: Parallel or serial interface.
• Compatible with MAFI CCR system.
• Compatible with IDM8000 CCR.
• Compatible with Backplane mount serial communication.
• Compatible with commercial and Defence aviation CCR system.
• Remote control system for accessing CCR and allied system over serial or TCP.
• Indigenized local Support/presence in India.
• Easy in configuration using DIP switches.
Hierarchical Digital Twin of a Naval Power SystemKerry Sado
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's tough to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of the general workflow and administration process of the shop. The main processes of the system focus on customer requests, where the system is able to search for the most appropriate products and deliver them to the customers. It should help the employees to quickly identify the list of cosmetic products that have reached the minimum quantity and also keep track of the expiry date of each cosmetic product. It should help the employees to find the rack number in which a product is placed. It is also a faster and more efficient way of working.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Welcome to WIPAC Monthly the magazine brought to you by the LinkedIn Group Water Industry Process Automation & Control.
In this month's edition, along with this month's industry news to celebrate the 13 years since the group was created we have articles including
A case study of the use of Advanced Process Control at the wastewater treatment works at Lleida in Spain
A look back on an article on smart wastewater networks in order to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Correlation of artificial neural network classification and nfrs attribute filtering algorithm for pcos data
1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 03 | Mar-2015, Available @ http://www.ijret.org 519
CORRELATION OF ARTIFICIAL NEURAL NETWORK
CLASSIFICATION AND NFRS ATTRIBUTE FILTERING ALGORITHM
FOR PCOS DATA
K. Meena1, M. Manimekalai2, S. Rethinavalli3
1Former Vice-chancellor, Bharathidasan University, Trichy, Tamilnadu, India
2Director and Head, Department of Computer Applications, Shrimati Indira Gandhi College, Trichy, TN, India
3Assistant Professor, Department of Computer Applications, Shrimati Indira Gandhi College, Trichy, TN, India
Abstract
About 5 to 15% of women of reproductive age are affected by Polycystic Ovarian Syndrome (PCOS), a multifaceted, heterogeneous and complex disorder. It is characterized by polycystic ovaries, chronic anovulation and hyperandrogenism, and its long-term consequences include diseases such as endometrial hyperplasia, type 2 diabetes mellitus and coronary disease. Its frequent metabolic traits, insulin resistance, hypertension, abdominal obesity, dyslipidemia and hyperinsulinemia, together constitute the metabolic syndrome. These factors make anovulatory infertility a common outcome. Computer-based information together with advanced data mining techniques is used to obtain appropriate results. Classification is a classic data mining task, with roots in machine learning; Naïve Bayes, Artificial Neural Network, Decision Tree and Support Vector Machine are classification techniques in data mining. Feature selection methods involve subset generation, evaluation of each subset, a criterion for stopping the search, and validation procedures. The characteristics of the search method used are important for the time efficiency of feature selection. PCA (Principal Component Analysis), information gain subset evaluation, fuzzy rough set evaluation and Correlation-based Feature Selection (CFS) are some feature selection techniques, while greedy forward search, ranker, etc. are search algorithms used in feature selection. In this paper, a new algorithm based on neural fuzzy rough subset evaluation and an artificial neural network is proposed, which avoids performing classification and feature selection as separate tasks. The algorithm combines neural fuzzy rough subset evaluation and the artificial neural network for better performance than carrying out the tasks separately.
Keywords: ANN, SVM, PCA, CFS
--------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
Polycystic ovary syndrome (PCOS) is the most common endocrinological disorder affecting women. Women with PCOS demonstrate changes such as hirsutism, acne, oligo- or amenorrhoea and anovulation, together with increased levels of serum androgens and morphological changes of the ovary evident on ultrasonography. Diagnostically, the above is current practice and follows the criteria agreed at Rotterdam in 2003 [1]. About 50% of patients with PCOS are obese, a higher prevalence than in the general population. The metabolic element of the condition results in long-term morbidity through insulin resistance.
Ovaries are referred to as polycystic when they contain a large number of cysts; the small, pearl-sized clusters of cysts frequently induce PCOS. The cysts are fluid-filled and contain immature eggs. The symptoms of PCOS are largely caused by the overproduction of male hormones (androgens) [2]. In-utero fetal programming may bring about the PCOS phenotype, which plays a major role in a woman's adult life and might be a cause of PCOS. The metabolic characteristics and the resulting menstrual disturbances are the final expressions of the PCOS phenotype, arising from the interaction of genetic factors with obesity and environmental factors [3], which lead genetically predisposed women to develop PCOS.
Nowadays, data mining is the exploration of large datasets to extract hidden and previously unknown patterns, relationships and knowledge that are complicated to detect with conventional statistical methods. In the emerging field of healthcare, data mining plays a major role in extracting the details needed for a deeper understanding of medical data and for providing prognoses [4]. With the development of modern technology, data mining applications in healthcare include the analysis of health care centres for better health policy-making, the prevention of hospital errors, early detection and prevention of diseases and preventable hospital deaths, better value for money and cost savings, and the detection of fraudulent insurance claims.

Feature selection has been an active and productive research area in the pattern recognition, machine learning, statistics and data mining communities. The main intention of attribute selection is to
choose a subset of input variables by eliminating features that are irrelevant or carry no prognostic information. Feature selection [5] has proven, in both theory and practice, to be valuable in enhancing learning efficiency, increasing predictive accuracy and reducing the complexity of learned results. In supervised learning, the chief objective of feature selection is to find a feature subset that produces higher classification accuracy. As the dimensionality of the domain expands, the number of features N increases; finding an optimal feature subset is intractable, and many problems related to feature selection have been shown to be NP-hard. At this point it is crucial to describe the traditional feature selection process, which consists of four basic steps: subset generation, subset evaluation, stopping criterion, and validation of the subset. Subset generation is a search process that generates candidate feature subsets for assessment based on a certain search strategy. Each candidate subset is evaluated according to a certain criterion and compared with the best subset found previously; if the new subset turns out to be better, it replaces the best one. The process is repeated until the stopping condition is fulfilled. Data used in ML are often characterized by a number of features that can exceed the number of data points themselves [6]. This kind of problem, known as "the curse of dimensionality", creates a challenge for a variety of ML applications for decision support. It can amplify the risk of taking correlated or redundant attributes into account, which can lead to lower classification accuracy. As a result, the process of eliminating irrelevant features is a crucial phase in designing decision support systems with high accuracy.
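The four-step search loop described above (subset generation, evaluation, stopping criterion, validation) can be sketched as a greedy forward search. The evaluation function here is a placeholder that the reader would supply, e.g. cross-validated classification accuracy; the toy scoring function is purely illustrative:

```python
def forward_select(features, evaluate, max_features=None):
    """Greedy forward feature selection.

    features : list of candidate feature names
    evaluate : callable(subset) -> score, higher is better
    Stops when no single addition improves the score (stopping criterion).
    """
    selected, best_score = [], float("-inf")
    limit = max_features or len(features)
    while len(selected) < limit:
        # Subset generation: extend the current subset by one feature
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Subset evaluation: score every one-feature extension
        scored = [(evaluate(selected + [f]), f) for f in candidates]
        score, best_f = max(scored)
        if score <= best_score:      # stopping criterion: no improvement
            break
        selected.append(best_f)
        best_score = score
    return selected, best_score

# Toy evaluation: reward subsets containing 'a' and 'b', penalize size
toy = lambda s: ('a' in s) + ('b' in s) - 0.1 * len(s)
subset, score = forward_select(['a', 'b', 'c', 'd'], toy)
```

The validation step would then re-estimate the selected subset's accuracy on held-out data before it is trusted.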
In this technical world, data mining is the only consistent means available to unravel the intricacy of accumulated data. Data mining tasks can be broadly categorized as descriptive and predictive. Descriptive mining tasks illustrate the common properties of the data in the database, while predictive mining tasks perform inference on the present data in order to formulate predictions. Data available for mining is raw data: it is collected from different sources, so its format may differ, and it may contain noisy data, irrelevant attributes, missing values, etc. Discretization: when the data mining algorithm cannot cope with continuous attributes, discretization [7] needs to be employed. This step transforms a continuous attribute into a categorical attribute taking only a small number of discrete values, and discretization often improves the comprehensibility of the discovered knowledge. Attribute selection: not all attributes are relevant, so attribute selection [8], selecting from all original attributes a subset relevant for mining, is mandatory.
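As a concrete illustration of the discretization step, equal-width binning maps a continuous attribute onto a small number of discrete values. This is a generic sketch, not necessarily the scheme used in the paper, and the sample values are invented:

```python
def equal_width_bins(values, n_bins=3):
    """Discretize a continuous attribute into n_bins equal-width intervals,
    returning the bin index (0 .. n_bins-1) for each value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0   # guard against a constant attribute
    # Clamp the maximum value into the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Example: a continuous measurement mapped to low/medium/high (0/1/2)
levels = [1.0, 2.5, 4.0, 7.0, 9.9, 10.0]
bins = equal_width_bins(levels, n_bins=3)
```

Equal-frequency binning or entropy-based methods are common alternatives when the attribute distribution is skewed.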
2. CLASSIFICATION TECHNIQUES
The most commonly used data mining technique is classification, which employs a set of pre-classified patterns to develop a model that can categorize the population of records at large. Data classification involves two processes: learning and classification. In the learning phase, the training data are analyzed by the classification algorithm [8][9]. In the classification phase, test data are used to estimate the accuracy of the classification rules; if the accuracy is acceptable, the rules can be applied to new data tuples. The classifier-training algorithm uses the pre-classified examples to determine the set of parameters needed for proper discrimination, and then encodes these parameters into a model called a classifier. The Artificial Neural Network and Naive Bayes classification techniques are used in this paper.
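The learning/classification split described above can be made concrete with a minimal Gaussian Naive Bayes classifier, one of the two techniques the paper uses. This is a from-scratch sketch on invented toy data, not the authors' implementation:

```python
import math
from collections import defaultdict

def train_gnb(rows, labels):
    """Learning phase: estimate a prior and per-feature mean/variance
    for each class from pre-classified examples."""
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    model = {}
    for y, items in by_class.items():
        n = len(items)
        means = [sum(col) / n for col in zip(*items)]
        var = [sum((x - m) ** 2 for x in col) / n + 1e-9
               for col, m in zip(zip(*items), means)]
        model[y] = (n / len(rows), means, var)   # prior, means, variances
    return model

def predict_gnb(model, row):
    """Classification phase: pick the class with the highest log-posterior."""
    def log_post(prior, means, var):
        ll = math.log(prior)
        for x, m, v in zip(row, means, var):
            ll += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return ll
    return max(model, key=lambda y: log_post(*model[y]))

# Toy training set: one feature, two well-separated classes
X = [[1.0], [1.2], [0.8], [5.0], [5.2], [4.8]]
y = ['neg', 'neg', 'neg', 'pos', 'pos', 'pos']
model = train_gnb(X, y)
```

The "parameters encoded into the classifier" here are simply the per-class priors, means and variances.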
3. FEATURE SELECTION TECHNIQUES
In general, feature subset selection is a pre-processing step used in
machine learning [10]. It is valuable in reducing dimensionality and
eradicating irrelevant data, and therefore increases learning
accuracy. It refers to the problem of identifying those features that
are useful in predicting the class. Features can be discrete,
continuous or nominal, and on the whole they fall into three types:
1) relevant, 2) irrelevant, 3) redundant. Feature selection methods
fall into filter, wrapper and embedded models. The filter model
relies on analyzing the general qualities of the data and evaluating
features without involving any learning algorithm, whereas the
wrapper model uses a predetermined learning algorithm and relies on
its performance on the provided features in the evaluation step to
identify relevant features. Embedded models integrate feature
selection as part of the model training process.
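The filter/wrapper contrast above can be sketched in code. The scorer and learner callbacks below are placeholders introduced for illustration, not anything defined in the paper:

```python
# Illustrative contrast: a filter ranks features using only the data,
# while a wrapper scores each candidate subset by consulting a learner.

def filter_select(X, y, score, k):
    """Keep the k feature indices with the highest data-only score."""
    ranked = sorted(range(len(X[0])), key=lambda j: score(X, y, j), reverse=True)
    return ranked[:k]

def wrapper_select(X, y, train_and_score, k):
    """Greedy forward selection driven by a learner's evaluation."""
    chosen = []
    while len(chosen) < k:
        best = max((j for j in range(len(X[0])) if j not in chosen),
                   key=lambda j: train_and_score(X, y, chosen + [j]))
        chosen.append(best)
    return chosen
```

The filter never trains a model, so it is cheap; the wrapper calls the learner once per candidate subset, which is what makes it expensive but learner-aware.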
The data collected from medical sources is highly voluminous in
nature, and various significant factors affect the success of data
mining on medical data. If the data is irrelevant or redundant, then
knowledge discovery during the training phase is more difficult.
Figure 1 shows the flow of FSS.
Fig -1: Feature Subset Selection
4. ARTIFICIAL NEURAL NETWORK
In practical terms, neural networks are non-linear statistical data
modelling tools. They can be used to discover patterns in data or to
model complex relationships between inputs and outputs. The process
of collecting information from datasets, also known as data mining,
is carried out by data warehousing firms using neural network tools
[11]. Such data warehouses differ from ordinary databases in that
they permit genuine manipulation and cross-fertilization of the data,
helping users make more informed decisions.
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
Volume: 04 Issue: 03 | Mar-2015, Available @ http://www.ijret.org
Fig -2: Example Artificial Neural Network
Among neural network algorithms, the most popular are the Hopfield
network, the multilayer perceptron, counter-propagation networks,
radial basis function networks and self-organizing maps. The
feed-forward neural network was the first and simplest type of
artificial neural network; it consists of three units, an input
layer, a hidden layer and an output layer, and contains no cycles or
loops. A neural network [12] has to be configured to produce the
required set of outputs. Basically there are three learning
conditions for a neural network: 1) supervised learning, 2)
unsupervised learning, 3) reinforcement learning. The perceptron is
the basic unit of an artificial neural network used for
classification where patterns are linearly separable; the basic model
of the neuron used in the perceptron is the McCulloch-Pitts model.
The learning algorithm for artificial neural networks is as follows:
Step 1: Let D = {(Xi, Yi) | i = 1, 2, ..., n} be the set of training
examples.
Step 2: Initialize the weight vector with random values, W(0).
Step 3: Repeat.
Step 4: For each training sample (Xi, Yi) ∈ D.
Step 5: Compute the predicted output Yi^(k).
Step 6: For each weight Wj do.
Step 7: Update the weight: Wj(k+1) = Wj(k) + (Yi − Yi^(k)) Xij.
Step 8: End for.
Step 9: End for.
Step 10: Until the stopping criterion is met.
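The learning steps above can be sketched as a short runnable routine. This is a generic perceptron implementation, not code from the paper; the zero initialization, the bias term and the {0, 1} label convention are assumptions:

```python
# A runnable sketch of the perceptron update rule listed above:
# weights are adjusted by (y - y_hat) * x for each misclassified sample.

def train_perceptron(samples, lr=1.0, epochs=20):
    """samples: list of (x_vector, label) pairs with labels in {0, 1}."""
    n = len(samples[0][0])
    w = [0.0] * (n + 1)          # Step 2: weight vector (last entry: bias)
    for _ in range(epochs):      # Step 3: repeat
        updated = False
        for x, y in samples:     # Step 4: for each training sample
            xb = list(x) + [1.0]  # append constant bias input
            y_hat = 1 if sum(wi * xi for wi, xi in zip(w, xb)) > 0 else 0  # Step 5
            if y_hat != y:
                # Step 7: Wj(k+1) = Wj(k) + lr * (y - y_hat) * xij
                w = [wi + lr * (y - y_hat) * xi for wi, xi in zip(w, xb)]
                updated = True
        if not updated:          # Step 10: stop when an epoch makes no update
            break
    return w

# Logical AND is linearly separable, so the perceptron converges on it.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = train_perceptron(data)
```

With a linearly separable target such as AND, the loop terminates as soon as a full epoch passes without any weight change, which realizes the stopping criterion of Step 10.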
5. NEURAL FUZZY ROUGH SET EVALUATION
This technique was implemented in our previous research paper [13],
and in this paper we extend that work further. RVj,e denotes the
relevance value (RV) that measures the correlation between the
decision feature E and a condition feature Dj. Its value is
normalized to the range [0, 1] by symmetrical uncertainty to ensure
that the values are comparable [14]. A value of 1 indicates that
knowledge of the value of the conditional attribute Dj completely
predicts the value of the decision feature E, while a value of 0
indicates that Dj and E are independent and the attribute Dj is
therefore irrelevant [15]. Accordingly, when the value of RVj,e is
maximal, the feature is assumed to be strongly relevant or essential.
When the RV value with respect to the class is low, e.g.
RVj,e ≤ 0.0001, we consider the feature irrelevant or non-essential;
these cases are examined in this paper.
Input: A training set ϕ(d1, d2, ..., dn, e)
Output: SB, a reduced subset of the conditional features D
Begin
Step 1: While forming the set SN from the features, eliminate the
features whose value falls below the threshold.
Step 2: Arrange the RVj,e values in decreasing order in SN.
Step 3: Initialize SB = max{RVj,e}.
Step 4: Get the first element of SB: Dk = getFirstElement(SB).
Step 5: Go to the begin stage.
Step 6: for each feature Dk in SN
Step 7: if (σDk(SB) < σ(SB))
Step 8: SN → Dk; SBnew = SBold ∪ {Dk}
Step 9: SB = max{I(SBnew), I(SBold)}
Step 10: Dk = getNextElement(SB);
Step 11: End until (Dk == null)
Step 12: Return SB;
End;
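The relevance measure the algorithm relies on, symmetrical uncertainty between a discrete feature and the class, can be sketched as follows. This is the standard entropy-based formulation, written here as an assumption about what the SU measure computes, not code from the paper:

```python
# Symmetrical uncertainty SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)),
# normalized to [0, 1]. Assumes discrete feature and class values.
from collections import Counter
from math import log2

def entropy(xs):
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(feature, labels):
    hx, hy = entropy(feature), entropy(labels)
    hxy = entropy(list(zip(feature, labels)))
    gain = hx + hy - hxy  # information gain I(X; Y)
    return 2.0 * gain / (hx + hy) if (hx + hy) else 0.0

# A feature identical to the class is fully relevant (SU = 1);
# a constant feature is irrelevant (SU = 0), matching the RV = 1
# and RV = 0 cases described above.
labels = [0, 0, 1, 1]
print(symmetrical_uncertainty([0, 0, 1, 1], labels))  # 1.0
print(symmetrical_uncertainty([5, 5, 5, 5], labels))  # 0.0
```

Features whose SU against the class falls below the threshold (e.g. 0.0001) would be dropped in Step 1 of the algorithm above.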
6. PROPOSED ALGORITHM
We propose a new hybrid feature selection method by combining neural
fuzzy rough set evaluation with an artificial neural network. The
neural fuzzy rough set algorithm reduces the number of attributes
based on the SU measure: the attributes are compared pairwise to find
their similarity, and each attribute is compared with the class
attribute to find how much it contributes to the class value;
attributes are removed on this basis. The attributes selected by the
NFRS algorithm are then fed into the artificial neural network for
further reduction. The artificial neural network calculates the
conditional probability for each attribute, and the attribute with
the highest conditional probability is selected. Both algorithms,
NFRS and ANN, work on the conditional probability measure.
Input: T(K1, K2, ..., KM, L) // a training data set
δ // a predefined threshold
Output: Tbest = {Abest (highest IG)} // an optimal subset
Step 1: begin
Step 2: for j = 1 to M do begin
Step 3: calculate TUj,l for Kj;
Step 4: if (TUj,l ≥ δ)
Step 5: append Kj to T'list;
Step 6: end;
Step 7: order T'list in descending TUj,l value;
Step 8: Kp = getFirstElement(T'list);
Step 9: do begin
Step 10: Kq = getNextElement(T'list, Kp);
Step 11: if (Kq <> NULL)
Step 12: do begin
Step 13: K'q = Kq;
Step 14: if (TUp,q ≥ TUq,l)
Step 15: remove Kq from T'list;
Step 16: Kq = getNextElement(T'list, K'q);
Step 17: else Kq = getNextElement(T'list, Kq);
Step 18: end until (Kq == NULL);
Step 19: Kp = getNextElement(T'list, Kp);
Step 20: end until (Kp == NULL);
Step 21: Tbest = T'list;
Step 22: Tbest = {X1, X2, ..., XN}
Step 23: for j = 1 to N begin
Step 24: for k = j+1 to N begin
Step 25: P[Lm/(Xj,Xk)] = P[(Xj,Xk)/Lm] * P(Lm)
Step 26: P[L/(Xj,Xk)] = P[L1/(Xj,Xk)] + P[L2/(Xj,Xk)] + ... + P[Ln/(Xj,Xk)]
Step 27: if (P[L/(Xj,Xk)] > Ω)
Step 28: {
Step 29: if (P[L/Xj] > P[L/Xk])
Step 30: remove Xk from Tbest
Step 31: Tbest = IG(X)
Step 32: else
Step 33: remove Xj from Tbest
Step 34: Tbest = IG(X)
Step 35: }
Step 36: end;
Step 37: end;
7. DATASET
The following dataset attributes are used for predicting PCOS
(Polycystic Ovarian Syndrome) disease in women at an early stage. The
dataset is taken from [16] and contains 26 attributes and 303
instances. The following table lists the attributes of the PCOS
dataset.
Table 1: Attributes of PCOS dataset
S.No Attributes
1 ID_REF
2 IDENTIFIER
3 eENPCOS103.PCO1
4 eENPCOS107.PCO7
5 eENPCOS140.UC271
6 eEPPCOS105.PCO7_EpCAM
7 eEPPCOS109.PCO8_EpCAM
8 eEPPCOS119.PC11
9 eEPPCOS138.UC271_EpCAM
10 eMCPCOS102.PCO7
11 eMCPCOS106.PCO7
12 eMCPCOS120.PC11
13 eSCPCOS101.PCO1
14 eSCPCOS104.PCO7
15 eSCPCOS118.PC11
16 eSCPCOS134.UC271
17 eENCtrl.ETB65
18 eENCtrl016.UC182
19 eENCtrl032.UC208
20 eENCtrl036.UC209
21 eEPCtrl014.UC182_EpCAM
22 eEPCtrl030.UC208_EpCAM
23 eEPCtrl034.UC209_EpCAM
24 eMCCtrl.ETB65
25 eMCCtrl015.UC182
26 eMSCtrl031.UC208
27 eMCCtrl035.UC209
28 eSCCtrl.ETB65
29 eSCCtrl013.UC182
30 eSCCtrl029.UC208
31 eSCCtrl033.UC209
8. EXPERIMENTAL RESULT
The experiment is performed using the Orange data mining tool. The
proposed algorithm is fed into the user-defined coding block for
execution.
Table 2: Number of Original and Reduced Attributes
Attribute Filtering Method | Total Number of Attributes | Number of Reduced Attributes
Neural Fuzzy Rough Set     | 26                         | 5
ANN+NFRS                   | 26                         | 2
Table 3: Reduced Attributes by Neural Fuzzy Rough Set and Neural Network
Attribute Filtering Method | Number of Attributes | Number of Filtered Attributes
ANN+NFRS                   | 26                   | 2
Table 4: Classifier Accuracy with Full Dataset
S.No | Classifier          | Correctly Classified Instances | Incorrectly Classified Instances | Classification Accuracy
1    | SVM                 | 232                            | 71                               | 76.45%
2    | ANN                 | 254                            | 49                               | 83.70%
3    | Classification Tree | 228                            | 75                               | 75.25%
4    | Naïve Bayes         | 251                            | 52                               | 82.75%
Table 5: Number of Filtered Attributes by Each Attribute Filtering Method
Attribute Filtering Method                | Number of Attributes Filtered
Correlation based Feature Selection (CFS) | 3 (3,5,4)
Information Gain                          | 9 (3,5,2,21,4,15,19,11,6)
Gain Ratio                                | 10 (3,5,21,4,15,19,11,30,29,26)
Neural Fuzzy Rough Set (NFRS)             | 5 (2,3,11,30,4)
Principle Component Analysis (PCA)        | 11 (3,5,21,4,15,19,2,11,30,23,29)
Table 6: Accuracy of Classifiers with Filtered Attributes
Attribute Filtering Method | SVM (%) | ANN (%) | Classification Tree (%) | Naïve Bayes (%) | Average (%)
CFS                        | 80.25   | 83.30   | 76.75                   | 82.10           | 80.60
Information Gain           | 72.65   | 75.50   | 73.30                   | 74.25           | 73.92
Gain Ratio                 | 71.45   | 74.85   | 72.60                   | 73.35           | 73.10
NFRS                       | 80.85   | 85.25   | 79.50                   | 84.45           | 82.52
PCA                        | 69.65   | 72.35   | 70.75                   | 71.55           | 71.07
Table 7: Hybrid Feature Selection
Attribute Filtering Method | Filtered Attributes
NFRS+ANN                   | 2 (2,3)
Table 8: Accuracy of Classification after NFRS+ANN
Classification Algorithm  | Accuracy (%)
Support Vector Machine    | 76.58
Artificial Neural Network | 83.83
Classification Tree       | 75.23
Naïve Bayes               | 82.85
Table 9: Comparison of Accuracy of Classifiers with NFRS+ANN Algorithm as Feature Selector
Classification Method     | Correctly Classified Instances | Accuracy (%)
Support Vector Machine    | 232                            | 76.45
Artificial Neural Network | 254                            | 83.70
Classification Tree       | 228                            | 75.25
Naïve Bayes               | 251                            | 82.75
9. RESULT AND ANALYSIS
The above experiment uses the PCOS disease dataset, which contains 26
attributes and 303 instances. From Table 2, the NFRS (Neural Fuzzy
Rough Set) evaluation technique yields 5 filtered attributes out of
26, whereas NFRS merged with the artificial neural network (ANN)
reduces the number of attributes further, to 2; Table 3 confirms the
same result. Table 4 gives the classification accuracy on the full
dataset: the Artificial Neural Network (ANN) is more accurate than
the SVM (Support Vector Machine), the classification tree and the
Naïve Bayes algorithm. Table 5 gives the results of the various
attribute filtering methods; from these we can conclude that Neural
Fuzzy Rough Set (NFRS) generates the fewest filtered attributes
compared with the other techniques such as CFS, PCA, information gain
and gain ratio. Table 6 gives the accuracy of the classifiers with
filtered attributes; from it we can conclude that the NFRS feature
selection technique combined with the ANN classification technique
gives more accurate classification results than the others. Table 7
gives the result of the hybrid feature selection method. As Table 8
shows, after merging NFRS+ANN the classification accuracy of ANN is
higher than that of the others, and Table 9 likewise shows that,
after NFRS+ANN, the classification accuracy of ANN is higher than
that of the other techniques.
10. CONCLUSION
Among the classification techniques considered in this paper,
Artificial Neural Network (ANN), Support Vector Machine (SVM), the
classification tree and the Naïve Bayes algorithm, the ANN gives
better classification results than the others. The attribute
filtering technique Neural Fuzzy Rough Set (NFRS) generates fewer
attributes than Correlation based Feature Selection (CFS), the
information gain method, the gain ratio method and Principle
Component Analysis (PCA). In this paper we proposed a new method
combining NFRS and ANN; this method gives higher accuracy both in the
classification results and in attribute filtering. We can therefore
conclude that, instead of performing classification and feature
selection separately, they can be performed together to obtain the
best results for predicting PCOS disease among women using the PCOS
dataset.
REFERENCES
[1]. T. M. Barber, M. I. McCarthy, J. A. H. Wass and S.
Franks, “Obesity and polycystic ovary syndrome”, Clinical
Endocrinology (2006) 65, 137–145.
[2]. “Polycystic Ovary Syndrome”, The American College
of Obstetricians and Gynecologists.
[3]. Frederick R. Jelovsek, “Which Oral Contraceptive Pill
is Best for Me?”, pp.no:1-4.
[4]. Hian Chye Koh and Gerald Tan, “Data Mining
Applications in Healthcare”, Journal of Healthcare
Information Management — Vol. 19, No. 2, pp.no: 64-72.
[5]. S.Saravanakumar, S.Rinesh, “Effective Heart Disease
Prediction using Frequent Feature Selection Method”,
International Journal of Innovative Research in Computer
and Communication Engineering, Vol.2, Special Issue 1,
March 2014, pp.no: 2767-2774.
[6]. Jyoti Soni, Uzma Ansari, Dipesh Sharma and Sunita
Soni, “Intelligent and Effective Heart Disease Prediction
System using Weighted Associative Classifiers”,
International Journal on Computer Science and Engineering
(IJCSE), Vol. 3 No. 6 June 2011, pp.no:2385-2392.
[7]. Aieman Quadir Siddique, Md. Saddam Hossain,
“Predicting Heart-disease from Medical Data by Applying
Naïve Bayes and Apriori Algorithm”, International Journal
of Scientific & Engineering Research, Volume 4, Issue 10,
October-2013, pp.no: 224-231.
[8]. Vikas Chaurasia, Saurabh Pal, “Early Prediction of
Heart Diseases Using Data Mining Techniques”,
Carib.j.SciTech,2013,Vol.1, pp.no:208-217.
[9]. Shamsher Bahadur Patel, Pramod Kumar Yadav, Dr. D.
P.Shukla, “Predict the Diagnosis of Heart Disease Patients
Using Classification Mining Techniques”, IOSR Journal of
Agriculture and Veterinary Science (IOSR-JAVS), Volume
4, Issue 2 (Jul. - Aug. 2013), PP 61-64.
[10]. M. Anbarasi, E. Anupriya, N.CH.S.N.Iyengar,
“Enhanced Prediction of Heart Disease with Feature Subset
Selection using Genetic Algorithm”, International Journal of
Engineering Science and Technology, Vol. 2(10), 2010,
5370-5376.
[11]. D Ratnam, P HimaBindu, V.Mallik Sai, S.P.Rama
Devi, P.Raghavendra Rao, “Computer-Based Clinical
Decision Support System for Prediction of Heart Diseases
Using Naïve Bayes Algorithm”, International Journal of
Computer Science and Information Technologies, Vol. 5 (2)
, 2014, pp.no: 2384-2388.
[12]. Selvakumar.P, DR.Rajagopalan.S.P, “A Survey On
Neural Network Models For Heart Disease Prediction”,
Journal of Theoretical and Applied Information Technology,
20th September 2014. Vol. 67 No.2, pp.no:485-497.
[13]. Dr. K. Meena, Dr. M. Manimekalai, S. Rethinavalli,
“A Novel Framework for Filtering the PCOS Attributes
using Data Mining Techniques”, International Journal of
Engineering Research & Technology (IJERT), Vol. 4 Issue
01,January-2015, pp.no: 702-706.
[14]. Pradipta Maji and Sankar K. Pal, “Feature Selection
Using f-Information Measures in Fuzzy Approximation
Spaces” IEEE Transactions On Knowledge And Data
Engineering, Vol. 22, No. 6, June 2010.
[15]. Lei Yu, Huan Liu, “Efficient Feature Selection via
Analysis of Relevance and Redundancy” Journal of
Machine Learning Research 5 (2004) 1205–1224.
[16]. PCOS Dataset Source -
ftp://ftp.ncbi.nlm.nih.gov/geo/datasets/GDS4nnn/GDS4987/