Data mining environment produces a large amount of data, that need to be
analyses, pattern have to be extracted from that to gain knowledge. In this new period with
rumble of data both ordered and unordered, by using traditional databases and architectures, it
has become difficult to process, manage and analyses patterns. To gain knowledge about the
Big Data a proper architecture should be understood. Classification is an important data mining
technique with broad applications to classify the various kinds of data used in nearly every
field of our life. Classification is used to classify the item according to the features of the item
with respect to the predefined set of classes. This paper provides an inclusive survey of
different classification algorithms and put a light on various classification algorithms including
j48, C4.5, k-nearest neighbor classifier, Naive Bayes, SVM etc., using random concept.
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
Data mining environment produces a large amount of data that need to be analyzed.
Using traditional databases and architectures, it has become difficult to process, manage and analyze
patterns. To gain knowledge about the Big Data a proper architecture should be understood.
Classification is an important data mining technique with broad applications to classify the various
kinds of data used in nearly every field of our life. Classification is used to classify the item
according to the features of the item with respect to the predefined set of classes. This paper put a
light on various classification algorithms including j48, C4.5, Naive Bayes using large dataset.
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
In this paper, different classification algorithms for data mining are discussed. Data Mining is about
explaining the past & predicting the future by means of data analysis. Classification is a task of data mining,
which categories data based on numerical or categorical variables. To classify the data many algorithms are
proposed, out of them five algorithms are comparatively studied for data mining through classification. There are
four different classification approaches namely Frequency Table, Covariance Matrix, Similarity Functions &
Others. As work for research on classification methods, algorithms like Naive Bayesian, K Nearest Neighbors,
Decision Tree, Artificial Neural Network & Support Vector Machine are studied & examined using benchmark
datasets like Iris & Lung Cancer.
Abstract In this paper, the concept of data mining was summarized and its significance towards its methodologies was illustrated. The data mining based on Neural Network and Genetic Algorithm is researched in detail and the key technology and ways to achieve the data mining on Neural Network and Genetic Algorithm are also surveyed. This paper also conducts a formal review of the area of rule extraction from ANN and GA. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
Introduction to feature subset selection methodIJSRD
Data Mining is a computational progression to ascertain patterns in hefty data sets. It has various important techniques and one of them is Classification which is receiving great attention recently in the database community. Classification technique can solve several problems in different fields like medicine, industry, business, science. PSO is based on social behaviour for optimization problem. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove the redundant features. Rough Set Theory (RST) is a mathematical tool which deals with the uncertainty and vagueness of the decision systems.
GI-ANFIS APPROACH FOR ENVISAGE HEART ATTACK DISEASE USING DATA MINING TECHNIQUESAM Publications
The process of selecting a subset of relevant features from the feature space for use in model construction and used to carry out the feature selection process is called as pre processing step. The filter approach computationally fast and given accuracy results. The Professional Medical Conduct Board Actions data consist of all public actions taken against physicians, physician assistants, specialist assistants, and medical professional. The Classification and Regression Trees (CART), which described the generation of binary decision trees CART were invented independently of one another at around the same time, yet follow a similar approach for learning decision trees from training tuples. The research used GI-ANFIS is used to data mining technique on heart data sets to provide the diagnosis results.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASETEditor IJMTER
Data mining environment produces a large amount of data that need to be analyzed.
Using traditional databases and architectures, it has become difficult to process, manage and analyze
patterns. To gain knowledge about the Big Data a proper architecture should be understood.
Classification is an important data mining technique with broad applications to classify the various
kinds of data used in nearly every field of our life. Classification is used to classify the item
according to the features of the item with respect to the predefined set of classes. This paper put a
light on various classification algorithms including j48, C4.5, Naive Bayes using large dataset.
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
In this paper, different classification algorithms for data mining are discussed. Data Mining is about
explaining the past & predicting the future by means of data analysis. Classification is a task of data mining,
which categories data based on numerical or categorical variables. To classify the data many algorithms are
proposed, out of them five algorithms are comparatively studied for data mining through classification. There are
four different classification approaches namely Frequency Table, Covariance Matrix, Similarity Functions &
Others. As work for research on classification methods, algorithms like Naive Bayesian, K Nearest Neighbors,
Decision Tree, Artificial Neural Network & Support Vector Machine are studied & examined using benchmark
datasets like Iris & Lung Cancer.
Abstract In this paper, the concept of data mining was summarized and its significance towards its methodologies was illustrated. The data mining based on Neural Network and Genetic Algorithm is researched in detail and the key technology and ways to achieve the data mining on Neural Network and Genetic Algorithm are also surveyed. This paper also conducts a formal review of the area of rule extraction from ANN and GA. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
Introduction to feature subset selection methodIJSRD
Data Mining is a computational progression to ascertain patterns in hefty data sets. It has various important techniques and one of them is Classification which is receiving great attention recently in the database community. Classification technique can solve several problems in different fields like medicine, industry, business, science. PSO is based on social behaviour for optimization problem. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove the redundant features. Rough Set Theory (RST) is a mathematical tool which deals with the uncertainty and vagueness of the decision systems.
GI-ANFIS APPROACH FOR ENVISAGE HEART ATTACK DISEASE USING DATA MINING TECHNIQUESAM Publications
The process of selecting a subset of relevant features from the feature space for use in model construction and used to carry out the feature selection process is called as pre processing step. The filter approach computationally fast and given accuracy results. The Professional Medical Conduct Board Actions data consist of all public actions taken against physicians, physician assistants, specialist assistants, and medical professional. The Classification and Regression Trees (CART), which described the generation of binary decision trees CART were invented independently of one another at around the same time, yet follow a similar approach for learning decision trees from training tuples. The research used GI-ANFIS is used to data mining technique on heart data sets to provide the diagnosis results.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
Distributed digital artifacts incorporate cryptographic hash values to URI called trusty URIs in a distributed environment
building good in quality, verifiable and unchangeable web resources to prevent the rising man in the middle attack. The greatest
challenge of a centralized system is that it gives users no possibility to check whether data have been modified and the communication
is limited to a single server. As a solution for this, is the distributed digital artifact system, where resources are distributed among
different domains to enable inter-domain communication. Due to the emerging developments in web, attacks have increased rapidly,
among which man in the middle attack (MIMA) is a serious issue, where user security is at its threat. This work tries to prevent MIMA
to an extent, by providing self reference and trusty URIs even when presented in a distributed environment. Any manipulation to the
data is efficiently identified and any further access to that data is blocked by informing user that the uniform location has been
changed. System uses self-reference to contain trusty URI for each resource, lineage algorithm for generating seed and SHA-512 hash
generation algorithm to ensure security. It is implemented on the semantic web, which is an extension to the world wide web, using
RDF (Resource Description Framework) to identify the resource. Hence the framework was developed to overcome existing
challenges by making the digital artifacts on the semantic web distributed to enable communication between different domains across
the network securely and thereby preventing MIMA.
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
Theimmense volumes of data are populated into repositories from various applications. In order to find out desired information and knowledge from large datasets, the data mining techniques are very much helpful. Classification is one of the knowledge discovery techniques. In Classification, Decision trees are very popular in research community due to simplicity and easy comprehensibility. This paper presentsan updated review of recent developments in the field of decision trees.
With the development of database, the data volume stored in database increases rapidly and in the large
amounts of data much important information is hidden. If the information can be extracted from the
database they will create a lot of profit for the organization. The question they are asking is how to extract
this value. The answer is data mining. There are many technologies available to data mining practitioners,
including Artificial Neural Networks, Genetics, Fuzzy logic and Decision Trees. Many practitioners are
wary of Neural Networks due to their black box nature, even though they have proven themselves in many
situations. This paper is an overview of artificial neural networks and questions their position as a
preferred tool by data mining practitioners.
Efficient classification of big data using vfdt (very fast decision tree)eSAT Journals
Abstract
Decision Tree learning algorithms have been able to capture knowledge successfully. Decision Trees are best considered when
instances are described by attribute-value pairs and when the target function has a discrete value. The main task of these
decision trees is to use inductive methods to the given values of attributes of an unknown object and determine an
appropriate classification by applying decision tree rules. Decision Trees are very effective forms to evaluate the performance
and represent the algorithms because of their robustness, simplicity, capability of handling numerical and categorical data,
ability to work with large datasets and comprehensibility to a name a few. There are various decision tree algorithms available
like ID3, CART, C4.5, VFDT, QUEST, CTREE, GUIDE, CHAID, CRUISE, etc. In this paper a comparative study on three of
these popular decision tree algorithms - (Iterative Dichotomizer 3), C4.5 which is an evolution of ID3 and VFDT (Very
Fast Decision Tree has been made. An empirical study has been conducted to compare C4.5 and VFDT in terms of accuracy
and execution time and various conclusions have been drawn.
Key Words: Decision tree, ID3, C4.5, VFDT, Information Gain, Gain Ratio, Gini Index, Over−fitting.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Abstract Learning Analytics by nature relies on computational information processing activities intended to extract from raw data some interesting aspects that can be used to obtain insights into the behaviors of learners, the design of learning experiences, etc. There is a large variety of computational techniques that can be employed, all with interesting properties, but it is the interpretation of their results that really forms the core of the analytics process. As a rising subject, data mining and business intelligence are playing an increasingly important role in the decision support activity of every walk of life. The Variance Rover System (VRS) mainly focused on the large data sets obtained from online web visiting and categorizing this into clusters according some similarity and the process of predicting customer behavior and selecting actions to influence that behavior to benefit the company, so as to take optimized and beneficial decisions of business expansion. Keywords: Analytics, Business intelligence, Clustering, Data Mining, Standard K-means, Optimized K-means
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINEaciijournal
Today, enormous amount of data is collected in medical databases. These databases may contain valuable
information encapsulated in nontrivial relationships among symptoms and diagnoses. Extracting such
dependencies from historical data is much easier to done by using medical systems. Such knowledge can be
used in future medical decision making. In this paper, a new algorithm based on C4.5 to mind data for
medince applications proposed and then it is evaluated against two datasets and C4.5 algorithm in terms of
accuracy.
Survey on Various Classification Techniques in Data Miningijsrd.com
Dynamic Classification is an information mining (machine learning) strategy used to anticipate bunch participation for information cases. In this paper, we show the essential arrangement systems. A few significant sorts of arrangement technique including induction, Bayesian networks, k-nearest neighbor classifier, case-based reasoning, genetic algorithm and fuzzy logic techniques. The objective of this review is to give a complete audit of distinctive characterization procedures in information mining.
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION cscpconf
While designing a new type of engineering material one has to search for some existing
materials which suits design requirement and then he can try to produce new kind of
engineering material. This selection process itself is tedious as he has to select few numbers of
materials out of a set of lakhs of materials. Therefore in this paper a model is proposed to select
a particular material which suits the user requirement, by using some similarity/distance
measuring functionalities. Here thirteen different types of similarity/distance measuring
functionalities are examined. Performance Index Measure(PIM) is calculated to verify the
relative performance of the selected material with the target material. Then all the results are
normalised for the purpose of analysing the results. Hence the proposed model reduces the
wastage of time in selection and also avoids the haphazardly selection of the materials in materials design and manufacturing industries.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy’s knowledge and HeightBalancePriority algorithm for construction of decision tree along with Multi level mining. The assignment of priorities to attributes is done by evaluating information entropy, at different levels of abstraction for building decision tree using HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by C4.5 classifier for education dataset and the results are compared with the proposed approach.
Comparative study of various supervisedclassification methodsforanalysing def...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Data Science - Part V - Decision Trees & Random Forests Derek Kane
This lecture provides an overview of decision tree machine learning algorithms and random forest ensemble techniques. The practical example includes diagnosing Type II diabetes and evaluating customer churn in the telecommunication industry.
One of the most important problems in modern finance is finding efficient ways to summarize and visualize
the stock market data to give individuals or institutions useful information about the market behavior for
investment decisions Therefore, Investment can be considered as one of the fundamental pillars of national
economy. So, at the present time many investors look to find criterion to compare stocks together and
selecting the best and also investors choose strategies that maximize the earning value of the investment
process. Therefore the enormous amount of valuable data generated by the stock market has attracted
researchers to explore this problem domain using different methodologies. Therefore research in data
mining has gained a high attraction due to the importance of its applications and the increasing generation
information. So, Data mining tools such as association rule, rule induction method and Apriori algorithm
techniques are used to find association between different scripts of stock market, and also much of the
research and development has taken place regarding the reasons for fluctuating Indian stock exchange.
But, now days there are two important factors such as gold prices and US Dollar Prices are more
dominating on Indian Stock Market and to find out the correlation between gold prices, dollar prices and
BSE index statistical correlation is used and this helps the activities of stock operators, brokers, investors
and jobbers. They are based on the forecasting the fluctuation of index share prices, gold prices, dollar
prices and transactions of customers. Hence researcher has considered these problems as a topic for
research.
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...theijes
Data mining works to extract information known in advance from the enormous quantities of data which can lead to knowledge. It provides information that helps to make good decisions. The effectiveness of data mining in access to knowledge to achieve the goal of which is the discovery of the hidden facts contained in databases and through the use of multiple technologies. Clustering is organizing data into clusters or groups such that they have high intra-cluster similarity and low inter cluster similarity. This paper deals with K-means clustering algorithm which collect a number of data based on the characteristics and attributes of this data, and process the Clustering by reducing the distances between the data center. This algorithm is applied using open source tool called WEKA, with the Insurance dataset as its input
Advanced Computational Intelligence: An International Journal (ACII)aciijournal
Today, enormous amount of data is collected in medical databases. These databases may contain valuable
information encapsulated in nontrivial relationships among symptoms and diagnoses. Extracting such
dependencies from historical data is much easier to done by using medical systems. Such knowledge can be
used in future medical decision making. In this paper, a new algorithm based on C4.5 to mind data for
medince applications proposed and then it is evaluated against two datasets and C4.5 algorithm in terms of
accuracy.
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
Distributed digital artifacts incorporate cryptographic hash values to URI called trusty URIs in a distributed environment
building good in quality, verifiable and unchangeable web resources to prevent the rising man in the middle attack. The greatest
challenge of a centralized system is that it gives users no possibility to check whether data have been modified and the communication
is limited to a single server. As a solution for this, is the distributed digital artifact system, where resources are distributed among
different domains to enable inter-domain communication. Due to the emerging developments in web, attacks have increased rapidly,
among which man in the middle attack (MIMA) is a serious issue, where user security is at its threat. This work tries to prevent MIMA
to an extent, by providing self reference and trusty URIs even when presented in a distributed environment. Any manipulation to the
data is efficiently identified and any further access to that data is blocked by informing user that the uniform location has been
changed. System uses self-reference to contain trusty URI for each resource, lineage algorithm for generating seed and SHA-512 hash
generation algorithm to ensure security. It is implemented on the semantic web, which is an extension to the world wide web, using
RDF (Resource Description Framework) to identify the resource. Hence the framework was developed to overcome existing
challenges by making the digital artifacts on the semantic web distributed to enable communication between different domains across
the network securely and thereby preventing MIMA.
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
Theimmense volumes of data are populated into repositories from various applications. In order to find out desired information and knowledge from large datasets, the data mining techniques are very much helpful. Classification is one of the knowledge discovery techniques. In Classification, Decision trees are very popular in research community due to simplicity and easy comprehensibility. This paper presentsan updated review of recent developments in the field of decision trees.
With the development of database, the data volume stored in database increases rapidly and in the large
amounts of data much important information is hidden. If the information can be extracted from the
database they will create a lot of profit for the organization. The question they are asking is how to extract
this value. The answer is data mining. There are many technologies available to data mining practitioners,
including Artificial Neural Networks, Genetics, Fuzzy logic and Decision Trees. Many practitioners are
wary of Neural Networks due to their black box nature, even though they have proven themselves in many
situations. This paper is an overview of artificial neural networks and questions their position as a
preferred tool by data mining practitioners.
Efficient classification of big data using vfdt (very fast decision tree)eSAT Journals
Abstract
Decision Tree learning algorithms have been able to capture knowledge successfully. Decision Trees are best considered when
instances are described by attribute-value pairs and when the target function has a discrete value. The main task of these
decision trees is to use inductive methods to the given values of attributes of an unknown object and determine an
appropriate classification by applying decision tree rules. Decision Trees are very effective forms to evaluate the performance
and represent the algorithms because of their robustness, simplicity, capability of handling numerical and categorical data,
ability to work with large datasets and comprehensibility to a name a few. There are various decision tree algorithms available
like ID3, CART, C4.5, VFDT, QUEST, CTREE, GUIDE, CHAID, CRUISE, etc. In this paper a comparative study on three of
these popular decision tree algorithms - (Iterative Dichotomizer 3), C4.5 which is an evolution of ID3 and VFDT (Very
Fast Decision Tree has been made. An empirical study has been conducted to compare C4.5 and VFDT in terms of accuracy
and execution time and various conclusions have been drawn.
Key Words: Decision tree, ID3, C4.5, VFDT, Information Gain, Gain Ratio, Gini Index, Over−fitting.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Abstract Learning Analytics by nature relies on computational information processing activities intended to extract from raw data some interesting aspects that can be used to obtain insights into the behaviors of learners, the design of learning experiences, etc. There is a large variety of computational techniques that can be employed, all with interesting properties, but it is the interpretation of their results that really forms the core of the analytics process. As a rising subject, data mining and business intelligence are playing an increasingly important role in the decision support activity of every walk of life. The Variance Rover System (VRS) mainly focused on the large data sets obtained from online web visiting and categorizing this into clusters according some similarity and the process of predicting customer behavior and selecting actions to influence that behavior to benefit the company, so as to take optimized and beneficial decisions of business expansion. Keywords: Analytics, Business intelligence, Clustering, Data Mining, Standard K-means, Optimized K-means
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINEaciijournal
Today, enormous amount of data is collected in medical databases. These databases may contain valuable
information encapsulated in nontrivial relationships among symptoms and diagnoses. Extracting such
dependencies from historical data is much easier to done by using medical systems. Such knowledge can be
used in future medical decision making. In this paper, a new algorithm based on C4.5 to mind data for
medince applications proposed and then it is evaluated against two datasets and C4.5 algorithm in terms of
accuracy.
Survey on Various Classification Techniques in Data Miningijsrd.com
Dynamic Classification is an information mining (machine learning) strategy used to anticipate bunch participation for information cases. In this paper, we show the essential arrangement systems. A few significant sorts of arrangement technique including induction, Bayesian networks, k-nearest neighbor classifier, case-based reasoning, genetic algorithm and fuzzy logic techniques. The objective of this review is to give a complete audit of distinctive characterization procedures in information mining.
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION cscpconf
While designing a new type of engineering material one has to search for some existing
materials which suits design requirement and then he can try to produce new kind of
engineering material. This selection process itself is tedious as he has to select few numbers of
materials out of a set of lakhs of materials. Therefore in this paper a model is proposed to select
a particular material which suits the user requirement, by using some similarity/distance
measuring functionalities. Here thirteen different types of similarity/distance measuring
functionalities are examined. Performance Index Measure(PIM) is calculated to verify the
relative performance of the selected material with the target material. Then all the results are
normalised for the purpose of analysing the results. Hence the proposed model reduces the
wastage of time in selection and also avoids the haphazardly selection of the materials in materials design and manufacturing industries.
EXTRACTING USEFUL RULES THROUGH IMPROVED DECISION TREE INDUCTION USING INFORM...ijistjournal
Classification is widely used technique in the data mining domain, where scalability and efficiency are the immediate problems in classification algorithms for large databases. We suggest improvements to the existing C4.5 decision tree algorithm. In this paper attribute oriented induction (AOI) and relevance analysis are incorporated with concept hierarchy’s knowledge and HeightBalancePriority algorithm for construction of decision tree along with Multi level mining. The assignment of priorities to attributes is done by evaluating information entropy, at different levels of abstraction for building decision tree using HeightBalancePriority algorithm. Modified DMQL queries are used to understand and explore the shortcomings of the decision trees generated by C4.5 classifier for education dataset and the results are compared with the proposed approach.
Comparative study of various supervisedclassification methodsforanalysing def...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Data Science - Part V - Decision Trees & Random Forests Derek Kane
This lecture provides an overview of decision tree machine learning algorithms and random forest ensemble techniques. The practical example includes diagnosing Type II diabetes and evaluating customer churn in the telecommunication industry.
One of the most important problems in modern finance is finding efficient ways to summarize and visualize
the stock market data to give individuals or institutions useful information about the market behavior for
investment decisions Therefore, Investment can be considered as one of the fundamental pillars of national
economy. So, at the present time many investors look to find criterion to compare stocks together and
selecting the best and also investors choose strategies that maximize the earning value of the investment
process. Therefore the enormous amount of valuable data generated by the stock market has attracted
researchers to explore this problem domain using different methodologies. Therefore research in data
mining has gained a high attraction due to the importance of its applications and the increasing generation
information. So, Data mining tools such as association rule, rule induction method and Apriori algorithm
techniques are used to find association between different scripts of stock market, and also much of the
research and development has taken place regarding the reasons for fluctuating Indian stock exchange.
But, now days there are two important factors such as gold prices and US Dollar Prices are more
dominating on Indian Stock Market and to find out the correlation between gold prices, dollar prices and
BSE index statistical correlation is used and this helps the activities of stock operators, brokers, investors
and jobbers. They are based on the forecasting the fluctuation of index share prices, gold prices, dollar
prices and transactions of customers. Hence researcher has considered these problems as a topic for
research.
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...theijes
Data mining works to extract information known in advance from the enormous quantities of data which can lead to knowledge. It provides information that helps to make good decisions. The effectiveness of data mining in access to knowledge to achieve the goal of which is the discovery of the hidden facts contained in databases and through the use of multiple technologies. Clustering is organizing data into clusters or groups such that they have high intra-cluster similarity and low inter cluster similarity. This paper deals with K-means clustering algorithm which collect a number of data based on the characteristics and attributes of this data, and process the Clustering by reducing the distances between the data center. This algorithm is applied using open source tool called WEKA, with the Insurance dataset as its input
Advanced Computational Intelligence: An International Journal (ACII)aciijournal
Today, enormous amount of data is collected in medical databases. These databases may contain valuable
information encapsulated in nontrivial relationships among symptoms and diagnoses. Extracting such
dependencies from historical data is much easier to done by using medical systems. Such knowledge can be
used in future medical decision making. In this paper, a new algorithm based on C4.5 to mind data for
medince applications proposed and then it is evaluated against two datasets and C4.5 algorithm in terms of
accuracy.
A Threshold fuzzy entropy based feature selection method applied in various b...IJMER
Large amount of data have been stored and manipulated using various database
technologies. Processing all the attributes for the particular means is the difficult task. To avoid such
difficulties, feature selection process is processed.In this paper,we are collect a eight various benchmark
datasets from UCI repository.Feature selection process is carried out using fuzzy entropy based
relevance measure algorithm and follows three selection strategies like Mean selection strategy,Half
selection strategy and Neural network for threshold selection strategy. After the features are selected,
they are evaluated using Radial Basis Function (RBF) network,Stacking,Bagging,AdaBoostM1 and Antminer
classification methodologies.The test results depicts that Neural network for threshold selection
strategy works well in selecting features and Ant-miner methodology works best in bringing out better
accuracy with selected feature than processing with original dataset.The obtained result of this
experiment shows that clearly the Ant-miner is superiority than other classifiers.Thus, this proposed Antminer
algorithm could be a more suitable method for producing good results with fewer features than
the original datasets.
Correlation of artificial neural network classification and nfrs attribute fi...eSAT Journals
Abstract
Mostly 5 to 15% of the women in the stage of reproduction face the disease called Polycystic Ovarian Syndrome (PCOS) which is the multifaceted, heterogeneous and complex. The long term consequences diseases like endometrial hyperplasia, type 2 diabetes mellitus and coronary disease are caused by the polycystic ovaries, chronic anovulation and hyperandrogenism are characterized with the resistance of insulin and the hypertension, abdominal obesity and dyslipidemia and hyperinsulinemia are called as Metabolic syndrome (frequent metabolic traits) The above cause the common disease called Anovulatory infertility. Computer based information along with advanced Data mining techniques are used for appropriate results. Classification is a classic data mining task, with roots in machine learning. Naïve Bayesian, Artificial Neural Network, Decision Tree, Support Vector Machines are the classification tasks in the data mining. Feature selection methods involve generation of the subset, evaluation of each subset, criteria for stopping the search and validation procedures. The characteristics of the search method used are important with respect to the time efficiency of the feature selection methods. PCA (Principle Component Analysis), Information gain Subset Evaluation, Fuzzy rough set evaluation, Correlation based Feature Selection (CFS) are some of the feature selection techniques, greedy first search, ranker etc are the search algorithms that are used in the feature selection. In this paper, a new algorithm which is based on Fuzzy neural subset evaluation and artificial neural network is proposed which reduces the task of classification and feature selection separately. This algorithm combines the neural fuzzy rough subset evaluation and artificial neural network together for the better performance than doing the tasks separately.
Keywords: ANN, SVM, PCA, CFS
Applying Classification Technique using DID3 Algorithm to improve Decision Su...IJMER
International Journal of Modern Engineering Research (IJMER) is Peer reviewed, online Journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment…. And many more.
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Predictionijtsrd
Data mining techniques play an important role in data analysis. For the construction of a classification model which could predict performance of students, particularly for engineering branches, a decision tree algorithm associated with the data mining techniques have been used in the research. A number of factors may affect the performance of students. Data mining technology which can related to this student grade well and we also used classification algorithms prediction. In this paper, we used educational data mining to predict students final grade based on their performance. We proposed student data classification using ID3 Iterative Dichotomiser 3 Decision Tree Algorithm Khin Khin Lay | San San Nwe "Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-5 , August 2019, URL: https://www.ijtsrd.com/papers/ijtsrd26545.pdfPaper URL: https://www.ijtsrd.com/computer-science/data-miining/26545/using-id3-decision-tree-algorithm-to-the-student-grade-analysis-and-prediction/khin-khin-lay
Machine Learning Approaches and its Challengesijcnes
Real world data sets considerably is not in a proper manner. They may lead to have incomplete or missing values. Identifying a missed attributes is a challenging task. To impute the missing data, data preprocessing has to be done. Data preprocessing is a data mining process to cleanse the data. Handling missing data is a crucial part in any data mining techniques. Major industries and many real time applications hardly worried about their data. Because loss of data leads the company growth goes down. For example, health care industry has many datas about the patient details. To diagnose the particular patient we need an exact data. If these exist missing attribute values means it is very difficult to retain the datas. Considering the drawback of missing values in the data mining process, many techniques and algorithms were implemented and many of them not so efficient. This paper tends to elaborate the various techniques and machine learning approaches in handling missing attribute values and made a comparative analysis to identify the efficient method.
Evaluating the efficiency of rule techniques for file classificationeSAT Journals
Abstract Text mining refers to the process of deriving high quality information from text. It is also known as knowledge discovery from text (KDT), deals with the machine supported analysis of text. It is used in various areas such as information retrieval, marketing, information extraction, natural language processing, document similarity, and so on. Document Similarity is one of the important techniques in text mining. In document similarity, the first and foremost step is to classify the files based on their category. In this research work, various classification rule techniques are used to classify the computer files based on their extensions. For example, the extension of computer files may be pdf, doc, ppt, xls, and so on. There are several algorithms for rule classifier such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. In this research work, three classification algorithms namely decision table, DTNB and OneR classifiers are used for performing classification of computer files based on their extension. The results produced by these algorithms are analyzed by using the performance factors classification accuracy and error rate. From the experimental results, DTNB proves to be more efficient than other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
This slide describe all the necessary topic on Data-Mining. Even this covered all the important Questions on Data Mining in Graduation Level. Basically it covers the actual 2 and 4 marks questions along with the answers that you will need after.
Different Classification Technique for Data mining in Insurance Industry usin...IOSRjournaljce
this paper addresses the issues and techniques for Property/Casualty actuaries applying data mining methods. Data mining means the effective unknown pattern discovery from a large amount database. It is an interactive knowledge discovery procedure which is includes data acquisition, data integration, data exploration, model building, and model validation. The paper provides an overview of the data discovery method and introduces some important data mining method for application to insurance concluding cluster discovery approaches.
A Decision tree is termed as good DT when it has small size and when new data is introduced it can be classified accurately. Pre-processing the input data is one of the good approaches for generating a good DT. When different data pre-processing methods are used with the combination of DT classifier it evaluates to give high performance. This paper involves the accuracy variation in the ID3 classifier when used in combination with different data pre-processing and feature selection method. The performances of DTs are produced from comparison of original and pre-processed input data and experimental results are shown by using standard decision tree algorithm-ID3 on a dataset.
A study and survey on various progressive duplicate detection mechanismseSAT Journals
Abstract One of the serious problems faced in several applications with personal details management, customer affiliation management, data mining, etc is duplicate detection. This survey deals with the various duplicate record detection techniques in both small and large datasets. To detect the duplicity with less time of execution and also without disturbing the dataset quality, methods like Progressive Blocking and Progressive Neighborhood are used. Progressive sorted neighborhood method also called as PSNM is used in this model for finding or detecting the duplicate in a parallel approach. Progressive Blocking algorithm works on large datasets where finding duplication requires immense time. These algorithms are used to enhance duplicate detection system. The efficiency can be doubled over the conventional duplicate detection method using this algorithm. Severa
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Similar to CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY (20)
A NEW DATA ENCODER AND DECODER SCHEME FOR NETWORK ON CHIPEditor IJMTER
System-on-chip (soc) based system has so many disadvantages in power-dissipation as
well as clock rate while the data transfer from one system to another system in on-chip. At the same
time, a higher operated system does not support the lower operated bus network for data transfer.
However an alternative scheme is proposed for high speed data transfer. But this scheme is limited to
SOCs. Unlike soc, network-on-chip (NOC) has so many advantages for data transfer. It has a special
feature to transfer the data in on-chip named as transitional encoder. Its operation is based on input
transitions. At the same time it supports systems which are higher operated frequencies. In this
project, a low-power encoding scheme is proposed. The proposed system yields lower dynamic
power dissipation due to the reduction of switching activity and coupling switching activity when
compared to existing system. Even-though many factors which is based on power dissipation, the
dynamic power dissipation is only considerable for reasonable advantage. The proposed system is
synthesized using quartus II 9.1 software. Besides, the proposed system will be extended up to
interlink PE communication with help of routers and PE’s which are performed by various
operations. To implement this system in real NOC’s contains the proposed encoders and decoders for
data transfer with regular traffic scenarios should be considered.
A RESEARCH - DEVELOP AN EFFICIENT ALGORITHM TO RECOGNIZE, SEPARATE AND COUNT ...Editor IJMTER
Coins are important part of our life. We use coins in a places like stores, banks, buses, trains
etc. So it becomes a basic need that coins can be sorted, counted automatically. For this, there is
necessary that the coins can be recognized automatically. Automated Coin Recognition System for the
Indian Coins of Rs. 1, 2, 5 and 10 with the rotation invariance. We have taken images from the both
sides of coin. So this system is capable to recognizing coins from both sides. Features are taken from the
images using techniques as a Hough Transformation, Pattern Averaging etc.
Analysis of VoIP Traffic in WiMAX EnvironmentEditor IJMTER
Worldwide Interoperability for Microwave Access (WiMAX) is currently one of the
hottest technologies in wireless communication. It is a standard based on the IEEE 802.16 wireless
technology that provides a very high throughput broadband connections over long distances. In
parallel, Voice Over Internet Protocol (VoIP) is a new technology which provides access to voice
communication over internet protocol and hence it is becomes an alternative to public switched
telephone networks (PSTN) due to its capability of transmission of voice as packets over IP
networks. A lot of research has been done in analyzing the performances of VoIP traffic over
WiMAX network. In this paper we review the analysis carried out by several authors for the most
common VoIP codec’s which are G.711, G.723.1 and G.729 over a WiMAX network using various
service classes. The objective is to compare the results for different types of service classes with
respect to the QoS parameters such as throughput, average delay and average jitter.
A Hybrid Cloud Approach for Secure Authorized De-DuplicationEditor IJMTER
The cloud backup is used for the personal storage of the people in terms of reducing the
mainlining process and managing the structure and storage space managing process. The challenging
process is the deduplication process in both the local and global backup de-duplications. In the prior
work they only provide the local storage de-duplication or vice versa global storage de-duplication in
terms of improving the storage capacity and the processing time. In this paper, the proposed system
is called as the ALG- Dedupe. It means the Application aware Local-Global Source De-duplication
proposed system to provide the efficient de-duplication process. It can provide the efficient deduplication process with the low system load, shortened backup window, and increased power
efficiency in the user’s personal storage. In the proposed system the large data is partitioned into
smaller part which is called as chunks of data. Here the data may contain the redundancy it will be
avoided before storing into the storage area.
Aging protocols that could incapacitate the InternetEditor IJMTER
The biggest threat to the Internet is the fact that it was never really designed. For e.g., the
BGP protocol is used by Internet routers to exchange information about changes to the Internet's
network topologies. However, it also is among the most fundamentally broken; as Internet routing
information can be poisoned with bogus routing information. Instead, it evolved in fits and start,
thanks to various protocols that have been cobbled together to fulfill the needs of the moment. Few
of protocols from them were designed with security in mind. or if they were sported no more than
was needed to keep out a nosy neighbor, not a malicious attacker. The result is a welter of aging
protocols susceptible to exploit on an Internet scale. Here are six Internet protocols that could stand
to be replaced sooner rather than later or are (mercifully) on the way out.
A Cloud Computing design with Wireless Sensor Networks For Agricultural Appli...Editor IJMTER
The emergence of exactitude agriculture has been promoted by the numerous developments within
the field of wireless sensing element actor networks (WSAN). These WSANs offer important data
for gathering, work management, development of crops, and limitation of crop diseases. Goals of
this paper to introducing cloud computing as a brand new way (technique) to be utilized in addition
to WSANs to any enhance their application and benefits to the area of agriculture.
A CAR POOLING MODEL WITH CMGV AND CMGNV STOCHASTIC VEHICLE TRAVEL TIMESEditor IJMTER
Carpooling (also car-sharing, ride-sharing, lift-sharing), is the sharing of car journeys so
that more than one person travels in a car. It helps to resolve a variety of problems that continue to
plague urban areas, ranging from energy demands and traffic congestion to environmental pollution.
Most of the existing method used stochastic disturbances arising from variations in vehicle travel
times for carpooling. However it doesn’t deal with the unmet demand with uncertain demand of the
vehicle for car pooling. To deal with this the proposed system uses Chance constrained
formulation/Programming (CCP) approach of the problem with stochastic demand and travel time
parameters, under mild assumptions on the distribution of stochastic parameters; and relates it with a
robust optimization approach. Since real problem sizes can be large, it could be difficult to find
optimal solutions within a reasonable period of time. Therefore solution algorithm using tabu
heuristic solution approach is developed to solve the model. Therefore, we constructed a stochastic
carpooling model that considers the in- fluence of stochastic travel times. The model is formulated as
an integer multiple commodity network flow problem. Since real problem sizes can be large, it could
be difficult to find optimal solutions within a reasonable period of time.
Sustainable Construction With Foam Concrete As A Green Green Building MaterialEditor IJMTER
A green building is an environmentally sustainable building, designed, constructed and
operated to minimise the total environmental impacts. Carbon dioxide (CO2) is the primary
greenhouse gas emitted through human activities. It is claimed that 5% of the world’s carbon dioxide
emission is attributed to cement industry, which is the vital constituent of concrete. Due to the
significant contribution to the environmental pollution, there is a need for finding an optimal solution
along with satisfying the civil construction needs. Apart from normal concrete bricks, a clay brick,
Foam concrete is a new innovative technology for sustainable building and civil construction which
fulfills the criteria of being a Green Material. This paper concludes that Foam Concrete can be an
effective sustainable material for construction and also focuses on the cost effectiveness in using
Foam Concrete as a building material in replacement with Clay Brick or other bricks.
USE OF ICT IN EDUCATION ONLINE COMPUTER BASED TESTEditor IJMTER
A good education system is required for overall prosperity of a nation. A tremendous
growth in the education sector had made the administration of education institutions complex. Any
researches reveal that the integration of ICT helps to reduce the complexity and enhance the overall
administration of education. This study has been undertaken to identify the various functional areas
to which ICT is deployed for information administration in education institutions and to find the
current extent of usage of ICT in all these functional areas pertaining to information administration.
The various factors that contribute to these functional areas were identified. A theoretical model was
derived and validated.
Textual Data Partitioning with Relationship and Discriminative AnalysisEditor IJMTER
Data partitioning methods are used to partition the data values with similarity. Similarity
measures are used to estimate transaction relationships. Hierarchical clustering model produces tree
structured results. Partitioned clustering produces results in grid format. Text documents are
unstructured data values with high dimensional attributes. Document clustering group ups unlabeled text
documents into meaningful clusters. Traditional clustering methods require cluster count (K) for the
document grouping process. Clustering accuracy degrades drastically with reference to the unsuitable
cluster count.
Textual data elements are divided into two types’ discriminative words and nondiscriminative
words. Only discriminative words are useful for grouping documents. The involvement of
nondiscriminative words confuses the clustering process and leads to poor clustering solution in return.
A variation inference algorithm is used to infer the document collection structure and partition of
document words at the same time. Dirichlet Process Mixture (DPM) model is used to partition
documents. DPM clustering model uses both the data likelihood and the clustering property of the
Dirichlet Process (DP). Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to
discover the latent cluster structure based on the DPM model. DPMFP clustering is performed without
requiring the number of clusters as input.
Document labels are used to estimate the discriminative word identification process. Concept
relationships are analyzed with Ontology support. Semantic weight model is used for the document
similarity analysis. The system improves the scalability with the support of labels and concept relations
for dimensionality reduction process.
Testing of Matrices Multiplication Methods on Different ProcessorsEditor IJMTER
There are many algorithms we found for matrices multiplication. Until now it has been
found that complexity of matrix multiplication is O(n3). Though Further research found that this
complexity can be decreased. This paper focus on the algorithm and its complexity of matrices
multiplication methods.
Malware is a worldwide pandemic. It is designed to damage computer systems without
the knowledge of the owner using the system. Software‟s from reputable vendors also contain
malicious code that affects the system or leaks information‟s to remote servers. Malware‟s includes
computer viruses, spyware, dishonest ad-ware, rootkits, Trojans, dialers etc. Malware detectors are
the primary tools in defense against malware. The quality of such a detector is determined by the
techniques it uses. It is therefore imperative that we study malware detection techniques and
understand their strengths and limitations. This survey examines different types of Malware and
malware detection methods.
SURVEY OF TRUST BASED BLUETOOTH AUTHENTICATION FOR MOBILE DEVICEEditor IJMTER
Practical requirements for securely demonstrating identities between two handheld
devices are an important concern. The adversary can inject a Man-In- The-Middle (MITM) attack to
intrude the protocol. Protocols that employ secret keys require the devices to share private
information in advance, in which it is not feasible in the above scenario. Apart from insecurely
typing passwords into handheld devices or comparing long hexadecimal keys displayed on the
devices’ screen, many other human-verifiable protocols have been proposed in the literature to solve
the problem. Unfortunately, most of these schemes are unsalable to more users. Even when there are
only three entities attempt to agree a session key, these protocols need to be rerun for three times.
So, in the existing method a bipartite and a tripartite authentication protocol is presented using a
temporary confidential channel. Besides, further extend the system into a transitive authentication
protocol that allows multiple handheld devices to establish a conference key securely and efficiently.
But this method detects only the outsider attacks. Method does not consider the insider attacks. So,
in the proposed method trust score based method is introduced which computes the trust values for
the nodes and provide the security. The trust score is computed has a positive influence on the
confidence with which an entity conducts transactions with that node. Network the behavior of the
node will be monitored periodically and its trust value is also updated .So depending on the behavior
of the node in the network trust relation will be established between two nodes.
GLAUCOMA is a chronic eye disease that can damage optic nerve. According to WHO It
is the second leading cause of blindness, and is predicted to affect around 80 million people by 2020.
Development of the disease leads to loss of vision, which occurs increasingly over a long period of
time. As the symptoms only occur when the disease is quite advanced so that glaucoma is called the
silent thief of sight. Glaucoma cannot be cured, but its development can be slowed down by
treatment. Therefore, detecting glaucoma in time is critical. However, many glaucoma patients are
unaware of the disease until it has reached its advanced stage. In this paper, some manual and
automatic methods are discussed to detect glaucoma. Manual analysis of the eye is time consuming
and the accuracy of the parameter measurements also varies with different clinicians. To overcome
these problems with manual analysis, the objective of this survey is to introduce a method to
automatically analyze the ultrasound images of the eye. Automatic analysis of this disease is much
more effective than manual analysis.
Survey: Multipath routing for Wireless Sensor NetworkEditor IJMTER
Reliability is playing very vital role in some application of Wireless Sensor Networks
and multipath routing is one of the ways to increase the probability of reliability. More over energy
consumption is constraint. In this paper, we provide a survey of the state-of-the-art of proposed
multipath routing algorithm for Wireless Sensor Networks. We study the design, analyze the tradeoff
of each design, and overview several presenting algorithms.
Step up DC-DC Impedance source network based PMDC Motor DriveEditor IJMTER
This paper is devoted to the Quasi Z source network based DC Drive. The cascaded
(two-stage) Quasi Z Source network could be derived by the adding of one diode, one inductor,
and two capacitors to the traditional quasi-Z-source inverter The proposed cascaded qZSI inherits all
the advantages of the traditional solution (voltage boost and buck functions in a single stage,
continuous input current, and improved reliability). Moreover, as compared to the conventional qZSI,
the proposed solution reduces the shoot-through duty cycle by over 30% at the same voltage boost
factor. Theoretical analysis of the two-stage qZSI in the shoot-through and non-shoot-through
operating modes is described. The proposed and traditional qZSI-networks are compared. A
prototype of a Quasi Z Source network based DC Drive was built to verify the theoretical
assumptions. The experimental results are presented and analyzed.
SPIRITUAL PERSPECTIVE OF AUROBINDO GHOSH’S PHILOSOPHY IN TODAY’S EDUCATIONEditor IJMTER
The paper reflects the spiritual philosophy of Aurobindo Ghosh which is helpful in today’s
education. In 19th century he wrote about spirituality, in accordance with that it is a core and vital part
of today’s education. It is very much essential for today’s kid. Here I propose the overview of that
philosophy.At the utmost regeneration of those values in today’s generation is the great deal with
education system. To develop the values and spiritual education in the youngers is the great moto of
mine. It is the materialistic world and without value redefinition among them is the harder task but not
difficult.
Software Quality Analysis Using Mutation Testing SchemeEditor IJMTER
The software test coverage is used measure the safety measures. The safety critical analysis is
carried out for the source code designed in Java language. Testing provides a primary means for
assuring software in safety-critical systems. To demonstrate, particularly to a certification authority, that
sufficient testing has been performed, it is necessary to achieve the test coverage levels recommended or
mandated by safety standards and industry guidelines. Mutation testing provides an alternative or
complementary method of measuring test sufficiency, but has not been widely adopted in the safetycritical industry. The system provides an empirical evaluation of the application of mutation testing to
airborne software systems which have already satisfied the coverage requirements for certification.
The system mutation testing to safety-critical software developed using high-integrity subsets of
C and Ada, identify the most effective mutant types and analyze the root causes of failures in test cases.
Mutation testing could be effective where traditional structural coverage analysis and manual peer
review have failed. They also show that several testing issues have origins beyond the test activity and
this suggests improvements to the requirements definition and coding process. The system also
examines the relationship between program characteristics and mutation survival and considers how
program size can provide a means for targeting test areas most likely to have dormant faults. Industry
feedback is also provided, particularly on how mutation testing can be integrated into a typical
verification life cycle of airborne software. The system also covers the safety and criticality levels of
Java source code.
Software Defect Prediction Using Local and Global AnalysisEditor IJMTER
The software defect factors are used to measure the quality of the software. The software
effort estimation is used to measure the effort required for the software development process. The defect
factor makes an impact on the software development effort. Software development and cost factors are
also decided with reference to the defect and effort factors. The software defects are predicted with
reference to the module information. Module link information are used in the effort estimation process.
Data mining techniques are used in the software analysis process. Clustering techniques are used
in the property grouping process. Rule mining methods are used to learn rules from clustered data
values. The “WHERE” clustering scheme and “WHICH” rule mining scheme are used in the defect
prediction and effort estimation process. The system uses the module information for the defect
prediction and effort estimation process.
The proposed system is designed to improve the defect prediction and effort estimation process.
The Single Objective Genetic Algorithm (SOGA) is used in the clustering process. The rule learning
operations are carried out sing the Apriori algorithm. The system improves the cluster accuracy levels.
The defect prediction and effort estimation accuracy is also improved by the system. The system is
developed using the Java language and Oracle relation database environment.
Software Cost Estimation Using Clustering and Ranking SchemeEditor IJMTER
Software cost estimation is an important task in the software design and development process.
Planning and budgeting tasks are carried out with reference to the software cost values. A variety of
software properties are used in the cost estimation process. Hardware, products, technology and
methodology factors are used in the cost estimation process. The software cost estimation quality is
measured with reference to the accuracy levels.
Software cost estimation is carried out using three types of techniques. They are regression based
model, anology based model and machine learning model. Each model has a set of technique for the
software cost estimation process. 11 cost estimation techniques fewer than 3 different categories are
used in the system. The Attribute Relational File Format (ARFF) is used maintain the software product
property values. The ARFF file is used as the main input for the system.
The proposed system is designed to perform the clustering and ranking of software cost
estimation methods. Non overlapped clustering technique is enhanced with optimal centroid estimation
mechanism. The system improves the clustering and ranking process accuracy. The system produces
efficient ranking results on software cost estimation methods.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfKamal Acharya
The College Bus Management system is completely developed by Visual Basic .NET Version. The application is connect with most secured database language MS SQL Server. The application is develop by using best combination of front-end and back-end languages. The application is totally design like flat user interface. This flat user interface is more attractive user interface in 2017. The application is gives more important to the system functionality. The application is to manage the student’s details, driver’s details, bus details, bus route details, bus fees details and more. The application has only one unit for admin. The admin can manage the entire application. The admin can login into the application by using username and password of the admin. The application is develop for big and small colleges. It is more user friendly for non-computer person. Even they can easily learn how to manage the application within hours. The application is more secure by the admin. The system will give an effective output for the VB.Net and SQL Server given as input to the system. The compiled java program given as input to the system, after scanning the program will generate different reports. The application generates the report for users. The admin can view and download the report of the data. The application deliver the excel format reports. Because, excel formatted reports is very easy to understand the income and expense of the college bus. This application is mainly develop for windows operating system users. In 2017, 73% of people enterprises are using windows operating system. So the application will easily install for all the windows operating system users. The application-developed size is very low. The application consumes very low space in disk. Therefore, the user can allocate very minimum local disk space for this application.
Vaccine management system project report documentation..pdfKamal Acharya
The Division of Vaccine and Immunization is facing increasing difficulty monitoring vaccines and other commodities distribution once they have been distributed from the national stores. With the introduction of new vaccines, more challenges have been anticipated with this additions posing serious threat to the already over strained vaccine supply chain system in Kenya.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSEDuvanRamosGarzon1
AIRCRAFT GENERAL
The Single Aisle is the most advanced family aircraft in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium range aircraft.
The family offers a choice of engines
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Quality defects in TMT Bars, Possible causes and Potential Solutions.PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
1. International Journal of Modern Trends in Engineering and
Research
www.ijmter.com
e-ISSN: 2349-9745
p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 57
CLASSIFICATION ALGORITHM USING RANDOM
CONCEPT ON A VERY LARGE DATA SET: A SURVEY
Pooja Sharma1
, Annu Mishra2
1
Computer Science & engineering,BU-UIT
2
Computer Science & engineering, BU-UIT
Abstract— Data mining environment produces a large amount of data, that need to be
analyses, pattern have to be extracted from that to gain knowledge. In this new period with
rumble of data both ordered and unordered, by using traditional databases and architectures, it
has become difficult to process, manage and analyses patterns. To gain knowledge about the
Big Data a proper architecture should be understood. Classification is an important data mining
technique with broad applications to classify the various kinds of data used in nearly every
field of our life. Classification is used to classify the item according to the features of the item
with respect to the predefined set of classes. This paper provides an inclusive survey of
different classification algorithms and put a light on various classification algorithms including
j48, C4.5, k-nearest neighbor classifier, Naive Bayes, SVM etc., using random concept.
Keywords - Classification, Classifier, Large Data, Random concept
I. INTRODUCTION
Data Mining is the technology to extract the knowledge from the data. It is used to explore and
analyze the same. The data to be mined varies from a small data set to a large data set i.e. big
data. Data Mining has also been termed as data dredging, data archaeology, information
discovery or information harvesting depending upon the area where it is being used [3]. Data
mining is growing in various applications widely like analysis of organic compounds, medicals
diagnosis, product design, targeted marketing, credit card fraud detection, financial forecasting,
automatic abstraction, predicting shares of television audiences etc. Data mining refers to the
analysis of the large quantities of data that are stored in computers. To discover previously
unknown, valid patterns and relationships in large data set data mining involves the use of
sophisticated data analysis tools [6]. These tools can include statistical models, mathematical
algorithm and machine learning methods. Classification techniques in data mining are capable
of processing a large amount of data. It classifies data based on training set and class labels and
can predict categorical class labels and hence can be used for classifying newly available data
[4]. Thus it can be outlined as an inevitable part of data mining and is gaining more popularity.
The benefit of analyzing the pattern and association in the data is to set the trend in the market,
to understand customers, analyse demands, and predict future possibilities in every aspect [3].
2. International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 04, [October - 2014]
e-ISSN: 2349-9745
p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 58
II. LITERATURE SURVEY
2.1 Data Mining
Data mining is the process of discovering interesting knowledge, such as associations, patterns,
changes, significant structures and anomalies, from large amounts of data stored in databases or
data warehouses or other information repositories. Data mining consists of five major elements:
• Data warehouse system load, extract, transforms transaction data.
• In a multidimensional Database system Store and manage the data.
• For the business analysts and Information technology professionals provide data access.
• By application software analysis of data done.
• Data is presented in a useful format, such as a graph or Table [5].
2.2 Large Data
Large Data is usually defined in terms of the 3Vs: volume, velocity, variety. The large amount
of data being processed by the Data Mining environment. In other words, it is the collection of
data sets large and complex that it becomes difficult to process using on hand database
management tools or traditional data processing applications, so data mining tools were used.
Large Data are about turning unstructured, invaluable, imperfect, complex data into usable
information [7] [10]. Data have hidden information in them and to extract this new
information; interrelationship among the data has to be achieved. Information may be retrieved
from a hidden or a complex data set [8]. Browsing through a large data set would be difficult
and time consuming, we have to follow certain protocols, a proper algorithm and method is
needed to classify the data, find a suitable pattern among them. The standard data analysis
method such as classification, clustering, factorial, analysis need to be extended to get the
information and extract new knowledge treasure.
2.3 Classification
Classification is the most frequently used data mining task. Classification maps the data in to
predefined targets. Classification is a supervised learning because the targets are predefined
[16]. The aim of the classification is to build a classifier based on some cases with some
attributes to describe the objects or one attribute to describe the group of the objects [9]. Then,
the classifier is used to predict the group of attributes of new cases from the domain based on
the values of other attributes.
III. DATA MINING CLASSIFICATION METHODS
3.1 C4.5 Algorithm
To generate a decision trees C4.5 is a well-known algorithm, is an extension of the ID3
algorithm used to overcome its drawbacks. The decision trees generated by the C4.5 algorithm
can be used for classification, and for this reason, C4.5 is also referred to as a statistical
classifier. The C4.5 algorithm made a number of changes to improve ID3 algorithm [2]. Some
of these are:
• For missing values of attributes handling training data.
3. International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 04, [October - 2014]
e-ISSN: 2349-9745
p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 59
• Differing cost attributes is handled.
• The decision tree is pruned after creation.
• Discrete and continuous values, attributes are handled.
Let the training data be a set S = s1, s2 ... of already classified samples. Each sample Si = xl,
x2... is a vector where xl, x2 ... represent attributes or features of the sample. The training data
is a vector C = c1, c2..., where c1, c2... represent the class to which each sample belongs to [8].
At each node of the tree, C4.5 chooses one attribute of the data that most effectively splits data
set of samples S into subsets that can be one class or the other. It is the normalized information
gain (difference in entropy) that results from choosing an attribute for splitting the data. The
attribute factor with the highest normalized information gain is considered to make the decision
[13]. The C4.5 algorithm then continues on the smaller sub-lists having next highest
normalized information gain. C4.5 uses a metric called "information gain," which is defined by
subtracting conditional entropy from the base entropy; that is, Gain (P|X) =E (P)-E (P|X). This
computation does not, in itself, produce anything new. However, it allows you to measure a
gain ratio. Gain ratio, defined as Gain Ratio (P|X) =Gain (P|X)/E(X), where(X) is the entropy
of the examples relative only to the attribute. It has an enhanced method of tree pruning that
reduces misclassification errors due noise or too much detail in the training data set. It uses
gain ratio impurity method to evaluate the splitting attribute. Decision trees are built in C4.5 by
using a set of training data or data sets as in ID3. At each node of the tree, C4.5 chooses one
attribute of the data that most effectively splits its set of samples into subsets enriched in one
class or the other [1]. Its criterion is the normalized information gain (difference in entropy)
[11] that results from choosing an attribute for splitting the data. The attribute with the highest
normalized information gain is chosen to make the decision.
a. Naive Bayes Algorithm
In simple terms, a naive bayes classifier assumes that the value of a particular feature is
unrelated to the presence or absence of any other feature, given the class variable. For example,
a fruit may be considered to be an apple if it is red, round, and about 3" in diameter. A naive
bayes classifier considers each of these features to contribute independently to the probability
that this fruit is an apple, regardless of the presence or absence of the other features [3].
For some types of probability models, naive bayes classifiers can be trained very efficiently in
a supervised learning setting. In many practical applications, parameter estimation for naive
bayes models uses the method of maximum likelihood; in other words, one can work with the
naive bayes model without accepting Bayesian probability or using any Bayesian methods.
An advantage of naive bayes is that it only requires a small amount of training data to estimate
the parameters (means and variances of the variables) necessary for classification. Because
independent variables are assumed, only the variances of the variables for each class need to be
determined and not the entire covariance matrix.
The Naive Bayes classifier is a simple probabilistic classifier based on applying Bayes'
theorem with strong (naive) independence assumptions. Bayesian Classification provides a
useful perspective for understanding and evaluating many learning algorithms [12]. It
calculates explicit probabilities for hypothesis and it is robust to noise in input data. It improves
the classification performance by removing the irrelevant features and its computational time is
4. International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 04, [October - 2014]
e-ISSN: 2349-9745
p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 60
short, but the naive bayes classifier requires a very large number of records to obtain good
results and it is instance-based or lazy in that they store all of the training samples abstractly.
b. Support Vector Machine Algorithm
A support vector machine constructs a hyperplane or set of hyperplanes in a high- or infinite-
dimensional space, which can be used for classification, regression, or other tasks. Intuitively, a
good separation is achieved by the hyperplane that has the largest distance to the nearest
training data point of any class (so-called functional margin), since in general the larger the
margin the lower the generalization error of the classifier [28].
Whereas the original problem may be stated in a finite dimensional space, it often happens that
the sets to discriminate are not linearly separable in that space. For this reason, it was proposed
that the original finite-dimensional space be mapped into a much higher-dimensional space,
presumably making the separation easier in that space. To keep the computational load
reasonable, the mappings used by SVM schemes are designed to ensure that dot products may
be computed easily in terms of the variables in the original space, by defining them in terms of
a kernel function selected to suit the problem. It produce very accurate classifier,
robust to noise especially popular in text classification problem where very high dimensional
space are the norms but computationally SVM is expensive thus runs slow.
3.4 K-nearest neighbors Algorithm
In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-
parametric method used for classification and regression. In both cases, the input consists of
the k closest training examples in the feature space [19]. The output depends on whether k-NN
is used for classification or regression:
• In k-NN classification, the output is a class membership. An object is classified by a
majority vote of its neighbors, with the object being assigned to the class most common
among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the
object is simply assigned to the class of that single nearest neighbor.
• In k-NN regression, the output is the property value for the object. This value is the
average of the values of its k nearest neighbors.
K-NN is a type of instance-based learning, or lazy learning, where the function is only
approximated locally and all computation is deferred until classification. The k-NN algorithm
is among the simplest of all machine learning algorithms.
Both for classification and regression, it can be useful to weight the contributions of the
neighbors, so that the nearer neighbors contribute more to the average than the more distant
ones. For example, a common weighting scheme consists in giving each neighbor a weight of
1/d, where d is the distance to the neighbor.
A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. K-
NN requires an integer k, a training data set and a metric to measure closeness.
5. International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 04, [October - 2014]
e-ISSN: 2349-9745
p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 61
3.5 J48 Algorithm
J48 classifier is a simple C4.5 decision tree for classification. It creates a binary tree. The
decision tree approach is most useful in classification problem. With this technique, a tree is
constructed to model the classification process. Once the tree is built, it is applied to each tuple
in the database and results in classification for that tuple [17] [18].
While building a tree, J48 ignores the missing values i.e. the value for that item can be
predicted based on what is known about the attribute values for the other records. The basic
idea is to divide the data into range based on the attribute values for that item that are found in
the training sample. J48 allows classification via either decision trees or rules generated from
them.
IV. COMPARATIVE STUDY OF SOME CLASSIFICATION ALGORITHM
S.NO ALGORITHM ADVANTAGES DISADVANTAGES
1. Random J48 1) It produces the accurate
result.
2) This algorithm also handles
the missing values in the
training data
3) Both the discrete and
continuous attributes are
handled [20] by this
algorithm.
4) No empty branches.
5) Use split info and gain
ratio.
1) The depth of the tree is linked
to tree size and the run-time
complexity of the algorithm
matches to the tree depth, which
cannot be greater than the
attributes.
2) Space complexity is very large as
we have to store the values
repeatedly in arrays [14] [15].
2. Random
C4.5
1) Produces the accurate
result.
2)Distribution of trees is
uniform [22]
3) Less Memory for large
program execution.
4) Model build time is less.
5) Searching time is short.
1) Performance of randomized
tree with classication noise is not
as good [21].
2) Problem of Over fitting.
3) Can’t deal with missing
values.
3. Random Forest 1) Almost always have lower
classification error.
2)Far easier for humans to
understand
3) Deal really well with
uneven data sets that have
missing variables.
Train faster
1)The high classification error
rate while training set is small in
comparison with the number of
classes.[27]
2) Not do well with imbalance
data.
3) Need to discrete data for some
particular construction
algorithm.
6. International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 04, [October - 2014]
e-ISSN: 2349-9745
p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 62
4. Random
k-Nearest
Neighbour
1) It is a more effective and
more efficient model for high-
dimensional data
2) Applied to both qualitative
and quantitative responses.
3) It is significantly more stable
& & more robust for feature
selection when the input data
are noisy and unbalanced.
4)For multimodal classes it is
well suited.[23]
1) For local structure of the data
it is sensitive.
2) Memory limitation.
3) Being a supervised, it is lazy
learning algorithm i.e., runs
slowly.
5. Random
Naïve Byes
1) Suitable for applications
where computational power
and memory are limited
2) Remove irrelevant feature
for improving the
performance.
1) Require very large number of
records to obtain good results.
2) All the variables are
uncorrelated to each other.
3) It is really fragile to over
fitting.[25][26]
6. Random
SVM
1) Prediction accuracy is
generally high
2) Robust works when
training examples contain
errors.[24]
3) Especially popular in text
classification problems.
4) Fast evaluation of the
learned target function
1) Difficult to incorporate in
domain knowledge
2) SVM is a binary classifier. To
do a multi-class classification,
pair-wise classifications can be
used.
3) Long training time so
computationally expensive.
V. CONCLUSION
At present data mining is an important area of research and due to Increase in the amount of
data it becomes difficult to handle the large data, methodologies associated with different
algorithms used to handle such large data sets. Classification is a very suitable for solving the
problems of data mining because the classification techniques show how a data can be
determined and grouped when a new set of data is available. Each classification technique has
got its own advantages and disadvantages as given in the paper. We select the required
classification algorithm as our requirement. According to our theoretical study, the
performances of the algorithms are strongly depends on the entropy, information gain, gini
index and the features of the data sets. The big advantage of a decision tree classifier is that it
doesn't require a lot of information about the data to create a tree that could be very accurate
and very informative and, the knowledge in decision tree represented in form of [IF-THEN]
rules which is easier for humans understand.
7. International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 04, [October - 2014]
e-ISSN: 2349-9745
p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 63
Decision Tree's algorithms (C4.5, j48) have less error rate and it is easier algorithm as
compared to other classification algorithms.
REFERENCES
[1] S.Archana, Dr.K.Elangovan “Survey of Classification Techniques in Data Mining” International Journal
of Computer Science and Mobile Applications, Vol.2 Issue. 2, February 2014, pg. 65-71.
[2] RanshulChaudhary, Prabhdeep Singh, Rajiv Mahajan “A Survey on Data Mining Techniques”
International Journal of Advanced Research in Computer and Communication Engineering Vol. 3 Issue
1, January 2014, pg. 5002-5003.
[3] Tina R. Patil, Mrs. S. S. Sherekar “Performance Analysis of Naive Bayes and J48 Classification
Algorithm for Data Classification” International Journal of Computer Science and Applications Vol. 6
No.2, April 2013, pg. 256-261.
[4] Ms.Aparna Raj, Mrs.Bincy G, Mrs.T.Mathu“Survey on Common Data Mining Classification
Techniques” International Journal of Wisdom Based Computing, Vol. 2(1), April 2012, pg. 12-15.
[5] Nikita Jain, Vishal Srivastava “Data Mining Techniques: A Survey Paper” IJRET: International Journal
of Research in Engineering and Technology Vol. 2 Issue 11, November 2013, pg. 116-119.
[6] Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz, Timm Euler, “YALE: rapid prototyping
for complex data mining tasks”, KDD '06 Proceedings of the 12th ACM SIGKDD international
conference on Knowledge discovery and data mining, pg. 935-940.
[7] UN Global Pulse “Big Data for Development: Challenges and Opportunities”, May 2012.
[8] DelveenLuqmanAbdAl.Nabi, ShereenShukri Ahmed, “Survey on Classification Algorithms for Data
Mining :( Comparison and Evaluation)” Vol.4, March 2013, pg. 18-25.
[9] A.ShameemFathima,
D.Manimegalai and NisarHundewale “A Review of Data Mining Classification
Techniques Applied for Diagnosis and Prognosis of the Arbovirus-Dengue” IJCSI International Journal
of Computer Science Issues, Vol. 8 Issue 6, November 2011, pg. 322-328.
[10] ChanchalYadav, Shuliang Wang, Manoj Kumar “Algorithm and approaches to handle large Data-A
Survey” IJCSN International Journal of Computer Science and Network, Vol. 2 Issue 3, April 2013, pg.
56-60.
[11] Mohd. Mahmood Ali1, Mohd. S. Qaseem, Lakshmi Rajamani, A. Govardhan “Extracting Useful Rules
through Improved Decision Tree Induction Using Information Entropy” International Journal of
Information Sciences and Techniques (IJIST) Vol.3 No.1, January 2013,pg. 27-41.
[12] Ali, M.M. ,Rajamani, L “Decision tree induction: Priority classification” International Conference on
Advances in Engineering, Science and Management (ICAESM), March 2012, pg. 668-673.
[13] Mr. Brijain R Patel, Mr. Kushik K Rana “A Survey on Decision Tree Algorithm for Classification”
International Journal of Engineering Development and Research, Vol. 2 Issue 1, March 2014, pg.20-24.
[14] Juneja, Deepti “A novel approach to construct decision tree using quick C4.5 algorithm” Oriental Journal
of Computer Science & Technology Vol. 3(2), February 2013, pg. 305-310.
[15] Dr. Neeraj Bhargava, Girja Sharma, Manish Mathuria “Decision Tree Analysis on J48 Algorithm for
Data Mining” International Journal of Advanced Research in Computer Science and Software
Engineering Vol. 3 Issues 6, June 2013, pg. 1114-1119.
[16] Tulips Angel Thankachan1, Dr. Kumudha Raimond “A Survey on Classification and Rule Extraction
Techniques for Datamining” IOSR Journal of Computer Engineering (IOSR-JCE) Vol. 8 Issue 5,
February 2013, pg. 75-78.
[17] Margaret H. Danham,S. Sridhar, “ Data mining, Introductory and Advanced Topics”, Person education,
1st ed., 2006.
[18] Aman Kumar Sharma, Suruchi Sahni, “A Comparative Study of Classification Algorithms for Spam
Email Data Analysis”, IJCSE, Vol. 3 No. 5, May 2011, pg. 1890-1895.
[19] Bhaskar N. Patel, Satish G. Prajapati and Dr.Kamaljit I. Lakhtaria “Efficient Classification of Data Using
Decision Tree” Bonfring International Journal of Data Mining, Vol. 2 No. 1, March 2012, pg. 6-12.
[20] Anshul Goyal, Rajni Mehta “Performance Comparison of Naïve Bayes and J48 Classification
Algorithms” International Journal of Applied Engineering Research, ISSN 0973-4562 Vol.7, November
2012, pg. 1-5.
8. International Journal of Modern Trends in Engineering and Research (IJMTER)
Volume 01, Issue 04, [October - 2014]
e-ISSN: 2349-9745
p-ISSN: 2393-8161
@IJMTER-2014, All rights Reserved 64
[21] Thomas G.Dietterich “An Experimental Comparison Of Three Methods for Constructing Ensembles of
Decision Trees: Bagging, Boosting, And Randomization” Department Of Computer Science, Oregon
State University, Corvallis, 1999, pg. 1-22.
[22] Suban Ravichandran, Vijay Bhanu Srinivasan and Chandrasekaran Ramasamy “Comparative Study on
Decision Tree Techniques for Mobile Call Detail Record” Journal of communication and computer 9
,December 2012, pg. 1331-1335.
[23] http://search.proquest.com/docview/30503150
[24] Bjorn Waske, Jon Atli Benediktsson “Sensitivity of Support Vector Machines to Random Feature
Selection in Classification of Hyperspectral Data” IEEE transaction on geoscience and remote Sensing,
Vol. 48, No. 7, July 2010, pg. 2880-2889.
[25] Juan J. Rodr ́ıguez and Ludmila I. Kuncheva “Naive Bayes Ensembles with a Random Oracle” Springer-
Verlag Berlin Heidelberg 2007, pg. 450-458.
[26] Martin Godec, Christian Leistner, Amir Saffari, Horst Bischof “On-line Random Naive Bayes for
Tracking” IEEE (ICPR), August 2010, pg. 3545-3548.
[27] http://www.dabi.temple.edu/~hbling/8590.002/Montillo_RandomForests_4-2-2009.pdf.
[28] http://cs229.stanford.edu/notes/cs229notes3.pdf