Distributed digital artifacts incorporate cryptographic hash values into URIs, called trusty URIs, in a distributed environment,
producing high-quality, verifiable, and immutable web resources that help prevent the increasingly common man-in-the-middle attack. The greatest
challenge of a centralized system is that it gives users no way to check whether data have been modified, and communication
is limited to a single server. The distributed digital artifact system addresses this by distributing resources among
different domains to enable inter-domain communication. With the rapid development of the web, attacks have increased sharply;
among them, the man-in-the-middle attack (MIMA) is a serious issue that threatens user security. This work aims to prevent MIMA
to an extent by providing self-referencing, trusty URIs even in a distributed environment. Any manipulation of the
data is efficiently identified, and further access to that data is blocked by informing the user that the uniform location has
changed. The system uses self-reference to embed a trusty URI in each resource, a lineage algorithm for seed generation, and the SHA-512 hash
algorithm to ensure security. It is implemented on the semantic web, an extension of the world wide web, using
RDF (Resource Description Framework) to identify resources. The framework was thus developed to overcome existing
challenges by distributing the digital artifacts on the semantic web, enabling secure communication between different domains across
the network and thereby preventing MIMA.
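The abstract does not spell out the exact URI scheme; as a minimal sketch of the hash-in-URI idea, assuming the SHA-512 digest is carried in the URI fragment (the function names and the fragment convention are illustrative, not the paper's):

```python
import hashlib

def make_trusty_uri(base_uri: str, content: bytes) -> str:
    # Embed the SHA-512 digest of the resource content in its URI, so any
    # later modification of the content invalidates the URI.
    digest = hashlib.sha512(content).hexdigest()
    return f"{base_uri}#{digest}"

def verify(trusty_uri: str, content: bytes) -> bool:
    # Recompute the digest and compare it with the one embedded in the URI;
    # a mismatch signals manipulation, and access should then be blocked.
    _, _, expected = trusty_uri.rpartition("#")
    return hashlib.sha512(content).hexdigest() == expected
```

Because the digest travels with the identifier itself, any party holding the trusty URI can verify the resource without trusting the server that delivered it.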
Study and Analysis of K-Means Clustering Algorithm Using RapidMiner (IJERA Editor)
An institution is a place where the teacher explains and the student understands and learns the lesson. Every student has his or her own sense of what is difficult or easy, and there is no absolute scale for measuring knowledge, but examination scores indicate a student's performance. In this case study, data mining is combined with educational strategies to improve students' performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data: it allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in a large relational database. Cluster analysis, or clustering, is the task of grouping a set of objects such that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). This project applies the clustering technique of data mining to improve academic performance in educational institutions. A live experiment was conducted on students: an exam was administered to computer science students using Moodle (an LMS), the generated data were analysed using RapidMiner (a data mining tool), and clustering was then performed on the data. This method helps identify the students who need special advising or counselling from the teacher, in order to deliver a high quality of education.
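As an illustration of the clustering step, here is a minimal one-dimensional K-means over exam scores (the study itself uses RapidMiner; the scores and parameters below are made up):

```python
import random

def kmeans_1d(scores, k=2, iters=20, seed=0):
    # Plain 1-D K-means: repeatedly assign each score to its nearest
    # centre, then move each centre to the mean of its cluster.
    random.seed(seed)
    centers = random.sample(scores, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for s in scores:
            nearest = min(range(k), key=lambda j: abs(s - centers[j]))
            clusters[nearest].append(s)
        # Empty clusters keep their previous centre.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters
```

Students falling in the low-score cluster would be the candidates for special advising or counselling.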
A Statistical Data Fusion Technique in Virtual Data Integration Environment (IJDKP)
Data fusion in a virtual data integration environment starts after detecting and clustering duplicated
records from the different integrated data sources. It refers to the process of selecting or fusing attribute
values from the clustered duplicates into a single record representing the real-world object. In this paper, a
statistical technique for data fusion is introduced, based on probabilistic scores derived from both the data
sources and the clustered duplicates.
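The paper's exact scoring is not reproduced in this abstract; a simplified sketch of score-based fusion, where each attribute's value is chosen by its frequency among the duplicates weighted by a hypothetical per-source trust score, might look like:

```python
from collections import Counter

def fuse(duplicates, source_weight):
    # duplicates: list of (record_dict, source_id) for one duplicate cluster.
    # Per attribute, pick the value with the highest combined score:
    # occurrences among duplicates, weighted by source trustworthiness.
    fused = {}
    attrs = {a for rec, _ in duplicates for a in rec}
    for a in attrs:
        scores = Counter()
        for rec, src in duplicates:
            if a in rec:
                scores[rec[a]] += source_weight.get(src, 1.0)
        fused[a] = scores.most_common(1)[0][0]
    return fused
```

The output is the single fused record that represents the real-world object behind the duplicate cluster.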
A Survey on Decision Tree Learning Algorithms for Knowledge Discovery (IJERA Editor)
Immense volumes of data are populated into repositories from various applications. Data mining techniques are very helpful for finding the desired information and knowledge in large datasets. Classification is one such knowledge discovery technique. Within classification, decision trees are very popular in the research community due to their simplicity and easy comprehensibility. This paper presents an updated review of recent developments in the field of decision trees.
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET (Editor IJMTER)
The data mining environment produces a large amount of data that needs to be analyzed.
With traditional databases and architectures, it has become difficult to process, manage, and analyze
patterns. To gain knowledge from Big Data, a proper architecture should be understood.
Classification is an important data mining technique with broad applications, used to classify the various
kinds of data found in nearly every field of our lives. Classification assigns an item to one of a
predefined set of classes according to the item's features. This paper sheds
light on various classification algorithms, including J48, C4.5, and Naive Bayes, using a large dataset.
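As a concrete instance of one of the surveyed algorithms, here is a minimal categorical Naive Bayes with add-one smoothing (the toy weather-style data in the usage below is illustrative, not from the paper, and the smoothing denominator is simplified to a fixed constant):

```python
import math
from collections import defaultdict

def train_nb(rows, labels):
    # Count class priors and per-class attribute-value frequencies.
    prior = defaultdict(int)
    cond = defaultdict(lambda: defaultdict(int))
    for row, y in zip(rows, labels):
        prior[y] += 1
        for i, v in enumerate(row):
            cond[y][(i, v)] += 1
    return prior, cond, len(labels)

def predict_nb(model, row):
    # Pick the class maximizing log P(y) + sum_i log P(x_i | y),
    # with add-one smoothing (fixed denominator for simplicity).
    prior, cond, n = model
    best, best_lp = None, float("-inf")
    for y, cnt in prior.items():
        lp = math.log(cnt / n)
        for i, v in enumerate(row):
            lp += math.log((cond[y][(i, v)] + 1) / (cnt + 2))
        if lp > best_lp:
            best, best_lp = y, lp
    return best
```

Since the smoothing denominator is the same across classes, it does not change which class wins, only the absolute scores.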
Recommendation System Using Bloom Filter in MapReduce (IJDKP)
Many customers like to use the Web to discover product details in the form of online reviews provided by
other customers and by specialists. Recommender systems are an important response to the
information overload problem, as they present users with more practical and personalized information.
Collaborative filtering (CF) methods are a vital component of recommender systems, as they generate high-quality
recommendations by leveraging the preferences of communities of similar users. Collaborative filtering
assumes that people with the same tastes choose the same items. The conventional collaborative
filtering system suffers from the sparse data problem and a lack of scalability, so a new recommender system is
required that deals with sparse data and produces high-quality recommendations in a large-scale
mobile environment. MapReduce is a programming model widely used for large-scale data
analysis. The recommendation mechanism described here for mobile commerce is user-based
collaborative filtering implemented in MapReduce, which reduces the scalability problem of conventional CF systems.
One of the essential operations in this data analysis is the join. MapReduce is not very
efficient at executing joins, however, because it always processes all records in the datasets even when only a small
fraction of them is relevant to the join. This problem can be reduced by applying the
BloomJoin algorithm: Bloom filters are constructed and used to filter out redundant intermediate
records. The proposed algorithm thus reduces the number of intermediate results and
improves join performance.
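The core data structure behind BloomJoin can be sketched as follows (parameters `m` and `k` are illustrative choices, not the paper's):

```python
import hashlib

class BloomFilter:
    # Minimal Bloom filter: a bit array probed by k hash functions.
    # Membership tests may yield false positives but never false
    # negatives, which is what makes it safe for pre-filtering join keys.
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, key):
        # Derive k positions by hashing the key with k different salts.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))
```

In a BloomJoin, the filter is built over the join keys of the smaller dataset and shipped to the mappers; records of the larger dataset whose key fails `might_contain` are dropped before the shuffle, shrinking the intermediate results.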
Efficient Classification of Big Data Using VFDT (Very Fast Decision Tree) (eSAT Journals)
Abstract
Decision tree learning algorithms have been able to capture knowledge successfully. Decision trees are best suited when
instances are described by attribute-value pairs and the target function takes discrete values. The main task of these
decision trees is to apply inductive methods to the given attribute values of an unknown object and determine an
appropriate classification by applying the tree's decision rules. Decision trees are very effective ways to evaluate the performance
of algorithms and to represent them, because of their robustness, simplicity, capability of handling numerical and categorical data,
ability to work with large datasets, and comprehensibility, to name a few. There are various decision tree algorithms available,
such as ID3, CART, C4.5, VFDT, QUEST, CTREE, GUIDE, CHAID, and CRUISE. This paper offers a comparative study of three of
these popular decision tree algorithms: ID3 (Iterative Dichotomiser 3), C4.5 (an evolution of ID3), and VFDT (Very
Fast Decision Tree). An empirical study has been conducted to compare C4.5 and VFDT in terms of accuracy
and execution time, and various conclusions have been drawn.
Key Words: Decision tree, ID3, C4.5, VFDT, Information Gain, Gain Ratio, Gini Index, Overfitting.
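The information-gain split criterion used by ID3 (which C4.5 refines into the gain ratio by dividing by the entropy of the split itself) can be sketched as follows, on made-up toy data:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a label distribution, in bits.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    # Reduction in label entropy obtained by splitting on attribute `attr`.
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(y)
    n = len(labels)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder
```

ID3 greedily splits on the attribute with the highest gain; VFDT instead uses a Hoeffding bound to decide, from a data stream, when enough examples have been seen to commit to the same choice.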
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY (Editor IJMTER)
The data mining environment produces a large amount of data that needs to be
analyzed, and patterns have to be extracted from it to gain knowledge. In this new era, with
a boom of both ordered and unordered data, it has become difficult to process, manage, and
analyze patterns using traditional databases and architectures. To gain knowledge from
Big Data, a proper architecture should be understood. Classification is an important data mining
technique with broad applications, used to classify the various kinds of data found in nearly every
field of our lives. Classification assigns an item to one of a predefined set of classes
according to the item's features. This paper provides an inclusive survey of
different classification algorithms, shedding light on J48, C4.5, the k-nearest neighbour
classifier, Naive Bayes, SVM, and others, using the random concept.
Data mining is a process that extracts information from a huge amount of data and transforms it into an
understandable structure. Data mining provides a number of tasks for extracting data from large databases, such
as classification, clustering, regression, and association rule mining. This paper focuses on
classification, an important data mining technique based on machine learning which is used to
classify each item according to its features with respect to a predefined set of classes or groups.
The paper summarises various techniques implemented for classification, such as k-NN, Decision
Tree, Naïve Bayes, SVM, ANN, and RF. The techniques are analyzed and compared on the basis of their
advantages and disadvantages.
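Of the techniques compared, k-NN is the simplest to state; a minimal sketch (the toy points in the usage below are illustrative):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # Classify a query point by majority vote among the k labelled
    # training points closest to it (Euclidean distance).
    dists = sorted((math.dist(x, query), y) for x, y in train)
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```

Its main trade-off, as such surveys typically note, is that training is free but every prediction scans the whole training set.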
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars, and students of related fields of Engineering and Technology.
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da... (theijes)
Data mining works to extract previously unknown information from enormous quantities of data, which can lead to knowledge. It provides information that helps in making good decisions. The effectiveness of data mining lies in its access to knowledge: its goal is the discovery of the hidden facts contained in databases, through the use of multiple technologies. Clustering is the organizing of data into clusters or groups such that they have high intra-cluster similarity and low inter-cluster similarity. This paper deals with the K-means clustering algorithm, which groups data items based on their characteristics and attributes, and performs the clustering by reducing the distances between the data and the cluster centres. The algorithm is applied using the open source tool WEKA, with an insurance dataset as its input.
Abstract: In this paper, the concept of data mining is summarized and the significance of its methodologies is illustrated. Data mining based on Neural Networks and Genetic Algorithms is researched in detail, and the key technologies and ways to achieve data mining with Neural Networks and Genetic Algorithms are also surveyed. The paper also conducts a formal review of the area of rule extraction from ANNs and GAs. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
J48 and JRIP Rules for E-Governance Data (CSCJournals)
Data are any facts, numbers, or text that can be processed by a computer. Data mining is an analytic process designed to explore data, usually large amounts of data, and is often considered to be a blend of statistics. In this paper we use two data mining techniques, J48 and JRIP, for discovering classification rules and generating a decision tree. The data mining tool WEKA is used in this paper.
With the development of databases, the volume of data stored in them increases rapidly, and much
important information is hidden in these large amounts of data. If that information can be extracted from the
database, it will create a lot of profit for the organization. The question organizations are asking is how to extract
this value; the answer is data mining. There are many technologies available to data mining practitioners,
including artificial neural networks, genetic algorithms, fuzzy logic, and decision trees. Many practitioners are
wary of neural networks due to their black-box nature, even though they have proven themselves in many
situations. This paper gives an overview of artificial neural networks and questions their position as a
preferred tool of data mining practitioners.
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER... (cscpconf)
In the present study, the abilities of three data mining classification methods, namely artificial
neural networks with the feed-forward back-propagation algorithm, the J48 decision tree method, and
logistic regression analysis, are compared on a real medical dataset. The objective of the study is the prediction of
malignancy in suspected thyroid tumour patients. The accuracy of
the predictions (the minimum error rate), the time consumed in the
modelling process, and the interpretability and simplicity of the results for clinical experts are
the factors considered in choosing the best method.
Hypothesis on Different Data Mining Algorithms (IJERA Editor)
In this paper, different classification algorithms for data mining are discussed. Data mining is about
explaining the past and predicting the future by means of data analysis. Classification is a data mining task
which categorises data based on numerical or categorical variables. Many algorithms have been
proposed for classifying data; of these, five are comparatively studied here. There are
four different classification approaches, namely frequency table, covariance matrix, similarity functions, and
others. As part of this research on classification methods, algorithms such as Naive Bayes, k-nearest neighbours,
decision tree, artificial neural network, and support vector machine are studied and examined using benchmark
datasets such as Iris and Lung Cancer.
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ... (Waqas Tariq)
Machine learning aims to generate classifying expressions simple enough to be understood easily by humans. There are many machine learning approaches available for classification, among which decision tree learning is one of the most popular. In this paper we propose a systematic decision tree based approach for automatically determining a patient's post-operative recovery status. Decision tree structures are constructed using data mining methods and then used to classify discharge decisions.
Introduction to Feature Subset Selection Method (IJSRD)
Data mining is a computational process for discovering patterns in large data sets. Among its important techniques is classification, which has recently received great attention in the database community. Classification can solve problems in different fields, such as medicine, industry, business, and science. PSO (particle swarm optimization) is an optimization technique based on social behaviour. Feature selection (FS) involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Rough set theory (RST) is a mathematical tool for dealing with the uncertainty and vagueness of decision systems.
An Analysis of Outlier Detection through Clustering Method (IJAEMSJORNAL)
This research paper deals with outliers: unusual behaviour of any item present in a data set. Outlier detection can be employed both for anomaly detection and for identifying abnormal observations, by comparison with the other members of the data set. The deviation of an outlier can be measured in terms such as range, size, and activity. By detecting outliers, one can easily reject the anomalies present in the field. For instance, in healthcare, a person's health condition can be assessed through the latest health report or regular activity; when the person is found to be inactive, there may be a chance that the person is sick. Two approaches are used in this research for detecting outliers: 1) a centroid-based approach built on the K-means and hierarchical clustering algorithms, and 2) a clustering-based approach. These approaches help detect outliers by grouping all similar elements in the same group, with clustering providing the means for that grouping. This research paper is based on the two approaches mentioned above.
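A sketch of the centroid-based idea, assuming a single cluster and a made-up threshold rule (flag points farther from the centroid than twice the mean distance; the paper's actual criterion may differ):

```python
import math

def centroid_outliers(points, threshold=2.0):
    # Compute the centroid of all points, then flag as outliers those
    # whose distance from it exceeds `threshold` times the mean distance.
    n = len(points)
    dim = len(points[0])
    centroid = tuple(sum(p[i] for p in points) / n for i in range(dim))
    dists = [math.dist(p, centroid) for p in points]
    mean_d = sum(dists) / n
    return [p for p, d in zip(points, dists) if d > threshold * mean_d]
```

In the healthcare example, a day of unusually low activity would sit far from the centroid of a person's normal activity readings and be flagged this way.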
Incremental Learning from Unbalanced Data with Concept Class, Concept Drift a... (IJDKP)
Recently, stream data mining applications have drawn vital attention from several research communities.
Stream data is a continuous form of data distinguished by its online nature. Traditionally, the machine
learning field has developed learning algorithms under certain assumptions about the underlying
distribution of the data, such as the data following a predetermined distribution. Such constraints on the problem
domain have led to the development of smart learning algorithms whose performance is theoretically verifiable.
Real-world situations differ from this restricted model: applications usually suffer from problems
such as unbalanced data distributions. Additionally, data drawn from non-stationary environments are also
common in real-world applications, resulting in the "concept drift" associated with data stream
examples. These issues have been addressed separately by researchers, and it is observed that the joint
problem of class imbalance and concept drift has received relatively little research. If the final objective of
intelligent machine learning techniques is to address a broad spectrum of real-world applications,
then the need for a universal framework for learning from, and adapting to, environments
where concept drift may occur and unbalanced data distributions are present can hardly be exaggerated.
In this paper, we first present an overview of the issues observed in stream data mining scenarios,
followed by a complete review of recent research dealing with each of these issues.
Evaluating the Efficiency of Rule Techniques for File Classification (eSAT Journals)
Abstract: Text mining refers to the process of deriving high-quality information from text. Also known as knowledge discovery from text (KDT), it deals with the machine-supported analysis of text and is used in various areas such as information retrieval, marketing, information extraction, natural language processing, and document similarity. Document similarity is one of the important techniques in text mining, and its first and foremost step is to classify files based on their category. In this research work, various classification rule techniques are used to classify computer files based on their extensions (for example, pdf, doc, ppt, xls, and so on). There are several rule classifier algorithms, such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR, and ZeroR. In this work, three of them, namely decision table, DTNB, and OneR, are used to classify computer files based on their extension. The results produced by these algorithms are analyzed using the performance factors classification accuracy and error rate. From the experimental results, DTNB proves to be more efficient than the other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
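The two performance factors used here reduce to a one-liner each; a sketch over a hypothetical file-classification run:

```python
def accuracy(y_true, y_pred):
    # Fraction of files whose predicted class matches the true class.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def error_rate(y_true, y_pred):
    # Complement of accuracy: fraction of misclassified files.
    return 1.0 - accuracy(y_true, y_pred)
```

Reporting both is redundant in principle, but presenting the error rate alongside accuracy makes small differences between classifiers easier to see.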
Comprehensive Survey of Data Classification & Prediction Techniques (ijsrd.com)
In this paper, we present a literature survey of modern data classification and prediction algorithms. These algorithms are very important in real-world applications such as heart disease prediction and cancer prediction. Classification of data is a very popular and computationally expensive task. The fundamentals of data classification are also discussed in brief.
Enhanced Quality of Service Based Routing Protocol Using Hybrid Ant Colony Op... (Editor IJCATR)
The main problem of QoS routing is to set up a multicast hierarchy that meets particular QoS constraints. To reduce the limitations of earlier work, a new improved technique is proposed here. In the proposed technique, the multicast tree issue is eliminated using a clustering-based approach: multi-radio, multi-channel clustering is deployed, and the cluster heads are responsible
for multicasting. This diminishes the overall energy consumption of the nodes and the complexity of the intelligent algorithms. Paths are evaluated using ant colony optimization. The technique has thus produced better results than other techniques.
Using a Mobile Based Web Service to Search for Missing People – A Case Study ... (Editor IJCATR)
Being out of touch with a loved one is concerning, and not hearing from someone you care about is terrifying. Several cases
of missing people have been reported over many years, and most of the searches turn out unsuccessful. In order to quickly reunite
families and friends with their missing loved ones, a solution for effectively searching for missing people is presented. In
evaluating this solution, an F1 score test was simulated using 20 scenarios, attaining an impressive score of 0.72.
The study concludes that mobile-based technology should be leveraged to devise a more efficient method of finding missing
persons more easily and quickly.
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
Data mining environment produces a large amount of data, that need to be
analyses, pattern have to be extracted from that to gain knowledge. In this new period with
rumble of data both ordered and unordered, by using traditional databases and architectures, it
has become difficult to process, manage and analyses patterns. To gain knowledge about the
Big Data a proper architecture should be understood. Classification is an important data mining
technique with broad applications to classify the various kinds of data used in nearly every
field of our life. Classification is used to classify the item according to the features of the item
with respect to the predefined set of classes. This paper provides an inclusive survey of
different classification algorithms and put a light on various classification algorithms including
j48, C4.5, k-nearest neighbor classifier, Naive Bayes, SVM etc., using random concept.
Data mining is a process to extract information from a huge amount of data and transform it into an
understandable structure. Data mining provides the number of tasks to extract data from large databases such
as Classification, Clustering, Regression, Association rule mining. This paper provides the concept of
Classification. Classification is an important data mining technique based on machine learning which is used to
classify the each item on the bases of features of the item with respect to the predefined set of classes or groups.
This paper summarises various techniques that are implemented for the classification such as k-NN, Decision
Tree, Naïve Bayes, SVM, ANN and RF. The techniques are analyzed and compared on the basis of their
advantages and disadvantages
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...theijes
Data mining works to extract information known in advance from the enormous quantities of data which can lead to knowledge. It provides information that helps to make good decisions. The effectiveness of data mining in access to knowledge to achieve the goal of which is the discovery of the hidden facts contained in databases and through the use of multiple technologies. Clustering is organizing data into clusters or groups such that they have high intra-cluster similarity and low inter cluster similarity. This paper deals with K-means clustering algorithm which collect a number of data based on the characteristics and attributes of this data, and process the Clustering by reducing the distances between the data center. This algorithm is applied using open source tool called WEKA, with the Insurance dataset as its input
Abstract In this paper, the concept of data mining is summarized and the significance of its methodologies is illustrated. Data mining based on Neural Networks and Genetic Algorithms is researched in detail, and the key technologies and ways to achieve data mining with Neural Networks and Genetic Algorithms are surveyed. The paper also conducts a formal review of the area of rule extraction from ANNs and GAs. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
J48 and JRIP Rules for E-Governance DataCSCJournals
Data are any facts, numbers, or text that can be processed by a computer. Data Mining is an analytic process designed to explore data, usually large amounts of data, and is often considered to be "a blend of statistics". In this paper we use two data mining techniques, J48 and JRIP, for discovering classification rules and generating a decision tree. The data mining tool WEKA is used in this paper.
With the development of databases, the volume of data stored in them increases rapidly, and much important information is hidden in these large amounts of data. If that information can be extracted from the database, it will create a lot of profit for the organization. The question organizations are asking is how to extract this value; the answer is data mining. There are many technologies available to data mining practitioners, including Artificial Neural Networks, Genetic Algorithms, Fuzzy Logic and Decision Trees. Many practitioners are wary of Neural Networks due to their black-box nature, even though they have proven themselves in many situations. This paper is an overview of artificial neural networks and questions their position as a preferred tool of data mining practitioners.
PREDICTION OF MALIGNANCY IN SUSPECTED THYROID TUMOUR PATIENTS BY THREE DIFFER...cscpconf
In the present study, the abilities of three data mining classification methods, namely artificial neural networks with the feed-forward back-propagation algorithm, the J48 decision tree method and logistic regression analysis, are compared on a real medical dataset. The prediction of malignancy in suspected thyroid tumour patients is the objective of the study. The accuracy of the predictions (the minimum error rate), the time consumed in the modelling process, and the interpretability and simplicity of the results for clinical experts are the factors considered in choosing the best method.
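The study's primary comparison factor, the minimum error rate, can be sketched as follows; the predictions below are invented for illustration and are not the study's data:

```python
def error_rate(y_true, y_pred):
    """Fraction of misclassified cases: the basic metric for comparing
    classifiers on the same test set."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

# Hypothetical predictions from two models on the same ten test patients
# (1 = malignant, 0 = benign); the numbers are illustrative only.
truth = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
ann   = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]   # 1 error  -> rate 0.1
j48   = [1, 0, 0, 0, 0, 0, 1, 1, 1, 0]   # 3 errors -> rate 0.3
print(error_rate(truth, ann), error_rate(truth, j48))
```

Interpretability and modelling time, the study's other criteria, have no such one-line metric and are judged qualitatively.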
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
In this paper, different classification algorithms for data mining are discussed. Data mining is about explaining the past and predicting the future by means of data analysis. Classification is a data mining task which categorizes data based on numerical or categorical variables. Many algorithms have been proposed to classify data; of these, five are comparatively studied here. There are four different classification approaches, namely Frequency Table, Covariance Matrix, Similarity Functions and Others. As part of this research on classification methods, the Naive Bayesian, K Nearest Neighbors, Decision Tree, Artificial Neural Network and Support Vector Machine algorithms are studied and examined using benchmark datasets such as Iris and Lung Cancer.
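Of the five algorithms compared, Naive Bayes is the most compact to sketch. The following categorical version with add-one smoothing uses an invented toy data set, not the Iris or Lung Cancer benchmarks:

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Categorical Naive Bayes: count class frequencies and, per
    (feature index, class), the frequencies of each feature value."""
    class_counts = Counter(labels)
    feat_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            feat_counts[(i, y)][v] += 1
    return class_counts, feat_counts

def predict_nb(model, row):
    class_counts, feat_counts = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for y, cy in class_counts.items():
        score = math.log(cy / total)  # log prior
        for i, v in enumerate(row):
            counts = feat_counts[(i, y)]
            # add-one smoothing so unseen values never zero out a class
            score += math.log((counts[v] + 1) / (cy + len(counts) + 1))
        if score > best_score:
            best, best_score = y, score
    return best

# Invented toy data: weather features -> play decision.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("sunny", "hot")))  # "no"
```

Working in log space avoids underflow when many features are multiplied together.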
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...Waqas Tariq
Machine learning aims to generate classifying expressions simple enough to be understood easily by humans. There are many machine learning approaches available for classification; among them, decision tree learning is one of the most popular classification algorithms. In this paper we propose a systematic approach based on decision trees to automatically determine a patient's post-operative recovery status. Decision tree structures are constructed using data mining methods and then used to classify discharge decisions.
Introduction to feature subset selection methodIJSRD
Data mining is a computational process for discovering patterns in large data sets. It has various important techniques, one of which is classification, which has recently received great attention in the database community. Classification can solve problems in fields such as medicine, industry, business and science. PSO is an optimization method based on social behaviour. Feature Selection (FS) involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Rough Set Theory (RST) is a mathematical tool which deals with the uncertainty and vagueness of decision systems.
An Analysis of Outlier Detection through clustering methodIJAEMSJORNAL
This research paper deals with outliers: observations that deviate markedly from the other members of a data set. Outlier detection can be employed for both anomaly detection and the identification of abnormal observations, and the deviation of an outlier can be measured in terms such as range, size and activity. By detecting outliers, one can reject anomalous behaviour in the field. For instance, in healthcare, a person's health condition can be assessed through their latest health report or regular activity; if a person is found to be inactive, there is a chance that the person is sick. Two approaches are used in this paper for detecting outliers: 1) a centroid-based approach using the K-means and hierarchical clustering algorithms, and 2) a clustering-based approach. These approaches help detect outliers by grouping similar elements into the same group, with the clustering method providing the grouping.
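The centroid-based idea can be sketched as a simple rule: flag points whose distance from the centroid is far above average. The threshold and toy data below are illustrative assumptions, not the paper's parameters:

```python
import math

def centroid_outliers(points, threshold=2.0):
    """Flag points whose distance from the centroid exceeds
    `threshold` times the mean distance to the centroid."""
    n = len(points)
    centroid = tuple(sum(c) / n for c in zip(*points))
    dists = [math.dist(p, centroid) for p in points]
    mean_d = sum(dists) / n
    return [p for p, d in zip(points, dists) if d > threshold * mean_d]

# Four points clustered near (1.5, 1.5) and one far away.
pts = [(1, 1), (2, 1), (1, 2), (2, 2), (30, 30)]
print(centroid_outliers(pts))  # [(30, 30)]
```

A cluster-per-group variant would first run k-means and apply the same rule within each cluster.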
Incremental learning from unbalanced data with concept class, concept drift a...IJDKP
Recently, stream data mining applications have drawn vital attention from several research communities. Stream data is a continuous form of data distinguished by its online nature. Traditionally, the machine learning field has developed learning algorithms that make certain assumptions about the underlying distribution of the data, such as that the data follow a predetermined distribution. Such constraints on the problem domain led to the development of learning algorithms whose performance is theoretically verifiable. Real-world situations differ from this restricted model: applications usually suffer from problems such as unbalanced data distributions, and data drawn from non-stationary environments are also common, resulting in the "concept drift" associated with data stream examples. These issues have been addressed separately by researchers, and the joint problem of class imbalance and concept drift has received relatively little research. If the final objective of intelligent machine learning techniques is to address a broad spectrum of real-world applications, then the necessity of a universal framework for learning from, and adapting to, environments where concept drift and unbalanced data distributions occur can hardly be exaggerated. In this paper, we first present an overview of the issues observed in stream data mining scenarios, followed by a complete review of recent research dealing with each issue.
Evaluating the efficiency of rule techniques for file classificationeSAT Journals
Abstract Text mining refers to the process of deriving high-quality information from text. Also known as knowledge discovery from text (KDT), it deals with the machine-supported analysis of text and is used in areas such as information retrieval, marketing, information extraction, natural language processing and document similarity. In document similarity, the first and foremost step is to classify files based on their category. In this research work, various classification rule techniques are used to classify computer files based on their extensions (for example, pdf, doc, ppt, xls, and so on). There are several rule classifier algorithms, such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR; here, three of them, namely decision table, DTNB and OneR, are used to classify computer files by extension. The results produced by these algorithms are analyzed using classification accuracy and error rate as performance factors. The experimental results show DTNB to be more efficient than the other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
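Of the three classifiers used, OneR is simple enough to sketch directly: restricted to one attribute, it maps each attribute value (here, the file extension) to that value's majority class. The toy file list is invented for illustration:

```python
from collections import Counter, defaultdict

def one_r(values, labels):
    """OneR on a single attribute: predict, for each attribute value,
    the majority class seen with that value in training."""
    by_value = defaultdict(Counter)
    for v, y in zip(values, labels):
        by_value[v][y] += 1
    return {v: c.most_common(1)[0][0] for v, c in by_value.items()}

# Invented toy training data: extension -> file category.
exts = ["pdf", "doc", "ppt", "xls", "pdf", "doc"]
cats = ["document", "document", "presentation", "spreadsheet",
        "document", "document"]
rule = one_r(exts, cats)
print(rule["ppt"])  # "presentation"
```

Full OneR would build one such table per attribute and keep the attribute with the lowest training error.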
Comprehensive Survey of Data Classification & Prediction Techniquesijsrd.com
In this paper, we present a literature survey of modern data classification and prediction algorithms. These algorithms are very important in real-world applications such as heart disease and cancer prediction. Classification of data is a very popular and computationally expensive task. The fundamentals of data classification are also discussed in brief.
Enhanced Quality of Service Based Routing Protocol Using Hybrid Ant Colony Op...Editor IJCATR
The main problem of QoS routing is to set up a multicast hierarchy that meets particular QoS constraints. To reduce the limitations of earlier work, a new improved technique is proposed. In the proposed technique, the multicast-tree issue is eliminated using a clustering-based approach: multi-radio, multi-channel clustering is deployed first, and the cluster heads are responsible for the multicasting. This diminishes the overall energy consumption of nodes and the complexity of intelligent algorithms. Paths are evaluated using ant colony optimization. The technique thus produces better results than other techniques.
Using a Mobile Based Web Service to Search for Missing People – A Case Study ...Editor IJCATR
Being out of touch with a loved one is concerning, and not hearing from someone you care about is terrifying. Several cases of missing people have been reported over many years, and most searches turn out unsuccessful. In order to quickly reunite families and friends with their missing loved ones, a solution for effectively searching for missing people is presented. In evaluating this solution, an F1 score test was simulated using 20 scenarios, from which an impressive score of 0.72 was attained. The study concludes that we need to leverage mobile-based technology to devise a more efficient method of finding missing persons more easily and quickly.
LEACH is a hierarchical protocol in which most nodes transmit to cluster heads, and the cluster heads aggregate and compress the data and forward it to the base station (sink). In LEACH, a TDMA-based MAC protocol is integrated with clustering and a simple "routing" protocol. The goal of LEACH is to lower the energy consumption required to create and maintain clusters, or to use the energy of the nodes in such a manner as to improve the lifetime of a wireless sensor network. In this paper we present an overview of the different protocol changes made in LEACH to improve network lifetime, throughput, coverage area, etc.
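The cluster-head rotation at the heart of LEACH uses the well-known election threshold T(n) = p / (1 - p * (r mod 1/p)). A small sketch of that rule (the node count and random seed are arbitrary):

```python
import random

def leach_threshold(p, r):
    """LEACH election threshold for round r, applied to nodes that have
    not yet served as cluster head in the current epoch:
        T(n) = p / (1 - p * (r mod 1/p))"""
    return p / (1 - p * (r % (1 / p)))

def elect_heads(node_ids, p, r, rng=None):
    """Each eligible node draws a uniform random number and becomes a
    cluster head for this round if the draw falls below T(n)."""
    rng = rng or random.Random(0)
    t = leach_threshold(p, r)
    return [n for n in node_ids if rng.random() < t]

# With p = 0.1, roughly 10% of 100 nodes self-elect in round 0; by
# round 9 the threshold rises to 1.0, so every remaining node serves.
heads = elect_heads(range(100), p=0.1, r=0)
```

Raising the threshold over the epoch is what guarantees each node its turn as head, spreading the energy cost evenly.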
Photo-Oxygenated Derivatives from EugenolEditor IJCATR
A photo-oxygenation reaction was performed on eugenol (1) (2-methoxy-4-(2'-propenyl)phenol) in chloroform as solvent with tetraphenylporphyrin (TPP) as singlet oxygen sensitizer. The reaction mixture was irradiated by a sodium lamp at -20 °C for six hours, during which dry oxygen was passed through the mixture. Two hydroperoxides, (3) and (4), were formed. Eugenol methyl ether (2) was also photo-oxygenated under the same conditions, where only the side chain was photo-oxygenated at position C-1' to give product (5). In addition, the epoxide derivative of eugenol methyl ether (8) was prepared, and its reaction with aminoantipyrine gave product (9). Product (8) could be considered a DNA-alkylating agent.
Software Architecture Evaluation of Unmanned Aerial Vehicles Fuzzy Based Cont...Editor IJCATR
In this survey paper we discuss the recent techniques for software architecture evaluation methods for Unmanned Aerial Vehicle (UAV) systems that use fuzzy control methodology. We discuss the current methodologies and evaluation approaches, identify their limitations, and discuss the open research issues. These issues include methods used to evaluate the level of risk, communications latency, availability, sensor performance, automation, and human interaction.
Understanding Working Memory for Improving LearningEditor IJCATR
A web-based working memory (WM) test system is a management website that allows students to test their ability and skill in remembering visual patterns, and records and stores the data for individuals. The system is developed using HTML, PHP and MySQL as the database system to manage and store the data. The system targets several user groups: children and adults who suffer from attention deficit or learning problems. The main objectives of the website are to educate the community on the benefits of testing working memory, a measure of brain activity linked to social skills and to academic and professional performance, especially in maths and reading comprehension. This study implements a set of tasks, testing 59 adults aged 18-24 years at King Abdul-Aziz University to measure WM and cognitive abilities. Results showed that test outcomes depended on the age entered by the user. The test results will help people know their WM level before appropriate suggestions are made, and help make the test suit our society.
Evaluation of Iris Recognition System on Multiple Feature Extraction Algorith...Editor IJCATR
A multi-algorithmic approach to enhancing the accuracy of an iris recognition system is proposed and investigated. In this system, features are extracted from the iris using various feature extraction algorithms, namely LPQ, LBP, Gabor filter, Haar, Db8 and Db16. The experimental results demonstrate that the multi-algorithm iris recognition system performs better than the unimodal system. The accuracy improvement offered by the proposed approach also showed that using more than two feature extraction algorithms might decrease system performance, due to redundant features. The paper presents a detailed description of the experiments and provides an analysis of the performance of the proposed method.
Corrosion Behaviour of 6061 Al-SiC Composites in KOH MediumEditor IJCATR
The present research work deals with the corrosion behaviour of 6061 Al-15 vol% SiC(P) composites. The addition of a reinforcement such as SiC to aluminium has been reported to decrease the corrosion resistance of the matrix for several reasons, one of them being galvanic action between the reinforcement and the matrix. In the present work, the corrosion behaviour of 6061 Al-15 vol% SiC(P) composites in KOH at different concentrations (0.5 M, 1 M, 1.5 M) and different temperatures (30 °C, 35 °C, 40 °C, 45 °C, 50 °C) was determined by the Tafel extrapolation technique. The inhibition action of 8-hydroxyquinoline on the corrosion behaviour of the composites in KOH was investigated at different inhibitor concentrations (200 ppm, 400 ppm), medium concentrations (0.5 M, 1 M, 1.5 M) and temperatures (30 °C, 35 °C, 40 °C, 45 °C, 50 °C). The results indicate that the corrosion rate of the Al-SiC composite in KOH increases as the concentration and the temperature of the medium increase, and that the inhibitor is moderately effective in inhibiting the corrosion of the composites: as the inhibitor concentration increases, the corrosion rate decreases. The surface morphology of the metal was investigated using a scanning electron microscope (SEM). Activation energy was evaluated using the Arrhenius equation, and enthalpy and entropy of activation were calculated using the transition state equation.
A Survey on Decision Support Systems in Social MediaEditor IJCATR
Web 3.0 is the upcoming phase in web evolution. Web 3.0 will be about "feeding you the information that you want, when you want it", i.e. personalization of the web. Its basic principle is linking, integrating and analyzing data from various data sources into new information streams by means of semantic technology; Web 3.0 thus comprises two platforms, semantic technologies and the social computing environment. A recommender system is a subclass of decision support system, and recommendations in the social web are used to personalize the web [20]. A social tagging system is one type of social media. In this paper we present a survey of recommendations in Social Tagging Systems (STSs), covering tag, item, user and unified recommendations along with the semantic web, and discuss major thrust areas of research in each category.
The development of information technology should be directed toward improving services, including job fair information system services. This work aimed to develop a web-based job fair information system. The methods used consist of a data collection method and a software development method. Data collection used observation, interviews, and a literature study. Software development used the waterfall model, comprising the steps of requirements, specification and design, implementation, testing, deployment, and maintenance. The result of this work is a web-based information system providing job information, registration, and test schedule information.
A Review on Basic Concepts and Important Standards of Power Quality in Power ...Editor IJCATR
This paper deals with the basics of power quality in power systems. Basic definitions and important concepts are discussed in a simple way, and the important power quality standards are covered: the IEEE, IEC, SEMI and UIE power quality standards are listed. This paper would be helpful for UG and PG students studying the basics of power quality in electrical engineering.
Holistic Approach for Arabic Word RecognitionEditor IJCATR
Optical Character Recognition (OCR) is an important research area, and segmenting words into characters is one of its most challenging steps. As a result of advances in machine speeds and memory sizes, as well as the availability of large training datasets, researchers currently study the holistic approach: recognition of a word without segmentation. This paper describes a method for recognizing off-line handwritten Arabic names. The classification approach is based on Hidden Markov Models; for each Arabic word, several HMM models with different numbers of states were trained. The experimental results are encouraging; they also show that the best number of states for each word needs careful selection and consideration.
Android and iOS Hybrid Applications for Surabaya Public Transportation RouteEditor IJCATR
This study addresses the lack of route information for public transportation in Surabaya by creating an online guide that passengers can access for complete information on maps and travel routes. The guide is interactive, simple, accessible and adapted to the transport conditions in the city of Surabaya. This research develops Android and iOS applications that can be used on smartphones and tablets running those operating systems. Maps and routes are obtained from the Department of Transportation of Surabaya, and a survey was done by distributing questionnaires to determine passengers' needs for public transportation. Maps and routes are developed using OpenStreetMap, Ajax, Javascript, XML, OpenLayer, PostgreSQL, and PostGIS; the hybrid application is compiled using PhoneGap. Passengers simply point to the destination of their journey, such as a street name, landmark or public place. The system automatically chooses the alternative line (lyn) of public minibus (called bemo in Indonesian) they should take, including the routes to reach the destination, the connecting line if the route requires more than one bemo line, and the fare to be paid. From the test results, both the Android and iOS applications can adapt to a wide range of devices with a variety of screen sizes, from 3.5-inch to 5-inch smartphones and 7-inch tablets.
Mobile Personalized Notes Using Memory PackageEditor IJCATR
Smartphones and their mobile applications have deeply affected life in modern society. Many kinds of mobile applications, in both Google Play and the Apple Store, are designed from novel motivations to help improve daily life. This paper aims to develop a personalized note tool on smartphones that integrates different types of multimedia information to help users record activities. We propose a structure called the memory package for integrating multimedia information on smartphones: users create a memory package over a time period to record their activities using multimedia contents such as text, image, video, audio, and GPS records. Our system provides a unified interface for the memory package that helps users easily generate and review their packages on their phones, so that they can preserve clips of their daily activities anytime and anywhere using the proposed system of personalized notes and the memory package.
Feature Extraction Techniques and Classification Algorithms for EEG Signals t...Editor IJCATR
The EEG (electroencephalogram) signal is a neural signal generated by the different electrical activities in the brain. Different types of electrical activity correspond to different states of the brain, and every physical activity of a person arises from brain activity that in turn generates an electrical signal. These signals can be captured and processed to obtain useful information that can be used in the early detection of some mental diseases. This paper focuses on the usefulness of the EEG signal in detecting human stress levels. It also compares various preprocessing algorithms (DCT and DWT) and various classification algorithms (LDA, Naive Bayes and ANN). The paper proposes a system which processes the EEG signal and, by applying a combination of classifiers, detects human stress levels.
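The DCT preprocessing mentioned above can be sketched with a plain DCT-II: for smooth signals most of the energy lands in the first few coefficients, which can then serve as compact features for a classifier. The toy signal below is illustrative, not EEG data:

```python
import math

def dct2(signal):
    """Plain (unnormalized) DCT-II of a real-valued sequence."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k)
            for i, x in enumerate(signal))
        for k in range(n)
    ]

# A single slow cosine: all its energy lands in coefficient k = 1.
sig = [math.cos(math.pi / 8 * (i + 0.5)) for i in range(8)]
coeffs = dct2(sig)
```

In a stress-detection pipeline, a window of EEG samples would be transformed this way and only the leading coefficients passed on to the LDA, Naive Bayes or ANN classifier.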
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
An educational institution needs to have an approximate prior knowledge of enrolled students to predict
their performance in future academics. This helps them to identify promising students and also provides
them an opportunity to pay attention to and improve those who would probably get lower grades. As a
solution, we have developed a system which can predict the performance of students from their previous
performances using concepts of data mining techniques under Classification. We have analyzed the data
set containing information about students, such as gender, marks scored in the board examinations of
classes X and XII, marks and rank in entrance examinations and results in first year of the previous batch
of students. By applying the ID3 (Iterative Dichotomiser 3) and C4.5 classification algorithms on this data,
we have predicted the general and individual performance of freshly admitted students in future
examinations.
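ID3's core computation is the information gain used to pick the splitting attribute at each node. A small sketch with invented student records (not the system's actual data set):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """ID3's splitting criterion: entropy reduction from splitting on
    `attr`. rows: list of dicts, e.g. {"rank": "high", "gender": "F"}."""
    n = len(labels)
    split = Counter(r[attr] for r in rows)
    remainder = sum(
        (cnt / n) * entropy([y for r, y in zip(rows, labels) if r[attr] == v])
        for v, cnt in split.items()
    )
    return entropy(labels) - remainder

# Hypothetical records: entrance-exam rank predicts the result perfectly,
# while gender carries no information.
rows = [{"rank": "high", "gender": "F"}, {"rank": "high", "gender": "M"},
        {"rank": "low", "gender": "F"}, {"rank": "low", "gender": "M"}]
labels = ["pass", "pass", "fail", "fail"]
print(info_gain(rows, labels, "rank"), info_gain(rows, labels, "gender"))
```

ID3 splits on the highest-gain attribute and recurses; C4.5 refines this with gain ratio and handling of continuous attributes.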
Scalable decision tree based on fuzzy partitioning and an incremental approachIJECEIAES
Classification, as a data mining technique, is the process of assigning entities to an already defined class by examining their features. The most significant feature of a decision tree as a classification method is its ability to recursively partition the data. To choose the best attributes for partitioning, the value range of each continuous attribute should be divided into two or more intervals; fuzzy partitioning can be used to reduce noise sensitivity and increase the stability of the trees. Also, decision trees constructed with existing approaches tend to be complex and are consequently difficult to use in practical applications. In this article, a fuzzy decision tree is introduced that tackles the problems of tree complexity and memory limitation by incrementally inserting data sets into the tree. Membership functions are generated automatically; fuzzy information gain is then used as a fast splitting-attribute selection criterion, and a leaf is expanded using only the instances stored in it. The efficiency of this algorithm is examined in terms of accuracy and tree complexity. The results show that, by reducing the complexity of the tree, the proposed algorithm can overcome the memory limitation and strike a balance between accuracy and complexity.
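The automatically generated membership functions could, for instance, be triangular, one common choice for fuzzy partitioning. The following sketch of a fuzzy partition of a continuous attribute uses invented score ranges and is not the article's exact construction:

```python
def triangular(a, b, c):
    """Build a triangular membership function peaking at b on (a, c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# Hypothetical fuzzy partition of an exam score (0-100) into three
# overlapping linguistic intervals instead of crisp cut points.
low = triangular(-1, 0, 50)
med = triangular(0, 50, 100)
high = triangular(50, 100, 101)

score = 40
memberships = [low(score), med(score), high(score)]
print(memberships)  # [0.2, 0.8, 0.0]
```

Because a value belongs partially to adjacent intervals, small perturbations near a boundary change memberships gradually rather than flipping a crisp assignment, which is the noise-robustness the abstract refers to.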
Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Predictionijtsrd
Data mining techniques play an important role in data analysis. For the construction of a classification model that can predict the performance of students, particularly in engineering branches, a decision tree algorithm associated with data mining techniques has been used in this research. A number of factors may affect the performance of students. In this paper, we use educational data mining to predict students' final grades based on their performance, and propose student data classification using the ID3 (Iterative Dichotomiser 3) decision tree algorithm. Khin Khin Lay | San San Nwe, "Using ID3 Decision Tree Algorithm to the Student Grade Analysis and Prediction", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd26545.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/26545/using-id3-decision-tree-algorithm-to-the-student-grade-analysis-and-prediction/khin-khin-lay
Evaluation of a New Incremental Classification Tree Algorithm for Mining High...mlaij
A new model for the online machine learning of high-speed data streams is proposed to minimize the severe restrictions associated with existing learning algorithms. Most existing models have three principal steps: first, the system creates a model incrementally; second, the time taken by the examples to complete a prescribed procedure at their arrival speed is computed; third, the size of the memory required for computation is predicted in advance. To overcome these restrictions we propose a new data stream classification algorithm in which the data can be partitioned into a stream of trees, and a new data set can be merged into the existing tree. This algorithm, called the incremental classification tree algorithm, proves to be an excellent solution for processing larger data streams. In this paper, we present the experimental results of the new algorithm and show that it eradicates the problems of the existing method.
Analysis on different Data mining Techniques and algorithms used in IOTIJERA Editor
In this paper, we discuss five functionalities of data mining in IoT that affect performance: data anomaly detection, data clustering, data classification, feature selection, and time series prediction. Some important algorithms for each functionality are also reviewed, showing their advantages and limitations, along with some new algorithms that indicate research directions. We present a knowledge-oriented view of data mining in IoT.
Applying Classification Technique using DID3 Algorithm to improve Decision Su...IJMER
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, Assessment, and many more.
Data Mining System and Applications: A Reviewijdpsjournal
In the information technology era, information plays a vital role in every sphere of human life. It is very important to gather data from different data sources, store and maintain it, generate information and knowledge, and disseminate data, information and knowledge to every stakeholder. Due to the vast use of computers and electronic devices and the tremendous growth in computing power and storage capacity, there has been explosive growth in data collection. Storing the data in a data warehouse enables an entire enterprise to access a reliable, current database. To analyze this vast amount of data and draw fruitful conclusions and inferences, special tools called data mining tools are needed. This paper gives an overview of data mining systems and some of their applications.
Privacy preservation techniques in data miningeSAT Journals
Abstract In this paper different privacy preservation techniques are compared. Classification is the most commonly applied data mining technique, which employs a set of pre-classified examples to develop a model that can classify the population of records at large. Fraud detection and credit risk applications are particularly well suited to this type of analysis. This approach frequently employs decision tree or neural network-based classification algorithms. The data classification process involves learning and classification. In learning, the training data are analyzed by the classification algorithm. In classification, test data are used to estimate the accuracy of the classification rules. If the accuracy is acceptable, the rules can be applied to the new data tuples. For a fraud detection application, this would include complete records of both fraudulent and valid activities determined on a record-by-record basis. The classifier-training algorithm uses these pre-classified examples to determine the set of parameters required for proper discrimination. The algorithm then encodes these parameters into a model called a classifier. Index Terms: Data Mining, Privacy Preservation, Clustering, Classification Techniques, Naive Bayes.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILEScscpconf
In this paper we have focused on an efficient feature selection method in classification of audio files.
The main objective is feature selection and extraction. We have selected a set of features for further
analysis, which represents the elements in feature vector. By extraction method we can compute a
numerical representation that can be used to characterize the audio using the existing toolbox. In this
study Gain Ratio (GR) is used as a feature selection measure. GR is used to select splitting attribute
which will separate the tuples into different classes. The pulse clarity is considered as a subjective
measure and it is used to calculate the gain of features of audio files. The splitting criterion is
employed in the application to identify the class or the music genre of a specific audio file from
testing database. Experimental results indicate that by using GR the application can produce a
satisfactory result for music genre classification. After dimensionality reduction, the best three features
have been selected from the various features of the audio files, and with this technique we achieve a
classification success rate of more than 90%.
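As a rough sketch of the Gain Ratio measure described in the abstract above (the toy feature values and genre labels are invented for illustration, not taken from the paper's dataset), computing GR for a candidate splitting attribute might look like this:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain from splitting on `values`, normalized by split info."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    gain = entropy(labels) - cond
    return gain / split_info if split_info > 0 else 0.0
```

On a toy feature that perfectly separates the classes, the ratio comes out to 1.0; weaker features score lower, which is how the splitting attribute would be chosen.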
Different Classification Technique for Data mining in Insurance Industry usin...IOSRjournaljce
This paper addresses the issues and techniques for Property/Casualty actuaries applying data mining methods. Data mining means the effective discovery of unknown patterns from large databases. It is an interactive knowledge discovery procedure which includes data acquisition, data integration, data exploration, model building, and model validation. The paper provides an overview of the data discovery method and introduces some important data mining methods for application to insurance, including cluster discovery approaches.
Data Mining Framework for Network Intrusion Detection using Efficient TechniquesIJAEMSJORNAL
The implementation measures the classification accuracy on benchmark datasets after combining SIS and ANNs. In order to put a number on the gains made by using SIS as a strategic tool in data mining, extensive experiments and analyses are carried out. The predicted results of this investigation will have implications for both theoretical and applied settings. Predictive models in a wide variety of disciplines may benefit from the enhanced classification accuracy enabled by SIS inside ANNs. An invaluable resource for scholars and practitioners in the fields of AI and data mining, this study adds to the continuing conversation about how to maximize the efficacy of machine learning methods.
Text Mining in Digital Libraries using OKAPI BM25 ModelEditor IJCATR
The emergence of the internet has made vast amounts of information available and easily accessible online. As a result, most libraries have digitized their content in order to remain relevant to their users and to keep pace with the advancement of the internet. However, these digital libraries have been criticized for using inefficient information retrieval models that do not perform relevance ranking on the retrieved results. This paper proposed the use of the Okapi BM25 model in text mining as a means of improving the relevance ranking of digital libraries. The Okapi BM25 model was selected because it is a probability-based relevance ranking algorithm. A case study research was conducted and the model design was based on information retrieval processes. The performance of the Boolean, vector space, and Okapi BM25 models was compared for data retrieval. Relevant ranked documents were retrieved and displayed at the OPAC framework search page. The results revealed that Okapi BM25 outperformed the Boolean model and the Vector Space model. Therefore, this paper proposes the use of the Okapi BM25 model to reward terms according to their relative frequencies in a document so as to improve the performance of text mining in digital libraries.
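The abstract above proposes Okapi BM25 for relevance ranking. Below is a minimal sketch of the standard BM25 scoring formula; the k1/b defaults and the toy document collection are assumptions for illustration, not details from the paper:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N          # average document length
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if tf[t] == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```

A document repeating a query term scores higher than one mentioning it once, and documents without the term score zero, which is exactly the "reward terms according to their relative frequencies" behavior the paper relies on.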
Green Computing, eco trends, climate change, e-waste and eco-friendlyEditor IJCATR
This study focused on the practice of using computing resources more efficiently while maintaining or increasing overall performance. Sustainable IT services require the integration of green computing practices such as power management, virtualization, improving cooling technology, recycling, electronic waste disposal, and optimization of the IT infrastructure to meet sustainability requirements. Studies have shown that costs of power utilized by IT departments can approach 50% of the overall energy costs for an organization. While there is an expectation that green IT should lower costs and the firm’s impact on the environment, there has been far less attention directed at understanding the strategic benefits of sustainable IT services in terms of the creation of customer value, business value and societal value. This paper provides a review of the literature on sustainable IT, key areas of focus, and identifies a core set of principles to guide sustainable IT service design.
Policies for Green Computing and E-Waste in NigeriaEditor IJCATR
Computers today are an integral part of individuals’ lives all around the world, but unfortunately these devices are toxic to the environment given the materials used, their limited battery life and technological obsolescence. Individuals are concerned about the hazardous materials ever present in computers, even if the importance of various attributes differs, and a more environment-friendly attitude can be obtained through exposure to educational materials. In this paper, we aim to delineate the problem of e-waste in Nigeria, highlight a series of measures and the advantages they herald for our country, and propose a series of action steps to develop these areas further. It is possible for Nigeria to have an immediate economic stimulus and job creation while moving quickly to abide by the requirements of climate change legislation and energy efficiency directives. The costs of implementing energy efficiency and renewable energy measures are minimal as they are not cash expenditures but rather investments paid back by future, continuous energy savings.
Performance Evaluation of VANETs for Evaluating Node Stability in Dynamic Sce...Editor IJCATR
Vehicular ad hoc networks (VANETs) are a favorable area of exploration which enables interconnection among moving vehicles and between mobile units (vehicles) and road side units (RSU). In VANETs, mobile vehicles can be organized into assemblages to promote interconnection links. The assemblage arrangement according to size and geographical extent has a serious influence on the quality of interaction. VANETs are a subclass of mobile ad hoc networks involving more complex mobility patterns. Because of mobility, the topology changes very frequently. This raises a number of technical challenges, including the stability of the network. There is a need for an assemblage configuration leading to a more stable, realistic network. The paper provides an investigation of various simulation scenarios in which clusters are generated using the k-means algorithm and their numbers are varied to find the more stable configuration in a real road scenario.
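The clustering step the abstract describes can be sketched with a plain k-means implementation; the 2-D points standing in for vehicle positions and the deterministic first-k seeding are assumptions for the example, not the paper's simulation setup:

```python
def kmeans(points, k, iters=20):
    """Plain k-means on 2-D points (e.g. vehicle positions on a road map).
    Centroids are seeded with the first k points for determinism."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared distance)
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # recompute centroid as the mean of its cluster
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters
```

Varying `k` over the same node positions, as the paper does across simulation scenarios, lets one compare the stability of the resulting assemblages.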
Optimum Location of DG Units Considering Operation ConditionsEditor IJCATR
The optimal sizing and placement of Distributed Generation units (DG) are becoming very attractive to researchers these days. In this paper a two stage approach has been used for allocation and sizing of DGs in distribution system with time varying load model. The strategic placement of DGs can help in reducing energy losses and improving voltage profile. The proposed work discusses time varying loads that can be useful for selecting the location and optimizing DG operation. The method has the potential to be used for integrating the available DGs by identifying the best locations in a power system. The proposed method has been demonstrated on 9-bus test system.
Analysis of Comparison of Fuzzy Knn, C4.5 Algorithm, and Naïve Bayes Classifi...Editor IJCATR
Early detection of diabetes mellitus (DM) can prevent or inhibit complications. There are several laboratory tests that must be done to detect DM. The results of these laboratory tests were then converted into training data. The training data used in this study were generated from the UCI Pima database, with 6 attributes used to classify positive or negative diabetes. There are various classification methods that are commonly used, and in this study three of them were compared on one identical case: fuzzy KNN, the C4.5 algorithm, and the Naïve Bayes Classifier (NBC). The objective of this study was to create software to classify DM using the tested methods and to compare the three methods based on accuracy, precision, and recall. The results showed that the best method was fuzzy KNN, with average and maximum accuracy reaching 96% and 98%, respectively. In second place, the NBC method had respective average and maximum accuracy of 87.5% and 90%. Lastly, the C4.5 algorithm had average and maximum accuracy of 79.5% and 86%, respectively.
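The three evaluation criteria the study compares, accuracy, precision, and recall, are all derived from a confusion matrix. This is a generic sketch, with the toy label vectors invented for illustration:

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary classifier's predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    return accuracy, precision, recall
```

Running all three classifiers over the same test split and comparing these numbers is exactly the comparison protocol the abstract reports.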
Web Scraping for Estimating new Record from Source SiteEditor IJCATR
Study in the competitive field of intelligence and studies in the field of web scraping have a mutually symbiotic relationship. In today's information age, the website serves as a main source. The research focus is on how to get data from websites and how to slow down the intensity of the download. The problem that arises is that website sources are autonomous, so the structure of the content is vulnerable to change at any time. The next problem is the Snort intrusion detection system installed on the server to detect bot crawlers. So the researchers propose the use of the Mining Data Records (MDR) method and the Exponential Smoothing method, so as to be adaptive to changes in the structure of the content and to browse or fetch automatically following the pattern of news occurrences. In the tests, with a threshold of 0.3 for MDR and a similarity threshold score of 0.65 for STM, recall and precision values produce an f-measure average of 92.6%. Meanwhile, the exponential smoothing estimation using α = 0.5 produces an MAE of 18.2 duplicate data records, slowing duplicates down to 3.6 data records from the 21.8 data records of a fixed download/fetch schedule, at the average time of news occurrence.
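Single exponential smoothing, as used in the abstract above to estimate when new records appear, can be sketched as follows (the sample series is invented for illustration):

```python
def exponential_smoothing(series, alpha=0.5):
    """Single exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
    Returns the smoothed series, seeded with the first observation."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed
```

With α = 0.5, the value the paper reports using, each estimate weights the latest observation and the running history equally, which lets the fetch schedule adapt to the recent pattern of news occurrences.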
Evaluating Semantic Similarity between Biomedical Concepts/Classes through S...Editor IJCATR
Most of the existing semantic similarity measures that use ontology structure as their primary source can measure semantic similarity between concepts/classes using a single ontology. The ontology-based semantic similarity techniques, such as structure-based techniques (the Path Length measure, Wu and Palmer's measure, and Leacock and Chodorow's measure), information content-based techniques (Resnik's measure, Lin's measure), and biomedical domain ontology techniques (Al-Mubaid and Nguyen's measure, SemDist), were evaluated relative to human experts' ratings and compared on sets of concepts using the ICD-10 "V1.0" terminology within the UMLS. The experimental results validate the efficiency of the SemDist technique in a single ontology, and demonstrate that the SemDist semantic similarity technique, compared with the existing techniques, gives the best overall correlation with experts' ratings.
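One of the structure-based measures evaluated, Wu and Palmer's measure, can be sketched over a toy is-a hierarchy. The miniature "ontology" below is invented for illustration and is not part of ICD-10 or the UMLS:

```python
def depth(parent, node):
    """Depth of a node in the is-a tree; the root has depth 1."""
    d = 0
    while node is not None:
        node = parent.get(node)
        d += 1
    return d

def ancestors(parent, node):
    """Node itself plus all its ancestors, nearest first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent.get(node)
    return chain

def wu_palmer(parent, c1, c2):
    """Wu & Palmer similarity: 2 * depth(LCS) / (depth(c1) + depth(c2))."""
    anc2 = set(ancestors(parent, c2))
    lcs = next(a for a in ancestors(parent, c1) if a in anc2)
    return 2 * depth(parent, lcs) / (depth(parent, c1) + depth(parent, c2))
```

Two sibling concepts sharing a close common subsumer score higher than concepts whose least common subsumer sits near the root, which is the intuition all the path-based measures in the comparison build on.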
Semantic Similarity Measures between Terms in the Biomedical Domain within f...Editor IJCATR
Techniques and tests are tools used to define how to measure the goodness of an ontology or its resources. Measuring the similarity between biomedical classes/concepts is an important task for biomedical information extraction and knowledge discovery. However, most of the semantic similarity techniques can be adapted for use in the biomedical domain (UMLS). Many experiments have been conducted to check the applicability of these measures. In this paper, we investigate how to measure semantic similarity between two terms within a single ontology or multiple ontologies in ICD-10 "V1.0" as the primary source, and compare our results to human experts' scores by correlation coefficient.
A Strategy for Improving the Performance of Small Files in Openstack Swift Editor IJCATR
Adding an aggregated storage module is an effective way to improve the storage access performance of small files in OpenStack Swift. Because Swift incurs too many disk operations when querying metadata, transfer performance for large numbers of small files is low. In this paper, we propose an aggregated storage strategy (ASS) and implement it in Swift. ASS comprises two parts: merge storage and index storage. At the first stage, ASS arranges the write request queue in chronological order, and then stores objects in volumes. These volumes are large files that are actually stored in Swift. At the second stage, the object-to-volume mapping information is stored in a Key-Value store. The experimental results show that the ASS can effectively improve Swift's small file transfer performance.
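The merge-storage plus index-storage idea can be sketched with a toy in-memory volume; the class and method names here are assumptions for illustration, not Swift's or the paper's actual interfaces:

```python
import io

class VolumeStore:
    """Toy aggregated store: small objects are appended to one large volume
    file, and a key-value index maps object name -> (offset, length)."""
    def __init__(self):
        self.volume = io.BytesIO()  # stands in for the large volume file
        self.index = {}             # stands in for the Key-Value store

    def put(self, name, data: bytes):
        offset = self.volume.seek(0, 2)  # append at the end of the volume
        self.volume.write(data)
        self.index[name] = (offset, len(data))

    def get(self, name) -> bytes:
        offset, length = self.index[name]
        self.volume.seek(offset)
        return self.volume.read(length)
```

Reading any small object then costs one index lookup plus one ranged read inside a single large file, instead of a separate metadata query and file open per object.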
Integrated System for Vehicle Clearance and RegistrationEditor IJCATR
Efficient management and control of a government's cash resources rely on government banking arrangements. Nigeria, like many low income countries, employed fragmented systems in handling government receipts and payments. In 2016, Nigeria implemented a unified structure as recommended by the IMF, in which collecting all government funds in one account would reduce borrowing costs, extend credit and improve the government's fiscal policy, among other benefits to government. This situation motivated us to embark on this research to design and implement an integrated system for vehicle clearance and registration. This system complies with the new Treasury Single Account policy to enable proper interaction and collaboration among the five different agencies (NCS, FRSC, SBIR, VIO and NPF) saddled with vehicular administration and activities in Nigeria. Since the system is web based, the Object Oriented Hypermedia Design Methodology (OOHDM) is used. Tools such as PHP, JavaScript, CSS, HTML, AJAX and other web development technologies were used. The result is a web based system that gives proper information about a vehicle, from the exact date of importation to registration and renewal of licensing. Vehicle owner information, custom duty information, plate number registration details, etc. can also be efficiently retrieved from the system by any of the agencies without contacting another agency at any point in time. Also, the number plate will no longer be the only means of vehicle identification, as is presently the case in Nigeria, because the unified system will automatically generate and assign a Unique Vehicle Identification Pin Number (UVIPN) to the vehicle on payment of duty in the system, and the UVIPN will be linked to the various agencies in the management information system.
Assessment of the Efficiency of Customer Order Management System: A Case Stu...Editor IJCATR
The Supermarket Management System deals with the automation of buying and selling of goods and services. It includes both sales and purchase of items. The Supermarket Management System project is to be developed with the objective of making the system reliable, easier, faster, and more informative.
Energy-Aware Routing in Wireless Sensor Network Using Modified Bi-Directional A*Editor IJCATR
Energy is a key component in the Wireless Sensor Network (WSN) [1]. The system will not be able to run according to its function without the availability of adequate power units. One of the characteristics of wireless sensor networks is limited energy [2]. A lot of research has been done to develop strategies to overcome this problem. One of them is the clustering technique. A popular clustering technique is Low Energy Adaptive Clustering Hierarchy (LEACH) [3]. In LEACH, clustering techniques are used to determine the Cluster Head (CH), which is then assigned to forward packets to the Base Station (BS). In this research, we propose another clustering technique, which utilizes the Social Network Analysis theory of Betweenness Centrality (BC) and is implemented in the setup phase. In the steady-state phase, one of the heuristic search algorithms, Modified Bi-Directional A* (MBDA*), is implemented. The experiment was performed by deploying 100 nodes statically in a 100x100 area, with one Base Station at coordinates (50,50). To find out the reliability of the system, the experiment was run for 5000 rounds. The performance of the designed routing protocol strategy was tested based on network lifetime, throughput, and residual energy. The results show that BC-MBDA* is better than LEACH. This is influenced by the way LEACH determines the CH dynamically, changing it in every data transmission process, which costs energy because the CH must be recomputed for every transmission. In contrast, in BC-MBDA* the CH is statically determined, so energy usage can be decreased.
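The Betweenness Centrality used for CH selection can be sketched by counting shortest paths on a small graph. The brute-force BFS approach and the toy topology below are assumptions for illustration; a real deployment would use an efficient algorithm such as Brandes':

```python
from collections import deque

def shortest_paths(graph, s, t):
    """All shortest paths from s to t by breadth-first path enumeration."""
    best, found = None, []
    q = deque([[s]])
    while q:
        path = q.popleft()
        if best is not None and len(path) > best:
            continue
        node = path[-1]
        if node == t:
            best = len(path)
            found.append(path)
            continue
        for nb in graph[node]:
            if nb not in path:
                q.append(path + [nb])
    return found

def betweenness(graph):
    """Unnormalized betweenness: fraction of shortest paths through each node."""
    score = {v: 0.0 for v in graph}
    nodes = list(graph)
    for i, s in enumerate(nodes):
        for t in nodes[i + 1:]:
            paths = shortest_paths(graph, s, t)
            for p in paths:
                for v in p[1:-1]:  # interior nodes only
                    score[v] += 1 / len(paths)
    return score
```

The node with the highest score lies on the most shortest paths between other nodes, making it a natural static CH candidate in the setup phase.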
Security in Software Defined Networks (SDN): Challenges and Research Opportun...Editor IJCATR
In networks, the rapidly changing traffic patterns of search engines, Internet of Things (IoT) devices, Big Data and data centers have thrown up new challenges for legacy networks and prompted the need for a more intelligent and innovative way to dynamically manage traffic and allocate limited network resources. Software Defined Networking (SDN), which decouples the control plane from the data plane through network virtualization, aims to address these challenges. This paper has explored the SDN architecture and its implementation with the OpenFlow protocol. It has also assessed some of its benefits over traditional network architectures, security concerns and how they can be addressed in future research, and related works in emerging economies such as Nigeria.
Measure the Similarity of Complaint Document Using Cosine Similarity Based on...Editor IJCATR
Report handling on the "LAPOR!" (Laporan, Aspirasi dan Pengaduan Online Rakyat) system depends on the system administrator, who manually reads every incoming report [3]. Reading manually can lead to errors in handling complaints [4]; if the data flow is huge and grows rapidly, it takes at least three days to prepare a confirmation, and it is sensitive to inconsistencies [3]. In this study, the authors propose a model that can measure the similarity of an incoming query (report) with archived documents. The authors employed a class-based indexing term weighting scheme and cosine similarity to analyse document similarities. The CoSimTFIDF, CoSimTFICF and CoSimTFIDFICF values were used as classification features for a K-Nearest Neighbour (K-NN) classifier. The optimum evaluation result uses pre-processing with a 75% training data ratio and 25% test data with the CoSimTFIDF feature, delivering a high accuracy of 84%. The value k = 5 obtains a high accuracy of 84.12%.
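The TF-IDF weighting and cosine similarity underlying the CoSimTFIDF feature can be sketched as follows; the tokenized toy complaints are invented for illustration, and the class-based (ICF) variants are omitted:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vector (dict of term -> weight) per tokenized document."""
    N = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append({t: tf[t] * math.log(N / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

An incoming report is vectorized the same way and its cosine similarity against each archived document becomes the feature the K-NN classifier consumes.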
Hangul Recognition Using Support Vector MachineEditor IJCATR
The recognition of Hangul images is more difficult compared with that of Latin script. It can be recognized from the structural arrangement: Hangul is arranged in two dimensions, while Latin runs only from left to right. The current research creates a system to convert Hangul images into Latin text in order to use it as a learning material for reading Hangul. In general, the image recognition system is divided into three steps. The first step is preprocessing, which includes binarization, segmentation through the connected-component labeling method, and thinning with Zhang-Suen to reduce pattern information. The second is extracting the features from every single image, whose identification process is done through the chain code method. The third is the recognition process using a Support Vector Machine (SVM) with several kernels. It works on letter images and Hangul word recognition. There are 34 letters, each of which has 15 different patterns. The whole set of 510 patterns is divided into 3 data scenarios. The highest result achieved is 94.7%, using the SVM polynomial and radial basis function kernels. The recognition result is influenced by the amount of training data. The recognition process for Hangul words applies to type 2 Hangul words with 6 different patterns. The difference between these patterns comes from the change of font type. The fonts chosen for training data are Batang, Dotum, Gaeul, Gulim, and Malgun Gothic; Arial Unicode MS is used to test the data. The lowest accuracy, 69%, is achieved through the use of the SVM radial basis function kernel. The same result, 72%, is given by the SVM linear and polynomial kernels.
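The two SVM kernels the abstract compares, polynomial and radial basis function, can be sketched directly from their standard definitions (the degree, offset and gamma defaults are assumptions for illustration, not the study's tuned values):

```python
import math

def poly_kernel(x, y, degree=3, c=1.0):
    """Polynomial kernel: (x . y + c) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
```

The kernel replaces the dot product in the SVM decision function, so swapping these two functions is exactly the kernel comparison the study performs on the chain-code feature vectors.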
Application of 3D Printing in EducationEditor IJCATR
This paper provides a review of literature concerning the application of 3D printing in the education system. The review identifies that 3D Printing is being applied across the Educational levels [1] as well as in Libraries, Laboratories, and Distance education systems. The review also finds that 3D Printing is being used to teach both students and trainers about 3D Printing and to develop 3D Printing skills.
Survey on Energy-Efficient Routing Algorithms for Underwater Wireless Sensor ...Editor IJCATR
In the underwater environment, a routing mechanism is used for retrieval of information. In this routing mechanism, three to four types of nodes are used: sink nodes, which are deployed on the water surface and collect the information; courier/super/AUV (dolphin) powerful nodes, which are deployed in the middle of the water for forwarding packets; ordinary nodes, which are also forwarder nodes and can be deployed from the bottom to the surface of the water; and source nodes, which are deployed at the seabed and extract valuable information from the bottom of the sea. In the underwater environment, the battery power of the nodes is limited, and that power can be prolonged through better selection of the routing algorithm. This paper focuses on energy-efficient routing algorithms and their routing mechanisms to prolong the battery power of the nodes. This paper also presents a performance analysis of the energy-efficient algorithms, through which we can examine which route selection mechanism performs better and can prolong the battery power of the nodes.
Comparative analysis on Void Node Removal Routing algorithms for Underwater W...Editor IJCATR
The design of routing algorithms faces many challenges in the underwater environment, such as propagation delay, acoustic channel behaviour, limited bandwidth, high bit error rate, limited battery power, underwater pressure, node mobility, 3D deployment and localization, and underwater obstacles (voids). This paper focuses on underwater voids, which affect the overall performance of the entire network. The majority of researchers have used approaches that remove voids through an alternate path selection mechanism, but the research still needs improvement. This paper also examines the architecture and operation of the existing algorithms through their merits and demerits. This research article further presents an analytical performance analysis of existing algorithms, through which we found the better approach for removal of voids.
Decay Property for Solutions to Plate Type Equations with Variable CoefficientsEditor IJCATR
In this paper we consider the initial value problem for a plate type equation with variable coefficients and memory in R^n (n ≥ 1), which is of regularity-loss type. By using spectral resolution, we study the pointwise estimates in the spectral space of the fundamental solution to the corresponding linear problem. Appealing to these pointwise estimates, we obtain the global existence and the decay estimates of solutions to the semilinear problem by employing the fixed point theorem.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Distributed Digital Artifacts on the Semantic Web
International Journal of Computer Applications Technology and Research
Volume 5, Issue 2, 104-109, 2016, ISSN: 2319-8656
www.ijcat.com
Educational Data Mining by Using Neural Network
Nitya Upadhyay
RITM
Lucknow, India
Abstract: At the present time, the amount of data in educational databases is increasing day by day. These data enclose concealed information that can lift students' performance. Among all classification algorithms, the decision tree is the most widely used. Decision trees provide more correct and relevant results, which can be beneficial in improving the learning outcomes of a student. The ID3, C4.5 and CART decision tree algorithms have already been implemented on student data to anticipate their accomplishment. All three classification algorithms share a limitation: they are suitable only for small training data sets. So, for large databases we use a new algorithm, SPRINT, which removes the memory restriction and accuracy problems that arise in the other algorithms. It is faster and more scalable than the others because it can be implemented in both serial and parallel fashion for good data placement and load balancing. In this paper, we present the SPRINT decision tree algorithm, which will be used to solve classification problems in educational data systems.

Key words: Educational Data Mining, Classification, WEKA
1. INTRODUCTION:
Data mining is an emergent and rising area of research and development, both in academia and in business. It is also called knowledge discovery in databases (KDD) and is an emerging methodology used in the educational field to get the required data and to find the hidden relationships helpful in decision making. It is basically a process of analysing data from different perspectives and summarizing it into useful information (Ramachandram, 2010). Nowadays, large quantities of data are being accumulated, and data mining can be used in various applications like banking, the telecommunication industry, DNA analysis, the retail industry etc.

Educational Data Mining: It is concerned with developing methods for exploring the unique types of data that come from educational databases; by using data mining techniques, we can predict students' academic performance and their behaviour towards education (Yadav, 2012). As large amounts of data are stored in educational databases, data mining is the process of discovering interesting knowledge from these large amounts of data stored in databases, data warehouses or other information repositories.

Figure 1.1 - The cycle of applying data mining in the educational system

Various algorithms and techniques are used for knowledge discovery from databases. These are as follows:
Classification
Clustering
Regression
Artificial intelligence
Neural networks
Decision trees
Genetic algorithm
Association rules etc.
These techniques allow users to analyse data from different dimensions, categorize it and summarize the relationships identified during the mining process (Yadav, 2012). Classification is one of the most useful data mining techniques for performance improvement in the education sector. It is based on predefined knowledge of the objects and groups similar data objects together (Baradhwaj, 2011). Classification has been identified as an important problem in the emerging field of data mining: it maps data into predefined groups or classes (Kumar, 2011). It has been studied extensively by the machine learning community as a possible solution to the knowledge acquisition or knowledge extraction problem. The input to the classifier construction algorithm is a training set of records, each of which is tagged with a class label. A set of attribute values defines each record. Attributes with discrete domains are referred to as categorical, while those with ordered domains are referred to as numeric. The goal is to induce a model or description for each class in terms of the attributes. The model is then used by the classifier to classify future records whose classes are unknown.
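As a minimal illustration of this input/output contract, the sketch below induces a one-level model (a decision stump) from a toy training set of tagged records. The stump is a deliberate simplification of tree induction, not the paper's algorithm; the marks/grade data mirrors the example attribute list shown later in the paper.

```python
# Toy training set: each record is (marks, grade), where "marks" is a
# numeric attribute and "grade" is the class label.
records = [(72, "Good"), (83, "Good"), (78, "Good"), (91, "Good"),
           (65, "Average"), (52, "Average"), (43, "Average")]

def learn_stump(records):
    """Induce a one-level model: pick the numeric threshold that best
    separates the classes (a minimal stand-in for tree induction)."""
    best = None
    for threshold in sorted(v for v, _ in records):
        left = [c for v, c in records if v <= threshold]
        right = [c for v, c in records if v > threshold]
        # Count misclassifications if each side predicts its majority class.
        errors = 0
        for side in (left, right):
            if side:
                majority = max(set(side), key=side.count)
                errors += sum(1 for c in side if c != majority)
        if best is None or errors < best[0]:
            left_major = max(set(left), key=left.count) if left else None
            right_major = max(set(right), key=right.count) if right else None
            best = (errors, threshold, left_major, right_major)
    return best[1:]  # (threshold, class if <= threshold, class if >)

threshold, low_class, high_class = learn_stump(records)

def classify(marks):
    """Classify a future record whose class is unknown."""
    return low_class if marks <= threshold else high_class
```

Here the induced model is simply the threshold and the two class labels; a real decision tree repeats this step recursively at every node.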
2. LITERATURE SURVEY:
A number of data mining techniques have already been applied to educational data mining to improve the performance of students, like regression, genetic algorithms, Bayes classification, k-means clustering, association rules, prediction etc. Data mining techniques can be used in the educational field to enhance our understanding of the learning process, focusing on identifying, extracting and evaluating variables related to the learning process of students.
Decision tree algorithms can be implemented in a serial or parallel fashion based on the volume of data, the memory space available on the computer resource and the scalability of the algorithm. The C4.5, ID3 and CART decision tree algorithms have already been applied to student data to predict their performance, but they are useful only when the training data set is small. These algorithms are explained below:
ID3
Iterative Dichotomiser 3 is a decision tree algorithm introduced in 1986 by Ross Quinlan. It is based on Hunt's algorithm. ID3 uses the information gain measure to choose the splitting attribute. It accepts only categorical attributes in building a tree model, does not give accurate results when there is noise, and is serially implemented. Thus intensive pre-processing of the data is carried out before building a decision tree model with ID3 (Verma, 2012). To find an optimal way to classify a learning set, we need to minimize the number of questions asked.
C4.5
It is an improvement of the ID3 algorithm, developed by Ross Quinlan in 1993. It is based on Hunt's algorithm and, like ID3, is serially implemented. Pruning takes place in C4.5 by replacing an internal node with a leaf node, thereby reducing the error rate. It accepts both continuous and categorical attributes in building the decision tree, and has an enhanced method of tree pruning that reduces misclassification errors due to noise and too many details in the training data set. Like ID3, the data is sorted at every node of the tree in order to determine the best splitting attribute. It uses the gain ratio impurity method to evaluate the splitting attribute (Baradhwaj, 2011).
CART
It stands for Classification And Regression Trees and was introduced by Breiman in 1984. It builds both classification and regression trees. The classification tree construction by CART is based on binary splitting of the attributes. It is also based on Hunt's algorithm and can be implemented serially. It uses the gini index splitting measure in selecting the splitting attribute. CART is unique among Hunt's-based algorithms in that it can also be used for regression analysis with the help of regression trees (Baradhwaj, 2011). The regression analysis feature is used in forecasting a dependent variable given a set of predictor variables over a given period of time. It uses several single-variable splitting criteria, like the gini index and symgini, and one multi-variable criterion in determining the best split point, and data is stored at every node to determine the best splitting point. The linear combination splitting criterion is used during regression analysis.
SLIQ
It stands for Supervised Learning In Quest and was introduced by Mehta et al. (1996). It is a fast, scalable decision tree algorithm that can be implemented in serial and parallel patterns. It is not based on Hunt's algorithm for decision tree classification; instead it partitions the training data set recursively using a breadth-first greedy strategy that is integrated with a pre-sorting technique during the tree-building phase. This pre-sorting eliminates the need to sort the data at each node of the decision tree. In building a decision tree model, SLIQ handles both numeric and categorical attributes (Rissanem, 2010). Sorting of the data is required to find the split for numeric attributes.
PUBLIC
It stands for Pruning and Building Integrated in Classification. PUBLIC is a decision tree classifier that, during the growing phase, first determines whether a node will be pruned during the following pruning phase, and stops expanding such nodes. Hence, PUBLIC integrates the pruning phase into the building phase instead of performing them one after the other. Traditional decision tree classifiers such as ID3, C4.5 and CART construct a decision tree in two distinct phases: in the building phase, a decision tree is built by repeatedly scanning the database, while in the pruning phase, nodes of the built tree are pruned to improve accuracy and prevent overfitting (Rastogi, 2000).
RainForest
It provides a framework for fast decision tree construction on large datasets: a unifying framework for decision tree classifiers that separates the scalability aspects of tree-construction algorithms from the central features that determine the quality of the tree. The generic algorithm is easy to instantiate with specific algorithms from the literature (including C4.5, CART, CHAID, ID3 and extensions, SLIQ, SPRINT and QUEST). RainForest is a general framework used to close the gap between the main-memory limitations of algorithms in the machine learning and statistics literature and the scalability requirements of a data mining environment (Gehrke, 2010).
SPRINT algorithm
It stands for Scalable PaRallelizable INduction of decision Trees and was introduced by Shafer et al. in 1996. It is a fast, scalable decision tree classifier. It is not based on Hunt's algorithm for constructing the decision tree; rather, it partitions the training data set recursively using a breadth-first greedy technique until each partition belongs to the same leaf node or class. It can be implemented in both serial and parallel patterns for good data placement and load balancing (Baradhwaj, 2011). SPRINT is designed to be easily parallelized, allowing many processors to work together to build a single consistent model; this parallelization exhibits excellent scalability to the users. It provides excellent speedup, size-up and scale-up properties, and the combination of these characteristics makes SPRINT an ideal tool for data mining.
Algorithm:

Partition(data S)
    if (all points in S are of the same class) then
        return;
    for each attribute A do
        evaluate splits on attribute A;
    use the best split found to partition S into S1 and S2;
    Partition(S1);
    Partition(S2);

Initial call: Partition(training data)
There are two major issues that have critical performance implications in the tree-growth phase:
1. How to find split points that define node tests.
2. Having chosen a split point, how to partition the data.
SPRINT uses two data structures, attribute lists and histograms, which are not required to be memory-resident, making SPRINT suitable for large data sets; thus it removes the memory restrictions on the data.
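For illustration, the Partition() procedure above can be sketched in runnable Python, with the gini index (discussed later in this paper) standing in for the split-evaluation step. The records and the `marks` attribute are illustrative assumptions, not the paper's dataset, and this serial sketch omits SPRINT's attribute lists and histograms.

```python
def gini(labels):
    """Gini index of a list of class labels (zero for a pure node)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def partition(records, attributes):
    """records: list of (attributes_dict, class_label) pairs."""
    labels = [c for _, c in records]
    if len(set(labels)) == 1:          # all points in S are of the same class
        return {"class": labels[0]}
    best = None
    for attr in attributes:            # evaluate splits on each attribute
        for v in sorted({r[attr] for r, _ in records})[:-1]:
            s1 = [(r, c) for r, c in records if r[attr] <= v]
            s2 = [(r, c) for r, c in records if r[attr] > v]
            score = (len(s1) * gini([c for _, c in s1]) +
                     len(s2) * gini([c for _, c in s2])) / len(records)
            if best is None or score < best[0]:
                best = (score, attr, v, s1, s2)
    if best is None:                   # no usable split: fall back to majority
        return {"class": max(set(labels), key=labels.count)}
    _, attr, v, s1, s2 = best
    return {"split": (attr, v),        # use the best split to partition S
            "left": partition(s1, attributes),
            "right": partition(s2, attributes)}

# Illustrative call with a tiny marks/grade dataset:
tree = partition([({"marks": 72}, "Good"), ({"marks": 43}, "Average"),
                  ({"marks": 83}, "Good"), ({"marks": 52}, "Average")],
                 ["marks"])
```

The recursion mirrors the Partition(S1)/Partition(S2) calls in the pseudocode; the breadth-first, list-based scanning that makes SPRINT scale is the subject of the data structures described next.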
It handles both continuous and categorical attributes. The data structures of SPRINT are explained below:

Attribute lists - SPRINT initially creates an attribute list for each attribute in the data. Entries in these lists, called attribute records, consist of an attribute value, a class label and the index (rid) of the record from which these values were obtained. Initial lists for continuous attributes are sorted by attribute value once, when first created.

Histograms - Two histograms are associated with each decision-tree node that is under consideration for splitting: Cbelow, which maintains the class distribution of the data that has already been processed, and Cabove, which maintains that of the data that has not. Categorical attributes also have a histogram associated with a node; only one histogram is needed, and it contains the class distribution for each value of the given attribute. We call this histogram a count matrix. SPRINT has also been designed to be easily parallelized; measurements of a parallel implementation were made on a shared-nothing IBM POWERparallel System SP2. SPRINT has excellent scale-up, speedup and size-up properties, and the combination of these characteristics makes it an ideal tool for data mining (Shafer).
3. PRESENT WORK:
A decision tree classification algorithm can be implemented in a serial or parallel fashion based on the volume of data, the memory space available on the computer resource and the scalability of the algorithm. The main disadvantage of the serial decision tree algorithms (ID3, C4.5 and CART) is low classification accuracy when the training data is large. This problem is solved by the SPRINT decision tree algorithm. In the serial implementation of SPRINT, the training data set is recursively partitioned using a breadth-first technique.

In this research work, a dataset of 300 B.Tech. (Mechanical Engineering) students has been taken, with the input parameters: name, registration number, open elective subject in the 4th semester, midterm marks, end term marks, choice of open elective subject, whether polling should be there (yes or no), and suggestions regarding polling (if yes, why, and if no, why?). There are 9 open elective (OE) subjects in B.Tech. (ME) and, because of limited seats, most of the students do not get their own choice of subject, which could affect their performance in the exam. The output shows how students perform according to the choice of their preference.
Objectives of the Problem:
The objectives of the present investigation are framed so as to assist low academic achievers in higher education, and they are:
Identification of the choice of students in the polling system which affects a student's performance during the academic career.
Validation of the developed model for higher education students studying in various universities or institutions.
Prediction of students' performance in their final exam.
In my proposed work, I implement the SPRINT decision tree algorithm to improve classification accuracy and to reduce misclassification errors and execution time. I explain the algorithm, then apply a serial implementation of it to obtain the desired results, and compare it with other existing algorithms to find which is more efficient in terms of accurately predicting a student's outcome and the time taken to derive the tree.
Data structures:
1. Attribute lists:
The initial lists created from the training set are associated with the root of the classification tree. As the tree is grown and nodes are split to create new children, the attribute lists belonging to each node are partitioned and associated with the children. An example of an attribute list is:
Table 3.1: Example of attribute list of dataset
Table 3.2: Dataset after applying pre-sorting
After Pre-sorting:
In the SPRINT algorithm, sorting of the data is required to find the split for numeric attributes; it uses the gini splitting index to evaluate splits. SPRINT sorts the data only once, at the beginning of the tree-building phase, by using different data structures. Each node has its own attribute lists, and to find the best split point for a node, we scan each of the node's attribute lists and evaluate splits based on that attribute.
Histograms: Histograms are used to capture the class distribution of the attribute records at each node.
Performing the Split:
When the best split point has been found for a node, we execute the split by creating child nodes and dividing the attribute records between them. We can perform this by splitting the node's list into two, as shown in figure 4. In our example, the attribute used in the winning split point is Marks. After this, we scan the list and apply the split test on it, then move the records to two new attribute lists, one for each new child. For the remaining attribute lists of the node there is no test that we can apply to the attribute values to decide how to divide the records; to solve this problem, we work with rids (Shafer).
Marks  Grade    Rid
72     Good     0
83     Good     1
78     Good     2
91     Good     3
65     Average  4
52     Average  5
43     Average  6
As we partition the list of the splitting attribute (Marks), we insert the rid of each record into a hash table to note to which child the record was moved. We then scan the lists of the remaining attributes and probe the hash table with the collected rids; the output tells us in which child to place each record. If the hash table is too large for memory, the splitting is done in more than one step.
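The rid bookkeeping described above can be sketched as follows. The Marks list mirrors the example attribute list; the second `attendance` attribute list and the winning split test `marks <= 65` are illustrative assumptions.

```python
# Attribute-list entries are (value, class label, rid) triples.
marks_list = [(72, "Good", 0), (83, "Good", 1), (78, "Good", 2),
              (91, "Good", 3), (65, "Average", 4), (52, "Average", 5),
              (43, "Average", 6)]
attendance_list = [(90, "Good", 0), (85, "Good", 1), (70, "Good", 2),
                   (95, "Good", 3), (60, "Average", 4), (55, "Average", 5),
                   (40, "Average", 6)]
split_point = 65                       # winning split test: marks <= 65

# Step 1: split the winning attribute's list, recording each rid's child.
child_of = {}                          # the hash table, keyed by rid
marks_left, marks_right = [], []
for value, cls, rid in marks_list:
    if value <= split_point:
        child_of[rid] = "left"
        marks_left.append((value, cls, rid))
    else:
        child_of[rid] = "right"
        marks_right.append((value, cls, rid))

# Step 2: route every remaining attribute list by probing the hash table
# with each entry's rid.
att_left = [e for e in attendance_list if child_of[e[2]] == "left"]
att_right = [e for e in attendance_list if child_of[e[2]] == "right"]
```

When the hash table does not fit in memory, this routing step is simply repeated over chunks of rids, exactly as the text describes.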
Finding split points:
During the process of building the decision tree, the goal at each node is to determine the split point that best divides the dataset belonging to that node. The value of a split point depends upon how well it separates the classes. Many splitting criteria have been proposed in the past to evaluate the goodness of a split; we need a function which can measure which questions provide the most balanced splitting. The information gain metric is such a function.
o Measuring impurity: given a data table that contains attributes and the class of each record, we can measure the homogeneity or heterogeneity of the table based on the classes. A table is pure or homogeneous if it contains only a single class; if it contains several classes, the table is impure or heterogeneous. There are many indices to measure the degree of impurity; the most common are entropy, the gini index and the classification error.

Entropy = -Σj pj log2(pj)

The entropy of a pure table is zero, because the probability is 1 and log(1) = 0. Entropy reaches its maximum value when all classes in the table have equal probability. For a data set S:

Gini Index = 1 - Σj pj²

In the above formula, pj is the relative frequency of class j in S. If a split divides S into two subsets S1 and S2, the index of the divided data, Gini split(S), is given by the following formula:

Gini split(S) = (n1/n) gini(S1) + (n2/n) gini(S2)

The advantage of this index is that its calculation requires only the distribution of the class values in each of the partitions. To find the best split point for a node, we scan each of the node's attribute lists and evaluate splits based on that attribute. The attribute containing the split point with the lowest value of the gini index is then used to split the node. The gini index of a pure table consisting of a single class is zero, because the probability is 1 and 1 - 1² = 0. Similar to entropy, the gini index also reaches its maximum value when all classes in the table have equal probability.

Classification error = 1 - max{pj}

Similar to entropy and the gini index, the classification error of a pure table is zero, because the probability is 1 and 1 - max(1) = 0. The value of the classification error index is always between 0 and 1. In fact, the maximum gini index for a given number of classes is always equal to the maximum classification error, because for N classes both maxima occur when every probability equals p = 1/N, giving 1 - 1/N in each case.
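The three impurity indices, and the gini split formula, can be computed directly from a node's class labels. This sketch follows the formulas above; the class labels are illustrative.

```python
import math

def impurities(labels):
    """Entropy, gini index and classification error of a class-label list."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    entropy = sum(-p * math.log2(p) for p in probs)
    gini = 1.0 - sum(p * p for p in probs)
    class_error = 1.0 - max(probs)
    return entropy, gini, class_error

def gini_split(s1, s2):
    """Gini split(S) = (n1/n) gini(S1) + (n2/n) gini(S2)."""
    n = len(s1) + len(s2)
    return len(s1) / n * impurities(s1)[1] + len(s2) / n * impurities(s2)[1]

# A pure table: every index is zero.
print(impurities(["Good"] * 4))                            # (0.0, 0.0, 0.0)
# Maximum impurity for two classes: equal probabilities.
print(impurities(["Good", "Good", "Average", "Average"]))  # (1.0, 0.5, 0.5)
```

Note that a split separating the two classes perfectly yields Gini split(S) = 0, which is exactly why the split point with the lowest gini index is chosen.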
o Splitting criteria:
To determine the best attribute for a particular node in the tree, we use the measure called information gain. The information gain Gain(S, A) of an attribute A, relative to a collection of examples S, is defined as

Gain(S, A) = Entropy(S) - Σv (|Sv|/|S|) Entropy(Sv)

where Sv is the subset of S for which attribute A has value v. The gain ratio normalizes this by the split information:

Gain ratio = Gain(S, A) / Split Information

The process of selecting a new attribute and partitioning the dataset is then repeated for each non-terminal descendant node. Attributes that have been incorporated higher in the tree are excluded, so that any given attribute can appear at most once along any path.
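A sketch of information gain and gain ratio for a categorical attribute, following the definitions above; the records and the `choice` attribute are illustrative assumptions.

```python
import math

def entropy(labels):
    """Entropy of a class-label list: -sum of p * log2(p)."""
    n = len(labels)
    return sum(-labels.count(c) / n * math.log2(labels.count(c) / n)
               for c in set(labels))

def gain_ratio(records, attr):
    """records: list of (attributes_dict, class_label) pairs."""
    labels = [c for _, c in records]
    n = len(records)
    gain = entropy(labels)                 # Entropy(S)
    split_info = 0.0                       # Split Information of attribute A
    for v in {r[attr] for r, _ in records}:
        subset = [c for r, c in records if r[attr] == v]
        gain -= len(subset) / n * entropy(subset)   # - (|Sv|/|S|) Entropy(Sv)
        split_info -= len(subset) / n * math.log2(len(subset) / n)
    return gain / split_info if split_info else 0.0

# A perfectly informative (hypothetical) attribute yields a gain ratio of 1:
records = [({"choice": "own"}, "Good"), ({"choice": "own"}, "Good"),
           ({"choice": "other"}, "Average"), ({"choice": "other"}, "Average")]
```

Here every value of `choice` determines the class, so the gain equals the parent entropy and the ratio is 1; an attribute carrying no class information would score 0.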
4. RESULTS:
The proposed SPRINT decision tree algorithm is implemented in the WEKA tool. WEKA contains a collection of visualization tools and algorithms for data analysis and predictive modelling, together with graphical user interfaces for easy access to this functionality. Data can be imported in formats like CSV, ARFF, binary etc., and can also be read from a URL or from a database using SQL. There are various classifier models, like Naïve Bayes, decision trees etc. We have used classifiers for our experiment; the Classify panel allows the user to apply the SPRINT decision tree and other existing classification algorithms to the data set and estimate the accuracy of the resulting model.
Figure 4.1: Preview after the data set is imported into WEKA
In figure 4.1, red implies that an attribute value belongs to option A, blue that it belongs to option B, and green that it belongs to option C.
Figure 4.2: Visualizing all attributes used in the classification
Figure 4.3: Classification by the SPRINT decision tree
Figure 4.3 shows the comparison among all attributes on parameters like accuracy, true positive rate and false positive rate. These terms are defined below:
Accuracy: The accuracy is the proportion of the total number of predictions that were correct.
True positive rate: The true positive rate (TP rate) is the proportion of examples classified as class x, among all examples which truly have class x, i.e. how much of the class is captured. It is equivalent to recall.
False positive rate: The false positive rate (FP rate) is the proportion of examples classified as class x, but belonging to a different class, among all examples which are not of class x.
Precision: The proportion of examples which truly have class x among all those classified as class x.
F-measure: A combined measure of precision and recall, defined by the following formula:
F-Measure = 2 * Precision * Recall / (Precision + Recall)
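These measures follow directly from the confusion counts for one class x; here is a small sketch with illustrative counts.

```python
def evaluate(tp, fp, fn, tn):
    """Per-class measures from confusion counts for one class x:
    tp = examples of x classified as x, fn = examples of x missed,
    fp = non-x examples classified as x, tn = non-x correctly rejected."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # the true positive rate
    fp_rate = fp / (fp + tn)           # the false positive rate
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, fp_rate, f_measure

# Illustrative counts: 8 examples of class x captured, 2 missed,
# 4 non-x examples wrongly classified as x, 16 correctly rejected.
precision, recall, fp_rate, f1 = evaluate(tp=8, fp=4, fn=2, tn=16)
```

The F-measure rewards a balance of precision and recall; a classifier that captures everything but with many false positives scores well on recall alone but poorly on F-measure.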
4.1 COMPARISON:
Table 4.1 shows the comparison of the different decision tree algorithms on the basis of different parameters.
Table 4.1: Parameter comparison of decision tree algorithms
4.2 OUTPUT
Three decision trees were obtained as predictive models from the data set of 300 students by three machine learning algorithms: the C4.5 decision tree algorithm, the random tree algorithm and the new SPRINT decision tree algorithm. Table 4.2 shows the simulation result for each algorithm. From this table, we can see that the SPRINT algorithm has the highest accuracy, 74.6667%, compared to the other algorithms. The table also shows the time, in seconds, taken by each classifier to build the model from the training data. By this experimental comparison, it is clear that SPRINT is the best algorithm among those tested, as it is more accurate and less time-consuming.

The result can vary according to the machine on which the experiment is run, due to machine specifications like the processor, RAM, ROM and operating system. However, this does not affect the accuracy of the algorithms used.
5. CONCLUSION:
The efficiency of the decision tree algorithms can be analysed based on their accuracy and the time taken to derive the tree. The main disadvantage of the serial decision tree algorithms (ID3, C4.5 and CART) is low classification accuracy when the training data is large. This problem is solved by the SPRINT decision tree algorithm, which removes the memory restriction and accuracy problems that arise in the other existing algorithms. It is faster and more scalable than the others because it can be implemented in both serial and parallel fashion for good data placement and load balancing.

In this work, the SPRINT decision tree algorithm has been applied to a dataset of 300 students to predict their performance in the exam on the basis of their choice in the polling system. The result helps us find that students who got their own choice of subject give better results than others.
6. REFERENCES:
[1] Brijesh Kumar Bhardwaj and Saurabh Pal, "Data mining: a prediction for performance improvement using classification", International Journal of Computer Science and Information Security, Vol. 9, No. 4, 2011.
[2] C. Romero and S. Ventura, "Educational data mining: A survey from 1995 to 2005", Expert Systems with Applications, Elsevier. www.elsevier.com/locate/eswa
[3] Dorina Kabakchieva, "Student performance prediction by using data mining classification algorithms", International Journal of Computer Science and Management Research, Vol. 1, Issue 4, November 2012.
[4] Devi Prasad Bhukya and S. Ramachandram, "Decision tree induction - an approach for data classification using AVL-tree", International Journal of Computer and Electrical Engineering, Vol. 2, No. 4, August 2010.
[5] John Shafer, Rakesh Agrawal and Manish Mehta, "SPRINT: A scalable parallel classifier for data mining", IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120.