This research paper deals with outliers: observations that behave unusually compared with the other members of their data set. Outlier detection can be employed both for anomaly detection and for identifying abnormal observations, and the deviation of an outlier can be measured in terms such as range, size, and activity. By detecting outliers, one can filter anomalous readings out of the data. For instance, in healthcare, a person's condition can be assessed from the latest health report or from regular activity; if the person is found to be unusually inactive, there is a chance that the person is sick. Several approaches are used in this research paper for detecting outliers: 1) a centroid-based approach built on the K-Means and Hierarchical Clustering algorithms, and 2) a clustering-based approach. Both help detect outliers by grouping all similar elements into the same group, so that elements far from every group stand out; clustering methods pave the way for this grouping. The rest of this paper is based on the above-mentioned two approaches.
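The centroid-based idea described above can be sketched briefly. The following is an illustrative implementation only, not the paper's exact method: the seeding strategy (first k points), the choice of k, and the threshold factor are all assumptions made for the sketch.

```python
import math

def kmeans(points, k, iters=20):
    """Basic K-means with deterministic seeding (first k points).
    Returns final centroids and per-point cluster labels."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, labels

def centroid_outliers(points, centroids, labels, factor=2.0):
    """Flag points whose distance to their own centroid exceeds
    `factor` times the mean point-to-centroid distance."""
    dists = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]
    threshold = factor * sum(dists) / len(dists)
    return [p for p, d in zip(points, dists) if d > threshold]
```

After clustering, any point that sits much farther from its centroid than its cluster-mates is reported as an outlier, which is the "deviation" measurement the approach relies on.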
Comprehensive Survey of Data Classification & Prediction Techniques - ijsrd.com
In this paper, we present a literature survey of modern data classification and prediction algorithms. All of these algorithms are very important in real-world applications such as heart disease prediction and cancer prediction. Classification of data is a very popular and computationally expensive task. The fundamentals of data classification are also discussed in brief.
Distributed Digital Artifacts on the Semantic Web - Editor IJCATR
Distributed digital artifacts incorporate cryptographic hash values into URIs, called trusty URIs, in a distributed environment, building high-quality, verifiable, and immutable web resources to counter the rising man-in-the-middle attack. The greatest challenge of a centralized system is that it gives users no way to check whether data has been modified, and communication is limited to a single server. The solution is a distributed digital artifact system in which resources are distributed among different domains to enable inter-domain communication. With the rapid development of the web, attacks have increased sharply; among them, the man-in-the-middle attack (MIMA) is a serious issue that puts user security at risk. This work tries to prevent MIMA to an extent by providing self-referencing trusty URIs even in a distributed environment. Any manipulation of the data is efficiently identified, and further access to that data is blocked by informing the user that the uniform location has changed. The system uses self-reference so that each resource contains its own trusty URI, a lineage algorithm to generate the seed, and the SHA-512 hash generation algorithm to ensure security. It is implemented on the Semantic Web, an extension of the World Wide Web, using RDF (Resource Description Framework) to identify resources. The framework thus overcomes existing challenges by distributing digital artifacts on the Semantic Web, enabling secure communication between different domains across the network and thereby preventing MIMA.
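As a rough illustration of the hashing step, a trusty-URI-style integrity check can be sketched as follows. This is a simplified stand-in, not the actual trusty-URI specification: it only appends a base64url-encoded SHA-512 digest to the URI and omits the spec's module identifiers; the URI and content are invented for the example.

```python
import base64
import hashlib

def make_trusty_uri(base_uri, content):
    """Append a base64url-encoded SHA-512 digest of the resource
    content to its URI (simplified; not the official encoding)."""
    digest = hashlib.sha512(content.encode("utf-8")).digest()
    tag = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return f"{base_uri}.{tag}"

def verify(trusty_uri, content):
    """Recompute the digest and compare it with the tag embedded in
    the URI; any tampering with the content changes the digest."""
    base, _, _tag = trusty_uri.rpartition(".")
    return make_trusty_uri(base, content) == trusty_uri
```

Because the hash travels inside the identifier itself, a client can detect a modified resource without trusting the server that delivered it, which is the property the abstract relies on to block MIMA-style tampering.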
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da... - theijes
Data mining works to extract previously unknown information from enormous quantities of data, which can lead to knowledge. It provides information that helps in making good decisions. The effectiveness of data mining lies in its access to knowledge: its goal is the discovery of hidden facts contained in databases through the use of multiple technologies. Clustering organizes data into clusters or groups such that there is high intra-cluster similarity and low inter-cluster similarity. This paper deals with the K-means clustering algorithm, which groups data based on its characteristics and attributes and performs the clustering by reducing the distances between data points and their cluster centers. The algorithm is applied using the open-source tool WEKA, with an insurance dataset as its input.
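The quantity K-means reduces, the summed squared distance from each point to its assigned cluster center, can be written down directly. A minimal sketch, assuming points as coordinate tuples and an existing assignment:

```python
import math

def wcss(points, labels, centroids):
    """Within-cluster sum of squares: the objective K-means
    iteratively reduces by re-assigning points and moving centers."""
    return sum(math.dist(p, centroids[l]) ** 2
               for p, l in zip(points, labels))
```

Tools such as WEKA report this value (as "within cluster sum of squared errors") to let users compare clusterings with different numbers of clusters.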
Data Mining System and Applications: A Review - ijdpsjournal
In the Information Technology era, information plays a vital role in every sphere of human life. It is very important to gather data from different data sources, store and maintain the data, generate information and knowledge, and disseminate data, information, and knowledge to every stakeholder. Due to the vast use of computers and electronic devices and the tremendous growth in computing power and storage capacity, there has been explosive growth in data collection. Storing the data in a data warehouse enables an entire enterprise to access a reliable, current database. Analyzing this vast amount of data and drawing fruitful conclusions and inferences requires special tools called data mining tools. This paper gives an overview of data mining systems and some of their applications.
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T... - Editor IJCATR
In this paper we focus on techniques for solving data mining tasks: statistics, decision trees, and neural networks. The new approach has succeeded in defining new criteria for the evaluation process and has obtained valuable results covering what each technique is, the environment in which each is used, its advantages and disadvantages, the consequences of choosing it to extract hidden predictive information from large databases, and the methods of implementing it. Finally, the paper presents some valuable recommendations in this field.
Data mining is used to manage the huge amount of information stored in data warehouses and databases and to discover the required information. Numerous data mining techniques have been proposed, for example association rules, decision trees, neural networks, and clustering, and the field has been a point of attention for many years. A well-known strategy among the available data mining methods is clustering of the dataset, one of the most effective data mining techniques. It groups the dataset into a number of clusters based on certain predefined guidelines and can reliably discover connections between the distinct characteristics of the data.
In the k-means clustering algorithm, the distance function is selected on the basis of its relevance for predicting the data, and the Euclidean distance between the centroid of a cluster and the data objects outside that cluster is computed in order to cluster the data points. In the work discussed, the authors enhanced the Euclidean distance formula to increase cluster quality.
The problem of accuracy, and of redundant, dissimilar points within clusters, remains in the improved k-means. For this, a new enhanced approach has been proposed that uses a similarity function to check the similarity level of a point before including it in a cluster.
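The exact similarity function used by that enhanced approach is not given here, but the gating idea can be sketched. The inverse-distance similarity and the threshold value below are illustrative assumptions, not the paper's definitions:

```python
import math

def similarity(p, centroid):
    """Inverse-distance similarity in (0, 1]; identical points score 1.
    (An illustrative stand-in for the paper's unspecified function.)"""
    return 1.0 / (1.0 + math.dist(p, centroid))

def assign_if_similar(p, centroids, threshold=0.2):
    """Return the index of the most similar centroid, or None when
    even the best match falls below the threshold (a candidate outlier
    that should not be forced into any cluster)."""
    best = max(range(len(centroids)), key=lambda c: similarity(p, centroids[c]))
    return best if similarity(p, centroids[best]) >= threshold else None
```

Checking similarity before inclusion is what keeps dissimilar points from being absorbed into a cluster merely because some centroid happens to be the nearest one.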
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ... - iosrjce
In distributed peer-to-peer systems, huge amounts of data are dispersed, and grouping those data from multiple sources is a tedious task. By applying effective data mining techniques, clustering distributed data becomes easier, decreasing the hurdles of clustering that arise from processing, storage, and transmission costs. To perform dynamic distributed clustering, a fully decentralized clustering method has been proposed. HGD Cluster can cluster a data set dispersed among a large number of nodes in a distributed environment using hierarchical and grid-based clustering techniques, and it applies even when nodes are fully asynchronous, decentralized, and subject to churn. The general design principles employed in the proposed algorithm also allow customization for other classes of clustering. It is fully capable of clustering dynamic and distributed data sets, and using the algorithm, every node can maintain summarized views of the dataset. The main aim of the proposed system is customizing HGD Cluster to execute the hierarchical and grid-based clustering methods on these summarized views. Coping with dynamic data is made possible by gradually adapting the clustering model.
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn... - cscpconf
In today’s world, a gigantic amount of data is available in science, industry, business, and many other areas. This data can provide valuable information that management can use to make important decisions, but the problem is how to find that valuable information. The answer is data mining, a popular topic among researchers with much work still left to explore. This paper focuses on a fundamental concept of data mining: classification techniques. The BayesNet, NaiveBayes, NaiveBayes Updatable, Multilayer Perceptron, Voted Perceptron, and J48 classifiers are used to classify the data set. The performance of these classifiers is analyzed with the help of mean absolute error, root mean-squared error, and the time taken to build the model, and the results are shown both statistically and graphically. The WEKA data mining tool is used for this purpose.
Evaluating the efficiency of rule techniques for file classification - eSAT Journals
Text mining refers to the process of deriving high-quality information from text. Also known as knowledge discovery from text (KDT), it deals with the machine-supported analysis of text and is used in areas such as information retrieval, marketing, information extraction, natural language processing, and document similarity. Document similarity is one of the important techniques in text mining, and its first and foremost step is to classify files based on their category. In this research work, classification rule techniques are used to classify computer files based on their extensions (for example, pdf, doc, ppt, or xls). Among the available rule classifiers, decision table, JRip, Ridor, DTNB, NNge, PART, OneR, and ZeroR, three algorithms (decision table, DTNB, and OneR) are used to classify computer files by extension. The results produced by these algorithms are analyzed using the performance factors classification accuracy and error rate; from the experimental results, DTNB proves more efficient than the other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
Certain Investigation on Dynamic Clustering in Dynamic Datamining - ijdmtaiir
Clustering is the process of grouping a set of objects into classes of similar objects. Dynamic clustering is a new research area concerned with datasets that have dynamic aspects: the clusters must be updated whenever new data records are added to the dataset, which may change the clustering over time. With continuous updates and huge amounts of dynamic data, rescanning the database is not possible in static data mining, but it is possible in the dynamic data mining process. Dynamic data mining arises when the derived information is kept for analysis while the environment is dynamic, i.e. many updates occur. Since this has now been established by most researchers, the research here concentrates on solving the problems of mining dynamic databases. This paper presents an investigation of existing work related to dynamic clustering and incremental data clustering.
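The core trick that lets dynamic clustering avoid a full database rescan is updating cluster summaries incrementally as records arrive. A minimal sketch of that idea, a running-mean centroid update, is shown below; it is a generic illustration, not the specific method of any paper surveyed here:

```python
class IncrementalCluster:
    """Maintain a cluster centroid under a stream of insertions
    without rescanning the dataset (running-mean update)."""

    def __init__(self, dim):
        self.n = 0
        self.centroid = [0.0] * dim

    def add(self, point):
        self.n += 1
        for i, x in enumerate(point):
            # Running mean: c_i += (x_i - c_i) / n
            self.centroid[i] += (x - self.centroid[i]) / self.n
```

Each insertion costs O(dim) regardless of how many records the cluster already holds, which is what makes the approach viable under continuous updates.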
Different Classification Technique for Data mining in Insurance Industry usin... - IOSRjournaljce
This paper addresses issues and techniques for Property/Casualty actuaries applying data mining methods. Data mining means the effective discovery of unknown patterns from a large database. It is an interactive knowledge discovery procedure that includes data acquisition, data integration, data exploration, model building, and model validation. The paper provides an overview of the data discovery method and introduces some important data mining methods for application to insurance, including cluster discovery approaches.
A statistical data fusion technique in virtual data integration environment - IJDKP
Data fusion in a virtual data integration environment starts after detecting and clustering duplicated records from the different integrated data sources. It refers to the process of selecting or fusing attribute values from the clustered duplicates into a single record representing the real-world object. In this paper, a statistical technique for data fusion is introduced, based on probabilistic scores from both the data sources and the clustered duplicates.
Data mining is an integrated field that combines technologies from databases, machine learning, statistics, pattern recognition, information retrieval, artificial neural networks, knowledge-based systems, artificial intelligence, and data visualization. In real terms, data mining is the investigation of stored data sets to find hidden connections and to present the information in a form that is justifiable and understandable to the owner of the mined data. Clustering is an unsupervised procedure that divides data items into groups such that items in the same group are more similar to one another than to items in other clusters, according to some measure of similarity. Cluster analysis is thus a grouping task used to identify sets of similar objects, and it is one of the most widely used methods in practical data mining applications. It is a method of grouping objects, where an object can be physical, such as a student, or abstract, such as customer behavior or handwriting. Many clustering algorithms have been proposed, falling into different clustering methods, and the intention of this paper is to provide a classification of some prominent clustering algorithms.
The development of data mining is inseparable from recent developments in information technology that enable the accumulation of large amounts of data. For example, a shopping mall records every sales transaction using various POS (point of sale) terminals. The database of these sales can reach a large storage capacity, with more added each day, especially when the shopping center develops into a nationwide network. The growth of the internet has also contributed substantially to this accumulation of data. But the rapid growth of data accumulation has created a condition often described as "data rich but information poor", because the collected data cannot be used optimally for useful applications; not infrequently, a data set is simply left as a "data grave". Several techniques are used in data mining, including association, classification, and clustering. In this paper, the author compares the performance of two classification methods, the naïve Bayes and C4.5 algorithms.
Introduction to feature subset selection method - IJSRD
Data mining is a computational process to discover patterns in large data sets. Among its important techniques is classification, which has recently received great attention in the database community and can solve problems in fields such as medicine, industry, business, and science. PSO is based on social behaviour and is applied to optimization problems. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Rough Set Theory (RST) is a mathematical tool that deals with the uncertainty and vagueness of decision systems.
A Survey on Cluster Based Outlier Detection Techniques in Data Stream - IIRindia
In recent years, Data Mining (DM) has emerged as an area of computational intelligence that provides new techniques, algorithms, and tools for processing large volumes of data. Clustering, the most popular data mining technique today, separates a dataset into groups with high intra-group similarity and low inter-group similarity. Outlier (anomaly) detection finds small groups of data objects that differ from the rest of the data, and it is an essential part of mining in data streams. A Data Stream (DS) is mined as a continuous arrival of high-speed data items and plays an important role in telecommunication services, e-commerce, tracking customer behavior, and medical analysis. Detecting outliers over data streams is an active research area, and this survey presents an overview of fundamental outlier detection approaches and the various types of outlier detection methods for data streams.
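Because a stream cannot be rescanned, stream outlier detectors typically judge each arriving item against a summary of recent data only. The sliding-window deviation check below is a deliberately simple illustration of that pattern (window size and deviation factor are arbitrary choices, and real stream methods usually cluster the window rather than just averaging it):

```python
import math
from collections import deque

def stream_outliers(stream, window=50, factor=3.0):
    """Flag arriving values deviating from the sliding-window mean
    by more than `factor` standard deviations."""
    recent = deque(maxlen=window)  # bounded memory: the stream is never rescanned
    flagged = []
    for x in stream:
        if len(recent) >= 2:
            mean = sum(recent) / len(recent)
            std = math.sqrt(sum((v - mean) ** 2 for v in recent) / len(recent))
            if std > 0 and abs(x - mean) > factor * std:
                flagged.append(x)
        recent.append(x)
    return flagged
```

The bounded `deque` is the essential ingredient: memory stays constant no matter how long the stream runs, which is what distinguishes stream methods from the static clustering approaches above.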
Data Mining System and Applications: A Reviewijdpsjournal
Â
In the Information Technology era information plays vital role in every sphere of the human life. It is very important to gather data from different data sources, store and maintain the data, generate information, generate knowledge and disseminate data, information and knowledge to every stakeholder. Due to vast use of computers and electronics devices and tremendous growth in computing power and storage capacity, there is explosive growth in data collection. The storing of the data in data warehouse enables entire enterprise to access a reliable current database. To analyze this vast amount of data and drawing fruitful conclusions and inferences it needs the special tools called data mining tools. This paper gives overview of the data mining systems and some of its applications.
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...Editor IJCATR
Â
In this paper we focus on some techniques for solving data mining tasks such as: Statistics, Decision Trees and Neural
Networks. The new approach has succeed in defining some new criteria for the evaluation process, and it has obtained valuable results
based on what the technique is, the environment of using each techniques, the advantages and disadvantages of each technique, the
consequences of choosing any of these techniques to extract hidden predictive information from large databases, and the methods of
implementation of each technique. Finally, the paper has presented some valuable recommendations in this field.
Data mining is utilized to manage huge measure of information which are put in the data ware houses and databases, to discover required information and data. Numerous data mining systems have been proposed, for example, association rules, decision trees, neural systems, clustering, and so on. It has turned into the purpose of consideration from numerous years. A re-known amongst the available data mining strategies is clustering of the dataset. It is the most effective data mining method. It groups the dataset in number of clusters based on certain guidelines that are predefined. It is dependable to discover the connection between the distinctive characteristics of data.
In k-mean clustering algorithm, the function is being selected on the basis of the relevancy of the function for predicting the data and also the Euclidian distance between the centroid of any cluster and the data objects outside the cluster is being computed for the clustering the data points. In this work, author enhanced the Euclidian distance formula to increase the cluster quality.
The problem of accuracy and redundancy of the dissimilar points in the clusters remains in the improved k-means for which new enhanced approach is been proposed which uses the similarity function for checking the similarity level of the point before including it to the cluster.
A Hierarchical and Grid Based Clustering Method for Distributed Systems (Hgd ...iosrjce
Â
In distributed peer-to-peer systems, huge amount of data are dispersed. Grouping of those data from
multiple sources is a tedious task. By applying effective data mining techniques the clustering of distributed data
is become ease and this decreases the hurdles of clustering due to processing, storage, and transmission costs.
To perform a dynamic distributed clustering, a fully decentralized clustering method has been proposed. HGD
Cluster can cluster a data set which is dispersed among a large number of nodes in a distributed environment
using hierarchal and grid based clustering techniques. When nodes are fully asynchronous and decentralized
and also adaptable to stir, then HGD cluster will apply. The general design principles employed in the proposed
algorithm also allow customization for other classes of clustering. It is fully capable of clustering dynamic and
distributed data sets. Using the algorithm, every node can maintain summarized views of the dataset.
Customizing HGD Cluster for execution of the hierarchal-based and grid-based clustering methods on the
summarized views is the main aim of the proposed system. Coping with dynamic data is made possible by
gradually adapting the clustering model.
Analysis of Bayes, Neural Network and Tree Classifier of Classification Techn...cscpconf
Â
In today’s world, gigantic amount of data is available in science, industry, business and many
other areas. This data can provide valuable information which can be used by management for
making important decisions. But problem is that how can find valuable information. The answer
is data mining. Data Mining is popular topic among researchers. There is lot of work that
cannot be explored till now. But, this paper focuses on the fundamental concept of the Data mining i.e. Classification Techniques. In this paper BayesNet, NavieBayes, NavieBayes Uptable, Multilayer perceptron, Voted perceptron and J48 classifiers are used for the classification of data set. The performance of these classifiers analyzed with the help of Mean Absolute Error, Root Mean-Squared Error and Time Taken to build the model and the result can be shown statistical as well as graphically. For this purpose the WEKA data mining tool is used.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Evaluating the efficiency of rule techniques for file classificationeSAT Journals
Â
Abstract Text mining refers to the process of deriving high quality information from text. It is also known as knowledge discovery from text (KDT), deals with the machine supported analysis of text. It is used in various areas such as information retrieval, marketing, information extraction, natural language processing, document similarity, and so on. Document Similarity is one of the important techniques in text mining. In document similarity, the first and foremost step is to classify the files based on their category. In this research work, various classification rule techniques are used to classify the computer files based on their extensions. For example, the extension of computer files may be pdf, doc, ppt, xls, and so on. There are several algorithms for rule classifier such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. In this research work, three classification algorithms namely decision table, DTNB and OneR classifiers are used for performing classification of computer files based on their extension. The results produced by these algorithms are analyzed by using the performance factors classification accuracy and error rate. From the experimental results, DTNB proves to be more efficient than other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
Certain Investigation on Dynamic Clustering in Dynamic Dataminingijdmtaiir
Â
Clustering is the process of grouping a set of objects
into classes of similar objects. Dynamic clustering comes in a
new research area that is concerned about dataset with dynamic
aspects. It requires updates of the clusters whenever new data
records are added to the dataset and may result in a change of
clustering over time. When there is a continuous update and
huge amount of dynamic data, rescan the database is not
possible in static data mining. But this is possible in Dynamic
data mining process. This dynamic data mining occurs when
the derived information is present for the purpose of analysis
and the environment is dynamic, i.e. many updates occur.
Since this has now been established by most researchers and
they will move into solving some of the problems and the
research is to concentrate on solving the problem of using data
mining dynamic databases. This paper gives some
investigation of existing work done in some papers related with
dynamic clustering and incremental data clustering
Different Classification Technique for Data mining in Insurance Industry usin...IOSRjournaljce
Â
this paper addresses the issues and techniques for Property/Casualty actuaries applying data mining methods. Data mining means the effective unknown pattern discovery from a large amount database. It is an interactive knowledge discovery procedure which is includes data acquisition, data integration, data exploration, model building, and model validation. The paper provides an overview of the data discovery method and introduces some important data mining method for application to insurance concluding cluster discovery approaches.
A statistical data fusion technique in virtual data integration environmentIJDKP
Data fusion in the virtual data integration environment starts after detecting and clustering duplicated
records from the different integrated data sources. It refers to the process of selecting or fusing attribute
values from the clustered duplicates into a single record representing the real world object. In this paper, a
statistical technique for data fusion is introduced based on some probabilistic scores from both data
sources and clustered duplicates
Data mining is an integrated field that combines technologies from databases, machine learning, statistics, pattern recognition, information retrieval, neural networks, knowledge-based systems, artificial intelligence, and data visualization. In real terms, data mining is the investigation of large data sets to find hidden relationships and to summarize the data in forms that are both justifiable and understandable to the data's owner. Clustering is an unsupervised process that partitions data elements into collections such that elements in the same group are more similar to one another than to items in other clusters, according to some measure of similarity. Cluster analysis is a grouping task used to identify sets of similar objects, and it is one of the most widely used methods in practical data mining applications. It groups objects that may be physical, such as a student, or abstract, such as customer behaviour or handwriting. Many clustering algorithms have been proposed, falling into different clustering methods. The intention of this paper is to provide a classification of some prominent clustering algorithms.
The development of data mining is inseparable from the recent developments in information technology that enable the accumulation of large amounts of data. For example, a shopping mall records every sales transaction through its point-of-sale (POS) systems; the resulting database can reach a very large storage capacity, with more data added each day, especially when the shopping centre develops into a nationwide network. The growth of the internet has also contributed substantially to this accumulation of data. But the rapid growth of accumulated data has created a condition often referred to as "data rich but information poor", because the collected data cannot be used optimally for useful applications; not infrequently, such data sets are simply left as "data graves". Several techniques are used in data mining, including association, classification, and clustering. In this paper, the author compares the performance of two classification methods: naïve Bayes and the C4.5 algorithm.
Introduction to feature subset selection methodIJSRD
Data mining is a computational process to ascertain patterns in large data sets. It comprises various important techniques, one of which is classification, which has received great attention recently in the database community. Classification techniques can solve problems in fields such as medicine, industry, business, and science. Particle Swarm Optimization (PSO) is an optimization method based on social behaviour. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Rough Set Theory (RST) is a mathematical tool that deals with the uncertainty and vagueness of decision systems.
A Survey on Cluster Based Outlier Detection Techniques in Data StreamIIRindia
In recent days, Data Mining (DM) has been an emerging area of computational intelligence that provides new techniques, algorithms and tools for processing large volumes of data. Clustering is the most popular data mining technique today. Clustering is used to separate a dataset into groups with high intra-group similarity and low inter-group similarity. Outlier (anomaly) detection is the task of finding small groups of data objects that differ from the rest of the data; it is an essential part of mining in data streams. Data Stream (DS) mining deals with the continuous arrival of high-speed data items and plays an important role in telecommunication services, e-commerce, tracking customer behaviour, and medical analysis. Detecting outliers over data streams is an active research area. This survey presents an overview of fundamental outlier detection approaches and the various types of outlier detection methods in data streams.
Data Mining Framework for Network Intrusion Detection using Efficient TechniquesIJAEMSJORNAL
The implementation measures the classification accuracy on benchmark datasets after combining SIS and ANNs. In order to put a number on the gains made by using SIS as a strategic tool in data mining, extensive experiments and analyses are carried out. The predicted results of this investigation will have implications for both theoretical and applied settings. Predictive models in a wide variety of disciplines may benefit from the enhanced classification accuracy enabled by SIS inside ANNs. An invaluable resource for scholars and practitioners in the fields of AI and data mining, this study adds to the continuing conversation about how to maximize the efficacy of machine learning methods.
Data Mining: Investment risk in the bankEditor IJCATR
This paper will discuss the technology and methods behind data mining, how data mining works, how it helps to improve
national security, and how sustainable the technology is. Sustainability, with regard to data mining, refers to the impact on the quality of
life. Quality of life refers to the preservation of human rights and the ability to feel secure. The ethics and the fallbacks regarding privacy
will also be discussed in depth, including the benefits that accompany these fallbacks, and whether they outweigh the cons. Both technical
and ethical articles will be used to highlight and discuss the potential, good and bad, and the controversy of data mining. Applications
of data mining to security will also be proposed.
Data mining methods are expanding rapidly, allowing for the mass collection of information. This mass of information is then used by many government agencies to identify threats, gain intelligence, and obtain a better understanding of enemy networks. However, the ability to collect this information from any computer draws into question whether data mining leads to a violation of the average citizen's privacy, and has created a debate as to whether data mining is ethically defensible.
The Survey of Data Mining Applications And Feature Scope IJCSEIT Journal
In this paper we survey a variety of techniques, approaches, and research areas that are important to data mining technologies. Many multinational companies and large organizations operate in different places across different countries, and each place of operation may generate large volumes of data. Corporate decision makers require access to all such sources to take strategic decisions. The data warehouse delivers significant business value by improving the effectiveness of managerial decision-making. In an uncertain and highly competitive business environment, the value of strategic information systems such as these is easily recognized; however, in today's business environment, efficiency or speed is not the only key to competitiveness. Huge amounts of data, on the scale of terabytes to petabytes, have drastically changed the areas of science and engineering. To analyze and manage such huge amounts of data and make decisions from them, we need the techniques called data mining, which are transforming many fields. This paper presents a number of applications of data mining and also discusses its scope for further research.
Privacy preservation techniques in data miningeSAT Journals
Abstract: In this paper different privacy preservation techniques are compared. Classification is the most commonly applied data mining technique; it employs a set of pre-classified examples to develop a model that can classify the population of records at large. Fraud detection and credit risk applications are particularly well suited to this type of analysis. This approach frequently employs decision-tree or neural-network-based classification algorithms. The data classification process involves learning and classification. In learning, the training data are analyzed by a classification algorithm. In classification, test data are used to estimate the accuracy of the classification rules; if the accuracy is acceptable, the rules can be applied to new data tuples. For a fraud detection application, this would include complete records of both fraudulent and valid activities determined on a record-by-record basis. The classifier-training algorithm uses these pre-classified examples to determine the set of parameters required for proper discrimination, then encodes these parameters into a model called a classifier. Index Terms: Data Mining, Privacy Preservation, Clustering, Classification Techniques, Naive Bayes.
A STUDY OF TRADITIONAL DATA ANALYSIS AND SENSOR DATA ANALYTICSijistjournal
The growth of smart and intelligent devices known as sensors generates large amounts of data. Over a time span these data reach such a large volume that they are designated as big data, and the repositories hold unstructured data. Traditional data analytics methods are well developed and widely used to analyze structured data and, to a limited extent, semi-structured data, which involves additional processing overheads. The methods used to analyze unstructured data are different because of their distributed computing approach, whereas centralized processing is possible for structured and semi-structured data. The undertaken work is confined to an analysis of both varieties of methods. The result of this study is intended to introduce the methods available to analyze big data.
With the development of databases, the volume of data stored in them increases rapidly, and much important information is hidden in these large amounts of data. If this information can be extracted from the database, it can create a lot of profit for the organization. The question organizations are asking is how to extract this value; the answer is data mining. Many technologies are available to data mining practitioners, including artificial neural networks, genetic algorithms, fuzzy logic and decision trees. Many practitioners are wary of neural networks due to their black-box nature, even though they have proven themselves in many situations. This paper is an overview of artificial neural networks and questions their position as a preferred tool for data mining practitioners.
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
An institution is a place where the teacher explains and the student understands and learns the lesson. Every student has his own definition of toughness and easiness, and there isn't any absolute scale for measuring knowledge, but examination scores indicate the performance of a student. In this case study, knowledge of data mining is combined with educational strategies to improve students' performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data: it allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). This project describes the use of the clustering data mining technique to improve the efficiency of academic performance in educational institutions. A live experiment was conducted on students of a computer science major by conducting an exam using MOODLE (LMS), analysing the generated data using RapidMiner (data mining software), and then performing clustering on the data. This method helps to identify the students who need special advising or counselling by the teacher, in order to provide a high quality of education.
2. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-6, Issue-12, Dec-2020]
https://dx.doi.org/10.22161/ijaems.612.13 ISSN: 2454-1311
www.ijaems.com Page | 572
Simulation. Social data can be used for news, digital cameras, and YouTube.
• Data: a collection of facts usually obtained as the result
of experiences, observations, or experiments
• Data may consist of numbers, words, images, …
• Data: lowest level of abstraction
Data mining techniques:
There are six such data mining techniques, namely Decision Trees, Sequential Patterns, Clustering, Prediction, Association, and Classification; of these, we deal with Clustering.
Learning Patterns
In data mining, the learning process is split into two classes: supervised learning and unsupervised learning. Supervised learning is a procedure in which both the input and the desired output values are provided. Input and output values are labelled for classification, to supply a learning foundation for future data processing systems. The term supervised learning stems from the notion that the algorithm learns from a training data set, which acts as a teacher. The algorithms that fall under supervised learning include decision trees, similarity learning, Bayesian logic, and support vector machines (SVM). In unsupervised learning, there is no need to supervise the model; instead, the model is allowed to work on its own to discover information. It largely handles unlabelled data. Unsupervised learning algorithms allow us to perform more intricate processing tasks than supervised learning, although unsupervised learning can be unpredictable compared with other natural learning procedures. In this paper, we focus on hierarchical clustering and k-means clustering (Arthur & Vassilvitskii, 2006).
Clustering:
Clustering is an automatic procedure that forms clusters of objects having similar characteristics. We use clustering to define classes, and then suitable objects are placed in each class. Clustering algorithms can be categorized based on the cluster models offered and on the type of data we try to analyze [2]. From a machine learning perspective, clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. From a practical standpoint, clustering plays an extraordinary role in data mining applications such as scientific data exploration, information retrieval and text mining, spatial database applications, Web analysis, CRM, marketing, medical diagnostics, computational biology, and others (Manning et al., 2008).
Cluster formation mechanism(Xu & Wunsch, 2005)
Cluster-based approaches for outlier detection in data sets first build groups of similar elements, or clusters, of data points. Clustering methods are tremendously helpful for grouping related data objects from data sets; by then applying distance-based calculations, the detection of outliers is accomplished. This is termed cluster-based outlier detection.
The reward of cluster-based approaches is that they do not need to be supervised. Moreover, clustering-based approaches can be used in an incremental manner: after learning the clusters, new points can be fed into the system and tested for outliers. One drawback of clustering-based procedures is that they are computationally costly, since they demand the computation of pairwise distances (Bhattacharya et al., 2015).
Clustering-based outlier detection is an unsupervised outlier detection procedure in which class labels such as "normal" or "outlier" are not provided; clustering means learning by observation rather than learning from examples. A clustering-based outlier detection method for evolving data streams allots a weight to each feature based on its significance in the mining task (Manning et al., 2008).
The outlier detection procedure is especially effective once the data from the database has been segmented into clusters. Within each cluster, every data point is assigned a degree of membership, and outliers are discovered without hindrance from the clustering system. Clustering on streaming data is distinguished by grid-based and k-means/k-median systems (Xu & Wunsch, 2005).
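The cluster-based outlier detection idea described above can be sketched in a few lines. This is an illustrative example, not code from the paper: points are assigned to given centroids, and any point lying much farther from its centroid than the average point is flagged. The function names and the threshold (a multiple of the mean intra-cluster distance) are assumptions chosen for this toy 1-D case.

```python
# Illustrative sketch: cluster-based outlier detection in one dimension.
# Points are grouped around known centroids; a point is an outlier if
# its distance to its nearest centroid greatly exceeds the average
# point-to-centroid distance. The factor of 2 is an assumed threshold.

def nearest(centroids, x):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda i: abs(centroids[i] - x))

def detect_outliers(data, centroids, factor=2.0):
    # Assignment step: each point joins its nearest centroid.
    assign = [nearest(centroids, x) for x in data]
    dists = [abs(x - centroids[a]) for x, a in zip(data, assign)]
    mean_d = sum(dists) / len(dists)
    # Flag points lying much farther from their centroid than average.
    return [x for x, d in zip(data, dists) if d > factor * mean_d]

data = [10, 11, 12, 50, 51, 52, 300]     # 300 is the anomaly
centroids = [11, 51]                      # assumed known cluster centres
print(detect_outliers(data, centroids))   # -> [300]
```

In a streaming setting, the same test can be applied incrementally: once the clusters are learned, each newly arriving point is checked against the existing centroids without re-clustering.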
Hierarchical Clustering
A hierarchical clustering method works by grouping data items into a tree of clusters. The quality of a pure hierarchical clustering method suffers from its inability to make adjustments once a merge or split decision has been executed: if a particular merge or split decision later turns out to have been a poor choice, the method cannot backtrack and fix it. In hierarchical clustering, we assign each item (data point) to its own cluster, then compute the distance (correlation) between every pair of clusters and merge the two most similar clusters. Let us understand this further by solving a small example.
Dendrogram
Objective: For the one-dimensional data set {15, 20, 30, 50, 63}, perform hierarchical clustering and plot the dendrogram to visualize it.
Solution: First, let's visualize the data.
Observing the plot above, we can intuitively conclude that:
1. The first two points (15 and 20) are close to each other and should be in the same cluster.
2. The third point (30) is closest to the first formed cluster (15, 20), so it is merged with that cluster next.
3. The newly formed cluster is now (15, 20, 30).
4. The next closest points are 50 and 63, so they are merged in the next step into (50, 63).
In each step, merge the two clusters whose two closest members have the smallest distance.
The two clusters formed are:
Cluster 1: (15, 20, 30)
Cluster 2: (50, 63)
Hierarchical clustering is mostly used when the application requires a hierarchy, e.g., the creation of a taxonomy. However, hierarchical methods are expensive in terms of their computational and storage requirements.
Agglomerative Hierarchical Clustering
This bottom-up strategy starts by placing each object in
its own cluster and then merges these atomic clusters into
larger and larger clusters until all of the objects are in a
single cluster or until certain termination conditions are
satisfied. Most hierarchical clustering methods belong to this
category. They differ only in their definition of inter-cluster
similarity.
The related algorithm is shown below:
Given:
a set of n objects {X1, ..., Xn}
a distance function dist(ci, cj)
for i = 1 to n
ci = {Xi}
end for
C = {c1, ..., cn}
i = n + 1
while C.size > 1 do
(cmin1, cmin2) = pair of clusters in C with minimum dist(ci, cj)
remove cmin1 and cmin2 from C
add ci = cmin1 ∪ cmin2 to C
i = i + 1
end while
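The pseudocode above can be turned into a short runnable sketch. This is an illustrative implementation (function names and the stop-at-k-clusters condition are my own choices, not from the paper) using single linkage, i.e. the distance between two clusters is the smallest pairwise distance between their members, applied to the paper's example set {15, 20, 30, 50, 63}:

```python
# Minimal single-linkage agglomerative clustering (bottom-up merging).
# Each point starts in its own cluster; the two closest clusters are
# merged repeatedly until only k clusters remain.

def single_link_dist(c1, c2):
    """Distance between two clusters = smallest pairwise distance."""
    return min(abs(a - b) for a in c1 for b in c2)

def agglomerative(points, k):
    # Start with each point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Find the pair of clusters with the smallest distance ...
        pairs = [(i, j) for i in range(len(clusters))
                        for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: single_link_dist(clusters[p[0]],
                                                         clusters[p[1]]))
        # ... and merge them into one cluster.
        merged = clusters[i] + clusters[j]
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]

print(agglomerative([15, 20, 30, 50, 63], k=2))
```

Running this reproduces the merge order worked out earlier for the dendrogram example: (15, 20) first, then 30 joins them, then (50, 63), leaving the two clusters (15, 20, 30) and (50, 63).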
K-Means Clustering
Hierarchical clustering is most beneficial at detecting embedded structures within the data, but it fails at finding a consensus across the complete data set. Hierarchical clustering can place together clusters that seem close while considering no information about the remaining points; it regards only a small neighbourhood of nearby points rather than the complete data set.
In a sense, k-means considers every single point in the data set and uses this information to evolve the clustering over a succession of iterations. K-means has become easily the most popular clustering algorithm; it minimizes the sum of the within-cluster variances.
K-means has plenty of variants optimized for particular sorts of data. At a high level, they all do something like this:
K-means picks k points in the multi-dimensional space to represent each of the clusters; these are termed centroids. Every data point will be nearest to one of those centroids, but not all to the same one, so the points form a cluster around their nearest centroid. We now have k clusters, and every data point is a member of one of them. K-means then finds the centre of each of the k clusters based on its cluster members (using the member vectors), and this centre becomes the brand-new centroid for that cluster. Since the centroid is now in a different place, points may be closer to other centroids; in other words, they can change cluster membership. This is repeated until the centroids no longer shift and the cluster memberships stabilize, which is known as convergence.
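The assign/update loop just described can be sketched in a few lines. This is an illustrative 1-D implementation (names and the explicit initial centroids are my own choices, not from the paper); passing the initial centroids in explicitly also makes the algorithm's sensitivity to that choice visible:

```python
# Bare-bones k-means in one dimension. Each iteration assigns every
# point to its nearest centroid, then moves each centroid to the mean
# of its assigned points; the loop stops when no centroid moves
# (convergence).

def kmeans_1d(data, centroids, max_iter=100):
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for x in data:
            i = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
            clusters[i].append(x)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new = [sum(c) / len(c) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:        # no centroid moved -> converged
            return centroids, clusters
        centroids = new
    return centroids, clusters

centroids, clusters = kmeans_1d([15, 20, 30, 50, 63], centroids=[15.0, 63.0])
print(clusters)    # -> [[15, 20, 30], [50, 63]]
```

On the example data set used earlier, this converges to the same two groups the hierarchical method found; a different initial choice of centroids could converge to a different partition, which is exactly the sensitivity discussed below.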
The essential selling point of k-means is its simplicity. Its simplicity means it is generally faster and more efficient than other algorithms, especially on large data sets. Even better, k-means can be used to pre-cluster a gigantic data set, followed by a more expensive cluster analysis on the sub-clusters. K-means can also be used to quickly "play" with k and explore whether there are missed patterns or relationships within the data set.
The flaws of k-means are its sensitivity to outliers and its sensitivity to the initial choice of centroids. One final thing to keep in mind is that k-means is designed to operate on continuous data. There are plenty of implementations of k-means clustering available, such as Apache Mahout.
A more comprehensive collection of applications that use outlier detection is:
1. Fraud detection - discovering fraudulent applications for credit cards or state benefits, or discovering fraudulent use of credit cards or mobile phones.
a. Loan application processing - to find fraudulent applications or potentially problematic customers.
b. Intrusion detection - discovering unauthorized access in computer networks.
c. Activity monitoring - discovering mobile-phone fraud by tracking phone activity, or suspicious trades in the equity markets.
d. Network performance - monitoring the performance of computer networks, for example to detect network bottlenecks.
e. Fault diagnosis - monitoring processes to detect faults in motors, generators, pipelines, or space instruments on space shuttles, for instance.
f. Structural defect detection - monitoring manufacturing lines to find faulty production runs, such as cracked beams.
g. Satellite image analysis - identifying novel features or misclassified features.
h. Detecting novelties in images - for robot monitoring or surveillance systems.
i. Motion segmentation - detecting image features moving independently of the background.
j. Time-series monitoring - monitoring safety-critical applications such as drilling or high-speed milling.
k. Medical condition monitoring - such as, for example, heart-rate monitors.
l. Pharmaceutical research - identifying novel molecular structures.
m. Detecting novelty in text - to detect the onset of news stories, for topic detection and tracking, or for traders to pinpoint equity, commodities, and foreign exchange trading stories, and out-performing or under-performing commodities.
6. International Journal of Advanced Engineering, Management and Science (IJAEMS) [Vol-6, Issue-12, Dec-2020]
https://dx.doi.org/10.22161/ijaems.612.13 ISSN: 2454-1311
www.ijaems.com Page | 576
n. Discovering unexpected entrances in Data
Bases for information Mining to find glitches,
valid or frauds however abrupt entrances(Xu &
Wunsch, 2008).
III. CONCLUSION
This research paper has dealt with the process of detecting outliers through a clustering approach. An outlier is an observation that behaves unusually with respect to the other members of its data set, and detecting it serves both anomaly detection and the flagging of abnormal observations. The strengths and weaknesses of K-Means and Hierarchical Clustering for this task have been discussed. The algorithms must be adapted in order to obtain reliable results for outlier detection; further work on this research will concentrate on enhancing the algorithms to obtain better results.
REFERENCES
[1] Benjelloun, Fatima-Zahra, Ayoub Ait Lahcen, and Samir
Belfkih. "Outlier detection techniques for big data streams:
focus on cybersecurity." International Journal of Internet
Technology and Secured Transactions 9.4 (2019): 446-474.
[2] Sualeh, Muhammad, and Gon-Woo Kim. "Dynamic multi-
lidar based multiple object detection and
tracking." Sensors 19.6 (2019): 1474.
[3] Zhao, Xingwang, et al. "Optimal design of an indoor
environment by the CFD-based adjoint method with area-
constrained topology and cluster analysis." Building and
Environment 138 (2018): 171-180.
[4] Wang, Zhao-Yu, et al. "Weighted z-Distance-Based
Clustering and Its Application to Time-Series Data." Applied
Sciences 9.24 (2019): 5469.
[5] Bezerra, Clauber Gomes, et al. "An evolving approach to
unsupervised and real-time fault detection in industrial
processes." Expert Systems with Applications 63 (2016): 134-
144.
[6] Bhattacharya, Gautam, Koushik Ghosh, and Ananda S.
Chowdhury. "Outlier detection using neighborhood rank
difference." Pattern Recognition Letters 60 (2015): 24-31.
[7] Pimentel, Marco AF, et al. "A review of novelty
detection." Signal Processing 99 (2014): 215-249.
[8] Manning, Christopher D., Prabhakar Raghavan, and Hinrich
Schütze. Introduction to Information Retrieval. Cambridge
University Press, 2008.
[9] Xu, Rui, and Donald Wunsch. "Survey of clustering
algorithms." IEEE Transactions on Neural Networks 16.3
(2005): 645-678.
[10] Arthur, David, and Sergei Vassilvitskii. k-means++: The
advantages of careful seeding. Stanford, 2006.
[11] Angelin, B., and A. Geetha. "Outlier Detection using
Clustering Techniques–K-means and K-median." 2020 4th
International Conference on Intelligent Computing and
Control Systems (ICICCS). IEEE, 2020.
[12] Angelin, B., and D. Devakumari. "Outlier Detection using
Clustering Algorithms: A Survey."