A statistical data fusion technique in virtual data integration environment (IJDKP)
Data fusion in a virtual data integration environment starts after duplicated records from the different integrated data sources have been detected and clustered. It refers to the process of selecting or fusing attribute values from the clustered duplicates into a single record that represents the real-world object. In this paper, a statistical technique for data fusion is introduced, based on probabilistic scores derived from both the data sources and the clustered duplicates.
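The abstract does not spell out the scoring formula, so the following is only a minimal sketch of the general idea: for each attribute, combine a value's frequency among the clustered duplicates with a reliability score for the source that reported it, and keep the highest-scoring value. The function name, the score combination, and the reliability priors are all assumptions for illustration.

```python
from collections import Counter

def fuse_attribute(cluster, source_scores):
    """Pick one value for an attribute from a cluster of duplicate records.

    cluster: list of (value, source) pairs for one attribute.
    source_scores: dict mapping source name -> reliability score in [0, 1].
    Each value's score combines how often it appears in the cluster with
    the reliability of the source that reports it (a hypothetical scheme).
    """
    counts = Counter(v for v, _ in cluster)
    total = len(cluster)
    scores = {}
    for value, source in cluster:
        freq = counts[value] / total          # support among the duplicates
        rel = source_scores.get(source, 0.5)  # assumed source reliability prior
        scores[value] = max(scores.get(value, 0.0), freq * rel)
    return max(scores, key=scores.get)

# Three duplicate records disagree on a city name; the well-supported,
# well-sourced spelling wins.
cluster = [("London", "A"), ("London", "B"), ("Londn", "C")]
fused = fuse_attribute(cluster, {"A": 0.9, "B": 0.8, "C": 0.4})
```

Any real instantiation of the paper's technique would substitute its own probabilistic scores for the toy frequency-times-reliability product used here.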
Analysis of Classification Algorithm in Data Mining (ijdmtaiir)
Data mining is the extraction of hidden predictive information from large databases. Classification is the process of finding a model that describes and distinguishes data classes or concepts. This paper studies the prediction of class labels using the C4.5 and Naïve Bayesian algorithms. C4.5 generates classifiers expressed as decision trees from a fixed set of examples; the resulting tree is used to classify future samples. The leaf nodes of the decision tree contain the class names, whereas each non-leaf node is a decision node: an attribute test, with each branch (leading to another subtree) being a possible value of the attribute. C4.5 uses information gain to decide which attribute goes into a decision node. A Naïve Bayesian classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions: it assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class-conditional independence. The results indicate that predicting the class label with the Naïve Bayesian classifier is effective and simple compared to the C4.5 classifier.
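The information-gain criterion described above can be made concrete with a short sketch: gain is the entropy of the class labels minus the weighted entropy remaining after splitting on an attribute. The toy data and function names below are illustrative, not taken from the paper.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Entropy reduction from splitting rows on the attribute at attr_index."""
    n = len(rows)
    base = entropy(labels)
    split = {}
    for row, label in zip(rows, labels):
        split.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in split.values())
    return base - remainder

# Toy data: attribute 0 is "outlook"; it separates the classes perfectly,
# so the gain equals the full base entropy of 1 bit.
rows = [("sunny",), ("sunny",), ("rain",), ("rain",)]
labels = ["no", "no", "yes", "yes"]
gain = information_gain(rows, 0, labels)
```

C4.5 evaluates this quantity (or the related gain ratio) for every candidate attribute and places the best one in the decision node.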
Recommendation system using Bloom filter in MapReduce (IJDKP)
Many clients use the Web to discover product details in the form of online reviews provided by other clients and specialists. Recommender systems provide an important response to the information-overload problem, as they present users with more practical and personalized information. Collaborative filtering (CF) methods are a vital component of recommender systems, as they generate high-quality recommendations by leveraging the preferences of communities of similar users; the collaborative filtering method assumes that people with the same tastes choose the same items. Conventional collaborative filtering systems suffer from the sparse-data problem and a lack of scalability, so a new recommender system is required to deal with sparse data and produce high-quality recommendations in a large-scale mobile environment. MapReduce is a programming model widely used for large-scale data analysis. The described recommendation mechanism for mobile commerce is user-based collaborative filtering using MapReduce, which reduces the scalability problem of conventional CF systems. One of the essential operations for data analysis is the join, but MapReduce is not very efficient at executing joins because it always processes all records in the datasets even when only a small fraction of them is relevant to the join. This problem can be reduced by applying the Bloomjoin algorithm: Bloom filters are constructed and used to filter out redundant intermediate records. The proposed algorithm using Bloom filters reduces the number of intermediate results and improves join performance.
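The Bloomjoin idea can be sketched in a few lines: build a Bloom filter over the join keys of the smaller dataset, then probe it to discard records of the larger dataset that cannot possibly match, before any shuffle happens. The class below is a generic minimal Bloom filter, not the paper's implementation; sizes, hash counts, and the sample keys are illustrative.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash probes into an m-slot bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _probes(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._probes(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # No false negatives; false positives possible with small probability.
        return all(self.bits[p] for p in self._probes(item))

# Build the filter on the small dataset's join keys; in a Bloomjoin this
# filter would be broadcast to the mappers so that non-matching records of
# the large dataset are dropped before the shuffle.
small_keys = {"u1", "u7", "u9"}
bf = BloomFilter()
for key in small_keys:
    bf.add(key)
large = [("u1", "itemA"), ("u2", "itemB"), ("u9", "itemC")]
candidates = [rec for rec in large if bf.might_contain(rec[0])]
```

Only the surviving candidates are emitted as intermediate records, which is exactly where the claimed join-performance improvement comes from.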
Data mining is a process that extracts information from a huge amount of data and transforms it into an understandable structure. It provides a number of tasks for extracting knowledge from large databases, such as classification, clustering, regression, and association rule mining. This paper presents the concept of classification, an important machine-learning-based data mining technique used to classify each item according to its features with respect to a predefined set of classes or groups. The paper summarises various techniques implemented for classification, such as k-NN, Decision Tree, Naïve Bayes, SVM, ANN and RF, and analyzes and compares them on the basis of their advantages and disadvantages.
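Of the surveyed techniques, k-NN is the simplest to make concrete: classify a query point by majority vote among its k nearest training points. The sketch below uses Euclidean distance and toy data chosen for illustration; real comparisons of these classifiers would use proper datasets and cross-validation.

```python
from collections import Counter
from math import dist  # Euclidean distance (Python 3.8+)

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train: list of (features, label) pairs; features are tuples of floats.
    """
    neighbours = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Two well-separated toy classes; the query sits inside class "a".
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((5.0, 5.0), "b"), ((5.1, 4.9), "b")]
label = knn_predict(train, (0.2, 0.1), k=3)
```

The choice of k and of the distance metric are exactly the kind of trade-offs such surveys weigh against model-based classifiers like Naïve Bayes or SVM.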
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET (Editor IJMTER)
The data mining environment produces a large amount of data that needs to be analyzed. Using traditional databases and architectures, it has become difficult to process, manage and analyze such data; to gain knowledge from Big Data, a proper architecture must be understood. Classification is an important data mining technique with broad applications, used to classify the various kinds of data found in nearly every field of our life: an item is classified according to its features with respect to a predefined set of classes. This paper sheds light on various classification algorithms, including J48, C4.5 and Naive Bayes, applied to a large dataset.
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE (IJwest)
Clustering is the categorization of data into groups of similar objects, and data mining adds to the complexity of clustering large datasets with many features. Among such datasets are those of electronic business stores that offer their products through the web. These stores require recommendation systems that can offer users the products they are most likely to need. In this study, users' previous purchases are used to present a sorted list of products to the user. Identifying associations related to users and finding cluster centers increases the precision of the recommended list; configuring associations and creating user profiles is important in current studies. In the proposed method, association rules model user interactions on the web, using the time spent on a page and the frequency of visits to weight pages and describe users' interest in page groups. The weight of each transaction item therefore describes the user's interest in that item. Analysis of the results shows that the proposed method presents a more complete model of user behavior because it combines the weight and membership degree of pages simultaneously when ranking candidate pages, and it obtains higher accuracy than other methods even for larger numbers of pages.
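The time-and-frequency page weighting described above can be sketched as follows. The normalisation and the equal 50/50 blend of the two signals are assumptions for illustration; the paper's actual weighting and membership-degree combination may differ.

```python
def page_weights(events):
    """Weight each page by total time spent and by visit frequency.

    events: list of (page, seconds_on_page) visit events for one user.
    Both signals are normalised to [0, 1] and averaged (an assumed blend),
    so a page the user opens often and reads for a long time scores near 1.
    """
    time_on, visits = {}, {}
    for page, seconds in events:
        time_on[page] = time_on.get(page, 0) + seconds
        visits[page] = visits.get(page, 0) + 1
    max_t, max_v = max(time_on.values()), max(visits.values())
    return {p: 0.5 * (time_on[p] / max_t) + 0.5 * (visits[p] / max_v)
            for p in time_on}

# Hypothetical browsing log: the user lingers on laptop pages.
events = [("home", 5), ("laptops", 120), ("laptops", 90), ("cables", 10)]
weights = page_weights(events)
```

Items in a user's transactions would then inherit these weights, giving the association rules a graded notion of interest rather than a binary visited/not-visited flag.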
Abstract: In this paper, the concept of data mining is summarized and its significance for data mining methodologies is illustrated. Data mining based on Neural Networks and Genetic Algorithms is researched in detail, and the key technologies and ways to achieve data mining with Neural Networks and Genetic Algorithms are surveyed. The paper also conducts a formal review of the area of rule extraction from ANNs and GAs. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
Data mining is an integrated field that combines technologies from databases, machine learning, statistics, pattern recognition, information retrieval, artificial neural networks, knowledge-based systems, artificial intelligence and data visualization. In practical terms, data mining is the exploration of large data sets to find hidden relationships and to summarize the data in forms that are both valid and understandable to the data owner. Clustering is an unsupervised procedure that divides data items into groups such that items in the same group are more similar to one another than to items in other groups, according to some measure of similarity. Cluster analysis is one of the most widely used methods in practical data mining applications; it groups objects, which may be physical (such as students) or abstract (such as customer behaviour or handwriting). Many clustering algorithms have been proposed, falling into different families of clustering methods. The intention of this paper is to provide a classification of some prominent clustering algorithms.
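Among the families such surveys cover, partitional clustering is the easiest to sketch. The following is a bare-bones k-means (Lloyd's algorithm), not any specific algorithm from the survey: assign each point to its nearest centre, move each centre to the mean of its points, repeat. The naive seeding and fixed round count are simplifications.

```python
from math import dist
from statistics import fmean

def kmeans(points, k, rounds=10):
    """Plain k-means: alternate nearest-centre assignment and mean update."""
    centres = points[:k]  # naive seeding, fine for a sketch
    clusters = []
    for _ in range(rounds):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist(p, centres[j]))
            clusters[i].append(p)
        # Move each centre to its cluster mean; keep it if the cluster is empty.
        centres = [tuple(fmean(c) for c in zip(*cl)) if cl else centres[i]
                   for i, cl in enumerate(clusters)]
    return centres, clusters

# Two obvious toy groups in the plane.
points = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (8.2, 7.9)]
centres, clusters = kmeans(points, k=2)
```

Hierarchical, density-based, and model-based methods trade this simplicity for robustness to cluster shape and to the need to fix k in advance, which is typically the axis along which surveys compare them.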
With the development of databases, the volume of stored data increases rapidly, and much important information is hidden in these large amounts of data. If that information can be extracted from the database, it will create a lot of value for the organization. The question organizations ask is how to extract this value; the answer is data mining. Many technologies are available to data mining practitioners, including artificial neural networks, genetic algorithms, fuzzy logic and decision trees. Many practitioners are wary of neural networks due to their black-box nature, even though they have proven themselves in many situations. This paper is an overview of artificial neural networks and examines their position as a preferred tool of data mining practitioners.
Distributed Digital Artifacts on the Semantic Web (Editor IJCATR)
Distributed digital artifacts incorporate cryptographic hash values into URIs, called trusty URIs, in a distributed environment, building high-quality, verifiable and immutable web resources to prevent the rising man-in-the-middle attack. The greatest challenge of a centralized system is that it gives users no possibility to check whether data have been modified, and communication is limited to a single server. The solution is a distributed digital artifact system, in which resources are distributed among different domains to enable inter-domain communication. Due to emerging developments on the web, attacks have increased rapidly, among which the man-in-the-middle attack (MIMA) is a serious issue that threatens user security. This work tries to prevent MIMA to an extent by providing self-reference and trusty URIs even in a distributed environment. Any manipulation of the data is efficiently identified, and any further access to that data is blocked by informing the user that the uniform location has changed. The system uses self-reference to embed a trusty URI in each resource, a lineage algorithm for generating the seed, and the SHA-512 hash algorithm to ensure security. It is implemented on the semantic web, an extension of the World Wide Web, using RDF (Resource Description Framework) to identify resources. The framework thus overcomes existing challenges by distributing the digital artifacts on the semantic web, enabling secure communication between different domains across the network and thereby preventing MIMA.
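The core of the hash-in-URI idea can be sketched briefly: embed a SHA-512 digest of the resource content in the URI, and verify the content against that digest on every access. This is a simplification; the actual trusty-URI scheme additionally specifies module identifiers, Base64 encoding, and canonicalization of RDF content, none of which are modeled here, and the function names and example URI are hypothetical.

```python
import hashlib

def make_trusty_uri(base_uri, content: bytes):
    """Append a SHA-512 digest of the content to the URI, so that any
    later change to the content invalidates the reference."""
    digest = hashlib.sha512(content).hexdigest()
    return f"{base_uri}#{digest}"

def verify(trusty_uri, content: bytes):
    """Recompute the digest and compare it with the one in the URI."""
    _, _, expected = trusty_uri.partition("#")
    return hashlib.sha512(content).hexdigest() == expected

# A toy RDF snippet standing in for a distributed artifact.
data = b'<rdf:Description rdf:about="http://example.org/r1"/>'
uri = make_trusty_uri("http://example.org/r1", data)
ok = verify(uri, data)            # untouched content verifies
tampered = verify(uri, data + b" ")  # any manipulation is detected
```

Because the digest travels inside the reference itself, a man-in-the-middle who alters the resource cannot also alter the URI the client already holds, which is what makes the tampering detectable.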
Introduction to feature subset selection method (IJSRD)
Data mining is a computational process to discover patterns in large data sets. Among its important techniques is classification, which has recently been receiving great attention in the database community. Classification can solve problems in different fields such as medicine, industry, business and science. Particle Swarm Optimization (PSO) is an optimization technique based on social behaviour. Feature Selection (FS) involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Rough Set Theory (RST) is a mathematical tool that deals with the uncertainty and vagueness of decision systems.
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY (Editor IJMTER)
The data mining environment produces a large amount of data that needs to be analyzed, and patterns have to be extracted from it to gain knowledge. In this new period, with the explosion of both ordered and unordered data, it has become difficult to process, manage and analyze patterns using traditional databases and architectures; to gain knowledge from Big Data, a proper architecture must be understood. Classification is an important data mining technique with broad applications, used to classify the various kinds of data found in nearly every field of our life: an item is classified according to its features with respect to a predefined set of classes. This paper provides an inclusive survey of different classification algorithms, shedding light on J48, C4.5, the k-nearest neighbor classifier, Naive Bayes, SVM and others, using the random concept.
Feature Selection: A Novel Approach for the Prediction of Learning Disabilit... (csandit)
Feature selection is a problem closely related to dimensionality reduction. A commonly used approach is to rank the individual features according to some criterion and then search for an optimal feature subset, using an evaluation criterion to test optimality. The objective of this work is to predict more accurately the presence of Learning Disability (LD) in school-aged children using a reduced number of symptoms. For this purpose, a novel hybrid feature selection approach is proposed that integrates a popular Rough Set based feature ranking process with a modified backward feature elimination algorithm. The approach first ranks the symptoms of LD according to their importance in the data domain; each symptom's significance or priority value reflects its relative importance in predicting LD across the various cases. Then, by eliminating the least significant features one by one and evaluating the feature subset at each stage of the process, an optimal feature subset is generated. The experimental results show that the proposed method removes redundant attributes from the LD dataset efficiently without sacrificing classification performance.
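The rank-then-eliminate loop described above can be sketched generically: repeatedly drop the lowest-ranked feature as long as an evaluation score (for instance, cross-validated accuracy) stays above a threshold. The evaluator, feature names, and threshold below are hypothetical stand-ins; the paper's actual ranking comes from Rough Set theory.

```python
def backward_eliminate(features, ranks, evaluate, min_quality):
    """Drop the least significant feature while quality holds up.

    features: feature names; ranks: dict name -> importance score;
    evaluate: callable(subset) -> quality score; min_quality: threshold.
    """
    subset = sorted(features, key=lambda f: ranks[f], reverse=True)
    while len(subset) > 1:
        candidate = subset[:-1]  # remove the lowest-ranked feature
        if evaluate(candidate) >= min_quality:
            subset = candidate   # elimination accepted
        else:
            break                # quality would drop; stop here
    return subset

# Hypothetical evaluator: accuracy stays high while f1 and f2 are retained.
def evaluate(subset):
    return 0.9 if {"f1", "f2"} <= set(subset) else 0.6

ranks = {"f1": 0.9, "f2": 0.7, "f3": 0.3, "f4": 0.1}
best = backward_eliminate(["f1", "f2", "f3", "f4"], ranks, evaluate, 0.85)
```

Evaluating the subset at every stage, rather than trusting the ranking alone, is what lets the hybrid scheme catch redundant features that are individually well ranked.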
Measurement of Efficiency Level in Nigerian Seaport after Reform Policy Imple... (IOSR Journals)
This paper focuses on the impact of reforms on port performance, using the Onne and Rivers ports as a reference point. It analyses the pre- and post-reform eras of the ports in terms of their performance. The reforms took effect from 1996, after the Federal Government of Nigeria concessioned the ports to private investors. Parameters such as ship traffic, cargo throughput, ship turnaround time, berth occupancy and personnel were used as variables for the assessment. Secondary data were collected from the Nigerian Ports Authority and Integrated Logistic Services Nigeria (Intels) for the period 2001 to 2010 and analyzed using Data Envelopment Analysis to assess the efficiency of the ports. The analysis revealed a continuous improvement in the overall efficiency of both ports since 2006, when the new measure was introduced. Average ship turnaround time improved due to the modern, fast cargo-handling equipment and the additional cargo-handling space that were provided. There has been an increase in ship traffic calling at the ports, resulting in increased cargo throughput and berth occupancy rates at the ports of Onne and Rivers. The reform also led to more private investment in the ports' existing and new facilities and to the introduction of world-class service in port operations. This study concludes that the ports of Onne and Rivers are performing better under the reform programme of the Federal Government of Nigeria. It finally recommends the urgent need for a regulator to appraise the performance of the reform programme from time to time, as provided by the agreement, and for the full adoption and utilization of a management information system (MIS) to aid performance efficiency.
m-projective curvature tensor on a Lorentzian para-Sasakian manifold (IOSR Journals)
In this paper we study m-projectively flat, m-projectively conservative, and φ-m-projectively flat LP-Sasakian manifolds. It is also proved that a quasi m-projectively flat LP-Sasakian manifold M^n is locally isometric to the unit sphere S^n(1) if and only if M^n is m-projectively flat.
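For reference, the m-projective curvature tensor as usually defined in the literature (going back to Pokhariyal and Mishra) on an n-dimensional manifold is, assuming the paper follows the standard convention:

```latex
W^{*}(X,Y)Z \;=\; R(X,Y)Z \;-\; \frac{1}{2(n-1)}\Big[\,S(Y,Z)X \;-\; S(X,Z)Y
\;+\; g(Y,Z)QX \;-\; g(X,Z)QY\,\Big],
```

where R is the Riemannian curvature tensor, S the Ricci tensor, g the metric, and Q the Ricci operator defined by g(QX, Y) = S(X, Y). "m-projectively flat" then means W* vanishes identically.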
Corporate Governance, Firm Size, and Earning Management: Evidence in Indonesi... (IOSR Journals)
Purpose – The purpose of this paper is to evaluate the impact of the implementation of corporate governance regulations and of firm size on earnings management for food and beverage companies on the Indonesian Stock Exchange. Design/methodology/approach – Multiple regression is utilized to test this relationship at 95% confidence. Corporate governance was proxied by board of directors, audit quality, and board independence; firm size was represented by the natural logarithm of total assets; earnings management was measured by the Jones model with discretionary accruals. Findings – Using data from the year-2005 annual reports of 51 listed food and beverage companies, including the composite index, the results showed that two of the corporate governance variables, namely board of directors and audit quality, as well as firm size, are statistically significant in explaining earnings management measured by discretionary accruals. Research limitations/implications – The regulations on corporate governance were implemented in 2005, but not all listed food and beverage companies implemented them in 2005. Practical implications – An implication of this finding is that regulatory efforts initiated after the 1997 financial crisis to enhance corporate transparency and accountability did not appear to result in better corporate performance. Originality/value – This is one of the few studies that investigate the impact of regulatory actions on corporate governance on earnings management immediately after implementation.
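For readers unfamiliar with the measure, the standard Jones (1991) model regresses total accruals on changes in revenue and on gross property, plant and equipment, all scaled by lagged total assets, and takes the residual as discretionary accruals. The abstract does not give the exact specification used, so the following is the conventional form:

```latex
\frac{TA_{it}}{A_{i,t-1}} \;=\; \alpha_{1}\,\frac{1}{A_{i,t-1}}
\;+\; \alpha_{2}\,\frac{\Delta REV_{it}}{A_{i,t-1}}
\;+\; \alpha_{3}\,\frac{PPE_{it}}{A_{i,t-1}} \;+\; \varepsilon_{it},
\qquad
DA_{it} \;=\; \frac{TA_{it}}{A_{i,t-1}} \;-\; \widehat{NDA}_{it},
```

where TA is total accruals, A total assets, ΔREV the change in revenue, PPE gross property, plant and equipment, and the fitted value of the regression is the non-discretionary component NDA.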
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASEIJwest
Clustering is categorizing data into groups with similar objects. Data mining adds to complexities of clustering a large dataset with various features. Among these datasets, there are electronic business stores which offer their products through web. These stores require recommendation systems which can offer products to the user which the user might require them with higher probability. In this study, previous purchases of users are used to present a sorted list of products to the user. Identifying associations related to users and finding centers increases precision of the recommended list. Configuration of associations and creating a profile for users is important in current studies. In the proposed method, association rules are presented to model user interactions in the web which use time that a page is visited and frequency of visiting a page to weight pages and describes users’ interest to page groups. Therefore, weight of each transaction item describes user’s interest in that item. Analyzing results show that the proposed method presents a more complete model of users’ behavior because it combines weight and membership degree of pages simultaneously for ranking candidate pages. This method has obtained higher accuracy compared to other methods even in higher number of pages.
Abstract In this paper, the concept of data mining was summarized and its significance towards its methodologies was illustrated. The data mining based on Neural Network and Genetic Algorithm is researched in detail and the key technology and ways to achieve the data mining on Neural Network and Genetic Algorithm are also surveyed. This paper also conducts a formal review of the area of rule extraction from ANN and GA. Keywords: Data Mining, Neural Network, Genetic Algorithm, Rule Extraction.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Data mining is an integrated field, depicted technologies in combination to the areas having database, learning by machine, statistical study, and recognition in patterns of same type, information regeneration, A.I networks, knowledge-based portfolios, artificial intelligence, neural network, and data determination. In real terms, mining of data is the investigation of provisional data sets for finding hidden connections and to gather the information in peculiar form which are justifiable and understandable to the owner of gather or mined data. An unsupervised formula which differentiate data components into collections by which the components in similar group are more allied to one other and items in rest of cluster seems to be non-allied, by the criteria of measurement of equality or predictability is called process of clustering. Cluster analysis is a relegating task that is utilized to identify same group of object and it is additionally one of the most widely used method for many practical application in data mining. It is a method of grouping objects, where objects can be physical, such as a student or may be a summary such as customer comportment, handwriting. It has been proposed many clustering algorithms that it falls into the different clustering methods. The intention of this paper is to provide a relegation of some prominent clustering algorithms.
With the development of database, the data volume stored in database increases rapidly and in the large
amounts of data much important information is hidden. If the information can be extracted from the
database they will create a lot of profit for the organization. The question they are asking is how to extract
this value. The answer is data mining. There are many technologies available to data mining practitioners,
including Artificial Neural Networks, Genetics, Fuzzy logic and Decision Trees. Many practitioners are
wary of Neural Networks due to their black box nature, even though they have proven themselves in many
situations. This paper is an overview of artificial neural networks and questions their position as a
preferred tool by data mining practitioners.
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
Distributed digital artifacts incorporate cryptographic hash values to URI called trusty URIs in a distributed environment
building good in quality, verifiable and unchangeable web resources to prevent the rising man in the middle attack. The greatest
challenge of a centralized system is that it gives users no possibility to check whether data have been modified and the communication
is limited to a single server. As a solution for this, is the distributed digital artifact system, where resources are distributed among
different domains to enable inter-domain communication. Due to the emerging developments in web, attacks have increased rapidly,
among which man in the middle attack (MIMA) is a serious issue, where user security is at its threat. This work tries to prevent MIMA
to an extent, by providing self reference and trusty URIs even when presented in a distributed environment. Any manipulation to the
data is efficiently identified and any further access to that data is blocked by informing user that the uniform location has been
changed. System uses self-reference to contain trusty URI for each resource, lineage algorithm for generating seed and SHA-512 hash
generation algorithm to ensure security. It is implemented on the semantic web, which is an extension to the world wide web, using
RDF (Resource Description Framework) to identify the resource. Hence the framework was developed to overcome existing
challenges by making the digital artifacts on the semantic web distributed to enable communication between different domains across
the network securely and thereby preventing MIMA.
Introduction to feature subset selection methodIJSRD
Data Mining is a computational progression to ascertain patterns in hefty data sets. It has various important techniques and one of them is Classification which is receiving great attention recently in the database community. Classification technique can solve several problems in different fields like medicine, industry, business, science. PSO is based on social behaviour for optimization problem. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove the redundant features. Rough Set Theory (RST) is a mathematical tool which deals with the uncertainty and vagueness of the decision systems.
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
A data mining environment produces a large amount of data that needs to be
analysed, and patterns have to be extracted from it to gain knowledge. In this new era, with the
boom of data both structured and unstructured, it has become difficult to process, manage and
analyse patterns using traditional databases and architectures. To gain knowledge about
Big Data, a proper architecture should be understood. Classification is an important data mining
technique with broad applications, used to classify the various kinds of data found in nearly every
field of our life. Classification assigns an item to one of a predefined set of classes according
to the features of the item. This paper provides an inclusive survey of
different classification algorithms and puts a light on various classification algorithms including
J48, C4.5, the k-nearest neighbour classifier, Naive Bayes, SVM, etc., using the random concept.
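As an illustration of one of the surveyed classifiers, a minimal k-nearest-neighbour sketch might look as follows (a toy example with made-up 2-D data, not code from any of the surveyed papers):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points nearest to `query`.
    `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D training set with two classes
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((4.0, 4.0), "B"), ((4.2, 3.9), "B"), ((3.8, 4.1), "B")]
print(knn_predict(train, (1.1, 1.0)))   # "A"
print(knn_predict(train, (4.1, 4.0)))   # "B"
```

The query is labelled by the class that dominates its local neighbourhood, which is exactly the "features of the item with respect to predefined classes" idea described above.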
Feature Selection : A Novel Approach for the Prediction of Learning Disabilit...csandit
Feature selection is a problem closely related to dimensionality reduction. A commonly used
approach in feature selection is ranking the individual features according to some criteria and
then search for an optimal feature subset based on an evaluation criterion to test the optimality.
The objective of this work is to predict more accurately the presence of Learning Disability
(LD) in school-aged children with reduced number of symptoms. For this purpose, a novel
hybrid feature selection approach is proposed by integrating a popular Rough Set based feature
ranking process with a modified backward feature elimination algorithm. The approach follows
a ranking of the symptoms of LD according to their importance in the data domain. Each
symptom's significance or priority value reflects its relative importance for predicting LD among the
various cases. Then, by eliminating the least significant features one by one and evaluating the
feature subset at each stage of the process, an optimal feature subset is generated. The
experimental results show the success of the proposed method in removing redundant
attributes efficiently from the LD dataset without sacrificing classification performance.
Measurement of Efficiency Level in Nigerian Seaport after Reform Policy Imple...IOSR Journals
This paper focuses on the impact of reforms on port performance using the Onne and Rivers ports as a reference point. It analyses the pre- and post-reform eras of the ports in terms of their performance. The reforms took effect from 1996, after the Federal Government of Nigeria concessioned the ports to private investors. Parameters such as ship traffic, cargo throughput, ship turn-around time, berth occupancy and personnel were used as variables for the assessment. Secondary data were collected from the Nigerian Ports Authority and Integrated Logistics Services Nigeria (Intels) for the period 2001 to 2010 and analysed using Data Envelopment Analysis to assess the efficiency of the ports. The analysis revealed a continuous improvement in the overall efficiency of both ports since 2006, when the new measure was introduced. Average ship turn-around time improved in the ports due to the modern and fast cargo-handling equipment and additional cargo-handling space which were provided. There is an increase in ship traffic calling at the ports, resulting in increased cargo throughput and berth occupancy rates at the ports of Onne and Rivers. The reform also led to more private investment in the ports' existing and new facilities and the introduction of world-class service in port operations. This study concludes that the ports of Onne and Rivers are performing better under the reform programme of the Federal Government of Nigeria. It finally recommends the urgent need for a regulator to appraise the performance of the reform programme from time to time, as provided by the agreement, and the full adoption and utilization of a management information system (MIS) to aid performance efficiency.
m - projective curvature tensor on a Lorentzian para – Sasakian manifoldsIOSR Journals
In this paper we study m-projectively flat, m-projectively conservative and φ-m-projectively flat LP-Sasakian manifolds. It is also proved that a quasi m-projectively flat LP-Sasakian manifold is locally isometric to the unit sphere S^n(1) if and only if M^n is m-projectively flat.
Corporate Governance, Firm Size, and Earning Management: Evidence in Indonesi...IOSR Journals
Purpose – The purpose of this paper is to evaluate the impact of the implementation of corporate governance regulations and of firm size on earning management for food and beverages companies on the Indonesian Stock Exchange. Design/methodology/approach – Multiple regression is utilized to test this relationship at 95% confidence. Corporate governance was proxied by board of directors, audit quality, and board independence. Firm size was represented by the natural logarithm of total assets. Earning management was measured by the Jones model with discretionary accruals. Findings – Using data from the year-2005 annual reports of 51 listed food and beverages companies, including the composite index, the results showed that two of the corporate governance variables, namely board of directors and audit quality, as well as firm size, are statistically significant in explaining earning management measured by discretionary accruals. Research limitations/implications – The regulations on corporate governance were implemented in 2005, but not all listed food and beverages companies implemented the regulations in 2005. Practical implications – An implication of this finding is that regulatory efforts initiated after the 1997 financial crisis to enhance corporate transparency and accountability did not appear to result in better corporate performance. Originality/value – This is one of the few studies which investigates the impact of regulatory actions on corporate governance on earning management immediately after its implementation.
A optimized process for the synthesis of a key starting material for etodolac...IOSR Journals
Abstract: An optimized process was developed for the synthesis of 7-ethyltryptophol, a key starting material for etodolac, a non-steroidal anti-inflammatory drug, starting from commercially available 2-ethylphenylhydrazine hydrochloride and dihydrofuran with concentrated H2SO4 as a catalyst in N,N-dimethylacetamide (DMAc)/H2O (1:1) as solvent, in 75% yield. The method is easy and inexpensive, giving a pure solid without purification. The process is very clean, high-yielding, of high quality and operationally simple.
Keywords: Etodolac, 7-ethyl tryptophol, 2-ethyl phenyl hydrazine hydrochloride, N,N-dimethyl acetamide.
Ethnobotanical Euphorbian plants of North Maharashtra RegionIOSR Journals
Euphorbiaceae is among the large flowering plant families, consisting of a wide variety of vegetative
forms, some of which are of great importance. There is a need to explore the traditional medicinal knowledge of
plant materials belonging to the various genera of Euphorbiaceae available in North Maharashtra State. Plants
have always been the source of food, medicine and other necessities of life since the origin of human beings.
Plants containing ethnomedicinal properties have been known and used in some form or other by the tribal
communities of the Satpuda region. These tribes have their own system of ethnomedicine for the treatment of
different ailments. In the course of a survey of useful Euphorbian plants of Satpuda, 34 medicinal plants belonging
to 18 genera were documented. This article reports their botanical identity, family name, local-language name, part
used, preparations and doses, if any. It is observed that the tribes of this region use various Euphorbian plants in
the form of decoction, infusion, extract, paste, powder, etc. The ethnomedicinal knowledge of this region
would thus be useful to botanists, pharmacologists and phytochemists for further exploration. It is
concluded that the family is a good starting point for the search for plant-based medicines.
To Study The Viscometric Measurement Of Substituted-2-Diphenylbutanamide And ...IOSR Journals
Recently in this laboratory, the viscometric measurements of 4-[4-(4-chlorophenyl)-4-hydroxypiperidin-1-yl]-N,N-dimethyl-2,2-diphenylbutanamide [CPHDD] and (2S,6R)-7-chloro-2,4,6-trimethoxy-6'-methyl-3H,4'H-spiro[1-benzofuran-2,1'-cyclohex-2-ene]-3,4'-dione [CTMBCD] were carried out at different percentage compositions of solvent to investigate the solute-solvent interactions of the drugs with the solvent and the effect of dilution of the solvent. The effects of various substituents were also investigated. The results obtained during this investigation gave detailed information about the pharmacokinetics and pharmacodynamics of these drugs.
Configuring Associations to Increase Trust in Product Purchase dannyijwest
Clustering is categorizing data into groups of similar objects. Data mining adds to the complexities of clustering a large dataset with various features. Among these datasets are the electronic business stores which offer their products through the web. These stores require recommendation systems that can offer the user products which they are likely to need. In this study, previous purchases of users are used to present a sorted list of products to the user. Identifying associations related to users and finding centres increases the precision of the recommended list. Configuration of associations and creating a profile for users is important in current studies. In the proposed method, association rules are presented to model user interactions on the web; these use the time a page is visited and the frequency of visits to weight pages, and describe users' interest in page groups. Therefore, the weight of each transaction item describes the user's interest in that item. Analysis of the results shows that the proposed method presents a more complete model of users' behaviour because it combines weight and membership degree of pages simultaneously for ranking candidate pages. This method has obtained higher accuracy compared to other methods, even with a higher number of pages.
Introduction to Multi-Objective Clustering EnsembleIJSRD
Association rule mining is a popular and well-researched method for discovering interesting relations between variables in large databases. In this paper we introduce the concepts of data mining, association rules and multilevel association rules with different algorithms and their advantages, together with the concepts of fuzzy logic and genetic algorithms. Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.
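The support-confidence framework mentioned above can be made concrete in a few lines of Python (an illustrative sketch over a made-up basket dataset, not any paper's implementation):

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(A union B) / support(A): how often the rule A -> B holds."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))

baskets = [{"milk", "bread"}, {"milk", "bread", "butter"},
           {"bread", "butter"}, {"milk", "butter"}]
print(support(baskets, {"milk", "bread"}))        # 0.5
print(confidence(baskets, {"milk"}, {"bread"}))   # 0.666...
```

A rule such as milk -> bread is reported only when both its support and its confidence clear user-chosen thresholds.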
The development of data mining is inseparable from recent developments in information technology that enable the accumulation of large amounts of data. For example, a shopping mall records every sales transaction of goods using various POS (point of sale) terminals. The database of these sales can reach a large storage capacity, with even more being added each day, especially when the shopping centre develops into a nationwide network. The growth of the internet has also contributed significantly to this accumulation of data. But the rapid growth of data accumulation has created conditions often referred to as "data rich but information poor", because the collected data cannot be used optimally for useful applications. Not infrequently, such a data set is simply left behind as a "data grave". There are several techniques used in data mining, including association, classification and clustering. In this paper, the author compares the performance of the naïve Bayes and C4.5 classification algorithms.
A SURVEY ON DATA MINING IN STEEL INDUSTRIESIJCSES Journal
In industrial environments, a huge amount of data is generated, which is collected in databases and data warehouses from all involved areas such as planning, process design, materials, assembly, production, quality, process control, scheduling, fault detection, shutdown and customer relationship management. Data mining has become a useful tool for knowledge acquisition in the industrial process of iron and steel making. Due to the rapid growth of data mining, various industries have started using data mining technology to search for hidden patterns, which can then supply the system with new knowledge and inform new models that enhance production quality, productivity, optimum cost, maintenance, etc. The continuous improvement of all steel production processes regarding the avoidance of quality deficiencies and the related improvement of production yield is an essential task of the steel producer. Therefore, a zero-defect strategy is popular today, and several quality assurance techniques are used to maintain it. The present report explains the methods of data mining and describes their application in the industrial environment and especially in the steel industry.
Certain Investigation on Dynamic Clustering in Dynamic Dataminingijdmtaiir
Clustering is the process of grouping a set of objects
into classes of similar objects. Dynamic clustering is a
new research area concerned with datasets having dynamic
aspects. It requires updates of the clusters whenever new data
records are added to the dataset, and may result in a change of
clustering over time. When there is a continuous stream of updates and a
huge amount of dynamic data, rescanning the database is not
feasible in static data mining, but it is possible in the dynamic
data mining process. Dynamic data mining occurs when
the derived information is present for the purpose of analysis
and the environment is dynamic, i.e. many updates occur.
Since this has now been established by most researchers,
they are moving on to solving some of the open problems, and this
research concentrates on the problem of mining
dynamic databases. This paper gives an
investigation of existing work related to
dynamic clustering and incremental data clustering.
A Survey on Constellation Based Attribute Selection Method for High Dimension...IJERA Editor
Attribute selection is an important topic in data mining, because it is an effective way of reducing dimensionality, removing irrelevant data, removing redundant data, and increasing the accuracy of the data. It is the process of identifying a subset of the most useful attributes that produces results compatible with the original entire set of attributes. Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group, called a cluster, are more similar in some sense or another to each other than to those in other groups (clusters). There are various approaches and techniques for attribute subset selection, namely the wrapper approach, the filter approach, the Relief algorithm, distributional clustering, etc. But each of them has some disadvantages, such as inability to handle large volumes of data, computational complexity, no guarantee of accuracy, difficulty of evaluation, and weak redundancy detection. To get the upper hand on some of these issues in attribute selection, this paper proposes a technique that aims to design an effective clustering-based attribute selection method for high-dimensional data. Initially, attributes are divided into clusters by using a graph-based clustering method such as the minimum spanning tree (MST). In the second step, the most representative attribute, the one most strongly related to the target classes, is selected from each cluster to form a subset of attributes. The purpose is to increase accuracy, reduce dimensionality, shorten training time and improve generalization by reducing overfitting.
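The graph-based clustering step described above can be sketched with a plain Prim's-algorithm MST over a hypothetical attribute-distance matrix (the weights and the cut-the-heaviest-edge rule below are illustrative assumptions, not the paper's actual measure):

```python
def prim_mst(weights):
    """Prim's algorithm: grow a minimum spanning tree from node 0.
    `weights[i][j]` is a symmetric distance between attributes i and j;
    returns the list of MST edges as (u, v) pairs."""
    n = len(weights)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # cheapest edge from the tree to a node outside it
        u, v = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: weights[e[0]][e[1]])
        edges.append((u, v))
        in_tree.add(v)
    return edges

# 4 attributes; small weight = strongly related pair (hypothetical values)
w = [[0, 1, 9, 9],
     [1, 0, 9, 8],
     [9, 9, 0, 2],
     [9, 8, 2, 0]]
mst = prim_mst(w)
# cutting the heaviest MST edge, (1, 3), leaves two attribute clusters:
# {0, 1} and {2, 3}
```

Removing the heaviest edges from the MST is one standard way to turn a spanning tree into clusters, from each of which a representative attribute can then be chosen.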
Classification on multi label dataset using rule mining techniqueeSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da...theijes
Data mining works to extract previously unknown information from enormous quantities of data, which can lead to knowledge. It provides information that helps to make good decisions. The effectiveness of data mining lies in its access to knowledge: its goal is the discovery of the hidden facts contained in databases through the use of multiple technologies. Clustering is organizing data into clusters or groups such that they have high intra-cluster similarity and low inter-cluster similarity. This paper deals with the K-means clustering algorithm, which groups a number of data points based on their characteristics and attributes, performing the clustering by reducing the distances between the data points and their cluster centre. The algorithm is applied using an open-source tool called WEKA, with the insurance dataset as its input.
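The K-means procedure the abstract describes, repeatedly reducing the distance between data points and their cluster centre, can be sketched in a few lines of Python (a toy example, independent of WEKA and of the insurance dataset):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centre,
    then recompute each centre as the mean of its assigned points."""
    centres = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            clusters[nearest].append(p)
        for i, cl in enumerate(clusters):      # update step (keep old centre if empty)
            if cl:
                centres[i] = tuple(sum(xs) / len(xs) for xs in zip(*cl))
    return centres, clusters

# Two obvious groups in 2-D
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centres, clusters = kmeans(points, k=2)
```

On this toy data the centres converge to the means of the two visible groups, (4/3, 4/3) and (25/3, 25/3).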
Assessment of Cluster Tree Analysis based on Data Linkagesjournal ijrtem
Abstract: Data linkage is a procedure which joins two or more sets of data (surveyed or proprietary) from different organisations to generate a valuable store of information which can be used for further analysis. This allows for the real application of the data. One-to-many data linkage associates an entity from the first data set with a number of related entities from the other data sets. Previous work concentrates on accomplishing one-to-one data linkages. So, previously, a two-level clustering tree known as the One-Class Clustering Tree (OCCT), with a built-in Jaccard similarity measure, was suggested, in which each leaf contains a group instead of only one classified sequence. OCCT's strategy of using Jaccard's similarity coefficient increases time complexity significantly. So we propose to substitute Jaccard's similarity coefficient with the Jaro-Winkler similarity measure to obtain the group similarity, because, unlike Jaccard's, it takes order into consideration, using positional indices to calculate relevance. An evaluation of our suggested idea suffices as validation of an enhanced one-to-many data linkage system.
Index Terms: Maximum-Weighted Bipartite Matching, Ant Colony Optimization, Graph Partitioning Technique
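The contrast the abstract draws between the two measures can be illustrated in Python; the sketch below (a toy implementation, with Jaccard computed over character sets as a simplifying assumption) shows why Jaro-Winkler separates strings that differ only by a transposition while a set-based Jaccard score cannot:

```python
def jaccard(a, b):
    """Set-overlap similarity over character sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def jaro_winkler(s1, s2, p=0.1):
    """Jaro similarity, boosted for a shared prefix of up to 4 characters."""
    if s1 == s2:
        return 1.0
    n1, n2 = len(s1), len(s2)
    window = max(0, max(n1, n2) // 2 - 1)
    used = [False] * n2
    m1 = []
    for i, ch in enumerate(s1):                      # match within the window
        for j in range(max(0, i - window), min(n2, i + window + 1)):
            if not used[j] and s2[j] == ch:
                used[j] = True
                m1.append(ch)
                break
    if not m1:
        return 0.0
    m2 = [s2[j] for j in range(n2) if used[j]]
    m = len(m1)
    t = sum(c1 != c2 for c1, c2 in zip(m1, m2)) / 2  # half the out-of-order matches
    jaro = (m / n1 + m / n2 + (m - t) / m) / 3
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix == 4:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)

print(round(jaro_winkler("MARTHA", "MARHTA"), 3))   # 0.961
print(round(jaccard("MARTHA", "MARHTA"), 3))        # 1.0 -- same character set
```

"MARTHA" and "MARHTA" share the same character set, so set-based Jaccard cannot tell them apart, while Jaro-Winkler penalises the T/H transposition through its positional matching.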
Analysis on different Data mining Techniques and algorithms used in IOTIJERA Editor
In this paper, we discuss five functionalities of data mining in IoT that affect performance:
data anomaly detection, data clustering, data classification, feature selection, and time series prediction. Some
important algorithms for each functionality are also reviewed here, showing their advantages and limitations,
as well as some new algorithms that are current research directions. We also present a knowledge-oriented view of data
mining in IoT.
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR...cscpconf
Data mining is one of the most significant tools for discovering association patterns that are useful in many knowledge domains. Yet, there are some drawbacks in existing mining techniques. Three main weaknesses of current data-mining techniques are: 1) re-scanning of the entire database must be done whenever new attributes are added; 2) an association rule may be true at a certain granularity but fail at a smaller one, and vice versa; 3) current methods can only be used to find either frequent rules or infrequent rules, but not both at the same time. This research proposes a novel data schema and an algorithm that address the above weaknesses while improving the efficiency and effectiveness of data mining strategies. The crucial mechanisms in each step are clarified in this paper. Finally, the paper presents experimental results regarding the efficiency, scalability, information loss, etc. of the proposed approach to demonstrate its advantages.
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
In this paper, different classification algorithms for data mining are discussed. Data mining is about
explaining the past and predicting the future by means of data analysis. Classification is a data mining task
which categorizes data based on numerical or categorical variables. Many algorithms have been
proposed to classify data; of these, five are comparatively studied for data mining through classification. There are
four different classification approaches, namely frequency table, covariance matrix, similarity functions and
others. As part of this research on classification methods, the Naive Bayesian, k-nearest neighbours,
decision tree, artificial neural network and support vector machine algorithms are studied and examined using benchmark
datasets like Iris and Lung Cancer.
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. V (Jan – Feb. 2015), PP 33-42
www.iosrjournals.org
DOI: 10.9790/0661-17153342 www.iosrjournals.org 33 | Page
Data mining Algorithm’s Variant Analysis
Kiranpreet¹, Ramandeep Kaur², Sourabh Sahota³, Prince Verma⁴
¹,²,³M.Tech., CSE Dept., ⁴Asst. Professor, CSE Dept.
CT Institute of Engg., Mgt. & Tech., Jalandhar (India)
Abstract: Data mining is extracting or mining knowledge from large volumes of data.
Data mining often involves the analysis of data stored in a data warehouse.
This paper examines three major data mining techniques: association, classification
and clustering. Association is the process of searching for relationships between variables in a given database.
Classification consists of predicting a certain outcome based on given input. Clustering is a
procedure of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters. The performance of the
three techniques is demonstrated using the WEKA tool.
Keywords: Data mining, WEKA tool, association, classification and clustering algorithms.
I. Introduction
Data mining discovers these patterns and relationships using data analysis tools
and techniques to build models. There are two kinds of models in data mining. One is predictive models,
i.e., the process by which a model is created or chosen to try to best predict the probability of an outcome.
The other is descriptive models: a mathematical process that describes real-world events and the relationships
between the factors responsible for them. The term data mining, also known as Knowledge Discovery in
Databases (KDD), refers to the nontrivial extraction of implicit, potentially useful and previously unknown information
from data in databases [12]. Several data mining techniques are association,
classification and clustering [7]. Association rule learning is a popular and well-researched method
for discovering interesting relations between variables in large databases. Classification represents a supervised
learning method as well as a statistical technique for categorization. Classification algorithms use different
techniques for finding relations between the predictor attributes' values and the target attribute's
values in the training data. Clustering is the task of assigning a set of objects into groups
(called clusters) so that the objects in the same cluster are more similar (in some sense or
another) to one another than to those in other clusters [13].
1.1 Data Mining Tasks
Data mining commonly involves six classes of tasks:
1. Anomaly detection – the identification of unusual data records that might be interesting, or data errors requiring further investigation.
2. Association rule learning (dependency modelling) – the process of searching for relationships between
variables in a given database.
3. Clustering – the task of discovering groups and structures in the data that are similar in some way or
another, without using any known structures.
4. Classification – the process of generalizing known structure to apply to new data.
5. Regression – attempts to find a function that models the data with the minimum error.
6. Summarization – the process of creating a compact representation of the data set, including report generation
and visualization.
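Task 1 above can be sketched with a simple z-score rule (an illustrative toy detector on made-up readings; real anomaly detection uses far richer models):

```python
import statistics

def zscore_anomalies(values, threshold=2.0):
    """Flag values whose z-score magnitude exceeds `threshold`."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs((v - mean) / sd) > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 95]   # 95 is the odd one out
print(zscore_anomalies(readings))            # [95]
```

A record far from the bulk of the data, here the reading 95, is exactly the kind of "unusual data record" the task description refers to.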
II. Data Mining Algorithms
Algorithm is set of operation, fact and rules. In data mining there is different type of algorithm that is:-
2.1 Association rule mining
The primary sub-issue can be further partitioned into two sub-issues: competitor itemsets generalition
prepare and successive itemsets era process. We call those itemsets whose backing surpass the help edge as
huge or regular itemsets, those itemsets that are normal or have the would like to be expansive or successive are
called competitor itemsets. Different algorithms are proposed for ARM [1]:
2.1.1. AIS Algorithm [20]:
The AIS algorithm was the first algorithm proposed for mining association rules. The algorithm consists of two phases. The first phase constitutes the generation of the frequent itemsets; the algorithm uses candidate generation to capture the frequent itemsets. This is followed by the generation of the confident and frequent association rules in the second phase. The main drawback of the AIS algorithm is that it makes multiple passes over the database. Moreover, it generates and counts too many candidate itemsets that turn out to be small, which requires more space and wastes much effort that turns out to be useless.
2. Data mining Algorithm’s Variant Analysis
DOI: 10.9790/0661-17153342 www.iosrjournals.org 34 | Page
2.1.2. APRIORI Algorithm [20,23]:
Apriori includes a phase for discovering patterns called frequent itemsets. A frequent itemset is a set of items appearing together in a number of database records meeting a user-specified threshold. Apriori uses a bottom-up search that enumerates every single frequent itemset. This implies that, in order to produce a frequent itemset of a given length, it must produce all of its subsets, since they too must be frequent. This exponential complexity fundamentally restricts Apriori-like algorithms to discovering only short patterns.
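The bottom-up, subset-pruned search described above can be illustrated with a minimal Python sketch (the toy transactions and support threshold are our own illustrative assumptions, not data from this study):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset search: candidates of size k are built
    from frequent itemsets of size k-1, pruned by the Apriori property
    (every subset of a frequent itemset must itself be frequent)."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: frequent single items.
    items = {i for t in sets for i in t}
    frequent = {frozenset([i]): support(frozenset([i]))
                for i in items if support(frozenset([i])) >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: discard candidates with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = {c: support(c) for c in candidates
                    if support(c) >= min_support}
        result.update(frequent)
        k += 1
    return result

transactions = [["bread", "milk"], ["bread", "butter", "milk"],
                ["bread", "butter"], ["milk", "butter", "bread"]]
freq = apriori(transactions, min_support=0.5)
```

With this data, `{bread, milk, butter}` is frequent (support 0.5), illustrating how longer itemsets are only reached after all of their subsets have passed the threshold.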
2.1.3. FP-TREE Algorithm [20,23]:
The FP-tree-based algorithm partitions the original database into smaller sub-databases by some partitioning cells, and then mines itemsets in these sub-databases. Until no new itemsets can be found, the partitioning is performed recursively as the partitioning cells grow. FP-tree construction takes exactly two scans of the transaction database: the first scan collects the set of frequent items, and the second scan constructs the FP-tree. Many other approaches with minor changes have been introduced in between, but the fundamental algorithms, which form the basis for newer ones, are Apriori and FP-tree.
2.1.4. SETM Algorithm [23]:
SETM uses SQL to find large itemsets. The algorithm remembers the TIDs, i.e. the transaction IDs of the transactions containing the candidate itemsets, and uses this information instead of a subset operation. This approach has the disadvantage that Ck needs to be sorted. Moreover, if Ck is too large to fit in the allocated buffer memory, the disk is used in a FIFO manner, which then requires two external sorts.
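The idea of pushing support counting into SQL, as SETM does, can be sketched with Python's built-in sqlite3 module (the table layout, item names, and threshold below are illustrative assumptions, not SETM's exact relational operations):

```python
import sqlite3

# Transactions stored relationally as (tid, item) rows.
rows = [(1, "bread"), (1, "milk"),
        (2, "bread"), (2, "butter"), (2, "milk"),
        (3, "bread"), (3, "butter"),
        (4, "bread"), (4, "butter"), (4, "milk")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# A self-join on the transaction ID pairs up items bought together;
# grouping and counting then yields the support of each 2-itemset.
# Carrying the TID through the join replaces an explicit subset test.
min_support = 2
pairs = conn.execute("""
    SELECT s1.item, s2.item, COUNT(*) AS support
    FROM sales s1 JOIN sales s2
      ON s1.tid = s2.tid AND s1.item < s2.item
    GROUP BY s1.item, s2.item
    HAVING COUNT(*) >= ?
    ORDER BY s1.item, s2.item
""", (min_support,)).fetchall()
```

Running this yields the large 2-itemsets with their support counts, computed entirely by the SQL engine.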
2.2 Classification
Classification is predicting a certain outcome based on a given input. It is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data. The algorithm tries to discover relationships between the attributes that would make it possible to predict the outcome.
Classification is an important data mining technique with broad applications. It is used to classify each item in a set of data into one of a predefined set of classes or groups. Classification algorithms play an important role in document classification. In this study, we have examined two classifiers, namely Bayesian and lazy.
Classification—A Two-Step Process
Model construction: describing a set of predetermined classes. Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute. The set of tuples used for model construction is the training set. The model is represented as classification rules, decision trees, or mathematical formulae.
Model usage: for classifying future or unknown objects, and for estimating the accuracy of the model. The known label of a test sample is compared with the classified result from the model; the accuracy rate is the percentage of test-set samples that are correctly classified by the model. The test set must be independent of the training set, otherwise over-fitting will occur.
2.2.1. Algorithm for Bayesian classification:- Bayesian classification represents a supervised learning method as well as a statistical method for classification. It assumes an underlying probabilistic model, and it allows us to capture uncertainty about the model in a principled way by determining probabilities of the outcomes. It can solve diagnostic and predictive problems. Bayesian classification provides practical learning algorithms in which prior knowledge and observed data can be combined, and it provides a useful perspective for understanding and evaluating many learning algorithms [13].
2.2.1.1 Bayesian Network[14,15]:
A Bayesian network (BN), also called a belief network, is a graphical model for probabilistic relationships among a set of variables (features). A BN consists of two components. The first component is a directed acyclic graph (DAG) in which the nodes are the random variables and the edges between the nodes represent the probabilistic dependencies among the corresponding random variables. The second component is a set of parameters that describe the conditional probability of each variable given its parents. In other words, a Bayesian network consists of a structural model and a set of conditional probabilities. The structural model is a directed graph in which nodes represent attributes and arcs represent attribute dependencies; attribute dependencies are quantified by conditional probabilities for each node given its parents.
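The two components described above (a DAG plus per-node conditional probabilities) can be sketched in Python for a toy network over the weather variables used in a later example; all probability values here are invented purely for illustration:

```python
# Toy Bayesian network: Rain -> Lightning -> Thunder.
# Each node stores its conditional probability table (CPT) given its parent.
cpt_rain = 0.2                             # P(Rain = True)
cpt_lightning = {True: 0.6, False: 0.05}   # P(Lightning = True | Rain)
cpt_thunder = {True: 0.9, False: 0.01}     # P(Thunder = True | Lightning)

def joint(rain, lightning, thunder):
    """Chain rule over the DAG: P(R, L, T) = P(R) * P(L|R) * P(T|L)."""
    p = cpt_rain if rain else 1 - cpt_rain
    p *= cpt_lightning[rain] if lightning else 1 - cpt_lightning[rain]
    p *= cpt_thunder[lightning] if thunder else 1 - cpt_thunder[lightning]
    return p

# Sanity check: the joint distribution sums to 1 over all 8 assignments.
total = sum(joint(r, l, t) for r in (True, False)
            for l in (True, False) for t in (True, False))
```

The structural model (the chain of arcs) determines which CPT entries are multiplied; only one parent value is consulted per node.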
2.2.1.2. Naïve Bayes Algorithm [13]:
The Naïve Bayes classifier technique is based on the so-called Bayes theorem and is particularly suited when the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods. It computes explicit probabilities for hypotheses, which is among the most practical approaches to certain types of learning problems.
2.2.1.3 Naïve Bayes Updatable [16]:
The name NaiveBayesUpdatable itself suggests that it is the updatable or enhanced form of NaiveBayes. This classifier uses a default precision of 0.1 for numeric attributes when buildClassifier is called with zero training instances; it is therefore also known as incremental update.
Naïve Bayes is preferred over a Bayesian network because:
Naïve Bayes is a fast-executing algorithm [17].
The naive Bayes model is immensely appealing because of its simplicity, elegance, and robustness. It is one of the oldest formal classification algorithms, yet even in its simplest form it is often surprisingly effective.
Naive Bayes Classifier Introductory Overview: the Naive Bayes classifier technique is based on the so-called Bayes theorem and is particularly suited when the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods.
Fig No: 1
To demonstrate the concept of Naive Bayes classification, consider the example shown in the figure above. As indicated, the objects can be classified as either GREEN or RED. Our task is to classify new cases as they arrive, i.e., to decide to which class label they belong, based on the currently existing objects. Since there are twice as many GREEN objects as RED, it is reasonable to believe that a new case (which has not been observed yet) is twice as likely to have membership GREEN rather than RED. In Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous experience, in this case the percentage of GREEN and RED objects, and are often used to predict outcomes before they actually happen.
Algorithm:
Given the intractable sample complexity for learning Bayesian classifiers, we must look for ways to reduce this complexity. The Naive Bayes classifier does this by making a conditional independence assumption that dramatically reduces the number of parameters to be estimated when modeling P(x|y), from our original 2(2^n − 1) to just 2n.
Definition: Given random variables X, Y and Z, we say X is conditionally independent of Y given Z, if and only if the probability distribution governing X is independent of the value of Y given Z; that is,
(∀ i, j, k) P(X = xi | Y = yj, Z = zk) = P(X = xi | Z = zk)
Example:
Consider three Boolean random variables describing the current weather: Rain, Thunder, and Lightning. We might reasonably assert that Thunder is independent of Rain given Lightning: since we know Lightning causes Thunder, once we know whether there is Lightning, no additional information about Thunder is given by the value of Rain. Of course there is a clear dependence of Thunder on Rain in general, but there is no conditional dependence once we know the value of Lightning.
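A minimal Python sketch of a Naive Bayes classifier built from the quantities discussed above: a prior P(y) multiplied by per-attribute conditional probabilities P(xi|y) under the conditional independence assumption. The tiny weather data set and the add-one (Laplace) smoothing are our own illustrative choices:

```python
from collections import Counter, defaultdict

def train_nb(X, y):
    """Estimate priors P(y) and per-attribute likelihoods P(x_i | y)
    from counts."""
    priors = Counter(y)
    likelihood = defaultdict(Counter)  # (attribute index, class) -> counts
    for row, label in zip(X, y):
        for i, v in enumerate(row):
            likelihood[(i, label)][v] += 1
    return priors, likelihood

def predict_nb(priors, likelihood, row):
    """Pick the class maximizing P(y) * prod_i P(x_i | y)."""
    total = sum(priors.values())
    best, best_p = None, -1.0
    for label, cnt in priors.items():
        p = cnt / total                       # prior probability
        for i, v in enumerate(row):           # conditional independence:
            c = likelihood[(i, label)]        # multiply per-attribute terms,
            p *= (c[v] + 1) / (sum(c.values()) + len(c) + 1)  # Laplace-smoothed
        if p > best_p:
            best, best_p = label, p
    return best

X = [["sunny", "hot"], ["sunny", "mild"], ["rainy", "mild"], ["rainy", "cool"]]
y = ["no", "no", "yes", "yes"]
pred = predict_nb(*train_nb(X, y), ["rainy", "mild"])
```

Here the "rainy" attribute value has only been seen with class "yes", so the smoothed product favors "yes" despite equal priors.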
2.2.2. Algorithm for Lazy Classification:- Lazy learners store the training instances and do no real work until classification time. Lazy learning is a learning method in which generalization beyond the training data is delayed until a query is made to the system, in contrast to eager learning, where the system tries to generalize the training data before receiving queries.
2.2.2.1 IBL (Instance Based Learning):-
Instance-based learning methods, such as nearest neighbor and locally weighted regression, are conceptually straightforward approaches to approximating real-valued or discrete-valued target functions. Instance-based methods can also use more complex, symbolic representations for instances.
2.2.2.2 IBK (K-Nearest Neighbor):-
IBK is a k-nearest neighbor classifier that uses the same distance metric. The number of nearest neighbors can be specified explicitly in the object editor, or determined automatically using leave-one-out cross-validation, subject to an upper limit given by the specified value.
2.2.2.3 K star:-
The K* algorithm is an instance-based classifier: the class of a test instance is based upon the classes of those training instances similar to it, as determined by some similarity function. It differs from other instance-based learners in that it uses an entropy-based distance function.
KNN is preferred over other lazy techniques because:
KNN is a straightforward and easy-to-implement classification technique, and it also handles multiple classes [19].
K-NN Algorithm Introduction:- K-NN is a lazy learning method based on voting and on the distances of the k nearest neighbors. The K-Nearest Neighbor (KNN) algorithm is one of the most popular learning algorithms in data mining. For continuous-valued target functions, the K-NN algorithm calculates the mean value of the k nearest neighbors, using Euclidean distance.
It requires three things:
The set of stored records.
A distance metric to compute the distance between records.
The value of k, the number of nearest neighbors to retrieve.
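The three ingredients listed above can be sketched in a few lines of Python (the toy records, attribute values, and query point are our own illustrative assumptions):

```python
import math
from collections import Counter

def knn_predict(records, query, k=3):
    """The three ingredients from the text: stored records, a distance
    metric (Euclidean here), and k.  The class is decided by a majority
    vote of the k nearest neighbors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Sort stored records by distance to the query and keep the k nearest.
    neighbours = sorted(records, key=lambda r: dist(r[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

records = [((0.8, 1.2), "benign"), ((0.75, 1.4), "benign"),
           ((0.23, 0.4), "malignant"), ((0.23, 0.5), "malignant")]
label = knn_predict(records, (0.3, 0.45), k=3)
```

For a continuous-valued target, the vote would simply be replaced by the mean of the k neighbors' target values, as noted above.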
2.3 Clustering
Cluster analysis originated in anthropology with Driver and Kroeber in 1932, was introduced to psychology by Zubin in 1938 and Robert Tryon in 1939, and was widely used by Cattell beginning in 1943 for trait-theory classification in personality psychology. Cluster analysis [1] groups objects (observations, events) based on the information found in the data describing the objects or their relationships. A cluster is a group of objects that belong to the same class. The goal is that the objects in a group will be similar (or related) to one another and different from (or unrelated to) the objects in other groups.
Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters. A good clustering method will produce high-quality clusters in which:
• the intra-class (that is, intra-cluster) similarity is high.
• the inter-class similarity is low.
The quality of a clustering result also depends on both the similarity measure used by the method and its implementation. The quality of a clustering method is likewise measured by its ability to discover some or all of the hidden patterns. On the other hand, objective evaluation is problematic: it is usually done by human/expert inspection.
2.3.1. DBSCAN Clustering Algorithm
Density-based clustering is one of the essential techniques for clustering in data mining. DBSCAN (density-based spatial clustering of applications with noise) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. The clusters formed based on density are easy to understand, and the method does not limit itself to particular cluster shapes. The density-based group of clustering algorithms represents a data set in the same way as partitioning methods do, converting a sample to a point using the data attributes of the source set. One of the most prominent density-based clustering algorithms is DBSCAN [9].
DBSCAN separates data points into three classes:
• Core points: these are points that are at the interior of a cluster.
• Border points: a border point is a point that is not a core point, but falls within the neighborhood of a core point.
• Noise points: a noise point is any point that is neither a core point nor a border point.
To discover a cluster, DBSCAN begins with an arbitrary instance p in the data set D and retrieves all instances of D with respect to Eps and MinPts. The algorithm makes use of a spatial data structure (R*-tree) to locate points within Eps distance of the core points of the clusters [2]. Another density-based algorithm, OPTICS, is presented in [3]; it is an interactive clustering algorithm that works by creating an ordering of the data set representing its density-based clustering structure. DBSCAN performs clustering by growing high-density areas, and it can find clusters of any shape (Rong et al., 2004).
The idea is:
1. ε-neighborhood: the neighbors within radius ε of an object.
2. Core object: an object with at least a specified number (MinPts) of neighbors in its ε-neighborhood.
3. For an object set D, if object p is an ε-neighbor of q, and q is a core object, then p is "directly density-reachable" from q.
4. For a given ε, p is "density-reachable" from q if there is a chain of objects p1, p2, ..., pn with p1 = q and pn = p, such that pi ∈ D and pi+1 is directly density-reachable from pi for 1 ≤ i < n.
5. For given ε and MinPts, if there exists an object o (o ∈ D) such that both p and q are density-reachable from o, then p and q are "density-connected".
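The three point classes and the ε/MinPts definitions above can be sketched in Python as follows (this classifies points as core/border/noise only, not a full DBSCAN cluster expansion; the sample points and parameters are assumptions for illustration):

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point core, border, or noise, following the three
    DBSCAN point classes described in the text."""
    def neighbours(p):
        # The eps-neighborhood, including the point itself.
        return [q for q in points if math.dist(p, q) <= eps]

    core = {p for p in points if len(neighbours(p)) >= min_pts}
    labels = {}
    for p in points:
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in neighbours(p)):
            labels[p] = "border"   # within eps of some core point
        else:
            labels[p] = "noise"    # neither core nor border
    return labels

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.25, 0.0), (5.0, 5.0)]
labels = classify_points(points, eps=0.2, min_pts=3)
```

A full DBSCAN would then grow clusters from the core points by following direct density-reachability, exactly as in definitions 3–5 above.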
2.3.2. K-Means Algorithm
The basic idea of k-means clustering is simple. At the beginning we determine the number of clusters K and assume the centroids or centers of these clusters. K-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. The k-means algorithm then repeats the three steps below until convergence, i.e., until stable (no object changes group):
1. Determine the centroid coordinates.
2. Determine the distance of each object to the centroids.
3. Group the objects based on minimum distance.
2.3.2.1. Algorithmic steps for k-means clustering
Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, ..., vc} be the set of centers.
1) Randomly select 'c' cluster centers.
2) Calculate the distance between each data point and the cluster centers.
3) Assign each data point to the cluster center whose distance from it is the minimum over all the cluster centers.
4) Recalculate the new cluster centers using
vi = (1/ci) Σ j=1..ci xj
where 'ci' represents the number of data points in the ith cluster.
5) Recalculate the distance between each data point and the newly obtained cluster centers.
6) If no data point was reassigned then stop, otherwise repeat from step 3).
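The algorithmic steps 1)–6) can be sketched in Python as follows (the toy points, the fixed random seed, and the iteration cap are our own illustrative assumptions):

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Steps from the text: pick centres, assign each point to the
    nearest centre, recompute centres as cluster means, repeat until
    the centres no longer change."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)          # step 1: random initial centres
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                     # steps 2-3: nearest centre
            i = min(range(k), key=lambda j: math.dist(p, centres[j]))
            clusters[i].append(p)
        # Step 4: new centre = mean of the cluster's points.
        new = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centres[i]
               for i, cl in enumerate(clusters)]
        if new == centres:                   # step 6: stop when stable
            break
        centres = new
    return centres, clusters

points = [(0.8, 1.2), (0.75, 1.4), (0.23, 0.4), (0.23, 0.5)]
centres, clusters = kmeans(points, k=2)
```

On this data the two tight pairs of points end up in separate clusters, with each final centre at the mean of its pair.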
Example: if our class (decision) attribute is tumor type and its values are malignant, benign, etc., these will be the classes. They will be represented by cluster1, cluster2, etc. However, the class information is never provided to the algorithm. The class information can be used later on to evaluate how accurately the algorithm classified the objects.
      Curvature   Texture   Blood Consump   Tumor Type
X1    0.8         1.2       A               Benign
X2    0.75        1.4       B               Benign
X3    0.23        0.4       D               Malignant
X4    0.23        0.5       D               Malignant
Fig No: 2
The way we do that is by plotting the objects from the database into space, where each attribute is one dimension.
Fig No: 3
After all the objects are plotted, we will calculate the distance between them, and the ones that are close to each
other – we will group them together, i.e. place them in the same cluster.
Fig No: 4
2.3.3 Hierarchical algorithm
Hierarchical clustering methods have attracted much attention by giving the user a maximum amount
of flexibility. In data mining, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a
method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering
generally fall into two types:
Agglomerative (bottom up)
1. Start with each point as a singleton cluster.
2. Recursively merge the two most appropriate clusters.
3. Stop when k clusters are obtained.
Divisive (top down)
1. Start with one big cluster containing all points.
2. Recursively divide it into smaller clusters.
3. Stop when k clusters are obtained.
In the general case, the complexity of agglomerative clustering is O(n^3), which makes it too slow for large data sets. Divisive clustering with an exhaustive search is O(2^n), which is even worse. However, for some special cases, optimal efficient agglomerative methods (of complexity O(n^2)) are known: SLINK for single-linkage and CLINK for complete-linkage clustering.
For example, suppose this data is to be clustered, and the Euclidean distance is the distance metric. Cutting the tree at a given height will give a partitioning clustering at a selected precision. In this example, cutting after the second row of the dendrogram will yield clusters {a} {b c} {d e} {f}. Cutting after the third row will yield clusters {a} {b c} {d e f}, which is a coarser clustering, with a smaller number of larger clusters.
Fig No: 5
Raw data
The hierarchical clustering dendrogram (from Greek dendron "tree" and gramma "drawing") is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. Dendrograms are often used in computational biology to illustrate the clustering of genes or samples. For the raw data above, the dendrogram would be as such:
Fig No: 6
Traditional representation
This method builds the hierarchy from the individual elements by progressively merging clusters. In
our example, we have six elements {a} {b} {c} {d} {e} and {f}. The first step is to determine which elements to
merge in a cluster. Usually, we want to take the two closest elements, according to the chosen distance.
Optionally, one can also construct a distance matrix at this stage, where the number in the i-th row j-th column
is the distance between the i-th and j-th elements. Then, as clustering progresses, rows and columns are merged
as the clusters are merged and the distances updated. This is a common way to implement this type of clustering,
and has the benefit of caching distances between clusters. A simple agglomerative clustering algorithm is
described in the single-linkage clustering page; it can easily be adapted to different types of linkage. Suppose we have merged the two closest elements b and c; we now have the clusters {a}, {b, c}, {d}, {e} and {f}, and want to merge them further. To do that, we need to take the distance between {a} and {b c}, and therefore must define the distance between two clusters. Usually the distance between two clusters A and B is one of the following:
The maximum distance between elements of each cluster (also called complete-linkage clustering):
max { d(x, y) : x ∈ A, y ∈ B }
The minimum distance between elements of each cluster (also called single-linkage clustering):
min { d(x, y) : x ∈ A, y ∈ B }
The mean distance between elements of each cluster (also called average-linkage clustering, used e.g. in UPGMA):
(1 / (|A| · |B|)) Σ x∈A Σ y∈B d(x, y)
The sum of all intra-cluster variance (Ward's method).
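The inter-cluster distances just defined can be written directly in Python (the two small point sets A and B are illustrative assumptions):

```python
import math
from itertools import product

def d(x, y):
    """Euclidean distance between two points."""
    return math.dist(x, y)

def complete_linkage(A, B):
    """Maximum distance between elements of each cluster."""
    return max(d(x, y) for x, y in product(A, B))

def single_linkage(A, B):
    """Minimum distance between elements of each cluster."""
    return min(d(x, y) for x, y in product(A, B))

def average_linkage(A, B):
    """Mean distance over all cross-cluster pairs (as in UPGMA)."""
    return sum(d(x, y) for x, y in product(A, B)) / (len(A) * len(B))

A = [(0.0, 0.0), (0.0, 1.0)]
B = [(3.0, 0.0)]
```

An agglomerative algorithm repeatedly merges the pair of clusters for which the chosen linkage function is smallest; swapping the linkage function changes the shape of the resulting dendrogram.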
III. Result Using Weka Tool:-
WEKA
In 1993, the University of Waikato in New Zealand started development of the original version of Weka (which became a mixture of Tcl/Tk, C, and Makefiles). Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine learning software written in Java, developed at the University of Waikato, New Zealand.
Datasets used:
1. Monk
2. Titanic
3. iris_discretized
4. CPU
5. Lymphography
3.1. Result of Apriori Algorithm with Monk dataset
Table 1: Monk Problem Dataset Results.
Apriori      min sup=0.06   min sup=0.07   min sup=0.08   min sup=0.09
Confmin      0.0666         0.1333         0.1333         0.1333
Confmax      1              1              1              1
Confavg      0.35458        0.42041        0.42041        0.42041

Table 2: No of Rules (Monk Problem Dataset)
Apriori      min sup=0.06   min sup=0.07   min sup=0.08   min sup=0.09
Total Rules  2682           966            192            172

Table 3: Time Required in (Milliseconds) Monk Problem Dataset
Apriori                  min sup=0.06   min sup=0.07   min sup=0.08   min sup=0.09
Time Required (ms)       16290±100      2735±50        2735±50        2735±50

3.2 Result of K-NN and Naïve Bayes
Table 4: Evaluation of Naïve Bayes and Lazy Classifiers with Titanic Dataset
Algorithm     Correctly Classified (%)   Incorrectly Classified (%)   TP Rate (%)   Recall (%)   Precision (%)   F-measure (%)
Naïve Bayes   77.8283                    22.1717                      77.8          77.8         77.2            76.4
K-NN          79.055                     20.945                       79.1          79.1         82.1            75.9
[17]. Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, Geoffrey J. McLachlan, Angus Ng, Bing Liu, Philip S. Yu, Zhi-Hua Zhou, Michael Steinbach, David J. Hand, Dan Steinberg, “Top 10 algorithms in data mining”, Springer-Verlag London Limited, 4 December 2007.
[18]. Hung-Ju Huang and Chun-Nan Hsu, “Bayesian Classification for Data From the Same Unknown Class”, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 32, No. 2, April 2002.
[19]. C. Lakshmi Devasena, “Classification of Multivariate Data Sets Without Missing Values Using Memory Based Classifiers - An Effectiveness Evaluation”, International Journal of Artificial Intelligence & Applications (IJAIA), Vol. 4, No. 1, January 2013.
[20]. R. Agrawal, T. Imielinski, and A. N. Swami, 1993. Mining association rules between sets of items in large databases. ACM
SIGMOD International Conference on Management of Data, Washington, D.C., pp 207-216.
[21]. Jie Gao, Shaojun Li, Feng Qian “Optimization on Association Rules Apriori Algorithm” IEEE Conference, vol 2, pp 5901-5905,
2006.
[22]. Dongme Sun, Shaohua Teng, Wei Zhang, Haibin Zhu “An Algorithm to Improve the Effectiveness of Apriori” IEEE Conference,
pp 385-390, Aug 2007.
[23]. R. Divya and S. Vinod Kumar, “Survey on AIS, Apriori and FP-Tree Algorithms”, International Journal of Computer Science and Management Research, Vol. 1, Issue 2, pp. 194-200, September 2012.
[24]. Aastha Joshi, Rajneet Kaur, “Comparative Study of Various Clustering Techniques in Data Mining”, International Journal of Advanced Research in Computer Science and Software Engineering, ISSN: 2277 128X, Volume 3, Issue , 2013.
[25]. Sang Jun Lee, Keng Siau, “A review of data mining techniques”, Industrial Management and Data Systems, University of Nebraska-Lincoln Press, USA, pp. 41-46, 2001.
[26]. Peter Rob, Carlos Coronel, Database Systems: Design, Implementation and Management, Cengage Learning, 8th Edition, 2009.
[27]. M.S. Chen, J. Han, and P. Yu, “Data mining: an overview from a database perspective”, IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.
[28]. U. Fayyad, S.G. Djorgovski and N. Weir, “Automating the analysis and cataloging of sky surveys”, Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA, pp. 471-494, 1996.
[29]. Jiawei Han, Jian Pei, Yiwen Yin and Runying Mao, “Mining frequent patterns without candidate generation: A frequent-pattern tree approach”, Data Mining and Knowledge Discovery, Netherlands, pp. 53-87, 2004.
[30]. U. Fayyad, G. Piatetsky-Shapiro and P. Smyth, “From data mining to knowledge discovery: an overview”, Advances in Knowledge Discovery and Data Mining, MIT Press, Cambridge, MA, 1996.
[31]. M.S. Chen, J. Han, and P. Yu, “Data mining: an overview from a database perspective”, IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, 1996.
[32]. Sang Jun Lee, Keng Siau, “A review of data mining techniques”, Industrial Management and Data Systems, University of Nebraska-Lincoln Press, USA, pp. 41-46, 2001; Peter Rob, Carlos Coronel, Database Systems: Design, Implementation and Management, Cengage Learning, 8th Edition, 2009.
[33]. G. Kesavaraj and S. Sukumaran, “A Comparison Study on Performance Analysis of Data Mining Algorithms in Classification of Local Area News Dataset using WEKA Tool”, International Journal of Engineering Sciences & Research Technology, 2(10), October 2013.
[34]. Aaditya Desai And Dr. Sunil Rai “Analysis of Machine Learning Algorithms using WEKA” International Conference & Workshop
on Recent Trends in Technology, (TCET) 2012 Proceedings published in International Journal of Computer Applications® (IJCA).
[35]. M. Kantardzic, Data Mining - Concepts, Models, Methods, and Algorithms, IEEE Press, Wiley-Interscience, 2003, ISBN 0-471-
22852-4.
[36]. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Elsevier 2006, ISBN 1558609016.