Abstract—A new model for online machine learning on high-speed data streams is proposed, to ease the severe restrictions associated with existing learning algorithms. Most existing models impose three principal requirements. First, the system must build its model incrementally. Second, each example must complete the prescribed procedure within the time dictated by its arrival speed. Third, the memory required for the computation must be predictable in advance. To overcome these restrictions we propose a new data-stream classification algorithm in which the data are partitioned into a stream of trees and each new data set is merged into the existing tree. This algorithm, called the incremental classification tree algorithm, proves to be an effective solution for processing large data streams. In this paper we present experimental results for the new algorithm and show that it resolves the problems of the existing methods.
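The incremental classification tree itself is not reproduced in the abstract, but the three requirements above are exactly the contract any online learner must meet: one pass over the data, bounded work per example, bounded memory. A minimal sketch of that contract, using a simple online perceptron on a hypothetical simulated stream (not the paper's algorithm):

```python
import random

random.seed(0)

def stream(n_examples=2000):
    """Hypothetical labelled stream: each example is seen exactly once."""
    for _ in range(n_examples):
        x = [random.gauss(0, 1) for _ in range(3)]
        y = 1 if x[0] + 2 * x[1] > 0 else -1   # hidden concept
        yield x, y

# Online perceptron: O(d) memory, O(d) work per example, single pass --
# memory and per-example time stay bounded regardless of stream length.
w = [0.0, 0.0, 0.0]
errors = 0
for x, y in stream():
    score = sum(wi * xi for wi, xi in zip(w, x))
    pred = 1 if score > 0 else -1
    if pred != y:               # update only on mistakes
        errors += 1
        w = [wi + y * xi for wi, xi in zip(w, x)]

mistake_rate = errors / 2000
print(f"online mistake rate: {mistake_rate:.3f}")
```

The perceptron stands in only as the simplest learner satisfying the stream constraints; the paper's tree-based method targets the same constraints with a richer model.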
This paper presents a review and comparative evaluation of several well-known machine learning
algorithms in terms of their suitability and runtime performance on a given dataset of any size. We
describe our Machine Learning ToolBox, built in the Python programming language. The toolbox
implements supervised classification algorithms such as Naïve Bayes, Decision Trees, SVM, k-Nearest
Neighbors and a Neural Network (backpropagation). The algorithms are tested on the Iris and Diabetes
datasets and compared on the basis of their accuracy under different conditions; using the tool, however,
any of the implemented ML algorithms can be applied to any dataset of any size. The main goal of
building the toolbox is to give users a platform to test their datasets against different machine learning
algorithms and use the accuracy results to determine which algorithm fits the data best. The toolbox
lets the user choose a dataset, in structured or unstructured form, and then select the features to use
for training. We give concluding remarks on the performance of the implemented algorithms based on
experimental analysis.
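The toolbox code is not included in the abstract; as a hedged sketch of the same comparison loop, assuming scikit-learn is available, the listed algorithm families can be scored on the Iris dataset in a few lines:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# One representative of each algorithm family named in the abstract.
models = {
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "Neural Net": MLPClassifier(max_iter=2000, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = model.score(X_te, y_te)   # held-out accuracy
    print(f"{name:14s} accuracy = {scores[name]:.3f}")

best = max(scores, key=scores.get)
print("best fit for this data:", best)
```

Running this prints one held-out accuracy per model and names the best fit, which is the toolbox's core workflow in miniature.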
This presentation covers Data Mining: Classification and Prediction; neural network representation; neural network application development; benefits and limitations of neural networks; a real-estate appraiser example; kinds of data mining problems; data mining techniques; learning in ANNs; elements of ANNs; neural network architectures; recurrent neural networks; and ANN software.
Distributed Digital Artifacts on the Semantic Web (Editor IJCATR)
Distributed digital artifacts incorporate cryptographic hash values into URIs, called trusty URIs, in a distributed environment,
building high-quality, verifiable and immutable web resources to counter the rising man-in-the-middle attack. The greatest
challenge of a centralized system is that it gives users no way to check whether data have been modified, and communication
is limited to a single server. The solution is a distributed digital artifact system, where resources are spread among
different domains to enable inter-domain communication. With the rapid developments on the web, attacks have increased sharply,
among which the man-in-the-middle attack (MIMA) is a serious issue that directly threatens user security. This work aims to prevent MIMA
to an extent by providing self-references and trusty URIs even in a distributed environment. Any manipulation of the
data is efficiently identified, and any further access to that data is blocked by informing the user that the uniform location has
changed. The system uses a self-reference to carry the trusty URI for each resource, a lineage algorithm for generating the seed, and the SHA-512 hash
algorithm to ensure security. It is implemented on the Semantic Web, an extension of the World Wide Web, using
RDF (Resource Description Framework) to identify resources. The framework was thus developed to overcome the existing
challenges by making digital artifacts on the Semantic Web distributed, enabling secure communication between different domains across
the network and thereby preventing MIMA.
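The exact trusty-URI format and the lineage algorithm for seed generation are not detailed in the abstract; the core idea, embedding a SHA-512 digest of the resource in its own URI so any tampering is detectable, can be sketched with only the standard library (the URI layout below is illustrative, not the paper's scheme):

```python
import hashlib

def make_trusty_uri(base_uri: str, content: bytes) -> str:
    """Append the SHA-512 digest of the content to the URI itself."""
    digest = hashlib.sha512(content).hexdigest()
    return f"{base_uri}.{digest}"

def verify(trusty_uri: str, content: bytes) -> bool:
    """Recompute the digest and compare with the one carried in the URI."""
    claimed = trusty_uri.rsplit(".", 1)[1]
    return hashlib.sha512(content).hexdigest() == claimed

resource = b'{"creator": "example", "title": "artifact"}'
uri = make_trusty_uri("https://example.org/artifact/42", resource)

assert verify(uri, resource)                 # untouched resource checks out
tampered = resource + b" "                   # a man-in-the-middle edit...
assert not verify(uri, tampered)             # ...is detected immediately
print("tamper check passed")
```

Because the digest travels inside the URI, any party holding the reference can verify the resource without trusting the server that delivered it.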
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER (IJCSEA Journal)
A comparative study of algorithms is essential before implementing them for the needs of any
organization. The comparison depends on various parameters such as data frequency, types of data,
and the relationships among the attributes in a given data set. A number of learning and
classification algorithms are available to analyse data, learn patterns and categorize it. The
problem is finding the best algorithm for the problem and the desired output; the desired result
has always been higher accuracy in predicting future values or events from the given dataset.
The algorithms taken for the comparative study are Neural Net, SVM, Naïve Bayes, BFT and Decision
Stump. These are among the most influential data mining algorithms in the research community and
are widely used in the field of knowledge discovery and data mining.
A survey of modified support vector machine using particle of swarm optimizat... (Editor Jacotech)
The main objective of this survey paper is to provide a detailed description of Wireless Sensor Networks with Medium Access Control layer and Routing layer. In the medium access control layer, Event Driven Time Division Multiple Access protocol is studied and in Network layer, two routing protocols Bellman-Ford and Dynamic Source Routing are studied.
SURVEY ON CLASSIFICATION ALGORITHMS USING BIG DATASET (Editor IJMTER)
Data mining environments produce a large amount of data that needs to be analyzed.
With traditional databases and architectures, it has become difficult to process, manage and analyze
such patterns. To gain knowledge from Big Data, a proper architecture must be understood.
Classification is an important data mining technique with broad applications, used to classify the
various kinds of data found in nearly every field of our lives. Classification assigns an item to
one of a predefined set of classes according to the item's features. This paper sheds light on
various classification algorithms, including J48, C4.5 and Naïve Bayes, using a large dataset.
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES (Vikash Kumar)
Image classification using the KNN, Random Forest and SVM algorithms on glaucoma datasets, with the accuracy, sensitivity and specificity of each algorithm explained.
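The glaucoma datasets and trained models are not available here, but the three reported metrics all follow directly from a binary confusion matrix, as this small standalone sketch shows (the labels below are hypothetical):

```python
def confusion_counts(y_true, y_pred):
    """Count TP/TN/FP/FN for a binary task (1 = glaucoma, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # made-up ground truth
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]   # made-up classifier output

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy    = (tp + tn) / (tp + tn + fp + fn)   # overall correctness
sensitivity = tp / (tp + fn)                    # true-positive rate (recall)
specificity = tn / (tn + fp)                    # true-negative rate

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f}")
```

For a screening task like glaucoma detection, sensitivity (catching true cases) and specificity (not flagging healthy eyes) matter separately, which is why the comparison reports all three numbers rather than accuracy alone.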
Machine Learning: Replicating the Human Brain (Nishant Jain)
The slides show how humans make decisions and how, following the same pattern, machines are trained to learn and make decisions. They give an overview of all the steps involved in designing an efficient decision-making machine.
Overview of basic concepts related to Data Mining: database, data model, fuzzy sets, information retrieval, data warehouse, dimensional modeling, data cubes, OLAP, machine learning.
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT... (IJDKP)
Many applications of automatic document classification require learning accurately with little training
data. Semi-supervised classification uses both labeled and unlabeled data for training. The technique
has been shown to be effective in some cases; however, the use of unlabeled data is not always
beneficial.
At the same time, the emergence of web technologies has enabled the collaborative development of
ontologies. In this paper, we propose the use of ontologies to improve the accuracy and efficiency
of semi-supervised document classification.
We used support vector machines, one of the most effective algorithms studied for text. Our
algorithm enhances the performance of transductive support vector machines through the use of
ontologies. We report experimental results applying our algorithm to three different datasets. Our
experiments show an accuracy improvement of 4% on average, and up to 20%, in comparison with the
traditional semi-supervised model.
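The paper's ontology-enhanced transductive SVM is not reproduced here. As a rough sketch of the semi-supervised idea it builds on, letting confidently classified unlabeled points enlarge the training set, here is a tiny self-training loop around a nearest-centroid classifier (toy one-dimensional data, not the paper's algorithm):

```python
# Toy 1-D data: two clusters around -2 (class 0) and +2 (class 1).
labeled   = [(-2.1, 0), (-1.9, 0), (2.0, 1), (2.2, 1)]
unlabeled = [-2.3, -1.7, -0.2, 1.8, 2.4, 0.1]

def centroids(points):
    """Mean of each class in the current labelled pool."""
    out = {}
    for cls in (0, 1):
        vals = [x for x, c in points if c == cls]
        out[cls] = sum(vals) / len(vals)
    return out

# Self-training: repeatedly adopt unlabeled points that lie confidently
# close to one centroid, then refit the centroids on the enlarged pool.
pool = list(labeled)
pending = list(unlabeled)
for _ in range(5):
    cent = centroids(pool)
    still_pending = []
    for x in pending:
        d0, d1 = abs(x - cent[0]), abs(x - cent[1])
        # "Confident" = much nearer one centroid than the other.
        if min(d0, d1) < 0.5 * max(d0, d1):
            pool.append((x, 0 if d0 < d1 else 1))
        else:
            still_pending.append(x)   # ambiguous points stay unlabeled
    pending = still_pending

cent = centroids(pool)
predict = lambda x: 0 if abs(x - cent[0]) < abs(x - cent[1]) else 1
print("adopted:", len(pool) - len(labeled), "points; centroids:", cent)
```

Note how the ambiguous mid-gap points are never adopted: this is exactly the caveat in the abstract that unlabeled data is not always beneficial, and the role the paper assigns to ontologies is to supply the extra evidence needed to use such points safely.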
A Study on Machine Learning and Its Working (IJMTST Journal)
Machine learning (ML) is widely popular these days and is used in a wide variety of domains for
predicting outcomes. Many algorithms exist for this, but choosing the right algorithm for the
domain plays a very important role in the algorithm's performance. This paper is organized into
five sections: the first deals with collecting data from various sources, the second with data
cleaning, the third with choosing the correct ML algorithm, the fourth with gaining knowledge
from models, and the final section with data visualization.
Data mining and machine learning have become a vital part of crime detection and prevention. In this
research, we use WEKA, an open source data mining software, to conduct a comparative study between the
violent crime patterns from the Communities and Crime Unnormalized Dataset provided by the University
of California-Irvine repository and actual crime statistical data for the state of Mississippi that has been
provided by neighborhoodscout.com. We implemented the Linear Regression, Additive Regression, and
Decision Stump algorithms using the same finite set of features, on the Communities and Crime Dataset.
Overall, the Linear Regression algorithm performed best among the three selected algorithms. The scope
of this project is to demonstrate how effective and accurate machine learning algorithms used in data
mining analysis can be at predicting violent crime patterns.
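WEKA's Linear Regression output is not shown in the abstract; the ordinary least-squares fit it computes can be sketched in a few lines of plain Python (the single feature and all values below are invented for illustration):

```python
# Hypothetical data: one community-level feature (e.g. unemployment rate, %)
# against a violent-crime rate per 100k. Values are made up for illustration.
x = [3.0, 4.5, 5.0, 6.5, 8.0, 9.5]
y = [210., 300., 340., 420., 515., 600.]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares for a single feature:
#   slope = cov(x, y) / var(x),  intercept = mean_y - slope * mean_x
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
        / sum((xi - mean_x) ** 2 for xi in x)
intercept = mean_y - slope * mean_x

predict = lambda xi: intercept + slope * xi
sse = sum((predict(xi) - yi) ** 2 for xi, yi in zip(x, y))
print(f"crime_rate ~ {intercept:.1f} + {slope:.1f} * feature (SSE={sse:.1f})")
```

WEKA generalizes this to many features at once, but the fitted-line-plus-residual-error structure of the model is the same.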
Identifying and classifying unknown Network Disruption (jagan477830)
With the evolution of modern technology and the drastic increase in the scale of network communication, more and more network disruptions in traffic and private protocols are taking place. Identifying and classifying these unknown network disruptions can provide support and even help maintain backup systems.
Novel Ensemble Tree for Fast Prediction on Data Streams (IJERA Editor)
Data streams are sequential sets of data records. When data arrives continuously at high speed,
predicting the class in time is essential. Ensemble modeling techniques are currently growing
rapidly in data-stream classification. Ensemble learning is widely accepted because it can manage
huge amounts of streaming data and cope with concept drift. Prior work has focused mostly on the
accuracy of the ensemble model; prediction efficiency has received little attention, since existing
ensemble models predict in linear time, which is enough for small applications and for models that
integrate only a few classifiers. Real-time applications, however, involve huge data streams, so
base classifiers must recognize dissimilar models and form a high-grade ensemble. To address these
challenges we developed the Ensemble Tree, a height-balanced tree index over base classifiers for
quick prediction on data streams with ensemble modeling techniques. The Ensemble Tree manages
ensembles as spatial databases and uses an R-tree-like structure to achieve sub-linear time
complexity.
Concept Drift Identification using Classifier Ensemble Approach (IJECEIAES)
Abstract: In internetworked systems, huge amounts of data are scattered, generated and processed over the network. Data mining techniques are used to discover unknown patterns in the underlying data. A traditional classification model classifies data based on past labelled data. In many current applications, however, data grows in size with fluctuating patterns, so new features may arrive in the data. This occurs in applications such as sensor networks, banking and telecommunication systems, the financial domain, and electricity usage and pricing driven by demand and supply. Such change in the data distribution reduces classification accuracy: some patterns may be discovered as frequent while others tend to disappear and are wrongly classified. Traditional classification techniques may not suit such data, because the distribution generating the items can change over time, so past data may become irrelevant or even false for the current prediction. To handle such varying patterns, concept-drift mining is used to improve the accuracy of classification techniques. In this paper we propose an ensemble approach for improving classifier accuracy. The ensemble classifier is applied to three different datasets. We investigated different features for different chunks of the data, which are then given to the ensemble classifier, and observed that the proposed approach improves classifier accuracy across the chunks.
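The paper's ensemble details are not given in the abstract. A common shape for chunk-based drift handling, sketched below with toy threshold "stumps" as the weak learners, is to train one learner per chunk and re-weight every member by its accuracy on the newest chunk, so members trained on outdated concepts fade away (everything here is a hypothetical illustration, not the authors' method):

```python
import random
random.seed(1)

def make_chunk(concept, size=200):
    """Labelled chunk from a drifting concept: y = 1 iff x > threshold."""
    data = []
    for _ in range(size):
        x = random.uniform(0, 1)
        data.append((x, 1 if x > concept else 0))
    return data

def train_stump(chunk):
    """Weak learner: pick the threshold with fewest errors on this chunk."""
    best_t, best_err = 0.5, len(chunk)
    for t in [i / 20 for i in range(1, 20)]:
        err = sum(1 for x, y in chunk if (1 if x > t else 0) != y)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Concept drifts midway: the true threshold moves from 0.3 to 0.7.
concepts = [0.3, 0.3, 0.3, 0.7, 0.7, 0.7]
ensemble = []                      # list of (threshold, weight)
for concept in concepts:
    chunk = make_chunk(concept)
    # Re-weight every member by accuracy on the NEWEST chunk, so members
    # trained on outdated concepts lose influence without retraining.
    ensemble = [(t, sum(1 for x, y in chunk
                        if (1 if x > t else 0) == y) / len(chunk))
                for t, _ in ensemble]
    ensemble.append((train_stump(chunk), 1.0))

def predict(x):
    vote = sum(w * (1 if x > t else -1) for t, w in ensemble)
    return 1 if vote > 0 else 0

test = make_chunk(0.7)             # current concept after the drift
acc = sum(1 for x, y in test if predict(x) == y) / len(test)
print(f"accuracy on current concept: {acc:.2f}")
```

After the drift, the weighted vote is dominated by the members trained on the new concept, which is the mechanism that keeps a chunk-based ensemble accurate under a changing distribution.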
Incremental learning from unbalanced data with concept class, concept drift a... (IJDKP)
Recently, stream data mining applications have drawn vital attention from several research communities.
Stream data is a continuous form of data distinguished by its online nature. Traditionally, machine
learning has developed algorithms that make certain assumptions about the underlying data distribution,
such as requiring a predetermined distribution. Such constraints on the problem domain led to smart
learning algorithms whose performance is theoretically verifiable. Real-world situations differ from
this restricted model: applications usually suffer from problems such as unbalanced data distributions.
Additionally, data drawn from non-stationary environments is also common in real-world applications,
resulting in the "concept drift" associated with data-stream examples. These issues have been addressed
separately by researchers, and the joint problem of class imbalance and concept drift has received
relatively little research. If the final objective of intelligent machine learning techniques is to
address a broad spectrum of real-world applications, then the need for a universal framework for
learning from, and adapting to, environments where concept drift may occur and unbalanced data
distributions are present can hardly be exaggerated. In this paper, we first present an overview of
the issues observed in stream data mining scenarios, followed by a complete review of recent research
dealing with each issue.
SEAMLESS AUTOMATION AND INTEGRATION OF MACHINE LEARNING CAPABILITIES FOR BIG ... (ijdpsjournal)
The paper aims to propose a solution for designing and developing seamless automation and
integration of machine learning capabilities for Big Data, with the following requirements: 1) the
ability to seamlessly handle and scale very large amounts of unstructured and structured data from
diversified and heterogeneous sources; 2) the ability to systematically determine the steps and
procedures needed for analyzing Big Data datasets based on data characteristics, domain-expert
inputs, and a data pre-processing component; 3) the ability to automatically select the most
appropriate libraries and tools to compute and accelerate the machine learning computations; and
4) the ability to perform Big Data analytics with high learning performance but minimal human
intervention and supervision. The whole focus is to provide a seamless, automated and integrated
solution that can effectively analyze Big Data with high-frequency and high-dimensional features
across different data characteristics and application problem domains, with high accuracy,
robustness, and scalability. This paper highlights the research methodologies and activities that
we propose Big Data researchers and practitioners conduct in order to develop and support seamless
automation and integration of machine learning capabilities for Big Data analytics.
Analysis on different Data mining Techniques and algorithms used in IoT (IJERA Editor)
In this paper, we discuss five functionalities of data mining in IoT that affect performance:
data anomaly detection, data clustering, data classification, feature selection, and time-series
prediction. Some important algorithms for each functionality are also reviewed here, showing their
advantages and limitations, as well as some new algorithms still under research. We present a
knowledge view of data mining in IoT.
Extended pso algorithm for improvement problems k means clustering algorithm (IJMIT JOURNAL)
Clustering is an unsupervised process and one of the most common data mining techniques. Its
purpose is to group similar data together, so that instances are most similar to the others within
their own cluster and most different from the instances in other clusters. In this paper we focus
on partitional k-means clustering which, thanks to its ease of implementation and high-speed
performance on large data sets, is still very popular thirty years after its development. To address
the problem of k-means becoming trapped in local optima, we propose an extended PSO algorithm named
ECPSO. The new algorithm is able to escape local optima and, with a high success rate, produce the
problem's optimal answer. The results show that the proposed algorithm performs better than other
clustering algorithms, especially on two indexes: the precision of the clustering and the quality
of the clustering.
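ECPSO itself is not specified in the abstract, but the local-optimum problem it targets is easy to reproduce. The sketch below runs plain one-dimensional k-means from two different initializations and shows that the resulting within-cluster error (inertia) depends on the starting centroids (toy data, illustrative only):

```python
def kmeans_1d(data, centroids, iters=50):
    """Plain Lloyd's algorithm in one dimension."""
    cents = list(centroids)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in cents]
        for x in data:
            i = min(range(len(cents)), key=lambda j: (x - cents[j]) ** 2)
            clusters[i].append(x)
        # Update step: move each centroid to its cluster mean.
        cents = [sum(c) / len(c) if c else cents[i]
                 for i, c in enumerate(clusters)]
    inertia = sum(min((x - c) ** 2 for c in cents) for x in data)
    return cents, inertia

# Three well-separated groups.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]

# A good start (one centroid per group) vs. a bad start (all crowded left).
good, inertia_good = kmeans_1d(data, [0.0, 5.0, 10.0])
bad, inertia_bad = kmeans_1d(data, [0.0, 0.1, 0.2])

print("good init inertia:", round(inertia_good, 3))
print("bad  init inertia:", round(inertia_bad, 3))
assert inertia_bad > inertia_good   # same algorithm, worse local optimum
```

The bad start converges with two centroids stuck in the left group, and no amount of further Lloyd iterations fixes it. This initialization sensitivity is precisely the gap a global-search wrapper such as a PSO-based seeding is meant to close.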
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN... (acijjournal)
Apriori is one of the key algorithms for generating frequent itemsets. Analysing frequent itemsets
is a crucial step in analysing structured data and in finding association relationships between
items, and it stands as an elementary foundation for supervised learning, which encompasses
classifier and feature-extraction methods. Applying this algorithm is crucial to understanding the
behaviour of structured data. Most structured data in the scientific domain is voluminous, and
processing such data requires state-of-the-art computing machines. Setting up such an infrastructure
is expensive, so a distributed environment such as a clustered setup is employed for tackling such
scenarios. The Apache Hadoop distribution is one of the cluster frameworks for distributed
environments, and it helps by distributing voluminous data across a number of nodes in the
framework. This paper focuses on the map/reduce design and implementation of the Apriori algorithm
for structured-data analysis.
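Hadoop itself is not needed to see the design: one Apriori pass maps each transaction to its candidate itemsets and then reduces by summing counts per itemset. This standard-library sketch imitates the two phases in-process (the transactions are invented, and the candidate pruning via frequent (k-1)-itemsets that full Apriori performs is omitted for brevity):

```python
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]
MIN_SUPPORT = 3  # absolute support threshold

def map_phase(transaction, k):
    """Map: emit (itemset, 1) for every k-itemset in one transaction."""
    return [(frozenset(c), 1) for c in combinations(sorted(transaction), k)]

def reduce_phase(pairs):
    """Reduce: sum the counts per itemset key, keep the frequent ones."""
    counts = Counter()
    for itemset, one in pairs:
        counts[itemset] += one
    return {s: n for s, n in counts.items() if n >= MIN_SUPPORT}

# Pass k=1, then k=2: each pass is one map/reduce round over the data.
freq1 = reduce_phase(p for t in transactions for p in map_phase(t, 1))
freq2 = reduce_phase(p for t in transactions for p in map_phase(t, 2))

print("frequent 1-itemsets:", {tuple(s)[0]: n for s, n in freq1.items()})
print("frequent 2-itemsets:", {tuple(sorted(s)): n for s, n in freq2.items()})
```

In the Hadoop version the map phase runs on the node holding each data split and only the small (itemset, count) pairs cross the network, which is what makes the design scale to voluminous data.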
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor... (Editor IJCATR)
Nowadays, the automotive industry has attracted the attention of consumers, and product quality is considered an
essential element in today's competitive markets. Security and comfort are the main criteria and parameters in selecting a car.
Therefore, the standard CAR dataset, involving six features and 1728 instances, has been used. This paper
tries to select the car with the best characteristics by using intelligent algorithms (Random Forest, J48, SVM,
Naïve Bayes) and combining these algorithms with aggregated classifiers such as Bagging and AdaBoostM1. In this study, the speed
and accuracy of the intelligent algorithms in identifying the best car are taken into account.
In recent years the scope of data mining has evolved into an active area of research because of the previously unknown and interesting knowledge obtainable from very large database collections. Data mining is applied to a variety of applications in multiple domains, such as business, IT and many other sectors. In data mining, a major problem that receives great attention from the community is the classification of data. Data should be classified in such a way that the results can be easily verified and easily interpreted by humans. In this paper we study various data mining techniques so that we can find combinations for an enhanced hybrid technique, one involving multiple techniques to increase the usability of the application. We study the CHARM algorithm, the CM-SPAM algorithm, the Apriori algorithm, the MOPNAR algorithm and Top-K Rules.
In a power plant with a Distributed Control System (DCS), process parameters are continuously stored in
databases at discrete intervals. The data contained in these databases may not appear to hold valuable
relational information, but in practice such relations exist. A large number of process parameter values
change with time in a power plant. These parameters form part of the rules framed by domain experts for
the expert system. As the parameters change, there is a high possibility of forming new rules from the
dynamics of the process itself. We present an efficient algorithm that generates all significant rules
based on the real data. The association-based algorithms were compared and the best-suited algorithm for
this process application was selected. The application of the learning system is studied in a power
plant domain. A SCADA interface was developed to acquire online plant data.
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
The clustering is a without monitoring process and one of the most common data mining techniques. The
purpose of clustering is grouping similar data together in a group, so were most similar to each other in a
cluster and the difference with most other instances in the cluster are. In this paper we focus on clustering
partition k-means, due to ease of implementation and high-speed performance of large data sets, After 30
year it is still very popular among the developed clustering algorithm and then for improvement problem of
placing of k-means algorithm in local optimal, we pose extended PSO algorithm, that its name is ECPSO.
Our new algorithm is able to be cause of exit from local optimal and with high percent produce the
problem’s optimal answer. The probe of results show that mooted algorithm have better performance
regards as other clustering algorithms specially in two index, the carefulness of clustering and the quality
of clustering.
EMPIRICAL APPLICATION OF SIMULATED ANNEALING USING OBJECT-ORIENTED METRICS TO...ijcsa
The work is about using Simulated Annealing Algorithm for the effort estimation model parameter
optimization which can lead to the reduction in the difference in actual and estimated effort used in model
development.
The model has been tested using OOP’s dataset, obtained from NASA for research purpose.The data set
based model equation parameters have been found that consists of two independent variables, viz. Lines of
Code (LOC) along with one more attribute as a dependent variable related to software development effort
(DE). The results have been compared with the earlier work done by the author on Artificial Neural
Network (ANN) and Adaptive Neuro Fuzzy Inference System (ANFIS) and it has been observed that the
developed SA based model is more capable to provide better estimation of software development effort than
ANN and ANFIS
Feature selection in high-dimensional datasets is
considered to be a complex and time-consuming problem. To
enhance the accuracy of classification and reduce the execution
time, Parallel Evolutionary Algorithms (PEAs) can be used. In
this paper, we make a review for the most recent works which
handle the use of PEAs for feature selection in large datasets.
We have classified the algorithms in these papers into four main
classes (Genetic Algorithms (GA), Particle Swarm Optimization
(PSO), Scattered Search (SS), and Ant Colony Optimization
(ACO)). The accuracy is adopted as a measure to compare the
efficiency of these PEAs. It is noticeable that the Parallel Genetic
Algorithms (PGAs) are the most suitable algorithms for feature
selection in large datasets; since they achieve the highest accuracy.
On the other hand, we found that the Parallel ACO is timeconsuming
and less accurate comparing with other PEA.
Selecting the Right Type of Algorithm for Various Applications - PhdassistancePhD Assistance
Machine learning algorithms may be classified mainly into three main types. Supervised learning constructs a mathematical model from the training data, including input and output labels. The techniques of data categorization and regression are deemed supervised learning. In unsupervised learning, the system constructs a model using just the input characteristics but no output labeling. The classifiers are then trained to search the dataset for a specific pattern.
Learn More:https://bit.ly/3sX9xuQ
Contact Us:
Website: https://www.phdassistance.com/
UK: +44 7537144372
India No:+91-9176966446
Email: info@phdassistance.com
Similar to EVALUATION OF A NEW INCREMENTAL CLASSIFICATION TREE ALGORITHM FOR MINING HIGH SPEED DATA STREAMS (20)
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In questo evento online gratuito, organizzato dalla Community Italiana di UiPath, potrai esplorare le nuove funzionalità di Autopilot, il tool che integra l'Intelligenza Artificiale nei processi di sviluppo e utilizzo delle Automazioni.
📕 Vedremo insieme alcuni esempi dell'utilizzo di Autopilot in diversi tool della Suite UiPath:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applicata alla Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Welocme to ViralQR, your best QR code generator.ViralQR
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, thus enhancing customer interaction and making business more fluid. We very strongly believe in the ability of QR codes to change the world for businesses in their interaction with customers and are set on making that technology accessible and usable far and wide.
Our Achievements
Ever since its inception, we have successfully served many clients by offering QR codes in their marketing, service delivery, and collection of feedback across various industries. Our platform has been recognized for its ease of use and amazing features, which helped a business to make QR codes.
Our Services
At ViralQR, here is a comprehensive suite of services that caters to your very needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, there is a 14-day free offer to ViralQR, which is an exceptional opportunity for new users to take a feel of this platform. One can easily subscribe from there and experience the full dynamic of using QR codes. The subscription plans are not only meant for business; they are priced very flexibly so that literally every business could afford to benefit from our service.
Why choose us?
ViralQR will provide services for marketing, advertising, catering, retail, and the like. The QR codes can be posted on fliers, packaging, merchandise, and banners, as well as to substitute for cash and cards in a restaurant or coffee shop. With QR codes integrated into your business, improve customer engagement and streamline operations.
Comprehensive Analytics
Subscribers of ViralQR receive detailed analytics and tracking tools in light of having a view of the core values of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
So, thank you for choosing ViralQR; we have an offer of nothing but the best in terms of QR code services to meet business diversity!
Machine Learning and Applications: An International Journal (MLAIJ) Vol.3, No.3, September 2016
DOI:10.5121/mlaij.2016.3302
EVALUATION OF A NEW INCREMENTAL
CLASSIFICATION TREE ALGORITHM FOR MINING
HIGH SPEED DATA STREAMS
N. Sivakumar¹ and S. Anbu²
¹Research Scholar, Department of Computer Science and Engineering, St. Peter’s
University, Avadi, Chennai, Tamilnadu, India.
²Professor, Department of Computer Science and Engineering, St. Peter’s College of
Engineering and Technology, Avadi, Chennai, Tamilnadu, India.
ABSTRACT
A new model for the online machine learning of high speed data streams is proposed, to
minimize the severe restrictions associated with existing machine learning algorithms. Most of the
existing models have three principal steps. In the first step, the system creates a model incrementally.
In the second step, the time taken by the examples to complete a prescribed procedure at their arrival
speed is computed. In the third and final step, the size of the memory required for computation is
predicted in advance. To overcome these restrictions we propose a new data stream classification
algorithm, in which the data are partitioned into a stream of trees and each new data set updates the
existing tree. This algorithm, called the incremental classification tree algorithm, proves to be an
excellent solution for processing large data streams. In this paper, we present experimental
results for the new algorithm and show that our method eradicates the problems of the existing
method.
KEYWORDS
Decision tree, data mining, machine learning, incremental decision tree, classification tree, data streams,
high speed data streams.
1. INTRODUCTION
Most current computer applications are designed to produce huge volumes of diverse
data. Analysis and reorganization of such data require more computer memory, both RAM and
ROM, than is available. At present, new machine learning algorithms are being designed and
developed to overcome these problems. Even though incremental learning algorithms
are effective in saving time and memory, they cannot be used to process larger data sets.
Recently, researchers have concentrated on the development of new algorithms suitable
for processing such large data sets. These algorithms are designed so that they enable the computer to
learn from a single pass over the data, require less memory, and update incrementally at a later stage.
Further, they should be able to perform the data mining task at any time.
A high speed data stream can be finite or infinite. A finite data stream may itself be large, but it
comes from a finite source. In the single pass method, the algorithm scans the data stream linearly
in the size of the data. This leads to faster mining of large databases than multiple passes.
In contrast to a finite data stream, an infinite data stream comes from a known but endless source
that emits continuous data. An algorithm that processes an infinite data stream must be able to read
the data quickly; if the algorithm is slow in processing the data, vital data would be lost.
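As a minimal illustration of the single-pass idea (a sketch, not the paper's algorithm), the fragment below scans a stream of labelled examples exactly once while keeping only per-class counts in memory, so memory does not grow with the size of the stream:

```python
from collections import Counter

def single_pass_counts(stream):
    """Scan a (possibly unbounded) stream of labelled examples exactly once,
    keeping only per-class counts in memory -- not the examples themselves."""
    counts = Counter()
    for _, label in stream:          # each example is (features, label)
        counts[label] += 1           # O(1) update; memory grows with #classes only
    return counts

# Example: a finite stream of (features, label) pairs.
stream = [([0.1, 0.2], "a"), ([0.3, 0.4], "b"), ([0.5, 0.6], "a")]
print(single_pass_counts(stream))   # Counter({'a': 2, 'b': 1})
```

The same loop works unchanged on a generator that yields examples as they arrive, which is the point of single-pass processing.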
1.1. Data Stream Classification problem
The classification of data streams is one of the important areas in data mining.
Effective algorithms are therefore required to move the data from temporary locations into
permanent records.
There are two typical approaches in supervised machine learning: 1. Classification:
classification is mainly concerned with a class attribute as the dependent variable; it can be divided
into two phases, building and testing. The building phase estimates a model from the learning
algorithm, and the testing phase estimates the quality of the model built. 2. Regression: regression
is concerned with a numerical attribute as its output.
Different methods such as neural networks, decision trees and rule-based methods are used for
classification. These methods are designed to build a classification model where multiple passes
over the stored data are possible.
2. LITERATURE SURVEY ON CLASSIFICATION METHODS
2.1. Ensemble based classification
This method is based on classic classification algorithms such as the Naïve Bayesian classifier and
the decision tree model, and it increases the quality of the output.[2]
2.2. VFDT (Very Fast Decision Tree)
VFDT splits the tree using the current best attribute that satisfies the Hoeffding bound. It has
some drawbacks: ties between attributes must be resolved, and the tree can grow out of memory.[1]
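The Hoeffding bound that VFDT uses to decide when a split is safe can be computed directly. The sketch below (with illustrative parameter values, not taken from the paper) shows the standard form ε = √(R² ln(1/δ) / 2n):

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)): with probability 1 - delta,
    the true mean of a metric with range R is within epsilon of the mean
    observed over n examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """VFDT-style test: split on the current best attribute once the gain gap
    between the best and second-best attribute exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

eps = hoeffding_bound(1.0, 1e-7, 5000)   # shrinks as more examples arrive
print(round(eps, 4))
```

Because ε shrinks like 1/√n, a leaf that keeps seeing examples eventually either splits or establishes that the two best attributes are effectively tied, which is the tie problem noted above.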
2.3. On-Demand classification
The mechanism of the on-demand classification method consists of two principal components. The
first component frequently stores summarized data from the input data streams. The second
continuously uses the summary statistics to execute the classification. The main motivation for
this method is that it can be defined over a time horizon that depends on the nature of the
concept drift and data progress.[3],[4]
2.4. ANNCAD Algorithm
The Adaptive Nearest Neighbor Classification Algorithm for Data streams (ANNCAD) is yet
another classification algorithm in current use. This algorithm uses a multi-resolution data
representation. Classification starts by seeking the nearest neighbours of a record at finer
levels.[5]
3. THEORY AND MOTIVATION
3.1. Theory
Huge data: the algorithm aims at processing large volumes of data, where the data are recorded and
labeled, by constructing a classification tree incrementally.
Single pass: the data are loaded and computed only once over the large database. This can be
applied to big data, even for unsupervised problems.
Execution process: this can be carried out in the following ways.
Weak data: incomplete data, missing data, noisy data and data with concept drift are considered
weak data.
Classifier: the incremental decision tree applies a test-then-train operation. When new data
arrives, it travels from the root to a leaf, and the nodes along the path are trained
incrementally.
3.2. Motivation
The incremental process does not require loading the full data in one go; instead, it requires only
part of the data to train the decision tree. When new data arrives, the counts are updated and the
algorithm tests whether the data satisfy the required conditions. If all the conditions are satisfied,
the tree is updated incrementally.
4. DESIGNING OF NEW PROPOSED MODEL
Figure 1 depicts a schematic representation of the proposed model, in which the system re-builds
the classification model with the most recent data. Using the error rate as a guide to new data
sets, the frequency of model building and the memory size are adjusted over time.
In Figure 1 the high speed data is split into data sets using info-fuzzy techniques for the
classification tree model. It uses information theory to calculate the memory size. The main idea
for changing the memory size is based on the classification error rate. These are described in
section 6.
Each level of the tree represents only one attribute except the root node layer. The nodes
represent different values of the attribute.
(Figure 1 blocks: high speed data → build incremental classification tree → model stability →
update the tree size and time, or decrease the tree size and time)
The process of constructing the tree is driven by whether splitting on an attribute would decrease
the entropy or not. The measure used is conditional mutual information, which assesses the
dependency between the current input attribute under examination and the output attribute.
At each stage the algorithm chooses the attribute with the maximum mutual information and adds
a layer; each node represents a different value of this attribute. The iteration stops when there is no
increase in mutual information for any of the remaining attributes that have not yet been
included in the tree.
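The attribute-selection step described above can be sketched as follows. This is an illustrative stdlib-only estimate of empirical mutual information over categorical values, not the authors' code:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) = sum over (x,y) of p(x,y) * log2( p(x,y) / (p(x) p(y)) ),
    estimated from empirical counts over paired samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def best_attribute(rows, labels):
    """Pick the attribute index with maximum mutual information with the class."""
    n_attrs = len(rows[0])
    return max(range(n_attrs),
               key=lambda j: mutual_information([r[j] for r in rows], labels))

rows = [("sunny", "hot"), ("sunny", "cool"), ("rainy", "hot"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
print(best_attribute(rows, labels))  # attribute 0 perfectly predicts the class
```

With these made-up rows, attribute 0 carries one full bit of information about the label while attribute 1 carries none, so the selection picks attribute 0.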
In Figure 1, ‘model stability’ takes different decisions based on the input data type: drift data
decreases the size of the tree, while stable data is added incrementally upon arrival of each dataset
of the stream.
Figure 1: Proposed model
4.1. Advantages of our algorithm
• It uses a more liberal measure of the data than the Hoeffding bound used in VFDT.
• The memory size of the model can be changed according to the classification error rate.
• An increase in the error rate indicates a high probability of a concept drift; the window size
changes according to the value of this increase.
• The algorithm is ready to gain knowledge from additional or recent data.
• To train an existing classifier, the algorithm does not access the initial data again and again.
• It keeps the knowledge it has previously learned.
• It accommodates new data as it arrives and as it is introduced over a period of time.
• It can learn from streamed data and is updated incrementally to take advantage of newly
arrived data.
5. AN INCREMENTAL CLASSIFICATION TREE ALGORITHM
Algorithm
Declare
respond node (or leaf node)
current node
decision node (or non-leaf node)
tree, subtree, root, node
attributes Xnew, Xold
Begin
Step 1: For each node, update the +ve and −ve counts
Step 2: If the node is +ve, mark the respond node as ‘+’, else ‘−’
Otherwise
a. If the current node is a respond node, then
respond node ← decision node
Otherwise
b. If the decision node has an attribute, then
i. Mark the attribute value as low
ii. Dispose of the subtree below the decision node
c. Update all the decision nodes from the current node
Step 3: Grow the branch if needed
Step 4: If the tree is empty, then expand the tree
Otherwise
Step 5: If the tree is expandable, then
i. Add the instances
ii. Expand it one level by testing
iii. Update the −ve instances during training
Step 6: If the current node has the lowest value, then
i. Restructure the lowest value as root
ii. Recursively update the current node
Step 7: If attribute Xnew exists at the root, then
stop
Otherwise
i. Recursively pull Xnew to the root
ii. Transpose Xnew as root and Xold as subtree
End
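The steps above can be sketched, very loosely, in Python. The data structures and the expansion threshold below are hypothetical, and the split attribute is a fixed placeholder rather than the mutual-information choice of section 4:

```python
class Node:
    """A tiny incremental classification tree: each new example travels from
    root to a leaf, updating class counts along the way (Step 1); a leaf is
    expanded one level once it has seen enough examples (Step 5)."""
    def __init__(self):
        self.counts = {}        # class label -> count at this node
        self.split_attr = None  # attribute index this node tests, if split
        self.children = {}      # attribute value -> child Node
        self.examples = []      # buffered examples, kept only while a leaf

    def update(self, x, label, expand_at=10):
        self.counts[label] = self.counts.get(label, 0) + 1   # update counts
        if self.split_attr is not None:                      # internal node: route down
            child = self.children.setdefault(x[self.split_attr], Node())
            child.update(x, label, expand_at)
        else:                                                # leaf node: buffer
            self.examples.append((x, label))
            if len(self.examples) >= expand_at:
                self._expand()

    def _expand(self):
        """Expand the leaf one level on attribute 0 (a placeholder choice;
        the paper picks the attribute with maximum mutual information)."""
        self.split_attr = 0
        buffered, self.examples = self.examples, []
        for x, label in buffered:
            child = self.children.setdefault(x[self.split_attr], Node())
            child.update(x, label)

    def predict(self, x):
        node = self
        while node.split_attr is not None and x[node.split_attr] in node.children:
            node = node.children[x[node.split_attr]]
        return max(node.counts, key=node.counts.get) if node.counts else None

root = Node()
for x, y in [(("a", 1), "pos"), (("b", 2), "neg"), (("a", 3), "pos")]:
    root.update(x, y)
print(root.predict(("a", 0)))
```

Note the test-then-train order the paper describes: `predict` can be called on each example before `update` sees it, without any second pass over the data.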
6. AN ERROR REDUCTION PROCESS
The overall error rate of the process is calculated by measuring the difference between the error
rate during training on the one hand and the error rate during model validation on the other.
The following errors are calculated.
1. Cross validation error
2. Generalization error
6.1. Cross validation error
We take the predicted errors from all m parts and combine them to calculate the
cross-validation error.
Let the m parts be C1, C2, ..., Cm, where Ck denotes the set of observations in part k and nk is the
number of observations in part k. The cross-validation error is the weighted average

CV(m) = Σ (k = 1..m) (nk / N) · MSEk

where the MSE of part k is defined as

MSEk = (1 / nk) Σ (i ∈ Ck) (yi − ŷi)²

where
• ŷi is the fitted value for observation i, obtained from the model trained on the other m − 1
parts.
• The weights nk / N are needed because the folds might not all be of the same size; if N is a
multiple of m, every nk = N / m and the formula reduces to the simple average (1/m) Σ MSEk.
This weighted average gives us the validation estimate.
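As a small numeric check of the weighted-average form of the cross-validation error (a sketch with made-up fold values, not data from the paper):

```python
def fold_mse(y_true, y_pred):
    """MSE_k = (1/n_k) * sum_i (y_i - yhat_i)^2 over the held-out fold."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

def cross_validation_error(errors_per_fold, sizes_per_fold):
    """Weighted average of per-fold MSE values: CV = sum_k (n_k / N) * MSE_k.
    The weights n_k / N matter because the folds need not be the same size."""
    total = sum(sizes_per_fold)
    return sum((n_k / total) * mse_k
               for mse_k, n_k in zip(errors_per_fold, sizes_per_fold))

mse_1 = fold_mse([1.0, 2.0], [1.0, 1.0])             # fold of size 2
mse_2 = fold_mse([3.0, 4.0, 5.0], [3.0, 4.0, 4.0])   # fold of size 3
print(cross_validation_error([mse_1, mse_2], [2, 3]))
```

With equal-sized folds the weights collapse to 1/m and the result is the plain average of the fold errors.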
6.2. Generalization error
This measures the accuracy with which an algorithm predicts outcome values for previously unseen
data. Minimizing it generally reduces calculation time and avoids overfitting. Overfitting occurs
with complex datasets that have too many parameters to capture. The generalization error can be
calculated using the following equation:

G(f) = E(x,y)∼P [ I(f(x) ≠ y) ]

where G is the generalization error, f is the classifier, I is the indicator function, and (x, y) is
independently distributed according to P(x, y).
If the value of the generalization error is high, the prediction of the tree is expected to be uncertain.
However, a low error does not guarantee good prediction for the given incoming data;
generally this error is an optimistic estimate of the predictive error on new data.
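In practice G is estimated by the misclassification rate on held-out data. A minimal sketch, with a hypothetical threshold classifier and made-up samples:

```python
def empirical_generalization_error(classifier, samples):
    """Estimate G(f) = E[ I(f(x) != y) ] by the misclassification rate on
    previously unseen (x, y) pairs drawn from P(x, y)."""
    mistakes = sum(1 for x, y in samples if classifier(x) != y)
    return mistakes / len(samples)

# A hypothetical threshold classifier evaluated on unseen data.
classifier = lambda x: "pos" if x >= 0.5 else "neg"
unseen = [(0.9, "pos"), (0.1, "neg"), (0.6, "neg"), (0.4, "pos")]
print(empirical_generalization_error(classifier, unseen))  # 0.5
```

The indicator I simply counts mistakes, so the estimate is the fraction of held-out examples the classifier gets wrong.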
7. EXPERIMENTAL RESULTS
The experimental results given below were obtained using MATLAB on a computer with a
high-end configuration. Every example was used to test the model before it was used for training.
The maximum tree size and the maximum number of data sets allowed in our model can be
controlled. Our experiments showed good tree classification accuracy, better learning speed and
low memory usage for the same data stream, as depicted in Figure 5.
7.1. Datasets
For our experiments we used two data sets, ionosphere and fisheriris. These datasets are
composed of more than 20000 examples, as described below. In our experiments we used only
part of the examples of the original data sets so that the experiments run fast.
7.2. Creating an Incremental classification tree
Figure 2 shows a graphical depiction of a classification tree.
Figure 2: Creating an incremental classification tree
In Figure 2 the tree is designed to classify data based on two predictors, x1 and x2. Prediction
begins at the top node, represented by a triangle (∆). If the first decision variable x1 is less than
0.5, the tree follows the left branch and classifies the data as type 0. If x1 is greater than 0.5,
the right branch is followed to the lower-right triangle node. The tree then checks whether the
value of x2 is less than 0.5; if so, it again follows the left branch and classifies the data as
type 0, otherwise it follows the right branch and classifies the data as type 1.
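The decision rule that Figure 2 describes can be written directly as code, as a transcription of the two threshold tests:

```python
def classify(x1, x2):
    """The tree of Figure 2: the root tests x1 against 0.5; the lower-right
    node then tests x2 against 0.5."""
    if x1 < 0.5:
        return 0          # left branch at the root -> type 0
    if x2 < 0.5:
        return 0          # right branch, then left at the x2 test -> type 0
    return 1              # right branch at both tests -> type 1

print(classify(0.2, 0.9))  # 0: x1 < 0.5
print(classify(0.7, 0.3))  # 0: x1 >= 0.5 but x2 < 0.5
print(classify(0.7, 0.8))  # 1: both >= 0.5
```

Each root-to-leaf path in the tree corresponds to one branch of this function, which is exactly the path a new example travels during the incremental update.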
7.3. Selecting an appropriate tree depth
Figure 3 shows the cross-validated classification trees for the known datasets with a minimum leaf
occupancy constraint. This example demonstrates how to control and select the depth of a decision
tree. The best leaf size lies between 20 and 50 observations per leaf.
Figure 3: Tree depth diagram
7.4. Comparison between default tree and optimal tree
Figure 4: Default tree
The default tree and the near-optimal tree are depicted in Figure 4 and Figure 5 respectively. The
near-optimal tree has at least 40 observations per leaf, while the default tree has 10 observations
per parent node and 1 observation per leaf node. A comparison between the two shows that the
default tree is larger than the near-optimal tree and gives an accuracy different from the
cross-validated error rate.
Figure 5: Near-Optimal tree
It may also be noted that the near-optimal tree (Figure 5) is much smaller and gives a much
higher generalization error, yet it gives similar accuracy for cross-validated data (Figure 3).
8. CONCLUSION
In this paper the incremental classification tree algorithm has been demonstrated to be an
important and novel method for mining high speed streamed data and validating the newly
predicted data. The performance of the new algorithm was also compared with an existing popular
algorithm; the results, explained in detail in section 7, revealed that our algorithm performs much
faster than the existing algorithm.
REFERENCES
[1]. Domingos P. and Hulten G., ‘Mining high-speed data streams’, in Proc. of the 6th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining (KDD’00), ACM, New York,
NY, USA, pp. 71-80, 2000.
[2]. Lazaridis I. and Mehrotra S., ‘Capturing sensor-generated time series with quality guarantees’, in
Proceedings of the 19th International Conference on Data Engineering, 2003.
[3]. Aslam J., Butler Z., Constantin F., Crespi V., Cybenko G. and Rus D., ‘Tracking a Moving Object
with a Binary Sensor Network’, in Proceedings of ACM SenSys, 2003.
[4]. Babcock B. and Olston C., ‘Distributed top-k monitoring’, in Proceedings of the ACM SIGMOD
International Conference on Management of Data, 2003.
[5]. Heinzelman W.R., Chandrakasan A. and Balakrishnan H., ‘Energy-Efficient Communication
Protocol for Wireless Microsensor Networks’, in Proceedings of the 33rd Hawaii International
Conference on System Sciences, Volume 8, 2000.
[6]. Hang Yang, ‘Solving Problems of Imperfect Data Streams by Incremental Decision Trees’, Journal
of Emerging Technologies in Web Intelligence, vol. 5, no. 3, August 2013.
[7]. Prerana Gupta, Amit Thakkar and Amit Ganatra, ‘Comprehensive study on techniques of Incremental
learning with decision trees for streamed data’, Journal of Emerging Technologies in Web
Intelligence, vol. 5, no. 3, August 2013.
[8]. Pallavi Kulkarni and Roshani Ade, ‘Incremental learning from Unbalanced data with concept
class, Concept drift and missing features: A Review’, International Journal of Data Mining &
Knowledge Management Process, vol. 4, no. 6, November 2014.
[9]. Junhui Wang and Xiaotong Shen, ‘Estimation of Generalization error: Random and Fixed inputs’,
Statistica Sinica 16, 569-588, 2006.
[10]. Ponkiya Parita and Purnima Singh, ‘A Review on Tree Based Incremental Classification’,
International Journal of Computer Science and Information Technologies, vol. 5 (1),
306-309, 2014.