In this paper, a fuzzy VRPTW with uncertain travel times is considered. Credibility theory is used to model the problem and to specify a preference index at which it is desired that the travel times to reach the customers fall into their time windows. We propose the integration of fuzzy logic and an ant colony system based evolutionary algorithm to solve the problem while preserving the constraints. Computational results for benchmark problems with short and long time horizons are presented to show the effectiveness of the algorithm. Comparisons between different preference indexes have been obtained to help the user make suitable decisions.
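The credibility measure at the heart of this formulation can be illustrated with a short sketch (our own illustration, not code from the paper), assuming triangular fuzzy travel times and the standard definition Cr = (Pos + Nec) / 2:

```python
def credibility_leq(a, b, c, x):
    """Credibility that a triangular fuzzy variable (a, b, c) is <= x,
    i.e. Cr = (possibility + necessity) / 2 for the event {xi <= x}."""
    if x < a:
        return 0.0
    if x < b:
        return (x - a) / (2.0 * (b - a))        # possibility ramps up, necessity still 0
    if x < c:
        return 0.5 + (x - b) / (2.0 * (c - b))  # possibility is 1, necessity ramps up
    return 1.0

def meets_window(fuzzy_arrival, latest_time, preference_index):
    """A route is acceptable if the credibility of arriving before the
    window's latest time reaches the user's preference index."""
    return credibility_leq(*fuzzy_arrival, latest_time) >= preference_index

# e.g. fuzzy arrival time (20, 30, 50), window closes at 40, preference 0.7:
# credibility_leq(20, 30, 50, 40) == 0.5 + 10 / 40 == 0.75, so the route passes
ok = meets_window((20, 30, 50), 40, 0.7)
```

Raising the preference index tightens the constraint, which is exactly the trade-off the comparisons in the paper are meant to expose.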
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER (ijnlc)
We aim to model an adaptive log file parser. As the content of log files often evolves over time, we established a dynamic statistical model which learns and adapts processing and parsing rules. First, we limit the amount of unstructured text by clustering log file lines based on their semantics. Next, we take only the most relevant cluster into account and focus only on those frequent patterns which lead to the desired output table, similar to Vaarandi [10]. Furthermore, we transform the found frequent patterns and the output stating the parsed table into a Hidden Markov Model (HMM). We use this HMM as a specific yet flexible representation of a pattern for log file parsing to maintain high-quality output. After training our model on one system type and applying it to a different system with slightly different log file patterns, we achieve an accuracy of over 99.99%.
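As a rough illustration of the frequent-pattern step described above (our own sketch in the spirit of Vaarandi's approach, not the authors' code):

```python
from collections import Counter

def frequent_token_patterns(lines, support=2):
    """Vaarandi-style sketch: tokens occurring at least `support` times are
    kept as constants; rarer tokens are replaced by a wildcard, so similar
    lines collapse into candidate parsing patterns."""
    counts = Counter(tok for line in lines for tok in line.split())
    patterns = Counter()
    for line in lines:
        pattern = tuple(tok if counts[tok] >= support else "*" for tok in line.split())
        patterns[pattern] += 1
    return patterns

logs = [
    "sshd login failed for user alice",
    "sshd login failed for user bob",
    "kernel out of memory",
]
pats = frequent_token_patterns(logs)
# the two sshd lines collapse to one pattern with a wildcard for the user name
```

In the paper's pipeline such patterns would then be turned into HMM states; here they simply show how unstructured lines reduce to a small set of templates.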
Dimensionality Reduction Techniques for Document Clustering - A Survey (IJTET Journal)
Abstract: Dimensionality reduction techniques are applied to get rid of inessential terms, such as redundant and noisy terms, in documents. In this paper a systematic study is conducted of seven dimensionality reduction methods: Latent Semantic Indexing (LSI), Random Projection (RP), Principal Component Analysis (PCA), CUR decomposition, Latent Dirichlet Allocation (LDA), Singular Value Decomposition (SVD), and Linear Discriminant Analysis (LDA).
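To make one of the surveyed methods concrete, here is a minimal Random Projection sketch (our own illustration, not from the survey), which maps term-frequency vectors to a lower dimension while approximately preserving pairwise distances (the Johnson-Lindenstrauss idea):

```python
import random

def random_projection(docs, k, seed=0):
    """Project each d-dimensional term-frequency vector down to R^k with a
    random Gaussian matrix whose entries have variance 1/k."""
    d = len(docs[0])
    rng = random.Random(seed)
    R = [[rng.gauss(0.0, 1.0 / k ** 0.5) for _ in range(d)] for _ in range(k)]
    return [[sum(r[j] * doc[j] for j in range(d)) for r in R] for doc in docs]

# 4 documents over a 6-term vocabulary, reduced to 3 dimensions
docs = [[1, 0, 2, 0, 0, 1], [0, 1, 0, 3, 1, 0], [1, 1, 2, 0, 0, 1], [0, 0, 0, 2, 1, 0]]
reduced = random_projection(docs, k=3)
```

Unlike LSI or PCA, this method needs no decomposition of the data matrix, which is why RP is often cited as the cheapest of the seven techniques.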
TOWARDS MORE ACCURATE CLUSTERING METHOD BY USING DYNAMIC TIME WARPING (ijdkp)
An intrinsic problem of classifiers based on machine learning (ML) methods is that their learning time grows as the size and complexity of the training dataset increase. For this reason, it is important to have efficient computational methods and algorithms that can be applied to large datasets, such that it is still possible to complete the machine learning tasks in reasonable time. In this context, we present in this paper a simple and more accurate process to speed up ML methods. An unsupervised clustering algorithm is combined with the Expectation-Maximization (EM) algorithm to develop an efficient Hidden Markov Model (HMM) training procedure. The idea of the proposed process consists of two steps. In the first step, training instances with similar inputs are clustered, and a weight factor representing the frequency of these instances is assigned to each representative cluster. The Dynamic Time Warping technique is used as a dissimilarity function to cluster similar examples. In the second step, all formulas in the classical HMM training algorithm (EM) associated with the number of training instances are modified to include the weight factor in the appropriate terms. This process significantly accelerates HMM training while maintaining the same initial, transition, and emission probability matrices as those obtained with the classical HMM training algorithm. Accordingly, the classification accuracy is preserved. Depending on the size of the training set, speedups of up to 2,200 times are possible when the size is about 100,000 instances. The proposed approach is not limited to training HMMs; it can be employed for a large variety of ML methods.
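The two-step idea can be sketched as follows (a simplified illustration of the weighting principle, not the authors' implementation; a weighted mean stands in for the modified EM formulas):

```python
def weighted_counts(instances):
    """Step 1 of the process: collapse duplicated training instances into
    (representative, weight) pairs, where the weight is the frequency."""
    reps = {}
    for inst in instances:
        key = tuple(inst)
        reps[key] = reps.get(key, 0) + 1
    return [(list(k), w) for k, w in reps.items()]

def weighted_mean(pairs):
    """Step 2 idea: any EM sum over N instances becomes a sum over the far
    fewer representatives, each term multiplied by its weight factor."""
    total_w = sum(w for _, w in pairs)
    dim = len(pairs[0][0])
    return [sum(x[j] * w for x, w in pairs) / total_w for j in range(dim)]

data = [[1.0, 2.0]] * 500 + [[3.0, 4.0]] * 500  # 1000 instances, 2 patterns
pairs = weighted_counts(data)                    # only 2 weighted representatives
mean = weighted_mean(pairs)                      # identical to the mean over all 1000
```

Because every weighted sum reproduces the original sum exactly, the trained parameters, and hence the accuracy, are unchanged; only the number of terms shrinks.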
Soft Computing Techniques Based Image Classification using Support Vector Mac... (ijtsrd)
In this paper we compare different kernels that have been developed for support vector machine based time series classification. Despite the strong performance of the Support Vector Machine (SVM) on many concrete classification problems, the algorithm is not directly applicable to multi-dimensional trajectories of different lengths. Training SVMs with indefinite kernels has recently attracted attention in the machine learning community, partly because many similarity functions that arise in practice are not symmetric positive semidefinite. In this paper we extend the Gaussian RBF kernel to the Gaussian elastic metric kernel, in two variants: one based on time warp distance and one on edit distance with real penalty. Experimental results on 17 time series datasets show that, in terms of classification accuracy, SVM with the Gaussian elastic metric kernel is much superior to other kernels and to state-of-the-art similarity measure methods. We use the indefinite similarity function or distance directly, without any conversion, and hence always treat training and test examples consistently. Finally, the Gaussian elastic metric kernel achieves the highest accuracy among all methods that train SVMs with kernels, both positive semidefinite (PSD) and non-PSD, with statistically significant evidence, while also retaining sparsity of the support vector set. Tarun Jaiswal | Dr. S. Jaiswal | Dr. Ragini Shukla, "Soft Computing Techniques Based Image Classification using Support Vector Machine Performance", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-3, April 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23437.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23437/soft-computing-techniques-based-image-classification-using-support-vector-machine-performance/tarun-jaiswal
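A hypothetical sketch of the kernel construction the abstract describes, plugging a DTW distance into the Gaussian RBF form (our own illustration; the paper's exact kernel and parameters may differ):

```python
import math

def dtw(x, y):
    """Classic O(len(x) * len(y)) dynamic time warping distance
    with squared pointwise cost."""
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def gaussian_dtw_kernel(x, y, sigma=1.0):
    """Gaussian elastic kernel sketch: the Gaussian RBF form applied to the
    elastic (DTW) distance. Such kernels are generally indefinite (not
    positive semidefinite), the very situation the paper discusses."""
    return math.exp(-dtw(x, y) / (2.0 * sigma ** 2))

# identical series give kernel value 1; shifted series give a value below 1
k_same = gaussian_dtw_kernel([0, 1, 2], [0, 1, 2])
k_diff = gaussian_dtw_kernel([0, 1, 2], [1, 2, 3])
```

The elastic distance lets series of different lengths be compared directly, which is what the fixed-length Gaussian RBF kernel cannot do on its own.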
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL... (ijnlc)
The tremendous increase in the amount of available research documents impels researchers to propose topic models to extract the latent semantic themes of a document collection. However, how to extract the hidden topics of the collection has become a crucial task for many topic model applications. Moreover, conventional topic modeling approaches suffer from a scalability problem when the size of the document collection increases. In this paper, the Correlated Topic Model (CTM) with a variational Expectation-Maximization algorithm is implemented in the MapReduce framework to solve the scalability problem. The proposed approach utilizes a dataset crawled from a public digital library. In addition, the full texts of the crawled documents are analysed to enhance the accuracy of MapReduce CTM. Experiments are conducted to demonstrate the performance of the proposed algorithm. From the evaluation, the proposed approach has comparable performance in terms of topic coherence with LDA implemented in the MapReduce framework.
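Topic coherence, the evaluation measure mentioned above, can be sketched with a simple UMass-style score over document co-occurrence counts (our own illustration, not the paper's evaluation code):

```python
import math
from itertools import combinations

def umass_coherence(topic_words, documents, eps=1.0):
    """UMass-style topic-coherence sketch: score a topic's top words by how
    often word pairs co-occur in the corpus, the kind of measure used to
    compare CTM against LDA. `documents` is a list of word sets."""
    def doc_count(*words):
        return sum(all(w in doc for w in words) for doc in documents)
    score = 0.0
    for wi, wj in combinations(topic_words, 2):
        score += math.log((doc_count(wi, wj) + eps) / doc_count(wj))
    return score

docs = [{"topic", "model", "corpus"}, {"topic", "model"}, {"neural", "network"}]
good = umass_coherence(["topic", "model"], docs)   # words that co-occur
bad = umass_coherence(["topic", "network"], docs)  # words that never co-occur
```

A topic whose top words tend to appear together scores higher, matching the intuition that it describes one coherent theme.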
The well-known Vehicle Routing Problem (VRP) consists of assigning routes with a set of customers to different vehicles in order to minimize the cost of transport, usually starting from a central warehouse and using a fixed fleet of vehicles. There are numerous approaches for the resolution of this kind of problem, metaheuristic techniques being the most used, including Genetic Algorithms (GA). The number of approaches to the different parameters of a GA (selection, crossover, mutation, ...) in the literature is such that it is not easy to tackle the resolution of a VRP problem directly. This paper aims to simplify this task by analyzing the best-known approaches on standard VRP data sets and showing the parameter configurations that offer the best results.
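For illustration, the crossover and mutation operators discussed above can be sketched for permutation-encoded VRP routes (our own example operators, not taken from the paper):

```python
import random

def order_crossover(p1, p2, rng):
    """Order crossover (OX), a common GA crossover for VRP-style permutation
    chromosomes: copy a slice from one parent, then fill the remaining
    positions with the missing genes in the order they appear in the other
    parent, so the child is always a valid permutation."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[i:j + 1] = p1[i:j + 1]
    fill = [g for g in p2 if g not in child]
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child

def swap_mutation(route, rng, rate=0.1):
    """Swap two customers with probability `rate`."""
    route = route[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(route)), 2)
        route[a], route[b] = route[b], route[a]
    return route

rng = random.Random(42)
child = order_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], rng)
# child visits every customer exactly once, whatever slice was chosen
```

Permutation-safe operators like these are what make GA parameter choices (crossover type, mutation rate, selection scheme) meaningful for the VRP.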
Simulating Multivariate Random Normal Data using Statistical Computing Platfo... (ijtsrd)
Many faculty members, as well as students, in the area of educational research methodology sometimes need to generate data for simulation and computation purposes, demonstration of multivariate analysis techniques, or construction of student projects or assignments. As a great teaching tool, using simulated data helps us understand the intricacies of statistical concepts and techniques. Generating multivariate normal data is a nontrivial process, and practical guides without dense mathematics are limited in the literature (Nissen and Saft, 2014). Hence, the purpose of this paper is to offer researchers a practical guide to, and quick access to, generating multivariate random data with a given mean and variance-covariance structure. A detailed outline of simulating multivariate normal data with a given mean and variance-covariance matrix using Eigen (spectral) and Cholesky decompositions is presented and implemented in the statistical computing platform R, version 3.4.4 (R Core Team, 2018). Mehmet Turegun, "Simulating Multivariate Random Normal Data using Statistical Computing Platform R", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23987.pdf
Paper URL: https://www.ijtsrd.com/humanities-and-the-arts/education/23987/simulating-multivariate-random-normal-data-using-statistical-computing-platform-r/mehmet-turegun
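The Cholesky-based procedure the abstract outlines translates directly into code. A language-agnostic sketch in Python (the paper itself works in R; packages such as MASS provide the same facility there):

```python
import random

def cholesky(S):
    """Lower-triangular L with L L^T = S for a small SPD matrix."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (S[i][i] - s) ** 0.5
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def mvn_sample(mean, cov, rng):
    """Draw x = mean + L z with z ~ N(0, I): the Cholesky approach
    described in the abstract."""
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [mean[i] + sum(L[i][k] * z[k] for k in range(len(z)))
            for i in range(len(mean))]

rng = random.Random(1)
cov = [[4.0, 2.0], [2.0, 3.0]]
draws = [mvn_sample([10.0, -5.0], cov, rng) for _ in range(5000)]
# sample means should be close to (10, -5)
```

The spectral (Eigen) route mentioned in the abstract differs only in how the square root of the covariance matrix is formed.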
Nonlinear Extension of Asymmetric GARCH Model within Neural Network Framework (csandit)
The importance of volatility for all market participants has led to the development and application of various econometric models. The most popular models for modelling volatility are GARCH-type models, because they can account for the excess kurtosis and asymmetric effects of financial time series. Since the standard GARCH(1,1) model usually indicates high persistence in the conditional variance, empirical research turned to the GJR-GARCH model and revealed its superiority in fitting the asymmetric heteroscedasticity in the data. In order to capture both asymmetry and nonlinearity in data, the goal of this paper is to develop a parsimonious NN model as an extension to the GJR-GARCH model and to determine whether GJR-GARCH-NN outperforms the GJR-GARCH model.
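The asymmetric effect that motivates GJR-GARCH can be seen in its variance recursion; a minimal sketch (our own illustration, not the paper's model code):

```python
def gjr_garch_variance(returns, omega, alpha, gamma, beta, h0):
    """GJR-GARCH(1,1) conditional-variance recursion:
    h_t = omega + (alpha + gamma * I[r_{t-1} < 0]) * r_{t-1}^2 + beta * h_{t-1}.
    The indicator term captures the asymmetric (leverage) effect that
    plain GARCH(1,1) misses."""
    h = [h0]
    for r in returns:
        leverage = gamma if r < 0 else 0.0
        h.append(omega + (alpha + leverage) * r * r + beta * h[-1])
    return h

# a negative shock raises next-period variance more than an equal positive one
h_neg = gjr_garch_variance([-2.0], omega=0.1, alpha=0.05, gamma=0.1, beta=0.9, h0=1.0)
h_pos = gjr_garch_variance([+2.0], omega=0.1, alpha=0.05, gamma=0.1, beta=0.9, h0=1.0)
```

The NN extension the paper proposes would replace this fixed parametric form with a learned nonlinear function of the same inputs.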
A Novel Approach to Mathematical Concepts in Data Mining (ijdmtaiir)
This paper describes three different fundamental mathematical programming approaches that are relevant to data mining: Feature Selection, Clustering, and Robust Representation. The paper covers two clustering algorithms, the K-means and the K-median algorithm. Clustering is illustrated by the unsupervised learning of patterns and clusters that may exist in a given database and is a useful tool for Knowledge Discovery in Databases (KDD). The results of the K-median algorithm are used to identify blood cancer patients in a medical database. K-means clustering is a data mining/machine learning algorithm used to cluster observations into groups of related observations without any prior knowledge of those relationships. The K-means algorithm is one of the simplest clustering techniques and is commonly used in medical imaging, biometrics, and related fields.
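A minimal K-means sketch of the kind described above (our own illustration, not the paper's implementation):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means: assign each point to the nearest centroid, then move
    each centroid to the mean of its cluster, repeated for `iters` rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((p[d] - centroids[c][d]) ** 2
                                      for d in range(len(p))))
            clusters[i].append(p)
        for c, members in enumerate(clusters):
            if members:  # leave a centroid in place if its cluster emptied
                centroids[c] = tuple(sum(m[d] for m in members) / len(members)
                                     for d in range(len(members[0])))
    return centroids

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
cents = kmeans(pts, k=2)  # converges to the means of the two point groups
```

K-median differs only in minimizing absolute rather than squared distances and in using the median as the cluster representative, which makes it more robust to outliers.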
Leather Quality Estimation Using an Automated Machine Vision System (IOSR Journals)
In the presented work, it is proposed to design a feature vector of the parameters of leather material in order to completely define its quality. The proposed parameters are holes, cracks, spots, cuts, and roughness. The defects are localized according to their position on the leather surface and their size and shape. A histogram analysis method is proposed to be developed for use with very low-level image features, such as color and luminance, and is used as an image descriptor for color-matching requirements. In the present scenario, it is observed that leather quality is highly sensitive to the surface finish of the material. Manual inspection of every area of the leather surface under test is not always possible because of the heavy lot of material, and it is time-consuming too. The main problem during inspection is how to achieve repeatable quality at regular intervals. Therefore, in order to obtain an authentic assessment of leather quality, a set of features is required that can be given numerical values so that the quality can be judged in a quantified manner. A machine vision system offers a fair solution to this problem. In the proposed work, we propose to deduce some mathematical parameters besides size and location, arranged in a vector, so as to determine the leather quality.
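The histogram descriptor idea can be sketched as follows (a hypothetical illustration; the actual feature design in the paper may differ):

```python
def color_histogram(pixels, bins=4):
    """Low-level histogram descriptor: bucket each channel of every
    (r, g, b) pixel (values 0-255) into `bins` ranges and normalise,
    giving a compact color signature for matching leather samples."""
    hist = [[0] * bins for _ in range(3)]
    for r, g, b in pixels:
        for ch, v in enumerate((r, g, b)):
            hist[ch][min(v * bins // 256, bins - 1)] += 1
    n = float(len(pixels))
    return [[c / n for c in channel] for channel in hist]

def histogram_distance(h1, h2):
    """L1 distance between two descriptors; small distance = similar color."""
    return sum(abs(a - b) for c1, c2 in zip(h1, h2) for a, b in zip(c1, c2))

swatch = [(120, 80, 40)] * 10  # uniform brown patch
darker = [(60, 40, 20)] * 10   # clearly darker patch
d = histogram_distance(color_histogram(swatch), color_histogram(darker))
```

Descriptors like this, concatenated with defect counts and positions, would form the kind of numerical feature vector the work calls for.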
Effective Waste Management and Environmental Control (IOSR Journals)
There is widespread interest in the world today in methods that enable the re-use of waste. According to Webster's New Practical Dictionary, 'waste' means "thrown away as worthless after being used", i.e. of no further use to a person, animal, or plant. Contrary to this opinion, it has been discovered that what is regarded as waste or worthless can, when worked upon, be manipulated to generate or produce materials that are beneficial for the use of man.
This paper throws light on how waste resources can be controlled by analysing the theories of waste management, recycling, re-use, disposal, and composting of organic wastes, and the ways by which farm and municipal waste can be worked upon to produce materials that are beneficial for the use of man.
Comparative Study and Analysis of Image Inpainting Techniques (IOSR Journals)
Abstract: Image inpainting is a technique to fill in a missing region or reconstruct a damaged area of an image. It removes an undesirable object from an image in a visually plausible way. To fill in part of the image, it uses information from the neighboring area. In this dissertation work, we present an exemplar-based method for filling in the missing information in an image, which combines structure synthesis and texture synthesis. The exemplar-based approach uses local information from the image for patch propagation. We have also implemented a non-local means approach for exemplar-based image inpainting. The non-local means approach finds multiple samples of the best exemplar patches for patch propagation and weights their contributions according to their similarity to the neighborhood under evaluation. We have further extended this algorithm by considering a collaborative filtering method to synthesize and propagate with multiple samples of the best exemplar patches. We have performed experiments on many images and found that our algorithm successfully inpaints the target region. We have tested the accuracy of our algorithm by computing the PSNR and compared the PSNR values for all three approaches.
Keywords: Texture Synthesis, Structure Synthesis, Patch Propagation, Image Inpainting, Non-local Approach, Collaborative Filtering.
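The PSNR metric used for the comparison is straightforward to compute; a short sketch (our own illustration):

```python
import math

def psnr(original, restored, peak=255.0):
    """Peak signal-to-noise ratio between two equal-size grayscale images
    (flat lists of 0-255 values): PSNR = 10 * log10(peak^2 / MSE).
    Higher PSNR means the inpainted result is closer to the original."""
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak * peak / mse)

orig = [100, 120, 130, 140]
good_fill = [101, 119, 130, 141]  # small error -> high PSNR
bad_fill = [60, 180, 90, 200]     # large error -> low PSNR
```

Comparing PSNR over the same restored region is what allows the three approaches (exemplar-based, non-local means, collaborative filtering) to be ranked on equal terms.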
Qualitative Analysis of Legume Pericarp (Pod Wall) and Seeds of Acacia Farnes... (IOSR Journals)
The present study deals with the qualitative analysis of ethanolic extracts of the legume pericarp (pod wall) and seeds of Acacia farnesiana (L.), in which we analyze 22 phytochemicals that are useful for controlling diseases in human beings. In India, Acacia farnesiana L. is known locally as Mulla tumma or Kampu tumma, and it is also commonly known as aroma and sweet acacia. The aim of the present study is to investigate the presence or absence of phytochemicals such as flavonoids, alkaloids, steroids, proteins, carbohydrates, tannins, amides, terpenoids, amines, phenols, unsaturation, carboxylic acids, NH2 groups, nitrogen, sulphur, halogens, starch, saponins, ascorbic acid, glycosides, reducing sugars, and triterpenoids in the selected medicinal plant. The ethanolic extract of the legume pericarp indicates the presence of more major bioactive compounds compared to the seeds.
Novel Way to Isolate Adipose Derived Mesenchymal Stem Cells & Its Future Clini... (IOSR Journals)
Abstract: Adipose-derived stem cells (ADSCs) were isolated from discarded human fat tissue, obtained from c-sections, with our recently modified methods in the Stem Cell & Regenerative Medicine Lab, VSBT. Here we develop two methods to isolate adipose-derived mesenchymal stem cells: enzyme digestion, and the use of phosphatidylcholine and deoxycholate. Surface protein expression was analyzed by flow cytometry to characterize the cell phenotype. The multi-lineage potential of ADSCs was verified by differentiating the cells with an adipogenic inducer. ADSCs can be cultured in vitro for up to one month without passage. The flow cytometry analysis also showed that ADSCs expressed high levels of the stem cell related surface marker CD105. ADSCs have strong proliferation ability, maintain their phenotypes, and have strong multi-differentiation potential. The molecular basis of ADSC differentiation was studied using bioinformatics tools with the aim of identifying the key proteins involved in differentiation, such that they could be used as potential targets for drug development for the treatment of obesity. The key proteins involved were found to be PPARG and C/EBPα. The structures of the proteins were retrieved from MMDB (Molecular Modelling Database) and PDB (Protein Data Bank), respectively.
Key Words: Adipose-derived stem cells, Mesenchymal stem cells, Enzyme digestion, Phosphatidylcholine, Deoxycholate, PPARG, C/EBPα.
TOPIC EXTRACTION OF CRAWLED DOCUMENTS COLLECTION USING CORRELATED TOPIC MODEL...ijnlc
The tremendous increase in the amount of available research documents impels researchers to propose topic models to extract the latent semantic themes of a documents collection. However, how to extract the hidden topics of the documents collection has become a crucial task for many topic model applications. Moreover, conventional topic modeling approaches suffer from the scalability problem when the size of documents collection increases. In this paper, the Correlated Topic Model with variational ExpectationMaximization algorithm is implemented in MapReduce framework to solve the scalability problem. The proposed approach utilizes the dataset crawled from the public digital library. In addition, the full-texts of the crawled documents are analysed to enhance the accuracy of MapReduce CTM. The experiments are conducted to demonstrate the performance of the proposed algorithm. From the evaluation, the proposed approach has a comparable performance in terms of topic coherences with LDA implemented in MapReduce framework.
The well-known Vehicle Routing Problem (VRP) consist of assigning routeswith a set
ofcustomersto different vehicles, in order tominimize the cost of transport, usually starting from a central
warehouse and using a fleet of fixed vehicles. There are numerousapproaches for the resolution of this kind of
problems, being the metaheuristic techniques the most used, including the Genetic Algorithms (AG). The
number of approachesto the different parameters of an AG (selection, crossing, mutation...) in the literature is
such that it is not easy to take a resolution of a VRP problem directly. This paper aims to simplify this task by
analyzing the best known approaches with standard VRP data sets, and showing the parameter configurations
that offer the best results.
Simulating Multivariate Random Normal Data using Statistical Computing Platfo...ijtsrd
Many faculty members, as well as students, in the area of educational research methodology, sometimes have a need for generating data to use for simulation and computation purposes, demonstration of multivariate analysis techniques, or construction of student projects or assignments. As a great teaching tool, using simulated data helps us understand the intricacies of statistical concepts and techniques. The process of generating multivariate normal data is a nontrivial process and practical guides without dense mathematics are limited in the literature Nissen and Saft, 2014 . Hence, the purpose of this paper is to offer researchers a practical guide for and a quick access to generating multivariate random data with a given mean and variance covariance structure. A detailed outline of simulating multivariate normal data with a given mean and variance covariance matrix using Eigen or spectral and Cholesky decompositions is presented and implemented in statistical computing platform R version 3.4.4 R Core Team, 2018 . Mehmet Turegun ""Simulating Multivariate Random Normal Data using Statistical Computing Platform R"" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3 | Issue-4 , June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23987.pdf
Paper URL: https://www.ijtsrd.com/humanities-and-the-arts/education/23987/simulating-multivariate-random-normal-data-using-statistical-computing-platform-r/mehmet-turegun
Nonlinear Extension of Asymmetric Garch Model within Neural Network Framework csandit
The importance of volatility for all market partici
pants has led to the development and
application of various econometric models. The most
popular models in modelling volatility are
GARCH type models because they can account excess k
urtosis and asymmetric effects of
financial time series. Since standard GARCH(1,1) mo
del usually indicate high persistence in the
conditional variance, the empirical researches turn
ed to GJR-GARCH model and reveal its
superiority in fitting the asymmetric heteroscedast
icity in the data. In order to capture both
asymmetry and nonlinearity in data, the goal of thi
s paper is to develop a parsimonious NN
model as an extension to GJR-GARCH model and to det
ermine if GJR-GARCH-NN outperforms
the GJR-GARCH model.
A Novel Approach to Mathematical Concepts in Data Miningijdmtaiir
-This paper describes three different fundamental
mathematical programming approaches that are relevant to
data mining. They are: Feature Selection, Clustering and
Robust Representation. This paper comprises of two clustering
algorithms such as K-mean algorithm and K-median
algorithms. Clustering is illustrated by the unsupervised
learning of patterns and clusters that may exist in a given
databases and useful tool for Knowledge Discovery in
Database (KDD). The results of k-median algorithm are used
to collecting the blood cancer patient from a medical database.
K-mean clustering is a data mining/machine learning algorithm
used to cluster observations into groups of related observations
without any prior knowledge of those relationships. The kmean algorithm is one of the simplest clustering techniques
and it is commonly used in medical imaging, biometrics and
related fields.
Leather Quality Estimation Using an Automated Machine Vision SystemIOSR Journals
In the presented work, it is proposed to design the feature vector of the parameters of leather
material in order to completely define the quality of the leather material. The proposed parameters are holes,
cracks, spots, cuts, and roughness etc.. The defects are localized according to their position on leather surface,
their size and shape. The histogram analysis method is proposed to be developed for use with very low-level
image features, such as color and luminance and is used as an image descriptor for color-matching
requirements. In the present scenario, it is observed that the leather quality is highly sensitive to surface finish
of the leather material. Manually it is not possible always to inspect each area of the leather surface under test
because of heavy lot of material and it is time consuming too. The main problem during the inspection is that
how to achieve the repeatable quality at regular interval. Therefore, in order to get the authentic leather
quality, it is required to have a set of features that could be given some numerical value so that the quality can
be justified in quantified manner. A machine vision system offers a fair solution in order to solve this problem.
In the proposed work, we propose to deduce some mathematical parameters besides size and location that are arranged in a vector so as to determine the leather quality
Effective Waste Management and Environmental ControlIOSR Journals
There is widespread interest worldwide in methods that enable the re-use of waste. According to Webster's New Practical Dictionary, 'waste' means "thrown away as worthless after being used", i.e. of no further use to a person, animal, or plant. Contrary to this view, it has been discovered that what is regarded as waste or worthless can, when worked upon, be manipulated to produce materials that are beneficial to man.
This paper throws light on how waste resources can be controlled, by analysing the theories of waste management, recycling, re-use, disposal, and composting of organic wastes, and the ways in which farm and municipal waste can be processed to produce materials that are beneficial to man.
Comparative Study and Analysis of Image Inpainting Techniques (IOSR Journals)
Abstract: Image inpainting is a technique to fill in missing regions or reconstruct damaged areas of an image. It removes undesirable objects from an image in a visually plausible way, using information from the neighbouring area to fill in the affected region. In this dissertation work, we present an exemplar-based method for filling in missing information in an image, which combines structure synthesis and texture synthesis. The exemplar-based approach uses local information from the image for patch propagation. We have also implemented a non-local means approach to exemplar-based image inpainting, which finds multiple samples of the best exemplar patches for patch propagation and weights their contributions according to their similarity to the neighbourhood under evaluation. We have further extended this algorithm with a collaborative filtering method to synthesize and propagate multiple samples of the best exemplar patches. We performed experiments on many images and found that our algorithm successfully inpaints the target region. We tested the accuracy of our algorithm by computing the PSNR and compared the PSNR values for all three approaches.
Keywords: texture synthesis, structure synthesis, patch propagation, image inpainting, non-local approach, collaborative filtering.
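The PSNR figure used above to compare the three approaches is straightforward to compute from the mean squared error between the clean and inpainted images. A minimal, illustrative Python sketch (not the dissertation's own code; images are represented as flat lists of 8-bit intensities):

```python
import math

def psnr(original, restored, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    given as flat lists of pixel intensities."""
    if len(original) != len(restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Example: a tiny 2x2 "image" with one slightly mis-inpainted pixel.
clean = [100, 150, 200, 250]
inpainted = [100, 150, 210, 250]
print(round(psnr(clean, inpainted), 2))  # 34.15 dB
```

Higher values indicate a restoration closer to the original; identical images give infinite PSNR.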
Qualitative Analysis of Legume Pericarp (Pod Wall) and Seeds of Acacia Farnes... (IOSR Journals)
The present study deals with the qualitative analysis of ethanolic extracts of the legume pericarp (pod wall) and seeds of Acacia farnesiana (L.). We analysed 22 phytochemicals that are useful for controlling diseases in human beings. In India, Acacia farnesiana L. is known locally as Mulla tumma or Kampu tumma, and it is also commonly known as aroma or sweet acacia. The aim of the present study is to investigate the presence or absence of phytochemicals such as flavonoids, alkaloids, steroids, proteins, carbohydrates, tannins, amides, terpenoids, amines, phenols, unsaturation, carboxylic acids, NH2 groups, nitrogen, sulphur, halogens, starch, saponins, ascorbic acid, glycosides, reducing sugars, and triterpenoids in the selected medicinal plant. The ethanolic extract of the legume pericarp indicates the presence of more of the major bioactive compounds than that of the seeds.
Novel Way to Isolate Adipose Derive Mesenchymal Stem Cells & Its Future Clini... (IOSR Journals)
Abstract: Adipose-derived stem cells (ADSCs) were isolated from discarded human fat tissue, obtained from caesarean sections, using our recently modified methods in the Stem Cell & Regenerative Medicine Lab, VSBT. We developed two methods to isolate adipose-derived mesenchymal stem cells: enzyme digestion, and the use of phosphatidylcholine and deoxycholate. Surface protein expression was analyzed by flow cytometry to characterize the cell phenotype. The multi-lineage potential of the ADSCs was verified by differentiating the cells with an adipogenic inducer. ADSCs can be cultured in vitro for up to one month without passage. Flow cytometry analysis showed that the ADSCs expressed high levels of the stem-cell-related surface marker CD105. ADSCs have strong proliferation ability, maintain their phenotypes, and have strong multi-differentiation potential. The molecular basis of ADSC differentiation was studied using bioinformatics tools, with the aim of identifying the key proteins involved in differentiation, so that they could be used as potential drug targets for the treatment of obesity. The key proteins involved were found to be PPARG and C/EBPα. The structures of these proteins were retrieved from MMDB (Molecular Modelling Database) and PDB (Protein Data Bank) respectively.
Keywords: adipose-derived stem cells, mesenchymal stem cells, enzyme digestion, phosphatidylcholine, deoxycholate, PPARG, C/EBPα.
On Tracking Behavior of Streaming Data: An Unsupervised Approach (Waqas Tariq)
In recent years, data streams have drawn the attention of a large number of researchers in different domains. These researchers all face the same difficulty when discovering unknown patterns within data streams: concept change, the phenomenon in which the underlying distribution of the data changes over time. Various methods have been proposed to detect changes in a data stream, but most rest on the unrealistic assumption that data labels are available to the learning algorithm; in real-world problems, labels for streaming data are rarely available. This is the main reason why the data stream community has recently focused on the unsupervised setting. This study starts from the observation that unsupervised approaches to learning from data streams are not yet mature; in particular, they provide only mediocre performance when applied to multi-dimensional data streams. In this paper, we propose a method for tracking changes in the behavior of instances using the cumulative density function, abbreviated TrackChCDF. Our method accurately detects change points along an unlabeled data stream and also determines the trend of the data, termed closing or opening. The advantages of our approach are threefold. First, it detects change points accurately. Second, it works well on multi-dimensional data streams. Third, it determines the type of change, namely the closing or opening of instances over time, which has broad applications in fields such as economics, stock markets, and medical diagnosis. We compare our algorithm to the state-of-the-art method for concept change detection in data streams, and the results obtained are very promising.
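The abstract does not give TrackChCDF's internals, but the general idea of detecting unsupervised change points by comparing empirical cumulative distribution functions of a reference window and the current window can be sketched as follows. This is an illustrative two-sample Kolmogorov-Smirnov style check, not the authors' algorithm, and the fixed threshold is a simplifying assumption:

```python
import bisect

def empirical_cdf(sample):
    """Return a function computing the empirical CDF of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n  # fraction of obs <= x

def ks_statistic(ref_window, cur_window):
    """Two-sample Kolmogorov-Smirnov distance: the maximum gap between
    the empirical CDFs of the reference and current windows."""
    f, g = empirical_cdf(ref_window), empirical_cdf(cur_window)
    return max(abs(f(x) - g(x)) for x in ref_window + cur_window)

def drifted(ref_window, cur_window, threshold=0.5):
    """Flag a change point when the CDF gap exceeds a fixed threshold.
    (A principled detector would derive the threshold from window sizes.)"""
    return ks_statistic(ref_window, cur_window) > threshold

ref = [0.1, 0.2, 0.3, 0.4, 0.5]
same_dist = [0.15, 0.25, 0.35, 0.45, 0.55]
shifted = [5.1, 5.2, 5.3, 5.4, 5.5]
print(drifted(ref, same_dist), drifted(ref, shifted))  # False True
```

Because the comparison is purely distributional, no class labels are needed, which is the point of the unsupervised setting described above.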
Concept Drift for obtaining Accurate Insight on Process Execution (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Online learning algorithms often have to operate in the presence of concept drift. A recent study
revealed that different diversity levels in an ensemble of learning machines are required to maintain high
generalization on both old and new concepts. Inspired by this study, and based on a further study of diversity
combined with different strategies for dealing with drifts, we propose a new online ensemble learning approach
called Diversity for Dealing with Drifts (DDD). DDD maintains ensembles with different diversity levels and is
able to attain better accuracy than other approaches. Furthermore, it is very robust, outperforming other
drift-handling approaches in terms of accuracy when there are false-positive drift detections. It performed at
least as well as other drift-handling approaches under various conditions, with very few exceptions. This paper
presents an analysis of low- and high-diversity ensembles combined with different strategies for dealing with
concept drift, and proposes a new approach (DDD) to handle drifts.
Abstract: Tracking and detecting people in crowds is a central problem in video scene analysis. Detection of individual objects has improved in recent years, but detection and tracking remain challenging because of occlusions and variation in people's appearance. Facing these challenges, we propose to leverage information on the global structure of the scene and to resolve detection and tracking jointly. We combine crowd density estimation and detection in a joint energy function, and we show how optimizing this energy function improves tracking and detection in moving crowds. We validate our approach on a challenging video dataset of crowded scenes. We add features relevant to tracking people, such as movement, size, and height, to the observation models of the particle filters, followed by a clustering method. This minimizes communication cost and makes data retrieval easy.
Linear Regression Model Fitting and Implication to Self Similar Behavior Traf... (IOSRjournaljce)
We present a simple linear regression model fitted to the self-similar behavior of Internet users' arrival data patterns. It has been reported that Internet traffic exhibits self-similarity. Motivated by this fact, real-time Internet user arrival patterns are considered as traffic, and our results show that they have a self-similar nature according to several Hurst index estimation methods. The present study provides a mathematical model, in the form of a linear regression equation, as a tool to predict the arrival pattern of Internet user data at web centers. The numerical results and analysis discussed and presented here play a significant role in improving services and in forecasting analysis of arrival protocols at web centers, with a view to quality of service (QoS).
Concept drift and machine learning model for detecting fraudulent transaction... (IJECEIAES)
In a streaming environment, data is continuously generated and processed in an ongoing manner, and fraudulent transactions must be detected quickly to prevent significant financial losses. Hence, this paper proposes a machine learning based approach for detecting fraudulent transactions in a streaming environment, with a focus on addressing concept drift. The approach utilizes the extreme gradient boosting (XGBoost) algorithm and additionally employs four algorithms for detecting drift in the continuous stream. To evaluate the effectiveness of the approach, two datasets are used: a credit card dataset and a Twitter dataset containing financial fraud-related social media data. The approach is evaluated using cross-validation, and the results demonstrate that it outperforms traditional machine learning models in terms of accuracy, precision, and recall, and is more robust to concept drift. The proposed approach can be utilized as a real-time fraud detection system in various industries, including finance, insurance, and e-commerce.
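The abstract does not name the four stream drift detection algorithms it employs, but a typical detector used alongside a streaming classifier is the Page-Hinkley test, which monitors a signal such as the model's error rate. The following is an illustrative sketch on a simulated error-rate stream, not the paper's implementation, and the threshold values are assumptions:

```python
class PageHinkley:
    """Page-Hinkley test: flags concept drift when the cumulative deviation
    of a monitored signal (e.g. a model's error rate) from its running mean
    exceeds a threshold `lam`. `delta` tolerates small fluctuations."""
    def __init__(self, delta=0.005, lam=5.0):
        self.delta, self.lam = delta, lam
        self.n, self.mean, self.cum, self.min_cum = 0, 0.0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n     # running mean of the signal
        self.cum += x - self.mean - self.delta    # cumulative deviation
        self.min_cum = min(self.min_cum, self.cum)
        return (self.cum - self.min_cum) > self.lam  # True => drift alarm

ph = PageHinkley(lam=2.0)
stream = [0.1] * 50 + [0.9] * 50   # error rate jumps: simulated concept drift
alarms = [i for i, x in enumerate(stream) if ph.update(x)]
print(alarms[0] if alarms else None)
```

In a fraud pipeline, a raised alarm would trigger retraining of the XGBoost model on recent data.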
A Survey On Tracking Moving Objects Using Various Algorithms (IJMTST Journal)
Sparse representation has been applied to the object tracking problem. Mining the self-similarities between particles via multitask learning can improve tracking performance. However, some particles may differ from the others when they are sampled from a large region, and imposing the same structure on all particles may degrade the results. To overcome this problem, we propose a tracking algorithm based on robust multitask sparse representation (RMTT) in this letter. When learning the particle representations, we decompose the sparse coefficient matrix into two parts. Joint sparse regularization is imposed on one coefficient matrix, while element-wise sparse regularization is imposed on the other. The former regularization exploits the self-similarities of the particles, while the latter accounts for the differences between them.
In this short presentation, we will present some recent developments in the field of crowd monitoring, modelling, and management. We will illustrate these with various projects that we are involved in, including the SmartStation project and the different events organised in and around the city of Amsterdam (including the Europride, SAIL, etc.).
In the talk, we will discuss the different components of the system and the methods and technology involved. We focus on advanced data collection techniques, the use of social media data, data fusion, and the advanced macroscopic modelling required for this. We will also show examples of interventions that have been tested, showing how these systems are used in practice.
A Hybrid Architecture for Tracking People in Real-Time Using a Video Surveill... (sipij)
This paper describes a novel method for tracking customers using images taken from video-surveillance
cameras. The system analyzes the number of customers and their motion through the aisles of big-box
stores (supermarkets) in real time. The originality of our approach lies in the study of blob
properties for managing splitting/merging issues using a mathematical morphology operator. On the
other hand, in order to manage a high number of customers in real time, we combine the advantages of two
tracking algorithms.
A LIGHT-WEIGHT DISTRIBUTED SYSTEM FOR THE PROCESSING OF REPLICATED COUNTER-LI... (ijdpsjournal)
In order to increase availability in a distributed system, some or all of the data items are replicated and
stored at separate sites. This is a key concern, especially given the proliferation of wireless technologies
and mobile users. However, the concurrent processing of transactions at separate sites can generate
inconsistencies in the stored information. We have built a distributed service that manages updates to
widely deployed counter-like replicas. While many heavy-weight distributed systems target large,
information-critical applications, our system is intentionally lightweight and is useful for somewhat less
information-critical applications. The service is built on our distributed concurrency control scheme,
which combines optimism and pessimism in the processing of transactions. The service allows a transaction
to be processed immediately (optimistically) at any individual replica as long as the transaction satisfies
a cost bound. All transactions are also processed in a concurrent, pessimistic manner to ensure mutual
consistency.
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams (irjes)
In the data mining field, the classification of data streams raises many problems. The challenges
faced in data streams are infinite length, concept drift, concept evolution, and feature evolution. Most
existing systems focus only on the first two challenges. We propose a framework in which each classifier is
equipped with a novel class detector to address concept drift and concept evolution, and a feature set
homogenization technique is proposed to address feature evolution. We improve the novel class detection
module by making it more adaptive to the evolving stream. An SVM-based feature extraction method for the
RBF kernel is also proposed for detecting novel classes in the streaming data. Using the concept of
permutations and combinations, the RBF kernel extracts the features and finds the relations between them.
This improves the novel class detection technique and provides higher accuracy in classifying the data.
Performance Comparison of Machine Learning Algorithms (Dinusha Dilanka)
In this paper we compare the performance of two classification algorithms. It is useful to differentiate algorithms based on computational performance rather than classification accuracy alone: although classification accuracy between the algorithms is similar, computational performance can differ significantly and can affect the final results. The objective of this paper is therefore to perform a comparative analysis of two machine learning algorithms, namely k-nearest neighbor classification and logistic regression. We consider a large dataset of 7981 data points and 112 features, and examine the performance of the above-mentioned machine learning algorithms on it. The processing time and accuracy of the different machine learning techniques are estimated on the collected dataset, with 60% used for training and the remaining 40% for testing. The paper is organized as follows. Section I contains the introduction and background analysis of the research, and Section II the problem statement. Section III briefly describes our application, the data analysis process, the testing environment, and the methodology of our analysis. Section IV comprises the results of the two algorithms. Finally, the paper concludes with a discussion of future research directions for eliminating the problems existing in the current research methodology.
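The computational trade-off discussed above is easy to see from the structure of k-nearest neighbors itself: it has no training phase, but every prediction scans the full training set, which is what makes it slower at test time than a fitted logistic regression on large datasets. A minimal, illustrative sketch (not the code or dataset used in the paper):

```python
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Minimal k-nearest-neighbor classifier using squared Euclidean
    distance. Each call costs O(n * d) for n training points of dimension
    d, which dominates test-time cost on large datasets."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in dists[:k]]
    return Counter(top_labels).most_common(1)[0][0]  # majority vote

# Two well-separated toy clusters.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, [0.2, 0.2]), knn_predict(X, y, [5.5, 5.5]))  # a b
```

By contrast, logistic regression pays its cost once during training and then predicts with a single dot product per query, which is why the two can diverge sharply in processing time even at similar accuracy.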
An Efficient Algorithm for Mining Frequent Itemsets within Large Windows over... (Waqas Tariq)
The sliding window is an interesting model for frequent pattern mining over data streams, since it handles concept change by considering only recent data. In this study, a novel approximate algorithm for frequent itemset mining is proposed which operates in both the transactional and the time-sensitive sliding window model. The algorithm divides the current window into a set of partitions and estimates the support of newly appeared itemsets within the previous partitions of the window. By monitoring an essential set of itemsets within the incoming data, the algorithm does not waste processing power on itemsets which are not frequent in the current window. Experimental evaluations using both synthetic and real datasets show the superiority of the proposed algorithm over previously proposed algorithms.
Constructing a classification model is important in machine learning for a particular task. A
classification process involves assigning objects to predefined groups or classes based on a
number of observed attributes related to those objects. The artificial neural network is one
classification algorithm which can be used in many application areas. This paper investigates
the potential of applying the feed-forward neural network architecture to the classification of
medical datasets. The migration-based differential evolution algorithm (MBDE) is chosen and
applied to the feed-forward neural network to enhance the learning process, and the network
learning is validated in terms of convergence rate and classification accuracy. In this paper,
the MBDE algorithm with various migration policies is proposed for classification problems in
medical diagnosis.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 (Tobias Schneck)
As AI technology pushes into IT, I found myself wondering, as an "infrastructure container Kubernetes guy", how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud native principles as well? What benefits could the two technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply it to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already got working for real.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
My slides and Rik Marselis' slides from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also held a lovely workshop in which the participants tried to find different ways to think about quality and testing in the different parts of the DevOps infinity loop.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers, without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661,p-ISSN: 2278-8727, Volume 17, Issue 1, Ver. II (Jan – Feb. 2015), PP 20-26
www.iosrjournals.org
DOI: 10.9790/0661-17122026
A Review on Concept Drift
Yamini Kadwe, Vaishali Suryawanshi
(Department of IT, M.I.T. College of Engineering, Pune, India)
Abstract: The concept changes that occur in continuously evolving data streams are termed concept drifts. It is
necessary to address the problems caused by concept drift and to adapt to the concept changes. This can be
achieved by designing supervised or unsupervised techniques in such a way that concept changes are taken into
account and useful knowledge is extracted. This paper discusses various techniques for managing concept
drifts, along with synthetic and real datasets exhibiting different concept drifts and related applications.
Keywords: drift detectors, ensemble classifiers, data stream mining, bagging.
I. Introduction
Today's world of advanced technologies is one in which every field is automated. Due to advances in technology,
plenty of data is generated every second. Examples of such applications include network monitoring, web
mining, sensor networks, telecommunications data management, and financial applications [14]. The data needs
to be gathered and processed to extract unknown, useful, and interesting knowledge, but it is impossible to
extract that knowledge manually because of the volume and speed at which the data is gathered.
Concept drift occurs when the concept about which data is being collected shifts from time to time after
a minimum stability period. This problem needs to be considered in order to mine data with an acceptable level
of accuracy. Examples of concept drift include spam detection, financial fraud detection, climate change
prediction, and customer preferences in online shopping.
This paper is organized as follows. Section II gives an overview of concept drift, covering the problem, the
need to adapt to concept drift, and the types of concept drift. Section III explains various methods of detecting
concept drift. Section IV discusses statistical tests for concept drift. In Section V, classifiers for dealing with
concept drifts are discussed. Section VI summarizes possible datasets based on the types of drift present,
and Section VII summarizes the consideration of concept drift in various real-world applications.
II. Overview
Problem of Concept Drift:
Concept drift has gained increasing importance in machine learning as well as in data mining tasks.
Today, data is organized in the form of data streams rather than static databases, and concepts and data
distributions tend to change over long periods of time.
Need for Concept Drift Adaptation:
In dynamically changing or non-stationary environments, the data distribution can change over time,
yielding the phenomenon of concept drift [4]. Concept drifts can be adapted to quickly by storing concept
descriptions, so that they can be re-examined and reused later. Hence, adaptive learning is required to deal with
data in non-stationary environments: when concept drift is detected, the current model needs to be updated to
maintain accuracy.
Types of Concept Drift:
Depending on the relation between the input data and the target variable, concept change takes different
forms. Concept drift between time points t0 and t1 can be defined as
∃X : p_t0(X, y) ≠ p_t1(X, y) (1)
where p_t0 denotes the joint distribution at time t0 between the set of input variables X and the target variable y.
Kelly et al. presented three ways in which concept drift may occur [3]:
- the prior probabilities of the classes, p(y), may change over time;
- the class-conditional probability distributions, p(X|y), might change;
- the posterior probabilities, p(y|X), might change.
Concept drift may be classified in terms of the speed of change and the reason for the change [4], as shown
in figure 1. When 'a set of examples has legitimate class labels at one time and has different legitimate labels at
another time', the drift is real, i.e. classified by the reason for the change [20], and refers to changes in p(y|X).
2. A Review on Concept Drift
DOI: 10.9790/0661-17122026 www.iosrjournals.org 21 | Page
Fig 1: Types of drift: circles represent instances; different colors represent different classes[4]
Fig 2: Patterns of concept change [4]
When 'the target concepts remain the same but the data distribution changes' [6], it is virtual drift,
which refers to changes in p(X).
A drift can be sudden or abrupt, when there is a switch from one concept to another (refer to figure 2) [4]. The
concept change can be incremental, consisting of many intermediate concepts in between. Drift may be gradual:
the change is not abrupt, and the stream returns to the previous pattern for some time. Concept drift handling
algorithms should not confuse true drift with an outlier (blip) or noise, which refers to an anomaly. A recurring
drift is when previously seen concepts reoccur after some time.
Detecting Concept changes:
The ways to monitor concept drift are given below:
Concept drift can be monitored by checking the data's probability distribution, since it changes with time.
One can judge whether concept drift has happened by monitoring and tracking the relevance between
various sample characteristics or attributes.
Concept drift leads to changes in the features of classification models.
Classification accuracy can be taken into account while detecting concept drift on a given data stream.
Recall, precision and F-measure are some of the accuracy indicators of classification.
The timestamp of a single sample or block of samples can be taken as an additional input attribute to
determine the occurrence of concept drift. It keeps a check on whether the classification rule has become outdated.
III. Concept Drift Detectors
This section discusses algorithms that detect concept drift, known as concept drift detectors.
They signal the base learner that the model should be rebuilt or updated.
DDM:
The Drift Detection Method (DDM), proposed by Gama et al., uses the binomial distribution [14]. For
each point i in the sequence that is being sampled, the error rate is the probability of misclassification (pi), with
standard deviation (si) given by eq. 2:
si = √(pi (1 − pi) / i) (2)
They store the values of pi and si when pi + si reaches its minimum value during the process, i.e. pmin and
smin. These values are used to calculate a warning level condition presented in eq. 3 and an alarm level condition
presented in eq. 4 -
pi + si ≥ pmin + α . smin (warning level) (3)
pi + si ≥ pmin + β.smin (alarm level ) (4)
Beyond the warning level, the examples are stored in anticipation of a possible change of context. Beyond the
alarm level, the concept drift is considered confirmed; the model induced by the learning method is reset, along
with pmin and smin, and a new model is learnt using the examples stored since the warning level was triggered.
DDM works best on data streams with sudden drift, as gradually changing concepts can pass without triggering
the alarm level.
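The warning/alarm logic above can be sketched as a small Python class. This is an illustrative sketch, not the authors' implementation: the values α = 2, β = 3 and the 30-example warm-up are commonly used defaults, and all names are ours.

```python
import math

class DDM:
    """Sketch of the DDM warning/alarm logic (eqs. 2-4).
    alpha=2, beta=3 and the 30-example minimum are common defaults."""

    def __init__(self, alpha=2.0, beta=3.0, min_examples=30):
        self.alpha, self.beta, self.min_examples = alpha, beta, min_examples
        self.reset()

    def reset(self):
        self.i = 0                   # examples seen since the last reset
        self.errors = 0              # misclassifications seen
        self.p_min = float("inf")    # best (lowest) p_i and s_i so far
        self.s_min = float("inf")

    def update(self, misclassified):
        """Feed one prediction outcome; return 'ok', 'warning' or 'alarm'."""
        self.i += 1
        self.errors += int(misclassified)
        p = self.errors / self.i                 # running error rate p_i
        s = math.sqrt(p * (1 - p) / self.i)      # eq. 2
        if self.i < self.min_examples:
            return "ok"                          # not enough evidence yet
        if p + s < self.p_min + self.s_min:      # track the minimum point
            self.p_min, self.s_min = p, s
        if p + s >= self.p_min + self.beta * self.s_min:   # eq. 4
            self.reset()             # drift confirmed: rebuild the model
            return "alarm"
        if p + s >= self.p_min + self.alpha * self.s_min:  # eq. 3
            return "warning"         # start buffering recent examples
        return "ok"
```

Feeding a stream whose error rate jumps from roughly 10% to 100% first raises a warning and then an alarm, matching the sudden-drift behavior described above.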
EDDM:
Baena-García et al. proposed a modification of DDM called EDDM [16]. The same warning/alarm
mechanism was used, but instead of the classifier's error rate, the distance-error-rate was proposed. They
denote p'i as the average distance between two consecutive errors and s'i as its standard deviation. Using these
values, the new warning and alarm conditions are given by eq. 5 and eq. 6:
(p'i + 2·s'i) / (p'max + 2·s'max) < α (warning level) (5)
(p'i + 3·s'i) / (p'max + 3·s'max) < β (alarm level) (6)
The values of p'i and s'i are stored when p'i + 2·s'i reaches its maximum value (obtaining p'max and s'max). EDDM
works better than DDM for slow gradual drift, but is more sensitive to noise. Another drawback is that it
considers the thresholds and searches for concept drift only after a minimum of 30 errors have occurred.
ADWIN:
Bifet et al. proposed this method, which uses sliding windows of variable size that are recomputed
online according to the rate of change observed in the data within these windows [13]. The window (W) is
dynamically enlarged when there is no clear change in the context, and shrunk when a change is detected.
Additionally, ADWIN provides rigorous guarantees of its performance, in the form of bounds on the rates of
false positives and false negatives. ADWIN works only for one-dimensional data; for n-dimensional raw data, a
separate window must be maintained for each dimension, which results in handling more than one window.
Paired Learners:
Paired Learners, proposed by Bach and Maloof, uses two learners: stable and reactive [17]. The
stable learner predicts based on all of its experience, while the reactive one predicts based on a window of recent
examples. The method uses the interplay between these two learners and their accuracy differences to cope with
concept drift. The reactive learner can be implemented in two different ways: by rebuilding the learner with the
last w (window size) examples, or by using a retractable learner that can unlearn examples.
Exponentially weighted moving average for Concept Drift Detection (ECDD):
Ross et al. proposed a drift detection method based on the Exponentially Weighted Moving Average
(EWMA) [15], used for identifying an increase in the mean of a sequence of random variables. In EWMA, the
probability of incorrectly classifying an instance before the change point and the standard deviation of the
stream are known. In ECDD, the success and failure probabilities (1 and 0) are computed online, based
on the classification accuracy of the base learner on the current instance, together with an estimator of the
expected time between false positive detections.
Statistical Test of Equal Proportions (STEPD):
STEPD, proposed by Nishida et al., assumes that 'the accuracy of a classifier for recent W
examples will be equal to the overall accuracy from the beginning of the learning if the target concept is
stationary; and a significant decrease of recent accuracy suggests that the concept is changing' [18]. A chi-square
test is performed by computing a statistic whose value is compared to the percentile of the standard normal
distribution to obtain the observed significance level. If this value is less than the significance level, the null
hypothesis is rejected, and it is assumed that a concept drift has occurred. Warning and drift thresholds are also
used, similar to the ones used by DDM, EDDM, PHT, and ECDD.
DOF:
The method proposed by Sobhani et al. detects drifts by processing data chunk by chunk: for each
instance in the current batch, its nearest neighbor in the previous batch is computed and their corresponding
labels are compared. A distance map is created, associating the index of each instance in the previous batch
with the label computed for its nearest neighbor; the degree of drift is computed based on this distance map. The
average and standard deviation of all degrees of drift are computed and, if the current value deviates from the
average by more than s standard deviations, a concept drift is signaled, where s is a parameter of the
algorithm [10]. This algorithm is more effective for problems with well-separated and balanced classes.
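As an illustration, the nearest-neighbour label comparison described above might look like the following simplified stand-in. The batch format (lists of (features, label) pairs) and the squared-Euclidean distance are assumptions of this sketch, not the authors' exact formulation, and the distance map is reduced to a single disagreement ratio.

```python
def degree_of_drift(prev_batch, curr_batch):
    """Simplified DOF sketch: fraction of current-batch instances whose
    label disagrees with the label of their nearest neighbour in the
    previous batch. Each batch is a list of (features, label) pairs."""
    def nearest(x):
        # previous instance with the smallest squared Euclidean distance
        return min(prev_batch,
                   key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    disagreements = sum(1 for x, y in curr_batch if nearest(x)[1] != y)
    return disagreements / len(curr_batch)
```

When the labelling of a region flips between batches, the returned ratio rises, and comparing it against the running mean plus s standard deviations yields the drift signal described above.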
IV. Statistical Tests For Concept Drift:
The design of a change detector is a compromise between detecting true changes and avoiding false
alarms. This is accomplished by carrying out statistical tests that verify whether the running error or class
distribution remains constant over time.
CUSUM test:
The cumulative sum algorithm[24], is a change detection algorithm that raises an alarm when the mean
of the input data is significantly different from zero. The CUSUM input ϵt can be any filter residual, for
example, the prediction error from a Kalman filter. The CUSUM test is as follows-
g0 = 0
gt = max(0, gt−1 + ϵt − υ)
if gt > h then alarm and gt = 0 (7)
The CUSUM test is memoryless, and its accuracy depends on the choice of parameters υ and h.
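A direct transcription of eq. 7 in Python follows. The values chosen for the drift parameter υ and the threshold h are purely illustrative; in practice they trade off detection delay against the false alarm rate.

```python
def cusum(residuals, v=0.005, h=5.0):
    """One-sided CUSUM test of eq. 7: accumulate positive deviations of
    the residuals; alarm and restart whenever the sum exceeds h."""
    g, alarms = 0.0, []
    for t, eps in enumerate(residuals):
        g = max(0.0, g + eps - v)   # g_t = max(0, g_{t-1} + eps_t - v)
        if g > h:
            alarms.append(t)        # change detected at time t
            g = 0.0                 # restart the test
    return alarms
```

Feeding residuals that jump from 0 to 1 at index 50, the first alarm fires a few steps after the change, once the accumulated deviation crosses h.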
Page-Hinkley test: It is a sequential analysis technique that computes the observed values and their
mean up to the current moment. The Page-Hinkley test [5] is given as:
g0 = 0, gt = gt−1 + ϵt − υ
Gt = min(gt)
if gt − Gt > h then alarm and gt = 0 (8)
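Eq. 8 can be transcribed as a short function (again, the υ and h values are illustrative choices, not prescribed by the test):

```python
def page_hinkley(residuals, v=0.005, h=5.0):
    """Page-Hinkley test of eq. 8: track the cumulative sum g_t and its
    running minimum G_t; alarm when g_t - G_t exceeds h."""
    g, g_min, alarms = 0.0, 0.0, []
    for t, eps in enumerate(residuals):
        g += eps - v                # g_t = g_{t-1} + eps_t - v
        g_min = min(g_min, g)       # G_t = min of g over the history
        if g - g_min > h:
            alarms.append(t)
            g, g_min = 0.0, 0.0     # restart after an alarm
    return alarms
```

Unlike CUSUM, g is allowed to drift downward; the comparison against the running minimum Gt is what detects the upward change.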
Geometric moving average test:
The Geometric Moving Average (GMA) test [25] is as below:
g0 = 0
gt = λgt−1 + (1 − λ)ϵt
if gt > h then alarm and gt = 0 (9)
The forgetting factor λ is used to give more or less weight to the most recent data. The threshold h is
used to tune the sensitivity and false alarm rate of the detector.
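A transcription of eq. 9 (the λ and h values below are illustrative; larger λ means slower forgetting):

```python
def gma(residuals, lam=0.9, h=0.5):
    """Geometric Moving Average test of eq. 9: exponentially weighted
    average of the residuals, alarm when it crosses the threshold h."""
    g, alarms = 0.0, []
    for t, eps in enumerate(residuals):
        g = lam * g + (1 - lam) * eps   # g_t = lambda*g_{t-1} + (1-lambda)*eps_t
        if g > h:
            alarms.append(t)
            g = 0.0
    return alarms
```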
Statistical test:
CUSUM and GMA are methods that deal with numeric sequences. A statistical test is a procedure for
deciding whether a hypothesis about a quantitative feature of a population is true or false. We test a hypothesis
by drawing a random sample from the population in question and calculating an appropriate statistic on its
items.
To detect change, we need to compare two sources of data and decide whether the hypothesis H0 that they
come from the same distribution is true. Otherwise, the hypothesis test rejects H0 and a change is detected. The
simplest approach is to study the difference of the two sample means, from which a standard hypothesis test can
be formulated:
x̄0 − x̄1 ∈ N(0, σ0² + σ1²), under H0
or, as a χ² test:
(x̄0 − x̄1)² / (σ0² + σ1²) ∈ χ²(1), under H0
The Kolmogorov-Smirnov test (non-parametric) is another statistical test to compare two populations.
The KS-test has the advantage of making no assumption about the distribution of data.
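A sketch of the difference-of-means test above in Python. The 1.96 cutoff (a 5% two-sided level) and the use of sample variances as plug-in estimates for σ0² and σ1² are assumptions of this example.

```python
import math
import statistics

def mean_change_detected(window0, window1, z_crit=1.96):
    """Two-sample z-test sketch for H0: both windows share a mean.
    Under H0 the statistic (x0bar - x1bar) / sqrt(var0 + var1) follows
    N(0, 1), where var_i is the variance of each sample mean."""
    x0, x1 = statistics.mean(window0), statistics.mean(window1)
    var0 = statistics.pvariance(window0) / len(window0)
    var1 = statistics.pvariance(window1) / len(window1)
    z = (x0 - x1) / math.sqrt(var0 + var1)
    return abs(z) > z_crit
```

Comparing a reference window against a recent window of stream values in this way rejects H0 when the means have drifted apart.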
V. Concept Drift Handling
Kuncheva proposes to group ensemble strategies for changing environments [9] as follows:
Dynamic combiners (horse racing): component classifiers are trained in advance and their combination rule is
changed using a forgetting process.
Updated training data: component classifiers in the ensemble are created incrementally from incoming
examples.
Updating the ensemble members: ensemble members are updated online or retrained with blocks of data.
Structural changes of the ensemble: ensemble members are re-evaluated and the worst classifiers are
updated or replaced with a classifier trained on the most recent examples whenever a concept change occurs.
Adding new features: the attributes used are changed, as an attribute becomes significant, without
redesigning the ensemble structure.
The approaches to handle concept drift include single-classifier and ensemble-classifier approaches.
Single classifiers are traditional learners that were modeled for stationary data mining, augmented with the
qualities of an online learner and a forgetting mechanism. Ensemble classifiers are sets of single classifiers
whose individual decisions are aggregated by a voting rule. Ensemble classifiers provide better classification
accuracy than single classifiers due to their combined decision, and they have a natural way of
adapting to concept changes due to their modularity.
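For illustration, the voting rule mentioned above might look like the sketch below. Accuracy-based weights are assumed to be supplied by the ensemble's evaluation step; this is a generic weighted vote, not any one algorithm's exact aggregation.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Weighted majority voting: each component classifier's predicted
    label is scored by its weight; the label with the highest total
    weight becomes the ensemble's decision."""
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    return max(scores, key=scores.get)
```

With uniform weights this reduces to the simple majority vote used by SEA; accuracy-based weights give the behavior of AWE/AUE-style block ensembles.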
Streaming Ensemble Algorithm (SEA): SEA [8], proposed by Street and Kim, changes its structure based
on concept change. It is a heuristic replacement strategy that replaces the weakest base classifier based on accuracy and
diversity. The combined decision is based on simple majority voting, and the base classifiers are unpruned. This
algorithm works best with at most 25 ensemble components.
Accuracy Weighted Ensemble (AWE): In SEA, it is crucial to properly define the data chunk size, as it
determines the ensemble's flexibility. The AWE algorithm, proposed by Wang et al., trains a new classifier C' on
each incoming data chunk and uses that chunk to evaluate all the existing ensemble members in order to select
the best component classifiers. AWE is best suited for large data streams and works well for recurring and other drifts.
Adaptive Classifier Ensemble (ACE): To overcome AWE's slow drift reactions, Nishida proposed a hybrid
approach, called the Adaptive Classifier Ensemble (ACE), in which a data chunk ensemble is aided by a drift
detector. It aims at reacting to sudden drifts by tracking the classifier's error rate with each incoming example,
while slowly reconstructing the classifier ensemble with large chunks of examples.
Hoeffding Option Trees (HOT) and ASHT Bagging: Hoeffding Option Trees (HOT) provide a compact
structure that works like a set of weighted classifiers and is built in an incremental fashion. This
algorithm [27] allows each training example to update a set of option nodes rather than just a single leaf.
Adaptive-Size Hoeffding Tree Bagging (ASHT Bagging) diversifies ensemble members by using trees of
different sizes and uses a forgetting mechanism. Compared to HOT, ASHT Bagging proves to be more accurate
on most data sets, but both are more expensive in time and memory than single option trees or single classifiers.
Accuracy Diversified Ensemble (ADE): The Accuracy Diversified Ensemble (ADE) [22] not
only selects but also updates components according to the current distribution. ADE differs from AWE in its
weight definition, the use of online base classifiers, bagging, and updating components with incoming examples.
Compared to ASHT Bagging and HOT, ADE does not limit base classifier size, does not use any windows, and
updates members only if they are accurate enough according to the current distribution.
Accuracy Updated Ensemble (AUE): Compared to AWE, AUE1 [7] conditionally updates component
classifiers. It maintains a weighted pool of component classifiers and predicts the class of incoming examples
by a weighted voting rule. With each data chunk of examples, it substitutes the weakest-performing ensemble
member with a newly created classifier and adjusts the members' weights. It uses Hoeffding trees as component
classifiers. Compared to AUE1, AUE2 introduces a new weighting function [22], does not require cross-
validation of the candidate classifier, does not keep a classifier buffer, prunes its base learners, and always
updates its components. It neither limits base classifier size nor uses any windows. OAUE [23] tries to
combine block-based ensembles and online processing.
VI. Datasets With Concept Drift
Artificial datasets give the ground truth of the data; however, real datasets are more interesting as they
correspond to real-world applications where the algorithms' usability is tested [22].
6.1 Real datasets:
Forest Covertype, obtained from US Forest Service (USFS) Region 2 Resource Information System (RIS)
data, contains 581,012 instances and 54 attributes.
Poker-Hand consists of 1,000,000 instances and 11 attributes.
Electricity dataset, collected from the Australian New South Wales Electricity Market, contains 45,312
instances.
Airlines dataset contains 539,383 examples described by seven attributes.
Ozone level detection data set consists of 2,534 entries and is highly unbalanced (2% or 5% positives
depending on the criteria for 'ozone days').
6.2 Synthetic datasets:
Synthetic datasets allow us to analyze how the methods deal with the types of drift included in the datasets,
as it is known in advance when the drifts begin and end. For abrupt or sudden drifts, Stagger, Gauss and Mixed2
can be used. The Waveform, LED generator and Circles datasets are best suited for gradual drifts. The
Hyperplane dataset works well for both gradual and incremental drift. Radial basis function (RBF) data can also
be used for incremental drift, and blips can also be incorporated.
Applications
This section describes various real-life problems [11, 12] in different domains where concept drift arises in the
data generated by those domains.
Fig 3: Applications of Real-domain concept drift
Monitoring and control often employ unsupervised learning, which detects abnormal behavior. In
monitoring and control applications the data volumes are large and need to be processed in real time.
Personal assistance and information applications mainly organize and/or personalize the flow of
information. The class labels are mostly 'soft' and the costs of mistakes are relatively low.
Decision support includes diagnostics and evaluation of creditworthiness. Decision support and diagnostics
applications usually involve a limited amount of data. Decisions are not required in real time, but high
accuracy is essential in these applications and the costs of mistakes are large.
Artificial intelligence applications include a wide spectrum of moving and stationary systems that
interact with a changing environment. The objects learn how to interact with the environment and, since the
environment is changing, the learners need to be adaptive.
VII. Conclusion
This paper describes the problem of concept drift. It summarizes the need for adaptation and the types
and reasons for concept change. The various concept drift detection methods, viz. DDM, EDDM, Paired
Learners, ECDD, ADWIN, STEPD and DOF, are discussed, along with the methods they adopt to detect concept
change. To identify whether concept drift has occurred, statistical tests like the CUSUM, Page-Hinkley and
GMA tests are explained. Among the various classifier approaches, ensemble classifiers in particular provide
better accuracy in the case of concept change. The ensemble classifiers SEA, AWE, ACE, ADE, HOT, ASHT
Bagging and AUE adapt according to the drift that occurs, yielding good classification accuracy. Finally, the
applications and the datasets, real and synthetic, suited for various concept drifts can be used to check the
adaptability of any algorithm handling concept drift.
In the future, the classification performance of the ensemble algorithms discussed above can be
enhanced by adapting them to various drifts and to diversity.
References
[1]. P. M. Goncalves, Silas G.T. de Carvalho Santos, Roberto S.M. Barros, Davi C.L. Vieira (2014), "Review: A comparative study on
concept drift detectors", An International Journal: Expert Systems with Applications, 8144–8156.
[2]. L. L. Minku and Xin Yao(2011), "DDD: A New Ensemble Approach For Dealing With Concept Drift", IEEE TKDE, Vol. 24, pp.
619 - 633.
[3]. M. G. Kelly, D. J. Hand, and N. M. Adams(1999), "The Impact of Changing Populations on Classifier Performance", In Proc. of the
5th ACM SIGKDD Int. Conf. on Knowl. Disc. and Dat. Mining (KDD). ACM, 367–371.
[4]. J. Gama, I. Zliobaite, A. Bifet, M. Pechenizkiy, A. Bouchachia(2014), "A Survey on Concept Drift Adaptation ", ACM Computing
Surveys, Vol. 46, No. 4, Article 44.
[5]. Mouss, H., Mouss, D., Mouss, N., Sefouhi, L.(2004), "Test of Page-Hinkley, an Approach for Fault Detection in an Agro-
Alimentary Production System", 5th Asian Control Conference, IEEE Computer Society, vol. 2, pp. 815--818.
[6]. S. Delany, P. Cunningham, A. Tsymbal, and L. Coyle. (2005),"A Case-based Technique for Tracking Concept Drift in Spam
filtering", Knowledge-Based Sys. 18, 4–5 , 187–195.
[7]. D. Brzezinski and J. Stefanowski (2011), "Accuracy updated ensemble for data streams with concept drift," Proc. 6th HAIS Int.
Conf. Hybrid Artificial Intelligent Systems, II, pp. 155–163.
[8]. W. N. Street and Y. Kim(2001), “A streaming ensemble algorithm (SEA) for large-scale classification,” in Proc. 7th ACM
SIGKDD Int. Conf. Knowl. Discovery Data Mining, pp. 377–382.
[9]. Ludmila I. Kuncheva(2004), "Classifier ensembles for changing environments", Multiple Classifier Systems, Lecture Notes in
Computer Science, Springer , vol. 3077, pages 1–15.
[10]. Sobhani P. and Beigy H.(2011), "New drift detection method for data streams", Adaptive and intelligent systems, Lecture notes in
computer science, Vol. 6943, pp. 88–97.
[11]. D Brzezinski, J Stefanowsk(2011), "Mining data streams with concept drift " Poznan University of Technology Faculty of
Computing Science and Management Institute of Computing Science.
[12]. I Žliobaite (2010), "Adaptive Training Set Formation", Doctoral dissertation Physical sciences, informatics (09P) Vilnius
University.
[13]. A Bifet(2009), "Adaptive Learning and Mining for Data Streams and Frequent Patterns", Doctoral Thesis.
[14]. J Gama, P Medas, G Castillo and Pedro Rodrigues(2004), "Learning with Drift Detection", Lecture Notes in Computer Science,
Vol. 3171, pp 286-295.
[15]. G. J. Ross, N. M. Adams, D. Tasoulis, D. Hand(2012), "Exponentially weighted moving average charts for detecting concept drift",
International Journal Pattern Recognition Letters, 191-198.
[16]. M Baena-García, J Campo-Ávila, R Fidalgo, A Bifet, R Gavaldà and R Morales-Bueno (2006), "Early Drift Detection Method",
IWKDDS, pp. 77–86.
[17]. S. H. Bach and M. A. Maloof (2008), "Paired Learners for Concept Drift", Eighth IEEE International Conference on Data Mining,
pp. 23-32.
[18]. K. Nishida(2008), "Learning and Detecting Concept Drift", A Dissertation: Doctor of Philosophy in Information Science and
Technology, Graduate School of Information Science and Technology, Hokkaido University.
[19]. D Brzezinski, J Stefanowski(2012), "From Block-based Ensembles to Online Learners In Changing Data Streams: If- and How-To",
ECML PKDD Workshop on Instant Interactive Data Mining, pp. 60–965.
[20]. J. Kolter and M. A. Maloof (2007), "Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts", Journal of
Machine Learning Research 8, 2755-2790.
[21]. P. B. Dongre, L. G. Malik(2014), " A Review on Real Time Data Stream Classification and Adapting To Various Concept Drift
Scenarios", IEEE International Advance Computing Conference (IACC), pp. 533-537.
[22]. D Brzezinski, J Stefanowski (2014),"Reacting to Different Types of Concept Drift:The Accuracy Updated Ensemble Algorithm"
IEEE Transactions On Neural Networks And Learning Systems, Vol. 25, pp. 81-94.
[23]. D Brzezinski, J Stefanowski(2014), "Combining block-based and online methods in learning ensembles from concept drifting data
streams", An International Journal: Information Sciences 265, 50–67.
[24]. E. S. Page.(1954) Continuous inspection schemes. Biometrika, 41(1/2):100–115.
[25]. S. W. Roberts(2000), "Control chart tests based on geometric moving averages", Technometrics, 42(1):97–101.
[26]. R. Elwell and R. Polikar(2011), “Incremental learning of concept drift in nonstationary environments,” IEEE Trans. Neural Netw.,
vol. 22, no. 10, pp. 1517–1531.
[27]. A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà(2009),“New ensemble methods for evolving data streams,” in Proc.
15th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, pp. 139-148.