International Journal of Modeling and Optimization, Vol. 2, No. 4, August 2012




              A New Adaptive Ensemble Boosting Classifier for
                      Concept Drifting Stream Data
                                Kapil K. Wankhade and Snehlata S. Dongre, Members, IACSIT


   Abstract—With the emergence of large-volume and high-speed streaming data, mining of stream data has become a focus of increasing interest in real applications including credit card fraud detection, target marketing, network intrusion detection, etc. The major new challenges in stream data mining are (a) since streaming data may flow in and out indefinitely and at high speed, it is usually expected that a stream data mining process can scan the data only once, and (b) since the characteristics of the data may evolve over time, it is desirable to incorporate the evolving features of streaming data. This paper introduces a new adaptive ensemble boosting approach for the classification of streaming data with concept drift. The adaptive ensemble boosting method uses an adaptive sliding window and a Hoeffding Tree with naïve Bayes adaptive as the base learner. The results show that the proposed algorithm works well in changing environments compared with other ensemble classifiers.

   Index Terms—Concept drift, ensemble approach, Hoeffding tree, sliding window, stream data


                         I. INTRODUCTION

   The last twenty years or so have witnessed large progress in machine learning and in its capability to handle real-world applications. Nevertheless, machine learning has so far mostly centered on one-shot analysis of homogeneous, stationary data, and on centralized algorithms. Most machine learning and data mining approaches assume that examples are independent, identically distributed and generated from a stationary distribution. A large number of learning algorithms assume that computational resources are unlimited, e.g., that the data fits in main memory. In that context, standard data mining techniques use finite training sets and generate static models. Nowadays we are faced with tremendous amounts of distributed data generated by an ever-increasing number of smart devices. In most cases, this data is transient and may not even be stored permanently.
   Our ability to collect data is changing dramatically. Computers and small devices send data to other computers, and we are faced with distributed sources of detailed data. Data flows continuously, possibly at high speed, generated from non-stationary processes. Examples of data mining applications faced with this scenario include sensor networks, social networks, user modeling, radio frequency identification, web mining, scientific data, financial data, etc.
   Most recent learning algorithms [1]-[12] maintain a decision model that continuously evolves over time, taking into account that the environment is non-stationary and computational resources are limited.
   The desirable properties of learning systems for efficiently mining continuous, high-volume, open-ended data streams are:
   - Require small, constant time per data example and a fixed amount of main memory, irrespective of the total number of examples.
   - Build a decision model using a single scan over the training data.
   - Generate an anytime model, independent of the order of the examples.
   - Be able to deal with concept drift and, for stationary data, produce decision models nearly identical to those obtained with a batch learner.
   Ensemble methods are advantageous over single-classifier methods. Ensemble methods are combinations of several models whose individual predictions are combined in some manner to form a final prediction. Ensemble learning classifiers often have better accuracy, and they are easier to scale and parallelize than single classifiers.
   The paper is organized as follows: types of concept drift are discussed in Section II; Section III explains the proposed method with boosting, adaptive sliding window and Hoeffding tree; experiments and results are presented in Section IV; and Section V concludes.

   Manuscript received May 25, 2012; revised June 25, 2012.
   K. K. Wankhade is with the Department of Information Technology, G. H. Raisoni College of Engineering, Nagpur, Maharashtra, INDIA 440019 (kaps.wankhade@gmail.com).
   S. S. Dongre is with the Department of Computer Science and Engineering, G. H. Raisoni College of Engineering, Nagpur, Maharashtra, INDIA 440019 (dongre.sneha@gmail.com).


                    II. TYPES OF CONCEPT DRIFT

   Change may come as a result of the changing environment of the problem, e.g., floating probability distributions, migrating clusters of data, loss of old and appearance of new classes and/or features, class label swaps, etc.
   Fig. 1 shows four examples of simple changes that may occur in a single variable along time. The first plot (Noise) shows changes that are deemed non-significant and are perceived as noise. The classifier should not respond to minor fluctuations, and can use the noisy data to improve its robustness to the underlying stationary distribution. The second plot (Blip) represents a 'rare event'. Rare events can be regarded as outliers in a static distribution. Examples of such events include anomalies in landfill gas emission, fraudulent card transactions, network intrusions and rare medical conditions. Finding a rare event in streaming data can signify the onset of a concept drift, hence methods for on-line detection of rare events can be a component of the novelty detection paradigm. The last two plots in Fig. 1

(Abrupt) and (Gradual) show typical examples of the two major types of concept drift represented in a single dimension.

          Fig. 1. Types of concept change in streaming data.


                    III. PROPOSED METHOD

   This adaptive ensemble method uses boosting, an adaptive sliding window and a Hoeffding tree to improve performance.

   A. Boosting
   Boosting is a machine learning meta-algorithm for performing supervised learning. While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in some way that is usually related to the weak learners' accuracy. After a weak learner is added, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight. Because boosting focuses on the misclassified tuples, it risks overfitting the resulting composite model to such data. Therefore, the resulting "boosted" model may sometimes be less accurate than a single model derived from the same data. Bagging is less susceptible to model overfitting. While both can significantly improve accuracy in comparison to a single model, boosting tends to achieve greater accuracy. The reason for this improvement in performance is that boosting generates a hypothesis whose error on the training set is small by combining many hypotheses whose errors may be large. The effect of boosting has to do with variance reduction; however, unlike bagging, boosting may also reduce the bias of the learning algorithm.

   B. Adaptive Sliding Window
   In a data stream environment, data arrives continuously and in huge amounts, so it is impossible to store such data and process it quickly. To overcome these problems, windowing techniques are used. Window strategies have been used in conjunction with mining algorithms in two ways: one, externally to the learning algorithm, the window system is used to monitor the error rate of the current model, which under stable distributions should keep decreasing or at most stabilize; when instead this rate grows significantly, change is declared and the base learning algorithm is invoked to revise or rebuild the model with fresh data. Two, a window is maintained that keeps the most recent examples, and older examples are dropped from the window according to some set of rules.
   The idea behind the adaptive sliding window [13] is that, whenever two "large enough" sub-windows of W exhibit "distinct enough" averages, one can conclude that the corresponding expected values are different, and the older portion of the window is dropped. In other words, W is kept as long as possible while the null hypothesis "µt has remained constant in W" is sustainable up to confidence δ. "Large enough" and "distinct enough" are made precise by choosing an appropriate statistical test for distribution change, which in general involves the value of δ, the lengths of the sub-windows and their contents. At every step, the algorithm outputs the estimated window average µ̂W as an approximation to µW.

   C. Hoeffding Tree
   The Hoeffding Tree algorithm is a decision tree learning method for stream data classification. It typically runs in sublinear time and produces a decision tree nearly identical to that of a traditional batch learner. It uses Hoeffding trees, which exploit the idea that a small sample can be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound.
   Suppose we make N independent observations of a random variable r with range R, where r is an attribute selection measure. In the case of Hoeffding trees, r is information gain. If we compute the mean r̄ of this sample, the Hoeffding bound states that the true mean of r is at least r̄ − ε, with probability 1 − δ, where δ is user-specified and

        ε = sqrt( R² ln(1/δ) / (2N) )                              (1)

   D. Adaptive Ensemble Boosting Classifier
   The designed algorithm uses boosting as the ensemble method, together with a sliding window and a Hoeffding tree, to improve ensemble performance. Fig. 2 shows the algorithm for data stream classification. In this algorithm there are m base models, D is the data used for training, and each base model is initially assigned a weight wk equal to 1. The learning algorithm is provided with a series of training datasets {xit ∈ X : yit ∈ Y}, i = 1,…, mt, where t is an index on the changing data; hence xit is the ith instance obtained from the tth dataset. The algorithm has two inputs: a supervised classification algorithm (the Base classifier) used to train individual classifiers, and the training data Dt drawn from the current distribution Pt(x, y) at time t. When new data arrives at time t, the data is divided into a number of sliding windows W1,…,Wn. The sliding window is used for change detection and for time and memory management. In this algorithm the sliding window is parameter- and assumption-free in the sense that it automatically detects and adapts to the current rate of change. The window is not maintained explicitly but is compressed using a variant of the exponential histogram technique [14]. The expected values Ŵ = total / width of the respective windows are calculated.
   If a change is detected, a change alarm is raised. Using the expected values, the algorithm decides whether a sub-window should be dropped. The algorithm then generates one new classifier ht, which is combined with all previous classifiers to create the composite hypothesis Ht. The decision of Ht serves as the ensemble decision. The error of the composite hypothesis, Et, is computed on the new dataset, which is then


used to update the distribution weights. Once the distribution is updated, the algorithm calls the Base classifier and asks it to create the tth classifier ht from the current training dataset Dt, i.e. ht : X → Y. All classifiers generated so far, hk, k = 1,…,t, are evaluated on the current dataset by computing εkt, the error of the kth classifier hk at the tth time step:

        εkt = Σ_{i=1}^{mt} Dt(i) · [hk(xi) ≠ yi],  for k = 1, 2,…, t          (2)

   At the current time step t we therefore have t error measures, one for each classifier generated. The error calculated using equation (2) is used to update the weights, which are updated dynamically using equation (3):

        w'k = (1 − εkt) / εkt                                                 (3)

   Using these updated weights, the classifier is constructed. The performance evaluation of classifiers is important in a concept-drifting environment: if the performance of a classifier goes down, the algorithm drops that classifier and adds the next classifier to the ensemble, maintaining the ensemble size.
   This classifier assigns dynamic sample weights. It keeps a window of length W using only O(log W) memory and O(log W) processing time per item, rather than the O(W) one expects from a naïve implementation. It is used as a change detector, since it shrinks the window if and only if there has been a significant change in recent examples, and as an estimator for the current average of the sequence it is reading since, with high probability, older parts of the window with a significantly different average are automatically dropped. The proposed classifier uses the Hoeffding tree as the base learner; because of this, the algorithm works faster and performance increases. The algorithm is lightweight, meaning that it uses little memory. The windowing technique does not store the whole window explicitly but only the statistics required for further computation. In this proposed algorithm the data structure maintains an approximation of the number of 1's in a sliding window of length W with logarithmic memory and update time. The data structure is adaptive in that it can provide this approximation simultaneously for about O(log W) sub-windows whose lengths follow a geometric law, with no memory overhead with respect to keeping the count for a single window. Keeping exact counts for a fixed window size is impossible in sublinear memory. This problem can be tackled by shrinking or enlarging the window strategically, so that what would otherwise be an approximate count happens to be exact. More precisely, to design the algorithm a parameter M is chosen which controls the amount of memory used, O(M log(W/M)) words.


                  IV. EXPERIMENTS AND RESULTS

   The proposed method has been tested on both real and synthetic datasets. The proposed algorithm has been compared with OzaBag [15], OzaBoost [15], OCBoost [16], OzaBagADWIN [17] and OzaBagASHT [17]. The experiments were performed on a 2.59 GHz Intel Dual Core processor with 3 GB RAM, running Ubuntu 8.04. For testing and comparison the MOA [18] tool and the Interleaved Test-Then-Train method were used.

                       Fig. 2. Proposed algorithm.

   A. Experiment using Synthetic Data
   The synthetic datasets are generated with drifting concepts based on the random radial basis function (RBF) and SEA generators.
   Random Radial Basis Function Generator — The RBF generator works as follows. A fixed number of random centroids are generated. Each center has a random position, a single standard deviation, a class label and a weight. New examples are generated by selecting a center; the displacement from the central point is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid, which also determines the class label of the example. This effectively creates a normally distributed hypersphere of examples surrounding each central point, with varying densities. Only numeric attributes are generated. Drift is introduced by moving the centroids with constant speed; this speed is initialized by a drift parameter.
   SEA Generator — This artificial dataset contains abrupt concept drift and was first introduced in [5]. It is generated using three attributes, where only the first two attributes are relevant. All three attributes have values between 0 and 10. The points of the dataset are divided into 4 blocks with different concepts. In each block, the classification is done
using f1 + f2 ≤ θ, where f1 and f2 represent the first two attributes and θ is a threshold value. The most frequent values of θ are 8, 9, 7 and 9.5 for the four data blocks. 10% class noise was then introduced into each block of data by randomly changing the class value of 10% of the instances.
   LED Generator — The task is to predict the digit displayed on a seven-segment LED display, where each attribute has a 10% chance of being inverted. The dataset has an optimal Bayes classification rate of 74%. The particular configuration of the generator used for the experiments (led) produces 24 binary attributes, 17 of which are irrelevant.

                  Fig. 3. Learning curves using RBF dataset.

   For all the above datasets the ensemble size has been set to 10 and the dataset size has been set to 1 million instances. All experiments are carried out using the same parameter settings and the prequential method.
   Prequential method — In the prequential method the error of a model is computed from the sequence of examples. For each example in the stream, the current model makes a prediction based only on the example's attribute values. The prequential error is computed as an accumulated sum of a loss function between the predicted and the observed values.
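As an illustration, the SEA concept and the prequential (Interleaved Test-Then-Train) protocol described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the generator follows the f1 + f2 ≤ θ rule with block-wise θ values and 10% label noise as described in the text, while a trivial majority-class learner stands in for the paper's ensemble; all names are illustrative.

```python
import random

def sea_stream(n, thetas=(8, 9, 7, 9.5), noise=0.1, seed=1):
    """Yield (x, y) pairs from the SEA concept: y = 1 iff f1 + f2 <= theta.

    The stream is split into len(thetas) equal blocks, one concept per
    block; `noise` flips the class label of that fraction of examples."""
    rng = random.Random(seed)
    block = n // len(thetas)
    for i in range(n):
        theta = thetas[min(i // block, len(thetas) - 1)]
        x = [rng.uniform(0, 10) for _ in range(3)]  # third attribute is irrelevant
        y = int(x[0] + x[1] <= theta)
        if rng.random() < noise:
            y = 1 - y
        yield x, y

class MajorityClass:
    """Stand-in learner: predicts the most frequent label seen so far."""
    def __init__(self):
        self.counts = {0: 0, 1: 0}
    def predict(self, x):
        return max(self.counts, key=self.counts.get)
    def learn(self, x, y):
        self.counts[y] += 1

def prequential_error(stream, model):
    """Interleaved Test-Then-Train: test on each example, then train on it."""
    errors = total = 0
    for x, y in stream:
        errors += int(model.predict(x) != y)
        model.learn(x, y)
        total += 1
    return errors / total

err = prequential_error(sea_stream(10000), MajorityClass())
print(round(err, 3))
```

Replacing `MajorityClass` with any incremental classifier that exposes `predict` and `learn` reproduces the evaluation protocol used in the experiments.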
   The comparison in terms of time (in sec) and memory (in MB) is tabulated in Table I and Table II, respectively. The learning curves using the RBF, SEA and LED datasets are depicted in Fig. 3, Fig. 4 and Fig. 5.

   TABLE I: COMPARISON IN TERMS OF TIME (IN SEC) USING SYNTHETIC DATASET

  TABLE II: COMPARISON IN TERMS OF MEMORY (IN MB) USING SYNTHETIC DATASET

                  Fig. 4. Learning curves using SEA dataset.

                  Fig. 5. Learning curves using LED dataset.

    TABLE III: COMPARISON IN TERMS OF TIME (IN SEC) USING REAL DATASET
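The per-classifier error and weight computations of equations (2) and (3) in Section III.D can be sketched directly. This fragment only illustrates the two formulas, not the full ensemble; the small floor on ε is an illustrative addition to avoid division by zero for a perfect classifier.

```python
def weighted_error(dist, preds, labels):
    """Equation (2): error of one classifier as the distribution-weighted
    fraction of misclassified examples."""
    return sum(d for d, p, y in zip(dist, preds, labels) if p != y)

def classifier_weight(eps, floor=1e-6):
    """Equation (3): w'_k = (1 - eps) / eps; eps is clipped away from zero
    (illustrative safeguard, not in the paper's formula)."""
    eps = max(eps, floor)
    return (1.0 - eps) / eps

# Uniform distribution over 4 examples; the classifier errs on the last one.
dist = [0.25, 0.25, 0.25, 0.25]
eps = weighted_error(dist, preds=[1, 0, 1, 1], labels=[1, 0, 1, 0])
print(eps)                     # 0.25
print(classifier_weight(eps))  # 3.0
```

As the sketch shows, a classifier with low weighted error receives a large weight, so accurate members dominate the ensemble decision.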




   B. Experiment using Real Dataset
   In this subsection, the proposed method is tested on real datasets from the UCI machine learning repository, http://archive.ics.edu/ml/datasets.html. The proposed classifier is compared with OzaBag, OzaBagASHT, OzaBagADWIN, OzaBoost and OCBoost. The results are computed in terms of time, accuracy and memory; the comparison is tabulated in Table III, Table IV and Table V.

   TABLE IV: COMPARISON IN TERMS OF ACCURACY (IN %) USING REAL DATASET
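The adaptive-window test of Section III.B can also be illustrated with a much-simplified sketch: here the window is stored explicitly (the algorithm of [13] compresses it into an exponential histogram, which this sketch does not attempt), and every split into two sub-windows is checked against a Hoeffding-style threshold in the spirit of equation (1). The cut rule and the constants are illustrative assumptions, not the exact test of the paper.

```python
import math

def cut_threshold(n0, n1, delta=0.01, R=1.0):
    """Hoeffding-style bound on the difference of two sub-window means
    (cf. equation (1)); m is the harmonic mean of the sub-window sizes."""
    m = 1.0 / (1.0 / n0 + 1.0 / n1)
    return math.sqrt((R * R) * math.log(2.0 / delta) / (2.0 * m))

def adaptive_window(stream, delta=0.01, min_sub=5):
    """Keep a window of recent values; whenever two 'large enough'
    sub-windows have 'distinct enough' means, drop the older portion."""
    window = []
    for t, x in enumerate(stream):
        window.append(x)
        cut = 0
        for i in range(min_sub, len(window) - min_sub):
            left, right = window[:i], window[i:]
            m0 = sum(left) / len(left)
            m1 = sum(right) / len(right)
            if abs(m0 - m1) > cut_threshold(len(left), len(right), delta):
                cut = i  # change detected: drop everything before index i
        if cut:
            window = window[cut:]
            print("change detected at item", t)
    return window

# Mean jumps from 0.2 to 0.8 halfway through: the old half is dropped.
stream = [0.2] * 60 + [0.8] * 60
w = adaptive_window(stream)
print(len(w) < 120)
```

The explicit window makes each step O(W²); the exponential-histogram variant described in Section III.D reduces this to O(log W) memory and update time.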




    TABLE V: COMPARISON IN TERMS OF MEMORY (IN MB) USING REAL DATASET


                            V. CONCLUSION

   In this paper we have investigated the major issues in classifying large-volume, high-speed and dynamically changing streaming data, and proposed a novel approach, the adaptive ensemble boosting classifier, which uses an adaptive sliding window and a Hoeffding tree to improve performance.
   Compared with other classifiers, the adaptive ensemble boosting classifier has distinct features: it is dynamically adaptive, uses less memory and processes data quickly.
   Thus, the adaptive ensemble boosting classifier represents a new methodology for effective classification of dynamic, fast-growing and large-volume data streams.


                               REFERENCES
[1]  G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69-101, 1996.
[2]  Y. Freund and R. Schapire, "Experiments with a new boosting algorithm," in Proc. International Conference on Machine Learning, 1996, pp. 148-156.
[3]  P. Domingos and G. Hulten, "Mining high-speed data streams," in KDD '00: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press, 2000, pp. 71-80.
[4]  G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proc. ACM SIGKDD, San Francisco, CA, USA, 2001, pp. 97-106.
[5]  W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large-scale classification," in KDD '01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press, 2001, pp. 377-382.
[6]  H. Wang, W. Fan, P. S. Yu, and J. Han, "Mining concept-drifting data streams using ensemble classifiers," in KDD '03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press, 2003, pp. 226-235.
[7]  J. Z. Kolter and M. A. Maloof, "Dynamic weighted majority: a new ensemble method for tracking concept drift," in Proc. 3rd International Conference on Data Mining, ICDM '03, IEEE CS Press, 2003, pp. 123-130.
[8]  B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan, "Maintaining variance and k-medians over data stream windows," in Proc. 22nd Symposium on Principles of Database Systems, 2003, pp. 234-243.
[9]  J. Gama, R. Rocha, and P. Medas, "Accurate decision trees for mining high-speed data streams," in Proc. 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 523-528.
[10] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence - SBIA 2004, 17th Brazilian Symposium on Artificial Intelligence, Lecture Notes in Computer Science, vol. 3171, Springer Verlag, 2004, pp. 286-295.
[11] J. Z. Kolter and M. A. Maloof, "Using additive expert ensembles to cope with concept drift," in Proc. International Conference on Machine Learning (ICML), Bonn, Germany, 2005, pp. 449-456.
[12] G. Cormode, S. Muthukrishnan, and W. Zhuang, "Conquering the divide: Continuous clustering of distributed data streams," in ICDE, 2007, pp. 1036-1045.
[13] A. Bifet and R. Gavaldà, "Learning from time-changing data with adaptive windowing," in SIAM International Conference on Data Mining, 2007, pp. 443-449.
[14] M. Datar, A. Gionis, P. Indyk, and R. Motwani, "Maintaining stream statistics over sliding windows," SIAM Journal on Computing, vol. 14, no. 1, pp. 27-45, 2002.
[15] N. Oza and S. Russell, "Online bagging and boosting," in Artificial Intelligence and Statistics, Morgan Kaufmann, 2001, pp. 105-112.
[16] R. Pelossof, M. Jones, I. Vovsha, and C. Rudin, "Online coordinate boosting," http://arxiv.org/abs/0810.4553, 2008.
[17] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà, "New ensemble methods for evolving data streams," in KDD '09, ACM, Paris, 2009, pp. 139-148.
[18] G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: Massive Online Analysis," 2007. [Online]. Available: http://sourceforge.net/projects/moa-datasream


   K. K. Wankhade received the B. E. degree in Information Technology from Swami Ramanand Teerth Marathwada University, Nanded, India, in 2007 and the M. Tech. degree in Computer Engineering from the University of Pune, Pune, India, in 2010.
   He is currently working as an Assistant Professor in the Department of Information Technology at G. H. Raisoni College of Engineering, Nagpur, India. He has a number of publications in reputed journals and international conferences, including Springer and IEEE. His research is on data stream mining, machine learning, decision support systems, artificial neural networks and embedded systems. His book, Data Streams Mining: Classification and Application, was published by LAP Publication House, Germany, 2010.
   Mr. Kapil K. Wankhade is a member of the IACSIT and IEEE organizations. He is currently a reviewer for Springer's Evolving Systems journal and the IJCEE journal.

   S. S. Dongre received the B. E. degree in Computer Science and Engineering from Pt. Ravishankar Shukla University, Raipur, India, in 2007 and the M. Tech. degree in Computer Engineering from the University of Pune, Pune, India, in 2010.
   She is currently working as an Assistant Professor in the Department of Computer Science and Engineering at G. H. Raisoni College of Engineering, Nagpur, India. She has a number of publications in reputed international conferences, including IEEE, and in journals. Her research is on data stream mining, machine learning, decision support systems, ANN and embedded systems. Her book, Data Streams Mining: Classification and Application, was published by LAP Publication House, Germany, 2010.
   Ms. Snehlata S. Dongre is a member of the IACSIT, IEEE and ISTE organizations.




International Journal of Modeling and Optimization, Vol. 2, No. 4, August 2012

A New Adaptive Ensemble Boosting Classifier for Concept Drifting Stream Data

Kapil K. Wankhade and Snehlata S. Dongre, Members, IACSIT

Abstract—With the emergence of large-volume and high-speed streaming data, mining of stream data has become a focus of increasing interest in real applications including credit card fraud detection, target marketing, network intrusion detection, etc. The major new challenges in stream data mining are: (a) since streaming data may flow in and out indefinitely and at fast speed, it is usually expected that a stream data mining process can scan the data only once, and (b) since the characteristics of the data may evolve over time, it is desirable to incorporate the evolving features of streaming data. This paper introduces a new adaptive ensemble boosting approach for the classification of streaming data with concept drift. This adaptive ensemble boosting method uses an adaptive sliding window and a Hoeffding Tree with naïve Bayes adaptive as base learner. The results show that the proposed algorithm works well in changing environments as compared with other ensemble classifiers.

Index Terms—Concept drift, ensemble approach, Hoeffding tree, sliding window, stream data

I. INTRODUCTION

The last twenty years or so have witnessed large progress in machine learning and in its capability to handle real-world applications. Nevertheless, machine learning so far has mostly centered on one-shot data analysis from homogeneous and stationary data, and on centralized algorithms. Most machine learning and data mining approaches assume that examples are independent, identically distributed and generated from a stationary distribution. A large number of learning algorithms assume that computational resources are unlimited, e.g., that data fits in main memory. In that context, standard data mining techniques use finite training sets and generate static models.

Nowadays we are faced with a tremendous amount of distributed data that can be generated from the ever increasing number of smart devices. In most cases, this data is transient, and may not even be stored permanently. Our ability to collect data is changing dramatically. Computers and small devices send data to other computers, and we are faced with the presence of distributed sources of detailed data. Data flow continuously, eventually at high speed, generated from non-stationary processes. Examples of data mining applications that face this scenario include sensor networks, social networks, user modeling, radio frequency identification, web mining, scientific data, financial data, etc.

Most recent learning algorithms [1]-[12] maintain a decision model that continuously evolves over time, taking into account that the environment is non-stationary and computational resources are limited. The desirable properties of learning systems for efficiently mining continuous, high-volume, open-ended data streams are:
- Require small constant time per data example; use a fixed amount of main memory, irrespective of the total number of examples.
- Build a decision model using a single scan over the training data.
- Generate an anytime model, independent of the order of the examples.
- Ability to deal with concept drift. For stationary data, ability to produce decision models that are nearly identical to the ones we would obtain using a batch learner.

Ensemble methods are advantageous over single-classifier methods. Ensemble methods are combinations of several models whose individual predictions are combined in some manner to form a final prediction. Ensemble learning classifiers often have better accuracy and are easier to scale and parallelize than single-classifier methods.

The paper is organized as follows: types of concept drift are discussed in Section II. Section III explains the proposed method with boosting, adaptive sliding window and Hoeffding tree. Experiments and results are included in Section IV, and the conclusion in Section V.

Manuscript received May 25, 2012; revised June 25, 2012.
K. K. Wankhade is with the Department of Information Technology, G. H. Raisoni College of Engineering, Nagpur, Maharashtra, INDIA 440019 (kaps.wankhade@gmail.com).
S. S. Dongre is with the Department of Computer Science and Engineering, G. H. Raisoni College of Engineering, Nagpur, Maharashtra, INDIA 440019 (dongre.sneha@gmail.com).

II. TYPES OF CONCEPT DRIFT

Change may come as a result of the changing environment of the problem, e.g., floating probability distributions, migrating clusters of data, loss of old and appearance of new classes and/or features, class label swaps, etc.

Fig. 1 shows four examples of simple changes that may occur in a single variable along time. The first plot (Noise) shows changes that are deemed non-significant and are perceived as noise. The classifier should not respond to minor fluctuations, and can use the noisy data to improve its robustness to the underlying stationary distribution. The second plot (Blip) represents a "rare event". Rare events can be regarded as outliers in a static distribution. Examples of such events include anomalies in landfill gas emission, fraudulent card transactions, network intrusions and rare medical conditions. Finding a rare event in streaming data can signify the onset of a concept drift, hence methods for on-line detection of rare events can be a component of the novelty detection paradigm. The last two plots in Fig. 1, (Abrupt) and (Gradual), show typical examples of the two major types of concept drift represented in a single dimension.
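The four change patterns of Fig. 1 can be reproduced synthetically; the following minimal Python sketch (illustrative only — stream length, magnitudes and noise level are our own choices, not from the paper) generates one univariate stream per pattern:

```python
import random

def stream(kind, n=1000, seed=42):
    """Yield n values of a single variable exhibiting one of the four
    change patterns of Fig. 1 (noise, blip, abrupt, gradual)."""
    rnd = random.Random(seed)
    for t in range(n):
        base = 0.0
        if kind == "abrupt" and t >= n // 2:
            base = 5.0                      # sudden mean shift at mid-stream
        elif kind == "gradual":
            base = 5.0 * t / n              # mean drifts slowly over time
        elif kind == "blip" and t == n // 2:
            base = 10.0                     # isolated rare event (outlier)
        # "noise": base stays 0; only the Gaussian fluctuation remains
        yield base + rnd.gauss(0.0, 0.5)

means = {k: sum(stream(k)) / 1000 for k in ("noise", "blip", "abrupt", "gradual")}
```

A detector should ignore the first two streams (their long-run mean is unchanged) but react to the last two.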
Fig. 1. Types of concept change in streaming data.

III. PROPOSED METHOD

This adaptive ensemble method uses boosting, an adaptive sliding window and a Hoeffding tree to improve performance.

A. Boosting

Boosting is a machine learning meta-algorithm for performing supervised learning. While boosting is not algorithmically constrained, most boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in some way that is usually related to the weak learners' accuracy. After a weak learner is added, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight. Because boosting focuses on the misclassified tuples, it risks overfitting the resulting composite model to such data; therefore, the resulting "boosted" model may sometimes be less accurate than a single model derived from the same data. Bagging is less susceptible to model overfitting. While both can significantly improve accuracy in comparison to a single model, boosting tends to achieve greater accuracy. The reason for the improvement in performance is that boosting generates a hypothesis whose error on the training set is small by combining many hypotheses whose error may be large. The effect of boosting has to do with variance reduction; however, unlike bagging, boosting may also reduce the bias of the learning algorithm.

B. Adaptive Sliding Window

In a data stream environment, data arrives continuously and in huge amounts, so it is impossible to store and process such data quickly. To overcome these problems, windowing techniques are used. Window strategies have been used in conjunction with mining algorithms in two ways: one, externally to the learning algorithm, where the window system is used to monitor the error rate of the current model, which under stable distributions should keep decreasing or at most stabilize; when instead this rate grows significantly, change is declared and the base learning algorithm is invoked to revise or rebuild the model with fresh data. A window is maintained that keeps the most recent examples and, according to some set of rules, older examples are dropped from the window.

The idea behind the adaptive sliding window [13] is that, whenever two "large enough" sub-windows of W exhibit "distinct enough" averages, one can conclude that the corresponding expected values are different, and the older portion of the window is dropped. In other words, W is kept as long as possible while the null hypothesis "µt has remained constant in W" is sustainable up to confidence δ. "Large enough" and "distinct enough" above are made precise by choosing an appropriate statistical test for distribution change, which in general involves the value of δ, the lengths of the sub-windows and their contents. At every step, the algorithm outputs the value of µ̂W as an approximation to µW.

C. Hoeffding Tree

The Hoeffding Tree algorithm is a decision tree learning method for stream data classification. It typically runs in sublinear time and produces a decision tree nearly identical to that of traditional batch learners. It exploits the idea that a small sample can be enough to choose an optimal splitting attribute, an idea supported mathematically by the Hoeffding bound.

Suppose we make N independent observations of a random variable r with range R, where r is an attribute selection measure; in the case of Hoeffding trees, r is the information gain. If we compute the mean r̄ of this sample, the Hoeffding bound states that the true mean of r is at least r̄ − ε with probability 1 − δ, where δ is user-specified and

    ε = √( R² ln(1/δ) / (2N) )    (1)

D. Adaptive Ensemble Boosting Classifier

The designed algorithm uses boosting as the ensemble method, together with the sliding window and Hoeffding tree, to improve ensemble performance. Fig. 2 shows the algorithm for data stream classification. In this algorithm there are m base models and training data D; initially each base model is assigned a weight wk equal to 1. The learning algorithm is provided with a series of training datasets {xit ∈ X : yit ∈ Y}, i = 1, ..., mt, where t is an index on the changing data; hence xit is the ith instance obtained from the tth dataset. The algorithm has two inputs: a supervised classification algorithm (the Base classifier) to train individual classifiers, and the training data Dt drawn from the current distribution Pt(x, y) at time t.
When new data arrives at time t, the data is divided into a number of sliding windows W1, ..., Wn. The sliding window is used for change detection and for time and memory management. In this algorithm the sliding window is parameter- and assumption-free, in the sense that it automatically detects and adapts to the current rate of change. The window is not maintained explicitly but compressed using a variant of the exponential histogram technique [14]. The expected values Ŵ = total / width of the respective windows are calculated.

If change is detected, a change alarm is raised. Using the expected values, the algorithm decides which sub-windows are dropped. The algorithm then generates one new classifier ht, which is combined with all previous classifiers to create the composite hypothesis Ht. The decision of Ht serves as the ensemble decision. The error of the composite hypothesis, Et, is computed on the new dataset, which is then used to update the distribution weights.
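The sub-window comparison that drives the change alarm can be sketched as follows. This is a simplified stand-in for the test of [13]: it stores the window explicitly and tries every cut point, whereas the actual method compresses the window with exponential histograms [14]; the threshold form and δ value are illustrative.

```python
import math

def detect_change(window, delta=0.002):
    """Try every cut point, compare the means of the two sub-windows, and
    report the first cut where they differ by more than a Hoeffding-style
    threshold; None means no change detected."""
    n, total = len(window), sum(window)
    head = 0.0
    for i in range(1, n):
        head += window[i - 1]
        mu0, mu1 = head / i, (total - head) / (n - i)
        m = 1.0 / (1.0 / i + 1.0 / (n - i))      # harmonic mean of sub-window sizes
        eps = math.sqrt(math.log(4.0 / delta) / (2.0 * m))
        if abs(mu0 - mu1) > eps:
            return i                              # older portion [:i] would be dropped
    return None

stable = [0.1] * 60
drifted = [0.1] * 60 + [0.9] * 60
```

On `stable` no cut separates the means, so the window keeps growing; on `drifted` the detector fires and the pre-change prefix is dropped.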
Once the distribution is updated, the algorithm calls the Base classifier and asks it to create the tth classifier ht from the current training dataset Dt, i.e. ht : X → Y. All generated classifiers hk, k = 1, ..., t, are evaluated on the current dataset by computing εtk, the error of the kth classifier hk at the tth time step:

    εtk = Σ(i=1..mt) Dt(i) · [hk(xi) ≠ yi],  for k = 1, 2, ..., t    (2)

At the current time step t, we thus have t error measures, one for each classifier generated. The error calculated using equation (2) is then used to update the weight, which is dynamically updated using equation (3):

    w'k = (1 − εtk) / εtk    (3)

Using this updated weight, the classifier is constructed. Evaluating classifier performance is important in a concept drifting environment: if the performance of a classifier goes down, the algorithm drops that classifier and adds the next classifier to the ensemble, maintaining the ensemble size. The classifier assigns dynamic sample weights. It keeps a window of length W using only O(log W) memory and O(log W) processing time per item, rather than the O(W) one expects from a naïve implementation. The window is used as a change detector, since it shrinks if and only if there has been significant change in recent examples, and as an estimator of the current average of the sequence it is reading since, with high probability, older parts of the window with a significantly different average are automatically dropped. The proposed classifier uses a Hoeffding tree as base learner. Because of this, the algorithm works faster and performance increases.

Basically, the algorithm is lightweight, meaning it uses little memory. The windowing technique does not store the whole window explicitly; instead it stores only the statistics required for further computation. In the proposed algorithm, the data structure maintains an approximation of the number of 1's in a sliding window of length W with logarithmic memory and update time. The data structure is adaptive in that it can provide this approximation simultaneously for about O(log W) sub-windows whose lengths follow a geometric law, with no memory overhead with respect to keeping the count for a single window. Keeping exact counts for a fixed window size is impossible in sublinear memory. This problem can be tackled by shrinking or enlarging the window strategically so that what would otherwise be an approximate count happens to be exact. More precisely, the algorithm chooses a parameter M which controls the amount of memory used, O(M log(W/M)) words.

Fig. 2. Proposed algorithm.

IV. EXPERIMENTS AND RESULTS

The proposed method has been tested on both real and synthetic datasets. The proposed algorithm has been compared with OzaBag [15], OzaBoost [15], OCBoost [16], OzaBagADWIN [17] and OzaBagASHT [17]. The experiments were performed on a 2.59 GHz Intel Dual Core processor with 3 GB RAM, running UBUNTU 8.04. For testing and comparison, the MOA [18] tool and the Interleaved Test-Then-Train method were used.

A. Experiment Using Synthetic Data

The synthetic data sets are generated with drifting concepts based on the random radial basis function (RBF) and SEA generators.

Random Radial Basis Function Generator- The RBF generator works as follows. A fixed number of random centroids are generated; each center has a random position, a single standard deviation, a class label and a weight. New examples are generated by selecting a center; a displacement is randomly drawn from a Gaussian distribution with standard deviation determined by the chosen centroid, which also determines the class label of the example. This effectively creates a normally distributed hypersphere of examples surrounding each central point, with varying densities. Only numeric attributes are generated. Drift is introduced by moving the centroids with constant speed, initialized by a drift parameter.

SEA Generator- This artificial dataset contains abrupt concept drift, first introduced in [13]. It is generated using three attributes, where only the first two attributes are relevant. All three attributes have values between 0 and 10. The points of the dataset are divided into 4 blocks with different concepts. In each block, the classification is done using f1 + f2 ≤ θ, where f1 and f2 represent the first two attributes and θ is a threshold value.
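The SEA concept can be sketched in a few lines of Python (illustrative only; it follows the description in the text, using the block thresholds 8, 9, 7 and 9.5 and 10% class noise reported there):

```python
import random

def sea_stream(n_per_block, thresholds=(8.0, 9.0, 7.0, 9.5), noise=0.1, seed=1):
    """Generate SEA examples: three attributes in [0, 10], of which only the
    first two matter; the label is f1 + f2 <= theta for the current block's
    threshold, with a `noise` fraction of labels flipped."""
    rnd = random.Random(seed)
    for theta in thresholds:                  # abrupt drift at block borders
        for _ in range(n_per_block):
            f1, f2, f3 = (rnd.uniform(0, 10) for _ in range(3))
            label = f1 + f2 <= theta
            if rnd.random() < noise:
                label = not label             # 10% class noise
            yield (f1, f2, f3), label

data = list(sea_stream(1000))
```

Each threshold change is an abrupt drift that a stream classifier must track.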
The most frequent threshold values are 8, 9, 7 and 9.5 for the four data blocks. 10% class noise was then introduced into each block of data by randomly changing the class value of 10% of the instances.

LED Generator- The generated dataset is for predicting the digit displayed on a seven-segment LED display, where each attribute has a 10% chance of being inverted. It has an optimal Bayes classification rate of 74%. The particular configuration of the generator used for the experiments (led) produces 24 binary attributes, 17 of which are irrelevant.

For all the above datasets the ensemble size has been set to 10 and the dataset size has been set to 1 million instances. All the experiments are carried out using the same parameter settings and the prequential method.

Prequential method- In the prequential method, the error of a model is computed from the sequence of examples. For each example in the stream, the current model makes a prediction based only on the example's attribute values. The prequential error is computed as an accumulated sum of a loss function between the predicted and observed values.

Comparisons in terms of time (in sec) and memory (in MB) are tabulated in Table I and Table II, respectively. The learning curves using the RBF, SEA and LED datasets are depicted in Fig. 3, Fig. 4 and Fig. 5.

TABLE I: COMPARISON IN TERMS OF TIME (IN SEC) USING SYNTHETIC DATASETS

TABLE II: COMPARISON IN TERMS OF MEMORY (IN MB) USING SYNTHETIC DATASETS

Fig. 3. Learning curves using RBF dataset.
Fig. 4. Learning curves using SEA dataset.
Fig. 5. Learning curves using LED dataset.

B. Experiment Using Real Datasets

In this subsection, the proposed method has been tested on real data sets from the UCI machine learning repository (http://archive.ics.edu/ml/datasets.html). The proposed classifier is compared with OzaBag, OzaBagASHT, OzaBagADWIN, OzaBoost and OCBoost. The results are calculated in terms of time, accuracy and memory, tabulated in Table III, Table IV and Table V.

TABLE III: COMPARISON IN TERMS OF TIME (IN SEC) USING REAL DATASETS

TABLE IV: COMPARISON IN TERMS OF ACCURACY (IN %) USING REAL DATASETS
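The prequential (Interleaved Test-Then-Train) procedure can be sketched as follows; the learner interface and the `MajorityClass` stand-in are hypothetical illustrations, not MOA's actual API:

```python
def prequential_error(stream, model, loss=lambda y_hat, y: int(y_hat != y)):
    """Interleaved Test-Then-Train: each example is first used to test the
    model, then to train it; the accumulated loss is the prequential error."""
    total, n = 0.0, 0
    for x, y in stream:
        total += loss(model.predict(x), y)   # test on the example first
        model.learn(x, y)                    # then train on it
        n += 1
    return total / n if n else 0.0

class MajorityClass:
    """Toy stand-in learner: always predicts the most frequent label so far."""
    def __init__(self):
        self.counts = {}
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None
    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1

err = prequential_error((((i,), 1) for i in range(100)), MajorityClass())
```

Because every example is tested before it is learned, no separate holdout set is needed, which suits one-pass stream evaluation.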
TABLE V: COMPARISON IN TERMS OF MEMORY (IN MB) USING REAL DATASETS

V. CONCLUSION

In this paper we have investigated the major issues in classifying large-volume, high-speed and dynamically changing streaming data and proposed a novel approach, the adaptive ensemble boosting classifier, which uses an adaptive sliding window and a Hoeffding tree to improve performance.

Compared with other classifiers, the adaptive ensemble boosting method achieves distinct features: it is dynamically adaptive, uses less memory and processes data fast. Thus, the adaptive ensemble boosting classifier represents a new methodology for effective classification of dynamic, fast-growing and large-volume data streams.

REFERENCES

[1] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, no. 1, pp. 69-101, 1996.
[2] Y. Freund and R. Schapire, "Experiments with a new boosting algorithm," in Proc. International Conference on Machine Learning, 1996, pp. 148-156.
[3] P. Domingos and G. Hulten, "Mining high-speed data streams," in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press, 2000, pp. 71-80.
[4] G. Hulten, L. Spencer, and P. Domingos, "Mining time-changing data streams," in Proc. ACM SIGKDD, San Francisco, CA, USA, 2001, pp. 97-106.
[5] W. N. Street and Y. Kim, "A streaming ensemble algorithm (SEA) for large-scale classification," in KDD'01: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press, 2001, pp. 377-382.
[6] H. Wang, W. Fan, P. S. Yu, and J. Han, "Mining concept drifting data streams using ensemble classifiers," in KDD'03: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, ACM Press, 2003, pp. 226-235.
[7] J. Z. Kolter and M. A. Maloof, "Dynamic weighted majority: a new ensemble method for tracking concept drift," in 3rd International Conference on Data Mining, ICDM'03, IEEE CS Press, 2003, pp. 123-130.
[8] B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan, "Maintaining variance and k-medians over data stream windows," in Proceedings of the 22nd Symposium on Principles of Database Systems, 2003, pp. 234-243.
[9] J. Gama, R. Rocha, and P. Medas, "Accurate decision trees for mining high-speed data streams," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 523-528.
[10] J. Gama, P. Medas, G. Castillo, and P. Rodrigues, "Learning with drift detection," in Advances in Artificial Intelligence - SBIA 2004, 17th Brazilian Symposium on Artificial Intelligence, vol. 3171 of Lecture Notes in Computer Science, Springer Verlag, 2004, pp. 286-295.
[11] J. Z. Kolter and M. A. Maloof, "Using additive expert ensembles to cope with concept drift," in Proc. International Conference on Machine Learning (ICML), Bonn, Germany, 2005, pp. 449-456.
[12] G. Cormode, S. Muthukrishnan, and W. Zhuang, "Conquering the divide: Continuous clustering of distributed data streams," in ICDE, 2007, pp. 1036-1045.
[13] A. Bifet and R. Gavaldà, "Learning from time changing data with adaptive windowing," in SIAM International Conference on Data Mining, 2007, pp. 443-449.
[14] M. Datar, A. Gionis, P. Indyk, and R. Motwani, "Maintaining stream statistics over sliding windows," SIAM Journal on Computing, vol. 14, no. 1, pp. 27-45, 2002.
[15] N. Oza and S. Russell, "Online bagging and boosting," in Artificial Intelligence and Statistics, Morgan Kaufmann, 2001, pp. 105-112.
[16] R. Pelossof, M. Jones, I. Vovsha, and C. Rudin, "Online coordinate boosting," http://arxiv.org/abs/0810.4553, 2008.
[17] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà, "New ensemble methods for evolving data streams," in KDD'09, ACM, Paris, 2009, pp. 139-148.
[18] G. Holmes, R. Kirkby, and B. Pfahringer, "MOA: Massive Online Analysis," http://sourceforge.net/projects/moa-datasream, 2007.

K. K. Wankhade received the B. E. degree in Information Technology from Swami Ramanand Teerth Marathwada University, Nanded, India in 2007 and the M. Tech. degree in Computer Engineering from University of Pune, Pune, India in 2010. He is currently working as Assistant Professor in the Department of Information Technology at G. H. Raisoni College of Engineering, Nagpur, India. He has a number of publications in reputed journals and international conferences such as Springer and IEEE. His research is on data stream mining, machine learning, decision support systems, artificial neural networks and embedded systems. His book, Data Streams Mining: Classification and Application, was published by LAP Publication House, Germany, 2010. Mr. Wankhade is a member of the IACSIT and IEEE organizations. He is currently working as a reviewer for Springer's Evolving Systems journal and the IJCEE journal.

S. S. Dongre received the B. E. degree in Computer Science and Engineering from Pt. Ravishankar Shukla University, Raipur, India in 2007 and the M. Tech. degree in Computer Engineering from University of Pune, Pune, India in 2010. She is currently working as Assistant Professor in the Department of Computer Science and Engineering at G. H. Raisoni College of Engineering, Nagpur, India. She has a number of publications in reputed international conferences such as IEEE, and in journals. Her research is on data stream mining, machine learning, decision support systems, ANN and embedded systems. Her book, Data Streams Mining: Classification and Application, was published by LAP Publication House, Germany, 2010. Ms. Dongre is a member of the IACSIT, IEEE and ISTE organizations.