• What is data mining?
• Extracting knowledge from historical data.
• What is data stream mining?
• Extracting knowledge from real-time, high-speed data streams.
• Why use data stream mining?
– Continuously flowing data
– Network traffic data
– Call center data
Data Stream Classification
• Uses past labeled data to build a classification model
• Predicts the labels of future instances using the model
• Helps decision making
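As a minimal sketch of the two steps above (build a model from past labeled data, then predict the labels of future instances), the following uses a nearest-centroid classifier. The classifier choice, the names `fit_centroids` and `predict`, and the toy traffic data are illustrative assumptions, not taken from the slides.

```python
import math
from collections import defaultdict

def fit_centroids(instances, labels):
    """Build a model from past labeled data: one mean vector per class."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(instances, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(model, x):
    """Predict the label of a future instance: the class with the closest centroid."""
    return min(model, key=lambda y: math.dist(model[y], x))

# Hypothetical past data: (packets per second, mean packet size) -> traffic label
past_x = [(10.0, 500.0), (12.0, 480.0), (900.0, 60.0), (950.0, 64.0)]
past_y = ["normal", "normal", "attack", "attack"]

model = fit_centroids(past_x, past_y)
print(predict(model, (920.0, 70.0)))  # label for a new, unlabeled instance
```

Any classifier could stand in for the centroid model here; the point is only the separation between a training step on past data and a prediction step on future instances.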
• A decision tree is a classification model. Its
structure is like a general tree or a flowchart.
– Internal node: tests an attribute.
– Leaf node: holds a class label.
Fig: Decision Tree of Weather
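The internal-node and leaf-node roles can be sketched in a few lines of Python. The figure itself is not reproduced in this text, so the tree below is an assumed version of the common weather ("play outside?") example; the attribute names, values, and labels are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    label: str  # a leaf node holds a class label

@dataclass
class Node:
    attribute: str  # an internal node tests one attribute
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)

def classify(tree, instance):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[instance[tree.attribute]]
    return tree.label

# Assumed weather tree (not copied from the figure)
weather_tree = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

print(classify(weather_tree, {"outlook": "sunny", "humidity": "high"}))
```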
Decision Tree (cont...)
– Classic decision trees assume all training data
can be stored simultaneously in main memory.
– Disk-based decision trees repeatedly read the
training data from disk sequentially.
• VFDT (Very Fast Decision Tree) takes less time than a classic decision tree.
• To find the best attribute at a node, it considers only a small
subset of the training examples that pass through that node.
– Given a stream of examples, use the first ones to
choose the root attribute.
– Once the root attribute is chosen, the successive
examples are passed down to the corresponding
leaves, and used to choose the attribute there, and
so on recursively.
G(Car Type) - G(Gender) > ε
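The slide does not define ε. In the standard VFDT formulation it is the Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of the split measure G, δ is the allowed error probability, and n is the number of examples seen at the node. The sketch below applies that test; the G values, R = 1, and δ are hypothetical numbers chosen only for illustration.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(g_best, g_second, value_range, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (g_best - g_second) > hoeffding_bound(value_range, delta, n)

# Hypothetical values: G(Car Type) = 0.30, G(Gender) = 0.20, R = 1, delta = 1e-7
for n in (200, 2000):
    eps = hoeffding_bound(1.0, 1e-7, n)
    decision = should_split(0.30, 0.20, 1.0, 1e-7, n)
    print(f"n={n}: epsilon={eps:.4f}, split={decision}")
```

With these numbers the gap of 0.10 is too small to justify a split after 200 examples but large enough after 2000, which is how the bound lets a node decide from a small sample instead of the full data set.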
Fig: Workflow for identifying concept evolution. Labels in the flowchart:
– The data stream is divided into equal-sized chunks
– Outlier detection module
– Classifier ensemble M
– If tp is greater than the threshold
– Set of pseudopoints H
– Calculate q-NSC value
– Assigned to every instance in pseudopoint
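The first step of this workflow, dividing the data stream into equal-sized chunks, might be sketched as follows. The function name `chunk_stream` and the chunk size are assumptions made for illustration; the slides do not specify either.

```python
from itertools import islice

def chunk_stream(stream, chunk_size):
    """Yield consecutive equal-sized chunks from a (possibly unbounded) stream."""
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if len(chunk) < chunk_size:
            break  # this demo drops an incomplete trailing chunk
        yield chunk

# Hypothetical stream of 7 instances, chunk size 3
chunks = list(chunk_stream(range(7), 3))
print(chunks)
```

Each full chunk would then be handed to the later stages named in the figure, such as the outlier detection module and the classifier ensemble M.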
• Applicable to many domains, such as:
– Intrusion detection systems
– Share market data
– Network monitoring and traffic engineering
– Business: credit card transaction flows
– Telecommunication calling records
– Web logs and web page click streams
• In data stream classification, the VFDT algorithm efficiently
classifies high-dimensional data into another (novel) class.
• VFDT also shows two key mechanisms of the another-class
detection technique: outlier detection and multiple class