3. Introduction
• What is data mining?
• Extracting knowledge from historical data.
• What is data stream mining?
• Extracting knowledge from real-time, high-speed data streams.
• Why do we use data stream mining?
5. Data Stream Classification
• Uses past labeled data to build a classification model
• Predicts the labels of future instances using the model
• Helps decision making
Fig: Network traffic reaches the firewall, where the classification model separates attack traffic (block and quarantine) from benign traffic (passed to the server); expert analysis and labeling feeds the model update.
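The train-then-predict idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the nearest-centroid classifier, the two traffic features, and the sample values are all hypothetical and are not taken from the slides.

```python
from collections import defaultdict
from math import dist

def train_centroids(labeled):
    """Build one centroid per class from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in labeled:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in total)
            for label, total in sums.items()}

def predict(model, features):
    """Return the label of the class whose centroid is closest."""
    return min(model, key=lambda label: dist(model[label], features))

def firewall_action(model, features):
    """Map the predicted label to the action shown in the figure."""
    if predict(model, features) == "attack":
        return "block and quarantine"
    return "forward to server"

# Hypothetical features: (packets per second, mean payload size in bytes).
past_labeled = [
    ((10.0, 500.0), "benign"),
    ((12.0, 480.0), "benign"),
    ((900.0, 40.0), "attack"),
    ((950.0, 60.0), "attack"),
]
model = train_centroids(past_labeled)
```

Calling `firewall_action(model, (920.0, 50.0))` would then route a new, unlabeled instance using only the model built from past labeled data.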
6. Decision Trees
• A decision tree is a classification model. Its
structure resembles a general tree or a flow
chart.
– Internal node: tests the value of an
attribute.
– Leaf node: holds a class label.
Fig: Decision Tree of Weather
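A small sketch may make the node roles concrete. The tree below is assumed to be the familiar play/don't-play weather example; the figure itself is not reproduced here, so the attribute names and branch values are illustrative only.

```python
# Internal node: (attribute_name, {attribute_value: subtree}).
# Leaf node: a plain string holding the class label.
WEATHER_TREE = ("outlook", {
    "sunny": ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy": ("windy", {True: "no", False: "yes"}),
})

def classify(tree, instance):
    """Walk from the root to a leaf, testing one attribute per internal node."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[instance[attribute]]
    return tree

example = {"outlook": "sunny", "humidity": "normal", "windy": False}
label = classify(WEATHER_TREE, example)
```

Each loop iteration corresponds to one internal-node test; the loop ends when a leaf (a class label) is reached.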
7. Decision Tree (cont...)
• Limitations
– Classic decision tree learners assume that all
training data can be stored in main memory
at the same time.
– Disk-based decision tree learners repeatedly read
the training data from disk sequentially, which
becomes expensive as the data grows.
8. VFDT
• VFDT (Very Fast Decision Tree) takes less time to build than a classic decision tree.
• To find the best attribute at a node, it uses only a small
subset of the training examples that pass through that node.
– Given a stream of examples, the first ones are used to
choose the root attribute.
– Once the root attribute is chosen, the succeeding
examples are passed down to the corresponding
leaves and used to choose the attributes there, and
so on recursively.
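How many examples count as a "small subset" is not stated on the slide. VFDT is generally described as deciding this with the Hoeffding bound, so the sketch below assumes that criterion; the function names, the default `delta`, and the gain values are illustrative assumptions, not taken from the slides.

```python
from math import log, sqrt

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the mean observed over
    n examples is within epsilon of the true mean of a variable whose values
    span value_range."""
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))

def should_split(gains, n, delta=1e-7, value_range=1.0):
    """Decide whether the examples seen so far at a leaf are enough.

    gains maps each candidate attribute to its observed information gain.
    The leaf is split once the best attribute beats the runner-up by more
    than the Hoeffding bound. value_range=1.0 assumes a two-class problem,
    where information gain lies in [0, 1].
    """
    ranked = sorted(gains.values(), reverse=True)
    if len(ranked) < 2:
        return False
    return ranked[0] - ranked[1] > hoeffding_bound(value_range, delta, n)

# Hypothetical gains measured at one leaf.
observed_gains = {"outlook": 0.25, "humidity": 0.15, "windy": 0.05}
```

With these numbers the gap between the two best attributes is 0.10, so the leaf would keep collecting examples until the bound drops below that gap.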
11. The data stream is divided into equal-sized chunks
Fig: Work flow for identifying concept evolution (diagram not reproduced). Steps recoverable from the diagram labels:
• Input: chunks of the stream are fed to the algorithm, which uses the classifier ensemble M and an outlier detection module.
• Outlier instances are kept in a buffer.
• Instances in the buffer are grouped into clusters; each cluster is transformed into a pseudopoint data structure.
• Set of pseudopoints H: each pseudopoint stores (centroid, weight, radius).
• For another instance, a q-NSC value is calculated.
• If tp is greater than the threshold, the corresponding classifier votes in favor of another class.
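The pseudopoint summary (centroid, weight, radius) and the q-NSC step named in the work flow can be sketched as follows. Several details are assumptions and should be checked against the original method: the radius is taken as the distance from the centroid to the farthest summarized instance, an outlier is taken to be an instance outside every pseudopoint's hypersphere, and q-NSC is taken to be the q-neighborhood silhouette coefficient, with all helper names being hypothetical.

```python
from math import dist
from typing import NamedTuple

class Pseudopoint(NamedTuple):
    centroid: tuple
    weight: int    # number of instances summarized
    radius: float  # assumed: distance from centroid to farthest instance

def summarize(cluster):
    """Replace a cluster of raw instances with one pseudopoint."""
    dims = len(cluster[0])
    centroid = tuple(sum(p[d] for p in cluster) / len(cluster)
                     for d in range(dims))
    radius = max(dist(centroid, p) for p in cluster)
    return Pseudopoint(centroid, len(cluster), radius)

def is_outlier(x, pseudopoints):
    """Assumed test: x lies outside every pseudopoint's hypersphere."""
    return all(dist(x, h.centroid) > h.radius for h in pseudopoints)

def mean_q_nearest(x, points, q):
    """Mean distance from x to its q nearest neighbors among points."""
    nearest = sorted(dist(x, p) for p in points)[:q]
    return sum(nearest) / len(nearest)

def q_nsc(x, other_outliers, instances_by_class, q):
    """Assumed q-NSC: positive when x is closer to the other buffered
    outliers than to its nearest existing class."""
    d_out = mean_q_nearest(x, other_outliers, q)
    d_min = min(mean_q_nearest(x, pts, q)
                for pts in instances_by_class.values())
    return (d_min - d_out) / max(d_min, d_out)

# Hypothetical data: one known class and two buffered outliers near x.
known = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
H = [summarize(known)]
x = (10.0, 10.0)
score = q_nsc(x, [(10.0, 11.0), (11.0, 10.0)], {"A": known}, q=2)
```

Storing only pseudopoints instead of raw instances keeps memory bounded, which is the reason a stream classifier would summarize clusters this way.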
13. Applications
• Applicable to many domains, such as:
– Intrusion detection systems
– Share market data
– Security monitoring
– Network monitoring and traffic engineering
– Business: credit card transaction flows
– Telecommunication calling records
– Web logs and web page click streams
14. Conclusion
• In data stream classification, the VFDT algorithm classifies
high-dimensional data efficiently, including data that belongs to another class.
• The work also shows two key mechanisms of the technique for
detecting another class: outlier detection and multiple class
detection.