Data mining technique for classification and feature evaluation using stream mining (Ranjit Banshpal)


Published in: Education, Technology

  • Data streams are continuous flows of data, for example network traffic, sensor data, and call center records.

    1. Data Mining Technique for Classification and Feature Evaluation Using Stream Mining, by Ranjit R. Banshpal
    2. Outline
        • Introduction
        • Data stream classification
        • Decision tree
        • VFDT
        • Challenges
        • Applications
        • Conclusion
        • References
    3. Introduction
        • What is data mining?
            – Extracting knowledge from historical data.
        • What is data stream mining?
            – Extracting knowledge from high-speed data streams in real time.
        • Why do we use data stream mining?
    4. Introduction (cont.)
        • Examples of continuous data flows:
            – Network traffic data
            – Sensor data
            – Call center data
    5. Data Stream Classification
        • Uses past labeled data to build a classification model.
        • Predicts the labels of future instances using the model.
        • Helps decision making.
        Fig labels: network traffic, firewall, classification model, attack traffic (block and quarantine), benign traffic (server), expert analysis and labeling, model update.
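The idea of building a model from past labeled data and using it on future instances can be sketched with a "test-then-train" loop: each arriving instance is first predicted, then used to update the model once its label is known. The nearest-centroid model, the class name `NearestCentroid`, and the two-feature traffic vectors below are assumptions made for illustration only; they are not the classifier used in the slides.

```python
import math


class NearestCentroid:
    """Toy incremental classifier (an illustrative assumption, not from the slides)."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sums
        self.counts = {}  # label -> number of examples seen

    def learn(self, x, label):
        # Update the running sums so the centroid never needs old examples.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet

        def distance(label):
            centroid = [s / self.counts[label] for s in self.sums[label]]
            return math.dist(x, centroid)

        return min(self.counts, key=distance)


def prequential_accuracy(model, stream):
    """Predict each instance first, then learn from its true label."""
    correct = total = 0
    for x, label in stream:
        prediction = model.predict(x)
        if prediction is not None:
            total += 1
            correct += prediction == label
        model.learn(x, label)
    return correct / total if total else 0.0


# Hypothetical traffic features, e.g. (packets per second, failed logins).
stream = [
    ([1.0, 1.0], "benign"),
    ([9.0, 9.0], "attack"),
    ([1.5, 0.5], "benign"),
    ([8.0, 9.5], "attack"),
]
model = NearestCentroid()
accuracy = prequential_accuracy(model, stream)
```

Because the model keeps only running sums and counts, memory use does not grow with the length of the stream.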
    6. Decision Trees
        • A decision tree is a classification model. Its structure is like a general tree or a flow chart.
            – Internal node: tests an attribute value.
            – Leaf node: holds a class label.
        Fig: Decision tree for weather data.
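The two node types described above can be written down as a small data structure. This is a minimal sketch: the weather tree below follows the common "play outside?" textbook example and is an assumption, since the slide's figure is not reproduced here; the names `Leaf`, `InternalNode`, and `classify` are likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # class label returned when this leaf is reached


@dataclass
class InternalNode:
    attribute: str  # attribute whose value is tested at this node
    children: Dict[str, "Node"] = field(default_factory=dict)


Node = Union[Leaf, InternalNode]


def classify(node: Node, example: Dict[str, str]) -> str:
    """Follow attribute tests from the root until a leaf is reached."""
    while isinstance(node, InternalNode):
        node = node.children[example[node.attribute]]
    return node.label


# Hypothetical weather tree (not taken from the slide's figure).
tree = InternalNode("outlook", {
    "sunny": InternalNode("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rain": InternalNode("wind", {"strong": Leaf("no"), "weak": Leaf("yes")}),
})

decision = classify(tree, {"outlook": "sunny", "humidity": "high", "wind": "weak"})
```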
    7. Decision Trees (cont.)
        • Limitations
            – Classic decision tree learners assume that all training data can be stored in main memory at the same time.
            – Disk-based decision tree learners repeatedly read the training data from disk sequentially.
    8. VFDT
        • VFDT (Very Fast Decision Tree) takes less time than a classic decision tree learner.
        • To find the best attribute at a node, it uses only a small subset of the training examples that pass through that node.
            – Given a stream of examples, use the first ones to choose the root attribute.
            – Once the root attribute is chosen, the successive examples are passed down to the corresponding leaves and used to choose the attribute there, and so on recursively.
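One way to see why a leaf does not need to store the examples that reach it: it can keep only counts per (attribute, value, class) and score each attribute from those counts. The sketch below assumes information gain as the scoring measure and nominal attributes; the class name `LeafStats` and the car-type and gender examples are illustrative assumptions, not code from the slides.

```python
import math
from collections import defaultdict


def entropy(class_counts):
    """Entropy in bits of a {label: count} mapping."""
    total = sum(class_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in class_counts.values() if c > 0)


class LeafStats:
    """Counts kept at one leaf; the examples themselves are discarded."""

    def __init__(self):
        self.n = 0
        self.class_counts = defaultdict(int)
        # counts[attribute][value][label] -> number of examples seen
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def update(self, example, label):
        self.n += 1
        self.class_counts[label] += 1
        for attribute, value in example.items():
            self.counts[attribute][value][label] += 1

    def information_gain(self, attribute):
        # Class entropy minus the weighted entropy after splitting on attribute.
        remainder = 0.0
        for value_counts in self.counts[attribute].values():
            weight = sum(value_counts.values()) / self.n
            remainder += weight * entropy(value_counts)
        return entropy(self.class_counts) - remainder

    def ranked_attributes(self):
        """Attributes ordered from highest to lowest gain."""
        return sorted(self.counts, key=self.information_gain, reverse=True)


# Hypothetical examples arriving at one leaf.
leaf = LeafStats()
for example, label in [
    ({"car_type": "sports", "gender": "m"}, "yes"),
    ({"car_type": "sports", "gender": "f"}, "yes"),
    ({"car_type": "normal", "gender": "m"}, "no"),
    ({"car_type": "normal", "gender": "f"}, "no"),
]:
    leaf.update(example, label)

best, second = leaf.ranked_attributes()[:2]
```

In this toy stream `car_type` separates the classes perfectly while `gender` carries no information, so the leaf would rank `car_type` first.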
    9. VFDT (cont.)
        Fig: A tree rooted at the test "Age < 30?" receives the data stream. When G(Car Type) - G(Gender) > ε at a leaf, that leaf is replaced by the test "Car Type = Sports Car?".
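The condition G(Car Type) - G(Gender) > ε in the figure compares the two best attributes against a margin ε. In the VFDT literature this margin is the Hoeffding bound, ε = sqrt(R² ln(1/δ) / (2n)), where R is the range of G, δ is the allowed error probability, and n is the number of examples seen at the leaf. The sketch below assumes information gain with two classes (so R = 1); the function names and the default δ are illustrative choices.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Margin epsilon: with probability 1 - delta, the true mean of a variable
    with the given range is within epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(gain_best: float, gain_second: float, n: int,
                 delta: float = 1e-7, value_range: float = 1.0) -> bool:
    """Split only when the best attribute leads the runner-up by more than epsilon."""
    return (gain_best - gain_second) > hoeffding_bound(value_range, delta, n)


# Hypothetical gains: the same 0.05 lead is not enough after 1,000 examples,
# but is enough after 10,000 because epsilon shrinks as n grows.
early = should_split(0.30, 0.25, n=1_000)
later = should_split(0.30, 0.25, n=10_000)
```

Because ε shrinks as n grows, a leaf with a clear winner splits after few examples, while a leaf with two close attributes waits for more data.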
    10. Challenges
        • Infinite length
        • Concept drift
        • Concept evolution
        • Feature evolution
    11. Fig: Workflow for identifying concept evolution.
        • Input: the data stream is divided into equal-sized chunks.
        • Instances are grouped into clusters, and each cluster is transformed into a pseudopoint data structure (centroid, weight, radius). H is the set of pseudopoints.
        • The outlier detection module of the classifier ensemble M places outlier instances in a buffer.
        • A q-NSC value is calculated for every instance in the buffer.
        • If tp is greater than the threshold, the corresponding classifier votes in favor of another (novel) class.
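The pseudopoint summary (centroid, weight, radius) and the outlier check in this workflow can be sketched as follows. This assumes the clusters have already been formed, takes the radius to be the distance from the centroid to the farthest member, and treats an instance as an outlier when it lies outside every pseudopoint; the names `Pseudopoint`, `summarize`, and `is_outlier` are illustrative, and the q-NSC and voting steps are not covered.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Pseudopoint:
    centroid: List[float]  # mean of the cluster members
    weight: int            # number of instances summarized
    radius: float          # distance from centroid to the farthest member


def summarize(cluster: Sequence[Sequence[float]]) -> Pseudopoint:
    """Replace a cluster of raw instances with its compact summary."""
    dims = len(cluster[0])
    centroid = [sum(p[d] for p in cluster) / len(cluster) for d in range(dims)]
    radius = max(math.dist(p, centroid) for p in cluster)
    return Pseudopoint(centroid, len(cluster), radius)


def is_outlier(x: Sequence[float], pseudopoints: Sequence[Pseudopoint]) -> bool:
    """True when x falls outside the radius of every pseudopoint."""
    return all(math.dist(x, pp.centroid) > pp.radius for pp in pseudopoints)


# Hypothetical cluster from one chunk of the stream.
pp = summarize([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
buffer = [x for x in ([1.5, 1.5], [5.0, 5.0]) if is_outlier(x, [pp])]
```

Only the instance far from the summarized cluster ends up in `buffer`; the raw cluster members can be discarded once the pseudopoint exists.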
    12. Feature Evolution
    13. Applications
        • Applicable to many domains, such as:
            – Intrusion detection systems
            – Share market data
            – Security monitoring
            – Network monitoring and traffic engineering
            – Business: credit card transaction flows
            – Telecommunication calling records
            – Web logs and web page click streams
    14. Conclusion
        • In data stream classification, the VFDT algorithm is efficient at classifying high-dimensional data, including instances of another (novel) class.
        • VFDT is presented with two key mechanisms of the technique for detecting another (novel) class: outlier detection and multiple-class detection.
    15. References
        [1] Mohammad M. Masud, Qing Chen, Latifur Khan, Charu C. Aggarwal, Jing Gao, Jiawei Han, “Classification and Adaptive Novel Class Detection of Feature-Evolving Data Streams,” IEEE Trans. on Knowledge and Data Eng., vol. 25, no. 7, July 2013.
        [2] Durga Toshniwal, Yogita K, “Clustering Techniques for Streaming Data–A Survey,” 3rd IEEE International Advance Computing Conference (IACC), 2013.
        [3] S. Hashemi, Y. Yang, Z. Mirzamomen, and M. Kangavari, “Adapted One-versus-All Decision Trees for Data Stream Classification,” IEEE Trans. Knowledge and Data Eng., vol. 21, no. 5, pp. 624-637, May 2012.
        [4] A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavalda, “New Ensemble Methods for Evolving Data Streams,” Proc. ACM SIGKDD 15th Int’l Conf. Knowledge Discovery and Data Mining, pp. 139-148, 2011.
    16. References (cont.)
        [5] C.C. Aggarwal, “On Classification and Segmentation of Massive Audio Data Streams,” Knowledge and Information System, vol. 20, pp. 137-156, July 2009.
        [6] M.M. Masud, J. Gao, L. Khan, J. Han, and B.M. Thuraisingham, “Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints,” IEEE Trans. Knowledge and Data Eng., vol. 23, no. 6, pp. 859-874, June 2011.
        [7] M.M. Masud, Q. Chen, L. Khan, C. Aggarwal, J. Gao, J. Han, and B.M. Thuraisingham, “Addressing Concept-Evolution in Concept-Drifting Data Streams,” Proc. IEEE Int’l Conf. Data Mining (ICDM), pp. 929-934, 2010.
        [8] M.-Y. Yeh, B.-R. Dai, and M.-S. Chen, “Clustering over multiple evolving streams by events and correlations,” IEEE Trans. on Knowl. and Data Eng., vol. 19, no. 10, pp. 1349–1362, Oct. 2009.
    17. Any questions?
    18. Thank you