Introduction to Data Mining

An introduction to the data mining process and the implementation of three algorithms.

Published in: Technology, Education

  1. EVALUATION AND VISUALIZATION OF DIFFERENT DATA MINING TECHNIQUES
     INTRODUCTION TO DATA MINING BY SUMAIRA S.
  2. Data Mining Process
  3. The purpose of this project is to gain an understanding of the data mining process by:
     • Implementing one or more data mining algorithms
     • Visualizing them
     • Comparing their performance on datasets
     Another aspect was to provide visual tutorials and detailed help for these algorithms.
  4. WHAT IS DATA MINING?
     • Originally developed to act as expert systems to solve problems
     • Data mining can be used in any organization that needs to find patterns or relationships in its data
     • Different types of data mining
  5. BASIC FEATURES OF THE PROJECT
     • Handling different types of data
     • Preprocessing of data
     • Algorithm implementation
     • Visualization of the data mining model
     • Comparison of different data mining algorithms
     • Help and visual tutorials
  6. HANDLING DIFFERENT DATA FORMATS
     The system supports the following types of data files:
     • Text data file handling
         • CSV (comma-separated values) file
         • Any user-defined format
     • Database data file handling
         • MS Access data file
         • MS SQL data file
     • XML data file handling
         • XML data file
  7. PREPROCESSING OF DATA
     Preprocessing of data includes:
     • Filling in missing values
     • Ignoring rows
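The two preprocessing options on this slide could be sketched as follows. This is a minimal illustration, not the project's own code: the function names `fill_missing` and `ignore_rows` are hypothetical, missing values are assumed to be represented by `None`, and filling with the column mean is only one possible filling strategy.

```python
from statistics import mean

def fill_missing(rows, missing=None):
    """Replace each missing entry with the mean of the known values in its column.

    Assumes every column has at least one known value.
    """
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not missing) for col in columns]
    return [
        [col_means[j] if v is missing else v for j, v in enumerate(row)]
        for row in rows
    ]

def ignore_rows(rows, missing=None):
    """Drop every row that contains at least one missing entry."""
    return [row for row in rows if all(v is not missing for v in row)]

# Hypothetical numeric dataset with two missing entries.
data = [[1.0, 2.0], [None, 4.0], [3.0, None]]
filled = fill_missing(data)
kept = ignore_rows(data)
```

Filling keeps every row at the cost of inventing values, while ignoring rows keeps only observed values at the cost of a smaller dataset.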
  8. ALGORITHMS’ IMPLEMENTATION
     • Clustering
         • Partitional Clustering Algorithm
             • K-Means Algorithm
         • Hierarchical Clustering Algorithms
             • Single Linkage Algorithm
             • Weighted Average Algorithm
             • Complete Linkage Algorithm
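The three hierarchical algorithms listed on this slide differ mainly in how the distance to a newly merged cluster is computed. The sketch below shows one common formulation; it assumes that "Weighted Average" refers to WPGMA (the simple average of the two pre-merge distances), which the slides do not confirm. The function name `merged_distance` is hypothetical.

```python
def merged_distance(d_ik, d_jk, method):
    """Distance from cluster k to the cluster formed by merging clusters i and j.

    d_ik and d_jk are the distances from k to i and from k to j before the merge.
    """
    if method == "single":      # nearest members decide
        return min(d_ik, d_jk)
    if method == "complete":    # farthest members decide
        return max(d_ik, d_jk)
    if method == "weighted":    # assumed WPGMA: plain average of the two distances
        return (d_ik + d_jk) / 2
    raise ValueError(f"unknown linkage method: {method!r}")

# Example: cluster k is 2.0 away from i and 6.0 away from j.
single = merged_distance(2.0, 6.0, "single")
complete = merged_distance(2.0, 6.0, "complete")
weighted = merged_distance(2.0, 6.0, "weighted")
```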
  9. VISUALIZATION OF DATA MINING MODEL
     • XY scatter chart visualization
     • Dendrogram
     • Pie chart
     • Curve graph
  10. COMPARISON OF DIFFERENT DATA MINING ALGORITHMS
      • Data file comparison
      • Running time
      • Memory usage
      • CPU usage
      • Precision/recall
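The comparison criteria on this slide could be collected along the following lines. This is a sketch under stated assumptions, not the project's measurement code: `measure` and `pairwise_precision_recall` are hypothetical names, memory is approximated by the peak of Python-level allocations reported by `tracemalloc`, and precision/recall is computed over pairs of points against known labels, which is only one of several ways to score a clustering.

```python
import time
import tracemalloc
from itertools import combinations

def measure(func, *args):
    """Run func(*args) and report wall time, CPU time, and peak traced memory."""
    tracemalloc.start()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, {"wall_seconds": wall, "cpu_seconds": cpu, "peak_bytes": peak}

def pairwise_precision_recall(predicted, truth):
    """Score a clustering by counting pairs of points placed together."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1          # pair correctly grouped together
        elif same_pred:
            fp += 1          # pair grouped together but should be apart
        elif same_true:
            fn += 1          # pair split apart but should be together
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels for four points.
precision, recall = pairwise_precision_recall([0, 0, 1, 1], [0, 0, 0, 1])
result, stats = measure(sorted, [3, 1, 2])
```

Running each algorithm through the same `measure` wrapper on the same data file keeps the comparison like for like.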
  11. K-MEANS ALGORITHM
      • K-means was introduced by MacQueen in 1967
  12. THE K-MEANS CLUSTERING METHOD
      [Figure: scatter plots on 0 to 10 axes illustrating each step for K=2]
      1. Arbitrarily choose K objects as the initial cluster centers
      2. Assign each object to the most similar center
      3. Update the cluster means
      4. Reassign the objects and update the cluster means again, repeating these two steps
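The loop shown on this slide (choose K initial centers, assign each object to the most similar center, update the cluster means, reassign) can be sketched in a few lines. This is an illustrative version, not the project's implementation: it assumes squared Euclidean distance as the similarity measure, stops when the assignments no longer change, and the names `k_means` and `squared_distance` are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, max_iter=100, seed=0):
    """Return (centers, assignment) after running the basic k-means loop."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)      # arbitrarily choose k objects as centers
    assignment = None
    for _ in range(max_iter):
        # Assign each object to the most similar (closest) center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:  # nothing was reassigned: converged
            break
        assignment = new_assignment
        # Update each cluster mean; an empty cluster keeps its old center.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assignment

# Hypothetical 2-D data with two well-separated groups, K=2 as on the slide.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, assignment = k_means(points, k=2)
```

Because the initial centers are chosen at random, different seeds can give different results on less clearly separated data.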
  13. SINGLE LINKAGE HIERARCHICAL CLUSTERING
      1. Say “Every point is its own cluster”
      2. Find the “most similar” pair of clusters
  14. SINGLE LINKAGE HIERARCHICAL CLUSTERING
      1. Say “Every point is its own cluster”
      2. Find the “most similar” pair of clusters
      3. Merge the pair into a parent cluster
  15. SINGLE LINKAGE HIERARCHICAL CLUSTERING
      1. Say “Every point is its own cluster”
      2. Find the “most similar” pair of clusters
      3. Merge the pair into a parent cluster
      4. Repeat
  16. SINGLE LINKAGE HIERARCHICAL CLUSTERING
      1. Say “Every point is its own cluster”
      2. Find the “most similar” pair of clusters
      3. Merge the pair into a parent cluster
      4. Repeat
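The four steps on the single linkage slides can be sketched as below. This is a deliberately simple version, assuming Euclidean distance as the similarity measure and recomputing cluster distances on every pass rather than caching them; the names `single_linkage` and `euclidean` are hypothetical and not taken from the project.

```python
from itertools import combinations

def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def single_linkage(points, num_clusters=1):
    """Merge clusters until num_clusters remain; return (clusters, merges).

    Clusters are lists of point indices. Each merge is recorded as
    (left cluster, right cluster, distance), which is the information
    a dendrogram is drawn from.
    """
    clusters = [[i] for i in range(len(points))]   # every point is its own cluster
    merges = []
    while len(clusters) > num_clusters:
        best = None
        # Find the most similar pair: smallest distance between any two members.
        for a, b in combinations(range(len(clusters)), 2):
            d = min(
                euclidean(points[i], points[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]          # merge into a parent cluster
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)                     # repeat with the new cluster set
    return clusters, merges

# Hypothetical 2-D data with two obvious groups.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
clusters, merges = single_linkage(points, num_clusters=2)
```

Letting the loop run down to a single cluster yields the full merge history for a dendrogram, while stopping earlier gives a flat clustering.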
  17. THANK YOU
      Presentation by: Sumaira Sohail
