Knowledge Discovery


Published in: Technology

  1. Knowledge Discovery
  2. Definition
     Knowledge discovery: "Non-trivial extraction of implicit, previously unknown and potentially useful information from data."
     Data mining: the step that detects patterns in the pre-processed (prepared) data. It is only one part of the knowledge discovery process.
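To make the distinction concrete, here is a minimal Python sketch that treats data mining as one step inside a larger process. The records, the function names (`preprocess`, `mine`, `discover`) and the choice of "drop incomplete records" as the preparation step are hypothetical illustrations, not a prescribed knowledge discovery pipeline; the mining step simply reports attribute values that occur in at least a given fraction of the prepared records.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing value.
RAW = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": None, "play": "yes"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "overcast", "play": "yes"},
]

def preprocess(records):
    """Preparation step (assumed here): drop records with any missing value."""
    return [r for r in records if all(v is not None for v in r.values())]

def mine(records, min_support=0.5):
    """Data mining step: find (attribute, value) pairs that occur in at
    least min_support of the prepared records."""
    counts = Counter((a, v) for r in records for a, v in r.items())
    n = len(records)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def discover(raw, min_support=0.5):
    """Whole process: prepare the data first, then mine the prepared data."""
    prepared = preprocess(raw)
    return mine(prepared, min_support)

if __name__ == "__main__":
    for (attr, value), support in sorted(discover(RAW).items()):
        print(f"{attr}={value}: support {support:.2f}")
```

Running it prints three attribute values, each with support 0.50. A fuller process would add more preparation (cleaning, selection, transformation) before mining and an evaluation step after it.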
  3. Applications
     Applications can be divided into four major kinds:
       Classification
       Numerical prediction
       Association
       Clustering
     Some examples:
       Automatic abstraction
       Financial forecasting
       Targeted marketing
       Medical diagnosis
       Credit card fraud detection
       Weather forecasting, etc.
  4. Labeled & Unlabeled data
     General terminology:
       Instances: the individual examples that make up a dataset
       Attributes: the variables that describe each instance
     Labeled data
       Has a specific designated attribute whose known values in some instances are used to predict its value in unseen instances
     Unlabeled data
       Has no such designated attribute to predict
     Supervised learning: data mining using labeled data
     Unsupervised learning: data mining using unlabeled data
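The terminology above can be sketched in Python by representing each instance as a dict whose keys are its attributes. The dataset, the attribute names (`income`, `age`, `risk`) and the helpers `split_by_label` and `learning_type` are hypothetical; the sketch only shows how a designated attribute separates labeled from unlabeled data.

```python
# Hypothetical dataset: each dict is one instance, each key an attribute.
# "risk" plays the role of the designated (label) attribute.
labeled = [
    {"income": 52_000, "age": 34, "risk": "low"},
    {"income": 18_000, "age": 22, "risk": "high"},
    {"income": 40_000, "age": 45, "risk": None},  # value unknown, to be predicted
]

# The same instances with no designated attribute: unlabeled data.
unlabeled = [{"income": r["income"], "age": r["age"]} for r in labeled]

def split_by_label(instances, target):
    """Split instances into those whose target value is known (used for
    learning) and those whose target value is unknown (to be predicted)."""
    known = [r for r in instances if r.get(target) is not None]
    unknown = [r for r in instances if r.get(target) is None]
    return known, unknown

def learning_type(instances, target=None):
    """Supervised if the instances carry the designated attribute, else unsupervised."""
    if target is not None and any(target in r for r in instances):
        return "supervised"
    return "unsupervised"

if __name__ == "__main__":
    known, unknown = split_by_label(labeled, "risk")
    print(len(known), "known,", len(unknown), "to predict")
    print(learning_type(labeled, "risk"), learning_type(unlabeled))
```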
  5. Labeled data
     The designated attribute can be of two types:
       Categorical attribute
         Takes a value from a fixed set of values (like an enumeration), e.g. 'very good', 'good', 'poor'
         Supervised learning with a categorical attribute is called classification
       Numerical attribute
         Can take any value from a continuous range of numbers
         Supervised learning with a numerical attribute is called regression
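A minimal Python sketch of this rule follows. The function `task_for_target` is hypothetical, and it relies on a simplifying assumption: numeric values are taken to mean a numerical attribute, and anything else is treated as categorical.

```python
from numbers import Number

def task_for_target(values):
    """Name the supervised-learning task implied by the designated attribute.

    Simplifying assumption: numeric values mean a numerical attribute
    (regression); anything else is treated as categorical (classification).
    """
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values for the designated attribute")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in known):
        return "regression"
    return "classification"

if __name__ == "__main__":
    print(task_for_target(["very good", "good", "poor"]))
    print(task_for_target([3.2, 4.75, None]))
```

The heuristic breaks down when categories are stored as numeric codes (for example 1, 2, 3 standing for 'poor', 'good', 'very good'). In that case the attribute type has to be declared rather than inferred.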
  6. Unlabeled data
     Has no specially designated attribute
     Unsupervised learning: data mining using unlabeled data
     Purpose: to extract as much information as possible from the available data
  7. Supervised learning: Classification
     Three common methods:
       Nearest neighbor matching
         Identify the classified instances that are closest (by some distance measure) to the unclassified one and use their classes
       Classification rules
         Look for rules that can be used to predict the classification of an unknown instance
       Classification tree
         Generate classification rules through a tree-like structure (a decision tree)
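Nearest neighbor matching can be sketched in a few lines of Python. This version assumes numeric features, Euclidean distance as the "closeness" measure and a majority vote among the k nearest instances; the training data and the names `knn_classify` and `euclidean` are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training, query, k=3):
    """Predict the class of `query` by majority vote among the k classified
    instances closest to it. `training` is a list of (features, label) pairs."""
    nearest = sorted(training, key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical classified instances: two numeric features and a class.
TRAINING = [
    ((1.0, 1.0), "good"),
    ((1.5, 2.0), "good"),
    ((5.0, 5.0), "poor"),
    ((6.0, 5.5), "poor"),
    ((5.5, 6.0), "poor"),
]

if __name__ == "__main__":
    print(knn_classify(TRAINING, (1.2, 1.4)))  # good
    print(knn_classify(TRAINING, (5.2, 5.3)))  # poor
```

The choice of k matters. With k=5 every training instance votes, so the query (1.2, 1.4) would be labeled "poor" simply because that class is more common.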
  8. Supervised Learning: Numerical Prediction
     Regression can be done using neural networks (one of several possible methods)
     Neural network: given a set of inputs, predicts one or more outputs
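As a rough illustration, the smallest possible network, a single linear neuron, can be trained for regression with gradient descent. This is a simplified Python sketch under stated assumptions: one input, no hidden layers, no activation function, and hypothetical data generated from y = 2x + 1. A practical network would add hidden layers with nonlinear activations.

```python
def train_linear_neuron(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w*x + b with a single linear neuron, using batch gradient
    descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current weight and bias.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Output of the trained neuron for input x."""
    return w * x + b

# Hypothetical training data generated from y = 2x + 1.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [1.0, 3.0, 5.0, 7.0, 9.0]

if __name__ == "__main__":
    w, b = train_linear_neuron(XS, YS)
    print(f"w={w:.3f}, b={b:.3f}, prediction at x=5: {predict(w, b, 5.0):.3f}")
```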
  9. Unsupervised Learning: Association Rules
     Association rules: find relationships that exist among the values of variables within a training set
     Example:
       IF variable_1 > 90 and switch_6 = open
       THEN variable_3 < 47.5 and switch_9 = closed
       (probability = 0.8, i.e. the THEN part holds in 80% of the instances where the IF part holds)
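The example rule can be checked against data by counting. The Python sketch below evaluates a given rule rather than discovering rules (discovery algorithms such as Apriori are beyond its scope). It assumes the slide's "probability" means the rule's confidence, and it also reports support (the fraction of all instances where both parts hold). The eight records and the helper names are hypothetical, chosen so the confidence comes out at 0.8.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for IF antecedent THEN consequent.
    antecedent and consequent are predicates on a single record."""
    n = len(records)
    matching_if = [r for r in records if antecedent(r)]
    matching_both = [r for r in matching_if if consequent(r)]
    support = len(matching_both) / n if n else 0.0
    confidence = len(matching_both) / len(matching_if) if matching_if else 0.0
    return support, confidence

def rec(v1, s6, v3, s9):
    """Build one hypothetical record with the variables named on the slide."""
    return {"variable_1": v1, "switch_6": s6, "variable_3": v3, "switch_9": s9}

RECORDS = [
    rec(95, "open", 40.0, "closed"),
    rec(92, "open", 45.0, "closed"),
    rec(99, "open", 30.0, "closed"),
    rec(91, "open", 46.0, "closed"),
    rec(93, "open", 60.0, "open"),      # IF holds, THEN fails
    rec(50, "open", 20.0, "closed"),    # IF fails: variable_1 <= 90
    rec(95, "closed", 40.0, "closed"),  # IF fails: switch_6 is closed
    rec(70, "closed", 80.0, "open"),
]

def antecedent(r):
    return r["variable_1"] > 90 and r["switch_6"] == "open"

def consequent(r):
    return r["variable_3"] < 47.5 and r["switch_9"] == "closed"

if __name__ == "__main__":
    support, confidence = rule_stats(RECORDS, antecedent, consequent)
    print(f"support={support:.2f}, confidence={confidence:.2f}")
```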
  10. Unsupervised Learning: Clustering
      Finds groups of similar items
      Example:
        A company may group its customers by income in order to target its policies, etc.
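The slide does not name a clustering algorithm. The Python sketch below substitutes k-means (Lloyd's algorithm) as one common choice and applies it to the income example. The income figures, the choice of k = 3 and the even-spacing initialization are all assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=100):
    """Group 1-D values into k clusters with Lloyd's k-means.
    Returns the centroids and the clusters from the final assignment step."""
    data = sorted(values)
    n = len(data)
    # Deterministic start (an assumption): k values spread evenly across the sorted data.
    centroids = [data[(i * (n - 1)) // max(k - 1, 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if new == centroids:
            break  # assignments are stable
        centroids = new
    return centroids, clusters

# Hypothetical annual customer incomes.
INCOMES = [52_000, 18_000, 125_000, 24_000, 58_000, 130_000, 21_000, 55_000, 120_000]

if __name__ == "__main__":
    centroids, clusters = kmeans_1d(INCOMES, 3)
    for c, members in zip(centroids, clusters):
        print(f"centroid {c:,.0f}: {members}")
```

Here the three groups correspond to low, middle and high income customers, which a company could then target separately.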