Knowledge Discovery

Published in: Technology

  1. Knowledge Discovery
  2. Definition
     Knowledge discovery – “Non-trivial extraction of implicit, previously unknown and potentially useful information from data.”
     Data mining – The step that detects patterns in the pre-processed (prepared) data. It is only one part of the knowledge discovery process.
  3. Applications
     Can be divided into four major kinds:
       Classification
       Numerical prediction
       Association
       Clustering
     Some examples:
       Automatic abstraction
       Financial forecasting
       Targeted marketing
       Medical diagnosis
       Credit card fraud detection
       Weather forecasting, etc.
  4. Labeled & Unlabeled data
     General terminology:
       Instances – The examples that make up a dataset
       Attributes – The variables recorded for each instance
     Labeled data
       Has a specifically designated attribute whose value, known for some instances, can be used to predict its value in unseen instances
     Unlabeled data
       Has no such designated attribute that can be used to predict the value in unseen instances
     Supervised learning – Data mining using labeled data
     Unsupervised learning – Data mining using unlabeled data
  5. Labeled data
     The designated attribute can be of two types:
     Categorical attribute
       Takes a value from a fixed set of values (like an enumeration), e.g. ‘very good’, ‘good’, ‘poor’
       Supervised learning on a categorical attribute is called classification
     Numerical attribute
       Can take a value from a continuous range of numerical values
       Supervised learning on a numerical attribute is called regression
  6. Unlabeled data
     Has no specifically designated attribute
     Unsupervised learning
       Data mining using unlabeled data
       Purpose – To extract as much information as possible from the available data
  7. Supervised learning: Classification
     Three common methods:
     Nearest neighbor matching:
       Identify the classified instances that are closest (in some sense) to the unclassified one
     Classification rules:
       Look for rules that can be used to predict the classification of an unknown instance
     Classification tree:
       Generate classification rules via a tree-like structure
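Nearest neighbor matching from the slide above can be illustrated with a minimal sketch. It assumes Euclidean distance as the meaning of "closest" and a single neighbor (1-NN); the function name `nearest_neighbor` and the height/weight instances are hypothetical, chosen only for illustration.

```python
import math

def nearest_neighbor(train, query):
    """Return the label of the training instance closest to `query`.

    `train` is a list of (attributes, label) pairs. "Closest" is taken
    to mean smallest Euclidean distance, which is an assumption; the
    slides leave the distance measure open.
    """
    _, label = min(train, key=lambda pair: math.dist(pair[0], query))
    return label

# Hypothetical labeled instances: (height_cm, weight_kg) -> class label
train = [
    ((150.0, 50.0), "small"),
    ((160.0, 55.0), "small"),
    ((180.0, 80.0), "large"),
    ((190.0, 95.0), "large"),
]

print(nearest_neighbor(train, (155.0, 52.0)))  # small
print(nearest_neighbor(train, (185.0, 90.0)))  # large
```

In practice several neighbors are often consulted and the majority label is taken, which makes the prediction less sensitive to a single noisy instance.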
  8. Supervised Learning: Numerical Prediction
     Regression can be done using neural networks
     Neural network: Given a set of inputs, predicts one or more outputs
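As a rough illustration of numerical prediction, the sketch below trains the simplest possible "network": a single linear neuron with one input, fitted by gradient descent on mean squared error. This is a stand-in under stated assumptions, not the method used in the slides; a real neural network would add hidden layers and non-linear activations. The function name, learning rate, epoch count and data are all hypothetical.

```python
def train_linear_neuron(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current weight and bias
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical training set generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train_linear_neuron(xs, ys)
prediction = w * 5.0 + b  # predict the output for an unseen input
print(round(w, 3), round(b, 3), round(prediction, 3))
```

The learned `w` and `b` should approach 2 and 1, so the prediction for the unseen input 5 approaches 11.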
  9. Unsupervised Learning: Association Rules
     Association rules: Find any relationships that exist among the values of variables within a training set
     Example:
       IF variable_1 > 90 and switch_6 = open
       THEN variable_3 < 47.5 and switch_9 = closed
       (probability = 0.8)
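The probability attached to a rule like the one above can be estimated as the fraction of instances satisfying the IF part that also satisfy the THEN part (often called the rule's confidence). The sketch below is a minimal illustration; the function name `rule_confidence` and the six records are hypothetical, constructed so that the result matches the 0.8 in the example.

```python
def rule_confidence(records, antecedent, consequent):
    """Fraction of records matching `antecedent` that also match `consequent`.

    Returns None when no record matches the antecedent.
    """
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return None
    return sum(1 for r in matching if consequent(r)) / len(matching)

# Hypothetical training set; variable and switch names follow the slide.
records = [
    {"variable_1": 95, "switch_6": "open", "variable_3": 40.0, "switch_9": "closed"},
    {"variable_1": 92, "switch_6": "open", "variable_3": 45.0, "switch_9": "closed"},
    {"variable_1": 99, "switch_6": "open", "variable_3": 30.0, "switch_9": "closed"},
    {"variable_1": 91, "switch_6": "open", "variable_3": 47.0, "switch_9": "closed"},
    {"variable_1": 93, "switch_6": "open", "variable_3": 60.0, "switch_9": "open"},
    {"variable_1": 50, "switch_6": "closed", "variable_3": 20.0, "switch_9": "closed"},
]

def if_part(r):
    return r["variable_1"] > 90 and r["switch_6"] == "open"

def then_part(r):
    return r["variable_3"] < 47.5 and r["switch_9"] == "closed"

confidence = rule_confidence(records, if_part, then_part)
print(confidence)  # 0.8 (4 of the 5 records matching the IF part)
```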
  10. Unsupervised Learning: Clustering
     Find groups of items that are similar
     Example:
       A company may group its customers by income in order to target its policies.
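The customer-income example can be sketched with a one-dimensional k-means procedure. The slides do not name a clustering algorithm, so k-means is an assumption here, as are the function name `kmeans_1d`, the starting centroids and the income figures (in thousands).

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around the given starting centroids (1-D k-means)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customer incomes, in thousands
incomes = [22, 25, 27, 30, 85, 90, 95]

centroids, clusters = kmeans_1d(incomes, centroids=[22, 95])
print(centroids)  # [26.0, 90.0]
print(clusters)   # [[22, 25, 27, 30], [85, 90, 95]]
```

Unlike the classification sketch, no labels are supplied: the two income groups emerge from the data alone, which is what makes this unsupervised.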
  11. Visit more self-help tutorials
     Pick a tutorial of your choice and browse through it at your own pace.
     The tutorials section is free and self-guided, and does not include any additional support.
     Visit us at www.dataminingtools.net
