Definition

Definition: “Non-trivial extraction of implicit, previously unknown and potentially useful information from data.”
Data mining: responsible for detecting patterns in the pre-processed (prepared) data. It is only one part of the knowledge discovery process.
Applications

Applications can be divided into four major kinds:
  Classification
  Numerical prediction
  Association
  Clustering
Some examples:
  Automatic abstraction
  Financial forecasting
  Targeted marketing
  Medical diagnosis
  Credit card fraud detection
  Weather forecasting
Labeled & Unlabeled data

General terminology:
  Instances: the individual examples that make up a dataset
  Attributes: the variables that describe an instance
Labeled data: there is a specifically designated attribute whose value in known instances can be used to predict its value in unseen instances.
Unlabeled data: there is no such designated attribute to predict.
Supervised learning: data mining using labeled data
Unsupervised learning: data mining using unlabeled data
Labeled data

The designated attribute can be of two types:
Categorical attribute
  Takes a value from a fixed set of values only (like an enumeration), e.g. ‘very good’, ‘good’, ‘poor’
  Supervised learning is then called classification
Numerical attribute
  Can take a value from a continuous range of numerical values
  Supervised learning is then called regression
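The distinction above can be sketched in a few lines of Python. This is only an illustration: the helper name `learning_task` and the sample values are made up, and it assumes that numeric label values always denote a numerical attribute (in practice, numeric codes may also stand for categories).

```python
from numbers import Number


def learning_task(label_values):
    """Name the supervised task implied by the type of the label attribute.

    Assumption: numeric values mean a numerical attribute; anything else
    is treated as a categorical attribute.
    """
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in label_values
    )
    return "regression" if is_numeric else "classification"


# Hypothetical label columns, one value per instance.
ratings = ["very good", "good", "poor", "good"]  # categorical label
prices = [12.5, 7.0, 9.25, 30.0]                 # numerical label

print(learning_task(ratings))
print(learning_task(prices))
```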
Unlabeled data

Unlabeled data does not have any specifically designated attribute.
Unsupervised learning: data mining using unlabeled data
Purpose: to extract as much information as possible from the data available.
Supervised learning: Classification

It is based on the following three methods:
  Nearest neighbor matching: identify the classified instances that are closest (in some sense) to the unclassified one
  Classification rules: look for rules that can be used to predict the classification of an unknown instance
  Classification tree: generate classification rules via a tree-like structure
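As a minimal sketch of nearest neighbor matching, the code below classifies an instance with the label of its single closest training instance. It assumes that "closest" means Euclidean distance over numerical attributes; the function name and the sample data are hypothetical.

```python
import math


def nearest_neighbor(train, query):
    """Return the class of the training instance closest to `query`.

    `train` is a list of (attributes, label) pairs; closeness is assumed
    to be Euclidean distance between attribute tuples.
    """
    closest = min(train, key=lambda instance: math.dist(instance[0], query))
    return closest[1]


# Hypothetical classified instances: two numerical attributes plus a class.
train = [
    ((1.0, 1.0), "poor"),
    ((1.5, 2.0), "poor"),
    ((8.0, 8.0), "good"),
    ((9.0, 7.5), "good"),
]

print(nearest_neighbor(train, (2.0, 1.5)))
print(nearest_neighbor(train, (8.5, 8.0)))
```

Using more than one neighbor and taking a majority vote is a common variation; a single neighbor keeps the sketch short.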
Supervised Learning: Numerical Prediction

Regression can be done by using neural networks, among other techniques.
Neural network: given a set of inputs, it is used to predict one or more outputs.
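To make numerical prediction concrete, here is a deliberately small sketch: a single linear neuron (one weight, one bias) trained by gradient descent on mean squared error. This is the simplest possible case and is equivalent to linear regression; real neural networks add hidden layers and non-linear activations. The function names, learning rate, epoch count and data are all illustrative assumptions.

```python
def train_neuron(xs, ys, lr=0.05, epochs=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Predict a numerical output for input x."""
    return w * x + b


# Hypothetical training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train_neuron(xs, ys)
print(round(w, 3), round(b, 3), round(predict(w, b, 5.0), 3))
```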
Unsupervised Learning: Association Rules

Association rules: to find any relationship that exists amongst the values of variables within a training set.
Example:
  IF variable_1 > 90 and switch_6 = open
  THEN variable_3 < 47.5 and switch_9 = closed (probability = 0.8)
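One way to read the probability attached to such a rule is as the fraction of instances satisfying the IF part that also satisfy the THEN part (often called the rule's confidence). The sketch below computes it under that assumption; the records are invented for illustration and were chosen so that the result matches the 0.8 in the example.

```python
def rule_confidence(records, antecedent, consequent):
    """Fraction of records matching the IF part that also match the THEN part."""
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return None  # the rule never fires on this data
    return sum(1 for r in matching if consequent(r)) / len(matching)


def antecedent(r):
    # IF variable_1 > 90 and switch_6 = open
    return r["variable_1"] > 90 and r["switch_6"] == "open"


def consequent(r):
    # THEN variable_3 < 47.5 and switch_9 = closed
    return r["variable_3"] < 47.5 and r["switch_9"] == "closed"


# Hypothetical training set.
records = [
    {"variable_1": 95, "switch_6": "open", "variable_3": 40.0, "switch_9": "closed"},
    {"variable_1": 92, "switch_6": "open", "variable_3": 45.0, "switch_9": "closed"},
    {"variable_1": 99, "switch_6": "open", "variable_3": 30.5, "switch_9": "closed"},
    {"variable_1": 91, "switch_6": "open", "variable_3": 47.0, "switch_9": "closed"},
    {"variable_1": 97, "switch_6": "open", "variable_3": 52.0, "switch_9": "closed"},
    {"variable_1": 80, "switch_6": "open", "variable_3": 41.0, "switch_9": "closed"},
    {"variable_1": 96, "switch_6": "closed", "variable_3": 60.0, "switch_9": "open"},
]

confidence = rule_confidence(records, antecedent, consequent)
print(confidence)
```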
Unsupervised Learning: Clustering

Clustering: to find groups of items that are similar.
Example: a company may group its customers based on income in order to target its policies.
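The income example can be sketched with k-means, one common clustering technique (the slides do not name a specific algorithm, so this choice is an assumption). The version below works on a single numerical attribute, starts with centers spread evenly between the smallest and largest value, and uses made-up income figures.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group numbers into k clusters with a basic k-means loop (needs k >= 2)."""
    lo, hi = min(values), max(values)
    # Initial centers spread evenly between the smallest and largest value.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to the nearest center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical customer incomes.
incomes = [22_000, 25_000, 27_000, 88_000, 92_000, 95_000]

centers, clusters = kmeans_1d(incomes, k=2)
print(clusters)
```

No labels are supplied here: the two groups emerge only from how close the income values are to each other.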