1.
Introduction to XLMiner™: The data mining add-in for Microsoft Excel. Classification. XLMiner and Microsoft Office are registered trademarks of their respective owners.
2.
CLASSIFICATION
XLMiner provides several tools for classifying data: Discriminant Analysis, Logistic Regression, Classification Tree, Naive Bayes, Neural Network (multilayer feed-forward), and k-Nearest Neighbors. Let us look at these methods one by one. http://dataminingtools.net
3.
CLASSIFICATION - Discriminant Analysis
Discriminant analysis is a technique for classifying a set of observations into predefined classes. The purpose is to determine the class of an observation from a set of variables known as predictors or input variables. The model is built from a set of observations whose classes are known, sometimes referred to as the training set. From the training set, the technique constructs a set of linear functions of the predictors, known as discriminant functions; an observation is assigned to the class whose function gives the highest score. We will use Wine.xls as the data source.
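As an illustration of what a "linear function of the predictors" looks like, the following is a minimal sketch of linear discriminant functions for two predictors, assuming a pooled within-class covariance matrix and priors taken from class frequencies. It is not XLMiner's implementation; the names `fit_discriminant_functions` and `classify_by_discriminant` and the toy data (which is not taken from Wine.xls) are hypothetical.

```python
import math
from collections import defaultdict

def fit_discriminant_functions(rows, labels):
    """Return {class: (constant, [w1, w2])} for two numeric predictors.

    Assumes a pooled within-class covariance matrix and priors equal to
    the class frequencies in the training set.
    """
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    n, k = len(rows), len(groups)
    means = {c: [sum(r[j] for r in g) / len(g) for j in range(2)]
             for c, g in groups.items()}
    # Pooled within-class covariance matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for c, g in groups.items():
        for r in g:
            d = [r[0] - means[c][0], r[1] - means[c][1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j] / (n - k)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    functions = {}
    for c, m in means.items():
        # Coefficients: inverse covariance times the class mean.
        w = [inv[0][0] * m[0] + inv[0][1] * m[1],
             inv[1][0] * m[0] + inv[1][1] * m[1]]
        prior = len(groups[c]) / n
        constant = -0.5 * (w[0] * m[0] + w[1] * m[1]) + math.log(prior)
        functions[c] = (constant, w)
    return functions

def classify_by_discriminant(functions, x):
    """Assign x to the class whose linear function scores highest."""
    scores = {c: const + w[0] * x[0] + w[1] * x[1]
              for c, (const, w) in functions.items()}
    return max(scores, key=scores.get)

# Hypothetical training set: two predictors, two known classes.
train_rows = [(1, 2), (2, 1), (2, 3), (3, 2), (6, 5), (7, 7), (8, 6), (7, 5)]
train_labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
functions = fit_discriminant_functions(train_rows, train_labels)
print(classify_by_discriminant(functions, (2, 2)))
print(classify_by_discriminant(functions, (7, 6)))
```

Each class gets one constant and one coefficient per predictor, which is the form in which discriminant functions are usually reported.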
4.
CLASSIFICATION - Discriminant Analysis (Step 1)
Select the independent variables to use as input variables, and select the dependent variable as the output variable.
5.
CLASSIFICATION - Discriminant Analysis (Step 2)
Choosing “According to relative occurrences” sets the prior probability of each class equal to its relative frequency in the training set. Choosing “Use equal” gives every class the same prior probability.
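The two prior-probability options can be sketched as follows. This is an illustration only: the function name `prior_probabilities` and its `mode` values are hypothetical and are not XLMiner settings.

```python
from collections import Counter

def prior_probabilities(labels, mode="relative"):
    """Compute class priors from training labels.

    mode="relative": prior = class frequency in the training set.
    mode="equal":    every class gets the same prior.
    """
    counts = Counter(labels)
    if mode == "relative":
        return {c: n / len(labels) for c, n in counts.items()}
    if mode == "equal":
        return {c: 1 / len(counts) for c in counts}
    raise ValueError("mode must be 'relative' or 'equal'")

# Hypothetical training labels: three records of class A, one of class B.
labels = ["A", "A", "A", "B"]
print(prior_probabilities(labels, "relative"))  # {'A': 0.75, 'B': 0.25}
print(prior_probabilities(labels, "equal"))     # {'A': 0.5, 'B': 0.5}
```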
6.
CLASSIFICATION - Discriminant Analysis (Step 3)
Check the options that you want displayed in the output, and then click Finish.
8.
CLASSIFICATION - Discriminant Analysis (Output)
This section of the output shows how each training record was classified. The highest probability value in each record is highlighted.
9.
CLASSIFICATION - Classification Trees
Classification trees are useful for classifying or predicting outcomes. They generate simple rules that can easily be translated into natural language or a database query language. The trees work by binary recursive partitioning: a record is classified by checking, at each node, whether it meets that node's criterion, and then moving to the matching child node until a terminal node is reached. Since the partitioning is binary, the two branches of every node must represent mutually exclusive conditions.
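The node-by-node checking described above can be sketched with a small hand-written tree. The tree structure, the predictor names `x1` and `x2`, the thresholds, and the function `classify_with_tree` are all hypothetical; the sketch only assumes that each node tests one predictor against one threshold.

```python
# Each internal node tests "record[feature] <= threshold"; the two branches
# are mutually exclusive (the test is either true or false).
tree = {
    "feature": "x1", "threshold": 5.0,
    "left": {"label": "A"},                      # x1 <= 5.0
    "right": {                                   # x1 > 5.0
        "feature": "x2", "threshold": 2.0,
        "left": {"label": "B"},                  # x2 <= 2.0
        "right": {"label": "C"},                 # x2 > 2.0
    },
}

def classify_with_tree(node, record):
    """Walk from the root to a terminal node and return its class label."""
    while "label" not in node:
        meets_criterion = record[node["feature"]] <= node["threshold"]
        node = node["left"] if meets_criterion else node["right"]
    return node["label"]

print(classify_with_tree(tree, {"x1": 3.0, "x2": 9.0}))  # A
print(classify_with_tree(tree, {"x1": 6.0, "x2": 1.0}))  # B
print(classify_with_tree(tree, {"x1": 6.0, "x2": 3.0}))  # C
```

Each root-to-leaf path reads as a rule, for example "if x1 > 5.0 and x2 <= 2.0 then class B".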
10.
CLASSIFICATION - Classification Trees (Step 1)
11.
CLASSIFICATION - Classification Trees (Step 2)
The “Minimum #records in terminal node” setting determines when splitting stops: a node is not split further once doing so would leave a terminal node with fewer than this number of records. This keeps the model from overfitting the training data.
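The effect of a minimum terminal-node size can be seen in a small tree builder. This is a sketch under stated assumptions, not XLMiner's algorithm: it assumes Gini impurity as the split criterion and interprets the setting as "reject any split that would create a child with fewer than `min_records` records". The names `gini`, `build_tree`, and `count_leaves` and the data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, min_records=1):
    """Binary recursive partitioning with a minimum terminal-node size."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return {"label": majority}
    best = None
    for j in range(len(rows[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[j] <= t]
            right = [i for i, r in enumerate(rows) if r[j] > t]
            if len(left) < min_records or len(right) < min_records:
                continue  # split would create a too-small terminal node
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return {"label": majority}  # no admissible split: stop here
    _, j, t, left, right = best
    return {
        "feature": j, "threshold": t,
        "left": build_tree([rows[i] for i in left],
                           [labels[i] for i in left], min_records),
        "right": build_tree([rows[i] for i in right],
                            [labels[i] for i in right], min_records),
    }

def count_leaves(node):
    """Number of terminal nodes in the tree."""
    if "label" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

# Hypothetical one-predictor data with some class overlap.
rows = [(1,), (2,), (3,), (4,), (5,), (6,)]
labels = ["A", "A", "B", "A", "B", "B"]
print(count_leaves(build_tree(rows, labels, min_records=1)))
print(count_leaves(build_tree(rows, labels, min_records=3)))
```

A larger minimum yields a smaller tree that no longer isolates individual noisy records.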
12.
CLASSIFICATION - Classification Trees (Step 3)
Select the options for output. Selecting “Best pruned tree” prunes the full tree and selects the pruned tree that best fits the validation set.
13.
CLASSIFICATION - Classification Trees (Output)
Rules that are used to create nodes.
14.
CLASSIFICATION - Classification Trees (Output)
15.
CLASSIFICATION - Classification Trees (Output)
16.
CLASSIFICATION - Naïve Bayes
The Naïve Bayes classifier applies Bayes' theorem under the assumption that the predictors are independent of one another within each class, i.e. the value of one variable does not affect that of the others. If there are, say, 10 variables that the classifier has to consider, it takes each variable into account separately and multiplies the resulting per-variable probabilities together with the prior probability of the class.
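Taking each variable into account separately can be sketched for categorical predictors as follows. This is a minimal illustration, not XLMiner's implementation: it uses raw frequency estimates without smoothing, and the function names and the toy data are hypothetical.

```python
from collections import Counter

def fit_naive_bayes(rows, labels):
    """Count classes and, per class, how often each predictor value occurs."""
    class_counts = Counter(labels)
    # value_counts[(class, predictor index, value)] = number of records
    value_counts = Counter()
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j, value)] += 1
    return class_counts, value_counts

def predict_naive_bayes(model, row):
    """Return normalized class probabilities for one record.

    No smoothing is applied, so a value never seen with a class gives that
    class a probability of zero (and all-zero scores would fail to normalize).
    """
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total                            # prior P(class)
        for j, value in enumerate(row):
            score *= value_counts[(c, j, value)] / n_c  # P(value | class)
        scores[c] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}

# Hypothetical training set with two categorical predictors.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("rainy", "hot"), ("sunny", "mild")]
labels = ["no", "no", "yes", "yes", "no", "yes"]
model = fit_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("rainy", "mild")))
```

For the record `("rainy", "mild")`, the score for "yes" is 0.5 × 2/3 × 2/3 and the score for "no" is 0.5 × 1/3 × 1/3, so after normalization "yes" receives probability 0.8.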
21.
CLASSIFICATION - k-Nearest Neighbors
In k-nearest neighbors (k-NN) classification, the k nearest neighbors of each record are identified, with nearness defined by the Euclidean distance to the record in question, and the class to which the majority of them belong is determined. The record is then assigned to that class.
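The majority vote among the k nearest neighbors can be sketched in a few lines. The function name `knn_classify`, the choice of k = 3, and the data are hypothetical; `math.dist` (Python 3.8 or later) computes the Euclidean distance.

```python
import math
from collections import Counter

def knn_classify(train_rows, train_labels, record, k=3):
    """Assign record to the majority class among its k nearest neighbors."""
    # Sort training records by Euclidean distance to the record in question.
    by_distance = sorted(
        (math.dist(row, record), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set with two numeric predictors.
train_rows = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_rows, train_labels, (2, 2)))  # A
print(knn_classify(train_rows, train_labels, (6, 5)))  # B
```

Because Euclidean distance is sensitive to the scale of each predictor, predictors measured in very different units are usually normalized before the distances are computed.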
25.
CLASSIFICATION - k-Nearest Neighbors (Output)
Each record is placed in the class for which its computed probability is highest.
26.
Thank you. For more presentations and tutorial videos on data mining, please visit http://dataminingtools.net