WEKA: Practical Machine Learning Tools and Techniques
Decision Trees: Dealing with numeric attributes
- Standard method: binary splits
- Steps to decide where to split:
  - Evaluate the information gain for every possible split point of the attribute
  - Choose the "best" split point
- But this is computationally intensive
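To make the split-point search concrete, here is a minimal Python sketch (not WEKA's implementation) that scores every candidate split point of a numeric attribute by information gain; the data at the bottom reuses the temperature example from the next slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Try every midpoint between adjacent (sorted) values and return the
    split point with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_point = -1.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no split between equal values
        point = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v < point]
        right = [l for v, l in pairs if v >= point]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_point = gain, point
    return best_point, best_gain

# illustrative data: the temperature example from the next slide
temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play  = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "no"]
print(best_split(temps, play))
```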
Decision Trees: Example
Split on the temperature attribute:

  Temperature: 64  65  68  69  70  71  72  72  75  75  80  81  83  85
  Play:        Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

- temperature < 71.5: 4 yes, 2 no
- temperature > 71.5: 5 yes, 3 no
- Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits
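Written out, the two entropy terms behind the 0.939 bits figure are:

```latex
\begin{align*}
\operatorname{info}([4,2]) &= -\tfrac{4}{6}\log_2\tfrac{4}{6} - \tfrac{2}{6}\log_2\tfrac{2}{6} \approx 0.918\ \text{bits} \\
\operatorname{info}([5,3]) &= -\tfrac{5}{8}\log_2\tfrac{5}{8} - \tfrac{3}{8}\log_2\tfrac{3}{8} \approx 0.954\ \text{bits} \\
\operatorname{info}([4,2],[5,3]) &= \tfrac{6}{14}\cdot 0.918 + \tfrac{8}{14}\cdot 0.954 \approx 0.939\ \text{bits}
\end{align*}
```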
Decision Trees: Dealing with missing values
- Split instances with missing values into pieces
- A piece going down a branch receives a weight proportional to the popularity of the branch
- The weights sum to 1
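A minimal sketch of the fractional-weighting idea (the branch names and counts are purely illustrative, this is not WEKA's code):

```python
def split_with_missing(instance_weight, branch_counts):
    """Distribute an instance's weight over the branches in proportion to
    how many training instances went down each branch."""
    total = sum(branch_counts.values())
    return {branch: instance_weight * count / total
            for branch, count in branch_counts.items()}

# e.g. 9 training instances went down the left branch and 5 went right
pieces = split_with_missing(1.0, {"left": 9, "right": 5})
print(pieces, sum(pieces.values()))   # the pieces sum to 1
```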
Decision Trees: Pruning
- Pruning makes the decision tree less complex by removing parts that overfit the training data
- There are two types of pruning:
  - Prepruning: deciding during tree building when to stop growing a branch
  - Postpruning: pruning after the tree has been fully constructed
- The two postpruning operations generally used are:
  - Subtree replacement
  - Subtree raising
- To decide whether to postprune, we compare the error rate before and after the pruning
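As a rough sketch of the subtree-replacement decision (illustrative only; the error estimates are supplied directly rather than computed from a pruning set):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    leaf_error: float          # estimated error if this node were collapsed to a leaf
    children: list = field(default_factory=list)

def prune(node):
    """Bottom-up subtree replacement: keep a subtree only if its combined
    estimated error beats the error of collapsing it into a single leaf."""
    if not node.children:
        return node, node.leaf_error
    pruned, errors = zip(*(prune(c) for c in node.children))
    subtree_error = sum(errors)
    if node.leaf_error <= subtree_error:        # replacing does not hurt: prune
        return Node(node.leaf_error), node.leaf_error
    node.children = list(pruned)
    return node, subtree_error

# toy tree: the subtree's leaves together misclassify 4 instances, but collapsing
# the root into a single leaf would misclassify only 3, so the subtree is replaced
root = Node(leaf_error=3.0, children=[Node(1.0), Node(3.0)])
pruned_root, error = prune(root)
print(len(pruned_root.children), error)   # prints: 0 3.0
```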
Decision Trees: Subtree raising
Decision Trees: Subtree replacement
Classification rules: Criteria for choosing tests
- p/t ratio: maximizes the proportion of positive instances among those the rule covers, i.e. it puts the stress on accuracy
- Information gain, p[log(p/t) - log(P/T)]: favours tests that keep a large number of positive instances covered, at the cost of some accuracy
  (p/t are the positive/total counts covered after adding the test, P/T the counts before)
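A small sketch comparing the two criteria on a made-up candidate test (the counts are purely illustrative):

```python
import math

def accuracy_criterion(p, t):
    """p/t: fraction of covered instances that are positive."""
    return p / t

def info_gain_criterion(p, t, P, T):
    """p * [log2(p/t) - log2(P/T)]: rewards keeping many positives covered."""
    return p * (math.log2(p / t) - math.log2(P / T))

# Before the test the rule covers P = 12 positives out of T = 20 instances.
# Candidate A covers 2 instances, both positive; candidate B covers 10, of which 9 are positive.
for name, (p, t) in {"A": (2, 2), "B": (9, 10)}.items():
    print(name, accuracy_criterion(p, t), info_gain_criterion(p, t, 12, 20))
# A wins on accuracy (1.0); B wins on information gain because it keeps more positives.
```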
Classification rules: Generating good rules
- We can reduce overfitting either by pruning rules while they are being constructed or after they have been fully constructed
- To prune during construction, we check each newly added test: if the error rate on the pruning set increases because of the new test, we remove it
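A minimal sketch of that during-construction check (the candidate tests and the pruning-set error function are hypothetical placeholders):

```python
def grow_rule(candidate_tests, error_on_pruning_set):
    """Add tests to a rule greedily; drop any test that increases the
    error rate on the pruning set."""
    rule = []
    best_error = error_on_pruning_set(rule)
    for test in candidate_tests:
        trial_error = error_on_pruning_set(rule + [test])
        if trial_error <= best_error:          # the new test does not hurt: keep it
            rule.append(test)
            best_error = trial_error
        # otherwise the test is removed (never added to the rule)
    return rule

# toy usage: pretend the pruning-set error of each partial rule is given by a table
errors = {(): 0.40, ("a",): 0.30, ("a", "b"): 0.35, ("a", "c"): 0.25}
print(grow_rule(["a", "b", "c"], lambda rule: errors[tuple(rule)]))   # ['a', 'c']
```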
Classification rules: Algorithm for obtaining rules from partial decision trees
Classification rules
Classification rules
- As node 4 was not replaced, we stop at this stage
- Each leaf node now gives us a possible rule
- Choose the leaf that covers the greatest number of instances
Extending linear models: Support vector machines
- Support vector machines are algorithms for learning linear classifiers
- They use the maximum-margin hyperplane, which reduces overfitting
- The instances closest to the maximum-margin hyperplane are the support vectors; all other instances can be ignored
Extending linear models
Extending linear models: Support vector machines
- The hyperplane can be written as x = b + Σ_i alpha(i) y(i) (a(i) · a), summing over the support vectors a(i), where a is the instance to classify and y(i) is the class (+1 or -1) of a(i)
- Support vectors: all instances i for which alpha(i) > 0
- b and the alpha(i) are determined using optimization software packages
- Using a kernel function K, the hyperplane can also be written as x = b + Σ_i alpha(i) y(i) K(a(i), a)
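As a concrete illustration (using scikit-learn rather than WEKA, purely as an assumed example), a fitted linear SVM exposes exactly these quantities: the support vectors, the coefficients alpha(i)·y(i), and b:

```python
import numpy as np
from sklearn import svm

# toy 2-D data: two linearly separable classes
X = np.array([[0, 0], [1, 1], [1, 0], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = svm.SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin

print(clf.support_vectors_)   # the instances closest to the separating hyperplane
print(clf.dual_coef_)         # alpha(i) * y(i) for each support vector
print(clf.intercept_)         # b
# decision value for a new point a: b + sum_i alpha(i) * y(i) * K(a(i), a)
print(clf.decision_function([[2, 2]]))
```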
Extending linear models: Multilayer perceptron
- We can create a network of perceptrons to approximate arbitrary target concepts
- The multilayer perceptron is an example of an artificial neural network
- It consists of an input layer, one or more hidden layers, and an output layer
- The structure of an MLP is usually found by experimentation
- The parameters (weights) can be found using backpropagation
Extending linear models: Examples
Extending linear models: Backpropagation
- Sigmoid activation: f(x) = 1/(1 + exp(-x))
- Squared error: E = 1/2 (y - f(x))^2
- Minimizing the error with respect to a weight w(i), with x = Σ_i w(i) a(i), gives the gradient dE/dw(i) = -(y - f(x)) f(x) (1 - f(x)) a(i)
- Compute this expression for each training instance and update each weight: w(i) = w(i) - L (dE/dw(i)), where L is the learning rate
- The weights w are given (e.g. random) starting values
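A minimal sketch of this update rule for a single sigmoid unit (the data and parameter choices are illustrative; a full multilayer perceptron applies the same idea to every layer via the chain rule):

```python
import math, random

def f(x):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def train(data, n_inputs, L=0.5, epochs=5000):
    """Gradient descent on E = 1/2 (y - f(w.a))^2 for a single sigmoid unit."""
    random.seed(0)
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    for _ in range(epochs):
        for a, y in data:
            x = sum(wi * ai for wi, ai in zip(w, a))
            out = f(x)
            for i in range(n_inputs):
                # dE/dw(i) = -(y - f(x)) * f(x) * (1 - f(x)) * a(i)
                grad = -(y - out) * out * (1 - out) * a[i]
                w[i] -= L * grad              # w(i) = w(i) - L * dE/dw(i)
    return w

# toy data: learn logical AND (each instance includes a constant 1 for the bias weight)
data = [([0, 0, 1], 0), ([0, 1, 1], 0), ([1, 0, 1], 0), ([1, 1, 1], 1)]
w = train(data, n_inputs=3)
# the unit's outputs should approach 0, 0, 0, 1
print([round(f(sum(wi * ai for wi, ai in zip(w, a))), 2) for a, _ in data])
```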
Clustering: Incremental clustering
Steps:
- The tree starts with an empty root node
- Add instances one by one
- Update the tree appropriately at each stage
- To update, find the right leaf for the new instance; this may involve restructuring the tree
- Restructuring: merging and replacement
- Decisions are made using category utility
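Because every decision hinges on category utility, here is a minimal sketch of the standard category-utility measure for nominal attributes, CU = (1/k) Σ_l P(C_l) Σ_i Σ_j [P(a_i = v_ij | C_l)^2 - P(a_i = v_ij)^2]; the toy instances are purely illustrative:

```python
from collections import Counter

def category_utility(clusters):
    """clusters: list of clusters, each a list of instances, each instance a
    tuple of nominal attribute values.  Returns the category utility of the partition."""
    all_instances = [inst for cluster in clusters for inst in cluster]
    n, k = len(all_instances), len(clusters)
    n_attrs = len(all_instances[0])

    def sq_prob_sum(instances, attr):
        counts = Counter(inst[attr] for inst in instances)
        m = len(instances)
        return sum((c / m) ** 2 for c in counts.values())

    total = 0.0
    for cluster in clusters:
        p_cluster = len(cluster) / n
        gain = sum(sq_prob_sum(cluster, a) - sq_prob_sum(all_instances, a)
                   for a in range(n_attrs))
        total += p_cluster * gain
    return total / k

# toy example: splitting on the first attribute gives a better partition
instances = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "cool"), ("rainy", "mild")]
print(category_utility([instances[:2], instances[2:]]))   # two clusters: higher utility
print(category_utility([instances]))                      # one cluster: utility 0
```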
Clustering: Example of incremental clustering
EM Algorithm
- EM = Expectation-Maximization
- Generalizes k-means to a probabilistic setting
- Iterative procedure:
  - E ("expectation") step: calculate the cluster membership probability of each instance
  - M ("maximization") step: estimate the distribution parameters from the cluster probabilities
- Cluster probabilities are stored as instance weights
- Stop when the improvement is negligible
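A minimal sketch of these two steps for a mixture of two one-dimensional Gaussians (the data and the fixed iteration count are illustrative choices; a real implementation would stop when the log-likelihood improvement is negligible):

```python
import math, random

def em_two_gaussians(data, iters=50):
    """EM for a mixture of two 1-D Gaussians."""
    mu = [min(data), max(data)]            # crude initial means
    sigma = [1.0, 1.0]
    prior = [0.5, 0.5]
    for _ in range(iters):
        # E step: probability that each instance belongs to each cluster
        resp = []
        for x in data:
            dens = [prior[k] * math.exp(-((x - mu[k]) ** 2) / (2 * sigma[k] ** 2))
                    / (sigma[k] * math.sqrt(2 * math.pi)) for k in (0, 1)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: re-estimate parameters from the cluster probabilities (instance weights)
        for k in (0, 1):
            w = [r[k] for r in resp]
            sw = sum(w)
            mu[k] = sum(wi * x for wi, x in zip(w, data)) / sw
            sigma[k] = math.sqrt(sum(wi * (x - mu[k]) ** 2 for wi, x in zip(w, data)) / sw) or 1e-6
            prior[k] = sw / len(data)
    return mu, sigma, prior

# toy data drawn around two centres
random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(5, 1) for _ in range(100)]
print(em_two_gaussians(data))
```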
Visit more self-help tutorials
- Pick a tutorial of your choice and browse through it at your own pace.
- The tutorials section is free, self-guiding and does not involve any additional support.
- Visit us at www.dataminingtools.net
