WEKA: Practical Machine Learning Tools and Techniques
Decision Trees: Dealing with numeric attributes
- Standard method: binary splits
- Steps to decide where to split:
  - Evaluate the information gain for every possible split point of the attribute
  - Choose the "best" split point
- But this is computationally intensive
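To make the split-point search concrete, here is a minimal Python sketch (not WEKA's implementation) that scores every candidate split point of a numeric attribute by information gain; the data at the bottom reuses the temperature example from the next slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Try every midpoint between adjacent (sorted) values and return the
    split point with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_point = -1.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                              # no split between equal values
        point = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v < point]
        right = [l for v, l in pairs if v >= point]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_point = gain, point
    return best_point, best_gain

# illustrative data: the temperature example from the next slide
temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play  = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "no"]
print(best_split(temps, play))
```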
Decision Trees: Example
Split on the temperature attribute:

  Temperature: 64  65  68  69  70  71  72  72  75  75  80  81  83  85
  Play:        Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

- temperature < 71.5: 4 yes, 2 no
- temperature > 71.5: 5 yes, 3 no
- Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939 bits
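Written out, the two entropy terms behind the 0.939 bits figure are:

```latex
\begin{align*}
\operatorname{info}([4,2]) &= -\tfrac{4}{6}\log_2\tfrac{4}{6} - \tfrac{2}{6}\log_2\tfrac{2}{6} \approx 0.918\ \text{bits} \\
\operatorname{info}([5,3]) &= -\tfrac{5}{8}\log_2\tfrac{5}{8} - \tfrac{3}{8}\log_2\tfrac{3}{8} \approx 0.954\ \text{bits} \\
\operatorname{info}([4,2],[5,3]) &= \tfrac{6}{14}\cdot 0.918 + \tfrac{8}{14}\cdot 0.954 \approx 0.939\ \text{bits}
\end{align*}
```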
Decision Trees: Dealing with missing values
- Split instances with missing values into pieces
- A piece going down a branch receives a weight proportional to the popularity of the branch
- The weights sum to 1
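A minimal sketch of the fractional-weighting idea (the branch names and counts are purely illustrative, this is not WEKA's code):

```python
def split_with_missing(instance_weight, branch_counts):
    """Distribute an instance's weight over the branches in proportion to
    how many training instances went down each branch."""
    total = sum(branch_counts.values())
    return {branch: instance_weight * count / total
            for branch, count in branch_counts.items()}

# e.g. 9 training instances went down the left branch and 5 went right
pieces = split_with_missing(1.0, {"left": 9, "right": 5})
print(pieces, sum(pieces.values()))   # the pieces sum to 1
```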
Decision Trees: Pruning
- Pruning makes the decision tree less complex by removing parts that overfit the training data
- There are two types of pruning:
  - Prepruning: deciding during tree building when to stop growing a branch
  - Postpruning: pruning after the tree has been fully constructed
- The two postpruning operations generally used are:
  - Subtree replacement
  - Subtree raising
- To decide whether to postprune, we compare the error rate before and after the pruning
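As a rough sketch of the subtree-replacement decision (illustrative only; the error estimates are supplied directly rather than computed from a pruning set):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    leaf_error: float          # estimated error if this node were collapsed to a leaf
    children: list = field(default_factory=list)

def prune(node):
    """Bottom-up subtree replacement: keep a subtree only if its combined
    estimated error beats the error of collapsing it into a single leaf."""
    if not node.children:
        return node, node.leaf_error
    pruned, errors = zip(*(prune(c) for c in node.children))
    subtree_error = sum(errors)
    if node.leaf_error <= subtree_error:        # replacing does not hurt: prune
        return Node(node.leaf_error), node.leaf_error
    node.children = list(pruned)
    return node, subtree_error

# toy tree: the subtree's leaves together misclassify 4 instances, but collapsing
# the root into a single leaf would misclassify only 3, so the subtree is replaced
root = Node(leaf_error=3.0, children=[Node(1.0), Node(3.0)])
pruned_root, error = prune(root)
print(len(pruned_root.children), error)   # prints: 0 3.0
```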
Decision Trees: Subtree raising
Decision Trees: Subtree replacement
Classification rules: Criteria for choosing tests
- p/t ratio: maximizes the proportion of positive instances among those the rule covers, i.e. it puts the stress on accuracy
- Information gain, p[log(p/t) - log(P/T)]: favours tests that keep a large number of positive instances covered, at the cost of some accuracy
  (p/t are the positive/total counts covered after adding the test, P/T the counts before)
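A small sketch comparing the two criteria on a made-up candidate test (the counts are purely illustrative):

```python
import math

def accuracy_criterion(p, t):
    """p/t: fraction of covered instances that are positive."""
    return p / t

def info_gain_criterion(p, t, P, T):
    """p * [log2(p/t) - log2(P/T)]: rewards keeping many positives covered."""
    return p * (math.log2(p / t) - math.log2(P / T))

# Before the test the rule covers P = 12 positives out of T = 20 instances.
# Candidate A covers 2 instances, both positive; candidate B covers 10, of which 9 are positive.
for name, (p, t) in {"A": (2, 2), "B": (9, 10)}.items():
    print(name, accuracy_criterion(p, t), info_gain_criterion(p, t, 12, 20))
# A wins on accuracy (1.0); B wins on information gain because it keeps more positives.
```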
Classification rules: Generating good rules
- We can reduce overfitting either by pruning rules while they are being constructed or after they have been fully constructed
- To prune during construction, we check each newly added test: if the error rate on the pruning set increases because of the new test, we remove it
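A minimal sketch of that during-construction check (the candidate tests and the pruning-set error function are hypothetical placeholders):

```python
def grow_rule(candidate_tests, error_on_pruning_set):
    """Add tests to a rule greedily; drop any test that increases the
    error rate on the pruning set."""
    rule = []
    best_error = error_on_pruning_set(rule)
    for test in candidate_tests:
        trial_error = error_on_pruning_set(rule + [test])
        if trial_error <= best_error:          # the new test does not hurt: keep it
            rule.append(test)
            best_error = trial_error
        # otherwise the test is removed (never added to the rule)
    return rule

# toy usage: pretend the pruning-set error of each partial rule is given by a table
errors = {(): 0.40, ("a",): 0.30, ("a", "b"): 0.35, ("a", "c"): 0.25}
print(grow_rule(["a", "b", "c"], lambda rule: errors[tuple(rule)]))   # ['a', 'c']
```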
Classification rules: Algorithm for obtaining rules from partial decision trees
Classification rules
Classification rules
- As node 4 was not replaced, we stop at this stage
- Each leaf node now gives us a possible rule
- Choose the leaf that covers the greatest number of instances
Extending linear models: Support vector machines
- Support vector machines are algorithms for learning linear classifiers
- They use the maximum-margin hyperplane, which reduces overfitting
- The instances closest to the maximum-margin hyperplane are the support vectors; all other instances can be ignored
Extending linear models
Extending linear models: Support vector machines
- The hyperplane can be written as x = b + Σ_i alpha(i) y(i) (a(i) · a), summing over the support vectors a(i), where a is the instance to classify and y(i) is the class (+1 or -1) of a(i)
- Support vectors: all instances i for which alpha(i) > 0
- b and the alpha(i) are determined using optimization software packages
- Using a kernel function K, the hyperplane can also be written as x = b + Σ_i alpha(i) y(i) K(a(i), a)
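As a concrete illustration (using scikit-learn rather than WEKA, purely as an assumed example), a fitted linear SVM exposes exactly these quantities: the support vectors, the coefficients alpha(i)·y(i), and b:

```python
import numpy as np
from sklearn import svm

# toy 2-D data: two linearly separable classes
X = np.array([[0, 0], [1, 1], [1, 0], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = svm.SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin

print(clf.support_vectors_)   # the instances closest to the separating hyperplane
print(clf.dual_coef_)         # alpha(i) * y(i) for each support vector
print(clf.intercept_)         # b
# decision value for a new point a: b + sum_i alpha(i) * y(i) * K(a(i), a)
print(clf.decision_function([[2, 2]]))
```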
Extending linear models: Multilayer perceptron
- We can create a network of perceptrons to approximate arbitrary target concepts
- The multilayer perceptron is an example of an artificial neural network
- It consists of an input layer, one or more hidden layers, and an output layer
- The structure of an MLP is usually found by experimentation
- The parameters (weights) can be found using backpropagation
Extending linear models: Examples
Extending linear models: Backpropagation
- Sigmoid activation: f(x) = 1/(1 + exp(-x))
- Squared error: E = 1/2 (y - f(x))^2
- Minimizing the error with respect to a weight w(i), with x = Σ_i w(i) a(i), gives the gradient dE/dw(i) = -(y - f(x)) f(x) (1 - f(x)) a(i)
- Compute this expression for each training instance and update each weight: w(i) = w(i) - L (dE/dw(i)), where L is the learning rate
- The weights w are given (e.g. random) starting values
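A minimal sketch of this update rule for a single sigmoid unit (the data and parameter choices are illustrative; a full multilayer perceptron applies the same idea to every layer via the chain rule):

```python
import math, random

def f(x):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def train(data, n_inputs, L=0.5, epochs=5000):
    """Gradient descent on E = 1/2 (y - f(w.a))^2 for a single sigmoid unit."""
    random.seed(0)
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    for _ in range(epochs):
        for a, y in data:
            x = sum(wi * ai for wi, ai in zip(w, a))
            out = f(x)
            for i in range(n_inputs):
                # dE/dw(i) = -(y - f(x)) * f(x) * (1 - f(x)) * a(i)
                grad = -(y - out) * out * (1 - out) * a[i]
                w[i] -= L * grad              # w(i) = w(i) - L * dE/dw(i)
    return w

# toy data: learn logical AND (each instance includes a constant 1 for the bias weight)
data = [([0, 0, 1], 0), ([0, 1, 1], 0), ([1, 0, 1], 0), ([1, 1, 1], 1)]
w = train(data, n_inputs=3)
# the unit's outputs should approach 0, 0, 0, 1
print([round(f(sum(wi * ai for wi, ai in zip(w, a))), 2) for a, _ in data])
```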
Clustering: Incremental clustering
Steps:
- The tree starts with an empty root node
- Add instances one by one
- Update the tree appropriately at each stage
- To update, find the right leaf for the new instance; this may involve restructuring the tree
- Restructuring: merging and replacement
- Decisions are made using category utility
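Because every decision hinges on category utility, here is a minimal sketch of the standard category-utility measure for nominal attributes, CU = (1/k) Σ_l P(C_l) Σ_i Σ_j [P(a_i = v_ij | C_l)^2 - P(a_i = v_ij)^2]; the toy instances are purely illustrative:

```python
from collections import Counter

def category_utility(clusters):
    """clusters: list of clusters, each a list of instances, each instance a
    tuple of nominal attribute values.  Returns the category utility of the partition."""
    all_instances = [inst for cluster in clusters for inst in cluster]
    n, k = len(all_instances), len(clusters)
    n_attrs = len(all_instances[0])

    def sq_prob_sum(instances, attr):
        counts = Counter(inst[attr] for inst in instances)
        m = len(instances)
        return sum((c / m) ** 2 for c in counts.values())

    total = 0.0
    for cluster in clusters:
        p_cluster = len(cluster) / n
        gain = sum(sq_prob_sum(cluster, a) - sq_prob_sum(all_instances, a)
                   for a in range(n_attrs))
        total += p_cluster * gain
    return total / k

# toy example: splitting on the first attribute gives a better partition
instances = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "cool"), ("rainy", "mild")]
print(category_utility([instances[:2], instances[2:]]))   # two clusters: higher utility
print(category_utility([instances]))                      # one cluster: utility 0
```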
Clustering: Example of incremental clustering
EM Algorithm
- EM = Expectation-Maximization
- Generalizes k-means to a probabilistic setting
- Iterative procedure:
  - E ("expectation") step: calculate the cluster membership probability of each instance
  - M ("maximization") step: estimate the distribution parameters from the cluster probabilities
- Cluster probabilities are stored as instance weights
- Stop when the improvement is negligible
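A minimal sketch of these two steps for a mixture of two one-dimensional Gaussians (the data and the fixed iteration count are illustrative choices; a real implementation would stop when the log-likelihood improvement is negligible):

```python
import math, random

def em_two_gaussians(data, iters=50):
    """EM for a mixture of two 1-D Gaussians."""
    mu = [min(data), max(data)]            # crude initial means
    sigma = [1.0, 1.0]
    prior = [0.5, 0.5]
    for _ in range(iters):
        # E step: probability that each instance belongs to each cluster
        resp = []
        for x in data:
            dens = [prior[k] * math.exp(-((x - mu[k]) ** 2) / (2 * sigma[k] ** 2))
                    / (sigma[k] * math.sqrt(2 * math.pi)) for k in (0, 1)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: re-estimate parameters from the cluster probabilities (instance weights)
        for k in (0, 1):
            w = [r[k] for r in resp]
            sw = sum(w)
            mu[k] = sum(wi * x for wi, x in zip(w, data)) / sw
            sigma[k] = math.sqrt(sum(wi * (x - mu[k]) ** 2 for wi, x in zip(w, data)) / sw) or 1e-6
            prior[k] = sw / len(data)
    return mu, sigma, prior

# toy data drawn around two centres
random.seed(0)
data = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(5, 1) for _ in range(100)]
print(em_two_gaussians(data))
```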
Visit more self-help tutorials
- Pick a tutorial of your choice and browse through it at your own pace.
- The tutorials section is free, self-guiding and does not involve any additional support.
- Visit us at www.dataminingtools.net
