• Input: C, kernel, kernel parameters, epsilon
• Initialize b and all α's to 0
• Repeat until the KKT conditions are satisfied (to within epsilon):
– Find an example e1 that violates KKT (prefer unbound examples here, choosing randomly among those); this check is sketched in code below
– Choose a second example e2. Prefer one that maximizes the step size (in practice, it is faster to just maximize |E1 − E2|). If that fails to produce any change, randomly choose an unbound example. If that fails, randomly choose any example. If that fails, re-choose e1.
– Update α1 and α2 in one step
– Compute new threshold b
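A minimal sketch of the error computation and KKT-violation check used to pick e1, assuming a precomputed kernel matrix K; the helper names (decision_value, violates_kkt) are mine, not part of Platt's pseudocode:

import numpy as np

def decision_value(i, alpha, y, K, b):
    # f(x_i) = sum_j alpha_j * y_j * K(x_j, x_i) + b
    return np.sum(alpha * y * K[:, i]) + b

def violates_kkt(i, alpha, y, K, b, C, eps):
    # r = y_i * f(x_i) - 1; the KKT conditions require:
    #   alpha_i = 0      =>  r >= 0
    #   0 < alpha_i < C  =>  r == 0   ("unbound" examples)
    #   alpha_i = C      =>  r <= 0
    r = y[i] * decision_value(i, alpha, y, K, b) - 1.0
    return (alpha[i] < C and r < -eps) or (alpha[i] > 0 and r > eps)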
Updating Two α's: One SMO Step
• Given examples e1 and e2, set
α2_new = α2 + y2(E1 − E2)/η, where Ei = f(xi) − yi and η = K(x1, x1) + K(x2, x2) − 2K(x1, x2)
• Clip this value in the natural way: if y1 = y2 then
L = max(0, α1 + α2 − C), H = min(C, α1 + α2)
otherwise (y1 ≠ y2)
L = max(0, α2 − α1), H = min(C, C + α2 − α1)
• Set α1_new = α1 + s(α2 − α2_new), where s = y1y2 and α2_new is the clipped value (this one-step update is sketched in code below)
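The same step in code, following the update and clipping rules above; E1 and E2 are the errors f(xi) − yi, K is a precomputed kernel matrix, and the threshold update follows Platt's b1/b2 rule. A sketch under those assumptions, not a full implementation (the η ≤ 0 corner case is simply skipped here):

def smo_step(i1, i2, alpha, y, K, b, C, E1, E2):
    s = y[i1] * y[i2]
    # Clipping bounds depend on whether the two labels agree.
    if y[i1] == y[i2]:
        L, H = max(0.0, alpha[i1] + alpha[i2] - C), min(C, alpha[i1] + alpha[i2])
    else:
        L, H = max(0.0, alpha[i2] - alpha[i1]), min(C, C + alpha[i2] - alpha[i1])
    eta = K[i1, i1] + K[i2, i2] - 2.0 * K[i1, i2]
    if L == H or eta <= 0:
        return alpha, b, False                    # no progress on this pair
    a2 = alpha[i2] + y[i2] * (E1 - E2) / eta      # unconstrained optimum
    a2 = min(max(a2, L), H)                       # clip to [L, H]
    a1 = alpha[i1] + s * (alpha[i2] - a2)
    # New threshold: make the updated classifier exact on an unbound example.
    b1 = b - E1 - y[i1]*(a1 - alpha[i1])*K[i1, i1] - y[i2]*(a2 - alpha[i2])*K[i1, i2]
    b2 = b - E2 - y[i1]*(a1 - alpha[i1])*K[i1, i2] - y[i2]*(a2 - alpha[i2])*K[i2, i2]
    alpha[i1], alpha[i2] = a1, a2
    new_b = b1 if 0 < a1 < C else (b2 if 0 < a2 < C else (b1 + b2) / 2.0)
    return alpha, new_b, True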
- What is overfitting? How do we avoid it?
- “Cross-validation, regularization, bagging”
- What is regularization? Why do we need it?
- What is Bias-Variance tradeoff?
• JP wants to do the CMO assignment, but he does not know any of the answers.
• What will JP do?
[Diagram: training sets D1, D2, …, Dt−1, Dt, with one classifier C1, C2, …, Ct−1, Ct trained on each]
– Bagging (helps reduce the variance of the classifier; a minimal sketch follows this list)
– Boosting (AdaBoost) (helps improve the accuracy of the classifier)
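A minimal bagging sketch, assuming binary labels in {−1, +1} and a majority vote taken as the sign of the summed predictions; the function names are mine, and scikit-learn's DecisionTreeClassifier stands in for any base learner:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap: sample with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    votes = sum(t.predict(X) for t in trees)         # sum of {-1, +1} votes
    return np.sign(votes)                            # ties come out as 0 in this sketch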
• JP's very practical problem: “Whether to go to Prakruthi for tea or not?”
[Decision tree:
Ask Rishabh if he wants to come?
  Yes → Does Rishabh have money for both of us?
    Yes → Go for tea
    No → Don't go for tea
  No → Don't go for tea]
(The same tree is written as code below.)
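The toy tree as ordinary nested conditionals; a trained decision tree is just a learned version of rules like these (the function name is hypothetical):

def go_for_tea(rishabh_wants_to_come, rishabh_has_money):
    # Root split: does Rishabh want to come?
    if rishabh_wants_to_come:
        # Second split: does he have money for both of us?
        if rishabh_has_money:
            return "Go for tea"
        return "Don't go for tea"
    return "Don't go for tea"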
• Ensemble method specifically designed for
decision tree classifiers
• A Random Forest grows many classification trees (hence the name!)
• Ensemble of unpruned decision trees
• Each base classifier classifies a “new” vector
• The forest chooses the classification with the most votes (over all the trees in the forest)
• Introduce two sources of randomness:
“Bagging” and “Random input vectors”
– Each tree is grown using a bootstrap sample of the training data
– At each node, the best split is chosen from a random sample of mtry variables instead of all variables (both sources of randomness are sketched in code below)
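A sketch combining both sources of randomness: a bootstrap sample per tree, plus a random subset of mtry features considered at each node (scikit-learn's max_features provides the per-node sampling); the surrounding loop and names are mine, and integer class labels are assumed for the vote:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_fit(X, y, n_trees=100, mtry=None, seed=0):
    rng = np.random.default_rng(seed)
    mtry = mtry or max(1, int(np.sqrt(X.shape[1])))       # common default: sqrt(M)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))        # bagging: bootstrap sample
        tree = DecisionTreeClassifier(max_features=mtry)  # mtry features per node
        forest.append(tree.fit(X[idx], y[idx]))
    return forest

def random_forest_predict(forest, X):
    preds = np.stack([t.predict(X) for t in forest])      # shape (n_trees, n_samples)
    def vote(col):
        return np.bincount(col.astype(int)).argmax()      # majority vote per sample
    return np.apply_along_axis(vote, 0, preds)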
Random Forest Algorithm
• M input variables; a number m << M is specified such that at each node, m variables are selected at random out of the M, and the best split on these m is used to split the node.
• m is held constant while the forest is grown
• Each tree is grown to the largest extent possible
• There is no pruning
• Bagging with decision trees is the special case of random forests where m = M (see the usage sketch below)
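For comparison, a usage sketch with scikit-learn's built-in implementation on a synthetic dataset; max_features plays the role of m here, and trees are unpruned by default (max_depth=None):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt", n_jobs=-1)
rf.fit(X, y)
# m = M (consider every feature at every split) recovers plain bagging of trees:
bagged = RandomForestClassifier(n_estimators=500, max_features=None).fit(X, y)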
Random Forest: Advantages
• Good accuracy without over-fitting
• Fast algorithm (can be faster than growing/pruning a single
tree); easily parallelized
• Handles high-dimensional data without much trouble
• Only one main tuning parameter, mtry; results are usually not very sensitive to it (see the sketch below)
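A quick way to see the insensitivity to mtry is to sweep it while reading the out-of-bag score, so no separate validation set is needed; the dataset here is synthetic, so the exact numbers are illustrative only:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
for m in (2, 4, 8, 16):
    rf = RandomForestClassifier(n_estimators=300, max_features=m,
                                oob_score=True, random_state=0).fit(X, y)
    print(f"mtry={m}: OOB accuracy = {rf.oob_score_:.3f}")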