Statistical learning

  1. Statistical Learning. V. Saranya, AP/CSE, Sri Vidya College of Engineering and Technology, Virudhunagar
  2. Statistical Approaches
     • Statistical Learning
     • Naïve Bayes
     • Instance-based Learning
     • Neural Networks
  3. Example: Candy Bags
     • Candy comes in two flavors: cherry and lime
     • Candy is wrapped, so you can't tell which flavor until it is opened
     • There are 5 kinds of bags of candy:
       – h1 = all cherry (P(cherry|h1) = 1, P(lime|h1) = 0)
       – h2 = 75% cherry, 25% lime
       – h3 = 50% cherry, 50% lime
       – h4 = 25% cherry, 75% lime
       – h5 = 100% lime
     • Given a new bag of candy, predict H
     • Observations: d1, d2, d3, …
  4. Bayesian Learning
     • Calculate the probability of each hypothesis given the data, and make predictions weighted by these probabilities (i.e. use all the hypotheses, not just the single best one):
       P(h_i | d) = P(d | h_i) P(h_i) / P(d)
     • Now, if we want to predict some unknown quantity X:
       P(X | d) = Σ_i P(X | h_i) P(h_i | d)
  5. Bayesian Learning (cont.)
     • Calculating P(h_i | d):
       P(h_i | d) ∝ P(d | h_i) P(h_i)   (likelihood × prior)
     • Assume the observations are i.i.d. (independent and identically distributed), so the likelihood factors:
       P(d | h_i) = Π_j P(d_j | h_i)
  6. Example
     • Hypothesis prior over h1, …, h5 is {0.1, 0.2, 0.4, 0.2, 0.1}
     • Data: a sequence of candies from the bag is opened and observed: d1, d2, …
     • Q1: After seeing d1, what is P(h_i | d1)?
     • Q2: After seeing d1, what is P(d2 = lime | d1)?
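     A minimal Python sketch of the calculation asked for on this slide, assuming the first observed candy d1 is lime (the flavour symbol is not readable in this transcript); the names are illustrative only:

     ```python
     # Candy-bag example: posterior over bag types and prediction for the next candy.
     # Assumption: the first observed candy d1 is lime (the flavour image is missing
     # from the transcript), so P(d1 | h_i) = P(lime | h_i).

     priors = [0.1, 0.2, 0.4, 0.2, 0.1]      # P(h1) ... P(h5)
     p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]    # P(lime | h_i) for each bag type

     # Q1: P(h_i | d1) is proportional to P(d1 | h_i) * P(h_i)
     unnormalized = [like * prior for like, prior in zip(p_lime, priors)]
     evidence = sum(unnormalized)            # P(d1 = lime)
     posterior = [u / evidence for u in unnormalized]
     print([round(p, 3) for p in posterior])          # [0.0, 0.1, 0.4, 0.3, 0.2]

     # Q2: P(d2 = lime | d1) = sum_i P(d2 = lime | h_i) * P(h_i | d1)
     p_next_lime = sum(l * p for l, p in zip(p_lime, posterior))
     print(round(p_next_lime, 3))                     # 0.65
     ```

     A lime observation shifts posterior mass toward the lime-heavy bags, so the predicted probability of another lime rises from 0.5 (under the prior alone) to 0.65.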
  7. Making Statistical Inferences
     • Bayesian
       – predictions are made using all hypotheses, weighted by their probabilities
     • MAP (maximum a posteriori)
       – uses the single most probable hypothesis to make the prediction
       – often much easier than Bayesian; as we get more and more data, it gets closer to the Bayesian-optimal prediction
     • ML (maximum likelihood)
       – assumes a uniform prior over H
       – coincides with MAP when the prior is uniform
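     To make the three modes concrete, here is a small self-contained sketch (same hedged assumption as above: d1 is lime) contrasting the Bayesian, MAP, and ML predictions of P(d2 = lime | d1):

     ```python
     # Candy example revisited: Bayesian vs MAP vs ML prediction of P(next = lime).
     # Assumption (as above): the first observed candy is lime.

     priors = [0.1, 0.2, 0.4, 0.2, 0.1]       # P(h1) ... P(h5)
     p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]     # P(lime | h_i)

     weights = [l * p for l, p in zip(p_lime, priors)]
     posterior = [w / sum(weights) for w in weights]   # P(h_i | d1 = lime)

     # Bayesian: weight every hypothesis by its posterior probability.
     bayes_pred = sum(l * p for l, p in zip(p_lime, posterior))       # 0.65

     # MAP: commit to the single most probable hypothesis (h3 here).
     map_pred = p_lime[max(range(5), key=lambda i: posterior[i])]     # 0.5

     # ML: ignore the prior; pick the hypothesis with the highest likelihood (h5 here).
     ml_pred = p_lime[max(range(5), key=lambda i: p_lime[i])]         # 1.0

     print(bayes_pred, map_pred, ml_pred)
     ```

     MAP commits to h3 and predicts 0.5, ML ignores the prior and commits to h5 (predicting 1.0), while the full Bayesian average gives 0.65.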
  8. Naïve Bayes (20.2)
  9. Naïve Bayes
     • a.k.a. "Idiot Bayes"
     • a particularly simple Bayesian network (BN)
     • makes overly strong independence assumptions
     • but works surprisingly well in practice…
  10. Bayesian Diagnosis
     • Suppose we want to make a diagnosis D and there are n possible mutually exclusive diagnoses d1, …, dn
     • Suppose there are m boolean symptoms, E1, …, Em
     • How do we make the diagnosis? We want P(d_i | e1, …, em), so we need P(d_i) and P(e1, …, em | d_i):
       P(d_i | e1, …, em) = P(d_i) P(e1, …, em | d_i) / P(e1, …, em)
  11. Naïve Bayes Assumption
     • Assume each piece of evidence (symptom) is independent given the diagnosis
     • Then
       P(e1, …, em | d_i) = Π_k P(e_k | d_i)
     • What is the structure of the corresponding BN?
  12. Naïve Bayes Example
     • Possible diagnoses: Allergy, Cold, and Well
     • Possible symptoms: Sneeze, Cough, and Fever

                        Well    Cold    Allergy
       P(d)             0.9     0.05    0.05
       P(sneeze|d)      0.1     0.9     0.9
       P(cough|d)       0.1     0.8     0.7
       P(fever|d)       0.01    0.7     0.4

     • My symptoms are sneeze & cough; what is the diagnosis?
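     A short sketch of the posterior computation for this query, under the assumption that "sneeze & cough" also means fever is observed to be absent (the slide does not say how fever should be treated):

     ```python
     # Naive Bayes diagnosis for "sneeze & cough", using the table above.
     # Assumption: fever is treated as observed-absent, i.e. the factor (1 - P(fever|d))
     # is included; the slide itself does not specify how fever should be handled.

     priors   = {"Well": 0.9,  "Cold": 0.05, "Allergy": 0.05}
     p_sneeze = {"Well": 0.1,  "Cold": 0.9,  "Allergy": 0.9}
     p_cough  = {"Well": 0.1,  "Cold": 0.8,  "Allergy": 0.7}
     p_fever  = {"Well": 0.01, "Cold": 0.7,  "Allergy": 0.4}

     # Unnormalized posterior: P(d) * P(sneeze|d) * P(cough|d) * P(no fever|d)
     scores = {d: priors[d] * p_sneeze[d] * p_cough[d] * (1 - p_fever[d]) for d in priors}
     total = sum(scores.values())
     posterior = {d: round(s / total, 3) for d, s in scores.items()}

     print(posterior)                          # {'Well': 0.231, 'Cold': 0.28, 'Allergy': 0.49}
     print(max(posterior, key=posterior.get))  # 'Allergy'
     ```

     With these numbers Allergy gets the largest posterior (about 0.49), ahead of Cold and Well.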
  13. Learning the Probabilities
     • a.k.a. parameter estimation
     • We need:
       – P(d_i) – the prior
       – P(e_k | d_i) – the conditional probabilities
     • Use the training data to estimate them
  14. Maximum Likelihood Estimate (MLE)
     • Use frequencies in the training set to estimate:
       p(d_i) = n_i / N
       p(e_k | d_i) = n_ik / n_i
     • where n_x is shorthand for the count of event x in the training set
  15. Example

       D        Sneeze   Cough   Fever
       Allergy  yes      no      no
       Well     yes      no      no
       Allergy  yes      no      yes
       Allergy  yes      no      no
       Cold     yes      yes     yes
       Allergy  yes      no      no
       Well     no       no      no
       Well     no       no      no
       Allergy  no       no      no
       Allergy  yes      no      no

     • What is: P(Allergy)? P(Sneeze | Allergy)? P(Cough | Allergy)?
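     A short sketch that computes the requested maximum likelihood estimates directly from the table above:

     ```python
     # MLE from the training table: p(d_i) = n_i / N, p(e_k | d_i) = n_ik / n_i.
     data = [  # (diagnosis, sneeze, cough, fever)
         ("Allergy", "yes", "no",  "no"),
         ("Well",    "yes", "no",  "no"),
         ("Allergy", "yes", "no",  "yes"),
         ("Allergy", "yes", "no",  "no"),
         ("Cold",    "yes", "yes", "yes"),
         ("Allergy", "yes", "no",  "no"),
         ("Well",    "no",  "no",  "no"),
         ("Well",    "no",  "no",  "no"),
         ("Allergy", "no",  "no",  "no"),
         ("Allergy", "yes", "no",  "no"),
     ]

     N = len(data)
     allergy_rows = [row for row in data if row[0] == "Allergy"]
     n_allergy = len(allergy_rows)

     p_allergy = n_allergy / N                                                      # 6/10 = 0.6
     p_sneeze_given_allergy = sum(r[1] == "yes" for r in allergy_rows) / n_allergy  # 5/6
     p_cough_given_allergy = sum(r[2] == "yes" for r in allergy_rows) / n_allergy   # 0/6 = 0

     print(p_allergy, p_sneeze_given_allergy, p_cough_given_allergy)
     ```

     This gives P(Allergy) = 6/10 = 0.6, P(Sneeze | Allergy) = 5/6 ≈ 0.83, and P(Cough | Allergy) = 0/6 = 0; the zero estimate is exactly what the Laplace smoothing on the next slide is meant to fix.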
  16. Laplace Estimate (smoothing)
     • Use smoothing to eliminate zeros:
       p(d_i) = (n_i + 1) / (N + n)
       p(e_k | d_i) = (n_ik + 1) / (n_i + 2)
     • where n is the number of possible values for d, and e is assumed to have 2 possible values
     • many other smoothing schemes exist…
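     Applying the Laplace estimates to the counts taken from the previous example (a sketch; the counts come from the table two slides back):

     ```python
     # Laplace smoothing applied to the counts from the previous example.
     N = 10                  # total training examples
     n_allergy = 6           # examples with diagnosis Allergy
     n_cough_allergy = 0     # Allergy examples with Cough = yes
     n_diagnosis_values = 3  # d takes 3 values: Well, Cold, Allergy
     n_symptom_values = 2    # each symptom is boolean (yes / no)

     p_allergy = (n_allergy + 1) / (N + n_diagnosis_values)                          # 7/13 ≈ 0.538
     p_cough_given_allergy = (n_cough_allergy + 1) / (n_allergy + n_symptom_values)  # 1/8 = 0.125

     print(p_allergy, p_cough_given_allergy)  # the zero estimate for cough is gone
     ```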
  17. Comments
     • Generally works well despite the blanket assumption of independence
     • Experiments show it is competitive with decision trees on some well-known test sets (UCI)
     • Handles noisy data
  18. Learning more complex Bayesian networks
     • Two subproblems:
       – learning the structure: a combinatorial search over the space of networks
       – learning the parameter values: easy if all of the variables are observed in the training set; harder if there are "hidden variables"
