# 2013-1 Machine Learning Lecture 03 - Andrew Moore - a gentle introduction …


1. A gentle introduction to the mathematics of biosurveillance: Bayes Rule and Bayes Classifiers
   Andrew W. Moore, Professor, The Auton Lab, School of Computer Science, Carnegie Mellon University
   http://www.autonlab.org, awm@cs.cmu.edu, 412-268-7599
   Associate Member, The RODS Lab, University of Pittsburgh / Carnegie Mellon University
   http://rods.health.pitt.edu
2. What we're going to do
   • We will review the concept of reasoning with uncertainty
   • Also known as probability
   • This is a fundamental building block
   • It's really going to be worth it
3. Discrete Random Variables
   • A is a Boolean-valued random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs.
   • Examples:
   • A = The next patient you examine is suffering from inhalational anthrax
   • A = The next patient you examine has a cough
   • A = There is an active terrorist cell in your city
4. Probabilities
   • We write P(A) as "the fraction of possible worlds in which A is true"
   • We could at this point spend 2 hours on the philosophy of this.
   • But we won't.
5. Visualizing A
   [Venn diagram: the event space of all possible worlds has total area 1; the worlds in which A is true form an oval, and P(A) = the area of that oval. The rest of the space is worlds in which A is false.]
6. The Axioms of Probability
   • 0 <= P(A) <= 1
   • P(True) = 1
   • P(False) = 0
   • P(A or B) = P(A) + P(B) - P(A and B)
   [Diagram note: the area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.]
7. Interpreting the axioms
   • 0 <= P(A) <= 1: the area of A can't get any bigger than 1, and an area of 1 would mean all worlds have A true.
8. Interpreting the axioms
   • P(A or B) = P(A) + P(B) - P(A and B)
   [Venn diagram of two overlapping events A and B.]
9. Interpreting the axioms
   • P(A or B) is the total area covered by A and B; adding P(A) and P(B) counts the overlap P(A and B) twice, so it is subtracted once. Simple addition and subtraction.
10. Another important theorem
   • From the axioms (0 <= P(A) <= 1, P(True) = 1, P(False) = 0, P(A or B) = P(A) + P(B) - P(A and B)) we can prove:
   • P(A) = P(A and B) + P(A and not B)
   [Venn diagram of overlapping events A and B.]
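The "possible worlds" picture on these slides can be checked mechanically. Below is a small sketch; the world probabilities are invented purely for illustration:

```python
# A toy "possible worlds" model: each world is a truth assignment to (A, B)
# with some probability mass.  These particular numbers are hypothetical.
worlds = {
    (True, True): 0.20,   # A and B
    (True, False): 0.30,  # A and not B
    (False, True): 0.15,  # not A and B
    (False, False): 0.35, # not A and not B
}

def prob(pred):
    """P(event) = total mass of the worlds where the predicate holds."""
    return sum(p for w, p in worlds.items() if pred(w))

p_a = prob(lambda w: w[0])
p_b = prob(lambda w: w[1])
p_a_and_b = prob(lambda w: w[0] and w[1])
p_a_or_b = prob(lambda w: w[0] or w[1])

# The theorem: P(A) = P(A and B) + P(A and not B)
assert abs(p_a - (p_a_and_b + prob(lambda w: w[0] and not w[1]))) < 1e-12
# The inclusion-exclusion axiom: P(A or B) = P(A) + P(B) - P(A and B)
assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-12
```

Any non-negative masses summing to 1 satisfy both identities; that is exactly what the area picture says.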
11. Conditional Probability
   • P(A|B) = fraction of worlds in which B is true that also have A true
   • H = "Have a headache", F = "Coming down with flu"
   • P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
   • "Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."
12. Conditional Probability
   • P(H|F) = fraction of flu-inflicted worlds in which you have a headache
     = (#worlds with flu and headache) / (#worlds with flu)
     = (area of "H and F" region) / (area of "F" region)
     = P(H and F) / P(F)
13. Definition of Conditional Probability
   P(A|B) = P(A and B) / P(B)
   Corollary, the Chain Rule: P(A and B) = P(A|B) P(B)
14. Probabilistic Inference
   • H = "Have a headache", F = "Coming down with flu"; P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
   • One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches so I must have a 50-50 chance of coming down with flu."
   • Is this reasoning good?
15. Probabilistic Inference
   • H = "Have a headache", F = "Coming down with flu"; P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
   • P(F and H) = …
   • P(F|H) = …
16. Probabilistic Inference
   P(F and H) = P(H|F) P(F) = 1/2 × 1/40 = 1/80
   P(F|H) = P(F and H) / P(H) = (1/80) / (1/10) = 1/8
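The slide's arithmetic follows directly from the two definitions; a sketch in exact fractions, with the numbers taken from the slides:

```python
from fractions import Fraction

p_h = Fraction(1, 10)         # P(Headache)
p_f = Fraction(1, 40)         # P(Flu)
p_h_given_f = Fraction(1, 2)  # P(Headache | Flu)

# Chain rule: P(F and H) = P(H|F) * P(F)
p_f_and_h = p_h_given_f * p_f
# Definition of conditional probability: P(F|H) = P(F and H) / P(H)
p_f_given_h = p_f_and_h / p_h

print(p_f_and_h)    # 1/80
print(p_f_given_h)  # 1/8
```

So the headache raises the chance of flu from 1/40 to 1/8, far from the naive "50-50" guess.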
17. What we just did…
   P(B|A) = P(A ^ B) / P(A) = P(A|B) P(B) / P(A)
   • This is Bayes Rule
   • Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418
19. Applying Bayes Rule to the restaurant example (B = the restaurant is bad, S = smudge on the menu):
   P(B|S) = P(B and S) / P(S)
   = P(S and B) / (P(S and B) + P(S and not B))
   = P(S|B) P(B) / (P(S|B) P(B) + P(S|not B) P(not B))
   = (3/4 × 1/2) / (3/4 × 1/2 + 1/3 × 1/2)
   = 9/13
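The same computation as a sketch, with all numbers taken from the slides' restaurant example:

```python
from fractions import Fraction

# B = "restaurant is bad", S = "smudge on the menu"
p_b = Fraction(1, 2)              # prior P(Bad)
p_s_given_b = Fraction(3, 4)      # P(Smudge | Bad)
p_s_given_not_b = Fraction(1, 3)  # P(Smudge | not Bad)

# Bayes Rule, with the denominator expanded by total probability
numer = p_s_given_b * p_b
denom = numer + p_s_given_not_b * (1 - p_b)
p_b_given_s = numer / denom

print(p_b_given_s)  # 9/13
```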
21. Bayesian Diagnosis (slides 21-27 build up this table row by row)

   | Buzzword | Meaning | In our example | Our example's value |
   |---|---|---|---|
   | True State | The true state of the world, which you would like to know | Is the restaurant bad? | |
   | Prior | Prob(true state = x) | P(Bad) | 1/2 |
   | Evidence | Some symptom, or other thing you can observe | Smudge | |
   | Conditional | Probability of seeing evidence if you did know the true state | P(Smudge\|Bad), P(Smudge\|not Bad) | 3/4, 1/3 |
   | Posterior | The Prob(true state = x \| some evidence) | P(Bad\|Smudge) | 9/13 |
   | Inference, Diagnosis, Bayesian Reasoning | Getting the posterior from the prior and the evidence | | |
   | Decision theory | Combining the posterior with known costs in order to decide what to do | | |
28. Many Pieces of Evidence
29. Pat walks in to the surgery. Pat is sore and has a headache but no cough.
30. Priors: P(Flu) = 1/40, P(Not Flu) = 39/40
   Conditionals:
   P(Headache | Flu) = 1/2   P(Headache | not Flu) = 7/78
   P(Cough | Flu) = 2/3      P(Cough | not Flu) = 1/6
   P(Sore | Flu) = 3/4       P(Sore | not Flu) = 1/3
31. What is P(F | H and not C and S)?
32. The Naïve Assumption
33. If I know Pat has Flu… and I want to know if Pat has a cough… it won't help me to find out whether Pat is sore.
34. In symbols: P(C | F and S) = P(C | F), and P(C | F and not S) = P(C | F). Coughing is explained away by Flu.
35. The Naïve Assumption: General Case
   • If I know the true state… and I want to know about one of the symptoms… then it won't help me to find out anything about the other symptoms:
   P(Symptom | true state and other symptoms) = P(Symptom | true state)
   • Other symptoms are explained away by the true state.
37. P(F | H and not C and S) = ?
38. = P(H and not C and S and F) / P(H and not C and S)
39. = P(H and not C and S and F) / (P(H and not C and S and F) + P(H and not C and S and not F))
40. How do I get P(H and not C and S and F)?
41. P(H and not C and S and F) = ?
42. = P(H | not C and S and F) × P(not C and S and F)   [Chain rule: P(x and y) = P(x | y) × P(y)]
43. = P(H | F) × P(not C and S and F)   [Naïve assumption: lack of cough and soreness have no effect on headache if I am already assuming Flu]
44. = P(H | F) × P(not C | S and F) × P(S and F)   [Chain rule]
45. = P(H | F) × P(not C | F) × P(S and F)   [Naïve assumption: Sore has no effect on Cough if I am already assuming Flu]
46. = P(H | F) × P(not C | F) × P(S | F) × P(F)   [Chain rule]
47. = 1/2 × 1/3 × 3/4 × 1/40 = 1/320
48. P(F | H and not C and S) = P(H and not C and S and F) / (P(H and not C and S and F) + P(H and not C and S and not F))
49. Similarly: P(H and not C and S and not F) = P(H | not F) × P(not C | not F) × P(S | not F) × P(not F) = 7/78 × 5/6 × 1/3 × 39/40 = 7/288
50. P(F | H and not C and S) = (1/320) / (1/320 + 7/288) = 0.1139 (11% chance of Flu, given symptoms)
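The whole slides 37-50 computation condenses into a few lines. A sketch in exact fractions; the values come straight from the slides, while the helper name `joint` is introduced here:

```python
from fractions import Fraction

# Priors and conditionals from the slides' flu example.
p_flu = Fraction(1, 40)
p_h = {True: Fraction(1, 2), False: Fraction(7, 78)}  # P(Headache | Flu?)
p_c = {True: Fraction(2, 3), False: Fraction(1, 6)}   # P(Cough | Flu?)
p_s = {True: Fraction(3, 4), False: Fraction(1, 3)}   # P(Sore | Flu?)

def joint(flu):
    """P(H and not C and S and Flu=flu), under the naive assumption."""
    prior = p_flu if flu else 1 - p_flu
    return p_h[flu] * (1 - p_c[flu]) * p_s[flu] * prior

posterior = joint(True) / (joint(True) + joint(False))
print(posterior)         # 9/79
print(float(posterior))  # ~0.1139, the slides' 11% chance of Flu
```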
51. Building a Bayes Classifier
   Priors: P(Flu) = 1/40, P(Not Flu) = 39/40
   Conditionals:
   P(Headache | Flu) = 1/2   P(Headache | not Flu) = 7/78
   P(Cough | Flu) = 2/3      P(Cough | not Flu) = 1/6
   P(Sore | Flu) = 3/4       P(Sore | not Flu) = 1/3
52. The General Case
53. Building a naïve Bayesian Classifier
   Assume:
   • The true state has N possible values: 1, 2, 3, …, N
   • There are K symptoms, called Symptom1, Symptom2, …, SymptomK
   • Symptom_i has M_i possible values: 1, 2, …, M_i
   The classifier is a table with one blank per parameter:
   P(State=1) = ___   P(State=2) = ___   …   P(State=N) = ___
   P(Sym_i = v | State = s) = ___   for every symptom i, each of its M_i values v, and every state s
54. Example entry: P(Anemic | Liver Cancer) = 0.21
55. Classifying with the table (slides 55-56):
   P(state = Y | symp_1 = X_1 and symp_2 = X_2 and … and symp_n = X_n)
   = P(symp_1 = X_1 and … and symp_n = X_n and state = Y) / P(symp_1 = X_1 and … and symp_n = X_n)
   = P(symp_1 = X_1 and … and symp_n = X_n and state = Y) / Σ_Z P(symp_1 = X_1 and … and symp_n = X_n and state = Z)
   = P(state = Y) Π_i P(symp_i = X_i | state = Y) / Σ_Z P(state = Z) Π_i P(symp_i = X_i | state = Z)
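A minimal sketch of this general recipe. The dictionary layout and the function name `posterior` are choices made here, not the slides'; the flu numbers re-cast into it reproduce the earlier answer:

```python
from math import prod

def posterior(priors, conditionals, observed):
    """P(state | observed symptoms) for every state, via naive Bayes.

    priors:       {state: P(state)}
    conditionals: {state: {symptom: {value: P(symptom=value | state)}}}
    observed:     {symptom: value}
    """
    unnorm = {
        y: priors[y] * prod(conditionals[y][s][v] for s, v in observed.items())
        for y in priors
    }
    total = sum(unnorm.values())
    return {y: p / total for y, p in unnorm.items()}

# The flu example in this form (values straight from the slides).
priors = {"flu": 1 / 40, "not flu": 39 / 40}
conditionals = {
    "flu":     {"headache": {1: 1/2,  0: 1/2},
                "cough":    {1: 2/3,  0: 1/3},
                "sore":     {1: 3/4,  0: 1/4}},
    "not flu": {"headache": {1: 7/78, 0: 71/78},
                "cough":    {1: 1/6,  0: 5/6},
                "sore":     {1: 1/3,  0: 2/3}},
}
post = posterior(priors, conditionals, {"headache": 1, "cough": 0, "sore": 1})
print(round(post["flu"], 4))  # 0.1139
```

Reporting the full posterior rather than only the argmax keeps the decision-theory step (combining with costs) possible later.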
57. Conclusion
   • You will hear lots of "Bayesian" this and "conditional probability" that this week.
   • It's simple: don't let wooly academic types trick you into thinking it is fancy.
   • You should know what these are: Bayesian Reasoning, Conditional Probabilities, Priors, Posteriors.
   • Appreciate how conditional probabilities are manipulated.
   • Why the Naïve Bayes Assumption is Good.
   • Why the Naïve Bayes Assumption is Evil.
58. Text mining
   • Motivation: an enormous (and growing!) supply of rich data
   • Most of the available text data is unstructured…
   • Some of it is semi-structured:
   • Header entries (title, authors' names, section titles, keyword lists, etc.)
   • Running text bodies (main body, abstract, summary, etc.)
   • Natural Language Processing (NLP)
   • Text Information Retrieval
59. Text processing
   • Natural Language Processing:
   • Automated understanding of text is a very, very, very challenging Artificial Intelligence problem
   • Aims at extracting the semantic content of the processed documents
   • Involves extensive research into semantics, grammar, automated reasoning, …
   • Several factors making it tough for a computer include:
   • Polysemy (the same word having several different meanings)
   • Synonymy (several different ways to describe the same thing)
60. Text processing
   • Text Information Retrieval:
   • Search through collections of documents in order to find objects:
   • relevant to a specific query
   • similar to a specific document
   • For practical reasons, the text documents are parameterized
   • Terminology:
   • Documents (text data units: books, articles, paragraphs, other chunks such as email messages, …)
   • Terms (specific words, word pairs, phrases)
61. Text Information Retrieval (slides 61-62)
   • Typically, the text databases are parametrized with a document-term matrix
   • Each row of the matrix corresponds to one of the documents
   • Each column corresponds to a different term

   | Document | breath | difficulty | just | neck | plain | rash | short | sore | ugly |
   |---|---|---|---|---|---|---|---|---|---|
   | Shortness of breath | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
   | Difficulty breathing | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
   | Rash on neck | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
   | Sore neck and difficulty breathing | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
   | Just plain ugly | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
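Building the binary matrix above can be sketched as follows. Matching a term by word-prefix ("breath" matches "breathing") is a stand-in for real stemming, which the slides leave implicit:

```python
# The five chief complaints and nine terms from the slides.
docs = [
    "Shortness of breath",
    "Difficulty breathing",
    "Rash on neck",
    "Sore neck and difficulty breathing",
    "Just plain ugly",
]
terms = ["breath", "difficulty", "just", "neck",
         "plain", "rash", "short", "sore", "ugly"]

def row(doc):
    words = doc.lower().split()
    # Binary entry: 1 if any word in the document starts with the term's stem
    return [int(any(w.startswith(t) for w in words)) for t in terms]

matrix = [row(d) for d in docs]
print(matrix[0])  # [1, 0, 0, 0, 0, 0, 1, 0, 0]
```

Swapping `int(any(...))` for a word count would give the count-valued parametrization described on the next slide.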
63. Parametrization for Text Information Retrieval
   • Depending on the particular method of parametrization, the matrix entries may be:
   • binary (telling whether a term Tj is present in the document Di or not)
   • counts (frequencies) (total number of repetitions of a term Tj in Di)
   • weighted frequencies (see the slide following the next)
64. Typical applications of Text IR
   • Document indexing and classification (e.g. library systems)
   • Search engines (e.g. the Web)
   • Extraction of information from textual sources (e.g. profiling of personal records, consumer complaint processing)
66. Building a naïve Bayesian Classifier (slides 66-69 repeat the parameter-table template of slides 53-54: a true state with N values, K symptoms, Symptom_i with M_i values, and one blank per prior and conditional)
70. Building a naïve Bayesian Classifier for chief complaints (slides 70-73)
   • The true state is the prodrome: Prodm ∈ {GI, respir, …, const}
   • Each symptom is whether the chief complaint contains a given word
   P(Prodm=GI) = ___           P(Prodm=respir) = ___           …  P(Prodm=const) = ___
   P(angry | Prodm=GI) = ___   P(angry | Prodm=respir) = ___   …  P(angry | Prodm=const) = ___
   P(~angry | Prodm=GI) = ___  P(~angry | Prodm=respir) = ___  …  P(~angry | Prodm=const) = ___
   P(blood | Prodm=GI) = ___   P(blood | Prodm=respir) = ___   …  P(blood | Prodm=const) = ___
   P(~blood | Prodm=GI) = ___  P(~blood | Prodm=respir) = ___  …  P(~blood | Prodm=const) = ___
   :
   P(vomit | Prodm=GI) = ___   P(vomit | Prodm=respir) = ___   …  P(vomit | Prodm=const) = ___
   P(~vomit | Prodm=GI) = ___  P(~vomit | Prodm=respir) = ___  …  P(~vomit | Prodm=const) = ___
   Example: Prob(Chief Complaint contains "Blood" | Prodrome = Respiratory) = 0.003
74. Learning a Bayesian Classifier (slides 74-79)
   1. Before deployment of the classifier, get labeled training data:

   | Document | EXPERT SAYS | breath | difficulty | just | neck | plain | rash | short | sore | ugly |
   |---|---|---|---|---|---|---|---|---|---|---|
   | Shortness of breath | Resp | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
   | Difficulty breathing | Resp | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
   | Rash on neck | Rash | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
   | Sore neck and difficulty breathing | Resp | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
   | Just plain ugly | Other | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |

   2. Learn parameters (conditionals and priors) by counting:
   P(breath = 1 | prodrome = Resp) = (num "resp" training records containing "breath") / (num "resp" training records)
   P(prodrome = Resp) = (num "resp" training records) / (total num training records)
79. 79. 81Learning a Bayesian ClassifierEXPERT SAYS breath difficulty just neck plain rash short sore uglyShortness of breath Resp 1 0 0 0 0 0 1 0 0Difficulty breathing Resp 1 1 0 0 0 0 0 0 0Rash on neck Rash 0 0 0 1 0 1 0 0 0Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0Just plain ugly Other 0 0 1 0 1 0 0 0 11. Before deployment of classifier, get labeled training data2. Learn parameters (conditionals, and priors)P(Prodm=GI) = ___ P(Prodm=respir) = ___ … P(Prodm=const) = ___P( angry | Prodm=GI ) = ___ P( angry | Prodm=respir ) = ___ … P( angry | Prodm=const ) = ___P( ~angry | Prodm=GI ) = ___ P(~angry | Prodm=respir ) = ___ … P(~angry | Prodm=const ) = ___P( blood | Prodm=GI ) = ___ P( blood | Prodm=respir ) = ___ … P( blood | Prodm=const ) = ___P( ~blood | Prodm=GI ) = ___ P( ~blood | Prodm=respir) = ___ … P( ~blood | Prodm=const ) = ___: : :P( vomit | Prodm=GI ) = ___ P( vomit | Prodm=respir ) = ___ … P( vomit | Prodm=const ) = ___P( ~vomit | Prodm=GI ) = ___ P( ~vomit |Prodm=respir ) = ___ … P( ~vomit | Prodm=const ) = ___
80. 80. 82Learning a Bayesian ClassifierEXPERT SAYS breath difficulty just neck plain rash short sore uglyShortness of breath Resp 1 0 0 0 0 0 1 0 0Difficulty breathing Resp 1 1 0 0 0 0 0 0 0Rash on neck Rash 0 0 0 1 0 1 0 0 0Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0Just plain ugly Other 0 0 1 0 1 0 0 0 11. Before deployment of classifier, get labeled training data2. Learn parameters (conditionals, and priors)3. During deployment, apply classifierP(Prodm=GI) = ___ P(Prodm=respir) = ___ … P(Prodm=const) = ___P( angry | Prodm=GI ) = ___ P( angry | Prodm=respir ) = ___ … P( angry | Prodm=const ) = ___P( ~angry | Prodm=GI ) = ___ P(~angry | Prodm=respir ) = ___ … P(~angry | Prodm=const ) = ___P( blood | Prodm=GI ) = ___ P( blood | Prodm=respir ) = ___ … P( blood | Prodm=const ) = ___P( ~blood | Prodm=GI ) = ___ P( ~blood | Prodm=respir) = ___ … P( ~blood | Prodm=const ) = ___: : :P( vomit | Prodm=GI ) = ___ P( vomit | Prodm=respir ) = ___ … P( vomit | Prodm=const ) = ___P( ~vomit | Prodm=GI ) = ___ P( ~vomit |Prodm=respir ) = ___ … P( ~vomit | Prodm=const ) = ___
81. 81. 83Learning a Bayesian ClassifierEXPERT SAYS breath difficulty just neck plain rash short sore uglyShortness of breath Resp 1 0 0 0 0 0 1 0 0Difficulty breathing Resp 1 1 0 0 0 0 0 0 0Rash on neck Rash 0 0 0 1 0 1 0 0 0Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0Just plain ugly Other 0 0 1 0 1 0 0 0 11. Before deployment of classifier, get labeled training data2. Learn parameters (conditionals, and priors)3. During deployment, apply classifierNew Chief Complaint: “Just sore breath”P(Prodm=GI) = ___ P(Prodm=respir) = ___ … P(Prodm=const) = ___P( angry | Prodm=GI ) = ___ P( angry | Prodm=respir ) = ___ … P( angry | Prodm=const ) = ___P( ~angry | Prodm=GI ) = ___ P(~angry | Prodm=respir ) = ___ … P(~angry | Prodm=const ) = ___P( blood | Prodm=GI ) = ___ P( blood | Prodm=respir ) = ___ … P( blood | Prodm=const ) = ___P( ~blood | Prodm=GI ) = ___ P( ~blood | Prodm=respir) = ___ … P( ~blood | Prodm=const ) = ___: : :P( vomit | Prodm=GI ) = ___ P( vomit | Prodm=respir ) = ___ … P( vomit | Prodm=const ) = ___P( ~vomit | Prodm=GI ) = ___ P( ~vomit |Prodm=respir ) = ___ … P( ~vomit | Prodm=const ) = ___
82. 82. 84Learning a Bayesian ClassifierEXPERT SAYS breath difficulty just neck plain rash short sore uglyShortness of breath Resp 1 0 0 0 0 0 1 0 0Difficulty breathing Resp 1 1 0 0 0 0 0 0 0Rash on neck Rash 0 0 0 1 0 1 0 0 0Sore neck and difficulty breathing Resp 1 1 0 1 0 0 0 1 0Just plain ugly Other 0 0 1 0 1 0 0 0 11. Before deployment of classifier, get labeled training data2. Learn parameters (conditionals, and priors)3. During deployment, apply classifier...)1,just0,difficulty1,breath|GIprodrome( P ZZPPPGIPPP)state(Z)...prod|0difficulty(Z)prod|1breath()state(GI)...prod|0difficulty(GI)prod|1breath(New Chief Complaint: “Just sore breath”P(Prodm=GI) = ___ P(Prodm=respir) = ___ … P(Prodm=const) = ___P( angry | Prodm=GI ) = ___ P( angry | Prodm=respir ) = ___ … P( angry | Prodm=const ) = ___P( ~angry | Prodm=GI ) = ___ P(~angry | Prodm=respir ) = ___ … P(~angry | Prodm=const ) = ___P( blood | Prodm=GI ) = ___ P( blood | Prodm=respir ) = ___ … P( blood | Prodm=const ) = ___P( ~blood | Prodm=GI ) = ___ P( ~blood | Prodm=respir) = ___ … P( ~blood | Prodm=const ) = ___: : :P( vomit | Prodm=GI ) = ___ P( vomit | Prodm=respir ) = ___ … P( vomit | Prodm=const ) = ___P( ~vomit | Prodm=GI ) = ___ P( ~vomit |Prodm=respir ) = ___ … P( ~vomit | Prodm=const ) = ___
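The count-and-multiply procedure on this slide fits in a few lines of Python. The five training records are transcribed from the slide's table; the add-one (Laplace) smoothing is my addition, not from the lecture — with only five records, a raw count of zero (e.g. "just" never appearing in a Resp record) would otherwise zero out every class score for "Just sore breath".

```python
VOCAB = ["breath", "difficulty", "just", "neck",
         "plain", "rash", "short", "sore", "ugly"]

# Training records transcribed from the slide's table.
TRAINING = [
    ("shortness of breath", "Resp"),
    ("difficulty breathing", "Resp"),
    ("rash on neck", "Rash"),
    ("sore neck and difficulty breathing", "Resp"),
    ("just plain ugly", "Other"),
]

def features(text):
    # Binary bag of words; substring matching reproduces the slide's table
    # ("breathing" sets "breath", "shortness" sets "short").
    text = text.lower()
    return {w: int(w in text) for w in VOCAB}

def train(records):
    classes = sorted({label for _, label in records})
    prior = {c: sum(1 for _, l in records if l == c) / len(records)
             for c in classes}
    cond = {}
    for c in classes:
        docs = [features(t) for t, l in records if l == c]
        for w in VOCAB:
            hits = sum(d[w] for d in docs)
            # Add-one smoothing (my assumption, not on the slide): keeps a
            # zero count from wiping out the whole product of probabilities.
            cond[w, c] = (hits + 1) / (len(docs) + 2)
    return classes, prior, cond

def classify(text, classes, prior, cond):
    f = features(text)
    score = {}
    for c in classes:
        p = prior[c]                 # P(prodrome = c)
        for w in VOCAB:
            p *= cond[w, c] if f[w] else 1.0 - cond[w, c]
        score[c] = p
    z = sum(score.values())          # the slide's denominator (sum over Z)
    return {c: p / z for c, p in score.items()}

posterior = classify("just sore breath", *train(TRAINING))
print(max(posterior, key=posterior.get))   # prints "Resp"
```

On this toy data the posterior for Resp comes out around 0.85, driven by the fact that "breath" appears in every Resp record.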
83. 83. 85 CoCo Performance (AUC scores)
• Botulism 0.78
• Rash 0.91
• Neurological 0.92
• Hemorrhagic 0.93
• Constitutional 0.93
• Gastrointestinal 0.95
• Other 0.96
• Respiratory 0.96
84. 84. 86 Conclusion
• Automated text extraction is increasingly important.
• There is a very wide world of text extraction outside biosurveillance.
• The field has changed very fast, even in the past three years.
• Warning: although Bayes classifiers are the simplest to implement, logistic regression or other discriminative methods often learn more accurately. Consider using off-the-shelf methods, such as William Cohen's successful "Minorthird" open-source libraries: http://minorthird.sourceforge.net/
• Real systems (including CoCo) have many ingenious special-case improvements.
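The warning about discriminative methods can be made concrete: logistic regression learns weights for the same binary word features directly, rather than estimating per-class conditionals. Below is a minimal sketch on the slide's five-record table, distinguishing Resp from everything else; the plain stochastic gradient descent, learning rate, and epoch count are my illustrative choices, not anything from the lecture.

```python
import math

VOCAB = ["breath", "difficulty", "just", "neck",
         "plain", "rash", "short", "sore", "ugly"]

# Same toy records as the slide; label 1 = Resp, 0 = anything else.
DATA = [
    ("shortness of breath", 1),
    ("difficulty breathing", 1),
    ("rash on neck", 0),                       # Rash
    ("sore neck and difficulty breathing", 1),
    ("just plain ugly", 0),                    # Other
]

def encode(text):
    # Binary bag of words, substring-matched as in the slide's table.
    return [1.0 if w in text.lower() else 0.0 for w in VOCAB]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(data, epochs=5000, lr=0.5):
    X = [encode(t) for t, _ in data]
    y = [label for _, label in data]
    w = [0.0] * len(VOCAB)
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi                       # gradient of the log loss
            b -= lr * err
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return w, b

def predict(text, w, b):
    x = encode(text)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

w, b = train_logreg(DATA)
print(predict("just sore breath", w, b))       # probability of Resp
```

Because "breath" appears in every Resp record and no other, the data is linearly separable and the learned weight on that feature dominates the prediction.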
85. 85. 87 Discussion
1. What new data sources should we apply algorithms to? (e.g. self-reporting?)
2. What are related surveillance problems to which these kinds of algorithms can be applied?
3. Where are the gaps in the current algorithms world?
4. Are there other spatial problems out there?
5. Could new or pre-existing algorithms help in the period after an outbreak is detected?
6. Other comments about favorite tools of the trade.