2013-1 Machine Learning Lecture 03 - Andrew Moore - A gentle introduction to the mathematics of biosurveillance: Bayes Rule and Bayes Classifiers
Presentation Transcript

  • Slide 1: A gentle introduction to the mathematics of biosurveillance: Bayes Rule and Bayes Classifiers
    Andrew W. Moore
    Professor, The Auton Lab, School of Computer Science, Carnegie Mellon University
    http://www.autonlab.org · awm@cs.cmu.edu · 412-268-7599
    Associate Member, The RODS Lab, University of Pittsburgh / Carnegie Mellon University
    http://rods.health.pitt.edu
  • Slide 2: What we're going to do
    • We will review the concept of reasoning with uncertainty
    • Also known as probability
    • This is a fundamental building block
    • It's really going to be worth it
  • Slide 3: Discrete Random Variables
    • A is a Boolean-valued random variable if A denotes an event, and there is some degree of uncertainty as to whether A occurs.
    • Examples:
      • A = The next patient you examine is suffering from inhalational anthrax
      • A = The next patient you examine has a cough
      • A = There is an active terrorist cell in your city
  • Slide 4: Probabilities
    • We write P(A) as "the fraction of possible worlds in which A is true"
    • We could at this point spend 2 hours on the philosophy of this.
    • But we won't.
  • Slide 5: Visualizing A
    [Venn diagram: the event space of all possible worlds has area 1; a reddish oval marks the worlds in which A is true, the rest are the worlds in which A is false; P(A) = area of the reddish oval]
  • Slide 7: The Axioms of Probability
    • 0 <= P(A) <= 1
    • P(True) = 1
    • P(False) = 0
    • P(A or B) = P(A) + P(B) - P(A and B)
    The area of A can't get any smaller than 0, and a zero area would mean no world could ever have A true.
  • Slide 8: Interpreting the axioms
    • 0 <= P(A) <= 1
    • P(True) = 1
    • P(False) = 0
    • P(A or B) = P(A) + P(B) - P(A and B)
    The area of A can't get any bigger than 1, and an area of 1 would mean all worlds will have A true.
  • Slide 9: Interpreting the axioms
    • 0 <= P(A) <= 1, P(True) = 1, P(False) = 0
    • P(A or B) = P(A) + P(B) - P(A and B)
    [Venn diagram of two overlapping events A and B]
  • Slide 10: Interpreting the axioms
    • P(A or B) = P(A) + P(B) - P(A and B)
    [Venn diagram: the area of "A or B" is the area of A plus the area of B, minus the double-counted overlap P(A and B). Simple addition and subtraction.]
  • Slide 12: Another important theorem
    • From 0 <= P(A) <= 1, P(True) = 1, P(False) = 0, and P(A or B) = P(A) + P(B) - P(A and B), we can prove:
    • P(A) = P(A and B) + P(A and not B)
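Because the axioms are statements about areas of sets of worlds, they are easy to check numerically. The sketch below (illustrative, not from the slides) assigns made-up probabilities to the four possible worlds over two Boolean events and verifies both the fourth axiom and the theorem above:

```python
# Illustrative sanity check: enumerate the four possible worlds over two
# Boolean events A and B, with arbitrary probabilities that sum to 1.
worlds = {
    (True, True): 0.10,    # A and B
    (True, False): 0.25,   # A and not B
    (False, True): 0.30,   # not A and B
    (False, False): 0.35,  # neither
}

def prob(pred):
    """Total probability of the worlds where pred((a, b)) holds."""
    return sum(p for w, p in worlds.items() if pred(w))

p_a = prob(lambda w: w[0])
p_b = prob(lambda w: w[1])
p_a_and_b = prob(lambda w: w[0] and w[1])
p_a_or_b = prob(lambda w: w[0] or w[1])

# Axiom: P(A or B) = P(A) + P(B) - P(A and B)
assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-12
# Theorem: P(A) = P(A and B) + P(A and not B)
assert abs(p_a - (p_a_and_b + prob(lambda w: w[0] and not w[1]))) < 1e-12
print(p_a, p_b, p_a_and_b, p_a_or_b)  # 0.35 0.40 0.10 0.65
```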
  • Slide 13: Conditional Probability
    • P(A|B) = fraction of worlds in which B is true that also have A true
    • H = "Have a headache", F = "Coming down with flu"
    • P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
    • "Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."
  • Slide 14: Conditional Probability
    • H = "Have a headache", F = "Coming down with flu"; P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
    • P(H|F) = fraction of flu-inflicted worlds in which you have a headache
      = (#worlds with flu and headache) / (#worlds with flu)
      = (area of "H and F" region) / (area of "F" region)
      = P(H and F) / P(F)
  • Slide 15: Definition of Conditional Probability
    • P(A|B) = P(A and B) / P(B)
    • Corollary (the Chain Rule): P(A and B) = P(A|B) P(B)
  • Slide 16: Probabilistic Inference
    • H = "Have a headache", F = "Coming down with flu"; P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
    • One day you wake up with a headache. You think: "Drat! 50% of flus are associated with headaches, so I must have a 50-50 chance of coming down with flu."
    • Is this reasoning good?
  • Slides 17-18: Probabilistic Inference
    • P(F and H) = P(H|F) P(F) = 1/2 × 1/40 = 1/80
    • P(F|H) = P(F and H) / P(H) = (1/80) / (1/10) = 1/8
    • So the answer is no: waking up with a headache gives only a 1-in-8 chance of flu, not 50-50.
  • Slide 19: What we just did…
    • P(B|A) = P(A and B) / P(A) = P(A|B) P(B) / P(A)
    • This is Bayes Rule
    • Bayes, Thomas (1763). An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418.
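To make the headache example concrete, here is a small sketch (mine, not the deck's) that applies the chain rule and Bayes Rule with exact fractions:

```python
from fractions import Fraction

# The headache/flu numbers from the slides, as exact fractions.
p_h = Fraction(1, 10)         # P(H)
p_f = Fraction(1, 40)         # P(F)
p_h_given_f = Fraction(1, 2)  # P(H|F)

# Chain rule: P(F and H) = P(H|F) P(F)
p_f_and_h = p_h_given_f * p_f
# Bayes Rule: P(F|H) = P(F and H) / P(H)
p_f_given_h = p_f_and_h / p_h

print(p_f_and_h)    # 1/80
print(p_f_given_h)  # 1/8
```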
  • Slide 20: [figure: a group of menus labeled "Bad Hygiene" and a group labeled "Good Hygiene"]
    • You are a health official, deciding whether to investigate a restaurant
    • You lose a dollar if you get it wrong; you win a dollar if you get it right
    • Half of all restaurants have bad hygiene
    • In a bad restaurant, 3/4 of the menus are smudged
    • In a good restaurant, 1/3 of the menus are smudged
    • You are allowed to see a randomly chosen menu
  • Slide 21:
    P(B|S) = P(B and S) / P(S)
           = P(S and B) / [P(S and B) + P(S and not B)]
           = P(S|B) P(B) / [P(S|B) P(B) + P(S|not B) P(not B)]
           = (3/4 × 1/2) / (3/4 × 1/2 + 1/3 × 1/2)
           = 9/13
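The same computation as a hedged code sketch (the variable names are mine):

```python
from fractions import Fraction

p_bad = Fraction(1, 2)                # half of all restaurants are bad
p_smudge_given_bad = Fraction(3, 4)   # smudged menus in bad restaurants
p_smudge_given_good = Fraction(1, 3)  # smudged menus in good restaurants

# Total probability: P(S) = P(S|B)P(B) + P(S|not B)P(not B)
p_smudge = p_smudge_given_bad * p_bad + p_smudge_given_good * (1 - p_bad)
# Bayes Rule
p_bad_given_smudge = p_smudge_given_bad * p_bad / p_smudge
print(p_bad_given_smudge)  # 9/13
```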
  • Slides 23-29: Bayesian Diagnosis — the buzzwords, built up one row per slide:
    • True State: the true state of the world, which you would like to know. (Here: is the restaurant bad?)
    • Prior: Prob(true state = x). (Here: P(Bad) = 1/2)
    • Evidence: some symptom, or other thing you can observe. (Here: Smudge)
    • Conditional: probability of seeing the evidence if you did know the true state. (Here: P(Smudge|Bad) = 3/4, P(Smudge|not Bad) = 1/3)
    • Posterior: the Prob(true state = x | some evidence). (Here: P(Bad|Smudge) = 9/13)
    • Inference / Diagnosis / Bayesian Reasoning: getting the posterior from the prior and the evidence
    • Decision theory: combining the posterior with known costs in order to decide what to do
  • Slides 30-33: Many Pieces of Evidence
    • Pat walks in to the surgery. Pat is sore and has a headache but no cough.
    • Priors: P(Flu) = 1/40, P(Not Flu) = 39/40
    • Conditionals:
      • P(Headache | Flu) = 1/2, P(Headache | not Flu) = 7/78
      • P(Cough | Flu) = 2/3, P(Cough | not Flu) = 1/6
      • P(Sore | Flu) = 3/4, P(Sore | not Flu) = 1/3
    • What is P(F | H and not C and S)?
  • Slides 34-36: The Naïve Assumption
    • If I know Pat has Flu…
    • …and I want to know if Pat has a cough…
    • …it won't help me to find out whether Pat is sore:
      P(C | F and S) = P(C | F and not S) = P(C | F)
    • Coughing is explained away by Flu
  • Slides 37-38: The Naïve Assumption: General Case
    • If I know the true state…
    • …and I want to know about one of the symptoms…
    • …then it won't help me to find out anything about the other symptoms:
      P(Symptom | true state and other symptoms) = P(Symptom | true state)
    • Other symptoms are explained away by the true state
  • Slides 39-42:
    P(F | H and not C and S)
      = P(H and not C and S and F) / P(H and not C and S)
      = P(H and not C and S and F) / [P(H and not C and S and F) + P(H and not C and S and not F)]
    • How do I get P(H and not C and S and F)?
  • Slides 43-49:
    P(H and not C and S and F)
      = P(H | not C and S and F) × P(not C and S and F)    [chain rule: P(█ and █) = P(█ | █) × P(█)]
      = P(H|F) × P(not C and S and F)                      [naïve assumption: lack of cough and soreness have no effect on headache if I am already assuming Flu]
      = P(H|F) × P(not C | S and F) × P(S and F)           [chain rule]
      = P(H|F) × P(not C|F) × P(S and F)                   [naïve assumption: Sore has no effect on Cough if I am already assuming Flu]
      = P(H|F) × P(not C|F) × P(S|F) × P(F)                [chain rule]
      = 1/2 × 1/3 × 3/4 × 1/40
      = 1/320
  • Slides 50-52:
    By the same chain-rule and naïve-assumption steps:
    P(H and not C and S and not F)
      = P(H|not F) × P(not C|not F) × P(S|not F) × P(not F)
      = 7/78 × 5/6 × 1/3 × 39/40
      = 7/288
    P(F | H and not C and S)
      = P(H and not C and S and F) / [P(H and not C and S and F) + P(H and not C and S and not F)]
      = (1/320) / (1/320 + 7/288)
      = 0.1139 (an 11% chance of Flu, given the symptoms)
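The whole three-symptom calculation fits in a few lines. Below is a sketch (my framing, with illustrative names) that reproduces the 0.1139 posterior exactly:

```python
from fractions import Fraction

p_flu = Fraction(1, 40)
conditionals = {  # symptom: (P(symptom | Flu), P(symptom | not Flu))
    "headache": (Fraction(1, 2), Fraction(7, 78)),
    "cough":    (Fraction(2, 3), Fraction(1, 6)),
    "sore":     (Fraction(3, 4), Fraction(1, 3)),
}
observed = {"headache": True, "cough": False, "sore": True}

def joint(flu):
    """Naive-Bayes joint: the prior times one factor per symptom."""
    p = p_flu if flu else 1 - p_flu
    for symptom, present in observed.items():
        p_sym = conditionals[symptom][0 if flu else 1]
        p *= p_sym if present else 1 - p_sym
    return p

posterior = joint(True) / (joint(True) + joint(False))
print(posterior, float(posterior))  # 9/79 ≈ 0.1139
```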
  • Slide 53: Building a Bayes Classifier
    • Priors: P(Flu) = 1/40, P(Not Flu) = 39/40
    • Conditionals: P(Headache|Flu) = 1/2, P(Headache|not Flu) = 7/78; P(Cough|Flu) = 2/3, P(Cough|not Flu) = 1/6; P(Sore|Flu) = 3/4, P(Sore|not Flu) = 1/3
  • Slide 54: The General Case
  • Slides 55-56: Building a naïve Bayesian Classifier
    Assume:
    • The true state has N possible values: 1, 2, 3 … N
    • There are K symptoms, called Symptom1, Symptom2, … SymptomK
    • Symptomi has Mi possible values: 1, 2, … Mi
    The parameters to fill in are one table of priors and one table of conditionals:
    • P(State = z) = ___ for each state z = 1 … N
    • P(Symi = j | State = z) = ___ for each symptom i = 1 … K, each value j = 1 … Mi, and each state z = 1 … N
    Example: P(Anemic | Liver Cancer) = 0.21
  • Slides 57-58: Inference with the general classifier
    P(State = Y | Sym1 = X1 and Sym2 = X2 and … and Symn = Xn)
      = P(Sym1 = X1 and … and Symn = Xn and State = Y) / P(Sym1 = X1 and … and Symn = Xn)
      = P(Sym1 = X1 and … and Symn = Xn and State = Y) / Σ_Z P(Sym1 = X1 and … and Symn = Xn and State = Z)
      = [ P(State = Y) Π_i P(Symi = Xi | State = Y) ] / [ Σ_Z P(State = Z) Π_i P(Symi = Xi | State = Z) ]
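A minimal sketch of that formula in code, assuming the parameters arrive as plain dictionaries (the names `priors`, `conditionals`, and `observed` are mine, not the deck's):

```python
def naive_bayes_posterior(priors, conditionals, observed):
    """Return P(State = Y | observed symptoms) for every state Y.

    priors:       {state: P(State = state)}
    conditionals: {state: {symptom: {value: P(symptom = value | state)}}}
    observed:     {symptom: value}
    """
    # Numerator for each state: P(Y) * prod_i P(Sym_i = X_i | State = Y)
    joint = {}
    for state, prior in priors.items():
        p = prior
        for symptom, value in observed.items():
            p *= conditionals[state][symptom][value]
        joint[state] = p
    # Denominator: the same quantity summed over all states Z
    total = sum(joint.values())
    return {state: p / total for state, p in joint.items()}

# Reusing the flu example: two states, three Boolean symptoms (1 = present).
priors = {"Flu": 1 / 40, "NotFlu": 39 / 40}
conditionals = {
    "Flu":    {"headache": {1: 1/2, 0: 1/2},    "cough": {1: 2/3, 0: 1/3}, "sore": {1: 3/4, 0: 1/4}},
    "NotFlu": {"headache": {1: 7/78, 0: 71/78}, "cough": {1: 1/6, 0: 5/6}, "sore": {1: 1/3, 0: 2/3}},
}
print(naive_bayes_posterior(priors, conditionals, {"headache": 1, "cough": 0, "sore": 1}))
# {'Flu': 0.1139..., 'NotFlu': 0.8860...}
```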
  • Slide 59: Conclusion
    • You will hear lots of "Bayesian" this and "conditional probability" that this week.
    • It's simple: don't let wooly academic types trick you into thinking it is fancy.
    • You should know:
      • What Bayesian Reasoning, Conditional Probabilities, Priors, and Posteriors are.
      • How conditional probabilities are manipulated.
      • Why the Naïve Bayes Assumption is Good.
      • Why the Naïve Bayes Assumption is Evil.
  • Slide 60: Text mining
    • Motivation: an enormous (and growing!) supply of rich data
    • Most of the available text data is unstructured…
    • Some of it is semi-structured:
      • Header entries (title, authors' names, section titles, keyword lists, etc.)
      • Running text bodies (main body, abstract, summary, etc.)
    • Natural Language Processing (NLP)
    • Text Information Retrieval
  • Slide 61: Text processing
    • Natural Language Processing:
      • Automated understanding of text is a very, very, very challenging Artificial Intelligence problem
      • Aims at extracting the semantic content of the processed documents
      • Involves extensive research into semantics, grammar, automated reasoning, …
      • Several factors making it tough for a computer include:
        • Polysemy (the same word having several different meanings)
        • Synonymy (several different ways to describe the same thing)
  • Slide 62: Text processing
    • Text Information Retrieval:
      • Search through collections of documents in order to find objects:
        • relevant to a specific query
        • similar to a specific document
      • For practical reasons, the text documents are parameterized
      • Terminology:
        • Documents (text data units: books, articles, paragraphs, other chunks such as email messages, …)
        • Terms (specific words, word pairs, phrases)
  • Slides 63-64: Text Information Retrieval
    • Typically, the text databases are parametrized with a document-term matrix
    • Each row of the matrix corresponds to one of the documents
    • Each column corresponds to a different term

                                          breath difficulty just neck plain rash short sore ugly
      Shortness of breath                    1        0       0    0     0    0     1    0    0
      Difficulty breathing                   1        1       0    0     0    0     0    0    0
      Rash on neck                           0        0       0    1     0    1     0    0    0
      Sore neck and difficulty breathing     1        1       0    1     0    0     0    1    0
      Just plain ugly                        0        0       1    0     1    0     0    0    1
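A minimal sketch of building that binary document-term matrix. The slide's columns are stemmed ("breathing" → "breath", "shortness" → "short"); here that stemming is hard-coded in a tiny lookup (`STEMS` and `STOPWORDS` are my illustrative names) to keep the example small:

```python
docs = [
    "Shortness of breath",
    "Difficulty breathing",
    "Rash on neck",
    "Sore neck and difficulty breathing",
    "Just plain ugly",
]

STEMS = {"breathing": "breath", "shortness": "short"}
STOPWORDS = {"of", "on", "and"}

def tokenize(doc):
    """Lowercase, split, drop stopwords, apply the toy stemming table."""
    return {STEMS.get(w, w) for w in doc.lower().split() if w not in STOPWORDS}

vocab = sorted(set().union(*(tokenize(d) for d in docs)))
matrix = [[1 if term in tokenize(d) else 0 for term in vocab] for d in docs]

print(vocab)  # ['breath', 'difficulty', 'just', 'neck', 'plain', 'rash', 'short', 'sore', 'ugly']
for doc, row in zip(docs, matrix):
    print(row, doc)
```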
  • Slide 65: Parametrization for Text Information Retrieval
    • Depending on the particular method of parametrization, the matrix entries may be:
      • binary (telling whether a term Tj is present in the document Di or not)
      • counts (frequencies) (total number of repetitions of a term Tj in Di)
      • weighted frequencies (see the slide following the next)
  • Slides 66-67: Typical applications of Text IR
    • Document indexing and classification (e.g. library systems)
    • Search engines (e.g. the Web)
    • Extraction of information from textual sources (e.g. profiling of personal records, consumer complaint processing)
  • Slides 68-71: Building a naïve Bayesian Classifier (the blank prior/conditional parameter table from slides 55-56, repeated, with the same assumptions: N states, K symptoms, Symptomi with Mi values)
  • Slides 72-75: Building a naïve Bayesian Classifier for chief complaints
    • The true state is the prodrome; each term is a Boolean symptom:
      • P(Prodm=GI) = ___, P(Prodm=respir) = ___, …, P(Prodm=const) = ___
      • P(angry | Prodm=GI) = ___, P(angry | Prodm=respir) = ___, …, P(angry | Prodm=const) = ___
      • P(~angry | Prodm=GI) = ___, P(~angry | Prodm=respir) = ___, …, P(~angry | Prodm=const) = ___
      • P(blood | Prodm=GI) = ___, …; P(~blood | Prodm=GI) = ___, …
      • ⋮
      • P(vomit | Prodm=GI) = ___, …; P(~vomit | Prodm=GI) = ___, …
    • Example: Prob(Chief Complaint contains "Blood" | Prodrome = Respiratory) = 0.003
  • Slides 76-80: Learning a Bayesian Classifier
    1. Before deployment of the classifier, get labeled training data: an expert labels each document in the document-term matrix.

                                            EXPERT SAYS  breath difficulty just neck plain rash short sore ugly
      Shortness of breath                   Resp            1        0       0    0     0    0     1    0    0
      Difficulty breathing                  Resp            1        1       0    0     0    0     0    0    0
      Rash on neck                          Rash            0        0       0    1     0    1     0    0    0
      Sore neck and difficulty breathing    Resp            1        1       0    1     0    0     0    1    0
      Just plain ugly                       Other           0        0       1    0     1    0     0    0    1

    2. Learn the parameters (conditionals, and priors) by counting:
       P(breath = 1 | prodrome = Resp) = (num "resp" training records containing "breath") / (num "resp" training records)
       P(prodrome = Resp) = (num "resp" training records) / (total num training records)
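Step 2 as a code sketch: maximum-likelihood estimates by counting, on the toy labeled data from the slides (`tokenize` is the same helper as in the document-term sketch above; the other names are illustrative):

```python
from collections import Counter

STEMS = {"breathing": "breath", "shortness": "short"}
STOPWORDS = {"of", "on", "and"}

def tokenize(doc):
    return {STEMS.get(w, w) for w in doc.lower().split() if w not in STOPWORDS}

labeled = [  # (chief complaint, expert's label)
    ("Shortness of breath", "Resp"),
    ("Difficulty breathing", "Resp"),
    ("Rash on neck", "Rash"),
    ("Sore neck and difficulty breathing", "Resp"),
    ("Just plain ugly", "Other"),
]

label_counts = Counter(label for _, label in labeled)
# Priors: num "resp" training records / total num training records
priors = {lab: n / len(labeled) for lab, n in label_counts.items()}

def p_term_given_label(term, label):
    """num <label> records containing <term> / num <label> records."""
    hits = sum(1 for doc, lab in labeled
               if lab == label and term in tokenize(doc))
    return hits / label_counts[label]

print(priors["Resp"])                        # 0.6
print(p_term_given_label("breath", "Resp"))  # 1.0
```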
  • Slides 81-84: Learning a Bayesian Classifier (continued)
    3. During deployment, apply the classifier. New Chief Complaint: "Just sore breath"
       P(prodrome = GI | breath = 1, difficulty = 0, just = 1, …)
         = [ P(breath = 1 | Prodm = GI) × P(difficulty = 0 | Prodm = GI) × … × P(State = GI) ]
           / Σ_Z [ P(breath = 1 | Prodm = Z) × P(difficulty = 0 | Prodm = Z) × … × P(State = Z) ]
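Step 3 as a sketch, continuing from the learning snippet above (it reuses `labeled`, `label_counts`, `priors`, and `tokenize`). One addition the deck does not mention: Laplace smoothing, because on this tiny dataset raw counts give zero probabilities for unseen term/label pairs, which would zero out every product:

```python
def p_smoothed(term, label, alpha=1):
    """(count + alpha) / (label total + 2*alpha): smoothed Boolean estimate."""
    hits = sum(1 for doc, lab in labeled
               if lab == label and term in tokenize(doc))
    return (hits + alpha) / (label_counts[label] + 2 * alpha)

vocab = sorted({t for doc, _ in labeled for t in tokenize(doc)})
new_complaint = tokenize("Just sore breath")

def joint(label):
    """P(label) times one factor per vocabulary term, present or absent."""
    p = priors[label]
    for term in vocab:
        p_t = p_smoothed(term, label)
        p *= p_t if term in new_complaint else 1 - p_t
    return p

scores = {lab: joint(lab) for lab in priors}
total = sum(scores.values())
print({lab: round(s / total, 3) for lab, s in scores.items()})
# "Resp" dominates: it is the only label whose records contain "breath" or "sore"
```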
  • Slide 85: CoCo performance (AUC scores)
    • Botulism 0.78
    • Rash 0.91
    • Neurological 0.92
    • Hemorrhagic 0.93
    • Constitutional 0.93
    • Gastrointestinal 0.95
    • Other 0.96
    • Respiratory 0.96
  • Slide 86: Conclusion
    • Automated text extraction is increasingly important
    • There is a very wide world of text extraction outside biosurveillance
    • The field has changed very fast, even in the past three years
    • Warning: although Bayes classifiers are the simplest to implement, logistic regression or other discriminative methods often learn more accurately. Consider using off-the-shelf methods, such as William Cohen's successful "MinorThird" open-source libraries: http://minorthird.sourceforge.net/
    • Real systems (including CoCo) have many ingenious special-case improvements
  • Slide 87: Discussion
    1. What new data sources should we apply algorithms to? (e.g. self-reporting?)
    2. What are related surveillance problems to which these kinds of algorithms can be applied?
    3. Where are the gaps in the current algorithms world?
    4. Are there other spatial problems out there?
    5. Could new or pre-existing algorithms help in the period after an outbreak is detected?
    6. Other comments about favorite tools of the trade.