Alberta Ingenuity & CMASTE


Lesson 2: Prostate Cancer Toxicity (Teachers’ Resource)

Purpose: This lesson is based on research by Nasimeh Asgarian at the Alberta Ingenuity
Centre for Machine Learning (AICML) at the University of Alberta. This research comes
under the banner of bioinformatics, which is the application of computing science
techniques to solve problems in biological and medical science.
Nasimeh has been given access to medical data from 80 patients at the Cross Cancer
Institute. It is known historically that approximately 1/3 of patients who have had prostate
cancer treatment will exhibit toxicity, resulting in bleeding. Nasimeh is using Machine
Learning techniques to try to help doctors improve predictability of this bleeding. Each
patient has 50 000 different genetic variant dimensions called SNPs. (This stands for
Single Nucleotide Polymorphisms and is pronounced “snips”.) This data is gathered from
physicians at the Cross Cancer Institute. The SNP dimensions are not numerical, but are
represented by the heterocyclic bases of human DNA, namely A, C, T, G for adenine,
cytosine, thymine, and guanine. This is a huge data set that must be analyzed. For the
sake of reducing that volume, only the 51 most important SNP dimensions are actually
used. Here is a reduced example of this for 3 patients:

 Patient       SNP1         SNP2          SNP3         .....        SNP51        Bleed
    1           C            C             T                         G             +
    2           C            T             G                         A             -
    3           A            G             C                         T             -

The machine learning techniques used to analyze the data include Linear Separators,
Decision Trees, Support Vector Machines, and Naïve Bayes Tests. This last
technique is a conditional probability concept linked to the IB diploma program
mathematics curriculum. Naïve Bayes tests are based on Bayes Law, but with the strong
assumption that the dimensions are independent of one another, which makes for a naïve
(oversimplified) design. Despite this, the technique has been very successful on many
complex problems.
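
As a sketch of what the independence assumption means, a Naïve Bayes test scores each bleeding outcome by multiplying one conditional probability per SNP dimension. The symbols s_1, ..., s_51 for a patient's 51 SNP values are notation introduced here for illustration, and P(X / Y) denotes the probability of X given Y:

```latex
P(\text{bleed} / s_1, \dots, s_{51}) \;\propto\; P(\text{bleed}) \prod_{i=1}^{51} P(s_i / \text{bleed})
```

The outcome (+ or -) with the larger score is the one predicted. The product form is only valid if the SNP dimensions are treated as independent given the outcome, which is the "naïve" part.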

Problem: To improve physicians’ success in predicting toxicity in patients after prostate
cancer treatment by finding the SNPs that are most strongly linked to this toxicity.

Hypothesis: Without any analysis, we can predict that about 33% of prostate cancer patients
will exhibit toxicity (bleeding) after treatment. Using Naïve Bayes Tests on a select
group of the medical dimensions (SNPs) for each patient, we can improve the success
rate in prediction of toxicity.

Prediction: We can significantly improve predictability of toxicity using Machine
Learning techniques.




lesson-2-probability-of-prostate-cancer-toxicity-bayesian3875.doc              Centre
for Machine Learning                       1/5


Design: Students in Pure Math 30 work with permutations and probability, and students
in the IB diploma program learn Bayes Law for conditional probability as:

$$P(A / B) = \frac{P(A \cap B)}{P(B)}$$

where A and B are two distinct events.

Procedure:

     1) If each patient has 51 SNP dimensions, each of which is represented by any one
        of the bases A, C, T, or G, then how many different arrangements of these
        dimensions are possible for any patient?

           Answer: $4^{51} \approx 5.07 \times 10^{30}$ arrangements
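
Students with access to a computer could confirm this count directly. The following is a minimal sketch in Python; the variable names are illustrative only:

```python
# Each of the 51 SNP dimensions can independently take one of 4 bases,
# so the number of arrangements is 4 multiplied by itself 51 times.
BASES = ("A", "C", "T", "G")
NUM_SNPS = 51

arrangements = len(BASES) ** NUM_SNPS  # Python integers are exact, no overflow

print(arrangements)           # the full 31-digit integer
print(f"{arrangements:.2e}")  # 5.07e+30
```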

Next, an example of how Bayes Law works would probably be helpful here:

It is estimated that about 2% of the world population has diabetes (event A). The test for
diabetes indicates whether a patient has a blood glucose level above normal (a positive
test is event B), and this test is 85% accurate. Based on this, determine the probability that:

     a) a person will test positive for diabetes given that they actually have diabetes
     b) a person is actually diabetic given that they have tested positive for diabetes

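     a) $$P(B / A) = \frac{P(B \cap A)}{P(A)} = \frac{0.85 \times 0.02}{0.85 \times 0.02 + 0.15 \times 0.02} = \frac{0.85 \times 0.02}{0.02} = 0.85$$

        The numerator is the probability of having the disease and testing +. The denominator
        adds the probability of having the disease and testing + to the probability of having
        the disease but testing -.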


This result is intuitive and seems trivial in the sense that 0.85 is simply the accuracy of
the diabetes test.

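     b) $$P(A / B) = \frac{P(A \cap B)}{P(B)} = \frac{0.02 \times 0.85}{0.02 \times 0.85 + 0.98 \times 0.15} \approx 0.104$$

        The numerator is the probability of testing + and having the disease. The denominator
        adds the probability of testing + and having the disease to the probability of testing +
        but not having the disease.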


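This result intuitively seems low, but it means that only about 10% of people who test
positive for high blood glucose are actually diabetic. Because 98% of people do not have
diabetes, the false positives among them far outnumber the true positives. It doesn’t
necessarily indicate that the diabetes test is not effective.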
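
The calculation in part b) can be sketched as a small Python function. This is an illustration only: the function and parameter names are not from the lesson, and it assumes, as the arithmetic above does, that "85% accurate" applies both to people with diabetes (sensitivity) and to people without it (specificity).

```python
def posterior(prior, sensitivity, specificity):
    """Probability of having the disease given a positive test."""
    true_positive = prior * sensitivity               # have disease and test +
    false_positive = (1 - prior) * (1 - specificity)  # no disease but test +
    return true_positive / (true_positive + false_positive)


# Assumption: 85% accuracy is used for both sensitivity and specificity.
p = posterior(prior=0.02, sensitivity=0.85, specificity=0.85)
print(round(p, 3))  # 0.104
```

Changing the prior shows why the result is low: a rarer condition gives a smaller posterior for the same test accuracy.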






    For more practice/understanding I suggest the interactive Java applet demonstrating
    Bayes Law in a medical testing application, found at:
    www.gametheory.net/Mike/applets/Bayes/Bayes.html




Evidence: In our medical situation, one event (A) will be the DNA base (A, C, T,
or G) for that SNP dimension and the second event (B) will be the chance of bleeding
(toxicity).
∴ For the DNA bases, P(C) = P(T) = P(G) = P(A) = 0.25 if each base is equally likely to
occur.
Then, for the complements, $P(\bar{C}) = P(\bar{T}) = P(\bar{G}) = P(\bar{A}) = 0.75$.
The probability of each of the combined events A ∩ B is experimental and comes from the
actual patient data. Let’s use a sample of 20 patients and only 1 dimension, SNP1.

SNP1    A  C  C  G  T  T  A  G  C  T  A  C  G  G  T  T  A  A  C  C
bleed   -  -  -  +  -  +  +  -  +  -  -  -  -  -  +  +  +  -  +  +

Analysis: From this table we can find the following probabilities simply by counting:

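$$P(A \cap +\text{bleed}) = \frac{2}{20}, \quad P(A \cap -\text{bleed}) = \frac{3}{20}, \quad P(C \cap +\text{bleed}) = \frac{3}{20}, \quad P(C \cap -\text{bleed}) = \frac{3}{20}$$

$$P(G \cap +\text{bleed}) = \frac{1}{20}, \quad P(G \cap -\text{bleed}) = \frac{3}{20}, \quad P(T \cap +\text{bleed}) = \frac{3}{20}, \quad P(T \cap -\text{bleed}) = \frac{2}{20}$$

Also, from the initial information we know that $P(+\text{bleed}) = \frac{1}{3}$ and $P(-\text{bleed}) = \frac{2}{3}$.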
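
The counting step can be checked with a short script. This is a minimal sketch in Python; the variable names are illustrative, and the two strings are copied from the SNP1 and bleed rows of the 20-patient sample.

```python
from collections import Counter
from fractions import Fraction

SNP1 = "A C C G T T A G C T A C G G T T A A C C".split()
BLEED = "- - - + - + + - + - - - - - + + + - + +".split()

# Count how many patients have each (base, bleed outcome) combination.
pair_counts = Counter(zip(SNP1, BLEED))
n = len(SNP1)

# Experimental joint probabilities, e.g. joint[("A", "+")] is P(A ∩ +bleed).
# Fraction reduces automatically, so 2/20 is stored as 1/10.
joint = {pair: Fraction(count, n) for pair, count in pair_counts.items()}

for base in "ACGT":
    print(base, pair_counts[(base, "+")], pair_counts[(base, "-")])
```

Each printed row gives the number of + and - patients out of 20 for that base.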

Eg) Find the probability that a patient has base G for SNP1 given that they were not toxic
$$P(G / -\text{bleed}) = \frac{P(G \cap -\text{bleed})}{P(-\text{bleed})} = \frac{3/20}{2/3} = \frac{3}{20} \times \frac{3}{2} = \frac{9}{40}$$

Eg) Find the probability that a patient was toxic, given that their SNP1 dimension was A.






$$P(+\text{bleed} / A) = \frac{P(+\text{bleed} \cap A)}{P(A)} = \frac{2/20}{0.25} = \frac{2}{20} \times \frac{4}{1} = \frac{8}{20} = \frac{2}{5}$$


Use these examples to do the 8 questions that follow.

Evaluation:
   2. Find each of the following conditional probabilities:

   a) the probability of having base C for SNP1, given that the patient had toxicity

Answer: $P(C / +\text{bleed}) = \frac{P(C \cap +\text{bleed})}{P(+\text{bleed})} = \frac{3}{20} \div \frac{1}{3} = \frac{3}{20} \times \frac{3}{1} = \frac{9}{20}$

   b) the probability of having base A for SNP1, given that the patient had toxicity

Answer: $P(A / +\text{bleed}) = \frac{P(A \cap +\text{bleed})}{P(+\text{bleed})} = \frac{2}{20} \div \frac{1}{3} = \frac{2}{20} \times \frac{3}{1} = \frac{6}{20}$

   c) the probability of having base G for SNP1, given that the patient had toxicity

Answer: $P(G / +\text{bleed}) = \frac{P(G \cap +\text{bleed})}{P(+\text{bleed})} = \frac{1}{20} \div \frac{1}{3} = \frac{1}{20} \times \frac{3}{1} = \frac{3}{20}$

   d) the probability of having base T for SNP1, given that the patient had toxicity

Answer: $P(T / +\text{bleed}) = \frac{P(T \cap +\text{bleed})}{P(+\text{bleed})} = \frac{3}{20} \div \frac{1}{3} = \frac{3}{20} \times \frac{3}{1} = \frac{9}{20}$

   e) the probability that the patient was toxic, given that they had base C for SNP1

Answer: $P(+\text{bleed} / C) = \frac{P(+\text{bleed} \cap C)}{P(C)} = \frac{3}{20} \div \frac{1}{4} = \frac{3}{20} \times \frac{4}{1} = \frac{12}{20}$

   f) the probability that the patient was toxic, given that they had base A for SNP1

Answer: $P(+\text{bleed} / A) = \frac{P(+\text{bleed} \cap A)}{P(A)} = \frac{2}{20} \div \frac{1}{4} = \frac{2}{20} \times \frac{4}{1} = \frac{8}{20}$




   g) the probability that the patient was toxic, given that they had base G for SNP1

Answer: $P(+\text{bleed} / G) = \frac{P(+\text{bleed} \cap G)}{P(G)} = \frac{1}{20} \div \frac{1}{4} = \frac{1}{20} \times \frac{4}{1} = \frac{4}{20}$



   h) the probability that the patient was toxic, given that they had base T for SNP1

Answer: $P(+\text{bleed} / T) = \frac{P(+\text{bleed} \cap T)}{P(T)} = \frac{3}{20} \div \frac{1}{4} = \frac{3}{20} \times \frac{4}{1} = \frac{12}{20}$


   3. Even on this small scale, we hope to find some relationship between the two
      events. From the examples and your results in 2), can you find any connection
      between the SNP dimension and toxicity for patients?

Answer: The strongest link to toxicity appears to be with bases C and T, which give the
highest values of P(+bleed / base), at 12/20 each.
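
The comparison behind this answer can be sketched in Python. This is an illustration only: the names are not from the lesson, the joint probabilities are the ones counted from the 20-patient sample, and P(base) = 1/4 is the lesson's assumption that each base is equally likely.

```python
from fractions import Fraction

# P(base ∩ +bleed), counted from the 20-patient SNP1 sample.
joint_plus = {
    "A": Fraction(2, 20),
    "C": Fraction(3, 20),
    "G": Fraction(1, 20),
    "T": Fraction(3, 20),
}
P_BASE = Fraction(1, 4)  # lesson's assumption: each base equally likely

# P(+bleed / base) = P(+bleed ∩ base) / P(base)
toxic_given_base = {base: p / P_BASE for base, p in joint_plus.items()}

# Collect every base that ties for the highest conditional probability.
best = max(toxic_given_base.values())
strongest = sorted(b for b, p in toxic_given_base.items() if p == best)

print(strongest)  # ['C', 'T']
```

With only 20 patients, a difference of one or two counts changes the ranking, so this should be read as a small-scale demonstration and not as a medical finding.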


Synthesis: Doing this type of calculation by hand for only 1 SNP dimension out of 51
and for only 20 patients out of 80 is not difficult or very time-consuming. But the entire
data set would be enormous, and this is where Machine Learning techniques are used to
process the data quickly. A Bayesian Network of probability algorithms running on a
computer would process this large data set.
When this is completed the researcher and the physician can easily observe the
probability values and infer which bases in each SNP dimension are most strongly linked
with toxicity in patients. This process can be used for many other medical situations as
well.
Even though students are only doing basic permutation and probability calculations, they
can appreciate the volume of calculations needed to make informed conclusions and thus,
appropriate medical decisions for researchers as well as physicians and their patients.
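
To give a feel for how the same counting extends across many SNP dimensions, here is a minimal sketch in Python. It is not the researcher's software: the function name count_by_snp is hypothetical, and the data are just the three example patients from the Purpose section, restricted to SNP1 to SNP3.

```python
from collections import Counter

# Three example patients: (bases for SNP1..SNP3, bleed outcome).
patients = [
    (("C", "C", "T"), "+"),
    (("C", "T", "G"), "-"),
    (("A", "G", "C"), "-"),
]


def count_by_snp(records):
    """Return one Counter per SNP column, keyed by (base, bleed outcome)."""
    num_snps = len(records[0][0])
    tables = [Counter() for _ in range(num_snps)]
    for snps, bleed in records:
        # Update every SNP column's table with this patient's base and outcome.
        for table, base in zip(tables, snps):
            table[(base, bleed)] += 1
    return tables


tables = count_by_snp(patients)
for i, table in enumerate(tables, start=1):
    print(f"SNP{i}", dict(table))
```

The same loop would run unchanged on 51 columns and 80 patients; only the size of the input grows, which is the point of handing the work to a computer.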


Sources:
           1) Nasimeh Asgarian, AICML, University of Alberta Computer Sciences
           2) Mathpower 12, Knill, George et al, McGraw-Hill Ryerson Publishing,
              Toronto, 2000
           3) www.biochem.northwestern.edu/holmgren/Glossary/Definitions/Def-S/SNP.html
           4) http://b-course.cs.helsinki.fi/obc/bayesnetprediction.html
           5) www.cs.ualberta.ca/~greiner/Presentations.html/#IntroBN
           6) www.gametheory.net/Mike/applets/Bayes/Bayes.html
           7) www.cs.ualberta/research/areas/bioinformatics/profiles/index.php

