
# Chapter 2 Probability And Distribution


Published in: Health & Medicine, Technology

### Chapter 2 Probability And Distribution

1. 1. <ul><li>Chapter-2 </li></ul>
2. 2. <ul><li>Chapter 2 </li></ul><ul><li>Probability and Distribution </li></ul>
3. 3. A typical statement in statistics <ul><li>Two parts: </li></ul><ul><li>A conclusion </li></ul><ul><li>The probability that the conclusion is true </li></ul>
4. 4. 2.1 Explanation of Probability and Related Concepts <ul><li>2.1.1 Probability </li></ul><ul><li>Rolling a die </li></ul><ul><li>Possible outcomes: 1, 2, …, 6 </li></ul><ul><li>Probability of “1” = 1/6 </li></ul><ul><li>Color blindness test </li></ul><ul><li>Possible outcomes: normal, abnormal </li></ul><ul><li>Probability of “abnormal” = ? Unknown! </li></ul><ul><li>Survey: randomly select n students; if m of them are color-blind, then the probability of “abnormal” is approximately $m/n$ </li></ul>
5. 5. <ul><li>In general, </li></ul><ul><li>Events: the possible outcomes $E_1, E_2, \ldots$ </li></ul><ul><li>Probability of the event E: P(E), a number between 0 and 1 </li></ul><ul><li>Conditional probability: under the condition that an event A has occurred, the probability of an event B, written $P(B \mid A)$ </li></ul><ul><li>For example, </li></ul><ul><li>P(nasopharyngeal carcinoma | EB virus +) </li></ul>
6. 6. <ul><li>2.1.2 Odds </li></ul><ul><li>Complementary event: if there are only two possible events and they are mutually exclusive, denoted $E$ and $\bar{E}$, then $P(\bar{E}) = 1 - P(E)$ </li></ul><ul><li>Odds of event E: $\text{Odds}(E) = \dfrac{P(E)}{1 - P(E)}$ </li></ul><ul><li>Question: In a football game between teams A and B, if P(A wins) = 0.8, then P(B wins) = ? </li></ul><ul><li>Odds(A wins) = ? Odds(B wins) = ? </li></ul>
7. 7. E.g., if the incidence rates of influenza in classes A, B and C are 60%, 50% and 40%, then the odds are $0.6/0.4 = 1.5$, $0.5/0.5 = 1$ and $0.4/0.6 \approx 0.67$ respectively. Odds: to measure risk. Odds ratio: to compare risks.
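
A small Python sketch of the odds and odds-ratio arithmetic for the football question and the influenza example. The function names are illustrative, the football case assumes a draw is impossible, and the choice to compare class A against class C is only an example of how an odds ratio is formed.

```python
def odds(p: float) -> float:
    """Odds of an event with probability p, defined as p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def odds_ratio(p1: float, p2: float) -> float:
    """Ratio of the odds of two events, used to compare two risks."""
    return odds(p1) / odds(p2)


# Football question: assumes only two outcomes (no draw).
p_a_win = 0.8
p_b_win = 1.0 - p_a_win
odds_a = odds(p_a_win)   # 0.8 / 0.2
odds_b = odds(p_b_win)   # 0.2 / 0.8

# Influenza incidence rates in classes A, B and C.
incidence = {"A": 0.60, "B": 0.50, "C": 0.40}
class_odds = {name: odds(p) for name, p in incidence.items()}

# Illustrative comparison: odds of class A relative to class C.
or_a_vs_c = odds_ratio(incidence["A"], incidence["C"])

print(odds_a, odds_b, class_odds, or_a_vs_c)
```

Note that the odds for teams A and B are reciprocals of each other, which always holds for complementary events.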
8. 8. 2.1.3 Bayes’ formula <ul><li>Does smoking (A) lead to lung cancer (B)? </li></ul><ul><li>Randomly divide the subjects into two groups; </li></ul><ul><li>Ask one group to smoke and forbid the other from smoking; </li></ul><ul><li>Follow up year by year to obtain the number of subjects “with lung cancer” … </li></ul><ul><li>Unfortunately, this is ethically infeasible. Then how? </li></ul>
9. 9. <ul><li>To find $P(B \mid A)$ by Bayes’ formula: $P(B \mid A) = \dfrac{P(A \mid B)\,P(B)}{P(A)}$ </li></ul><ul><li>Example 2.1 </li></ul>
10. 10. Conclusion: The risk of lung cancer for smokers is 5 times that for the general population.
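
A minimal Python sketch of how Bayes' formula converts quantities that can be observed without an experiment (how common smoking is among lung cancer patients and in the population, and how common lung cancer is) into the probability of lung cancer given smoking. The three input probabilities are hypothetical placeholders, picked only so that the ratio comes out at 5 like the conclusion above; they are not the data of Example 2.1.

```python
def bayes(p_a_given_b: float, p_b: float, p_a: float) -> float:
    """Bayes' formula: P(B | A) = P(A | B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a


# Hypothetical inputs (NOT the figures from Example 2.1):
p_smoker_given_cancer = 0.80   # P(A | B): share of smokers among patients
p_cancer = 0.001               # P(B): lung cancer in the whole population
p_smoker = 0.16                # P(A): share of smokers in the population

p_cancer_given_smoker = bayes(p_smoker_given_cancer, p_cancer, p_smoker)

# Risk for smokers relative to the whole population.
relative_risk = p_cancer_given_smoker / p_cancer

print(p_cancer_given_smoker, relative_risk)
```

The relative risk depends only on the ratio P(A | B) / P(A), so it can be estimated even when P(B) itself is known only roughly.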
11. 11. 2.3 Binomial Distribution <ul><li>P(white ball) = 0.8, P(yellow ball) = 0.2 </li></ul>
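12. 12. In general, if $\pi$ is the probability of an event appearing in a trial, $n$ is the number of independently repeated trials, and $X$ is the random variable counting the total number of times the event appears, then the probability of $X = x$ is $P(X = x) = \binom{n}{x}\pi^{x}(1-\pi)^{n-x}$, $x = 0, 1, \ldots, n$. This variable $X$ is called a binomial variable, or $X$ is said to follow a binomial distribution, denoted $X \sim B(n, \pi)$. Why is it called binomial? See the following expansion: $[\pi + (1-\pi)]^{n} = \sum_{x=0}^{n} \binom{n}{x}\pi^{x}(1-\pi)^{n-x}$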
13. 13. 2.3.2 Plot of Binomial Distribution
14. 14. 2.3.3 Population mean and population variance: for $X \sim B(n, \pi)$, $\mu = n\pi$ and $\sigma^{2} = n\pi(1-\pi)$
15. 15. <ul><li>Example: Five “exactly identical” animals were injected with a poison at a dose of LD50 (under such a dose, P(death) = 50%). </li></ul><ul><li>Since </li></ul><ul><li>the possible number of deaths is $0, 1, \ldots, 5$; </li></ul><ul><li>the probability that each animal dies from the injected poison is $\pi = 0.5$; </li></ul><ul><li>the trial is independently repeated $n = 5$ times; </li></ul><ul><li>$X$ follows $B(5, 0.5)$ </li></ul>
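
The LD50 example can be tabulated directly. This is a short Python sketch, assuming the number of deaths among the five animals follows a binomial distribution with n = 5 and death probability 0.5; the helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """P(X = x) for a binomial variable: C(n, x) * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi**x * (1.0 - pi) ** (n - x)


n, pi = 5, 0.5  # five animals, P(death) = 50% at the LD50 dose

# Probability of 0, 1, ..., 5 deaths.
pmf = [binomial_pmf(x, n, pi) for x in range(n + 1)]

# Population mean and variance computed from the probability function.
mean = sum(x * p for x, p in enumerate(pmf))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))

for x, p in enumerate(pmf):
    print(f"P(X = {x}) = {p:.5f}")
print("mean:", mean, "variance:", variance)
```

The mean and variance obtained this way agree with the closed forms n·π and n·π·(1 − π).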
16. 16. 2.4 Poisson Distribution <ul><li>Distribution of rare “articles” </li></ul><ul><li>Special case of the binomial distribution: </li></ul><ul><li>Big $n$, small $\pi$ </li></ul><ul><li>Example: Pulse count of a radioactive isotope. </li></ul><ul><li>Large n and 0-1: divide the period into n sub-intervals; the possible number of pulses in a sub-interval is 0 or 1 </li></ul><ul><li>Rare event: the probability $\pi$ of a pulse in a sub-interval is small </li></ul><ul><li>Independent: the sub-intervals are independent of each other </li></ul>
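17. 17. It can be proved that when $n \to \infty$ with $n\pi = \lambda$ held fixed, the binomial probability $\binom{n}{x}\pi^{x}(1-\pi)^{n-x}$ tends to $P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$, $x = 0, 1, 2, \ldots$ In general, if the probability function of a random variable $X$ has the above form, then we say that this variable follows a Poisson distribution with parameter $\lambda$, denoted by $X \sim \Pi(\lambda)$.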
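
The limit can be checked numerically. The Python sketch below keeps the expected count fixed (λ = n·π) while n grows and measures how far the binomial probabilities are from the Poisson ones. The value λ = 2 and the range of x are arbitrary choices for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """Binomial probability C(n, x) * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi**x * (1.0 - pi) ** (n - x)


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability exp(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)


def max_abs_gap(n: int, lam: float, x_max: int = 10) -> float:
    """Largest |binomial - Poisson| over x = 0..x_max, with pi = lam / n."""
    pi = lam / n
    return max(
        abs(binomial_pmf(x, n, pi) - poisson_pmf(x, lam))
        for x in range(x_max + 1)
    )


lam = 2.0  # arbitrary illustrative parameter
gaps = {n: max_abs_gap(n, lam) for n in (10, 100, 1000, 10000)}

for n, gap in gaps.items():
    print(f"n = {n:>5}: max gap = {gap:.6f}")
```

The gap shrinks roughly in proportion to 1/n, which is the sense in which the Poisson distribution is a limiting case of the binomial.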
18. 18. Example: Red cell count on a glass slide. Since <ul><li>the glass slide is divided into n small grids: big n, each grid holds 0 or 1 cell; </li></ul><ul><li>P(a red cell in a grid) = $\pi$: small probability; </li></ul><ul><li>with or without a cell: independent between grids; </li></ul>therefore, the number of cells ~ Poisson distribution
19. 19. <ul><li>Note: “independent” and “repeated” are both essential; without these two conditions, the distribution will not be a Poisson distribution. </li></ul><ul><li>Example: </li></ul><ul><li>For a rare infectious disease, the number of patients does not follow a Poisson distribution, because cases are not independent of each other. </li></ul><ul><li>When the bacteria are clustered in milk, the total number of bacteria does not follow a Poisson distribution either. </li></ul>
20. 20. 2.4.2 Plot of probability function <ul><li>When $\lambda$ is small: positive skew; </li></ul><ul><li>when $\lambda$ is large: approximately symmetric </li></ul>
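21. 21. Property of Poisson Distribution <ul><li>population mean = population variance = λ </li></ul><ul><li>Additive property </li></ul><ul><li>If $X_1 \sim \Pi(\lambda_1)$ and $X_2 \sim \Pi(\lambda_2)$ are independent of each other, then $X_1 + X_2 \sim \Pi(\lambda_1 + \lambda_2)$ </li></ul><ul><li>More generally, if $X_i \sim \Pi(\lambda_i)$, $i = 1, \ldots, k$, are mutually independent, then $X_1 + \cdots + X_k \sim \Pi(\lambda_1 + \cdots + \lambda_k)$ </li></ul>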
22. 22. <ul><li>However, if $X \sim \Pi(\lambda)$, then $2X$ does not follow $\Pi(2\lambda)$; </li></ul><ul><li>likewise, $X_1 - X_2$ does not follow $\Pi(\lambda_1 - \lambda_2)$ </li></ul>
23. 23. Example: Five samples were taken from a river and the number of colibacillus (E. coli) in each was counted <ul><li>1st sample: $X_1 \sim \Pi(\lambda_1)$ </li></ul><ul><li>2nd sample: $X_2 \sim \Pi(\lambda_2)$ </li></ul><ul><li>… </li></ul><ul><li>5th sample: $X_5 \sim \Pi(\lambda_5)$ </li></ul><ul><li>If these 5 samples are mixed, the total number of colibacillus also follows a Poisson distribution: </li></ul><ul><li>$X_1 + X_2 + \cdots + X_5 \sim \Pi(\lambda_1 + \lambda_2 + \cdots + \lambda_5)$ </li></ul><ul><li>Application of the additive property: </li></ul><ul><li>To enlarge the parameter, and thereby make the distribution more symmetric, we may pool small units into a larger observed unit. </li></ul>
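
The additive property can be verified numerically for two independent Poisson counts: the distribution of their sum, obtained by summing over all ways to split the total, should match a Poisson distribution whose parameter is the sum of the two parameters. This Python sketch uses hypothetical parameters 1.5 and 2.5, which are not taken from the river example.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability exp(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)


def pmf_of_sum(lam1: float, lam2: float, total: int) -> float:
    """P(X1 + X2 = total) for independent Poisson X1, X2 (convolution)."""
    return sum(
        poisson_pmf(k, lam1) * poisson_pmf(total - k, lam2)
        for k in range(total + 1)
    )


lam1, lam2 = 1.5, 2.5  # hypothetical parameters for two pooled samples

# Compare the convolution with a single Poisson(lam1 + lam2).
max_gap = max(
    abs(pmf_of_sum(lam1, lam2, t) - poisson_pmf(t, lam1 + lam2))
    for t in range(21)
)

print("largest difference over totals 0..20:", max_gap)
```

The difference is at the level of floating-point rounding, consistent with the sum being exactly Poisson.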
24. 24. 2.5 Normal Distribution <ul><li>In practice, the frequency histograms of many continuous random variables have a shape like this: taller around the center, shorter on the two sides, and symmetric. </li></ul>
25. 25. Two parameters: population mean $\mu$ and population variance $\sigma^{2}$. Normal distribution denoted by $N(\mu, \sigma^{2})$. (Figure: normal curves centered at $\mu_1$, $\mu_2$, $\mu_3$.)
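26. 26. Standard normal distribution: $\mu = 0$, $\sigma = 1$, denoted $N(0, 1)$. For any normal variable $X \sim N(\mu, \sigma^{2})$, after the standardizing transformation $Z = \dfrac{X - \mu}{\sigma}$ we have $Z \sim N(0, 1)$. $Z$ is called the standardized normal deviate, Z-value, or Z-score.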
27. 27. 2.5.2 Area under the normal probability density curve <ul><li>A table for the standard normal distribution is attached in most textbooks of statistics. (P. 479) </li></ul><ul><li>Given $z$, find the area $\Phi(z)$ to the left of $z$ </li></ul>
28. 28. <ul><li>The area within $(-1.96, 1.96)$ under the standard normal curve is 0.95 </li></ul>Corresponding to 1.96, the area of one tail is 0.025, and the area of two tails is 0.025 × 2 = 0.05
29. 29. <ul><li>The area within $(-2.58, 2.58)$ under the standard normal curve is 0.99 </li></ul>Corresponding to 2.58, the area of one tail is 0.005, and the area of two tails is 0.010: $\Phi(-2.58) = 0.005$
30. 31. Critical value <ul><li>$z_{\alpha/2}$: two-sided critical value </li></ul><ul><li>$z_{\alpha}$: one-sided critical value </li></ul>
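
Instead of a printed table, the tail areas for 1.96 and 2.58 and the corresponding critical values can be computed with Python's standard library. This is a sketch using `statistics.NormalDist`; the significance level 0.05 is chosen here only as the usual example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# One-tail areas to the left of -1.96 and -2.58.
phi_minus_196 = std_normal.cdf(-1.96)
phi_minus_258 = std_normal.cdf(-2.58)

# Two-tail areas and the central areas between -z and +z.
two_tail_196 = 2 * phi_minus_196
two_tail_258 = 2 * phi_minus_258
central_196 = 1 - two_tail_196
central_258 = 1 - two_tail_258

# Critical values for alpha = 0.05 (illustrative level).
alpha = 0.05
two_sided_critical = std_normal.inv_cdf(1 - alpha / 2)
one_sided_critical = std_normal.inv_cdf(1 - alpha)

print(round(phi_minus_196, 4), round(phi_minus_258, 4))
print(round(central_196, 3), round(central_258, 3))
print(round(two_sided_critical, 3), round(one_sided_critical, 3))
```

The one-sided critical value is smaller than the two-sided one at the same level, because the whole of the tail area is placed on one side.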
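31. 32. Distribution of $X_1 + X_2$: when $X_1 \sim N(\mu_1, \sigma_1^{2})$ and $X_2 \sim N(\mu_2, \sigma_2^{2})$ are independent, $X_1 + X_2$ still follows a normal distribution, $X_1 + X_2 \sim N(\mu_1 + \mu_2, \sigma_1^{2} + \sigma_2^{2})$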
32. 33. 2.5.3 Determination of a reference range <ul><li>Reference range or normal range: the range covering most “healthy people”. “Most”: 95% or 99% </li></ul><ul><li>“Healthy people”: should be well defined </li></ul><ul><li>Determined from a large sample </li></ul><ul><li>1. If the variable follows a normal distribution $N(\mu, \sigma^{2})$, then $\mu \pm 1.96\sigma$ covers 95% of “healthy people”. </li></ul><ul><li>However, $\mu$ and $\sigma$ are usually unknown! They may be replaced by the sample mean $\bar{X}$ and sample standard deviation $S$ (this is why a large sample is needed). </li></ul><ul><li>Therefore, the 95% reference range is $\bar{X} \pm 1.96S$ </li></ul>
33. 34. <ul><li>2. If the variable does not follow a normal distribution, then find the percentiles $P_{2.5}$ and $P_{97.5}$ </li></ul><ul><li>Therefore, the 95% reference range is $(P_{2.5}, P_{97.5})$ </li></ul>
34. 35. Example: Based on the hemoglobin data of 120 healthy females, the sample mean $\bar{X}$ and sample standard deviation $S$ are given, and the histogram shows that the data approximately follow a normal distribution. Please estimate the two-sided 95% reference range for females.
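
A Python sketch of the two ways of building a two-sided 95% reference range: mean ± 1.96 standard deviations when the data look normal, and the 2.5th to 97.5th percentiles otherwise. The 120 measurements are simulated with hypothetical values (mean 120, standard deviation 10, fixed random seed); they are not the hemoglobin data of the example, and the function names are illustrative.

```python
import random
from statistics import mean, quantiles, stdev


def normal_reference_range(values, z=1.96):
    """Sample mean +/- z * sample SD (appropriate for near-normal data)."""
    m, s = mean(values), stdev(values)
    return m - z * s, m + z * s


def percentile_reference_range(values):
    """2.5th and 97.5th percentiles (no normality assumption)."""
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%.
    cuts = quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical sample of 120 measurements (simulated, not real data).
rng = random.Random(42)
sample = [rng.gauss(120.0, 10.0) for _ in range(120)]

normal_low, normal_high = normal_reference_range(sample)
pct_low, pct_high = percentile_reference_range(sample)

# How many of the 120 values fall strictly inside the percentile range.
inside = sum(pct_low < v < pct_high for v in sample)

print(f"normal method:     ({normal_low:.1f}, {normal_high:.1f})")
print(f"percentile method: ({pct_low:.1f}, {pct_high:.1f}), inside: {inside}")
```

For near-normal data the two ranges should be similar; the percentile method is the safer choice when the histogram is clearly skewed.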
35. 36. Caution <ul><li>The 95% reference range only tells us that the measurements of 95% of healthy people fall within this range; </li></ul><ul><li>If someone’s measurement falls within this range, can we claim “normal”? </li></ul><ul><li>If someone’s measurement falls outside this range, can we claim “abnormal”? </li></ul><ul><li>The reference range can never serve as a criterion for diagnosis. </li></ul>
36. 37. 2.5.4 Normal approximation of binomial distribution and Poisson distribution <ul><li>When $n$ is large enough ($n\pi > 5$ and $n(1-\pi) > 5$), the binomial distribution approximates a normal distribution </li></ul><ul><li>When $\lambda$ is large enough ($\lambda \ge 20$), the Poisson distribution approximates a normal distribution </li></ul>
37. 40. Example: The infection rate of hookworm is 13%. If 150 people are randomly selected, what is the probability that at least 20 of them are infected? By the normal approximation, the probability that at least 20 of them are infected is about 50%.
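
The hookworm example can be reproduced in a few lines of Python. The sketch below computes the normal approximation with a continuity correction (the area to the right of 19.5, which is an assumption about how the 50% figure was obtained) and, for comparison, the exact binomial sum.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi = 150, 0.13

# Conditions for the normal approximation: both should exceed 5.
expected_pos = n * pi          # 19.5
expected_neg = n * (1 - pi)    # 130.5

mu = n * pi
sigma = sqrt(n * pi * (1 - pi))

# P(X >= 20) approximated by the normal area to the right of 19.5
# (continuity correction).
p_normal = 1 - NormalDist(mu, sigma).cdf(19.5)

# Exact binomial probability P(X >= 20) for comparison.
p_exact = sum(
    comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(20, n + 1)
)

print(f"normal approximation: {p_normal:.4f}")
print(f"exact binomial:       {p_exact:.4f}")
```

Because the correction point 19.5 coincides with the mean n·π, the approximate answer is one half of the total area.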
38. 41. <ul><li>Example: The pulse count of a radioactive isotope in 0.5 hour follows a Poisson distribution. Please estimate the probability that the measured pulse count is greater than 400. </li></ul>
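
A Python sketch of the normal approximation for a large Poisson count. The Poisson parameter is not legible in this transcript, so the value 360 below is purely a hypothetical stand-in; the continuity correction at 400.5 is also an assumption about the method.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

lam = 360.0  # HYPOTHETICAL parameter; the slide's value is not available


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability, computed on the log scale to avoid overflow."""
    return exp(-lam + x * log(lam) - lgamma(x + 1))


# Normal approximation: Poisson(lam) is close to N(lam, lam) for large lam.
# P(X > 400) = P(X >= 401), approximated by the area right of 400.5.
p_normal = 1 - NormalDist(lam, sqrt(lam)).cdf(400.5)

# Exact Poisson tail for comparison.
p_exact = 1 - sum(poisson_pmf(x, lam) for x in range(401))

print(f"normal approximation: {p_normal:.4f}")
print(f"exact Poisson:        {p_exact:.4f}")
```

With a different parameter only the line defining `lam` needs to change; the approximation is reasonable once the parameter is 20 or more.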
39. 42. Summary <ul><li>Three distributions: </li></ul><ul><li>Discrete variable: binomial distribution, Poisson distribution </li></ul><ul><li>Continuous variable: normal distribution </li></ul><ul><li>1. Binomial distribution </li></ul><ul><li>Possible values in one trial: 0, 1 </li></ul><ul><li>Probability of a positive event in one trial = $\pi$, </li></ul><ul><li>Probability of a negative event in one trial = $1 - \pi$, </li></ul><ul><li>Independently repeated $n$ times </li></ul><ul><li>$X$ = total number of positive events </li></ul><ul><li>2. Poisson distribution </li></ul><ul><li>When $\pi$ or $(1 - \pi)$ is very small and $n$ is very large, the binomial distribution approximates a Poisson distribution. </li></ul>
40. 43. <ul><li>3. Normal distribution: very important </li></ul><ul><li>Many phenomena follow normal distributions; </li></ul><ul><li>Important basis of statistical theory </li></ul><ul><li>Two parameters: mean μ, standard deviation σ </li></ul><ul><li>Z-transformation </li></ul><ul><li>Area under the curve of normal distribution </li></ul>
41. 44. <ul><li>4. Normal approximation </li></ul><ul><li>When $n$ is large (both $n\pi$ and $n(1-\pi) > 5$), $B(n, \pi)$ approximates $N(n\pi, n\pi(1-\pi))$ </li></ul><ul><li>When $\lambda$ is large ($\lambda \ge 20$), $\Pi(\lambda)$ approximates $N(\lambda, \lambda)$ </li></ul>5. Web resources http://statpages.org/
42. 45. <ul><li>Thanks </li></ul>