Biostatistics

Lecture by Dr. L.M. BEHERA of N.I.H. KOLKATA at a workshop at G.D.M.H.M.C., Patna, in 2011.
SUBJECT : BIOSTATISTICS
TOPIC : 'INTRODUCTION TO BIOSTATISTICS'.

  1. 1. INTRODUCTION TO BIOSTATISTICS BY DR. LOKNATH BEHERA NATIONAL INSTITUTE OF HOMOEOPATHY INDIA-KOLKATA
  2. 2. THE WORD 'STATISTICS' • Latin – status • Italian – statista • German – Statistik • French – statistique
  3. 3. EMINENT STATISTICIANS • Captain John Graunt of London (1620–1674) is known as the Father of Vital Statistics • R. A. Fisher gave statistics the status of a full-fledged science
  4. 4. TO DEFINE BIO-STATISTICS The procedure of collection, compilation, analysis, and interpretation of biological information / data.
  5. 5. INFORMATION / DATA INFORMATION • It is first-hand knowledge about anything • Information as such is not conclusive • Ex – "there are more diabetics in India" is only informative DATA • It is assorted or arranged information • Data help in drawing a conclusion • Ex – "in India every seventh person is diabetic" gives data
  6. 6. Information / data • We are concerned with data rather than information • But to get data we have to arrange the information • So we are concerned with both
  7. 7. DATA COLLECTION COUNTING OR MEASURING and RECORDING of RESULTS OF OBSERVATION is called data collection.
  8. 8. METHODS OF DATA COLLECTION CENSUS – In this type of data collection every individual unit is recorded – Time consuming – Financially not viable – Often practically impossible Example – life of an electric bulb (testing every bulb would use up all of them) SAMPLING METHOD – Only selected representatives of the population are recorded – The whole population need not be recorded – Easy to collect – Practically possible – Conclusions are drawn by analysing this sample data Example – cases of hypertension
  9. 9. PREREQUISITES FOR COLLECTING DATA • Objectives and scope of the inquiry • Statistical units to be used • Sources of data / information • Method of data collection • Degree of accuracy aimed at in the final result
  10. 10. CLASSIFICATION OF DATA QUALITATIVE OR ATTRIBUTES • They cannot be expressed in numbers • They are not measurable • But they can be classified into different categories • Ex – religion, blood group, sex QUANTITATIVE OR VARIABLES – Continuous variables – Discrete variables
  11. 11. QUANTITATIVE OR VARIABLES contd… • Continuous variables • Expressed in numbers • Can be measured and can take any value within a range • Ex – body temperature, heart rate, etc. • Discrete variables • Countable (whole numbers only) • Ex – number of patients in a hospital, OPD, etc.
  12. 12. PRIMARY AND SECONDARY DATA Primary – Collected directly – Ex – number of children with defective vision in a school, etc. Secondary – Collected previously, often by others – Ex – number of HIV patients, number of stillbirths
  13. 13. SOURCES OF HEALTH DATA 1. Census 2. Registration of vital events 3. Sample registration system 4. Notification of diseases 5. Hospital records 6. Record linkage 7. Disease registration 8. Epidemiological surveillance 9. Other health services records 10. Environmental health data 11. Population surveys 12. Other routine statistics related to health 13. Health manpower statistics 14. Non-quantifiable information
  14. 14. SAMPLE – WHAT DO WE MEAN? A PORTION OR PART OF A TOTAL POPULATION, SELECTED FOLLOWING CERTAIN GUIDELINES
  15. 15. SAMPLING TECHNIQUE contd… • A statistic is a characteristic of a sample • A parameter is a characteristic of a population
  16. 16. SAMPLING TECHNIQUES contd… SELECTIVE SAMPLING • Or non-random sampling • Or purposive sampling In this process, sampling is done by choice and not by chance
  17. 17. RANDOM SAMPLING • The sample is chosen according to a guideline • Whether a unit is chosen depends on chance, not on choice • Every unit in the population has an equal opportunity of being selected • There is no bias in selecting the sample • The sample mean tends to be close to the population mean Sampling techniques contd…
  18. 18. TYPES OF RANDOM SAMPLING • SIMPLE RANDOM • STRATIFIED RANDOM • SYSTEMATIC RANDOM • CLUSTER SAMPLING • MULTI STAGE SAMPLING • MULTI PHASE SAMPLING
  19. 19. SIMPLE RANDOM • RULE: EACH UNIT IN THE POPULATION HAS AN EQUAL OPPORTUNITY OF BEING CHOSEN • EACH UNIT IS NUMBERED, THEN THE REQUIRED NUMBER OF UNITS IS SELECTED AT RANDOM Example – there are 100 students in a class, each with a roll number; to select ten students, draw ten roll numbers at random (e.g. by lottery or from a table of random numbers)
  20. 20. STRATIFIED RANDOM • Here the units in a population are first categorised into small groups (strata) • Then the sample is chosen from each small group • Example – to select ten of the same 100 students, form 10 small groups by roll number: 1 to 10, 11 to 20, 21 to 30, and so on up to 91 to 100 • Then select one student at random from each small group
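A minimal Python sketch of the roll-number example, assuming the class is simply the list of roll numbers 1 to 100. `random.sample` draws without replacement, so each student has the same chance of selection and nobody is picked twice; the seed (2011, an arbitrary choice) only makes the draw repeatable.

```python
import random


def simple_random_sample(units, k, seed=None):
    """Draw k distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)  # seeded generator, only for a repeatable draw
    return sorted(rng.sample(units, k))


roll_numbers = list(range(1, 101))  # students numbered 1 to 100
chosen = simple_random_sample(roll_numbers, 10, seed=2011)
print("Selected roll numbers:", chosen)
```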
  21. 21. • Advantages – increases the precision of the estimates • Each subgroup can be studied as a separate population STRATIFIED RANDOM
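The stratified version of the same example can be sketched in the same way, assuming the strata are the ten roll-number groups 1 to 10, 11 to 20, ..., 91 to 100 and one student is drawn from each group by simple random sampling. The function and variable names are illustrative only.

```python
import random


def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of fixed size from every stratum."""
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, per_stratum))
    return sample


# Ten non-overlapping strata of roll numbers: 1-10, 11-20, ..., 91-100
strata = {f"{lo}-{lo + 9}": list(range(lo, lo + 10)) for lo in range(1, 101, 10)}
chosen = stratified_sample(strata, per_stratum=1, seed=2011)
print("One student per stratum:", chosen)
```

In real studies strata are usually formed on a characteristic expected to influence the result (for example sex or age group); the roll-number grouping here only mirrors the slide.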
  22. 22. SYSTEMATIC RANDOM • This sampling is done following a rule • Each unit is numbered • The total population size is divided by the required sample size; this is the sampling interval • A random number equal to or less than the sampling interval is selected as the starting point • The sampling interval is then added repeatedly to get the units to be selected
  23. 23. • Example • There are 300 persons • 30 samples are to be selected • First number all the persons • 300/30 = 10 (sampling interval) • Pick a random number equal to or less than ten • Suppose the number is 7 • Then 7 + 10 = 17, 17 + 10 = 27, 27 + 10 = 37, and so on up to 297 SYSTEMATIC RANDOM
  24. 24. • Merits – Easy to follow – Less tedious – Gives almost accurate result SYSTEMATIC RANDOM
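The systematic procedure above can be sketched as follows, assuming (as in the 300/30 example) that the population size is an exact multiple of the sample size; otherwise the integer division here simply drops the remainder. Passing `start=7` reproduces the slide's worked example; leaving it out draws a random start between 1 and the interval.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the serial numbers picked by systematic random sampling."""
    interval = population_size // sample_size  # sampling interval, e.g. 300 // 30 = 10
    if start is None:
        # random start between 1 and the sampling interval (inclusive)
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]


picked = systematic_sample(300, 30, start=7)
print(picked)  # 7, 17, 27, 37, ... up to 297
```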
  25. 25. CLUSTER SAMPLING • Used when the population is very large • Divide it into smaller populations • These should be non-overlapping • They are called clusters • Then randomly select some clusters • Then apply systematic random sampling within each randomly selected cluster
  26. 26. Example of cluster sampling • The WHO 30-cluster random sampling technique used to evaluate the immunization coverage of vaccines • It is a unique study design for evaluating routine immunization coverage
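A rough sketch of the two steps on the slide (random choice of whole clusters, then systematic sampling inside each chosen cluster). The frame of 20 villages with 50 households each is entirely hypothetical, as are the function and household names.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Randomly pick whole clusters, then sample systematically inside each one."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)  # step 1: random clusters
    result = {}
    for name in picked:
        units = clusters[name]
        interval = max(1, len(units) // per_cluster)  # within-cluster sampling interval
        start = rng.randrange(interval)               # random start (0-based index)
        result[name] = units[start::interval][:per_cluster]
    return result


# Hypothetical frame: 20 villages (clusters) of 50 households each
villages = {f"village-{v}": [f"v{v}-house{h}" for h in range(1, 51)] for v in range(1, 21)}
sample = cluster_sample(villages, n_clusters=4, per_cluster=5, seed=7)
for village, houses in sample.items():
    print(village, houses)
```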
  27. 27. HOW IT IS DONE • Required sample size is 210 children aged 12 to 23 months • Drawn from 30 randomly selected clusters • 7 children from each cluster (30 × 7 = 210) • List all the towns, cities, villages etc. with their number of children aged 12 to 23 months • Calculate the total (and the running cumulative totals) of such children • Divide the total by 30 = sampling interval
  28. 28. • Select a random number less than or equal to the sampling interval, having the same number of digits as the sampling interval • The 1st cluster is the community whose cumulative total first equals or exceeds this random number
  29. 29. • Random number + sampling interval locates the 2nd cluster (the community whose cumulative total first reaches this value) • Adding the sampling interval again locates the 3rd cluster • Likewise, until all 30 clusters are selected
  30. 30. • To select the 1st house in each cluster, any random sampling method can be used
  31. 31. • In each cluster a door-to-door survey is carried out to find out whether there are any children aged 12 to 23 months and, if so, whether they are immunized
  32. 32. • The survey proceeds contiguously from house to house • It continues until 7 children are obtained in that cluster • Likewise, all 30 clusters are completed
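The cluster-selection steps of the 30-cluster method (cumulative totals, sampling interval, random start, repeated addition of the interval) can be sketched as below. This assumes the listing is available as (community, number of children) pairs in a fixed order; the communities and counts are hypothetical, and only 4 clusters are drawn to keep the output short.

```python
import bisect
import itertools
import random


def select_clusters(communities, n_clusters=30, start=None, seed=None):
    """Pick clusters with probability proportional to size using cumulative totals."""
    names = [name for name, _ in communities]
    cumulative = list(itertools.accumulate(size for _, size in communities))
    interval = cumulative[-1] // n_clusters            # sampling interval
    if start is None:
        start = random.Random(seed).randint(1, interval)  # random number <= interval
    targets = [start + k * interval for k in range(n_clusters)]
    # each target falls in the first community whose cumulative total reaches it
    return [names[bisect.bisect_left(cumulative, t)] for t in targets]


# Hypothetical listing: community and its number of children aged 12-23 months
communities = [("A", 1200), ("B", 800), ("C", 3000), ("D", 500), ("E", 2500)]
print(select_clusters(communities, n_clusters=4, start=700))  # ['A', 'C', 'C', 'E']
```

Here the cumulative totals are 1200, 2000, 5000, 5500 and 8000, the interval is 8000 / 4 = 2000, and the targets are 700, 2700, 4700 and 6700. A large community can contain more than one cluster (C appears twice), which is how selection proportional to size works.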
  33. 33. MULTISTAGE SAMPLING • In this technique, sampling is done in several stages, using simple random sampling at each stage • Used when the population is very large and the sample size is comparatively small • Example – to select 1000 persons from India for a study • Select 10 persons from each state randomly • Select 5 persons from each capital randomly
  34. 34. • In this method, information is collected from the whole sample in the first phase • In the second phase, only part of that sample is examined further • Ex – in a pathology lab, a blood sample is divided into several portions, which are then studied for different tests MULTIPHASE SAMPLING
  35. 35. SAMPLE SIZE • Factors determining the sample size – Nature of data – Study type – Sampling technique – Intensity of the problem in the data – Level of confidence – Accuracy or precision – Error – One-tailed or two-tailed test – Miscellaneous factors
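A minimal sketch of the idea "simple random sampling at each stage", assuming a hypothetical three-level frame of states, districts and persons (10 states, 8 districts each, 100 persons per district). The stage sizes used here are illustrative and are not the lecture's figures.

```python
import random


def multistage_sample(frame, n_states, n_districts, n_persons, seed=None):
    """Simple random sampling at each of three stages: states, districts, persons."""
    rng = random.Random(seed)
    sample = []
    for state in rng.sample(sorted(frame), n_states):                  # stage 1
        districts = frame[state]
        for district in rng.sample(sorted(districts), n_districts):   # stage 2
            sample.extend(rng.sample(districts[district], n_persons))  # stage 3
    return sample


# Hypothetical frame: 10 states x 8 districts x 100 persons
frame = {
    f"S{s}": {f"S{s}-D{d}": [f"S{s}-D{d}-P{p}" for p in range(100)] for d in range(8)}
    for s in range(10)
}
chosen = multistage_sample(frame, n_states=4, n_districts=2, n_persons=5, seed=1)
print(len(chosen), chosen[:3])
```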
  36. 36. PRESENTATION OF STATISTICAL DATA • The mode of presentation is more valuable than the gift • If data are not presented systematically, they are of no use • They cannot be comprehended • In biostatistics, raw data are of no use
  37. 37. HOW TO PRESENT DIFFERENT TYPES OF DATA • There are certain rules guiding data presentation • It is not according to one's own wish and fancy
  38. 38. • TABULAR FORM • Pictogram • PIE DIAGRAM • Bar diagram • HISTOGRAM • FREQUENCY POLYGON • Line diagram • OGIVE
  39. 39. TABLES • Two types of frequency distribution tables are generally used, for – QUALITATIVE DATA or ATTRIBUTES – QUANTITATIVE DATA or VARIABLES
  40. 40. TABLE FOR QUALITATIVE DATA OR ATTRIBUTES • Suppose we want to draw a table of school children by sex • SEX IS NOT MEASURABLE; IT IS QUALITATIVE DATA
  41. 41. STUDENTS BY SEX IN A SCHOOL
      Sex | Number of students
      Boys | 73
      Girls | 71
      Total | 144
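Building such a table from raw records is just counting each category. In this sketch the raw list is a stand-in constructed from the slide's counts (one entry per pupil); in practice it would come from the school register.

```python
from collections import Counter

# Stand-in raw records, one entry per pupil (counts taken from the slide's table)
records = ["Boys"] * 73 + ["Girls"] * 71
table = Counter(records)
total = sum(table.values())

print(f"{'Sex':<8}{'Number':>8}")
for category, count in table.items():
    print(f"{category:<8}{count:>8}")
print(f"{'Total':<8}{total:>8}")
```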
  42. 42. • Quantitative data need to be grouped into categories (classes) • Otherwise they are difficult to understand • Ex – blood pressure of 5000 persons TABLE FOR QUANTITATIVE DATA OR VARIABLES
  43. 43. BLOOD PRESSURE OF 100 PERSONS
      Age group (years) | Systolic (mm Hg) | Diastolic (mm Hg)
      10 to 20 | 124 | 68
      20 to 30 | 120 | 70
      30 to 40 | 138 | 72
      40 to 50 | 130 | 74
  44. 44. TABLE FOR QUANTITATIVE DATA OR VARIABLES • First split the data into small groups (classes) • Then count the number of items in each group • Groups should be in ascending or descending order • Keep the group interval as small as practical
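The grouping rules above (split into classes, count items per class, keep classes in ascending order) can be sketched with equal-width, non-overlapping classes. The readings are hypothetical whole-number systolic values; a reading falling outside the chosen classes raises an error instead of being silently dropped.

```python
def grouped_frequency(values, start, width, n_classes):
    """Count values in equal classes [start, start + width), ... in ascending order."""
    counts = [0] * n_classes
    for v in values:
        idx = int((v - start) // width)
        if not 0 <= idx < n_classes:
            raise ValueError(f"{v} lies outside the chosen classes")
        counts[idx] += 1
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]


# Hypothetical systolic readings in mm Hg (whole numbers)
readings = [112, 118, 121, 124, 126, 129, 131, 133, 138, 141, 145, 152]
table = grouped_frequency(readings, start=110, width=10, n_classes=5)
for lower, upper, n in table:
    print(f"{lower}-{upper - 1} mm Hg: {n}")  # labels assume whole-number readings
```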
  45. 45. BAR DIAGRAM • USED TO PRESENT GRAPHICALLY THE FREQUENCY OF DIFFERENT CATEGORIES OF QUALITATIVE DATA • IT CAN BE VERTICAL OR HORIZONTAL • IN VERTICAL TYPE Y-AXIS – FREQUENCY • IN HORIZONTAL TYPE X- AXIS - FREQUENCY
  46. 46. NUMBER OF TB CASES IN WEST BENGAL
  47. 47. NUMBER OF TB CASES IN WEST BENGAL
  48. 48. COMPOUND BAR DIAGRAM: NUMBER OF TB CASES IN WEST BENGAL, SEX-WISE
  49. 49. COMPONENT BAR DIAGRAM: SEX-WISE NUMBER OF PERSONS TESTED FOR HIV AND FOUND POSITIVE
  50. 50. PIE DIAGRAM – FOR QUALITATIVE OR DISCRETE DATA
  51. 51. LINE DIAGRAM
  52. 52. LINE DIAGRAM
  53. 53. SCATTER DIAGRAM FOR NATURE OF ASSOCIATION BETWEEN VARIABLES
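  54. 54. MEASURES OF CENTRAL TENDENCY • MEAN • MEDIAN • MODE (Standard deviation is a measure of dispersion, not of central tendency.)
  55. 55. MEAN • The arithmetic mean is the sum of all observations divided by their number: $\bar{x} = \frac{\sum x}{n}$
  56. 56. MEDIAN • When the data are arranged in ascending or descending order, the middlemost value is the median • Example – 6, 7, 7, 7, 8, 9, 10 • Here seven observations are arranged in ascending order; the middle (4th) value, 7, is the median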
  57. 57. MEDIAN
  58. 58. MEDIAN
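  59. 59. MEDIAN • With an even number of observations, the median is the mean of the two middle values, e.g. $\frac{7 + 7}{2} = 7$
  60. 60. MODE • The observation that occurs most frequently in a series is called the mode • Example – 5, 6, 7, 7, 7, 8, 9, 10 • Here 7 (occurring three times) is the mode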
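The three measures can be checked on the slides' two small series with Python's standard `statistics` module: `median` takes the middle value for an odd count and averages the two middle values for an even count.

```python
import statistics

odd = [6, 7, 7, 7, 8, 9, 10]      # median example (seven values)
even = [5, 6, 7, 7, 7, 8, 9, 10]  # mode example (eight values)

mean_odd = statistics.mean(odd)        # 54 / 7, about 7.71
median_odd = statistics.median(odd)    # middle (4th) value = 7
median_even = statistics.median(even)  # (7 + 7) / 2 = 7.0
mode_even = statistics.mode(even)      # 7 occurs most often

print(mean_odd, median_odd, median_even, mode_even)
```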
  61. 61. MEASURES OF DISPERSION • Height of a group of people • It varies from person to person • But how much does it vary? • Can this variation be measured? • How?
  62. 62. • Measures of dispersion help to find out how individual observations are dispersed or scattered around the mean of a data set MEASURES OF DISPERSION
  63. 63. Deviation = Observation - Mean • Different measurements of DISPERSION are – RANGE – Mean deviation – STANDARD DEVIATION – Variance – COEFFICIENT OF VARIATION MEASURES OF DISPERSION
  64. 64. RANGE • Simply the difference between the highest and the lowest value • Example – the hospital stays of 5 patients are 3, 4, 5, 6 and 7 days • Then the range is 7 − 3 = 4 days
  65. 65. MEAN DEVIATION • It is the average of the absolute deviations (ignoring signs) of the observations from the mean value
  66. 66. • Example – incubation period of measles in 7 children: 10, 9, 11, 7, 8, 9, 9 days • Mean deviation: – Number of observations (n) = 7 – Total incubation period = 10 + 9 + 11 + 7 + 8 + 9 + 9 = 63 days – Average incubation period = 63/7 = 9 days – Now find how many days each patient deviates from the average incubation period (9 days) MEAN DEVIATION
  67. 67. • 1st patient 10 − 9 = +1 day • 2nd patient 9 − 9 = 0 days • 3rd patient 11 − 9 = +2 days • 4th patient 7 − 9 = −2 days • 5th patient 8 − 9 = −1 day • 6th patient 9 − 9 = 0 days • 7th patient 9 − 9 = 0 days • These are the individual deviations MEAN DEVIATION
  68. 68. MEAN DEVIATION • Ignore the plus or minus sign (take absolute values) • Add the individual deviations: $\sum |x - \bar{x}| = 1 + 0 + 2 + 2 + 1 + 0 + 0 = 6$ • Divide by the number of observations: mean deviation $= 6/7 \approx 0.86$ day
  69. 69. MEAN DEVIATION – the formula: $\mathrm{MD} = \frac{\sum |x - \bar{x}|}{n}$
  70. 70. STANDARD DEVIATION (σ) • Square root of the arithmetic mean of the squares of the deviations taken from the arithmetic mean • Simply, the root-mean-square deviation
  71. 71. STANDARD DEVIATION (σ) – the formula: $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
  72. 72. • In the same example, the SD will be • $\sum (x - \bar{x})^2 = 1^2 + 0^2 + 2^2 + (-2)^2 + (-1)^2 + 0^2 + 0^2 = 10$ • $\sigma^2 = 10/7 = 1.43$ • $\sigma = \sqrt{1.43} = 1.20$ days
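The measures of dispersion for the measles example can be reproduced in a few lines. The SD uses divisor n, as in the formula above, so it matches `statistics.pstdev`; the coefficient of variation (SD as a percentage of the mean) is the last measure listed on the dispersion slide.

```python
import statistics

incubation = [10, 9, 11, 7, 8, 9, 9]  # days, measles example
n = len(incubation)
mean = sum(incubation) / n                            # 63 / 7 = 9.0
deviations = [x - mean for x in incubation]           # +1, 0, +2, -2, -1, 0, 0

value_range = max(incubation) - min(incubation)       # 11 - 7 = 4 days
mean_deviation = sum(abs(d) for d in deviations) / n  # 6 / 7
variance = sum(d ** 2 for d in deviations) / n        # 10 / 7 (divisor n)
sd = variance ** 0.5                                  # about 1.20 days
sd_check = statistics.pstdev(incubation)              # library population SD, same value
cv = sd / mean * 100                                  # coefficient of variation, %

print(f"range={value_range}, MD={mean_deviation:.2f}, variance={variance:.2f}, "
      f"SD={sd:.2f}, CV={cv:.1f}%")
```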
  73. 73. TEST OF SIGNIFICANCE • We want to know the effectiveness of a drug • Then we have to test the drug on some people • To prove whether it is effective or not, we also have to keep another (control) group that is not given the drug
  74. 74. RESEARCH METHODOLOGY
  75. 75. • So we test the drug on a sample group, and the result is then generalised to the population • To generalise the result, we have to make an assumption • This assumption need not be true all the time • Such an assumption, which may or may not be true, is called a statistical hypothesis TEST OF SIGNIFICANCE
  76. 76. • A statistical hypothesis is a statement about a population parameter • It can be tested by statistical methods • These tests are known as tests of significance
  77. 77. • In interpreting statistical data, the observer wants to know the variability of the sample in comparison with the population • So the observer has to decide whether the sampling was correct or not • Whether the study was transparent • Whether there was any error in interpreting the data
  78. 78. HYPOTHESIS • Ex – the height of urban persons is more than that of rural persons • This is a statement; it may be true or false • How to know? • Take samples of urban and rural people and measure their heights • If the evidence supports the statement, the hypothesis is accepted; if not, it is rejected • The corresponding null hypothesis (H0) is that there is no difference in height between urban and rural persons
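  79. 79. • The hypothesis of no difference (e.g. between the sample and the population, or between two groups) that is to be tested is called the null hypothesis (H0) • The hypothesis against which it is tested is called the alternative hypothesis (H1)
  80. 80. LEVEL OF CONFIDENCE • It is the degree of confidence we place in the conclusion of the test • Usually it is either 95% or 99%, corresponding to a 5% or 1% level of significance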
  81. 81. Z – TEST • The formula • $Z = \frac{\bar{x} - \mu}{SE} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$ • x̄ – sample mean • SE – standard error • µ – population mean • σ – standard deviation of the population • n – sample size
  82. 82. LET US DO A PROBLEM • A cigarette company claimed that its cigarettes contain less than 15 mg of nicotine per cigarette, and so are low-risk cigarettes • An NGO tested a sample of 200 such cigarettes and found an average nicotine content of 16.2 mg/cigarette • with a standard deviation of 3.6 mg/cigarette • Using the 0.01 (1%) level of significance, test which hypothesis holds
  83. 83. • The null hypothesis (H0): yes, it contains not more than 15 mg of nicotine per cigarette, so it is a low-risk cigarette • i.e. $H_0: \mu \le 15$
  84. 84. • Alternative hypothesis (H1): no, it contains more than 15 mg of nicotine per cigarette, so it is a risky cigarette • i.e. $H_1: \mu > 15$
  85. 85. SINGLE-TAILED TEST • We are testing the nicotine content: if it is high, there is a problem; if it is low, there is no problem • So we use a one-tailed test • But with blood pressure, both low and high values are a problem • In that case we have to use a two-tailed test
  86. 86. • When σ, the standard deviation of the population, is not available, we use the standard deviation of the sample, denoted by small s • Standard error $SE = \frac{s}{\sqrt{n}} = \frac{3.6}{\sqrt{200}} = 0.255$ • $Z = \frac{16.2 - 15}{0.255} = 4.7$
  87. 87. • At the 1% level of significance (one-tailed test), the critical value of Z is 2.33 • Here the computed Z is 4.7, which exceeds 2.33 • So we reject the null hypothesis, i.e. the evidence shows that the mean nicotine content of such cigarettes is more than 15 mg; hence it is a high-risk cigarette
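The nicotine problem can be re-checked with the standard library's `statistics.NormalDist`, assuming, as the worked solution does, a sample of n = 200 and using the sample SD in place of σ because the sample is large. The function name is illustrative; `inv_cdf(0.99)` gives the one-tailed 1% critical value that tables round to 2.33.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(sample_mean, mu0, sd, n, alpha):
    """Upper-tail Z test of H0: mu <= mu0 against H1: mu > mu0."""
    se = sd / sqrt(n)                          # standard error
    z = (sample_mean - mu0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha)   # about 2.33 for alpha = 0.01
    p_value = 1 - NormalDist().cdf(z)          # area beyond z in the upper tail
    return se, z, z_crit, p_value


se, z, z_crit, p = one_tailed_z_test(16.2, 15, 3.6, 200, alpha=0.01)
print(f"SE = {se:.3f}, Z = {z:.2f}, critical Z = {z_crit:.2f}, p = {p:.1e}")
print("Reject H0" if z > z_crit else "Do not reject H0")
```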
  88. 88. 't' TEST • Why 't' and not any other letter? • The test was devised by W.S. GOSSET, who published his paper on the method under the pen name 'STUDENT' • The last letter of that name became the popular name for the test
  89. 89. • ‘z’ test is done where the sample size is large • ‘t’ test is done where sample size is small • Or simply speaking the sample size is less than 30 ‘t’ test
  90. 90. • Why 30 and not any other number? • When the sample size is large, the sampling distribution of the mean can be approximated by the normal distribution • But when the sample size is small (less than 30), this approximation is poor • That requires another test, i.e. the 't' test
  91. 91. Let us do an example • A sample of 12 persons was chosen from a population • Their weights (kg) were found to be • 40, 45, 48, 50, 55, 58, 60, 60, 62, 65, 68 and 70 • Was the sample drawn from a population with a mean weight of 55 kg?
  92. 92. LET US WRITE IN THE LANGUAGE OF STATISTICS • Null hypothesis – H0 the mean weight of the population is 55 kg • Alternative hypothesis H1 – the mean weight of the population is not 55 kg
  93. 93. SYMBOLICALLY • $H_0: \mu = 55$ kg • $H_1: \mu \ne 55$ kg • Calculation of standard deviation
  94. 94. Mean weight x̄ = 56.75 kg
      Sl no. | Weight x (kg) | x − x̄ | (x − x̄)²
      1 | 40 | -16.75 | 280.56
      2 | 45 | -11.75 | 138.06
      3 | 48 | -8.75 | 76.56
      4 | 50 | -6.75 | 45.56
      5 | 55 | -1.75 | 3.06
      6 | 58 | 1.25 | 1.56
      7 | 60 | 3.25 | 10.56
      8 | 60 | 3.25 | 10.56
      9 | 62 | 5.25 | 27.56
      10 | 65 | 8.25 | 68.06
      11 | 68 | 11.25 | 126.56
      12 | 70 | 13.25 | 175.56
      Total | 681 | 0 | 964.25
  95. 95. STANDARD DEVIATION • $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{964.25}{12 - 1}} = \sqrt{\frac{964.25}{11}} = \sqrt{87.66} = 9.36$ • $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$ Let us put in the values • x̄ = mean weight = 56.75 • µ = hypothesised mean = 55 • s = 9.36 • n = 12
  96. 96. • $t = \frac{56.75 - 55}{9.36 / \sqrt{12}} = \frac{1.75}{9.36 / 3.464} = \frac{1.75}{2.70} = 0.65$ • Degrees of freedom (d.f.) = n − 1 = 12 − 1 = 11 • Table value of t for 11 d.f. at the 5% level of significance (two-tailed) = 2.201 • Our value, 0.65, is much lower than the table value • So we accept (i.e. do not reject) the null hypothesis H0: the data are consistent with a population mean weight of 55 kg
  97. 97. WHAT DOES IT MEAN? • The difference between the sample mean (56.75 kg) and the hypothesised population mean (55 kg) is 1.75 kg, which corresponds to t = 0.65 • This is lower than the table value of 't' at the 5% level of significance • For 11 degrees of freedom, any t value up to 2.201 (in either direction) is acceptable
  98. 98. TO CONCLUDE • The sample can reasonably be regarded as drawn from a population with a mean weight of 55 kg
  99. 99. • Assessment of the results
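The one-sample t test on the 12 weights can be reproduced as a sketch. `statistics.stdev` divides by n − 1, matching the sample SD used in the worked solution. The standard library has no t distribution, so the critical value 2.201 (two-tailed, 5%, 11 d.f.) is copied from the table quoted in the lecture and is valid only for this d.f.

```python
import statistics
from math import sqrt

weights = [40, 45, 48, 50, 55, 58, 60, 60, 62, 65, 68, 70]  # kg
mu0 = 55                                   # hypothesised population mean (H0)
n = len(weights)
mean = statistics.mean(weights)            # 681 / 12 = 56.75
s = statistics.stdev(weights)              # sample SD, divisor n - 1
t = (mean - mu0) / (s / sqrt(n))
df = n - 1

# Critical value copied from the t table: two-tailed, 5% level, 11 d.f. only
T_CRIT = 2.201

print(f"mean = {mean:.2f}, s = {s:.2f}, t = {t:.2f}, d.f. = {df}")
print("Reject H0" if abs(t) > T_CRIT else "Do not reject H0")
```

If SciPy is installed, `scipy.stats.t.ppf(0.975, 11)` gives the same critical value and `scipy.stats.ttest_1samp(weights, 55)` runs the whole test.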
  100. 100. • Success is defined as doing it repeatedly without losing enthusiasm
  101. 101. I CAN NOT SAY THANK YOU WITHOUT “U”
  102. 102. https://www.facebook.com/SevaClinic.official https://plus.google.com/116248692040189856015
