Published in: Education

  1. An Introduction to Biostatistics
  2. Introduction to biostatistics

     Definitions:
     - Biostatistics: collections of data related to an area of research (variables).
     - Data: information (observations) about one or more variables.
     - Variable: any quantity that varies.

     Biostatistics deals with:
     - Individual life from birth to death.
     - Events affecting life, such as marriage, sickness, and education.
     - Factors affecting such vital events and their outcomes.

     Applications of biostatistics:
     - Public health, including epidemiology, nutrition, and environmental health.
     - Design and analysis of clinical trials in medicine.
     - Population genetics and statistical genetics.
     - Ecology.
     - Biological sequence analysis.

     Workflow: Population → Samples → Data (variables) → Table → Graph → Results
  3. Biostatistics is concerned with the following indicators:
     - Demographic and vital events.
     - Environmental health statistics.
     - Measuring health status (mortality, morbidity, disability).
     - Measuring health resources (facilities, beds, manpower).
     - Planning for, administering, and managing health services.
     - Research on health and diseases.
     - Preparation of life tables.
     - Estimation of population (census).

     Data collection: the methods used to collect data about a particular phenomenon.

     Methods of data collection:
     - Records.
     - Interview.
     - Telephone.
     - Mail.
     - Observation.

     Data entry: Data can be entered directly into the computer, but to move the data it must be stored in either a spreadsheet or a data package, both of which are limited. A more flexible approach is to keep the data available as an ASCII (text) file.

     ASCII / text file, free format:
     - Consists of rows of text that can be reviewed on the computer screen.
     - Each variable is separated by a delimiter (comma, space …).
     - Can be saved in a spreadsheet.
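As a minimal sketch of the free-format text approach described on slide 3, the following Python snippet parses comma-delimited text with the standard `csv` module. The column names and the three patient rows are hypothetical and only illustrate the layout (one row per individual, one delimited field per variable).

```python
import csv
import io

# Hypothetical comma-delimited ASCII data: one row per patient, one field per variable.
RAW_TEXT = """patient_id,age,blood_group
1,34,A
2,51,O
3,28,AB
"""


def read_delimited(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


rows = read_delimited(RAW_TEXT)
# The csv module returns every field as a string, so numeric variables need converting.
ages = [int(row["age"]) for row in rows]

print(rows[0])
print(ages)
```

The same function would work for a space- or tab-delimited file by passing a different `delimiter`.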
  4. Data organization and presentation

     There are three ways of organizing and presenting data:
     - Classification.
     - Tabulation.
     - Graphic displays.

     1. Classification
     Classification groups observations together into classes.

     Types of data:
     1. Numerical (quantitative) data.
     2. Categorical (qualitative) data.
     3. Derived data.
     4. Censored data.

     1- Numerical (quantitative) data: variables take numerical values.
     - Discrete data: values are counts (integers).
     - Continuous data: values can take any value.

     2- Categorical (qualitative) data: each individual belongs to one of a number of categories.
     - Nominal data: categories have names but are not ordered (e.g., blood groups A, B, AB, and O).
     - Ordinal data: categories are ordered (e.g., a disease staging system: advanced, moderate, mild, none).

     Note: data are binary (dichotomous) when there are only two possible categories (e.g., yes/no, dead/alive, diseased/healthy).
  5. 3- Derived data: Suppose a group (category) of observations contains a individuals with a given characteristic and b individuals without it.
     1. Proportion = a / (a + b)
     2. Percentage (per hundred) = a / (a + b) × 100
     3. Ratio (quotient) = a / b
     4. Rate = proportion multiplied by a base = a / (a + b) × base
     5. Score: used when a quality cannot be measured directly, but component values can be summed to give an overall measure of it.

     4- Censored data: arise when:
     - lab values are detected above a cut-off value
     - following up patients in trials

     2. Tabulation
     Tabulation is putting data into tables, that is, a systematic arrangement of data in columns and rows. There are three kinds of tabulation:
     - Single tabulation.
     - Double tabulation.
     - Manifold tabulation.

     a. Single tabulation
     It represents one variable, for example the number of patients by marital status.

     Data: the marital status of 15 patients is as follows: married, divorced, single, widow, single, married, divorced, single, widow, married, married, divorced, single, divorced, single. Show it in a single table.
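The derived-data formulas on slide 5 (proportion, percentage, ratio, rate) can be sketched as small Python functions. The counts `a = 25` and `b = 75` and the default base of 1000 are hypothetical values chosen only for illustration.

```python
def proportion(a, b):
    """a individuals with the characteristic, b without it."""
    return a / (a + b)


def percentage(a, b):
    """Proportion expressed per hundred."""
    return proportion(a, b) * 100


def ratio(a, b):
    """Those with the characteristic relative to those without it."""
    return a / b


def rate(a, b, base=1000):
    """Proportion multiplied by a base (assumed here to be 1000)."""
    return proportion(a, b) * base


# Hypothetical group: 25 individuals with the characteristic, 75 without.
a, b = 25, 75
print(proportion(a, b), percentage(a, b), ratio(a, b), rate(a, b))
```

Note that the ratio uses only b in the denominator, whereas the proportion, percentage, and rate all use the whole group a + b.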
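  6. Solution (counts tallied from the data above):

     | Marital status | Number of patients |
     |---|---|
     | Married | 4 |
     | Divorced | 4 |
     | Single | 5 |
     | Widow | 2 |
     | Total | 15 |

     b. Double tabulation
     It represents two variables, for example nationality and marital status for a number of patients.

     Data: the nationality and marital status of 15 patients: married S, divorced N, single S, widow S, single N, married N, divorced S, single N, widow N, married S, married S, divorced N, single N, divorced S, single S.

     Note: S = Saudi, N = non-Saudi. Draw a table.

     Solution (counts tallied from the data above):

     | Marital status | Saudi | Non-Saudi | Total |
     |---|---|---|---|
     | Married | 3 | 1 | 4 |
     | Divorced | 2 | 2 | 4 |
     | Single | 2 | 3 | 5 |
     | Widow | 1 | 1 | 2 |
     | Total | 8 | 7 | 15 |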
  7. c. Manifold tabulation: It represents more than two variables.
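Single and double tabulation can be illustrated with the 15 patients from the examples above. This is only a sketch using `collections.Counter`; the variable names are illustrative and not part of the slides.

```python
from collections import Counter

# Marital status and nationality (S = Saudi, N = non-Saudi) of the 15 patients,
# in the order given in the double-tabulation example.
patients = [
    ("married", "S"), ("divorced", "N"), ("single", "S"), ("widow", "S"),
    ("single", "N"), ("married", "N"), ("divorced", "S"), ("single", "N"),
    ("widow", "N"), ("married", "S"), ("married", "S"), ("divorced", "N"),
    ("single", "N"), ("divorced", "S"), ("single", "S"),
]

# Single tabulation: counts for one variable (marital status).
single_table = Counter(status for status, _nationality in patients)

# Double tabulation: counts for each (marital status, nationality) pair.
double_table = Counter(patients)

for status in ("married", "divorced", "single", "widow"):
    print(status, single_table[status],
          double_table[(status, "S")], double_table[(status, "N")])
```

A manifold tabulation would follow the same pattern with longer tuples, one element per variable.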
  8. Displaying data graphically

     Frequency distribution (FD): relates every possible observation to its observed frequency of occurrence.
     Relative frequency: the percentage of the total frequency; used to compare the frequency distributions of two or more groups of individuals.

     Displaying frequency distributions:
     1- One variable:
        a) Bar or column chart
        b) Pie chart
        c) Histogram
        d) Dot plot
        e) Stem-and-leaf plot
        f) Box plot (box-and-whisker plot)
     2- Two variables:
        a) Segmented column chart
        b) Scatter diagram
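A frequency distribution and its relative frequencies can be computed before any chart is drawn. The sketch below assumes a hypothetical list of ten blood-group observations; it is not data from the slides.

```python
from collections import Counter

# Hypothetical blood-group observations for ten individuals.
observations = ["A", "O", "B", "O", "AB", "A", "O", "O", "B", "A"]

# Frequency distribution: each observed value mapped to its count.
frequency = Counter(observations)
total = sum(frequency.values())

# Relative frequency: each count as a percentage of the total.
relative_frequency = {group: 100 * count / total
                      for group, count in frequency.items()}

for group, count in frequency.most_common():
    print(group, count, relative_frequency[group])
```

Because relative frequencies are percentages, they remain comparable between groups of different sizes.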
  9. Shape of frequency distribution

     Distributions with a single peak:
     1- Symmetrical (normal or Gaussian) distribution.
     2- Skewed distribution:
        - to the left (negatively skewed)
        - to the right (positively skewed)

     Number of peaks:
     - Unimodal: single peak.
     - Bimodal: two peaks.
     - Uniform: no peaks.
  10. Statistical (experimental) parameters

      Overview:
      - Measures of central tendency: mean ($\bar{X}$), median, mode.
      - Measures of dispersion (spread): range, standard deviation (SD), mean absolute deviation, variance ($S^2$), coefficient of variation (CV), percentiles.

      a) Measures of central tendency:

      1. Mean ($\bar{X}$): the average (used when the data are normally distributed).

         $$\bar{X} = \frac{\sum x}{n}$$

         where $\sum x$ is the sum of all the values and $n$ is the number of values.

      2. Median (used when the data are skewed): the midpoint of the distribution of the values. How it is calculated depends on the number of observations:
         1. Arrange the numerical observations in ascending or descending order.
         2. For an odd number of observations, the median is the $\frac{n+1}{2}$-th value.
         3. For an even number of observations, the median is the mean of the $\frac{n}{2}$-th and $\left(\frac{n}{2}+1\right)$-th values.

      3. Mode: the most frequently occurring value.

      Symbols for population vs. sample:

      | Symbol | Population | Sample |
      |---|---|---|
      | Variable | $X$ | $x$ |
      | Size | $N$ | $n$ |
      | Mean | $\mu$ | $\bar{X}$ |
      | Standard deviation | $\sigma$ | $S$ |
      | Variance | $\sigma^2$ | $S^2$ |
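The measures of central tendency can be sketched in Python with the standard `statistics` module. The helper `median_by_rule` is a hypothetical name that spells out the usual odd/even rule for the median, and the sample values are made up for illustration.

```python
import statistics


def median_by_rule(values):
    """Median via the odd/even rule applied to the sorted observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th value (1-based) is at 0-based index n // 2.
        return ordered[mid]
    # Even n: mean of the (n / 2)-th and (n / 2 + 1)-th values (1-based).
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical sample values.
sample = [2, 3, 3, 5, 7]

print(statistics.mean(sample))    # arithmetic mean, sum(x) / n
print(median_by_rule(sample))     # agrees with statistics.median(sample)
print(statistics.mode(sample))    # most frequent value
```

For a strongly skewed sample, the mean is pulled toward the long tail while the median is not, which is why the slides pair the median with skewed data.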
  11. b) Measures of dispersion (spread):

      1. Range = largest observed value − smallest observed value

      2. Standard deviation:

         $$S = \sqrt{\frac{\sum (x - \bar{X})^2}{n - 1}}$$

         Note: in a normal distribution, ±1 SD includes 68.3% of the data, ±2 SD includes 95.4%, and ±3 SD includes 99.7%.

      3. Variance:

         $$S^2 = \frac{\sum (x - \bar{X})^2}{n - 1}$$

      4. Coefficient of variation: a measure of spread that is independent of the unit of measurement.

         $$CV = \frac{S}{\bar{X}} \times 100$$

      5. Mean absolute deviation:

         $$MAD = \frac{\sum |x - \bar{X}|}{n}$$

      6. Standard error of the mean: quantifies the precision of the sample mean as an estimate of the population mean.

         $$S_{\bar{X}} = \frac{S}{\sqrt{n}}$$
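The dispersion measures on slide 11 can be sketched as one helper function. It assumes the sample (n − 1) forms of the variance and standard deviation and a mean absolute deviation taken about the mean and divided by n; the data are hypothetical.

```python
import math
import statistics


def dispersion_summary(values):
    """Return the spread measures for a sample of numeric values."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD, divides by n - 1
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),   # sample variance, n - 1
        "sd": sd,
        "cv_percent": sd / mean * 100,             # unit-free spread
        "mad": sum(abs(x - mean) for x in values) / n,
        "sem": sd / math.sqrt(n),                  # standard error of the mean
    }


# Hypothetical sample values.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = dispersion_summary(sample)
for name, value in summary.items():
    print(name, value)
```

The variance is the square of the standard deviation, so only the standard deviation is in the same units as the data.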
  12. 7. Percentiles: used to describe a skewed distribution. The median (50th percentile) is the point that divides the population in half. Computing the percentile points of a population is a good way to see how close it is to a normal distribution. When a population follows a normal distribution, its location and variability can be described completely with two parameters, the mean and the standard deviation. When the population does not follow a normal distribution, it can be described with the median and other percentiles. The standard error quantifies the precision of these estimates.

      The standard normal distribution has a mean of zero and a variance of one. If the random variable $x$ has a normal distribution with mean $\mu$ and variance $\sigma^2$, then the standardized normal deviate (SND),

      $$z = \frac{x - \mu}{\sigma},$$

      is a random variable that has a standard normal distribution.

      Percentiles of a normal distribution (approximate):

      | Percentile | Value |
      |---|---|
      | 2.5th | mean − 2 standard deviations |
      | 16th | mean − 1 standard deviation |
      | 25th | mean − 0.67 standard deviations |
      | 50th (median) | mean |
      | 75th | mean + 0.67 standard deviations |
      | 84th | mean + 1 standard deviation |
      | 97.5th | mean + 2 standard deviations |
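The percentile table for the normal distribution can be checked against the standard normal distribution using `statistics.NormalDist` from the Python standard library. This is a sketch; the `standardize` helper and the numbers passed to it are illustrative assumptions.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# z value (number of SDs from the mean) at each percentile in the table.
percentiles = [2.5, 16, 25, 50, 75, 84, 97.5]
z_scores = {p: standard_normal.inv_cdf(p / 100) for p in percentiles}


def standardize(x, mu, sigma):
    """Standardized normal deviate: distance from the mean in SD units."""
    return (x - mu) / sigma


for p, z in z_scores.items():
    print(p, round(z, 2))

# Hypothetical example: a value of 130 in a population with mean 100 and SD 15.
print(standardize(130, 100, 15))
```

The exact values are close to, but not identical with, the rounded multipliers in the table (for example about 1.96 rather than 2 at the 97.5th percentile).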
  13. Problems:

      1. Viral load of HIV-1 is a known risk factor for heterosexual transmission of HIV; people with higher viral loads of HIV-1 are significantly more likely to transmit the virus to their uninfected partners. Thomas Quinn and associates studied this question by measuring the amount of HIV-1 RNA detected in blood serum. The following data represent HIV-1 RNA levels in the group whose partners seroconverted, which means that an initially uninfected partner became HIV positive during the course of the study: 79725, 12862, 18022, 76712, 256440, 14013, 46083, 6808, 85781, 1251, 6081, 50397, 11020, 13633, 1064, 496433, 25308, 6616, 11210, 13900 RNA copies/mL. Find the mean, median, standard deviation, and 25th and 75th percentiles of these concentrations. Do these data seem to be drawn from a normally distributed population? Why or why not?
  14. 2. When data are not normally distributed, researchers can sometimes transform their data to obtain values that more closely approximate a normal distribution. One approach is to take the logarithm of the observations. The following numbers represent the same data described in Problem 1 after log (base 10) transformation: 4.90, 4.11, 4.26, 4.88, 5.41, 4.15, 4.66, 3.83, 4.93, 3.10, 3.78, 4.70, 4.04, 4.13, 3.03, 5.70, 4.40, 3.82, 4.05, 4.14. Find the mean, median, standard deviation, and 25th and 75th percentiles of these values. Do these data seem to be drawn from a normally distributed population? Why or why not?
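One way to approach the two HIV-1 RNA problems is to compute the same summary before and after the log transformation. The sketch below uses Python's `statistics` module; note that `statistics.quantiles` uses its default "exclusive" method here, and other percentile conventions give slightly different quartiles.

```python
import math
import statistics

# HIV-1 RNA levels (copies/mL) from the problem statement.
hiv_rna = [79725, 12862, 18022, 76712, 256440, 14013, 46083, 6808, 85781, 1251,
           6081, 50397, 11020, 13633, 1064, 496433, 25308, 6616, 11210, 13900]


def summarize(values):
    """Mean, median, sample SD, and quartiles of a list of numbers."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "p25": q1,
        "p75": q3,
    }


raw_summary = summarize(hiv_rna)
log_summary = summarize([math.log10(x) for x in hiv_rna])

print(raw_summary)
print(log_summary)
```

A mean far above the median, as in the raw data, suggests a right-skewed rather than normal distribution; after the transformation the two are much closer.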
  15. 3. Polychlorinated biphenyls (PCBs) are a class of environmental chemicals associated with a variety of adverse health effects, including intellectual impairment in children exposed in utero while their mothers were pregnant. PCBs are also one of the most abundant contaminants found in human fat. Tu Binh Minh and colleagues analyzed PCB concentrations in the fat of a group of Japanese adults (“Occurrence of Tris(4-chlorophenyl)methane, Tris(4-chlorophenyl)methanol, and Some Other Persistent Organochlorines in Japanese Human Adipose Tissue”). They detected 1800, 1800, 2600, 1300, 520, 3200, 1700, 2500, 560, 930, 2300, 2300, 1700, 720 ng/g lipid weight of PCBs in the people they studied. Find the mean, median, standard deviation, and 25th and 75th percentiles of these concentrations. Do these data seem to be drawn from a normally distributed population? Why or why not?
  16. 4. Sketch the distribution of all possible values of the number on the upright face of a die. What is the mean of this population of possible values?

      5. Roll a pair of dice and note the numbers on each of the upright faces. These two numbers can be considered a sample of size 2 drawn from the population described in Problem 4. This sample can be averaged. What does this average estimate? Repeat this procedure 20 times and plot the averages observed after each roll. What is this distribution? Compute its mean and standard deviation. What do they represent?
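The dice experiment can be simulated rather than rolled by hand. The following is a minimal sketch that assumes fair dice and a fixed random seed (an arbitrary choice, used only so the run is repeatable); the function name is illustrative.

```python
import random
import statistics

FACES = [1, 2, 3, 4, 5, 6]

# Population of possible values for one fair die.
population_mean = statistics.mean(FACES)     # 3.5
population_sd = statistics.pstdev(FACES)     # population SD, divides by N


def roll_pair_averages(n_rolls, seed=0):
    """Average of two dice, repeated n_rolls times (samples of size 2)."""
    rng = random.Random(seed)
    return [(rng.choice(FACES) + rng.choice(FACES)) / 2 for _ in range(n_rolls)]


averages = roll_pair_averages(20)
mean_of_averages = statistics.mean(averages)
sd_of_averages = statistics.stdev(averages)

print(population_mean, population_sd)
print(mean_of_averages, sd_of_averages)
```

Each average is an estimate of the population mean, and the standard deviation of the 20 averages estimates the standard error of the mean for samples of size 2, that is, the population SD divided by √2.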
  17. 6. Robert Fletcher and Suzanne Fletcher (“Clinical Research in General Medical Journals: A 30-Year Perspective,” N. Engl. J. Med., 301:180–183, 1979, used by permission) studied the characteristics of 612 randomly selected articles published in the Journal of the American Medical Association, New England Journal of Medicine, and Lancet since 1946. One of the attributes they examined was the number of authors; they found:

      | Year | No. of articles examined | Mean no. of authors | SD |
      |---|---|---|---|
      | 1946 | 151 | 2.0 | 1.4 |
      | 1956 | 149 | 2.3 | 1.6 |
      | 1966 | 157 | 2.8 | 1.2 |
      | 1976 | 155 | 4.9 | 7.3 |

      Sketch the populations of numbers of authors for each of these years. How closely do you expect the normal distribution to approximate the actual population of all authors in each of these years? Why? Estimate the certainty with which these samples permit you to estimate the true mean number of authors for all articles published in comparable journals each year.
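The certainty of each yearly mean in the Fletcher and Fletcher table can be quantified with the standard error of the mean, SD / √n. The sketch below applies that formula to the four rows of the table; the variable names are illustrative.

```python
import math

# (year, number of articles examined, mean number of authors, SD) from the table.
authors_by_year = [
    (1946, 151, 2.0, 1.4),
    (1956, 149, 2.3, 1.6),
    (1966, 157, 2.8, 1.2),
    (1976, 155, 4.9, 7.3),
]


def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)


sem_by_year = {year: standard_error(sd, n)
               for year, n, _mean, sd in authors_by_year}

for year, sem in sem_by_year.items():
    print(year, round(sem, 3))
```

The 1976 row also has an SD larger than its mean, which, since the number of authors cannot be negative, points to a strongly right-skewed population.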
  18. Vital statistics

      Fertility measures:
      1. Crude birth rate = (total number of live births during a year / total population) × 1000
      2. General fertility rate = (number of live births during a year / total number of women of childbearing age) × 1000
      3. Age-specific fertility rate = (number of births to women of a certain age in a year / total number of women of the specified age) × 1000
      4. Rate of natural increase (RNI) = crude birth rate − crude death rate of a population

      Morbidity measures:
      Morbidity is the state of being diseased, or the number of sick persons or cases of disease in relation to a specific population.
      1. Incidence rate = (total number of new cases of a specific disease during a year / total population) × k
      2. Prevalence rate = (total number of cases, new or old, existing at a point in time / total population at that point in time) × k
      3. Case fatality ratio = (total number of deaths due to a disease / total number of cases of the disease) × k

      k is a multiplier (such as 100, 1000, or 100,000) chosen according to the magnitude of the numerator.
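The fertility and morbidity measures share one pattern: a count divided by a population at risk, times a multiplier. The sketch below captures that pattern in a hypothetical helper `rate_per`; all counts are invented for illustration.

```python
def rate_per(numerator, denominator, k=1000):
    """Events per k persons in the denominator population."""
    return numerator * k / denominator


# Hypothetical figures for one year in one population.
live_births = 18_000
deaths = 6_000
total_population = 1_000_000
women_of_childbearing_age = 250_000
new_cases = 1_200

crude_birth_rate = rate_per(live_births, total_population)
crude_death_rate = rate_per(deaths, total_population)
general_fertility_rate = rate_per(live_births, women_of_childbearing_age)
rate_of_natural_increase = crude_birth_rate - crude_death_rate
incidence_rate = rate_per(new_cases, total_population, k=100_000)

print(crude_birth_rate, general_fertility_rate, rate_of_natural_increase)
print(incidence_rate)
```

Only the denominator and the multiplier k change between measures, which is why the general fertility rate is much larger than the crude birth rate for the same number of births.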
  19. Mortality measures

      Mortality (death rate): in the hospital setting, the proportion of inpatient hospitalizations that end in death, usually expressed as a percentage.

      1. Annual crude death rate = (total number of deaths during a year / total population) × k
      2. Age-specific death rate = (total number of deaths in a specific age group during a year / total population in that age group) × k
      3. Age-adjusted death rate = (total number of expected deaths / total standard population) × 1000
      4. Maternal mortality rate = (number of maternal deaths in a given geographic area in a given year / number of live births that occurred among the population of the given geographic area during the same year) × k
      5. Infant mortality rate = (deaths under 1 year of age during a year / total number of live births during the year) × k
      6. Neonatal mortality rate = (deaths under 28 days of age during a year / total number of live births during the year) × k
      7. Fetal mortality rate = (total number of fetal deaths during a year / total deliveries during the year) × k
      8. Total fertility rate (a fertility measure) = Σ (age-specific fertility rate × width of the age interval into which ages were grouped)
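The age-adjusted death rate relies on "expected deaths", which the slide does not spell out. A common approach, assumed here, is direct standardization: apply each age-specific death rate to a standard population and sum the results. The age groups and every count below are hypothetical.

```python
# Hypothetical data: (age group, deaths observed, study population, standard population).
age_groups = [
    ("0-39", 20, 40_000, 60_000),
    ("40-64", 90, 30_000, 30_000),
    ("65+", 300, 10_000, 10_000),
]


def crude_death_rate(groups, k=1000):
    """All deaths over the whole study population, per k."""
    total_deaths = sum(deaths for _label, deaths, _pop, _std in groups)
    total_population = sum(pop for _label, _deaths, pop, _std in groups)
    return total_deaths * k / total_population


def age_adjusted_death_rate(groups, k=1000):
    """Direct standardization: expected deaths in the standard population, per k."""
    expected_deaths = sum(deaths * std / pop
                          for _label, deaths, pop, std in groups)
    total_standard = sum(std for _label, _deaths, _pop, std in groups)
    return expected_deaths * k / total_standard


crude = crude_death_rate(age_groups)
adjusted = age_adjusted_death_rate(age_groups)
print(crude, adjusted)
```

In this made-up example the adjusted rate is lower than the crude rate because the standard population is younger than the study population.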
  20. Health and hospital statistics

      A. Terms
      1. Hospital beds:
         1. Nursery beds
         2. Recovery beds
         3. Day ward
         4. Labour beds
         5. Observation beds
      2. Bed days
      3. Inpatient
      4. Inpatient admission
      5. Inpatient discharge
      6. Inpatient census
      7. Inpatient day
      8. Bed count: the number of available facility inpatient beds, both occupied and vacant, on any given day.
      9. Bed count day: a unit of measure denoting the presence of one inpatient bed (either occupied or vacant), set up and staffed for use, in one 24-hour period.
      10. Total bed count days: the sum of inpatient bed count days for each of the days in the period under consideration.
      11. Bed count ratio: the proportion of beds occupied, defined as the ratio of inpatient service days to bed count days in the period under consideration.
      12. Length of stay (LOS): the number of days a patient stays in the hospital.
      13. Total length of stay (for all inpatients): the sum of the days of stay of any group of inpatients discharged during a specified period of time.
      14. Average length of stay (ALOS): the average number of days that inpatients discharged during the period under consideration stayed in the hospital.
  21. B. Statistics
      1. Average daily inpatient census: the average number of inpatients present each day for a given period of time.
         = total inpatient service days for a period (excluding newborns) / total number of days in the period
      2. Average daily newborn census = total newborn inpatient service days for a period / total number of days in the period
      3. Bed occupancy ratio = (total inpatient service days in a period / total bed count days in the period) × 100, where total bed count days = bed count × number of days in the period
      4. Bed turnover rate = number of discharges (including deaths) for a period / average bed count during the period
      5. Bed turnover interval = ((bed count × 365) − inpatient days) / number of discharges
      6. Average length of stay (ALOS) = total length of stay (discharge days) / total discharges (including deaths)
      7. Postoperative infection rate = (number of infections in clean surgical cases for a period / number of surgical operations for the period) × 100
      8. Postoperative death rate = (number of deaths within 10 days after surgery for a period / number of surgical operations for the period) × 100
      9. Cancer mortality rate = (number of cancer deaths during a period / total number of cancer patients) × 100,000
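The bed-utilization formulas can be sketched as small Python functions. The function names and the example figures (a hypothetical 100-bed hospital over a 365-day year) are assumptions made for illustration.

```python
def bed_occupancy_ratio(inpatient_days, bed_count, days_in_period):
    """Percentage of available bed count days that were occupied."""
    return inpatient_days * 100 / (bed_count * days_in_period)


def bed_turnover_rate(discharges, average_bed_count):
    """Discharges (including deaths) per bed during the period."""
    return discharges / average_bed_count


def bed_turnover_interval(bed_count, inpatient_days, discharges, days_in_period=365):
    """Average number of days a bed stays empty between two patients."""
    return (bed_count * days_in_period - inpatient_days) / discharges


def average_length_of_stay(total_length_of_stay, discharges):
    """Total discharge days divided by total discharges (including deaths)."""
    return total_length_of_stay / discharges


# Hypothetical hospital: 100 beds, one year, 29,200 inpatient days, 3,650 discharges.
occupancy = bed_occupancy_ratio(29_200, 100, 365)
turnover = bed_turnover_rate(3_650, 100)
interval = bed_turnover_interval(100, 29_200, 3_650)
alos = average_length_of_stay(29_200, 3_650)

print(occupancy, turnover, interval, alos)
```

In this example the average stay (8 days) plus the turnover interval (2 days) equals 365 divided by the turnover rate (10 days per patient cycle), which is a useful consistency check when total length of stay equals inpatient days.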