BIOSTATISTICS – A TOOL FOR RESEARCH AND DATA ANALYSIS PRESENTED BY SABA BUTT
SIGNIFICANCE OF STATISTICS FOR ANALYSIS AND RESEARCH
STATISTICS IS NECESSARY FOR ALL FIELDS OF LIFE REQUIRING RESEARCH AND DATA ANALYSIS
In all fields of life we have to analyze facts and interpret them to draw conclusions. This analysis needs statistics to compare qualities and quantities and so reach a conclusion, which in turn supports decision making in business, government, and industry, and the development of theories in science.
BIOSTATISTICS: THE STATISTICS IN LIFE SCIENCES
BIOSTATISTICS IS A DISCIPLINE THAT IS CONCERNED WITH:
designing experiments and other data collection,
summarizing information to aid understanding,
drawing conclusions from data, and
estimating the present or predicting the future.
In making predictions, statistics uses the companion subject of probability, which models chance mathematically and enables calculations of chance in complicated cases.
SOME IMPORTANT  DEFINITIONS
POPULATION AND SAMPLE
POPULATION: A population consists of an entire set of objects, observations, or scores that have something in common. For example, a population might be defined as all males between the ages of 15 and 18.
SAMPLE: A sample is a subset of a population. Since it is usually impractical to test every member of a population, drawing a sample from the population is typically the best approach available.
PARAMETER AND STATISTIC
PARAMETER: A parameter is a numerical quantity measuring some aspect of a population of scores. For example, the population mean is a parameter that measures central tendency.
STATISTIC: A statistic is a numerical quantity calculated from a sample, such as the sample mean.
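The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population below is simulated purely for illustration (it is not data from this presentation), and the names `population`, `sample`, `population_mean`, and `sample_mean` are assumptions of this sketch.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated scores (not real data).
population = [random.gauss(50, 10) for _ in range(10_000)]

# Parameter: a quantity computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: the same quantity computed from a sample of that population.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values are usually close but not identical; the gap between them is what the sample, being only a subset, cannot capture exactly.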
MEASURES OF CENTRAL TENDENCY
Mean (Arithmetic Mean): the average value of a sample or population
Median: the middle value of a sample or population
Mode: the value repeated most often
The Arithmetic Mean, or Mean, is what is commonly called the average. When the word "mean" is used without a modifier, it can be assumed to refer to the arithmetic mean. The mean is the sum of all the scores divided by the number of scores. The formula for the population mean is μ = ΣX/N, where μ is the population mean, ΣX is the sum of all the scores, and N is the number of scores. If the scores are from a sample, the symbol X̄ denotes the mean and n denotes the sample size, and the formula is written X̄ = ΣX/n.
Median: The median is the middle of a distribution: half the scores are above the median and half are below it. The median is less sensitive to extreme scores than the mean, which makes it a better measure than the mean for highly skewed distributions. Example: for the values 5, 3, 4, 2.5, 6, the sorted order is 2.5, 3, 4, 5, 6, so the median is 4.
Mode: The mode is the most frequently occurring score in a distribution and is used as a measure of central tendency. The advantage of the mode as a measure of central tendency is that its meaning is obvious. Example: for the values 5, 3, 4, 5, 6, the mode is 5.
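The three measures of central tendency can be checked with Python's standard `statistics` module. This is a minimal sketch using the two small value sets shown on the slide; the variable names are assumptions made for illustration.

```python
import statistics

# Value sets taken from the slide examples.
median_example = [5, 3, 4, 2.5, 6]
mode_example = [5, 3, 4, 5, 6]

# Mean: sum of the scores divided by the number of scores.
mean_value = statistics.mean(median_example)

# Median: middle value once the scores are sorted (2.5, 3, 4, 5, 6).
median_value = statistics.median(median_example)

# Mode: the most frequently occurring score.
mode_value = statistics.mode(mode_example)

print(mean_value, median_value, mode_value)
```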
MEASURES OF DISPERSION
After measuring the central value (the mean), the next step is to find out how well this central value represents all the values, that is, to measure the scattering or dispersion of the data. Several measures quantify dispersion. The most important and widely used in research are:
Variance
Standard Deviation
Standard Error of the Mean
HYPOTHESIS TESTING
Student’s t test
F test
ANOVA
Correlation
Regression
EXAMPLE OF DATA ANALYSIS
Comparison of the weight-to-height ratio, expressed as the Body Mass Index (BMI), of a population. BMI is calculated as weight in kilograms divided by the square of height in metres: BMI = weight (kg) / height (m)². General surveys in the USA and Europe have shown that the young population is overweight, which increases the chances of disease. We surveyed the young female population of Punjab University for BMI. We measured the BMI of 400 randomly selected students.
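The BMI formula can be written as a small Python function. This is only a sketch: the function name `bmi` and the example weight and height are hypothetical and are not values from the survey.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return Body Mass Index: weight in kg divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical subject (not survey data): 60 kg and 1.60 m tall.
example_bmi = bmi(60, 1.60)
print(f"BMI = {example_bmi:.2f}")
```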
Data table columns: Subject No. | BMI | Subject No. | BMI
ARITHMETIC MEAN
We have two tables of data: one giving the BMI of girls, the other the BMI of boys. These are long data tables, and we now have to analyze them to draw conclusions. What do we need? We need a measure of central tendency to indicate the average BMI, so that we can compare it with other populations, between boys and girls, and with the normal range. The most common and useful measure for this purpose is the arithmetic mean, calculated by taking the sum of all values and dividing it by the number of observations.
SAMPLING ERROR
Next, we have an average value, but is this average really representative of all the values? It is possible that some values are very large and some very small. If so, the mean is not representative of the whole data set. This is related to sampling error: some students may have a strong genetic tendency to be overweight, so their values differ from those typical of the population, and a sample that happens to include them gives a mean that differs from the population mean. Such values can make our result misleading, i.e., our mean does not represent all the data.
EXAMPLE
We have four values: 2, 3, 4, 10.
Mean = sum of values / number of observations = (2 + 3 + 4 + 10) / 4 = 4.75
This is larger than three of the four values in the data, because of the one large value, 10.
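The effect of the single large value can be seen by comparing the mean with the median, and with the mean of the remaining three values. The following is a minimal sketch on the four values from the example; the variable names are assumptions.

```python
import statistics

values = [2, 3, 4, 10]

# Mean of all four values: pulled upward by the large value 10.
mean_value = sum(values) / len(values)

# Median of the same values: average of the two middle values, 3 and 4.
median_value = statistics.median(values)

# Mean of the three small values only, for comparison.
mean_without_large_value = statistics.mean([2, 3, 4])

print(mean_value, median_value, mean_without_large_value)
```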
STANDARD DEVIATION
Now we need a statistical measure that tells us how widely the individual values are spread around the average, so that we can judge how well the mean represents the data. This is the standard deviation: a measure of how much the individual values vary from the mean.
Standard Deviation of that Data
SD = s = √( Σ(x − x̄)² / (n − 1) )
Descriptive Statistics from MINITAB
Variable   N   Mean   Median   StDev   SE Mean
C1         4   4.75   3.50     3.59    1.80
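The MINITAB figures for the four values 2, 3, 4, 10 can be reproduced by hand in Python. This sketch assumes the standard error of the mean is computed as SD / √n, which agrees with the reported output; the variable names are illustrative.

```python
import math
import statistics

values = [2, 3, 4, 10]
n = len(values)
mean_value = sum(values) / n

# Sum of squared deviations from the mean.
sum_of_squares = sum((x - mean_value) ** 2 for x in values)

# Sample standard deviation: divide by n - 1, then take the square root.
sd = math.sqrt(sum_of_squares / (n - 1))

# Standard error of the mean (assumed here to be SD / sqrt(n)).
se_mean = sd / math.sqrt(n)

# Cross-check against the standard library's sample standard deviation.
assert math.isclose(sd, statistics.stdev(values))

print(f"StDev = {sd:.2f}, SE Mean = {se_mean:.2f}")
```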
Student’s T Test
Two Sample T-Test and Confidence Interval
Two sample T for BMI-F vs BMI-M
         N    Mean    StDev   SE Mean
BMI-F    30   31.35   6.26    1.1
BMI-M    21   26.96   4.11    0.90
95% CI for mu BMI-F - mu BMI-M: (1.5, 7.31)
T-Test mu BMI-F = mu BMI-M (vs not =): T = 3.02  P = 0.0040  DF = 48
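The T and DF values in this output can be recomputed from the summary statistics alone. The sketch below assumes the unpooled (Welch) form of the two-sample t test with the degrees of freedom truncated to a whole number; the reported DF of 48 is consistent with that assumption, but the presentation does not state which form was used. The function name `welch_t_from_summary` is hypothetical.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df) for an unpooled two-sample t test from summary statistics."""
    var1 = sd1 ** 2 / n1  # squared standard error of the first mean
    var2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(var1 + var2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var1 + var2) ** 2 / (var1 ** 2 / (n1 - 1) + var2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics from the output: BMI-F first, then BMI-M.
t_value, df_value = welch_t_from_summary(31.35, 6.26, 30, 26.96, 4.11, 21)
print(f"T = {t_value:.2f}, DF = {int(df_value)}")
```

A P value would additionally require the t distribution, which the Python standard library does not provide, so it is left out of this sketch.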

Statistical Analysis Of Data Final
