Data Types and Distributions
Tonya Esterhuizen
Biostatistics Unit, Centre for
Evidence Based Health Care
Introduction to Statistics
“There are three kinds of lies:
lies, damn lies and statistics”
Benjamin Disraeli
“Statistics are like bikinis. What they
reveal is suggestive, but what they
conceal is vital.”
Aaron Levenstein
Statistics - What Does It Mean ?
Data
Numerical observations
Quantitative information
Statistics - What Does It Mean ?
Number of doctors in the different
provinces in South Africa
Birth weight of babies born at a
given hospital in a given year
The number of diabetics in Cape Town
who have had an amputation
The prevalence rate of HIV per 1000
population in Western Cape
The creatine concentration in mg per
litre in a 24 h urine sample
Statistics - What Does It Mean ?
Statistics - What Does It Mean ?
Discipline or science of managing
uncertainties in decision processes
Statistics - What Does It Mean ?
using the scientific methods of
collecting
processing
reducing
presenting
analysing
interpreting data
Statistics - What Does It Mean ?
and of making inferences
and drawing conclusions from
numerical data
Main Uses of Statistical Methods
Collection of data in the best possible
way
Description of the characteristics of a
group or situation
Analysing data and drawing conclusions
Collection of data in the best way
Using a suitable and appropriate method
for selecting subjects for a study, to
minimise the role of uncertainty
Designing valid data collection
instruments such as questionnaires
and schedules
Collection of data in the best way
Organising data collection procedures
for clinical and laboratory research
and epidemiological research to
minimise the chances of errors
eg
standardise definitions and equipment
and train data gatherers
Description of the characteristics of
a group or situation
Data presentation in tables or graphs
Calculating summary measures such as
averages, which can adequately
represent the structure of the data set
Analysing data and
drawing conclusions
This involves analytical techniques and
the use of probability concepts
in drawing conclusions
Uses of statistical concepts and
methods in health science
Handling of variation
Diagnosis of patients’ ailments and
communities’ health problems
Prediction of likely outcomes of an
intervention programme in a community
or of treatment of individual patients
Uses of statistical concepts and
methods in health science
Selection of appropriate intervention for
a patient or a community
Public health, health administration and
planning
Planning, conducting, analysing,
interpreting and reporting of medical
research
Handling of variation
Variation in a characteristic occurs when
its value changes from subject to
subject
Or from time to time
from instrument to instrument
within the same subject
or from observer to observer
Handling of variation
Requires appropriate methods to
• summarise a characteristic for a group
of patients or for a community
• decide on the normal or average value
of a characteristic
• compare two groups of patients with
respect to a particular characteristic
Diagnosis of patients’ ailments
Explicit statistical methods are available
for ordering disease categories
according to their probabilities of
being the correct diagnosis
Changing medicine from an art
to a science ?
Prediction of likely outcome of treatment
Prognosis - An outcome is predicted when
the chances of its occurrence are high
and the associated uncertainty is low
Achieved by keeping records of the
characteristics prior to treatment, the
treatment and its outcome and
analysing them
Selection of appropriate intervention
This is based on
• experience gained with similar
patients who received the intervention
• reports of clinical trials or experiments
of the efficacy of different drugs or Rx
• objective assessment of previous
experience
Public Health, health planning
Use of data relating to the health and
illness in the population to make a
community diagnosis
Requires knowledge of:
Public Health, health planning
Requires knowledge of:
• population characteristics - age, sex
• health profile of the population in
terms of disease risk factors
• factors affecting population dynamics
data on births, deaths, migration
Get a feel for the data
Assess the quality of the data
• Types of variables
• Summary statistics
• Distribution
• Graphical representation
Descriptive Statistics
Types of Variables
Quantitative
Continuous : temp, height, weight
Discrete : number of headaches/week
Categorical
Ordinal : severity of pain
Nominal : sex, blood group
Binomial : no or yes
Types of Variables
• Influences the type of analysis that is
possible with that data
• Therefore it's important to be able to
define your variable types so that the
most appropriate statistical tests are
chosen.
Types of Variables
Types of Data Distributions
• Two of the most common in medical
statistics:
– Normal (Z) distribution (continuous data)
– Binomial distribution (binary categorical
data)
Normal Distribution: Frequency Distribution, Tabulated Limits
Birth weight (kg)   No. of births
1.76 - 2.0          4
2.01 - 2.25         3
2.26 - 2.5          12
2.51 - 2.75         34
2.76 - 3.0          115
3.01 - 3.25         175
3.26 - 3.5          281
3.51 - 3.75         261
3.76 - 4.0          212
4.01 - 4.25         94
4.26 - 4.5          47
4.51 - 4.75         14
4.76 - 5.0          6
5.01 - 5.25         2
Total births        1260
Histogram
[Figure: histogram of the birth weights. x axis: Birth weight (kg), 2 to 5.25 in steps of 0.25; y axis: Frequency, 0 to 300]
Histogram and Frequency Polygon
[Figure: frequency polygon drawn over the histogram; same axes]
Frequency Polygon
[Figure: frequency polygon alone; same axes]
Frequency Polygon
[Figure: two panels on the same axes]
Normal and Skewed Distributions
Symmetrical
mean, mode, median
unimodal
Normal and Skewed Distributions
Positively skewed (long tail to the right)
Negatively skewed (long tail to the left)
Normal and Skewed Distributions
Bimodal
Normal and Skewed Distributions
Mode
Median
Mean
More on these summary measures in next session
Binomial Distribution: Percentages
and proportions
• In a survey of attitudes to statistics 6 out of
100 people say they enjoy the subject
• The percentage enjoying statistics is (6/100) x
100 = 6%
• The proportion enjoying statistics is 6/100
= 0.06
• In this session we will sometimes use
percentages and sometimes proportions
Binomial distribution
• Sample numbers with a given
characteristic follow the binomial
distribution
• The shape of this distribution varies with
the population proportion p, and the
size of the sample
• With small samples, the distribution is
symmetrical only if p is 0.5
[Figure: binomial distributions in six panels: p = 0.2, 0.5 and 0.9, each with N = 5 and N = 15]
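The six combinations of p and N above can be reproduced numerically. The following is a minimal sketch using only the Python standard library; the function names binomial_pmf and is_symmetric are illustrative and do not come from the slides.

```python
from math import comb


def binomial_pmf(n: int, p: float) -> list[float]:
    """Return P(X = k) for k = 0..n, where X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def is_symmetric(pmf: list[float], tol: float = 1e-12) -> bool:
    """True if the probabilities read the same forwards and backwards."""
    return all(abs(a - b) <= tol for a, b in zip(pmf, reversed(pmf)))


# Same p and N values as the six panels on the slide.
for n in (5, 15):
    for p in (0.2, 0.5, 0.9):
        pmf = binomial_pmf(n, p)
        bars = " ".join(f"{prob:.2f}" for prob in pmf)
        print(f"p={p}, N={n}, symmetric={is_symmetric(pmf)}: {bars}")
```

Printing the probabilities side by side shows the point made on the slide: only the p = 0.5 rows are mirror images around the centre.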
Normal approximation to the Binomial
• As the sample size (n) becomes larger
the shape of the distribution becomes
roughly Normal, whatever the value of p
• A rule of thumb is that you can use the
Normal approximation if both np and
n(1-p) are greater than 5
• e.g. If n=20 and p = 0.3, np=6 and
n(1-p) =14. Since both exceed 5 we can
use the Normal approximation
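The rule of thumb can be checked in a few lines, and the quality of the approximation can be seen by comparing an exact binomial probability with its Normal counterpart. This is a sketch only: the function names are illustrative, the Normal curve uses the standard binomial mean np and standard deviation sqrt(np(1-p)), and the continuity correction of 0.5 is an assumption that the slides do not mention.

```python
from math import comb, erf, sqrt


def normal_approx_ok(n: int, p: float, threshold: float = 5.0) -> bool:
    """Rule of thumb from the slide: both np and n(1-p) must exceed 5."""
    return n * p > threshold and n * (1 - p) > threshold


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(Y <= x) for Y ~ Normal(mean, sd), via the error function."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))


n, p = 20, 0.3  # the worked example on the slide
mean = n * p
sd = sqrt(n * p * (1 - p))

exact = binomial_cdf(6, n, p)
# Continuity correction (+0.5) is an added assumption, not from the slides.
approx = normal_cdf(6 + 0.5, mean, sd)

print(f"rule of thumb satisfied: {normal_approx_ok(n, p)}")
print(f"exact P(X <= 6) = {exact:.4f}, Normal approximation = {approx:.4f}")
```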
Normal approximation to the Binomial
The Normal approximation can
be used for both confidence
intervals and for hypothesis tests
(covered in session 3)
Thank you!

Editor's Notes

  • #2 Statistics can be viewed from several angles. On the one hand they are looked upon as being absolute information about a given data set and on the other they are used to infer information about a general population from a sample of a subset of that general population.
  • #3 Statistics is dependent on data or information. Data are the result of measurement, and when individual bits of data are collected they are usually disorganised. Once we have gathered all the desired data we can organise them. Statistics is the mathematical technique by which data are organised, treated and presented for interpretation and evaluation. Evaluation is the philosophical process of determining the worth of the data. Measurement is the process of comparing a value to a standard. For example, we compare our own weight (mass x gravity) to the pound or kilogram every time we stand on a scale. Precision of measurement is essential. If measurement is not precise, the results cannot be trusted. The data from measurement must also be reproducible: a second measurement under the same conditions should produce the same result as the first measurement. Data that are not precise and not consistent are of no value. To be acceptable, data must be valid, reliable and objective. Validity refers to the soundness or appropriateness of the test in measuring what it is designed to measure. Validity may be determined by a logical analysis of the measurement or by comparison to another test known to be valid. Reliability is a measure of the consistency of the data, usually determined by the test-retest method, where the first measure is compared to the second or third measure on the same subjects under the same conditions. Objectivity means that the data are collected without bias by the investigator.
  • #4 There are many different data that we can analyze, such as the frequency of doctors in the various provinces in South Africa, or the birth weight of babies born in a hospital in a year.
  • #5 And there are more examples.
  • #6 Statistics then is the discipline or science of managing uncertainties in decision processes.
  • #7 Statistics involves the following steps. First the data must be collected. Large amounts of data must then be processed. This results in reducing the data to more manageable quantities. These data can then be presented in a visual form. The raw data or summarised data are then analysed and interpreted.
  • #8 From these interpretations we can draw inferences and conclusions.
  • #9 We can summarise these steps into three: collection of data in the best way; description of the characteristics of a group or situation; and analysing data and drawing conclusions.
  • #10 In order to collect data in the best way we need to use suitable and appropriate methods for selecting subjects for the study, to minimise the role of uncertainty (for example, randomisation to reduce bias). If we are using questionnaires, we need to know that these are valid instruments.
  • #11 The process of data collection needs to be optimised or standardised to avoid the possibility of errors through different techniques or different interpretations of responses. So definitions and techniques need to be standardised, equipment needs to be calibrated and checked against other pieces of equipment, and data gatherers have to be trained to do their task in a standard way.
  • #12 Having gathered the data, we need to describe the characteristics of a group or situation. This we can do by presenting data in tables or graphs, and by calculating summaries of the data using, for example, averages, which can adequately represent the structure of the data set.
  • #13 Analysing data and drawing conclusions involves analytical techniques and the use of probability concepts in drawing conclusions.
  • #14 There are many areas in health science where we use statistical concepts and methods. These include: handling of variation (by how much does something have to differ to be abnormal? An example is the normal range of white blood cell count); diagnosing patients’ ailments and communities’ health problems; and predicting likely outcomes of an intervention programme or treatment.
  • #15 Selecting an appropriate intervention Public health administration and planning Planning, conducting, analyzing, interpreting and reporting of medical research
  • #16 Let's look at some of these in a bit more detail. Handling of variation: variation in a characteristic occurs when its value changes from subject to subject, from time to time (eg diurnal rhythm), from instrument to instrument within the same subject, or from observer to observer.
  • #17 Handling of variation requires appropriate methods to summarise a characteristic for a group of patients or for a community, to decide on the normal or average value of a characteristic, and to compare two groups of patients with respect to a particular characteristic. Four more definitions that are needed at this point are those of a parameter, a statistic, a population and a sample. A parameter is a characteristic of the entire population. A statistic is a characteristic of a sample that is used to estimate the value of the population parameter. A population is any group of persons, places or things that have at least one common characteristic. These characteristics are specified by the definition of the population. Usually the population of interest is quite large, so large that it would be either practically impossible or financially impractical to measure all of the members. If it is impossible or impractical to measure all members of a population, then we measure a portion, or fraction, of the population, which is called a sample. We assume that the subjects in the sample represent, or have the same characteristics as, the population. Thus data collected on the sample can be generalized to estimate the characteristics of the population.
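The distinction between a parameter and a statistic can be illustrated with a small simulation. This is a sketch under stated assumptions: the population below is simulated, and its size (10 000), mean (3.5 kg) and standard deviation (0.5 kg) are hypothetical values chosen for illustration, not figures from the presentation.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result on every run

# Hypothetical population: 10 000 simulated birth weights in kg.
population = [random.gauss(3.5, 0.5) for _ in range(10_000)]

# Parameter: a characteristic of the entire population.
parameter = statistics.mean(population)

# Sample: a portion of the population, drawn at random without replacement.
sample = random.sample(population, 200)

# Statistic: a characteristic of the sample, used to estimate the parameter.
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.3f} kg")
print(f"sample mean (statistic):     {statistic:.3f} kg")
```

In practice the whole population is rarely available, which is why the sample statistic is used as the estimate.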
  • #18 Statistical methods can be used in the diagnosis of patients’ ailments. Specific statistical methods are available for ordering disease categories according to their probabilities of being the correct diagnosis.
  • #19 We can use statistical methods to predict the likely outcome of treatment, its prognosis. An outcome is predicted when the chances of its occurrence are high and the associated uncertainty is low. This is achieved by keeping records of the characteristics before treatment, during treatment and of the outcome after treatment, and analysing them.
  • #20 Selection of an appropriate treatment can be based on statistical assessment of: experience gained with similar patients who received the intervention; reports of clinical trials or experiments on the efficacy of different drugs or treatments; and objective assessment of previous experience.
  • #21 It is also used in Public Health and Health Planning by using data relating to the health and illness in the population to make a community diagnosis. This requires knowledge of
  • #22 Population characteristics such as age and sex, the health profile of the population in terms of disease risk factors and factors affecting population dynamics such as data on births, deaths, migration.
  • #23 Descriptive stats gives you a feel for the data and also allows you to assess the quality of the data. In this section we will be looking at types of variables, summary statistics, distribution and graphical representation.
  • #24 A variable is a characteristic of a person, place or thing that can assume more than one value. A constant is a variable that can assume only one value. Variables can be categorised into quantitative or categorical variables. Quantitative variables may be continuous or discrete. A continuous variable is one that can theoretically assume any value. Temperature, height and weight are continuous variables. The values of discrete variables are limited to certain numbers, usually whole numbers or integers, eg heart rate is counted in whole numbers and not in fractions of numbers; one cannot count half a heartbeat. Quantitative variables can also be considered as being derived from either an interval scale or a ratio scale. An interval scale has equal units or intervals of measurement, that is, the same distance exists between each division of the scale, but there is no absolute zero point, ie it is possible to have a value of less than zero. Because zero does not represent the absence of a value, it is not appropriate to say that one point on the scale is twice, three times or half as large as another point. The centigrade system is an example of an interval scale: 30ºC is 10ºC less than 40ºC and 20ºC is 10ºC less than 30ºC, but 4ºC is not twice as hot as 2ºC. This is because 0ºC does not indicate the complete absence of heat; there is still heat at -20ºC. A ratio scale is based on order, has equal values between scale points, and uses zero to represent the absence of a value. All units are equidistant from each other, and proportional or ratio comparisons are appropriate, eg all measurements of distance, force and time. Categorical variables can be divided into several groups. Ordinal variables are based on an ordinal scale that gives quantitative order to the variables, eg pain scored on a scale of 1 to 5. Nominal variables are based on a nominal scale, which groups subjects into mutually exclusive categories. There is no quantitative order among the categories. Subjects are simply classified into one of the categories and then counted. Data grouped this way are sometimes called frequency data. Binary variables are those which can only take on one of two values: present or absent, yes/no etc. Nominal and ordinal scales are called nonparametric because they do not meet the assumption of normality. Interval and ratio scales are classified as parametric.
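The interval versus ratio point can be made concrete with the temperature example. The sketch below converts Celsius to Kelvin, which is a ratio scale because 0 K is an absolute zero; the function name celsius_to_kelvin is illustrative.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scale temperature (Celsius) to a ratio scale (Kelvin)."""
    return celsius + 273.15


# Differences are meaningful on an interval scale:
# 40 - 30 and 30 - 20 are both 10 degrees.
difference_high = 40.0 - 30.0
difference_low = 30.0 - 20.0

# Ratios are not: 4 / 2 suggests "twice as hot", but on the Kelvin scale
# the ratio between the same two temperatures is barely above 1.
naive_ratio = 4.0 / 2.0
kelvin_ratio = celsius_to_kelvin(4.0) / celsius_to_kelvin(2.0)

print(f"differences: {difference_high} and {difference_low}")
print(f"naive Celsius ratio: {naive_ratio}")
print(f"ratio on the Kelvin scale: {kelvin_ratio:.4f}")
```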
  • #28 Here we have the birth weights of 1260 children born in a hospital during a year. The birth weights have been broken up into ranges or groups, and the frequency, or number of babies whose weight fell within a given range, is listed on the right hand side. The table represents the grouped frequency distribution of the birth weights for the year. A grouped frequency distribution is an ordered listing of a variable x (in this case birth weight) into groups (of weights) with a listing in the second column, the frequency column. The number of groups is normally kept at about 15 but may vary between 10 and 20. When the number of groups is small, the frequency for each group may become too large. This tends to obscure data because so many cases are crowded together. Likewise, when there are more than 20 groups, the groups are spread out so far that some groups may lack cases and the list becomes long and cumbersome. Interval size can be calculated according to the formula i = Range / n, where i = the interval size and n = the number of groups that you wish to have. In this example the birth weights run from 1.76 kg to 5.25 kg, a range of 3.49 kg. To simplify grouping we normally round the number up to the nearest odd integer. An odd integer always results in groups with a whole number as the midpoint. Here we round the range to 3.5 kg. Fourteen groups were chosen, which then gives an interval size for each group of 3.5/14 = 0.25 kg. When i is a multiple of 5, the lowest group should contain the lowest score in the data, but the lower limit of the lowest group should be a multiple of 5. Real and Apparent Limits of Groups. A problem arises when we use continuous data and measure scores in fractions of whole numbers, as we have done in this example. A birth weight of 4.1 kg falls into the group with the apparent limits of 4.01 to 4.25. But what are the real limits of this group? There is a 0.01 kg gap between the apparent end limit of the one group, 4.25, and the apparent starting limit of the next group, 4.26. The real limits of each group are the assumed upper and lower weights in each group, created to the degree of accuracy required by the data: the real limits define the true upper and lower limits of the group. To establish the real limits, the gap between the apparent limits is equally divided; values in the upper half belong to the group above and values in the lower half belong to the group below. Thus the lower real limit of the 4.01 to 4.25 group is 4.005 and the upper real limit is 4.255.
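The interval-size formula i = Range / n and the real limits can be worked through in code. This is a minimal sketch using the figures from the birth weight table; the function name real_limits is illustrative, and the rounding of the range to 3.5 kg is entered by hand as described in the note.

```python
lowest_weight = 1.76   # kg, lowest apparent limit in the table
highest_weight = 5.25  # kg, highest apparent limit in the table
n_groups = 14          # number of groups chosen in the notes

data_range = highest_weight - lowest_weight  # about 3.49 kg
rounded_range = 3.5                          # rounded by hand, as in the notes
interval_size = rounded_range / n_groups     # i = Range / n


def real_limits(apparent_lower: float, apparent_upper: float,
                gap: float = 0.01) -> tuple[float, float]:
    """Split the gap between neighbouring groups equally to get the real limits."""
    half_gap = gap / 2
    return apparent_lower - half_gap, apparent_upper + half_gap


lower, upper = real_limits(4.01, 4.25)

print(f"range = {data_range:.2f} kg, interval size = {interval_size} kg")
print(f"real limits of the 4.01 - 4.25 group: {lower:.3f} to {upper:.3f}")
```

Dividing the 0.01 kg gap equally gives real limits of 4.005 and 4.255 for the 4.01 - 4.25 group.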
  • #29 Another way in which this grouped frequency distribution can be displayed is as a histogram. In this example we have the birth weights on the x axis and the number of births or frequency on the y axis. Note the difference between a histogram and a bar graph. In the bar graph presented earlier there was a gap between each of the bar columns. In a histogram there is no gap, as the data are continuous.
  • #30 The histogram allows us to develop a frequency polygon. This is a line graph of the frequency for each group. To construct it, the frequency of children within each group is plotted against the midpoint of each group. In this example, the frequency polygon has been constructed over the histogram. Based on simple geometry, you will see that the area of each white triangular part of the histogram that lies outside the frequency polygon has exactly the same area as the adjacent grey portion within the frequency polygon. The result of this is that the area within the frequency polygon contains the full area of the histogram, in other words, all of the data.
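The equal-area claim can be checked numerically with the counts from the birth weight table. This is a sketch under one assumption that the notes do not state: the polygon is closed by adding a zero-frequency point one interval before the first midpoint and one interval after the last, which is the usual way to draw it.

```python
# Frequencies from the birth weight table, lowest group first.
counts = [4, 3, 12, 34, 115, 175, 281, 261, 212, 94, 47, 14, 6, 2]
width = 0.25       # kg, interval size of every group
first_mid = 1.88   # kg, midpoint of the 1.76 - 2.0 group

# Midpoint of each group: the x coordinates of the polygon's vertices.
midpoints = [first_mid + width * i for i in range(len(counts))]

# Histogram area: one rectangle of height f and width 0.25 per group.
histogram_area = width * sum(counts)

# Polygon closed with zero-frequency points at both ends (an assumption).
closed = [0] + counts + [0]

# Area under the polygon: trapezoids between consecutive vertices.
polygon_area = sum(width * (a + b) / 2 for a, b in zip(closed, closed[1:]))

print(f"total births: {sum(counts)}")
print(f"histogram area: {histogram_area}, polygon area: {polygon_area}")
```

Each frequency appears in two neighbouring trapezoids at half weight, so the two areas agree whenever all groups have the same width.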
  • #31 If we now remove the histogram we have the frequency polygon, the area of which contains all of the information about the original dataset.
  • #32 If we smooth the lines we now have a curve which shows the distribution of the data. You can see how this is approaching the shape of a normal or Gaussian distribution curve. The frequency polygon gives us a visual picture that permits us to see trends in the data that we may not easily observe when looking at a table.
  • #33 The most widely known curve in statistics is the normal curve seen here. This uniquely shaped curve was first described by Gauss and is sometimes referred to as a Gaussian curve or a bell-shaped curve. The normal distribution forms the basis of much statistical inference. A normal curve is characterised by symmetrical distribution about the centre of the curve in a special manner. The mean (average), the median (50th percentile) and the mode (the group with the highest frequency) are all located in the middle of the curve. The frequency of scores declines in a predictable manner as the scores deviate further and further from the centre of the curve. Note: all normal curves are symmetrical but not all symmetrical curves are normal. The mode is the group or score with the highest frequency. On a normal curve a single mode is always in the middle. A normal curve is therefore unimodal.
  • #34 Sometimes the data result in a curve that is not normal: that is, the tails of the curve are not symmetrical. When a disproportionate number of the subjects score towards one end of the scale, the curve is skewed. When the hump or mode of the curve is pushed to the left and the tail on the right is longer than the tail on the left, the curve has a positive skew, because the long tail points in a positive direction on the x axis. When the long tail is in a negative direction, the curve is negatively skewed.
  • #35 The mode is the group or score with the highest frequency. On a normal curve a single mode is always in the middle. A normal curve is therefore unimodal. Some distributions of data have two or more modes and hence are called bimodal or multimodal. If one mode is higher than the other, the modes are referred to as the major and minor modes. Bimodal curves are not normal curves.
  • #36 With a skewed curve the mode, median and mean are not at the central point. The mode represents the highest frequency, the median is the point with 50% of measurements on either side of it, and the mean will tend to be shifted in the direction of the extreme measurements. Relationships Among the Mode, Median and Mean. When data are normally distributed, the three measures of central tendency all fall at or near the same value. But when the data are skewed, these measures are no longer identical. Note that on a positively skewed curve the highest scores are further from the mode than the lowest scores. The characteristics of a positively skewed curve shift the median and the mean to the right of the mode. On a positively skewed curve the three measures of central tendency read, from left to right: mode, median, mean. On a negatively skewed curve the order is reversed: mean, median, mode. Use the mode if only a rough estimate of central tendency is needed and the data are normally distributed. Use the median if (a) the data are on an ordinal scale, (b) the middle score of the group is needed, (c) the most typical score is needed or (d) the curve is badly skewed by extreme scores. Use the mean if (a) the curve is near normal and the data are of interval or ratio type, (b) all available information from the data is to be considered or (c) further calculations such as standard deviations or standard scores are to be made.
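The ordering of the three measures on skewed data can be seen with a small example. The scores below are made up for illustration and are not taken from the presentation; the sketch uses Python's standard statistics module.

```python
import statistics

# Hypothetical positively skewed scores: most values are low, with a long right tail.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

pos_mode = statistics.mode(positive_skew)
pos_median = statistics.median(positive_skew)
pos_mean = statistics.mean(positive_skew)

# Mirroring the scores gives a negatively skewed set with a long left tail.
negative_skew = [15 - score for score in positive_skew]

neg_mode = statistics.mode(negative_skew)
neg_median = statistics.median(negative_skew)
neg_mean = statistics.mean(negative_skew)

print(f"positive skew: mode={pos_mode}, median={pos_median}, mean={pos_mean}")
print(f"negative skew: mean={neg_mean}, median={neg_median}, mode={neg_mode}")
```

For the first set the measures read mode, median, mean from left to right; for the mirrored set the order is reversed.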
  • #39 Explain what’s happening here