Health Care Statistics
NAME: AMANI OMARI MASENYA
ID: UB17186BHE24958
BACHELOR OF SCIENCE IN
HEALTH CARE ADMINISTRATION
HEALTH CARE STATISTICS
HEALTH CARE STATISTICS (Essay-Paper)
ATLANTIC INTERNATIONAL UNIVERSITY
HONOLULU, HAWAII
February, 2013
TABLE OF CONTENTS
COVER PAGE
TABLE OF CONTENTS
1 INTRODUCTION
What are Statistics?
2 THE STATISTICAL APPROACH
2.1 The Scientific Technique
2.1.1 Some Beliefs Associated with Statistics and the Quantitative Scientific Model
2.1.2 The Characteristics of Research Methods that Use Statistics
2.2 Basic Parts of a Research Method
2.2.1 Question(s) Forming
2.2.2 Literature Review
2.2.3 Conceptual and Theoretical Frameworks
2.2.4 Hypotheses and Variables
2.2.5 Research Design
2.2.6 Sample and Population
2.2.7 Data compilation/Collection and Data Analysis
2.2.8 Results and Conclusions
3 MEASURING, SAMPLING AND ERROR
3.1 Population
3.2 Sample
3.3 Cases
3.4 Statistical and Real Population
3.5 Measurement Scales
3.5.1 The Nominal Scale
3.5.2 Ordinal Measurement Scale
3.5.3 Ratio or Interval Measurement Scales
3.6 Errors
3.6.1 Measurement Error
3.6.2 Consistency
3.6.3 Design Errors
3.6.4 Sampling Error
4 QUESTIONNAIRES
4.1 Questionnaire Design
4.2 Sample Questionnaire
4.3 Scales from the Sample Questionnaire
5 THE STUDIES
5.1 The Questionnaire Study
5.2 The Questionnaire
5.3 The Clinical Trial
6 DESCRIPTIVE STATISTICS
6.1 Measures of Central Tendency
6.1.1 Mean (Arithmetic)
6.1.2 Median
6.1.3 Mode
6.2 Choosing a Measure of Central Tendency
6.3 Measures of Dispersion
6.3.1 Range
6.3.2 Inter-quartile Range
6.4 The Standard Deviation and Variance
7 DISPLAYING DATA
7.1 Table Types
7.2 Types of Graph
7.2.1 Frequency Charts
7.2.2 Pie Charts
7.2.3 Bar Charts for Summary Statistical Information
7.2.4 Scatter Graph (Scatter Plot)
7.2.5 Line Graphs
8 HYPOTHESIS
8.1 Forming the Hypothesis for the Experiment / Study
8.1.1 Experimental Approaches
8.1.2 Non Experimental Methods or Quasi-experiments
8.1.3 Before-After Study
8.2 Variables
8.2.1 Dependent Variable
8.2.2 Independent Variable
8.2.3 Other Variables
8.3 Errors and Statistics
8.4 The Statistical Hypothesis
8.5 Types of Interaction between Variables
9 DISTRIBUTIONS AND PROBABILITIES
9.1 Frequency Histograms
9.2 Probability and Statistics: The Link between Probabilities and Distributions
9.3 The Normal Distribution Curve
9.4 Making a Prediction
9.5 Deviations from the Normal Distribution
9.6 Random and Clumped Distributions
10 ERRORS AND ANOVA
10.1 Errors and ANOVA (Statistical Errors)
10.2 The t-Tests, Errors and ANOVAs
10.3 The ANOVA (Analysis of Variance)
10.3.1 When to Use a One-Way ANOVA
10.4 Contrasting the Means
10.4.1 Tukey Test
10.5 Independence
10.6 Repeat Measure Design of ANOVA
11 TESTS FOR ASSOCIATION
11.1 Chi-square Test: One-way
11.1.1 Restrictions on the Use of the Chi-square
11.1.2 Independence
11.2 Chi-square Test for Association: Two-way
BIBLIOGRAPHY
FIGURES
Figure 1: Indices of depression recorded for a range of different patient types (Acklin
and Bernat 1987)
Figure 2: Diagram of an experimental study
Figure 3: Diagram of a Quasi-experimental design with two groups
Figure 4: Example of a Histogram
Figure 5: The curve is bell-shaped and symmetrical
Figure 6: The Standard Deviation
Figure 7: Symmetrical and skewed distributions
Figure 8: Different degrees of Kurtosis in frequency distribution
TABLES
Table 1: A range of different variables and their likely sampling units
Table 2: A patient’s responses when asked to consider day surgery (questions asked
pre-operatively)
Table 3: The example of data from questionnaire
Table 4: Time spent by an individual at the gym during July
Table 5 : Choosing a Measure of Central Tendency
Table 6: Age at first sexual experience (with another person)
Table 7: Central tendency and measure of Dispersion
Table 8: Acklin and Bernat (1987) data for the two indices of depression measures in
patients with a range of conditions
Table 9: Frequency of different ethnic groups of a sample of 178 Individuals
interviewed in north-east London
Table 10: Summary examination results for a group of 122 first-year student nurses
Table 11: Types of graphs Summary
Table 12: Duration of labor during Primp women aged between twenty-two at three
different levels of fitness
Table 13: Variation between and within Groups
Table 14: Distance walked (m) by patients with impaired hip mobility, following
various treatment regimes
Table 15: Statistical summary of the data from Table 12.3
Table 16:
Table 17:
Table 18: Comparing Means after an ANOVA test
Table 19: Diastolic blood pressure of four men aged between fifty-six and fifty-eight
at the start of and during a prescribed exercise regime
Table 20:
Table 21: The ethnicity of individuals in a cohort entering nurse education
Table 22: Observed and expected frequencies of four ethnic groups in a classroom of
student nurses
Table 23: A 2 x 3 contingency table showing the incidence of MDR TB in relation to
three East European Countries
1 INTRODUCTION
What are Statistics?
Statistics is a word derived from the Latin status, meaning state, and historically
referred to the facts and figures relating to the demography of countries or states
(Bhattacharyya and Johnson 1977). Today, the statistical approach involves describing
phenomena in terms of numbers and then using those numbers to imply or infer cause
and effect. Statistics are therefore a key research tool for quantitative researchers.
Statistics are used today in a wide diversity of studies and investigations to
summarize and describe data when those data are gathered in numerical form. They are
also used to look for patterns and to determine the likelihood that observations have
occurred by chance, and so they are a very important tool that underpins all
quantitative (number-based) research.
2 THE STATISTICAL APPROACH
2.1 The Scientific Technique
The process of gathering facts systematically is a source of the concept of
evidence-based practice, since much practice knowledge is predicated upon the belief that
the world and its population can be viewed objectively and that predictions can be made
about things that may then be either confirmed or refuted. A view of how knowledge is
generated and tested that is generally shared within a community is known as a paradigm
or world view.
2.1.1 Some Beliefs Associated with Statistics and the Quantitative Scientific Model
Logical positivism is founded on the eighteenth-century philosophy of Hume (1888), who
held that knowledge can be obtained through careful observation of people and things,
their behaviours, customs and environments, and by studying physical matter, for instance
chemicals and other substances, to find out how it behaves (Kerlinger 1986). The logical
positivists were therefore distrustful of things that could not be heard or observed, such
as emotions and feelings (Burns 2000).
The physical world is held to be governed by universal laws which, once established, can
be used to predict outcomes through a process of hypothesizing, experimenting, and
corroborating or disproving, and which can be applied to managing the environment. The
laws of thermodynamics are an example (Scott, 2006).
Experiments use the observation of one thing to determine the effect of something else
under controlled conditions, so that every factor that might influence the research
conclusion is controlled or accounted for. A control group, to which nothing has been
done, is required, and observations are made of the behaviour of the experimental group,
to which something has been done. The observations of both groups are then compared and
the effect inferred. This is often known as the deductive approach.
The observations can then be organized, coded, ranked and analysed, that is, reduced
mathematically to their smallest units, which makes it possible to produce numerical
data that can then be analysed mathematically (Scott, 2006).
2.1.2 The Characteristics of Research Methods that Use Statistics
Research methods have many different characteristics, though not all of them will be
seen in every study. Statistics-based studies generally try to control the influence of
factors (variables) that are not of interest in the study but that might bias the
outcomes.
Accordingly, statistics-based studies tend to involve collecting empirical evidence in
order to refute a hypothesis. Generally, when we use statistics we try to produce
results that are generalizable, which means that the results from the sample are valid
for the whole population of interest. Hence the relationship between the sample and the
population is the focus of many statistical tests.
The Uses of the Statistical Approach within Research:
To describe variables and their relationships.
To help explore and discover the nature of relationships between variables.
To help explore and discover differences between samples and populations.
To help examine the role of probability in giving rise to measurements.
To help explain relationships between data sets.
To predict the causes of relationships between phenomena.
To control for (or account for) variables.
2.2 Basic Parts of a Research Method
2.2.1 Question(s) Forming
In forming questions it is vital to identify the phenomenon or problem of interest, that
is, to decide the main shape of the study, the population to be studied and the questions
to be examined. Through our own reflection, literature searching and outlining, we have
to choose which techniques will be used for data gathering. At this stage a small pilot
study can be carried out so that possible problems are clearly highlighted. If any
problem is found, the questions being addressed will need to be revisited.
2.2.2 Literature Review
There is a huge body of literature on many areas of healthcare, so it is necessary to set
each new piece of work in the context of all previous or concurrent work, which should be
reviewed so that we can learn from it before proceeding. The literature review will
normally begin the moment an idea for a study is conceived and will continue throughout
the study.
2.2.3 Conceptual and Theoretical Frameworks
Different areas of investigation have different frameworks and ideas on which the
development of new knowledge is based. There are therefore different ways of viewing
problems, and it is important to be aware of the framework that applies to the
particular study we are working on. A sociologist and a biologist, for example, might
use different ideas and attach differing levels of importance to different types of
data, yet both academic disciplines may make heavy use of statistics in viewing the world.
2.2.4 Hypotheses and Variables
A variable is a phenomenon, that is, a thing that varies. Most studies contain a
hypothesis, a prediction or series of predictions derived from a theory, which is tested
by measuring the relevant variables and examining their relationships and the populations
from which they come.
2.2.5 Research Design
The design gives guidelines on how to carry out the research. It directs the sampling
method and how the data are collected and analysed. The main aim of the research design
is not only to reduce all possible sources of error but also to ensure that every
hypothesis under test is really tested. In other words, the research design ought to
allow the research aims to be met, and it must also take account of the ethical
implications of the work.
2.2.6 Sample and Population
The population is made up of all the objects or persons from which we might possibly
take measurements, whereas the sample consists of those objects or individuals that,
given our resources and constraints, we actually measure. We normally plan to obtain a
sample that is representative of the population to be studied, subject to a research
design that has been approved by a suitable ethics committee.
2.2.7 Data compilation/Collection and Data Analysis
Data are collected using the most appropriate technique or instrument. Data analysis is
the process of describing and summarizing the data and performing statistical tests, and
numerous computer-based statistics packages are available to assist.
2.2.8 Results and Conclusions
After the data have been analysed, we need to decide what the results suggest and
whether any hypothesis under test has been supported or rejected. The results must be
related to those of previous studies and to the existing body of theory, and we should
also consider whether we have a moral duty to communicate the findings of the study.
3 MEASURING, SAMPLING AND ERROR
3.1 Population
A population is made up of all the persons, objects or phenomena that we might
potentially count or measure as part of the study. For example, if we are studying the
reasons why nurses in Tanzania left the profession early, our population will be made up
of all the nurses in Tanzania who left the profession early. Similarly, if we are
studying why nurses in a certain hospital left the profession early, our population will
be the nurses who left early in that particular hospital. It is also important to bear in
mind that the population we intend to study might not be precisely the same as the one
we end up sampling from, since some of the persons in the population of interest might
decline to be included in the study. There can therefore be a difference between the
target population and the actual population.
3.2 Sample
A sample is made up of a proportion of the objects or persons from the total population.
In most cases, because resources are scarce, it is unlikely that we will be able to
gather information from the whole population, so instead we must take a sample.
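As a rough illustration of taking a sample, the sketch below draws a simple random
sample using Python's standard `random` module. The population of nurse identifiers, the
sample size of 50 and the helper name `draw_sample` are all made up for illustration and
do not come from any study.

```python
import random

def draw_sample(population, sample_size, seed=None):
    """Return a simple random sample drawn without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: identifiers for 500 nurses who left early.
population = [f"nurse-{i:03d}" for i in range(500)]
sample = draw_sample(population, 50, seed=1)
print(len(sample), "cases drawn from a population of", len(population))
```

Because every member of the list has the same chance of being chosen, this is one simple
way of aiming for a sample that represents the population.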
3.3 Cases
Every sample is made up of the objects or individuals under study, which may be referred
to by a variety of names, such as sampling units. For each individual or object we
measure one or more variables, which may be physical (e.g. blood pressure), may
represent feelings or thoughts (e.g. anxiety), or may represent events in the life of the
individual (e.g. number of visits to a health clinic). Every object or individual from
which we take a measurement is known as a case. It is important to note that all
statistical variables must be measurable; otherwise the variable cannot be handled
statistically. The measurement for each case is referred to as a value, and the
collection of values is called the data.
For each case we ought to state the measurement, the variable being measured, the
sampling unit, the actual sample and the population being studied. For instance, if we
are studying hypertension in populations living near ironworks, 80 mm Hg will be the
measurement, diastolic blood pressure will be the variable, the individuals in the
vicinity of the ironworks will be the sampling units, and the sample will be made up of
those individuals from whom measurements were taken. The real population is thus defined
as all individuals living within the vicinity of the ironworks who are willing and able
to be measured.
3.4 Statistical and Real Population
The defined population includes the words willing and able, which makes clear that we
cannot take measurements from those who do not consent to take part in the study. In the
example above there is therefore a slight difference between the population as defined
and the one we originally intended to work on.
There can be a difference between the population from which the measurements were
sampled, often called the statistical population, and the real population or
biological population, that is, all those in the vicinity of the ironworks. The
researcher must therefore be aware that not all persons in a population are available to
become sampling units. In some instances it is also not immediately clear what
constitutes the sampling unit and what variables are being measured.
Table 1: A range of different variables and their likely sampling units
Variable | Sample | Sampling unit
Occurrences of MRSA on a ward | Wards which record occurrences of MRSA that you collected data from | Ward
Length of stay in hospital | Hospitals that record length of stay that you collected data from | Hospital
Pulse rate | Individuals that you record pulse rate from | Individual
Gender | Individuals whose gender you recorded | Individual
Hospital position in league table | Hospitals used to formulate league table | Hospital
Number of live births | Units of area from which records of live births have been collected | Unit of area
Looking at the examples in Table 1, we find that the range of variables we can use is
fairly wide. We also notice that some variables can be expressed as numbers on a scale,
while others, such as gender, are discrete. This difference is important: before
identifying a suitable statistical test we have to identify the type of variable and the
type of measurement scale.
3.5 Measurement Scales
In statistics four types of measurement scale are recognized for variables:
(i) the nominal scale, (ii) the ordinal scale, (iii) the interval scale and (iv) the ratio scale.
3.5.1 The Nominal Scale
The nominal scale is the lowest in the hierarchy of scales. Statisticians refer to a
hierarchy of scales because each scale further up the order has the features of those
below it. On a nominal scale the data fall into named groups that are mutually
exclusive: an item of data cannot be in more than one group. A good example of a nominal
variable would be ethnic group or home town. Some nominal variables have only two
categories and are known as dichotomous variables; others have more.
Example: Gender with Male (0) and Female (1)
3.5.2 Ordinal Measurement Scale
Ordinal-scale data are similar to nominal-scale data, but the names of the groups carry
an idea of position or rank. An example is the grades of nursing staff within a health
service, where the group names, such as staff nurse, charge nurse and matron, convey an
order or position. The positions, however, do not indicate how much higher or lower one
is than another. The scale is used to arrange the measures from lowest to highest, but we
would not say that a charge nurse is twice as high as a staff nurse. Ordinal scales can
be expressed as names, as numbers (e.g. 1st and 2nd in a race) or as letters, as in the
UK system for grading nurses (e.g. A-I).
Remember, a study may use more than one measurement scale. We should practise
identifying the appropriate scale because, for every statistical test, it is the scale
on which the measurement is made that largely determines the type of statistical test to
be applied.
The following are examples of ordinal-scaled variables:
Variable: Frequency
Never (0) Rarely (1) Sometimes (2) Frequently (4) Always (5)
Variable: Satisfaction
Dissatisfied (1) Satisfied (2) Very Satisfied (3)
Variable: Performance Assessment Attainment
Goal not attained (1) Goal Attained (2) Goal Exceeded (3)
3.5.3 Ratio or Interval Measurement Scales
On interval and ratio scales positions are also used, but the distance between the
positions is determined, and the points between the positions can be subdivided. Ratio
scales are like interval scales but have a true zero point. For instance, if we recorded
temperature we would know that the difference between 37°C and 38°C is 1°C, so the scale
is interval; but the centigrade scale does not have a true zero point, so it is not
possible to say that 20°C is twice as hot as 10°C. An example of a ratio scale is
weight, which does have a true zero. Many measures of biological phenomena are made on
either the interval or the ratio scale.
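The temperature example can be checked with a few lines of arithmetic. The sketch below
is only an illustration: the conversion to kelvin (a temperature scale with a true zero)
is not part of the text above and is brought in as an assumption to show why a ratio of
two centigrade readings is misleading. The weights of 80 kg and 40 kg are made-up values.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvin, a scale with a true zero."""
    return celsius + 273.15

# Differences are meaningful on an interval scale.
difference = 38 - 37                      # 1 degree Celsius

# Ratios of Celsius readings are not: the zero point is arbitrary.
naive_ratio = 20 / 10                     # 2.0, yet 20 C is not "twice as hot"
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Weight is on a ratio scale, so ratios are meaningful.
weight_ratio = 80 / 40                    # 80 kg is twice 40 kg

print(difference, naive_ratio, round(kelvin_ratio, 3), weight_ratio)
```

On the kelvin scale the ratio of the two temperatures is only a little above 1, whereas
the ratio of the two weights is a genuine doubling.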
3.6 Errors
When conducting a study, error can arise in many different ways. Statistics can be used
to assess or account for some of this error, but not all of it. We begin by looking at
three kinds of error that we need to think about when conducting our own research or
thinking critically about the studies of other researchers. All studies are associated
with error, and the more error that is present, the less the researcher can rely on the
study.
3.6.1 Measurement Error
Measurement error arises whenever we measure things, since most of the instruments we
use for measuring are not one hundred per cent accurate. Even if we are considering
something as simple as recording the number of patients or clients who turn up at a GP
surgery, or taking a measurement with a ruler, errors of measurement will occur. To some
degree, however, error associated with the use of apparatus and instruments to measure
phenomena is easier to account for than error associated with methods, such as
observation, that are used to record human behaviour.
Measurement errors occur in most studies, even when the variable is simply a count of
something, because the people counting often make errors. Measurement errors may be
random or systematic. A random error occurs by chance, and there is an equal chance that
the measurement will be lower or higher than the true value; such errors are not
important if they are sufficiently small relative to the overall variation in the data.
Systematic errors, on the other hand, are consistently either higher or lower than the
true value, so the factors that might lead to systematic error have to be considered at
the design stage.
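The difference between the two kinds of measurement error can be seen in a small
simulation. This is a minimal sketch under stated assumptions: the true value of 80 mm Hg,
the normally distributed noise, the bias of +2 and the helper name `simulate` are all
hypothetical choices made for illustration.

```python
import random
import statistics

TRUE_VALUE = 80.0  # hypothetical true diastolic pressure, mm Hg

def simulate(n, bias=0.0, noise_sd=1.0, seed=0):
    """Simulate n readings with random noise and an optional constant bias."""
    rng = random.Random(seed)
    return [TRUE_VALUE + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]

random_only = simulate(10_000)             # random error only
with_bias = simulate(10_000, bias=2.0)     # adds a systematic error of +2

mean_random = statistics.mean(random_only)
mean_biased = statistics.mean(with_bias)
print(round(mean_random, 1), round(mean_biased, 1))
```

With many readings the random errors largely cancel, so the first mean stays close to the
true value, while the systematic error shifts the second mean by about the size of the
bias however many readings are taken.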
To deal with measurement error we need to choose a measurement tool that gives the
greatest level of accuracy, while taking into account the resources available to us. For
example, high accuracy may involve spending more money or time. If we are using surveyors
or interviewers we have to make sure that they are trained in how to use the data
collection tools (e.g. the questionnaire).
3.6.2 Consistency
We also need to be aware that some errors arise because measurement tools are
inconsistent; that is, for a given item being measured, an instrument such as a peak
flow meter will not always give the same measurement. Instruments therefore need to be
calibrated before they are used. It is advisable to measure how consistent the
instruments are between measurements and between the persons using them. We can then
take account of this error when discussing our results.
3.6.3 Design Error
If the design of the experiment is flawed, design error is likely to occur. There are of
course varying degrees to which a study can be flawed, and most studies are flawed in
some way. If the design error is large, however, the study will need to be abandoned, so
it is important to get the design right. We should be aware of the errors we are likely
to come across and avoid the most common one, which is that the sample is not drawn
truly randomly from the population, so that the conclusions drawn are not appropriate for
the population as a whole.
3.6.4 Sampling Error
Many statistical techniques try to take account of sampling error, which is the
difference between the sample and the population. If we take a sample, it is unlikely
that the measure obtained from the sample will precisely match that of the population.
When we look to see how representative a sample is of the population, or whether or not
two samples come from the same population, we need to take account of sampling error.
Sampling error increases with the amount of variation within the sample being measured
and decreases as the sample size increases.
The relationship between sample size, sample variation and sampling error is important,
since one question we are bound to ask is: how large should my sample be? The answer is
that it depends on the variation within the sample. If we were working with the variable
height in a population of basketball players, the variation in height would be very
small, and the sample size needed would therefore be small. If, however, we were working
with height across the global population, the variation would be large and a larger
sample would be needed.
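One common way to put numbers on this relationship is the standard error of the mean,
the standard deviation divided by the square root of the sample size. This formula is not
stated in the text above; it is brought in here as a standard result, and the standard
deviations of height (in cm) are made-up values chosen only to show the pattern.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for standard deviation sd and sample size n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)

# Hypothetical standard deviations of height, in cm.
se_low_variation = standard_error(sd=4.0, n=25)     # little variation
se_high_variation = standard_error(sd=10.0, n=25)   # same n, more variation
se_larger_sample = standard_error(sd=10.0, n=100)   # more variation, larger n

print(se_low_variation, se_high_variation, se_larger_sample)
```

For the same sample size the more variable group gives the larger standard error, and
quadrupling the sample size halves it, which matches the argument that a more varied
population calls for a larger sample.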
4 QUESTIONNAIRES
Questionnaires are an approach that is recognized and adopted by numerous health-related
studies. The subject of questionnaires and research design is, however, enormous and
cannot be addressed in detail in a short space. Questionnaires can be used to obtain
information from people about a particular subject. This section describes the use of
closed questions, in which the range of answers is determined by the researcher, a
format that lends itself to statistical analysis. Once a questionnaire has been
constructed, it is usually referred to as a tool or an instrument.
4.1 Questionnaire Design
The term questionnaire usually means a form containing a set of predetermined
questions used for gathering information (data) from and about people as part of a
survey. (A survey is a research approach that attempts to cover as wide a range of the
population as possible in order to acquire information about a subject.)
The use of questionnaires requires a great deal of time and effort in terms of careful
planning, ordering and sequencing of the questions and the responses in order to obtain
relevant data, and questions of design need to be addressed early in the study.
The main questions to ask when thinking about designing a questionnaire are:
What is it that I wish to find out, otherwise, is it to do with knowledge or attitudes or
levels of understanding, or is it about behaviour or activities or decision making and for
what is it that I wish to find out?
Questionnaires may be used in different kinds of research, e.g. descriptive, attitudinal
and comparative, and there are many different kinds of questionnaire; the choice
depends on the research question being asked. Questionnaires may be used to produce
both qualitative and quantitative data, as a means of describing a population,
investigating cause and effect, and monitoring change over time. Frequently,
questionnaires attempt to describe a population's behaviour, attitudes and views with
regard to a certain topic, or its level of understanding of an issue. A survey frequently
uses questionnaires to obtain information, and sometimes an interviewer is also used.
The people taking part in a survey are often referred to as subjects, respondents or
participants.
When used as a quantitative tool, questionnaires help to put numerical indicators on
these phenomena. They are a good method of data collection, although the data may
tend to be superficial, as there is no room for probing the meaning of the responses.
However, questionnaires are a cost-efficient way of collecting large quantities of data in
a short period of time, and if the questionnaire is properly structured, large quantities of
data can be collected and subjected to statistical analysis. Questionnaires can be
administered not only by face-to-face interview but also via the World Wide Web,
telephone, mail and e-mail.
4.2 Sample Questionnaire
I will describe a questionnaire that describes the behaviour of a group of people. Let us
say, for instance, that I wanted to know more about the epidemiology of HIV and AIDS
in a certain village. It may be useful to know more about the general sexual health of the
population in that village. Safe sex is much talked about in the media, barrier methods
of contraception are presented as the way of keeping safe, and the government is
spending a large amount of money on health promotion programmes to educate the
general population about safe sex.
**How many sexually active people are there in the village? Are all sexually active
people in the village practising safe sex by using a barrier method of contraception such
as condoms?
A survey can be used to give an indication of how many, and a questionnaire then has
to be planned for distribution to sexually active adults so as to discover what types of
sexual protection (i.e. femidom, condom or dental dam) are used. This type of survey
will try to discover when those individuals used protection, when they were most likely
and least likely to use protection, and how much money was spent on sexual protection
per week/month. This might be helpful information in a health promotion context:
Whether running a safe-sex campaign within a family planning village clinic.
Or in a genito-urinary village clinic.
When recognizing specific health promotion needs, by identifying trends in the
behaviour of clients/patients.
When considering issues of social policy relating to such health issues as
national trends in the development of HIV/AIDS.
First, I must identify the target population to be sampled, for instance, adults over
eighteen years of age who give their permission. This means I am not going to get any
information concerning those under eighteen, as they are not part of the target
population.
The questionnaire example
This survey is attempting to find out about the use of barrier methods of contraception
as protection. Your answers will be treated in confidence and will help us plan health
care services. Please indicate your responses by placing a cross in the box next to the
answer you think best represents your answer:
(1) Strongly agree (2) agree (3) not sure (4) disagree (5) strongly disagree
Q1. I use the following types of barrier protection?
                                   Always              Never
                                     1    2    3    4    5
None
Condom
Femidom
Dental dam
Cap (Dutch cap)
Q2. When are you most likely to use sexual protection?
                                   Agree               Disagree
                                     1    2    3    4    5
I never use barrier methods of sexual protection
I sometimes use protection when I remember
I use a condom every time I have anal penetrative sex
To avoid pregnancy
Allergic to latex
To avoid getting a sexually transmitted disease
Q3. Please indicate your circumstances: which of the following categories applies to
you? Tick those that do.
Single
Married or cohabiting
Living with spouse
I have many sexual partners
I have one sexual partner
Thank you for completing this questionnaire.
Please return it in the enclosed pre-paid envelope.
Your responses will be treated in confidence.
4.3 Scales from the Sample Questionnaire
When asking respondents a question, we frequently provide them with a range of
choices so that they can indicate where their answer lies on a scale. The most
commonly used scale is the Likert scale, which measures the extent to which a person
agrees or disagrees with a statement. The scale usually runs from 1 to 5, for example
(1) strongly disagree, (2) disagree, (3) not sure, (4) agree and (5) strongly agree. One
common problem with a scale based on an odd number (5) of choices is that it always
offers the option of the middle point, and it is often easier for subjects to take this easy
choice than to struggle to make a decision.
Another type of scale is the Thurstone scale, which uses only two points: agree and
disagree. It is customary to ask numerous related questions that can be used to
produce an overall score for an individual respondent, which can then be compared with
that for the sample as a whole, or used to compare different populations. A common
example of the use of Thurstone scales is in psychometric tests of aptitude and attitude,
such as a test taken as part of a job interview. An alternative approach is to use
semantic differential scales, which are good for investigating phenomena such as
attitudes and values, as they are based on opposite points of view or feelings about a
subject or concept.
For example, when investigating people's job environment, we might ask questions
about how they felt about aspects of the environment, and the response scale may
range from helpful, nurturing and happy to unsupportive, blocking, toxic and
dysfunctional.
The concept in Table 2 below is day surgery; patients are asked to indicate where on
the scale they lie. You would probably want to ask more questions to get a good
overview of an individual's impressions of day surgery. Note that there is no consistent
negative end of the scale. This helps to persuade respondents to think about their
answers.
Table 2: A patient’s responses when asked to consider day surgery (questions
asked pre-operatively)
Exciting       1  2  3  4  5  6   Boring
Frightening    1  2  3  4  5  6   Calming
Useful         1  2  3  4  5  6   Useless
Fast           1  2  3  4  5  6   Slow
5 THE STUDIES
This section provides the background and data from two hypothetical studies, one a
clinical trial and the other a questionnaire study. They enable us to complete a basic
analysis of data sets, to draw some conclusions, and to be in a better position to
analyze other studies critically. The studies have been devised to be relevant to modern
health care and to stimulate our interest in and enjoyment of data analysis. The
questionnaire study is presented first.
5.1 The questionnaire Study
The following questionnaire is an example; it concerns the sexual health of individuals
who presented for advice at a walk-in clinic in central Kigoma. Here we show the
extended version, although for the sake of simplicity we have reduced the extent of
some of the questions. The aim of this study is to provide basic information on the
sexual behaviour of individuals of differing sex, age and ethnic group who used the
walk-in clinic. It was initiated because the clinical leader considered that a large
proportion of the clients were presenting with symptoms related to sexual health, and
he was considering putting in a bid for funds to support the employment of a specialist
in this area. Consequently, the study is largely descriptive and exploratory in nature.
The population of the study is all those individuals who could potentially use the walk-in
centre, and the sample is made up of those who actually did enter the clinic and
complete the questionnaire.
5.2 The Questionnaire
This survey is attempting to find out about the use of the walk-in clinic in relation to
sexual health. Your answers will be treated in confidence and will help us plan health
care services. Please indicate your responses by placing a cross in the box next to the
answer you think best represents your answer.
1. What made you attend the walk-in centre today?
Agree Disagree
i. Location
ii. Availability
iii. Access to medical staff
iv. Access to nursing staff
v. Emergency treatment required
2. What symptoms are you experiencing? Please tick the one you feel best describes
your symptoms
i. Headache
ii. Chills/shakes
iii. Temperature
iv. Lack of appetite
v. Feeling generally unwell all over
vi. Cough
vii. Pain/soreness in chest
viii. Faintness
ix. Collapse
x. Pain or difficulty passing urine
xi. Discharge from penis
xii. Discharge from vagina
3. Which of the following descriptions best represents your sex life?
(Please tick.)
i. Very active (I have sex more than five times per week)
ii. Active (I have sex between once and five times per week)
iii. Not very active (I have sex less than once per week)
iv. Non-existent
4. Please indicate how many sexual partners you have shared sex with in the past
month.
5. My most frequent choice of barrier protection is: (Please tick one)
i. None
ii. Condom
iii. Femidom
iv. Dental dam
v. Cap (Dutch cap)
6. When are you most likely to use sexual protection?
Agree / Disagree
i. I never use barrier methods of sexual protection
ii. I sometimes use protection when I remember
iii. I use protection (put a condom on) if I do not know my sexual partner very well
iv. I use protection (put a condom on) when I think I am going to climax
v. I use a condom every time I have oral sex
vi. I use a condom every time I have vaginal penetrative sex
vii. I use a condom every time I have anal penetrative sex
viii. To avoid pregnancy
ix. To avoid getting a sexually transmitted disease
7. What puts you off using sexual protection?
Agree / Disagree
i. Not very comfortable
ii. Loss of sensation/cannot feel anything
iii. Too fiddly to have to open packets
iv. Cost
v. By the time the packet is open the moment has gone
vi. Need to use a lubricant as well
vii. Have to plan sex in advance
viii. Allergic to latex
8. How much money do you spend, on average, on sexual protection per week? Please
indicate which category applies to you, using a tick:
i. £0
ii. £0–£5
iii. £5–£10
iv. £10–£15
v. I get them free
9. My age group is: Please indicate which category applies to you using a tick.
16–24 25–34 35–44 45–54 55–64 65–74 75–100
11. I consider my sexual orientation to be: (Please tick one)
i. Heterosexual male
ii. Heterosexual female
iii. Gay (homosexual) male
iv. Gay (lesbian) female
v. Bisexual male
vi. Bisexual female
vii. Asexual
12. I consider my ethnic background to be (Please tick):
i. African
ii. Asian
iii. European
**The questionnaire has been simplified for the purposes of my assignment.
When entering data into charts or computers it is usually easiest to use one line for
every subject and to enter the values of the variables measured across the row. Codes
are used throughout to simplify the data entry.
In Table 3 the following codes have been used: M for male and F for female; a code
from 1 (very active) to 4 (non-existent) for sexual activity; and codes 1–5 for barrier
choice, the participant's most used protection, where 1 is the first possible response
(none) and 5 the last (cap).
For question 7 we have used a similar system to that used for barrier choice, where (i) is
the first option and (vii) the last. For the presentation type we have also used the same
system as for questions 5 and 7, with 1 equivalent to headache and 12 to discharge
from vagina.
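The coding scheme described above can be sketched as a set of lookup tables that turn one respondent's answers into a single row of codes. This is only an illustration: the names SEXUAL_ACTIVITY, BARRIER_CHOICE, PRESENTATION and encode_response are assumptions made for the example, and the respondent shown is invented.

```python
# Code books following the coding described in the text.
SEXUAL_ACTIVITY = {"very active": 1, "active": 2, "not very active": 3, "non-existent": 4}
BARRIER_CHOICE = {"none": 1, "condom": 2, "femidom": 3, "dental dam": 4, "cap": 5}
PRESENTATION = [
    "headache", "chills/shakes", "temperature", "lack of appetite",
    "feeling generally unwell all over", "cough", "pain/soreness in chest",
    "faintness", "collapse", "pain or difficulty passing urine",
    "discharge from penis", "discharge from vagina",
]


def encode_response(sex, activity, barrier, presentation):
    """Turn one respondent's answers into one row of codes (one line per subject)."""
    return {
        "sex": "M" if sex == "male" else "F",
        "sexual_activity": SEXUAL_ACTIVITY[activity],
        "barrier_choice": BARRIER_CHOICE[barrier],
        # Position in the question 2 list: 1 = headache, 12 = discharge from vagina.
        "presentation": PRESENTATION.index(presentation) + 1,
    }


# A hypothetical respondent, invented for illustration.
row = encode_response("female", "active", "condom", "cough")
print(row)
```

Keeping the code books in one place means the same codes are applied to every subject, which is the point of coding before data entry.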
5.3 The Clinical Trial
This experimental trial looks at the ability of the novel drug Symphadiol (tested against
a placebo) to help increase weight loss in individuals who are trying to lose weight using
a calorie-controlled diet. The trial is being organized by Spinto, a drug company active
in this field, which has recruited individuals through a network of dieters' groups. They
were invited to take part by Spinto's clinical trials specialist nurse, who conducts the
study with the support of local GPs and Spinto's dietician.
The aim of the experiment is to test the hypothesis that a daily dose of Symphadiol
enhances weight loss in clinically obese individuals, compared with just using a calorie-
controlled diet. It was decided to select men between the ages of twenty-two and forty
for the study. It was also decided to look at the impact of exercise in conjunction with
Symphadiol. The population in this study is all healthy obese male individuals who are
sufficiently motivated to lose weight to join a diet network. They must not be taking any
medication, except that required for minor ailments.
Table 3: The example of data from questionnaire
Question     13    12       3          4          5          7         2              9
Individual   Sex   Ethnic   Sexual     Partners   Barriers   Reason    Presentation   Age
                   group    activity              choice     not
1            F     E        3          1          1          1         1              22
2            F     E        2          2          2          5         7              28
3            F     E        1          8          2          7         4              21
4            F     AS       2          1          2          1         9              26
5            F     AS       4          0          0          5         12             27
6            F     AS       2          1          2          7         6              30
7            F     AS       4          0          0          5         5              29
8            F     E        2          2          1          5         12             23
9            F     E        4          0          0          3         10             21
10           M     E        2          1          3          2         9              32
11           M     E        1          1          5          2         3              18
12           M     A        1          1          1          7         10             24
13           M     A        1          3          5          6         12             23
13           M     AS       2          0          2          1         7              23
14           M     AS       1          4          0          3         12             20
15           M     A        3          0          0          5         1              36
16           M     E        4          4          2          5         2              20
17           F     E        4          1          3          1         5              23
18           M     E        1          0          2          3         3              24
19           M     E        3          2          1          3         3              20
20           F     A        2          0          2          1         12             36
6 DESCRIPTIVE STATISTICS
Descriptive statistics are used in health care for many purposes, mostly to describe and
summarize the detailed data collected in a study in a manner that can be interpreted
quickly and easily. They are the kind of statistic most widely recognized and most likely
to be encountered. They are used to administer, monitor and evaluate health services
and the people who work in a health facility or organization. Consequently, an
understanding of descriptive statistics is very important for individuals who work in
health care settings.
Descriptive statistics are usually of two kinds:
(1) Measures of central tendency (the typical value): the mean, the median and the mode.
(2) Measures of variability about the typical value (measures of dispersion): the
range, inter-quartile range, variance and standard deviation.
6.1 Measures of Central Tendency
A measure of central tendency is a single value that attempts to describe a data set by
identifying the position of the centre within that data set. Such measures are sometimes
known as measures of central location, and they are also classed as summary
statistics. The mean, often called the average, is probably the measure of central
tendency that we are most familiar with; the others are the median and the mode.
6.1.1 Mean (Arithmetic)
The arithmetic mean is usually known as the average. To calculate the average time
spent at a sports ground in a month, we take the times spent during every visit, add
them together and divide by the number of visits. Table 4 below, for example, shows the
time that Amani spent at the sports ground during the month of January. The mean
(average) length of time spent at the sports ground is 48.17 min. This score falls
roughly in the centre of the data, with six visit times greater and six visit times less than
48.17. But this is not always the case with the mean.
Calculations:
Table 4: Time spent by an individual at the gym during July
Visit Time spent (minutes)
1 62
2 34
3 50
4 40
5 58
6 48
7 38
8 60
9 58
10 45
11 53
12 32
To calculate the mean:
Mean = (sum of the minutes spent in each visit) / (total number of visits)
     = (32 + 34 + 38 + 40 + 45 + 48 + 50 + 53 + 58 + 58 + 60 + 62) / 12
     = 578 / 12
Mean = 48.17 (to two decimal places)
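The same calculation can be sketched in a few lines of Python, using the twelve visit times from Table 4. The variable names are chosen for the example only.

```python
# Visit times in minutes from Table 4 (visits 1 to 12).
times = [62, 34, 50, 40, 58, 48, 38, 60, 58, 45, 53, 32]

total = sum(times)   # the sum of the cases
n = len(times)       # the number of cases
mean = total / n

print(total, n, round(mean, 2))
```

The exact quotient is 48.1666..., so it rounds to 48.17 when quoted to two decimal places.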
Most statistics involve calculations that can be performed on a basic calculator, and
many calculators have functions that compute basic statistics more quickly; if data sets
become large, use a computer package. As statistics is a branch of mathematics,
statistical tests are often represented as formulae and rely heavily on symbols. Learning
to recognize them is akin to learning a new language, which often persuades people
that they cannot understand statistics. Here is an introduction to a few of them:
$x$ A single case, such as a single visit to the sports ground.
$n$ The number of cases ($x$), so twelve visits to the sports ground gives $n = 12$.
$\Sigma$ 'The sum of', e.g. all the $x$s added up; in the case above, $\Sigma x = 578$.
$\bar{x}$ The symbol representing the mean.
Rather than using words, another way to describe the mean is with the following
formula:
$$\bar{x} = \frac{\Sigma x}{n}$$
**This formula is used when calculating the mean for a sample obtained from a
population.
If you have any difficulty following a statistical formula, it is advisable to write
out every symbol in words. For the mean, the formula then reads:
Mean = (the sum of the cases) / (the number of cases)
One disadvantage of the mean is that it can mislead if some of the values are
unusually large or small. If, for example, Amani had one day been at the sports
ground for only 5 min, adding this value to the data would affect the average,
giving a mean time of 44.85 min (see below).
5, 32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62
Mean = 44.85
This unusual score of 5 minutes makes the mean much lower, so that it is no
longer a true reflection of the central point. Such extreme values are called
outliers.
Symbols used when discussing populations:
N The size of the population.
µ The mean of the population.
σ The population standard deviation.
6.1.2 Median
The median helps to solve the problem of outliers since, rather than using
every value to calculate the measure of central tendency, it uses only the
value found at the physical centre of the data. The data must be put in
ascending order before the median is calculated. The two examples of
exercise time below, the first of which includes the low value, reveal that an
extreme value has little impact on the median even though it occurs at only
one end.
5, 32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62
Median = 48
The extreme value moves the median by only 1. Without it, the middle falls between
two values, so it is necessary to add the two middle numbers and divide by 2:
32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62
Median = (48 + 50) / 2 = 49
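The different sensitivity of the mean and the median to an outlier can be checked with Python's standard statistics module. This is a small sketch using the visit times above; the variable names are chosen for the example.

```python
import statistics

times = [32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62]
with_outlier = [5] + times  # the same data plus the unusual 5-minute visit

for label, data in [("without outlier", times), ("with outlier", with_outlier)]:
    mean = round(statistics.mean(data), 2)
    median = statistics.median(data)
    print(f"{label}: mean = {mean}, median = {median}")
```

The mean drops by more than three minutes when the outlier is added, while the median moves by only one.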
6.1.3 Mode
The mode is the value that occurs most frequently; in the data set below it is 58.
32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62
The mode shows the most common value in a data set. One advantage is that it can be
used on both nominal and continuous data, and sometimes it is the only available
measure of central tendency, whereas the mean cannot be used on nominal data and
the median requires data that can at least be put in order. For instance, suppose one
question of a lifestyle survey asks students where they would primarily go for
family-planning advice:
GP 5
Practice nurse 4
Family planning clinic 6
Friends 8
Chemist 2
Nowhere 2
Consequently, 'Friends' is the most commonly occurring category, with a score of 8,
and for the data listed above the mean and median would be of no use.
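Both uses of the mode can be sketched with the Python standard library: statistics.mode for the numerical visit times and collections.Counter for the nominal survey answers. The variable names are chosen for the example.

```python
import statistics
from collections import Counter

# Numerical data: the visit times used earlier.
times = [32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62]
print(statistics.mode(times))

# Nominal data: counts of answers to the family-planning question.
answers = Counter({
    "GP": 5,
    "Practice nurse": 4,
    "Family planning clinic": 6,
    "Friends": 8,
    "Chemist": 2,
    "Nowhere": 2,
})
category, count = answers.most_common(1)[0]
print(category, count)
```

A mean or median of these categories could not be computed, because the answers are names rather than numbers.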
6.2 Choosing a Measure of Central Tendency
The mean is the most commonly used measure since it is the most sensitive: it takes
account of the value of every case within the distribution. It is also mathematically
based, so it can be used in later statistical computations. However, it can only be used
on data measured at the interval or ratio level, and it is easily distorted by outliers.
Table 5: Choosing a Measure of Central Tendency
Measure   When to Use                             When Not to Use
Mean      Interval or ratio data                  Categorical data
          For most data sets, where the cases     Ordinal data
          are more or less symmetrically          When there are outliers or the
          distributed about the mean              data are heavily skewed
          Where the measure is going to be
          used in further calculations
Median    Interval, ratio or ordinal data         When the measure will be used
          Data heavily skewed, mean distorted     in further calculations
          by outliers
Mode      Categorical data
          Ordinal data
In many situations there is no right or wrong measure of central tendency. People tend
to choose the arithmetic mean because they are familiar with it, but it is best to keep in
mind that the main point of descriptive statistics is to organize and communicate
information. A statistician should therefore choose the measure that conveys the
information in the best possible way and does not mislead the audience.
6.3 Measures of Dispersion
Having dealt with central values, we now look at statistics that describe how scores
differ within a sample. The term given to measures that tell us about the level of
variability in the data is dispersion. If there were no variation in populations, there
would be little need for statistics.
For example, two groups of students, one part time and one full time, were asked at
what age they had their first experience of sexual intercourse with another person.
Group A: 14, 15, 18, 22, 23     $\bar{x}_A = 18.4$
Group B: 17, 18, 19, 19, 20     $\bar{x}_B = 18.6$
(The subscript tells us which group the mean belongs to, e.g. $\bar{x}_A$ is the mean
of group A.)
Whilst the means are similar, it is obvious that there is a difference in the variability of
the values of the cases within the groups. In terms of using statistics to develop health
care practices, knowing the variability of values is as important as knowing the mean.
After all, if shoe manufacturers only made shoes for people with average-size feet they
would soon go out of business. We want to know about the variability that we will
encounter in patients' health and in their behaviours.
6.3.1 Range
The range is one measure of dispersion. To calculate it, subtract the lowest score from
the highest, so the range for group A is: 23 - 14 = 9
The range for group B is: 20 - 17 = 3
This means that the scores for group A are spread over 9 units, whereas those for
group B cover only 3. Group A therefore has a greater range of values in its cases.
The range is a quick and easy way of approximating the level of variation within a
sample, but it is sensitive to outliers.
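As a brief sketch, the means and ranges of the two groups can be computed as follows. The helper name value_range is an assumption made for the example.

```python
def value_range(values):
    """Range: the highest score minus the lowest score."""
    return max(values) - min(values)


group_a = [14, 15, 18, 22, 23]
group_b = [17, 18, 19, 19, 20]

for name, group in [("A", group_a), ("B", group_b)]:
    mean = sum(group) / len(group)
    print(f"Group {name}: mean = {mean:.1f}, range = {value_range(group)}")
```

The two means differ by only 0.2, while the ranges differ by a factor of three.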
6.3.2 Inter-quartile Range
The inter-quartile range (IQR) is an alternative that can be used to deal with outliers.
As the name suggests, quartiles are obtained by quartering the data set: the data are
placed in ascending order and divided into four quarters, and the numbers at the limits
of these quarters are known as quartiles. The inter-quartile range is in fact the range of
the middle 50% of the data. It is computed by locating the upper value of the first
quartile (the first 25% of the cases) and subtracting it from the upper value of the third
quartile. For example, a group of teenagers were asked for how many years they
thought it was safe to take the contraceptive pill (Q = quartile).
2, 2, 4, 5, 8, 10, 10, 10, 12, 15, 15, 30
      Q1       Q2          Q3
In the above data set the range is twenty-eight, but it is affected by the case with the
value of thirty. There are twelve cases in the data set, so there are three cases in every
quarter. At the end of the first quartile, i.e. the first 25% of the cases, the value is four,
and at the end of the third quartile, i.e. the first 75% of the cases, the value is twelve.
The difference between these two values (eight) is the inter-quartile range (IQR).
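The quartering method described above can be sketched as follows. Note that this follows the simple approach in the text (taking the last value of each quarter) and only handles data sets whose size divides evenly by four; statistical packages often use interpolation and may give slightly different quartiles. The function name quartile_end_values is an assumption made for the example.

```python
def quartile_end_values(data):
    """Return the last value of the first, second and third quarters of the data."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 4 != 0:
        raise ValueError("this simple method needs a number of cases divisible by 4")
    quarter = n // 4
    return (
        ordered[quarter - 1],
        ordered[2 * quarter - 1],
        ordered[3 * quarter - 1],
    )


years = [2, 2, 4, 5, 8, 10, 10, 10, 12, 15, 15, 30]
q1, q2, q3 = quartile_end_values(years)

print("range:", max(years) - min(years))
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", q3 - q1)
```

The outlying value of 30 inflates the range but has no effect on the inter-quartile range.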
6.4 The Standard Deviation and Variance
The variance and the standard deviation are the two most common measures of
dispersion; both indicate the amount of deviation from the mean. The standard
deviation is the one most often quoted when describing data, while the variance is used
in numerous statistical tests.
The standard deviation measures the variation in the data by evaluating how much
each case deviates from the mean. For example, if the mean is six and an individual
case is eight, the deviation is two.
That takes care of the deviation part of the statistic; it is the standard part that is
important. A statistician has to take all the deviations and combine them into a single
typical (standard) value. When interpreting it, the size of the mean matters, because a
deviation of 2 is of much less significance if the mean is 110 than if the mean is 8.
Table 6: Age at first sexual experience (with another person)
Case                             1   2   3   4   5   6   7   8   9   10
Age at first sexual experience   14  15  17  18  18  19  19  20  22  23
The formula for the standard deviation is given below. It looks complex, but it actually
involves quite straightforward mathematics; using the data in Table 6 above, we can
work through the formula step by step.
$$s = \sqrt{\frac{\Sigma (x - \bar{x})^2}{n - 1}}$$
$s$ is the symbol for the standard deviation and
$s^2$ is the symbol for the variance.
Hence the standard deviation is simply the square root of the variance.
The calculation steps are outlined below:
Step 1: Calculate the mean: 185 / 10 = 18.5.
Step 2: Subtract the mean from each value:
14 - 18.5 = -4.5
15 - 18.5 = -3.5
17 - 18.5 = -1.5
18 - 18.5 = -0.5
18 - 18.5 = -0.5
19 - 18.5 = 0.5
19 - 18.5 = 0.5
20 - 18.5 = 1.5
22 - 18.5 = 3.5
23 - 18.5 = 4.5
Step 3: Square each answer obtained in step 2.
Step 4: Add up all the answers to step 3. This value is called the sum of squares: 70.5.
Step 5: Subtract 1 from the size of your sample (n - 1): 10 - 1 = 9.
Step 6: Divide the value found in step 4 by the value calculated in step 5. This is called
the variance: 70.5 / 9 = 7.83.
Step 7: Find the square root of the value obtained in step 6 to determine the value of
one standard deviation: $\sqrt{7.83} = 2.80$
Therefore our sample standard deviation is 2.80.
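The seven steps above can be followed line by line in a short Python sketch, using the ages from Table 6. The variable names are chosen for the example; the result is compared with statistics.stdev from the standard library, which also divides by n - 1.

```python
import math
import statistics

ages = [14, 15, 17, 18, 18, 19, 19, 20, 22, 23]  # Table 6

mean = sum(ages) / len(ages)                     # Step 1
deviations = [x - mean for x in ages]            # Step 2
squares = [d ** 2 for d in deviations]           # Step 3
sum_of_squares = sum(squares)                    # Step 4
degrees_of_freedom = len(ages) - 1               # Step 5
variance = sum_of_squares / degrees_of_freedom   # Step 6
standard_deviation = math.sqrt(variance)         # Step 7

print(mean, sum_of_squares, round(variance, 2), round(standard_deviation, 2))
print(round(statistics.stdev(ages), 2))  # library check of the same sample statistic
```

Both routes give a sample standard deviation of 2.80 when rounded to two decimal places.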
When you want to carry out a statistical test, be wary of data sets in which the variance
(the square of the standard deviation) greatly exceeds the mean, i.e. is more than 2
times the mean, or in which the variance is small relative to, or equal to, the mean. This
is a sign that the data set has a more complex form and needs to be handled with care.
It is also important, whenever quoting a mean, to provide a measure of dispersion,
since a mean without a measure of dispersion is not easy to interpret; the size of the
sample should also be included.
Table 7: Central tendency and measure of Dispersion
Type of data                       Measure of central tendency           Measure of dispersion
For most data sets, where the      Mean; there should be no need to      Standard deviation, and also
cases are more or less             quote any other measure because all   consider giving the range
symmetrically distributed about    measures of central tendency for
the mean                           this type of data will be similar
Interval, ratio or ordinal data;   Median, though you might also         Range and quartiles
data heavily skewed, mean          quote the mean
distorted by outliers
Nominal                            Mode                                  No measure
7 DISPLAYING DATA
The two most common forms of data display are graphs and tables. Both
have the same aim: to summarize and present the data in a manner that is
easy to understand and take in. Displaying your data is an essential part of
analyzing the data. It allows you to establish how the data are distributed,
to see unusual cases and generally to get a 'feel' for the data.
Table 8: Acklin and Bernat (1987) data for the two indices of depression measures
in patients with a range of conditions
                      Patient type
Index                 LBP    Depressives   Personality disorder   Non-patients
Egocentricity         0.31   0.32          0.42                   0.39
Sum morbid content    0.82   3.47          0.99                   0.70
Tables present information in a text-based form, so much of the detail in the data can
be retained. Unfortunately, taking in lots of different numbers and seeing emerging
patterns is rather difficult, and this is where graphs come in. When we present data in
graphical form some of the detail tends to be lost, but it becomes much easier to see the
emerging patterns. Here we show some data from Acklin and Bernat (1987) in graphical
form (Figure 1) and in table form (Table 8). Acklin and Bernat's study examined the
relationship between chronic low back pain and depression. In the graph and the table
we have taken two of the indices of depression that they used and plotted them against
patient type.
[Figure legend: Egocentricity; Sum morbid content]
Figure 1: Indices of depression recorded for a range of different patient types
(Acklin and Bernat 1987)
Which display form makes it easier to see the trend? Which allows you to see most
detail? As a general rule, the more data that are put into a table, the harder it becomes
to read and the less likely it is to be read. Tables should be used when the data set is
very simple or when the data need to be shown in great detail. The data we used for the
graph and the table are already summaries of the data collected; that is, the figures are
averages.
7.1 Table Types
There are several different types of table that can be used. Your choice of table will
depend on the type and number of variables that you have. In the example above
(Table 8) there are two types of variable. Along the top of the table we have the nominal
category 'patient type', whilst down the side of the table we have two interval scale
variables, that is, the indices of depression.
Table 9: Frequency of different ethnic groups in a sample of 178 individuals
interviewed in north-east London
Ethnic group           Frequency in sample
White European (EU)    75
African                35
Indian                 32
Afro-Caribbean         26
Other                  10
Other tables may have just one variable, which runs either along the top or down the
side of the table; the measurement could then be the frequency of occurrence of that
variable. In such a case the table is a frequency table, and it is normal to place the most
frequently occurring category at the top (see Table 9). Sometimes tables include
summary statistics (as does Table 8). In Table 10 the bottom row is a summary of the
data within the table. Tables that report the frequencies of two nominal variables
simultaneously, and that include totals, are often used to help look for associations
between variables. These tables are known as contingency tables.
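As a small sketch of how a frequency table is ordered, the counts from Table 9 can be sorted so that the most frequently occurring category comes first. The counts are deliberately entered out of order here, and the variable names are chosen for the example.

```python
# Counts from Table 9, entered in arbitrary order.
frequencies = {
    "Indian": 32,
    "Other": 10,
    "White European (EU)": 75,
    "Afro-Caribbean": 26,
    "African": 35,
}

# Sort by frequency, largest first, as is usual in a frequency table.
ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
total = sum(frequencies.values())

for group, count in ordered:
    print(f"{group:<22}{count:>4}")
print(f"{'Total':<22}{total:>4}")
```

The total of 178 matches the sample size given in the table caption, which is a useful check on data entry.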
Table 10: Summary examination results for a group of 122 first-year student
nurses
Exam paper   % (average)
1            53
2            46
3            58
4            43
Average      50
7.2 Types of Graph
A guiding principle when using graphs is that less is more: if too much information is
included in a graph, its meaning will be obscured. There are a variety of graph types.
The main types you will see are frequency charts, histograms, bar charts, pie charts,
scatter graphs and line graphs. The type of chart chosen will depend on the type and
complexity of the data. Many graphs are plotted against two axes, the horizontal axis
and the vertical axis. Bear in mind that if your graph is drawn incorrectly it may mislead
your audience.
7.2.1 Frequency Charts
Data are often divided into categories, and the counts of how frequently each category
or value occurs are referred to as frequencies. One of the simplest types of data
is nominal data, where things are categorized and the number of things in each
category is counted. Examples of such categories are different types of disease, the
number of individuals belonging to each ethnic group, or the number of males and
females.
If the data were measured initially on an interval or ratio scale, the most appropriate
form of display is a histogram. Plotting data as a frequency histogram gives us
an idea of how the data are distributed and a feel for the data. In a frequency
histogram the x axis covers the range of values of the cases. Each bar on the x axis
stands for a range of potential recordable values for the measure, so it is necessary
to decide the size of the range categories (class intervals) that you will use.
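The effect of choosing a class interval can be sketched in a few lines of Python. This is only an illustration: the measurements, the starting value and the interval width of 10 are hypothetical and do not come from this paper.

```python
from collections import Counter

def bin_counts(values, start, width):
    """Count how many values fall into each class interval [lower, lower + width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)   # which interval the value belongs to
        lower = start + index * width       # lower boundary of that interval
        counts[lower] += 1
    return dict(sorted(counts.items()))

# Hypothetical interval-scale measurements (illustrative only)
measurements = [12, 15, 21, 22, 25, 28, 31, 33, 34, 47]
histogram = bin_counts(measurements, start=10, width=10)

# Print a crude text histogram: one '#' per case in each interval
for lower, count in histogram.items():
    print(f"{lower}-{lower + 10}: {'#' * count}")
```

Changing `width` changes the shape of the plot, which is why the size of the range categories has to be decided with some care.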
7.2.2 Pie Charts
Pie charts are an alternative form of frequency chart, best used with ordinal or
nominal scale data, since they display the count of things in each
category as a proportion of the total number of counts. The
total set of data is represented as a circle divided into segments, the size of each
reflecting the frequency of one nominal group.
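As a rough sketch of how segment sizes follow from frequencies, the short Python example below converts the counts in Table 9 into percentages and angles of a circle. The variable names are illustrative, and no plotting library is assumed.

```python
# Frequencies taken from Table 9 (sample of 178 individuals)
frequencies = {
    "White European (EU)": 75,
    "African": 35,
    "Indian": 32,
    "Afro-Caribbean": 26,
    "Other": 10,
}

total = sum(frequencies.values())

# Each segment is the group's share of the total, expressed as a
# percentage and as an angle out of the 360 degrees of the circle.
segments = {}
for group, count in frequencies.items():
    share = count / total
    segments[group] = (round(100 * share, 1), round(360 * share, 1))

for group, (percent, angle) in segments.items():
    print(f"{group}: {percent}% of the pie, {angle} degrees")
```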
7.2.3 Bar Charts for Summary Statistical Information
Bar charts may also be used to display summary statistical information such
as means and standard deviations. In Figure 7.1 the means of two depression indices
are plotted on a bar chart. An indication of the variation in the data can also be
plotted on the graph; this is frequently done in the form of error bars, which are small
vertical lines with horizontal bars at the top and bottom that mark the range of the mean
plus or minus one standard deviation or standard error.
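The limits of an error bar can be worked out as follows. This sketch uses the old-frame group from Table 14 later in this paper purely as example data; the standard error is taken as the SD divided by the square root of the sample size.

```python
import math
import statistics

# Old frame, exercise level 1 (first column of Table 14)
old_frame = [16.1, 17.7, 20.6, 10.4, 20.3, 14.9, 11.5, 14.7, 15.3, 17.4]

mean = statistics.mean(old_frame)
sd = statistics.stdev(old_frame)          # sample SD (n - 1 in the denominator)
se = sd / math.sqrt(len(old_frame))       # standard error of the mean

# Lower and upper ends of the two common kinds of error bar
sd_bar = (mean - sd, mean + sd)
se_bar = (mean - se, mean + se)

print(f"mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
print(f"mean +/- 1 SD: {sd_bar[0]:.2f} to {sd_bar[1]:.2f}")
print(f"mean +/- 1 SE: {se_bar[0]:.2f} to {se_bar[1]:.2f}")
```

Because the standard error is always smaller than the standard deviation, a figure legend should state which of the two the error bars show.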
7.2.4 Scatter Graph (Scatter Plot)
A scatter graph is used to examine whether there is a relationship between two
variables, that is, whether the value of one variable is related to the value of
another; for instance, height and weight are quite often closely linked. Scatter
graphs may be used with interval, ratio or ordinal data that have been collected in
pairs, for example, measurements of both the weight and the height of each participant.
The x axis carries the scale for one of the variables and the y axis the other. For
each sampling unit a point is plotted on the graph at the place where its values of x
and y meet. If you have enough data, the graph will show the points scattered over
its surface.
7.2.5 Line Graphs
A line graph is similar to a scatter graph, except that the points are plotted
in sequence as the values increase along the x axis and a line is drawn between each
point and the next. Line graphs are ideal for showing sequences, for example
plots from a patient study over time, or of infant growth over time. In general, line
graphs should be used only when there is good reason to presume that the line drawn
between the points represents what will in all probability occur.
Strictly, therefore, they ought not to be used for grouped data, for instance monthly
means or counts, although in practice this is frequently done. Used in this
manner, line graphs are in fact a rather good way of comparing trends
across different groups of data.
Table 11: Summary of graph types
Type             When to use
Histogram        For showing the frequency distribution of data measured on the interval or ratio scales.
Bar chart        For displaying frequencies of nominal or ordinal data; also for comparing measures of central tendency between groups of data measured on ordinal, interval or ratio scales.
Pie chart        Used largely for showing the frequency distribution of nominal data. Try to avoid using pie charts to compare different groups of data.
Scatter graph    Use with interval, ratio and ordinal data when you want to see whether two variables are linked. Two or more variables must be measured from each sampling unit.
Line graph       Used for data measured on interval, ratio or ordinal scales, particularly when you want to display a trend or change over time. Particularly useful for displaying trends in several groups of data at once. Avoid joining points if there is no reason to do so.
8 HYPOTHESIS
A hypothesis is a proposed explanation for an observation. It leads to one or more
predictions that, through investigation and the use of statistics, we will either confirm
or reject, and in so doing we test the validity of the hypothesis. Put simply, it is a
method of synthesizing an idea or an explanation. Hypotheses are central to most studies
that involve the collection of quantitative data and statistics. They are usually built
from a previous observation or experience and lead to a prediction that there is a
relationship or link between two or more variables. For example, we may be interested in
studying the relationship between sexual activity and sexual health, or, in a Placebo
trial, in obesity and how it affects post-operative recovery time. Within these broad
areas of study, we have some specific relationships we wish to explore.
Hypothesis building
Observation: The manager of a sexual-health clinic reports that patients/clients from
certain post code areas seem to be infrequent visitors to the clinic.
Hypothesis: Persons who live farther away from the clinic are less likely to visit.
Study: Make a detailed analysis of the distance people live away from the clinic
and the frequency of visits.
8.1 Forming the Hypothesis for the Experiment / Study
One of the hypotheses from the first investigation is that males are less likely to use the
walk-in clinic than females. One of the hypotheses from the second study is that obese
patients treated with Placebo will lose weight faster than those not given the drug. In
this the predictions are highlighted in bold.
8.1.1 Experimental Approaches
Some investigations test such associations by experiment, an approach in which the
variables we are not interested in are kept constant. For example, in the second
investigation we would divide the patients/clients into two groups and treat one group
with Placebo.
Variables that we are not interested in (e.g. the patient's sex, age and
socio-economic group) must be controlled carefully by making sure that the composition
of the two groups is alike. Because we manipulate one of the groups of people taking
part in the study, the investigation is an experiment. Individuals who take part in a
study used to be known as subjects; it is now more common to refer to them as
participants, in acknowledgment that in most cases we have to get permission from
people before we study them: they are not objects to be studied, they are
participants in the study. The manipulated group is known as the
experimental or treatment group, and the group that is not subject to the
manipulation is known as the control group, which will receive a Placebo. In
statistical vocabulary, the control group and the experimental group together are known
as the treatment groups. The outcomes of the experiment are then
subjected to statistical analysis in order to assess the probability of the
results occurring by chance.
Figure 2: Diagram of an experimental study
Study population (sampling)
Sample population (randomisation)
Experimental group Control group
First data collection (before interview) First data collection (same time as in study group)
Period of intervention manipulation No manipulation intervention
Last data collection (after intervention) Last data collection (same time as in study group)
compare
8.1.2 Non Experimental Methods or Quasi-experiments
The hypothesis from the walk-in clinic study, that males are less likely to visit the
walk-in clinic than females, would presumably not be investigated by an experiment but
by a study based on statistical analysis of data relating to the
frequency of visits by males and females. The study would need to test whether the
observed ratio of males to females visiting the clinic was likely to have occurred by chance.
Figure 3: Diagram of a Quasi-experimental design with two groups
Study group Intervention Study group after
Compare
Control group before Control group after
8.1.3 Before-After Study
The before-after study is another kind of study design. It is frequently chosen because
it is rather easy to set up: it uses only one group, on which the intervention is
performed. The situation is analysed before and after the intervention to test whether
there is any difference in the observed problem. It is considered a pre-experimental
study design, quite different from a quasi-experimental study design, because it
involves neither a control group nor randomisation.
8.2 Variables
We have established that a hypothesis is a prediction about two or more
variables. It is important to remain aware of the role of each variable when a statistical
test is applied, since we will have one dependent variable and one independent
variable (and possibly confounding variables).
8.2.1 Dependent Variable
The dependent variable is a variable whose value is determined by, or dependent on,
the value of another variable; it is always the variable that is measured in
experimental designs. We may, for example, hypothesize that age and blood pressure are
linked. In a study of this relationship the dependent variable would be blood pressure,
as we would not suggest that age was determined by blood pressure; instead we would
predict that age was in some way important in determining blood pressure.
8.2.2 Independent Variable
The independent variable, conversely, is the variable that the researcher considers to
determine, at least partly, the value of the dependent variable. When considering the
relationship between age and blood pressure, we can suggest that age might in some way
account for the recorded level of blood pressure, and therefore age is the
independent variable. The independent variable is the variable that is fixed or
manipulated in experimental designs.
8.2.3 Other Variables
The confounding variable is another important kind of variable: one that influences the
value of the dependent variable yet is not of interest with respect to the hypothesis
being tested. For instance, in the test of the impact of Placebo it might be that the
patient's age influences the effects of Placebo. If this is the case and we fail to ensure
that the two treatment groups have participants of similar age, then age becomes a
confounding variable and the results of our experiment may be difficult to interpret.
Consequently, potential confounding variables need to be taken into account
through suitable and carefully thought-out research designs, particularly with
respect to sample selection.
8.3 Errors and Statistics
Several types of error may give false results in statistics. They fall into four
categories: random error, sampling error, measurement error and experimental
error. Much of research design and statistics involves either trying to reduce error or
trying to take account of it. One of the most significant uses of statistics is thus to
help decide whether an observed result could be due to chance, that is, caused by
sampling and other non-systematic errors.
8.4 The statistical Hypothesis
An experimental hypothesis has to be established by a researcher before performing an
experiment or study to test it. In the same manner, when we test the results of the
experiment to see whether they might have occurred by chance, we also establish a
statistical hypothesis. The most common form of statistical hypothesis is the hypothesis
of no difference, usually called the null hypothesis and given the symbol H0
(H with subscript zero); the alternative hypothesis is given the symbol H1 (H with subscript one).
8.5 Types of Interaction between Variables
When conducting studies we are not always looking for the same kind of
relationship between variables; there are generally three kinds of interaction:
relationships, associations and (clear-cut) differences. Deciding which kind of
interaction exists between the variables you are dealing with is an important aspect of
statistics, since it will in part determine the statistical test that you use.
9 DISTRIBUTIONS AND PROBABILITIES
One of the more important concepts in statistics is the idea that numbers can be
distributed according to the frequency of occurrence of particular values. For example,
a data set of the number of sexual partners that each individual has during a lifetime
could contain just the values 4 or 3, but it is much more likely to be a mixture of
different numbers, from high to low. The way numbers are mixed, or distributed,
will largely determine the type of statistical test to be used, and the easiest
way to see how a data set is distributed is to plot it as a
frequency histogram (Figure 9.1).
9.1 Frequency Histograms
The frequency histogram is a kind of bar chart in which the y axis is the frequency
of occurrence of a particular case. The x axis carries a scale bounded by
the lowest and highest values of the cases, with the values in between
divided into appropriate intervals.
Figure 4: Example of a Histogram
A bar is drawn that fills the whole of each of the intervals being measured; the sides of
the bars are parallel and the width of the bars is held constant. This type of figure is
normally used for variables recorded on an interval or ratio scale. If your data
are on an interval or ratio scale, plotting them in this manner should be one of your very
first steps. This is because the distributions of the data form the
basis of many statistical tests, and you may find that the numbers are distributed in
numerous ways. Some distributions have features that researchers can exploit.
One such distribution is the
normal distribution, which forms the basis of numerous statistical tests.
9.2 Probability and Statistics: The Link within Probabilities and Distributions
Imagine a bag of laundry containing equal numbers of blue and pink towels, where you
cannot see into the bag. When you reach in and pull out a towel there are two possible
results: the towel will be pink or the towel will be blue. Suppose you draw three towels
from the bag. There are 8 possible results (BBB, BBP, BPB, BPP, PBB, PBP, PPB and PPP),
so the likelihood of each result occurring is 1/8. These results give four combinations:
all blue, all pink, one pink and two blue, or one blue and two pink. What is the chance
of obtaining each of these combinations? For PPP and BBB it is simple: as we have
already said, the chance of each of these results is 1/8. There are three results that
give us one pink and two blue towels, so the chance of this combination is
1/8 + 1/8 + 1/8 = 3/8. Likewise, there are three results that give us one blue and two
pink towels, so the chance of this combination is also 1/8 + 1/8 + 1/8 = 3/8. The type
of distribution shown here is called the binomial distribution, a mathematical model
that describes data whose distribution is determined by events that can fall into either
of two categories.
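The towel example can be checked by listing every possible sequence of three draws. The sketch below assumes, as the text does, that each draw is equally likely to be blue (B) or pink (P) and is independent of the others.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every possible sequence of three draws, e.g. "BPB"
outcomes = ["".join(draw) for draw in product("BP", repeat=3)]

# Group the sequences by how many pink towels they contain
pink_counts = Counter(outcome.count("P") for outcome in outcomes)

# Each sequence has probability 1/8, so a combination's probability
# is the number of sequences that produce it divided by 8.
probabilities = {
    pinks: Fraction(n, len(outcomes))
    for pinks, n in sorted(pink_counts.items())
}

for pinks, p in probabilities.items():
    print(f"{pinks} pink, {3 - pinks} blue: {p}")
```

The four probabilities (1/8, 3/8, 3/8, 1/8) sum to 1 and form the binomial distribution for three trials with two equally likely categories.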
Hence distributions of numbers and probabilities are linked, which allows us to make
predictions. Fortunately, it just so happens that many natural phenomena produce sets of
data with a distribution similar in shape to the one above. This distribution is known as
the normal or Gaussian distribution, and it forms the basis of many of the most
commonly used statistics. Statistics that rely on numbers being
distributed in a certain way are called parametric statistics.
9.3 The Normal Distribution Curve
The normal distribution has mathematical properties that allow us to make predictions,
just as the histogram does. Now that we are aware of what is meant by the term
probability and the ways in which probabilities may be expressed, we should also be
aware that it is possible to make predictions using knowledge of how numbers are
distributed. Imagine connecting the tops of the bars of a histogram with a line and then
removing the bars (see figure 9.2). If the intervals on the x axis were infinitely small,
then instead of a bar chart with steps we would produce a smooth curve, and the
normal distribution looks like such a curve (see figure 9.3). As a defined
distribution curve of numbers, the normal distribution has certain properties. The first
is very clear: the curve is symmetrical, and it is sometimes referred to as 'bell-shaped';
the exact shape of the curve, however, depends on the standard deviation of the data.
Figure 5: The curve is bell-shaped and symmetrical
The mean is always in the middle of the x axis. The tails of the normal distribution
curve (the rare values) tend to be short. Presumably the most significant feature of
the normal distribution curve, however, is that the point where the curve changes from
being concave to convex (the inflection point) is always 1 standard deviation (SD) away
from the mean. This means that the area enclosed between the mean minus 1 SD
and the mean plus 1 SD is always a constant proportion of the total area, namely
68.27 per cent.
If we move two standard deviations away on either side of the mean, we
encapsulate about 95% of the total area. So if we took a large sample of patients' arm
lengths, we would anticipate that about 68% of the results would lie within ± 1 SD
of the mean and that about 95% would lie within ± 2 SD of the mean (see figure below).
Figure 6: The Standard Deviation
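These proportions can be verified numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `area_within` is illustrative.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Proportion of the total area lying within k SD either side of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1sd = area_within(1)   # approximately 0.6827
within_2sd = area_within(2)   # approximately 0.9545

print(f"within 1 SD: {within_1sd:.4f}")
print(f"within 2 SD: {within_2sd:.4f}")
```

The same proportions hold for any normal distribution, whatever its mean and SD, because distances are measured in units of the SD.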
9.4 Making a Prediction
I am interested in the number of Opsite dressings used on the average medical ward. I
collect data from 102 wards and the data are normally distributed. How many
wards will lie within ± 1 SD of the mean? Hint: in normally distributed data 68.27
per cent of the data lie within ± 1 SD of the mean.
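The arithmetic behind the answer is a single multiplication, sketched here with the figures given in the question (102 wards, 68.27 per cent).

```python
wards = 102
proportion_within_1sd = 0.6827   # 68.27 per cent, as stated in the text

# Expected number of wards whose dressing use lies within 1 SD of the mean
expected_wards = wards * proportion_within_1sd

print(f"Expected wards within 1 SD: {expected_wards:.1f} (about {round(expected_wards)})")
```

So roughly 70 of the 102 wards would be expected to lie within ± 1 SD of the mean, provided the data really are normally distributed.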
I have now introduced a means by which, if I know the mean and the SD of a data set,
and I know that it is normally distributed, I can make predictions. This
knowledge is the basis of what are often called parametric statistics, which rely
on procedures that assume data are distributed in a particular way and share common
characteristics.
9.5 Deviations from the Normal Distribution
Sometimes we find that the data we have collected do not fit the normal distribution.
The best way to get a rough idea of whether your data fit the distribution is to plot a
frequency histogram. Some deviations have a particular shape and are given special
names. A distribution is called negatively skewed when the mean lies to the
left of the median (as you look at it), and positively
skewed when the mean lies to the right of the median (figure 7).
Skewed sets of data tend to occur when there are values that are much lower or much
greater than the rest; as a result the frequency histogram is not symmetrical but
skewed. In these distributions, the greater the difference between the mean
and the median, the greater the skew. It is also possible to have symmetrical
distributions that do not conform to the normal distribution. The most common are
random distributions and the regular, or under-dispersed, distribution, examples of
which are given below (figure 7).
Figure 7: Symmetrical and skewed distributions
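The rule of thumb described above (compare the mean with the median) can be sketched as a small function. The data sets are hypothetical, and equal mean and median only indicate that this simple check found no skew; a histogram should still be plotted.

```python
import statistics

def skew_direction(values):
    """Classify skew by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"      # mean lies to the right of the median
    if mean < median:
        return "negative"      # mean lies to the left of the median
    return "none indicated"

# Hypothetical data sets (illustrative only)
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
long_left_tail = [1, 17, 18, 18, 19, 19, 20]
balanced = [1, 2, 3, 4, 5]

for name, data in [("long right tail", long_right_tail),
                   ("long left tail", long_left_tail),
                   ("balanced", balanced)]:
    print(name, "->", skew_direction(data))
```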
9.6 Random and Clumped Distributions
Sets of data in which the variance is almost equal to the mean tend to be
uncommon and are referred to as randomly distributed. An example of a random
distribution might be the number of incidences of certain diseases in
geographically defined areas.
True randomness is relatively uncommon, and the geographical distribution of many
disease phenomena tends to be over-dispersed, or clumped; with disease outbreaks, for
example, we recognize that certain areas have a high incidence of a particular disease.
In random phenomena every event (e.g. an incidence of cystic fibrosis) is unrelated to
any other incidence, but a clumped distribution suggests that the episodes are
related (for instance, in the case of a contagious disease, or a disease that is
triggered by some environmental cause). Clumped data will tend to show a strong positive
skew (the mean lies to the right of the median). The last distribution to be aware of is
the regular distribution, which is really an extreme form of the normal distribution in
which the SD is small relative to the mean; that is, there is very little spread in the
data set. An example might be records of the numbers of fingers and toes within a
population: individuals with fewer than 8 fingers and 10 toes are unusual, and thus the
distribution would be regular. If the peak of the curve is flattened, or if a normal
distribution is shaped like those shown below in Figure 9.8, it is said to demonstrate
kurtosis.
Figure 8: Different degrees of Kurtosis in frequency distribution
It is important to differentiate between random and clumped distributions, since the
manner in which data are distributed tells us about the basic properties
of what we are studying and, as we have seen here, is very pertinent to (epidemiological)
studies of the spread and distribution of diseases. We also need to know how data are
distributed before we embark on many statistical tests.
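One simple way to compare these distributions is the ratio of the variance to the mean: close to 1 for random data, well above 1 for clumped data and well below 1 for regular data. The sketch below is illustrative only; the counts are hypothetical, and in practice a formal test would be needed to judge how far from 1 the ratio must be.

```python
import statistics

def variance_to_mean_ratio(counts):
    """Sample variance divided by the mean of a set of counts."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical counts of cases in eight equal-sized areas (illustrative only)
regular_counts = [5, 5, 5, 6, 5, 5, 4, 5]       # very little spread
clumped_counts = [0, 0, 1, 0, 14, 0, 0, 17]     # cases concentrated in two areas

regular_ratio = variance_to_mean_ratio(regular_counts)
clumped_ratio = variance_to_mean_ratio(clumped_counts)

print(f"regular: {regular_ratio:.2f}")
print(f"clumped: {clumped_ratio:.2f}")
```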
10 ERRORS AND ANOVA
An ANOVA is an analysis of the variation present in an experiment, and is described as a
powerful and robust technique. It is a test of the hypothesis that the variation
in an experiment is no greater than that due to the normal variation of individuals'
characteristics and error in their measurement. The following criteria must be met
before doing an ANOVA test:
1. The data of each treatment group are derived from a normal distribution.
2. The data were measured on an interval/ratio scale.
3. The variances of the groups are not significantly different (there are ways round this one).
4. The sample groups are measured independently of each other.
10.1 Statistical Errors
There are several types of ANOVA, and each has evolved to deal with a certain type of
statistical error. Say you have a control group and two different levels of a treatment. In
that case you cannot use a single t test and must use a test that belongs to the group
known as ANOVA, which is shorthand for analysis of variance.
A Type 1 error occurs when the null hypothesis is rejected although it is true, while a
Type 2 error occurs when a false null hypothesis is accepted, that is, not rejected when
it should be. The two errors are opposites: as we decrease the probability of a Type 1
error, the probability of a Type 2 error increases. Generally we tend to choose tests
that reduce the possibility of a Type 1 error, so a cautious approach is adopted; for
instance, in many medical studies the significance level is set at P = 0.01.
10.2 The t-Tests, Errors and ANOVAs
When we have more than two groups we need an ANOVA. Imagine we are doing a
study with a control group (C) and two treatment groups (T1 and T2), and we
wish to find out whether their means are significantly different. If we use t-tests
then we must test:
C against T1.
T1 against T2.
T2 against C.
10.3 The ANOVA (Analysis of Variance)
Perhaps this is not too much trouble if you are using a computer or even a calculator,
but if you had five treatment groups you would need to do ten tests. Even if you are
prepared to stand the boredom, and manage not to make any calculation errors, you risk
committing a statistical error.
This is because, if you set your significance level at the normally accepted value of
P = 0.05 (5 per cent), once in every twenty tests (on average) you will get it wrong and
commit a Type 1 error. If, as in the case above with five treatment groups, you
perform ten t tests, the chance of at least one of them being wrong rises towards one in
two: 0.05 × 10 = 0.5 is an upper bound, and for ten independent tests the chance is
1 − 0.95^10, or about 0.40. So we need a way round this problem, and the solution is to
use an ANOVA.
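The growth in the number of pairwise tests, and in the chance of at least one Type 1 error, can be sketched as follows. The calculation assumes the tests are independent, which is a simplification; the function name is illustrative.

```python
from itertools import combinations
from math import comb

# With a control and two treatments there are three pairwise t tests
groups = ["C", "T1", "T2"]
pairs = list(combinations(groups, 2))
print(pairs)

# With five groups the number of pairwise tests rises to ten
tests_for_five_groups = comb(5, 2)

def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one Type 1 error in n independent tests at level alpha."""
    return 1 - (1 - alpha) ** n_tests

print(tests_for_five_groups)                       # number of t tests needed
print(round(familywise_error(tests_for_five_groups), 3))
```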
ANOVA allows us to compare the means of several treatment groups at the same time
without having to worry about adjusting P values or increasing the chance of Type 2
errors.
It does this because it compares all the treatment groups in a single test. As you can
imagine, the number of calculations needed to perform an ANOVA is quite large.
However, with the advent of computers the use of ANOVA has become much more
common and many more ANOVA-type tests have been designed. In this chapter we will
look at two types of ANOVA.
In general it is better to use a computer, as computers make fewer calculation errors
than humans. It is suggested that you focus on the structure of the tests and on
interpreting the output. The type of ANOVA that we will describe is the one-way analysis
of variance.
Computer statistics packages such as SPSS, Minitab and Microsoft Excel can
all help to analyze data using the one-way ANOVA described here.
10.3.1 When to Use a One-Way ANOVA
We are comparing the difference between more than two sample groups.
The data in each of our groups are normally distributed.
The data are measured on an interval scale.
Each case is measured independently.
How does it work?
First, here are some data. The data set is smaller than would normally be used for
ANOVA but we will use it to help us examine the ANOVA. The data in Table 12 are from
a study to examine whether the pre-natal fitness level of Primip women significantly
affects duration of labor.
The ANOVA test looks at the source of variation in the overall data set and tries to
apportion it to different aspects of the data. Once the variation has been allocated it is
possible to see if the differences between the sample groups are significant. The
sources of variation in the data are the variability that occurs within a sample group and
the variability that occurs between the groups (Table 13). We can say that:
ANOVA seeks to determine how much of the variation in data sets can be attributed to
error and how much can be attributed to the factor or treatment under study.
We are interested in the between-group variation, that is, that which has occurred
because of the fitness level. The rest of the variation, that is, that within the groups, we
regard as error. The variability between groups will reflect the error that occurs within
the groups and any additional variability caused by the treatment (in this case, fitness
level).
Total variability = Variability between the groups + Variability within the groups
If there is no difference between the groups, that is, the null hypothesis is correct, we
would expect there to be about as much variation between the groups as there is within
the groups. If the between-group variation is significantly greater than the within-group
variation, we conclude that the treatment has had an effect; this is the simple logic
behind the ANOVA test.
Table 12: Duration of labor in Primip women aged between
Twenty two at three different levels of fitness
Fitness level 1    Fitness level 2    Fitness level 3
20                 34                 16
32                 12                 15
14                 23                 22
15                 10                 10
Level 1: Low; Level 2: Medium; Level 3: High
Table 13: Variation within and between groups
Fitness level 1    Fitness level 2    Fitness level 3
20                 34                 16
32                 12                 15
14                 23                 22
15                 10                 10
(Variability within a group is read down each column; between-group variability is read across the columns.)
The test statistic produced by the ANOVA is F, a statistic we have seen before, and the
measure of variation we use is the variance; hence the name of the test, the analysis of
variance. We compute the within-group variance and compare it with the between-group
variance. F will be close to 1 if the null hypothesis is correct; if F is significantly
greater than 1, we conclude that the means are significantly different and that the level
of fitness (the treatment) had an effect. The procedure for calculating the ANOVA by
hand is long-winded. It is probably worth doing by hand once or twice, as that will help
you grasp how the procedure works and how ANOVAs are presented.
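To make the partition of variability concrete, the sketch below applies it to the small data set in Table 12. It is a minimal illustration using the definitional formulas (squared deviations from the means) rather than the shortcut formulas used later in the worked example; the variable names are illustrative.

```python
# Duration of labor at three fitness levels (Table 12)
groups = {
    "level 1": [20, 32, 14, 15],
    "level 2": [34, 12, 23, 10],
    "level 3": [16, 15, 22, 10],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = sum(all_values) / len(all_values)

# Total variability: squared deviations of every case from the grand mean
ss_total = sum((v - grand_mean) ** 2 for v in all_values)

# Within-group variability: deviations of each case from its own group mean
ss_within = sum(
    sum((v - sum(values) / len(values)) ** 2 for v in values)
    for values in groups.values()
)

# Between-group variability: deviations of group means from the grand mean
ss_between = sum(
    len(values) * (sum(values) / len(values) - grand_mean) ** 2
    for values in groups.values()
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_ratio = (ss_between / df_between) / (ss_within / df_within)

print(f"SS total = {ss_total:.2f}, between = {ss_between:.2f}, within = {ss_within:.2f}")
print(f"F = {f_ratio:.2f}")
```

For these data F comes out below 1, so the between-group variance is smaller than the within-group variance and the null hypothesis would not be rejected.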
Table 14: Distance walked (m) by patients with impaired hip mobility, following
various treatment regimes
                      Old frame           New frame           New frame
                      Exercise level 1    Exercise level 1    Exercise level 2
                      16.1                22.3                13.2
                      17.7                20.5                20.8
                      20.6                21.3                22.2
                      10.4                26.7                16.3
                      20.3                16.3                13.7
                      14.9                29.0                11.9
                      11.5                24.4                14.1
                      14.7                23.7                10.6
                      15.3                23.5                15.8
                      17.4                19.5                15.9
Mean                  15.89               22.72               15.46
Standard deviation    3.32                3.63                3.67
In this example Anna Fimbo, a physiotherapist, is interested in the impact of a
new walking frame on her clients with impaired hip mobility. She has decided to test the
new frame at two levels of exercise and to use her old frame with the normal level of
exercise as a control. Anna uses the distance the patient can walk unassisted as a
measure of the effectiveness of the treatments (see Table 14 above). Again, we will
assume that the data are normally distributed; remember that it would be normal to
plot out the data to look for any odd results and to get a 'feel' for your results. Place
the data into a table and, using a scientific calculator (if you have one), calculate the
mean, the standard deviation, the variance, the sum of the cases and the sum of the cases
squared (Table 15). Now we need to make sure that the variances of our sample groups
are not significantly different (the third criterion for ANOVA).
Table 15: Statistical summary of the data from Table 14
                             Group 1 (GP1)    Group 2 (GP2)       Group 3 (GP3)
                             Old frame        New frame,          New frame,
                                              Exercise level 1    Exercise level 2
                             12.1             22.3                13.2
                             15.7             20.5                20.8
                             18.6             21.3                22.2
                             9.4              26.7                16.3
                             18.3             16.3                13.7
                             12.9             29.0                11.9
                             9.5              24.4                14.1
                             12.7             23.7                10.6
                             13.3             23.5                15.8
                             15.4             19.5                15.9
Sample number                10               10                  10
Mean                         13.79            22.72               15.46
Standard deviation (SD)      3.21             3.63                3.67
SD squared (variance)        10.27            13.15               13.45
Sum of samples               137.9            227.2               154.6
Sum of samples, squared      19,016.4         51,619.8            23,890.8
Sum of squares of samples    1,994.11         5,280.36            2,762.4
Total number of samples, ntotal = 30
Sum of samples = 519.7
Sum of squares of samples = 10,036.9
To do this, select the largest and smallest variances and perform an F test. Here there
is no significant difference in the variances, so we can proceed with the test. The
ANOVA test uses the sums of squares as a measure of variation.
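The variance check can be sketched as the ratio of the largest to the smallest group variance, using the "SD square" row of Table 15. The ratio must still be compared with the critical value from an F table (here with 9 and 9 degrees of freedom, since each group has 10 cases); that lookup is not reproduced in this sketch.

```python
# Group variances from Table 15 (SD squared)
variances = {"GP1": 10.27, "GP2": 13.15, "GP3": 13.45}

largest = max(variances.values())
smallest = min(variances.values())

# F ratio for the variance check: larger variance on top
variance_ratio = largest / smallest

print(f"largest / smallest variance = {variance_ratio:.2f}")
```

A ratio this close to 1 is consistent with the text's conclusion that the variances do not differ significantly.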
Calculations
Step 1: Calculate a correction factor (CF). This makes the calculations a little quicker:
CF = (sum of samples)^2 / ntotal = 519.7^2 / 30 = 9,002.94
Step 2: Calculate the sums of squares (SS) for the whole sample:
SStotal = sum of squares of samples - CF = 10,036.9 - 9,002.94 = 1,033.96, or 1,034 when rounded
Step 3: Calculate the between-groups sums of squares:
SSbetween = sum over the groups of [(sum of samples in group)^2 / number in group] - CF
SSbetween = 449.67
Step 4: Calculate within-group sums of squares. A short cut can be used here because
we know the between-group sums of squares and the total and we know that the
between-groups and within-groups sums of squares must add up to the total:
So:
SStotal = SSbetween + SSwithin
SSwithin = SStotal - SSbetween
584.33 = 1,034 - 449.67
If you are forced to do ANOVAs by hand it’s probably best to calculate both the
within-group and the between-group sums of squares by longhand. This will allow you to
check your maths.
Step 5: Determine the degrees of freedom for both the within and between groups
using the following rules:
d.f. for SSbetween = number of groups -1, (In this example 3 - 1 = 2)
d.f. for SSwithin = total number of cases - number of groups, (In this example 30 - 3 = 27)
d.f. for SStotal = d.f. for SSbetween + d.f. for SSwithin
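Step 6: Calculate the variances for both the between- and the within-group sums of
squares by dividing each sum of squares by its degrees of freedom:
Variance between groups = SSbetween / d.f. = 449.67 / 2 = 224.83
Variance within groups = SSwithin / d.f. = 584.33 / 27 = 21.64
Step 7: Calculate F:
F = Variance between groups / Variance within groups = 224.83 / 21.64 = 10.38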
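Steps 5 to 7 reduce to a few divisions, sketched below with the sums of squares quoted in the text. The variable names are illustrative only. Depending on where rounding is applied, the F value printed may differ from the quoted 10.38 in the second decimal place.

```python
n_groups = 3
n_total = 30
ss_between = 449.67
ss_within = 584.33

# Step 5: degrees of freedom.
df_between = n_groups - 1
df_within = n_total - n_groups
df_total = df_between + df_within

# Step 6: variances (sum of squares divided by its degrees of freedom).
variance_between = ss_between / df_between
variance_within = ss_within / df_within

# Step 7: F is the between-groups variance over the within-group variance.
f_value = variance_between / variance_within

print(f"d.f. = {df_between}, {df_within} (total {df_total})")
print(f"variances = {variance_between:.2f}, {variance_within:.2f}")
print(f"F = {f_value:.2f}")
```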
Step 8: It is normal for the results from an ANOVA to be put in a table laid out in a
standard format like Table 16. The results of ANOVAs performed using statistical
packages are often presented in such tables. An alternative would be Table 17.
Table 16:
Source of variation    Sum of squares    d.f.    Variance    F
Between groups         449.67            2       224.83      10.38
Within groups          584.33            27      21.64
Total                  1,034             29
Table 17:
Source of variation    Sum of squares    d.f.    Variance    F
Between groups         449.67            2       224.83      10.38
Error                  584.33            27      21.64
Total                  1,034             29
Step 9: Look up the value of F in the appropriate statistical table. Note that the
variance between groups should always be on top, and larger than the within-group
variance. If the between-group variance is less than the within-group variance the null
hypothesis is automatically accepted.
The value of 10.38 is significant at the P < 0.01 level and so we can reject the null
hypothesis and say that the groups are significantly different. We
would express this result by saying that there was a significant difference between the
three treatment groups (ANOVA F2, 27 = 10.38, n = 30, P < 0.01). Unlike the t test we
also give the degrees of freedom for both within and between groups. They are given as
a subscript to the F statistic.
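The table look-up and the reporting format can be sketched as below. The critical values are assumptions: they are the figures usually tabulated for 2 and 27 degrees of freedom and should be checked against your own F table. The function names significance_level and report are hypothetical helpers.

```python
# Assumed critical values of F for 2 and 27 d.f., as read from a standard
# F table; check them against the table you are using.
CRITICAL_F = {0.05: 3.35, 0.01: 5.49}


def significance_level(f_value, critical=CRITICAL_F):
    """Return the smallest tabulated P level that f_value exceeds, or None."""
    passed = [p for p, f_crit in critical.items() if f_value > f_crit]
    return min(passed) if passed else None


def report(f_value, df_between, df_within, n, critical=CRITICAL_F):
    """Format the result with both degrees of freedom after the F."""
    level = significance_level(f_value, critical)
    p_text = f"P < {level}" if level is not None else "not significant"
    return f"ANOVA F{df_between}, {df_within} = {f_value:.2f}, n = {n}, {p_text}"


print(report(10.38, 2, 27, 30))
```

In printed work the two degrees of freedom would be set as a subscript to F; plain text cannot show this, so they simply follow the letter here.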
10.4 Contrasting the Means
You may have noted that there is a slight problem with the ANOVA in that, whilst we can
say that there is a significant difference between the sample groups, we can’t say which
groups are different from each other and which are not.
Table 18: Comparing Means after an ANOVA test (differences between group means)
Group means         Group 1: 13.79    Group 2: 22.72    Group 3: 15.46
Group 1: 13.79                        8.93              1.67
Group 2: 22.72                                          7.26
Group 3: 15.46
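Each entry in a table of this kind is the absolute difference between a pair of group means. A minimal sketch of that calculation, using the group means from the table, is given here; it only produces the differences and does not by itself say which of them are significant.

```python
from itertools import combinations

# Group means from the table above.
group_means = {"Group 1": 13.79, "Group 2": 22.72, "Group 3": 15.46}

# Absolute difference for every pair of groups.
differences = {
    (a, b): abs(group_means[a] - group_means[b])
    for a, b in combinations(group_means, 2)
}

for (a, b), diff in differences.items():
    print(f"{a} vs {b}: {diff:.2f}")
```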
Thus in the first example we do not know whether both exercise regimes differ
from the control, or whether they differ from each other, etc. Fortunately we can do
follow-up tests that allow us to determine which sample groups are significantly different
from each other. For those using computer packages there is a range of these
follow-up test options with an assortment of names. The only one to avoid is the least
significant difference test, as you will make the same error as if you did multiple t tests.
The most conservative (tends towards a Type 2 error) is Scheffe’s test; the least
conservative (tends towards a Type 1 error) is Duncan’s multiple range test (Kerr, Hall