NAME: AMANI OMARI MASENYA
ID: UB17186BHE24958
BACHELOR OF SCIENCE IN
HEALTH CARE ADMINISTRATION
HEALTH CARE STATISTICS
HEALTH CARE STATISTICS (Essay-Paper)
ATLANTIC INTERNATIONAL UNIVERSITY
HONOLULU, HAWAII
February, 2013
TABLE OF CONTENTS
COVER PAGE
TABLE OF CONTENTS
1 INTRODUCTION
What are Statistics?
2 THE STATISTICAL APPROACH
2.1 The Scientific Technique
2.1.1 Some Faiths Connected with Statistics Laid by Quantitative and Scientific Model
2.1.2 The Characteristics of Research Methods that Use Statistics
2.2 Research Method Basic parts
2.2.1 Question(s) Forming
2.2.2 Literature Review
2.2.3 Conceptual and Theoretical Frameworks
2.2.4 Hypotheses and Variables
2.2.5 Research Design
2.2.6 Sample and Population
2.2.7 Data compilation/Collection and Data Analysis
2.2.8 Results and Conclusions
3 MEASURING, SAMPLING AND ERROR
3.1 Population
3.2 Sample
3.3 Cases
3.4 Statistical and Real Population
3.5 Measurement Scales
3.5.1 The nominal Scale
3.5.2 Ordinal Measurement Scale
3.5.3 Ratio or Interval Measurement Scales
3.6 Errors
3.6.1 Measurement Error
3.6.2 Consistency
3.6.3 Design Errors
3.6.4 Sampling Error
4 QUESTIONNAIRES
4.1 Questionnaire Design
4.2 Sample Questionnaire
4.3 Scales from the Sample Questionnaire
5 THE STUDIES
5.1 The questionnaire Study
5.2 The Questionnaire
5.3 The Clinical Trial
6 DESCRIPTIVE STATISTICS
6.1 Measures of Central Tendency
6.1.1 Mean (Arithmetic)
6.1.2 Median
6.1.3 Mode
6.2 Choosing a Measure of Central Tendency
6.3 Measures of Dispersion
6.3.1 Range
6.3.2 Inter-quartile Range
6.4 The Standard Deviation and Variance
7 DISPLAYING DATA
7.1 Table Types
7.2 Types of Graph
7.2.1 Frequency Charts
7.2.2 Pie Charts
7.2.3 Bar Charts for Summary Statistical Information
7.2.4 Scatter Graph (Scatter Plot)
7.2.5 Line Graphs
8 HYPOTHESIS
8.1 Forming the Hypothesis for the Experiment / Study
8.1.1 Experimental Approaches
8.1.2 Non Experimental Methods or Quasi-experiments
8.1.3 Before-After Study
8.2 Variables
8.2.1 Dependent Variable
8.2.2 Independent Variable
8.2.3 Other Variables
8.3 Errors and Statistics
8.4 The statistical Hypothesis
8.5 Types of Interaction between Variables
9 DISTRIBUTIONS AND PROBABILITIES
9.1 Frequency Histograms
9.2 Probability and Statistics: The Link within Probabilities and Distributions
9.3 The Normal Distribution Curve
9.4 Making a Prediction
9.5 Deviations from the Normal Distribution
9.6 Random and Clumped Distributions
10 ERRORS AND ANOVA
10.1 ERRORS AND ANOVA (Statistical Errors)
10.2 The t-Tests, Errors and ANOVAs
10.3 The ANOVA (Analysis of Variance)
10.3.1 When to Use a One-Way ANOVA
10.4 Contrasting the Means
10.4.1 Tukey Test
10.5 Independence
10.6 Repeat Measure Design of ANOVA
11 TESTS FOR ASSOCIATION
11.1 Chi-square Test: One-way
11.1.1 Restrictions on the Use of the Chi-square
11.1.2 Independence
11.2 Chi-square Test for Association: Two-way
BIBLIOGRAPHY
FIGURES
Figure 1: Indices of depression recorded for a range of different patient types (Acklin
and Bernat 1987)
Figure 2: Diagram of an experimental study
Figure 3: Diagram of a Quasi-experimental design with two groups
Figure 4: Example of a Histogram
Figure 5: The curve is bell-shaped and symmetrical
Figure 6: The Standard Deviation
Figure 7: Symmetrical and skewed distributions
Figure 8: Different degrees of Kurtosis in frequency distribution
TABLES
Table 1: A range of different variables and their likely sampling units
Table 2: A patient’s responses when asked to consider day surgery (questions asked
pre-operatively)
Table 3: The example of data from questionnaire
Table 4: Time spent by an individual at the gym during July
Table 5 : Choosing a Measure of Central Tendency
Table 6: Age at first sexual experience (with another person)
Table 7: Central tendency and measure of Dispersion
Table 8: Acklin and Bernat (1987) data for the two indices of depression measures in
patients with a range of conditions
Table 9: Frequency of different ethnic groups of a sample of 178 Individuals
interviewed in north-east London
Table 10: Summary examination results for a group of 122 first-year student nurses
Table 11: Types of graphs Summary
Table 12: Duration of labor during Primp women aged between twenty-two at three
different levels of fitness
Table 13: Variation between and within Groups
Table 14: Distance walked (m) by patients with impaired hip mobility, following
various treatment regimes
Table 15: Statistical summary of the data from Table 12.3
Table 16:
Table 17:
Table 18: Comparing Means after an ANOVA test
Table 19: Diastolic blood pressure of four men aged between fifty-six and fifty-eight
at the start of and during a prescribed exercise regime
Table 20:
Table 21: The ethnicity of individuals in a cohort entering nurse education
Table 22: Observed and expected frequencies of four ethnic groups in a classroom of
student nurses
Table 23: A 2 x 3 contingency table showing the incidence of MDR TB in relation to
three East European Countries
1 INTRODUCTION
What are Statistics?
The word statistics derives from the Latin status, meaning state, and historically referred to the facts and figures that describe the condition of countries or states, such as their demography (Bhattacharyya and Johnson 1977). The statistical approach involves describing phenomena in terms of numbers and then using those numbers to imply or infer cause and effect; statistics are therefore a key research tool for quantitative researchers.
Today statistics are used in a wide diversity of studies and investigations: to summarize and describe data when those data are gathered in numerical form, to look for patterns, and to determine the likelihood that observations occurred by chance. They are therefore an essential tool underpinning all quantitative (number-based) research.
2 THE STATISTICAL APPROACH
2.1 The Scientific Technique
The systematic gathering of facts is the basis of evidence-based practice, since much practice knowledge is predicated on the belief that the world and its population can be viewed objectively and that predictions can be made about things that may then be confirmed or refuted. A view of how knowledge is generated and tested that is broadly shared within a community is known as a paradigm or world view.
2.1.1 Some Faiths Connected with Statistics Laid by Quantitative and Scientific
Model:
Logical positivism is founded on the eighteenth-century philosophical thinking of Hume (1888), who held that knowledge can be obtained through careful observation of people and things, their behaviors, customs and environments, and by studying physical matter, for instance chemicals and other substances, to find out how it behaves (Kerlinger 1986). The logical positivists were distrustful of things that could not be seen or heard, such as emotions and feelings (Burns 2000). On this view the physical world is governed by laws that can be stated as universal laws; these can be used to predict outcomes through a process of hypothesizing, experimenting, and confirming or refuting, and can be applied to managing the environment, for instance the laws of thermodynamics (Scott, 2006).
Experiments use the observation of one thing to determine the effect of something else under controlled conditions, so that every factor that might influence the outcome of the research is controlled or accounted for. A control group, to which nothing has been done, is required, and observations are made of the behaviour of an experimental group to which something has been done. The observations from both groups are then compared and the effect inferred; this is often known as the deductive approach.
The observations may then be analysed, organized, coded and ranked, that is, reduced to their smallest units, making it possible to present the results as data that can then be analysed mathematically (Scott, 2006).
2.1.2 The Characteristics of Research Methods that Use Statistics
Research methods that use statistics have many different characteristics, though not all of them will be seen in every study. Statistics-based studies generally try to control the influence of factors (i.e. variables) that are not of interest in the study but might bias the outcomes.
Statistics-based studies tend to involve collecting empirical evidence in order to refute a hypothesis. Generally, when we use statistics we try to produce results that are generalizable, meaning that the results from the sample are valid for the whole population of individuals of interest. Hence the relationship between the sample and the population is the focus of many statistical tests.
The Uses of the Statistical Approach within Research:
• To describe variables and their relationships.
• To help explore and discover the nature of relationships between variables.
• To help explore and discover the differences between samples and populations.
• To help examine the role of probability in giving rise to measurements.
• To help explain relationships between data sets.
• To predict the causes of relationships between phenomena.
• To control for (or account for) variables.
2.2 Research Method Basic parts
2.2.1 Question(s) Forming
In forming questions it is vital to identify the phenomenon or problem of interest, i.e. to decide the main shape of the study, the population to be studied and the questions to be addressed. Through our own reflection, literature searching and outlining, we then choose which techniques will be used to gather data. At this stage a small pilot study can be carried out so that possible problems are clearly highlighted; if any are found, the questions being addressed will need to be revisited.
2.2.2 Literature Review
There is a huge body of literature on many areas of healthcare, so every new piece of work must be set in the context of previous and concurrent work, which should be reviewed so that we can learn from it before proceeding. The literature review will usually begin the moment an idea for a study is conceived and will continue throughout the study.
2.2.3 Conceptual and Theoretical Frameworks
Different areas of investigation have different frameworks and ideas on which the development of new knowledge is based. There are therefore different ways of viewing problems, and it is important to be aware of those that apply to the particular study we are working on. A sociologist and a biologist, for example, might use different ideas and attach different levels of importance to different types of data, yet both of these academic disciplines make extensive use of statistics to view the world.
2.2.4 Hypotheses and Variables
A variable is a phenomenon (i.e. a thing) that varies. In many studies there is a hypothesis: a prediction, or series of predictions, derived from a theory, which is tested by measuring the relevant variables and examining how they relate to one another and to the population from which they come.
2.2.5 Research Design
The design gives guidelines on how to carry out the research; it directs the sampling method and how the data are collected and analysed. The main aim of the research design is not only to minimize all possible sources of error but also to ensure that every hypothesis under test is really tested. In other words, the research design should allow the aims of the research to be met, and it must also take account of the ethical implications of the work.
2.2.6 Sample and Population
The population is made up of all the objects or persons from which we could possibly take measurements, whereas the sample is those objects or individuals that, given the resources and constraints, we actually measure. We normally aim to obtain a sample that is representative of the population to be studied, subject to a research design that has been approved by an appropriate ethics committee.
2.2.7 Data compilation/Collection and Data Analysis
Data are collected using the most appropriate technique or instrument. Data analysis is the process of describing and summarizing the data and performing statistical tests, and many computer-based statistics packages are available to assist.
2.2.8 Results and Conclusions
Once the data have been analysed, we need to decide what the results suggest and whether any hypothesis under test has been supported or rejected. The results must be related to those of previous studies and to the existing body of theory, and we should also consider whether we have a moral duty to communicate the findings of the study.
3 MEASURING, SAMPLING AND ERROR
3.1 Population
A population is made up of all the persons, objects or phenomena that we might potentially count or measure as part of the study. For example, if we are studying why nurses in Tanzania left the profession early, our population is all the nurses in Tanzania who left the profession early. Similarly, if we are studying why nurses in a particular hospital left the profession early, our population is the nurses who left early from that hospital. It is also important to recognize that the population we intend to study may not be exactly the one we end up sampling from, since some members of the population of interest may decline to take part; there can therefore be a difference between the target population and the actual population.
3.2 Sample
A sample is made up of a proportion of the objects or persons in the total population. In most cases, because resources are scarce, we are unlikely to be able to gather information from the whole population, so we must take a sample instead.
3.3 Cases
Every sample is made up of the objects or individuals under study, which may be referred to by a variety of names, such as sampling units. For each individual or object we measure one or more variables, which may be physical (e.g. blood pressure), may represent feelings or thoughts (e.g. anxiety), or may represent events in the individual's life (e.g. number of visits to a health clinic). Each object or individual from which we take a measurement is known as a case. All statistical variables must be measurable; otherwise they cannot be handled statistically. The measurement recorded for each case is referred to as a value, and the collection of values is called the data.
For each case we ought to state the measurement, the variable being measured, the sampling unit, the actual sample and the population being studied. For instance, if we are studying hypertension in populations living near ironworks, 80 mm Hg is the measurement, diastolic blood pressure is the variable, the individuals living in the vicinity of the ironworks are the sampling units, and the sample is made up of those individuals from whom measurements were taken. The actual population is thus defined as all individuals living in the vicinity of the ironworks who are willing and able to be measured.
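The vocabulary above (case, variable, value, sampling unit, sample, data) can be sketched as a small data structure. This is only an illustration: the class name `Case`, the resident identifiers and the two readings other than 80 mm Hg are hypothetical and do not come from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One measurement taken from one sampling unit."""
    sampling_unit: str  # who or what was measured (hypothetical identifier)
    variable: str       # what was measured
    value: float        # the measurement itself
    unit: str           # unit of measurement


# Hypothetical sample: diastolic blood pressure of three individuals
# living in the vicinity of the ironworks.
sample = [
    Case("resident-01", "diastolic blood pressure", 80.0, "mm Hg"),
    Case("resident-02", "diastolic blood pressure", 92.0, "mm Hg"),
    Case("resident-03", "diastolic blood pressure", 85.0, "mm Hg"),
]

# The data are simply the collection of values across all cases.
data = [case.value for case in sample]
print(data)
```

Keeping the sampling unit and the variable alongside each value makes it harder to lose track of what a number refers to once the data are summarized.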
3.4 Statistical and Real Population
The definition of this population includes the words willing and able, which makes clear that we cannot take measurements from those who do not consent to take part in the study; in the example above there is therefore a slight difference between the population as defined and the one we originally intended to work on.
There can be a difference between the population from which the measurements were sampled, often called the statistical population, and the real or biological population, that is, everyone living in the vicinity of the ironworks. The researcher must therefore be aware that not all persons in a population are available to become sampling units. In some instances, moreover, it is not immediately clear what constitutes the sampling unit and the variables being measured.
Table 1: A range of different variables and their likely sampling units
Variable | Sample | Sampling unit
Occurrences of MRSA on a ward | Wards which record occurrences of MRSA that you collected data from | Ward
Length of stay in hospital | Hospitals that record length of stay that you collected data from | Hospital
Pulse rate | Individuals that you recorded pulse rate from | Individual
Gender | Individuals whose gender you recorded | Individual
Hospital position in league table | Hospitals used to formulate the league table | Hospital
Number of live births | Units of area from which records of live births have been collected | Unit of area
Looking at the examples in Table 1, we find that the range of variables we can use is fairly wide. We also notice that some variables can be expressed as numbers on a scale, while others, such as gender, are discrete categories. This difference is important: before choosing a suitable statistical test we have to identify the type of variable and the type of measurement scale.
3.5 Measurement Scales
Four types of measurement scale are recognized in statistics for variables: (i) the nominal scale, (ii) the ordinal scale, (iii) the interval scale and (iv) the ratio scale.
3.5.1 The nominal Scale
The nominal scale is the lowest in the hierarchy of scales. Statisticians refer to a hierarchy of scales because each scale further up the order has the features of those below it. On a nominal scale the data fall into named groups that are mutually exclusive: an item of data cannot be in more than one group. Good examples of nominal variables are ethnic group and home town. Some nominal variables have only two categories and are known as dichotomous variables; others have more.
Example: Gender with Male (0) and Female (1)
3.5.2 Ordinal Measurement Scale
Ordinal-scale data are similar to nominal-scale data, but the names of the groups carry an idea of position or rank. An example is the grades of nursing staff within a health service, where group names such as staff nurse, charge nurse and matron convey an order or position. The positions do not, however, indicate how much higher or lower one is than another. The scale is used to arrange the measures from lowest to highest, but we would not say that a charge nurse is twice as high as a staff nurse. Ordinal scales can be expressed as names, as numbers (e.g. 1st and 2nd in a race) or as letters, as in the UK system for grading nurses (e.g. A-I).
Remember that a study may use more than one measurement scale. It is worth practising identifying the appropriate scale because, for every statistical test, it is the scale on which the measurement is made that largely determines the type of test to be applied.
The following are examples of ordinal-scaled variables:
Variable: Frequency
Never (0) Rarely (1) Sometimes (2) Frequently (4) Always (5)
Variable: Satisfaction
Dissatisfied (1) Satisfied (2) Very Satisfied (3)
Variable: Performance Assessment Attainment
Goal not attained (1) Goal Attained (2) Goal Exceeded (3)
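The difference between nominal and ordinal coding can be sketched in Python. This is a minimal illustration, assuming the gender and satisfaction codes shown above; the class names and the list of responses are hypothetical.

```python
from collections import Counter
from enum import Enum, IntEnum


class Gender(Enum):
    """Nominal: the codes are labels only and carry no order."""
    MALE = 0
    FEMALE = 1


class Satisfaction(IntEnum):
    """Ordinal: the codes carry rank order but not distance."""
    DISSATISFIED = 1
    SATISFIED = 2
    VERY_SATISFIED = 3


# Hypothetical responses from four participants.
responses = [
    Satisfaction.SATISFIED,
    Satisfaction.DISSATISFIED,
    Satisfaction.VERY_SATISFIED,
    Satisfaction.SATISFIED,
]

# Ordinal data can be ranked and the highest category identified.
ranked = sorted(responses)
highest = max(responses)

# Nominal data can only be counted by category. A plain Enum defines no
# ordering, so Gender.MALE < Gender.FEMALE would raise a TypeError.
gender_counts = Counter([Gender.MALE, Gender.FEMALE, Gender.FEMALE])

print([r.name for r in ranked], highest.name, gender_counts[Gender.FEMALE])
```

Although the ordinal codes are integers, differences or ratios between them are not meaningful: Very Satisfied (3) is not three times Dissatisfied (1).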
3.5.3 Ratio or Interval Measurement Scales
Ratio and interval scales also use positions, but here the distance between the positions is defined and the points between the positions can be subdivided. Ratio scales are like interval scales except that they have a true zero point. For instance, if we record temperature we know that the difference between 37°C and 38°C is 1°C, so the scale is interval; but the Celsius (centigrade) scale does not have a true zero point, so it is not possible to say that 20°C is twice as hot as 10°C. An example of a ratio scale is weight, which does have a true zero. Many measures of biological phenomena are made on either the interval or the ratio scale.
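The temperature example can be checked with a short calculation. The sketch below brings in the Kelvin scale, which is not discussed in this paper; it is used here only because it has a true zero, so ratios on it are meaningful. The function name is hypothetical.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading to Kelvin, a scale with a true zero."""
    return celsius + 273.15


# Differences are meaningful on an interval scale.
difference = 38.0 - 37.0

# Dividing Celsius readings gives a number, but not a meaningful one.
naive_ratio = 20.0 / 10.0

# On a scale with a true zero, 20°C is only slightly "hotter" than 10°C.
true_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)

print(difference, naive_ratio, round(true_ratio, 3))
```

The naive ratio is 2, whereas the ratio on the Kelvin scale is about 1.035, which is why statements such as "twice as hot" should not be made from interval-scale data.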
3.6 Errors
When conducting a study, error can arise in many different ways. Statistics can be used to assess or account for some of this error, but not all of it. We begin by looking at three kinds of error that we need to think about when conducting our own research or thinking critically about the studies of other researchers. All studies contain some error, and the more error that is present, the less the researcher can rely on the study.
3.6.1 Measurement Error
Measurement error arises whenever we measure things, since most of the instruments we use are not one hundred per cent accurate. Even when we are doing something as simple as recording the number of patients who turn up at a GP surgery, or taking a measurement with a ruler, errors of measurement will occur. To some degree, the error associated with using apparatus and instruments to measure phenomena is easier to account for than the error associated with methods such as observation used to record human behaviour.
Measurement errors occur in most studies, even when the variable is simply a count of something, since those doing the counting often make mistakes. Measurement errors may be random or systematic. A random error occurs by chance and is equally likely to be lower or higher than the true value; such errors are not important provided they are sufficiently small relative to the overall variation in the data. Systematic errors, on the other hand, are consistently either higher or lower than the true value, so the factors that could lead to systematic errors have to be considered at the design stage.
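The distinction between random and systematic error can be illustrated with a small simulation. This is a sketch under stated assumptions: the true value of 80, the random spread of 2 units and the systematic bias of 3 units are all hypothetical, and normally distributed random error is assumed for convenience.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_VALUE = 80.0  # hypothetical true diastolic pressure, mm Hg
RANDOM_SD = 2.0    # hypothetical spread of the random error
BIAS = 3.0         # hypothetical systematic error (instrument reads high)
N = 10_000         # number of simulated readings

# Random error only: readings scatter evenly above and below the truth.
random_only = [TRUE_VALUE + random.gauss(0, RANDOM_SD) for _ in range(N)]

# Random plus systematic error: every reading is shifted the same way.
with_bias = [TRUE_VALUE + BIAS + random.gauss(0, RANDOM_SD) for _ in range(N)]

mean_random = statistics.mean(random_only)
mean_biased = statistics.mean(with_bias)

print(round(mean_random, 1), round(mean_biased, 1))
```

Averaging many readings cancels most of the random error, so the first mean lies close to the true value, but the systematic error survives averaging: the second mean stays about 3 units too high however many readings are taken.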
To deal with measurement error we should choose the measurement tool that gives the greatest level of accuracy, while taking into account the resources available: high accuracy may involve spending more money or time. If we are using surveyors or interviewers, we have to make sure that they are trained in how to use the data collection tools (e.g. the questionnaire).
3.6.2 Consistency
We also need to be aware that some errors arise because measurement tools are inconsistent: for a given item being measured, an instrument such as a peak flow meter will not always give the same reading. Instruments therefore need to be calibrated before they are used. It is advisable to measure how consistent the instruments are between measurements and between the people using them, so that we can take the error into account when discussing our results.
3.6.3 Design Error
Design error is likely to occur if the design of the study is flawed. There are varying degrees to which a study can be flawed, and most studies are flawed in some way. If the design error is large, however, the study will need to be abandoned, so it is very important to get the design right. It is important to be aware of the errors we are likely to come across; the most common is that the sample is not drawn truly at random from the population, so that the conclusions drawn do not apply to the population as a whole.
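Drawing a simple random sample, in which every member of the population has the same chance of selection, can be sketched with Python's standard library. The population of 200 nurse identifiers and the sample size of 20 are hypothetical.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 200 nurses identified by a code.
population = [f"nurse-{i:03d}" for i in range(1, 201)]

# random.sample draws without replacement, so no nurse is chosen twice
# and every nurse has the same chance of being selected.
sample = random.sample(population, k=20)

print(len(sample), len(set(sample)))
```

In practice the difficulty lies less in the drawing than in obtaining a complete list of the population to draw from.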
3.6.4 Sampling Error
Many statistical techniques try to take account of sampling error, which is the difference between the sample and the population. If we take a sample, it is unlikely that a measure calculated from the members of the sample will exactly match that of the population. When we examine how representative a sample is of the population, or whether two samples come from the same population, we need to take account of sampling error. Sampling error increases with the amount of variation within the sample being measured and decreases as the sample size increases.
The relationship between sample size, sample variation and sampling error is important, since one question we are bound to ask is how large the sample should be. The answer is that it depends on the variation within the sample. If we were working with the variable height in a population of basketball players, the variation in height would be very small and the sample we would need would also be small. If, however, we were working with height across the global population, the variation would be large and a larger sample would be needed.
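The relationship between variation, sample size and sampling error can be illustrated by simulation. The sketch below repeatedly draws samples from a normally distributed population and measures how much the sample means vary; the population mean of 170 cm, the standard deviations of 3 cm and 10 cm, and the sample sizes are all hypothetical. For comparison it uses the standard result that the standard error of a mean is the population standard deviation divided by the square root of the sample size.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def spread_of_sample_means(pop_sd, n, trials=2000, pop_mean=170.0):
    """Standard deviation of the means of `trials` samples of size `n`."""
    means = []
    for _ in range(trials):
        heights = [random.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.mean(heights))
    return statistics.stdev(means)


# Little variation in the population (e.g. a homogeneous group).
narrow_small = spread_of_sample_means(pop_sd=3.0, n=25)
# Much variation, same sample size: sampling error is larger.
wide_small = spread_of_sample_means(pop_sd=10.0, n=25)
# Much variation, larger sample: sampling error shrinks again.
wide_large = spread_of_sample_means(pop_sd=10.0, n=100)

for label, observed, sd, n in [
    ("sd=3,  n=25 ", narrow_small, 3.0, 25),
    ("sd=10, n=25 ", wide_small, 10.0, 25),
    ("sd=10, n=100", wide_large, 10.0, 100),
]:
    expected = sd / math.sqrt(n)  # theoretical standard error of the mean
    print(f"{label}: observed {observed:.2f}, expected {expected:.2f}")
```

Because the sample size enters through its square root, quadrupling the sample from 25 to 100 only halves the sampling error.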
4 QUESTIONNAIRES
Questionnaires are a recognized approach adopted by many health-related studies. The subject of questionnaires and research design is enormous, however, and cannot be addressed in detail here. Questionnaires can be used to obtain information from people about a particular subject. This section describes the use of closed questions, where the range of answers is determined by the researcher, because these lend themselves to statistical analysis. Once a questionnaire has been constructed it is usually referred to as a tool or an instrument.
4.1 Questionnaire Design
The term questionnaire usually means a form containing a set of predetermined questions used to gather information (data) from and about people as part of a survey. (The term survey describes a research approach that attempts to cover as wide a range of the population as possible in order to acquire information about a subject.)
Using questionnaires requires a great deal of time and effort in careful planning, ordering and sequencing of the questions and responses in order to obtain relevant data, and questions of design need to be addressed early in the study.
The main questions to ask when thinking about designing a questionnaire are:
What is it that I wish to find out, otherwise, is it to do with knowledge or attitudes or
levels of understanding, or is it about behaviour or activities or decision making and for
what is it that I wish to find out?
Questionnaires may be used in different kinds of research, e.g. descriptive, attitudinal and comparative, and there are many different kinds of questionnaire; the choice will depend on the research question being asked. Questionnaires may be used to produce both qualitative and quantitative data, as a means of describing a population, to investigate cause and effect, and to monitor change over time. Frequently, questionnaires attempt to describe a population’s behaviour, attitudes and views with regard to a certain topic, or its level of understanding of an issue. A survey frequently uses questionnaires to obtain information, and sometimes an interviewer may also be used. The people taking part in the survey are often termed subjects, respondents or participants.
When used as a quantitative tool, questionnaires help to put numerical indicators on these phenomena. They are a good method of data collection, although the data may tend to be superficial, as there is no room for probing the meaning of the responses. However, questionnaires are a cost-efficient way of collecting large quantities of data in a short period of time, and if the questionnaire is properly structured the data can be subjected to statistical analysis. Questionnaires can be administered not only by face-to-face interview but also via the World Wide Web, telephone, mail and e-mail.
4.2 Sample Questionnaire
I will describe a questionnaire that describes the behaviour of a group of people. Let us say, for instance, that I wanted to know more about the epidemiology of HIV and AIDS in a certain village. It may be useful to know more about the general sexual health of the population in that village. Safe sex is much talked about in the media, the only way of keeping safe is to use barrier methods of contraception, and the government is spending a large amount of money on health promotion programmes to educate the general population about safe sex.
**How many sexually active people are there in the village? Are all sexually active people in the village using a barrier method of contraception, such as condoms, to practise safe sex?
A survey has to be used to give an indication of how many, and a questionnaire then has to be planned for distribution to sexually active adults to discover what types of sexual protection (i.e. femidom, condom or dental dam) are used. This type of survey will try to discover when those individuals used protection, when they were most likely and least likely to use protection, and how much money was spent on sexual protection per week/month. This might be helpful information in a health promotion context:
• When running a safe-sex campaign within a family planning village clinic,
• or in a genito-urinary village clinic.
• When recognizing specific health promotion needs by identifying trends in client/patient behaviour.
• When considering social policy issues relating to such health matters as national trends in the development of HIV/AIDS.
First, I must define the population that is to be sampled and obtain permission, for instance, from adults over eighteen years of age. This means I am not going to get any information about those under eighteen, as they are not part of the target population.
The questionnaire example
This survey is attempting to find out about the use of barrier methods of contraception
as protection. Your answers will be treated in confidence and will help us plan health
care services. Please indicate your responses by placing a cross in the box next to the
answer you think best represents your answer:
(1) Strongly agree (2) agree (3) not sure (4) disagree (5) strongly disagree
Q1. I use the following types of barrier protection:
Always 1 2 3 4 5 Never
None
Condom
Femidom
Dental dam
Cap (dutch cap)
Q2. When are you most likely to use sexual protection?
Agree 1 2 3 4 5 Disagree
I never use barrier methods of sexual protection
I sometimes use protection when I remember
I use a condom every time I have anal penetrative sex
To avoid pregnancy
Allergic to latex
To avoid getting a sexually transmitted disease
Q3. Please indicate your circumstances: which of the following categories applies to
you? Tick those that do.
Single
Married or cohabiting
Living with spouse
I have many sexual partners
I have one sexual partner
• Thank you for completing this questionnaire.
• Please return it in the enclosed pre-paid envelope.
• Your responses will be treated in confidence.
4.3 Scales from the Sample Questionnaire
When asking respondents a question we frequently provide a range of choices so that they can indicate where their answer lies on a scale. The most commonly used scale is the Likert scale, which measures the degree to which a person agrees or disagrees with a statement. The scale usually runs from 1 to 5, for example (1) strongly disagree, (2) disagree, (3) not sure, (4) agree and (5) strongly agree. One common problem with a scale based on an odd number of choices (here 5) is that it always offers the middle point, and it is often easier for subjects to take this easy option than to struggle to make a decision.
Another type of scale is the Thurstone scale, which uses only two points: agree and disagree. It is customary to ask numerous related questions, which can be used to produce an overall score for an individual respondent that can be compared with that for the sample as a whole, or used to compare different populations. A common example of the use of Thurstone scales is in psychometric tests of aptitude and attitude, such as a test taken as part of a job interview. An alternative approach is to use semantic differential scales, which are good for investigating phenomena such as attitudes and values, as they are based on opposite points of view or possible emotions concerning a subject or concept.
For example, when investigating people’s job environment, we could ask questions about how they felt about aspects of the environment, and the response scale might range from helpful, nurturing and happy to unsupportive, blocking, toxic and dysfunctional.
The concept in Table 2 below is day surgery; patients are asked to indicate where on each scale they lie. You would probably want to ask more questions to get a good overview of an individual’s impressions of day surgery. Note that there is no consistent negative end of the scale. This helps to persuade respondents to think about their answers.
Table 2: A patient’s responses when asked to consider day surgery (questions asked pre-operatively)
Exciting     1 2 3 4 5 6  Boring
Frightening  1 2 3 4 5 6  Calming
Useful       1 2 3 4 5 6  Useless
Fast         1 2 3 4 5 6  Slow
5 THE STUDIES
The studies provide the background and data from two hypothetical studies, one a clinical trial and the other a questionnaire. They enable us to complete a basic analysis of data sets, to draw some conclusions, and to be in a better position to analyze other studies critically. They have been devised to be relevant to modern health care and to stimulate our interest in and enjoyment of data analysis (the questionnaire example follows).
5.1 The questionnaire Study
The following example questionnaire concerns the sexual health of individuals who presented for advice at a walk-in clinic in central Kigoma. Here we show the extended version, although for the sake of simplicity we have reduced the extent of some of the questions. The aim of this study is to provide basic information on the sexual behaviour of individuals of differing sex, age and ethnic group who used the walk-in clinic. It was initiated because the clinical leader considered that a large proportion of the clients were presenting with symptoms related to sexual health, and he was considering putting in a bid for funds to support the employment of a specialist in this area. Consequently, it is largely descriptive and exploratory in nature. The population of the study is all those individuals who could potentially use the walk-in centre, and the sample is made up of those who actually did enter the clinic and complete the questionnaire.
5.2 The Questionnaire
This survey is attempting to find out about the use of the walk-in clinic in relation to
sexual health. Your answers will be treated in confidence and will help us plan health
care services. Please indicate your responses by placing a cross in the box next to the
answer you think best represents your answer.
1. What made you attend the walk-in centre today?
Agree Disagree
i. Location
ii. Availability
iii. Access to medical staff
iv. Access to nursing staff
v. Emergency treatment required
2. What symptoms are you experiencing? Please tick the one you feel best describes
your symptoms
i. Headache
ii. Chills/shakes
iii. Temperature
iv. Lack of appetite
v. Feeling generally unwell all over
vi. Cough
vii. Pain/soreness in chest
viii. Faintness
ix. Collapse
x. Pain or difficulty passing urine
xi. Discharge from penis
xii. Discharge from vagina
3. Which of the following descriptions best represents your sex life?
(Please tick.)
i. Very active (I have sex more than five times per week)
ii. Active (I have sex between once and five times per week)
iii. Not very active (I have sex less than once per week)
iv. Non-existent
4. Please indicate how many sexual partners you have shared sex with in the past
month.
5. My most frequent choice of barrier protection is: (Please tick one)
i. None
ii. Condom
iii. Femidom
iv. Dental dam
v. Cap (Dutch cap)
6. When are you most likely to use sexual protection?
Agree / Disagree
i. I never use barrier methods of sexual protection
ii. I sometimes use protection when I remember
iii. I use protection (put a condom on) if I do not know my sexual partner very well
iv. I use protection (put a condom on) when I think I am going to climax
v. I use a condom every time I have oral sex
vi. I use a condom every time I have vaginal penetrative sex
vii. I use a condom every time I have anal penetrative sex
viii. To avoid pregnancy
ix. To avoid getting a sexually transmitted disease
7. What puts you off using sexual protection?
Agree / Disagree
i. Not very comfortable
ii. Loss of sensation/cannot feel anything
iii. Too fiddly to have to open packets
iv. Cost
v. By the time the packet is open the moment has gone
vi. Need to use a lubricant as well
vii. Have to plan sex in advance
viii. Allergic to latex
8. How much money do you spend, on average, on sexual protection per week? Please
indicate which category applies to you, using a tick:
i. £0
ii. £0–£5
iii. £5–£10
iv. £10–£15
v. I get them free
9. My age group is: Please indicate which category applies to you using a tick.
16–24 25–34 35–44 45–54 55–64 65–74 75–100
11. I consider my sexual orientation to be: (Please tick one)
i. Heterosexual male
ii. Heterosexual female
iii. Gay (homosexual) male
iv. Gay (lesbian) female
v. Bisexual male
vi. Bisexual female
vii. Asexual
12. I consider my ethnic background to be (Please tick):
i. African
ii. Asian
iii. European
**The questionnaire has been simplified for the purposes of my assignment.
When entering data into charts or a computer it is usually easiest to use one line (row) for every subject and to enter the values of the measured variables across the row; codes are used throughout to simplify data entry.
In Table 3 the following codes have been used: M for male and F for female; a code from 1 (very active) to 4 (non-existent) for sexual activity; and codes 1–5 for barrier choice, recording the protection the participant selected as most used, where 1 is the first possible response (none) and 5 the last (cap).
For question 7 we have used a similar system to that used for barrier choice, where (i) is the first option and (vii) the last. For the presentation type we have also used the same system as for questions 5 and 7, with 1 equivalent to headache and 12 to discharge from vagina.
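The coding scheme described above can be sketched as a pair of lookup tables. The labels are taken from questions 3 and 5 of the questionnaire; the function name `decode` and the fallback label "not coded" (for values such as 0 that the text does not define) are assumptions made for this illustration.

```python
# Code books built from questions 3 and 5 of the questionnaire.
SEXUAL_ACTIVITY = {1: "Very active", 2: "Active", 3: "Not very active", 4: "Non-existent"}
BARRIER_CHOICE = {1: "None", 2: "Condom", 3: "Femidom", 4: "Dental dam", 5: "Cap (Dutch cap)"}
SEX = {"M": "Male", "F": "Female"}


def decode(sex, activity, barrier):
    """Translate one subject's coded values back into readable labels."""
    return {
        "sex": SEX[sex],
        # Codes that the text does not define are reported as "not coded".
        "sexual_activity": SEXUAL_ACTIVITY.get(activity, "not coded"),
        "barrier_choice": BARRIER_CHOICE.get(barrier, "not coded"),
    }


# Individual 2 in the data table: female, activity code 2, barrier code 2.
print(decode("F", 2, 2))
```

Keeping the code book separate from the data means the short codes can be typed quickly at data entry and translated back only when results are reported.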
5.3 The Clinical Trial
This experimental trial looks at the ability of the novel drug Symphadiol (tested against a placebo) to increase weight loss in individuals who are trying to lose weight using a calorie-controlled diet. The clinical trial is being organized by Spinto, a drug company active in the field. Spinto has recruited individuals through a network of dieters’ groups; they were invited to take part by Spinto’s clinical trials specialist nurse, who conducts the study with the support of local GPs and Spinto’s dietician.
The aim of the experiment is to test the hypothesis that a daily dose of Symphadiol enhances weight loss in clinically obese individuals, compared with just using a calorie-controlled diet. It was decided to select men between the ages of twenty-two and forty for the study. It was also decided to look at the impact of exercise in conjunction with Symphadiol. The population in this study is all healthy obese male individuals who are sufficiently motivated to lose weight to join a diet network; they must not be taking any medication, except that required for minor ailments.
Table 3: Example of data from the questionnaire
Question    13   12            3                4         5               7           2             9
Individual  Sex  Ethnic group  Sexual activity  Partners  Barrier choice  Reason not  Presentation  Age
1           F    E             3                1         1               1           1             22
2           F    E             2                2         2               5           7             28
3           F    E             1                8         2               7           4             21
4           F    AS            2                1         2               1           9             26
5           F    AS            4                0         0               5           12            27
6           F    AS            2                1         2               7           6             30
7           F    AS            4                0         0               5           5             29
8           F    E             2                2         1               5           12            23
9           F    E             4                0         0               3           10            21
10          M    E             2                1         3               2           9             32
11          M    E             1                1         5               2           3             18
12          M    A             1                1         1               7           10            24
13          M    A             1                3         5               6           12            23
13          M    AS            2                0         2               1           7             23
14          M    AS            1                4         0               3           12            20
15          M    A             3                0         0               5           1             36
16          M    E             4                4         2               5           2             20
17          F    E             4                1         3               1           5             23
18          M    E             1                0         2               3           3             24
19          M    E             3                2         1               3           3             20
20          F    A             2                0         2               1           12            36
6 DESCRIPTIVE STATISTICS
Descriptive statistics are used in health care for many purposes, mostly to describe and summarize detailed data collected in a study in a manner that can be interpreted quickly and easily. They are the kind of statistic that you are most likely to encounter. They are used to administer, monitor and evaluate health services and the people who work in a health facility or organization. Consequently, understanding descriptive statistics is very important for individuals who work in health care.
Descriptive statistics are usually of two kinds:
(1) Measures of central tendency (the typical value): the mean, the median and the mode.
(2) Measures of variability about the typical value (measures of dispersion): the range, inter-quartile range, variance and standard deviation.
6.1 Measures of Central Tendency
A measure of central tendency is a single value that tries to describe a data set by identifying the position of the centre of that data set. Such measures are sometimes known as measures of central location, and they are also classed as summary statistics. The mean, often known as the average, is probably the measure of central tendency that we are most familiar with; there are also the median and the mode.
6.1.1 Mean (Arithmetic)
The arithmetic mean is usually known as the average. To calculate the average time spent in a sports ground in a month, a scientist takes the time spent during every visit, adds the times together and divides by the number of visits. Table 4 below, for example, shows the time that Amani spent in the sports ground during one month. The mean (average) length of time spent in the sports ground is 48.17 min. Here the score of 48.17 min falls roughly in the centre of the data, with six visit times greater and six visit times less than 48.17, but this is not always the case with the mean.
Calculations:
Table 4: Time spent by an individual at the gym during July
Visit Time spent (minutes)
1 62
2 34
3 50
4 40
5 58
6 48
7 38
8 60
9 58
10 45
11 53
12 32
To calculate the mean:
Mean = (sum of the minutes spent in each visit) / (total number of visits)
= (32 + 34 + 38 + 40 + 45 + 48 + 50 + 53 + 58 + 58 + 60 + 62) / 12
= 578 / 12
Mean = 48.17 (to two decimal places)
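The same calculation can be checked with a few lines of Python. This is a minimal sketch using the visit times from Table 4; the variable names are chosen for this illustration. Note that 578 / 12 = 48.1666..., which rounds to 48.17 at two decimal places.

```python
# Minutes spent at each of the twelve visits (Table 4).
times = [62, 34, 50, 40, 58, 48, 38, 60, 58, 45, 53, 32]

total = sum(times)         # sum of all the cases
n = len(times)             # number of cases
mean = total / n           # mean = sum of the cases / number of cases

print(total, n, round(mean, 2))
```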
Most statistics involve computations that can be performed on a basic calculator, which lets a scientist compute basic statistics more quickly; if data sets become large, a computer package can be used. Because statistics is a branch of mathematics, statistical tests are frequently represented as formulae and rely heavily on symbols. Learning to recognize them is akin to learning a new language, which more often than not persuades people that they cannot understand statistics. Here is an introduction to a few of them:
x   An individual case, such as a single visit to the sports ground.
n   The number of cases (x), so twelve visits to the sports ground gives n = 12.
Σ   ‘The sum of’, e.g. all the x values added up; in the case above, Σx = 578.
x̄   (x-bar) The symbol representing the mean.
Instead of using words, another way to describe the mean is with the following formula:
\bar{x} = \frac{\sum x}{n}
**This formula is used when calculating the mean for a sample obtained from a population.
If you have difficulty following a statistical formula, it is advisable to write out every symbol in words. For the mean we therefore have:
Mean = (the sum of the cases) / (the number of cases)
One disadvantage of the mean is that it can mislead if some of the values are unusually large or small. If, for example, Amani had one day been in the sports ground for only 5 min, adding this value to the data would affect the average score, giving 44.85 min as the average time (see below).
5, 32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62    Mean = 44.85
This unusual score of 5 minutes makes the mean much lower; hence the mean is no longer a true reflection of the central point. Such extreme values are called outliers.
Symbols used when discussing populations:
N The size of the population.
µ The mean of the population.
σ The standard deviation of the population.
6.1.2 Median
The median helps to solve the problem of outliers because, rather than using every value to calculate the measure of central tendency, it uses only the value found at the physical centre of the data. The data must be put in ascending order to calculate the median. The two examples below use the exercise times, the first including the low value; they show that an outlier has little impact on the median, even though it occurs only at one end.
5, 32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62
Median = 48
Removing the extreme value moves the median by only 1. With an even number of values the middle falls between two values, so it is necessary to add the two middle numbers and divide by 2, for example:
32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62
Median = (48 + 50) / 2 = 49
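The contrast between the mean and the median in the presence of an outlier can be reproduced with Python's standard `statistics` module. This is a small sketch using the exercise times from the text; the list names are chosen for this illustration.

```python
import statistics

times = [32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62]
with_outlier = [5] + times   # add the unusually short 5-minute visit

# The median barely moves when the outlier is added...
median_plain = statistics.median(times)           # even count: average of 48 and 50
median_outlier = statistics.median(with_outlier)  # odd count: the middle value

# ...whereas the mean is pulled down noticeably.
mean_plain = round(statistics.mean(times), 2)
mean_outlier = round(statistics.mean(with_outlier), 2)

print(median_plain, median_outlier)
print(mean_plain, mean_outlier)
```

With these data the median changes by 1 minute, while the mean drops by more than 3 minutes.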
6.1.3 Mode
The mode is the value that occurs most frequently; in the data set below it is 58.
32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62
The mode shows the most common value in a data set. One advantage is that it can be used on both nominal and continuous data, and for nominal data it is the only available measure of central tendency, whereas the mean requires interval or ratio data and the median requires data that can at least be ranked. For instance, suppose one question in a lifestyle survey asks where students would primarily go for family-planning advice:
• GP 5
• Practice nurse 4
• Family planning clinic 6
• Friends 8
• Chemist 2
• Nowhere 2
Consequently, ‘Friends’ is the most commonly occurring category, with a score of 8. For the data listed above, the mean and median would not be meaningful.
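Finding the mode for both kinds of data discussed above can be sketched as follows. The counts are those given in the survey example and the numeric list is the exercise-time data; the variable names are chosen for this illustration.

```python
import statistics
from collections import Counter

# Nominal data: where students would go for family-planning advice (category -> count).
responses = Counter({
    "GP": 5,
    "Practice nurse": 4,
    "Family planning clinic": 6,
    "Friends": 8,
    "Chemist": 2,
    "Nowhere": 2,
})
mode_category, mode_count = responses.most_common(1)[0]
print(mode_category, mode_count)

# Numeric data: the most frequently occurring exercise time.
times = [32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62]
mode_time = statistics.mode(times)
print(mode_time)
```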
6.2 Choosing a Measure of Central Tendency
The mean is the most commonly used measure because it is the most sensitive, in that it takes into account the value of every case in the distribution. It is also mathematically based, so it can be used in later statistical computations. However, it can only be used on data measured at the interval or ratio level, and it is easily distorted by outliers.
Table 5: Choosing a Measure of Central Tendency
Measure   When to use                                    When not to use
Mean      Interval or ratio data                         Categorical data
          For most data sets, where the cases are        Ordinal data
          more or less symmetrically distributed         When there are outliers or the data
          about the mean                                 are heavily skewed
          Where the measure is going to be used in
          further calculations
Median    Interval, ratio or ordinal data                When the measure will be used in
          Data heavily skewed, mean distorted by         further calculations
          outliers
Mode      Categorical data
          Ordinal data
In many situations there is no right or wrong measure of central tendency. People like to choose the arithmetic mean because they are familiar with it, but it is best to keep in mind that the main point of descriptive statistics is to organize and communicate information. A statistician should therefore choose the measure that conveys the information in the best possible way and does not mislead the audience.
6.3 Measures of Dispersion
Having dealt with central values, we now look at statistics that describe how scores differ within a sample. The term given to measures of the level of variability in the data is dispersion; if there were no variation in populations, there would be little need for statistics.
For example, two groups of students, one part time and one full time, were asked at what age they had their first experience of sexual intercourse with another person.
Group A: 14, 15, 18, 22, 23    \bar{x}_A = 18.4 (the subscript tells us that the mean is that of group A)
Group B: 17, 18, 19, 19, 20    \bar{x}_B = 18.6
Whilst the means are almost the same, it is obvious that there is a difference in the variability of the values of the cases within the groups. In terms of using the statistics to develop health
care practices, knowing the variability of values is as important as knowing the mean.
After all, if shoe manufacturers only made shoes for people with average-size feet they
would soon go out of business. We want to know about the variability in patients’ health
that we will encounter and in their behaviors.
6.3.1 Range
The range is one measure of dispersion. To calculate it, subtract the lowest score from the highest, so the range for group A is: 23 - 14 = 9
The range for group B is: 20 - 17 = 3
This means that the scores for group A are spread over 9 units, whereas those for group B cover only 3. Group A therefore has a greater range of values in its cases. The range is a quick and easy way of approximating the level of variation within a sample, but you must be careful of outliers.
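The comparison of the two groups can be sketched in a few lines. The data are the ages given for groups A and B; the helper name `value_range` is chosen for this illustration.

```python
group_a = [14, 15, 18, 22, 23]
group_b = [17, 18, 19, 19, 20]


def value_range(values):
    """Range = highest score minus lowest score."""
    return max(values) - min(values)


def mean(values):
    return sum(values) / len(values)


# The means are close, but the ranges differ markedly.
print(mean(group_a), value_range(group_a))
print(mean(group_b), value_range(group_b))
```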
6.3.2 Inter-quartile Range
The inter-quartile range (IQR) is an alternative that can be used to deal with outliers. As the name suggests, quartiles are obtained by quartering the data set: the data are placed in ascending order and divided into four quarters, and the values at the limits of these quarters are known as quartiles. The inter-quartile range is in fact the range of the middle 50% of the data. It is computed by locating the upper value of the first quartile (the first 25% of the cases) and subtracting it from the upper value of the third quartile. For example, a group of teenagers were asked for how many years they thought it was safe to take the contraceptive pill (Q = quartile).
2, 2, 4, 5, 8, 10, 10, 10, 12, 15, 15, 30
Q1 Q2 Q3
For the above data set the range is twenty-eight, but it is affected by the case with the value of thirty. There are twelve cases in the data set, so there are three cases in every quartile. At the end of the first quartile, i.e. the first 25% of the cases, the value is four, and at the end of the third quartile, i.e. the first 75% of the cases, the value is twelve. The difference between these two values (eight) is the inter-quartile range (IQR).
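The procedure just described can be sketched as below. It follows the convention used in this text (take the last value of the first and third quarters of the sorted data) and assumes the number of cases divides evenly by four; statistical packages often use other quartile conventions and may therefore report slightly different values. The function name `quartile_ends` is chosen for this illustration.

```python
def quartile_ends(values):
    """Return the upper values of the first and third quarters of the sorted data.

    Assumes len(values) is a multiple of 4, as in the worked example.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n % 4 != 0:
        raise ValueError("this sketch only handles data sets that split into four equal quarters")
    quarter = n // 4
    return ordered[quarter - 1], ordered[3 * quarter - 1]


years = [2, 2, 4, 5, 8, 10, 10, 10, 12, 15, 15, 30]

full_range = max(years) - min(years)   # affected by the outlier of 30
q1_end, q3_end = quartile_ends(years)
iqr = q3_end - q1_end                  # range of the middle 50% of the data

print(full_range, q1_end, q3_end, iqr)
```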
6.4 The Standard Deviation and Variance
The variance and the standard deviation are the two most common measures of dispersion; both indicate the amount of deviation from the mean. The standard deviation is the one most often quoted when describing data, while the variance is used in numerous statistical tests.
The standard deviation measures the variation in the data by evaluating how much every case deviates from the mean. For example, if the mean is six and an individual case is eight, the deviation is two.
That takes care of the ‘deviation’ part of the statistic, but it is actually the ‘standard’ part that is significant: a statistician has to take all the deviations and relate them to the size of the mean to obtain the standard deviation. This is important because a deviation of 2 is of much less significance if the mean is 110 than if the mean is 8.
Table 6: Age at first sexual experience (with another person)
Case                            1   2   3   4   5   6   7   8   9   10
Age at first sexual experience  14  15  17  18  18  19  19  20  22  23
The standard deviation formula given below looks complex but actually involves quite straightforward mathematics; using the data in Table 6 above, we work through the formula step by step.
s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}
s is the symbol for the standard deviation and
s² (s squared) is the symbol for the variance.
Hence the standard deviation is simply the square root of the variance.
The calculation steps are outlined below:
Step 1: Calculate the mean.
Step 2: Subtract the mean from each value
14 - 18.5 =-4.5
15 - 18.5 =-3.5
17 - 18.5 =-1.5
18 - 18.5 =-0.5
18 - 18.5 =-0.5
19 - 18.5 = 0.5
19 - 18.5 = 0.5
20 - 18.5 = 1.5
22 - 18.5 = 3.5
23 - 18.5 = 4.5
Step 3: Square each answer obtained in step 2.
Step 4: Add up all the answers to step 3. This value is called the sum of squares (here 70.5).
Step 5: Subtract 1 from the size of your sample (n - 1 = 10 - 1 = 9).
Step 6: Divide the value found in step 4 by the value calculated in step 5 (70.5 / 9 = 7.83). This is called the variance.
Step 7: Find the square root of the value obtained in step 6 to determine the value of one standard deviation: √7.83 = 2.80
Therefore our sample standard deviation is 2.80.
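The seven steps can be followed one by one in code. This is a minimal sketch using the ages from Table 6, with variable names chosen for this illustration; the result is cross-checked against `statistics.stdev`, which also uses the n - 1 divisor.

```python
import math
import statistics

ages = [14, 15, 17, 18, 18, 19, 19, 20, 22, 23]

mean = sum(ages) / len(ages)                 # Step 1: the mean
deviations = [x - mean for x in ages]        # Step 2: subtract the mean from each value
squares = [d ** 2 for d in deviations]       # Step 3: square each deviation
sum_of_squares = sum(squares)                # Step 4: the sum of squares
degrees = len(ages) - 1                      # Step 5: n - 1
variance = sum_of_squares / degrees          # Step 6: the variance
sd = math.sqrt(variance)                     # Step 7: the standard deviation

print(mean, sum_of_squares, degrees, round(variance, 2), round(sd, 2))

# Cross-check with the standard library's sample standard deviation.
print(round(statistics.stdev(ages), 2))
```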
When planning a statistical test, be careful with data sets in which the variance (the square of the standard deviation) greatly exceeds the mean, i.e. is more than about 2 × \bar{x}, or in which the variance is small relative to, or equal to, the mean. These are signs that the data set has a more complex form and needs to be handled carefully. When quoting a mean it is important always to provide a measure of dispersion, since a mean without a measure of dispersion is not easy to interpret; the size of the sample should also be included.
Table 7: Central tendency and measure of dispersion
Type of data                        Measure of central tendency         Measure of dispersion
For most data sets, where the       Mean; there should be no need to    Standard deviation; also
cases are more or less              quote any other measure because     consider giving the range
symmetrically distributed about     all measures of central tendency
the mean                            for this type of data will be
                                    similar
Interval, ratio or ordinal data;    Median, though you might also       Range and quartiles
data heavily skewed, mean           quote the mean
distorted by outliers
Nominal                             Mode                                No measure
7 DISPLAYING DATA
The two most common forms of data display are graphs and tables. Both have the same aim: to summarize and present the data in a manner that is easy to understand and take in. Displaying your data is an essential part of analyzing the data. It allows you to establish how the data are distributed, to see unusual cases and generally to get a ‘feel’ for the data.
Table 8: Acklin and Bernat (1987) data for two indices of depression in patients with a range of conditions
                    Patient type
Index               LBP    Depressives  Personality disorder  Non-patients
Egocentricity       0.31   0.32         0.42                  0.39
Sum morbid content  0.82   3.47         0.99                  0.70
Tables present information in a text-based form, so much of the detail in the data can be retained. Unfortunately, taking in lots of different numbers and seeing emerging patterns is rather difficult, and this is where graphs come in. When we present data in graphical form some of the detail tends to be lost, but it becomes much easier to see the emerging patterns. In this example we show some data from Acklin and Bernat (1987) in graphical form (Figure 1) and in tabular form (Table 8). Acklin and Bernat’s study examined the relationship between chronic low back pain and depression. In the graph and the table we have taken two of the indices of depression that they used and plotted them against patient type.
Figure 1: Indices of depression recorded for a range of different patient types (Acklin and Bernat 1987). Plotted series: Egocentricity; Sum morbid content.
Which display form makes it easier to see the trend? Which allows you to see most detail? As a general rule, the more data put into a table, the harder it becomes to read and the less likely it is to be read. Tables should be used when the data set is very simple or when the data need to be shown in great detail. The data we used for these displays are already summaries of the data collected; that is, the figures are averages.
7.1 Table Types
There are several different types of table that can be used. Your choice of table will depend on the type and number of variables that you have. In the example above (Table 8) there are two types of variable. Along the top of the table we have the nominal category ‘patient type’, whilst down the side of the table we have two interval scale variables, that is, the indices of depression.
Table 9: Frequency of different ethnic groups in a sample of 178 individuals interviewed in north-east London
Ethnic group         Frequency in sample
White European (EU)  75
African              35
Indian               32
Afro-Caribbean       26
Other                10
Other tables may have just one variable, which runs either along the top or down the side of the table; the measurement could be the frequency of occurrence of that particular variable. In such a case the table becomes a frequency table, and it is normal to have the most commonly occurring category at the top (see Table 9). Sometimes tables include summary statistics (as does Table 8). In Table 10 the bottom row is a summary of the data within the table. Tables that report the frequency values of two nominal variables simultaneously, and that include totals, are often used to help look for associations between variables. These tables are known as contingency tables.
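A frequency table of this kind, ordered with the most common category first, can be sketched as follows. The counts are those in Table 9; the percentage column is an addition made for this illustration and does not appear in the original table.

```python
# Frequencies from Table 9 (ethnic group -> count in the sample).
frequencies = {
    "African": 35,
    "Other": 10,
    "White European (EU)": 75,
    "Afro-Caribbean": 26,
    "Indian": 32,
}
total = sum(frequencies.values())

# Sort so that the most commonly occurring category comes first.
ordered = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

for group, count in ordered:
    share = 100 * count / total   # proportion of the whole sample, as a percentage
    print(f"{group:<22}{count:>4}{share:>7.1f}%")
print(f"{'Total':<22}{total:>4}")
```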
Table 10: Summary examination results for a group of 122 first-year student nurses
Exam paper  % (average)
1           53
2           46
3           58
4           43
Average     50
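A contingency table of the kind described in section 7.1 can be built by counting pairs of values. The sketch below cross-tabulates sex against ethnic group for the rows of Table 3, using the codes exactly as they appear there (E, AS, A); the text does not spell out what each ethnic code stands for, and Table 3 contains 21 rows because individual number 13 appears twice.

```python
from collections import Counter

# (sex, ethnic group code) for each row of Table 3, in table order.
rows = [
    ("F", "E"), ("F", "E"), ("F", "E"), ("F", "AS"), ("F", "AS"),
    ("F", "AS"), ("F", "AS"), ("F", "E"), ("F", "E"), ("M", "E"),
    ("M", "E"), ("M", "A"), ("M", "A"), ("M", "AS"), ("M", "AS"),
    ("M", "A"), ("M", "E"), ("F", "E"), ("M", "E"), ("M", "E"),
    ("F", "A"),
]
counts = Counter(rows)

sexes = ["F", "M"]
groups = ["E", "AS", "A"]

# Row and column totals are what make this a contingency table.
row_totals = {s: sum(counts[(s, g)] for g in groups) for s in sexes}
col_totals = [sum(counts[(s, g)] for s in sexes) for g in groups]

print(f"{'':<6}" + "".join(f"{g:>5}" for g in groups) + f"{'Total':>7}")
for s in sexes:
    cells = "".join(f"{counts[(s, g)]:>5}" for g in groups)
    print(f"{s:<6}{cells}{row_totals[s]:>7}")
print(f"{'Total':<6}" + "".join(f"{c:>5}" for c in col_totals) + f"{sum(col_totals):>7}")
```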
7.2 Types of Graph
The guiding principle when using graphs is that less is more: if too much information is included in a graph, the meaning will be obscured. There are a variety of graph types. The main types you will see are frequency charts, histograms, bar charts, pie charts, scatter graphs and line graphs. The type of chart chosen will depend on the type and complexity of the data. Many graphs are plotted against two axes, the horizontal axis and the vertical axis. Bear in mind that if your graph is drawn incorrectly it may mislead your audience.
7.2.1 Frequency Charts
Data are often divided into categories, or recorded as counts of how frequently a certain
value occurs; these counts are referred to as frequencies. One of the simplest types of data
is nominal data, where things are categorized and the number of things in each
category is then counted. A few examples of such categories are different types of disease, the
number of individuals belonging to each ethnic group, or the number of males and
females.
If the data were measured initially on an interval or ratio scale, the most appropriate
form of display is a histogram. Plotting data as a frequency histogram gives us
an idea of how the data are distributed and a feel for the data. In a frequency
histogram the x axis covers the range of values of the cases. Each distance covered by
a bar on the x axis stands for a range of possible recorded values of the measure, so
you need to decide the size of the range categories that you will use.
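The effect of the choice of range category can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_counts` and the blood pressure readings are invented for the example and do not come from the study data.

```python
from collections import Counter


def histogram_counts(values, start, width):
    """Count how many values fall in each class [lower, lower + width)."""
    counts = Counter()
    for value in values:
        index = int((value - start) // width)  # which class the value belongs to
        lower = start + index * width          # lower boundary of that class
        counts[lower] += 1
    return dict(sorted(counts.items()))


# Hypothetical systolic blood pressure readings (mmHg), invented for illustration.
readings = [112, 118, 121, 125, 127, 130, 131, 134, 138, 142, 147, 155]

# The same data summarised with two different class widths.
print(histogram_counts(readings, start=110, width=10))
print(histogram_counts(readings, start=110, width=20))
```

Narrow classes show more detail but leave few cases in each bar; wide classes smooth the picture but can hide the shape of the distribution.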
7.2.2 Pie Charts
Pie charts are an alternative form of frequency chart, best used with ordinal or
nominal scale data, since they display the count of things in each
nominal group category as a proportion of the total number of counts. The
total set of data is represented as a circle that is divided into segments, the size of
each of which reflects the frequency of the corresponding nominal group.
7.2.3 Bar Charts for Summary Statistical Information
Bar charts may also be used to display summary statistical information such
as means and standard deviations. In Figure 7.1 the means of two depression indices
are plotted on a bar chart. An indication of the variation in the data can also be plotted
on the graph; this is frequently done in the form of error bars, which are small
vertical lines with horizontal bars at the top and bottom that mark the range of the mean
plus or minus one standard deviation or standard error.
7.2.4 Scatter Graph (Scatter Plot)
A scatter graph is used when we want to see whether there is a relationship between two
variables, that is, whether the value of one variable is related to the value of the other;
for instance, height and weight are quite often closely linked. Scatter graphs may be used
with interval, ratio or ordinal data that have been collected in pairs, for example,
measurements of both the weight and the height of each participant.
The x axis carries the scale for one of the variables and the y axis the scale for the other.
For each sampling unit, a point is plotted on the graph at the place where its values of x
and y meet. If you have enough data, the graph will show the points scattered over
the surface of the graph.
7.2.5 Line Graphs
A line graph is similar to a scatter graph, except that the points are plotted
in sequence as the values increase along the x axis and a line is drawn between every
point and the next. Line graphs are ideal for showing sequences, for example
plots of a patient study over time, or of infants' growth over time. In general, line graphs
should be used only when there is good reason to presume that the line drawn
between the points actually represents what in all probability will occur.
Strictly speaking, therefore, they ought not to be used for grouped data, for instance monthly
means or counts, although in practice they frequently are. Used in this
manner, line graphs are in fact rather a good method of comparing trends
across different groups of data.
Table 11: Summary of types of graph
Type            When to use
Histogram       For showing the frequency distribution of data measured on interval or ratio scales.
Bar chart       For displaying frequencies of nominal or ordinal data; also for comparing measures
                of central tendency between groups of data measured on ordinal, interval or ratio scales.
Pie chart       Largely for showing the frequency distribution of nominal data. Try to avoid using pie
                charts to compare different groups of data.
Scatter graph   For interval, ratio and ordinal data when you want to see if two variables are
                linked. Two or more variables must be measured from each sampling unit.
Line graph      For data measured on interval, ratio or ordinal scales, particularly when you want to
                display a trend or change over time. Particularly useful for displaying trends in several
                groups of data at once. Avoid joining points if there is no reason to do so.
8 HYPOTHESIS
A hypothesis is a proposed explanation for an observation. It leads to one or more
predictions that, through our investigation and the use of statistics, we will either confirm
or reject, and in so doing we test the validity of the hypothesis. Put simply, it is a method of
synthesizing an idea or an explanation. Hypotheses are central to most studies that involve
the collection of quantitative data and statistics. They are usually built from a previous
observation or experience and lead to a prediction that there is a relationship or link
between two or more variables. For example, we may be interested in studying the
relationship between sexual activity and sexual health, or, using a placebo trial, in
obesity and how it affects post-operative recovery time. Within these broad
areas of study, we have some specific relationships we wish to explore.
Hypothesis building
• Observation: A manager of a sexual-health clinic reports that patients/clients from
certain post code areas seem to be infrequent visitors to the clinic.
• Hypothesis: Persons who live farther away from the clinic are less likely to visit.
• Study: Make a detailed analysis of the distance people live away from the clinic
and the frequency of visits.
8.1 Forming the Hypothesis for the Experiment / Study
One of the hypotheses from the first investigation is that males are less likely to use the
walk-in clinic than females. One of the hypotheses from the second study is that obese
patients treated with Placebo will lose weight faster than those not given the drug. In
each of these hypotheses the prediction is the stated direction of the difference: 'less
likely to use' and 'will lose weight faster'.
8.1.1 Experimental Approaches
Some investigations test these associations by experiment. In an experiment we keep
constant the variables in which we are not interested. For example, in the second
investigation we would divide the patients/clients into two groups and subject one
group to treatment with Placebo.
Variables that we are not interested in (e.g. the patient's sex, age and socio-economic
group) must be controlled carefully by making sure that the composition of the two groups
is alike. Because we manipulate one of the groups of people taking part in the study,
the investigation is an experiment. Individuals who take part in a study used to be
known as subjects; it is now becoming more common to refer to them as
participants, in acknowledgment that in most cases we have to get permission from
people before we study them: they are not objects to be studied, they are
participants in the study. The manipulated group is known as the
experimental or treatment group, and the group that was not subject to the
manipulation is known as the control group, which will receive a placebo. In statistical
vocabulary, the control group and the experimental group together are known as the
treatment groups. The outcomes of the experiment are then
subjected to statistical analysis in order to assess the probability of the
results occurring by chance.
Figure 2: Diagram of an experimental study
Study population
    (sampling)
Sample population
    (randomisation)
Experimental group                             Control group
First data collection (before intervention)    First data collection (same time as in study group)
Period of intervention (manipulation)          No intervention (no manipulation)
Last data collection (after intervention)      Last data collection (same time as in study group)
    (compare)
8.1.2 Non Experimental Methods or Quasi-experiments
The hypothesis from the walk-in clinic study, that males are less likely to visit the
walk-in clinic than females, would presumably not use an experiment as the basis of
the investigation but a study based on statistical analysis of data relating to the
frequency of visits by males and females. The study might need to test whether the
observed ratio of males to females visiting the clinic was likely to have occurred by chance.
Figure 3: Diagram of a quasi-experimental design with two groups
Study group before    ->  Intervention  ->  Study group after
                                            (compare)
Control group before  ->                ->  Control group after
8.1.3 Before-After Study
The before-after study is another kind of study design. It is frequently chosen since it is
rather easy to set up: it uses only one group, on which the intervention is performed. The
condition is analysed before and after the intervention to test whether there is any
difference in the problem under observation. It is considered a pre-experimental study
design, quite different from a quasi-experimental study design, because it involves neither
a control group nor randomisation.
8.2 Variables
Now that we have decided that a hypothesis is a prediction about two or more
variables, it is important to remain aware of the role of every variable when a statistical
test is applied, since we will usually have one dependent variable and one independent
variable (and possibly a confounding variable).
8.2.1 Dependent Variable
The dependent variable is a variable whose value is determined by, or dependent on,
the value of another variable; it is always the variable that is measured in experimental
designs. We may, for example, hypothesize that age and blood pressure are linked. In a
study of this relationship the dependent variable would be blood pressure: we would not
suggest that age was determined by blood pressure; instead we would predict that age was
in some way important in determining blood pressure.
8.2.2 Independent Variable
The independent variable, conversely, is the variable that the researcher considers to
determine, at least partly, the value of the dependent variable. For example, when
considering the relationship between age and blood pressure, we can suggest that age in
some way might account for the recorded level of blood pressure, and therefore age is the
independent variable. The independent variable is the variable that is fixed or
manipulated in experimental designs.
8.2.3 Other Variables
The confounding variable is another important kind of variable: one that has an influence
on the value of the dependent variable yet is not of interest with respect to the hypothesis
that is being tested. For instance, in the test of the impact of Placebo it might be that the
patient's age influences the effects of Placebo. If this is the case and we fail to ensure
that the two treatment groups have participants of similar age, then age will become a
confounding variable, and the results of our experiment might be difficult to interpret.
Consequently, potential confounding variables need to be taken into account
through suitable and carefully thought-out research designs, particularly with
respect to the sample selection.
8.3 Errors and Statistics
There are several types of error that may give false results in statistics. They fall into four
categories: random error, sampling error, measurement error and experimental
error. Much of research design and statistics involves either trying to reduce error or
trying to take account of it. One of the most significant uses of statistics is thus to help
decide whether an observed result could be due to chance, that is, caused by
sampling and other non-systematic errors.
8.4 The statistical Hypothesis
A researcher has to establish an experimental hypothesis before performing an
experiment or study to test it. In the same manner, when we test the results of the
experiment to see if they might have occurred by chance, we also establish a
statistical hypothesis. The most common form of statistical hypothesis is the hypothesis
of no difference, usually called the null hypothesis and given the symbol H0; the
alternative (experimental) hypothesis is given the symbol H1.
8.5 Types of Interaction between Variables
When conducting studies we are not always looking for the same kind of
relationship between variables. There are generally three kinds of interaction:
relationships, associations and (clear-cut) differences. Deciding on the kind of
interaction between the variables you are dealing with is an important aspect of
statistics, since it will in part determine the statistical test that you use.
9 DISTRIBUTIONS AND PROBABILITIES
One of the more important concepts in statistics is the idea that numbers are
distributed according to the frequency with which particular values occur. For example,
a data set of the number of sexual partners that each individual has during a lifetime could
contain just the values 4 or 3, but it is much more likely to be a mixture of
different numbers, from high to low. The way the numbers are mixed, or distributed,
will largely determine the type of statistical test to be used, and the easiest
way to see how the data are distributed is to plot them in a
frequency histogram (Figure 9.1).
9.1 Frequency Histograms
The frequency histogram is a kind of bar chart in which the y axis is the frequency
of occurrence of a particular case. The x axis carries a scale that is bounded by
the lowest and the highest values of the cases, with the values in between
marked off at appropriate intervals.
Figure 4: Example of a Histogram
A bar is drawn that fills the whole of each of the intervals being measured; the sides of
the bars are parallel and the width of the bars is held constant. This type of figure is
normally used for variables that are recorded on an interval or ratio scale. If your data
are on an interval or ratio scale, plotting them in this manner should be one of your very
first steps. This is because the distributions of data and numbers form the
basis of many statistical tests. Numbers may be found to be distributed in
numerous ways, but some distributions have features that can be
exploited by researchers. One such distribution that we can go on to examine is the
normal distribution, which forms the basis of numerous statistical tests.
9.2 Probability and Statistics: The Link within Probabilities and Distributions
Imagine you have a bag of laundry with equal numbers of blue and pink towels, and you
can't see into the bag. When you reach in and pull out a towel there are two possible results:
the towel will be pink or the towel will be blue. You draw out three towels from your bag.
There are 8 possible results (BPB, BPP, PBB, PBP, PPB, BBB, PPP or BBP). The
likelihood of each outcome occurring is thus 1/8. We have four combinations: all blue,
all pink, one pink and two blue, or two pink and one blue. So what is the chance of
obtaining each of these groupings? Well, for PPP and BBB it is simple: as we have
already said, the chance of each of these results is 1/8. There are three results that give us one
pink and two blue towels, so the chance of this grouping is 1/8 + 1/8 + 1/8 = 3/8.
Likewise, there are three results that give us one blue and two pink towels, so
the chance of this grouping is also 1/8 + 1/8 + 1/8 = 3/8. The type of distribution shown here is
called the binomial distribution, which is a mathematical model that describes data
whose distribution is determined by events that can occur as either of two categories.
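The towel example can be checked by listing every outcome. The short Python sketch below assumes, as the text does, that each draw is equally likely to be blue or pink; the variable names are chosen for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordered outcome of drawing three towels, each blue (B) or pink (P).
# Assumption: each draw is equally likely to be B or P, as in the text.
outcomes = ["".join(draw) for draw in product("BP", repeat=3)]

# Group the ordered outcomes by how many pink towels they contain.
pink_counts = Counter(outcome.count("P") for outcome in outcomes)

# Probability of each grouping = outcomes in the grouping / all outcomes.
probabilities = {
    pinks: Fraction(count, len(outcomes))
    for pinks, count in sorted(pink_counts.items())
}

for pinks, probability in probabilities.items():
    print(f"{pinks} pink, {3 - pinks} blue: {probability}")
```

The four groupings together account for all eight outcomes, so their probabilities add up to 1.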
The distributions of numbers and probabilities are therefore linked, which allows us to make
predictions. Fortunately, it just so happens that many natural phenomena produce sets of
data that have a distribution similar to the one above. This distribution is known as
the normal or Gaussian distribution, and it forms the basis of many of the most
commonly used statistics. The type of statistics that relies on numbers being
distributed in a certain way is called parametric statistics.
9.3 The Normal Distribution Curve
The normal distribution has mathematical properties that allow us to make predictions,
just like the histogram. Now that we are aware of what is meant by the term probability and of
the ways in which probabilities may be expressed, we should also be aware that it is
possible to make predictions using knowledge of how numbers are distributed. Imagine
that the intervals on the x axis of a histogram were infinitely small: instead of a bar chart
with steps we would produce a curve. If we connected the tops of the bars with a line
(see figure 9.2) and then removed the bars, the normal distribution would look like such
a curve (see figure 9.3). As a defined
distribution curve of numbers, the normal distribution has certain properties. The first is
very clear: the curve is symmetrical, and it is sometimes referred to as 'bell-shaped';
the exact shape of the curve, however, depends on the standard deviation of the data.
Figure 5: The curve is bell-shaped and symmetrical
The mean is always in the middle of the x axis. The tails of the normal distribution curve
(the rare values) tend to be short. Perhaps the most significant feature of
the normal distribution curve, however, is that the point where the curve changes from
concave to convex (the inflection point) is always one standard deviation (SD) away from
the mean. This means that the area enclosed between the mean plus 1 SD
and the mean minus 1 SD is always a constant proportion of the total area, namely
68.27 per cent.
If we move two standard deviations away on either side of the mean we
enclose about 95% of the total area. So if we took a large sample of patients' arm
lengths, we would expect about 68% of the results to lie within ± 1 SD
of the mean and about 95% to lie within ± 2 SD of the mean (see figure below).
Figure 6: The Standard Deviation
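The proportions quoted for the normal curve can be reproduced from the error function in Python's standard library. This is a sketch for checking the figures, not part of the original text; the helper name `proportion_within` is an assumption made for the example.

```python
import math


def proportion_within(k: float) -> float:
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2):
    print(f"within ± {k} SD: {proportion_within(k) * 100:.2f} per cent")
```

The result for 1 SD rounds to the 68.27 per cent given in the text; the result for 2 SD is slightly above the round figure of 95 per cent.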
9.4 Making a Prediction
I am interested in the number of Opsite dressings used on the average medical ward. I
collect data from 102 wards and the data are normally distributed. How many
wards will lie within ± 1 SD of the mean? Hint: in normally distributed data 68.27
per cent of the data lie within ± 1 SD of the mean.
I have now introduced a means by which, if I know the mean and the SD of a data set,
and I know that it is normally distributed, I can make predictions. This
knowledge is the basis of what are often called parametric statistics, which rely
on procedures that presume the data are distributed in a particular way and share common
characteristics.
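The ward question can be worked through in a couple of lines. The sketch below assumes, as stated, that the 102 ward values are normally distributed; the variable names are illustrative.

```python
import math

n_wards = 102

# Proportion of a normal distribution within ± 1 SD of the mean (about 68.27 per cent).
share_within_one_sd = math.erf(1 / math.sqrt(2))

# Expected number of wards whose dressing use lies within ± 1 SD of the mean.
expected_wards = n_wards * share_within_one_sd

print(f"Expected wards within ± 1 SD: about {round(expected_wards)}")
```

In other words, roughly 70 of the 102 wards would be expected to lie within one standard deviation of the mean, and roughly 32 outside it.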
9.5 Deviations from the Normal Distribution
Sometimes we find that the data we have collected do not fit the normal distribution.
The best way to get a rough idea of whether your data fit the distribution is to plot a
frequency histogram. Some deviations have a particular shape and are given special
names. A distribution is called negatively skewed when the mean lies to the
left of the median (as you look at it), and positively skewed when the mean lies to the
right of the median (figure 7).
Skewed sets of data tend to occur when there are values that are much lower or much
greater than the rest; as a result the frequency histogram is not symmetrical but
skewed. In these distributions, the greater the difference between the mean
and the median, the greater the skew. It is also possible, however, to have symmetrical
distributions that do not conform to the normal distribution. The most common are
random distributions and the regular, or under-dispersed, distribution, examples of
which are given below (figure 7).
Figure 7: Symmetrical and skewed distributions
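The rule of thumb above, comparing the mean with the median, can be sketched as follows. The function name and the data sets are invented for illustration, and a real analysis would still begin with a frequency histogram.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify skew by comparing the mean with the median (rule of thumb)."""
    average, middle = mean(values), median(values)
    if average > middle:
        return "positive"   # mean lies to the right of the median
    if average < middle:
        return "negative"   # mean lies to the left of the median
    return "symmetrical"


# Hypothetical lengths of hospital stay in days; one very long stay pulls the mean up.
long_tail_right = [2, 3, 3, 4, 4, 5, 6, 21]
# Hypothetical test scores; one very low score pulls the mean down.
long_tail_left = [1, 16, 17, 18, 18, 19, 19, 20]

print(skew_direction(long_tail_right))
print(skew_direction(long_tail_left))
```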
9.6 Random and Clumped Distributions
Sets of data in which the variance is almost equal to the mean are referred to as randomly
distributed; they tend to be uncommon. An example of a random distribution
might be the number of incidences of certain diseases in geographically
defined areas.
True randomness is relatively uncommon, and the geographical distribution of many
disease phenomena tends to be over-dispersed, or clumped; in disease outbreaks, for
example, we recognize that certain areas
have a high incidence of a particular disease. With random phenomena we are saying
that every event (e.g. an incidence of cystic fibrosis) is unrelated to any other
incidence. When the distribution is clumped, this suggests that the episodes are
related (for instance, in the case of a contagious disease, or of a disease that is triggered by
some environmental factor); such data will tend to show a strong positive skew (the mean lies
to the right of the median). The last distribution to be aware of is the regular
distribution, which is actually an extreme form of the normal distribution in which the
SD is small relative to the mean, that is, there is very little spread in the data set. An
example might be records of the numbers of fingers and toes within a population:
individuals with fewer than 8 fingers and 10 toes are unusual, and so the
distribution would be regular. If the peak of the curve is flattened, or if a normal
distribution is shaped like those shown below in Figure 9.8, it is said to demonstrate
kurtosis.
Figure 8: Different degrees of Kurtosis in frequency distribution
It is important to differentiate between random and clumped distributions, since the
manner in which data are distributed is significant: it tells us about the basic properties
of what we are studying and, as we have seen here, is very pertinent to studies of the spread
and distribution of diseases (epidemiology). We also need to know how data are
distributed before we embark on many statistical tests.
10 ERRORS AND ANOVA
An ANOVA is an analysis of the variation present in an experiment, and is described as a
powerful and robust technique. It is a test of the hypothesis that the variation
in an experiment is no greater than that due to the normal variation of individuals'
characteristics and error in their measurement. The criteria to be met before doing an ANOVA
test are: (1) the data of each treatment group are derived from a normal distribution; (2) the
data were measured on an interval/ratio scale; (3) the variances of the groups are not
significantly different (there are ways round this one); and (4) the sample groups are
measured independently of each other.
10.1 Statistical Errors
There are several types of ANOVA, and they have evolved to deal with a certain type of
statistical error. Say you have a control group and two different levels of a treatment. In
that case you can't use a t test and must use a test that belongs to the ANOVA group,
ANOVA being shorthand for analysis of variance.
A Type 1 error occurs if the null hypothesis is rejected when it is true and ought to have been
accepted, while a Type 2 error occurs when a false null hypothesis is accepted. These
errors are opposites: as we decrease the probability of a Type 1 error, the probability of a
Type 2 error increases. Generally we tend to choose tests that will reduce the possibility
of a Type 1 error, so a cautious approach is adopted; for instance, in many
medical studies the significance level is set at P = 0.01.
10.2 The t-Tests, Errors and ANOVAs
When we have more than two groups we need an ANOVA. Imagine we are doing a
study where we have a control group (C) and two treatment groups (T1 and T2) and we
want to find out if their means are significantly different. If we use a t-test then we
must test:
• C against T1.
• T1 against T2.
• T2 against C.
10.3 The ANOVA (Analysis of Variance)
Perhaps this is not too much trouble if you are using a computer or even a calculator, but if you
had five treatment groups you would need to do ten tests. Even if you are prepared to
stand the boredom, and manage not to make any calculation errors, you are likely to commit a
statistical error.
This is because, if you set your significance level at the normally accepted value of P =
0.05 (5 per cent), once in every twenty tests (on average) you will get it wrong and commit
a Type 1 error. But if, as in the case above with five treatment groups, you
perform ten t tests, the chance of at least one of them being wrong goes up to roughly one in
two (the simple upper bound is 0.05 × 10 = 0.5; for ten independent tests the exact figure is
1 − 0.95^10 ≈ 0.40). So we need a way round this problem, and the solution is to use an
ANOVA.
ANOVA allows us to compare the means of several treatment groups at the same time
without having to worry about adjusting P values or increasing the chance of Type 2
errors.
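The growth in the number of t tests, and in the chance of at least one Type 1 error, can be sketched in Python. The calculation assumes that the tests are independent, which is a simplification; the function names are chosen for the example.

```python
import math


def pairwise_tests(groups: int) -> int:
    """Number of two-group comparisons needed to compare every pair of groups."""
    return math.comb(groups, 2)


def chance_of_type_1_error(tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one Type 1 error across independent tests."""
    return 1 - (1 - alpha) ** tests


for groups in (3, 5):
    tests = pairwise_tests(groups)
    risk = chance_of_type_1_error(tests)
    print(f"{groups} groups: {tests} t tests, chance of a Type 1 error {risk:.2f}")
```

With three groups the chance is already well above the nominal 5 per cent, and it keeps rising as groups are added, which is the motivation for comparing all groups in a single test.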
It does this because it compares all the treatment groups in a single test. As you can
imagine, the number of calculations needed to perform an ANOVA is quite large.
However, with the advent of computers the use of ANOVA has become much more
common and many more ANOVA-type tests have been designed. In this chapter we will
look at two types of ANOVA.
In general it is better to use a computer, as computers make fewer errors than humans. It is
suggested that you focus on the structure of the tests and on interpreting the output. The
type of ANOVA that we will describe is the one-way analysis of variance.
Several computer statistics packages, including SPSS, Minitab and Microsoft Excel, can
help to analyze data using the one-way ANOVA described here.
10.3.1 When to Use a One-Way ANOVA
• We are comparing the difference between more than two sample groups.
• The data in each of our groups are normally distributed.
• The data are measured on an interval scale.
• Each case is measured independently.
How does it work?
First, here are some data. The data set is smaller than would normally be used for
ANOVA but we will use it to help us examine the ANOVA. The data in Table 12 are from
a study to examine whether the pre-natal fitness level of Primip women significantly
affects duration of labor.
The ANOVA test looks at the source of variation in the overall data set and tries to
apportion it to different aspects of the data. Once the variation has been allocated it is
possible to see if the differences between the sample groups are significant. The
sources of variation in the data are the variability that occurs within a sample group and
the variability that occurs between the groups (Table 13). We can say that:
ANOVA seeks to determine how much of the variation in data sets can be attributed to
error and how much can be attributed to the factor or treatment under study.
We are interested in the between-group variation, that is, that which has occurred
because of the fitness level. The rest of the variation, that is, that within the groups, we
regard as error. The variability between groups will reflect the error that occurs within
the groups and any additional variability caused by the treatment (in this case, fitness
level).
Total variability = Variability between the groups + Variability within the groups
If there is no difference between the groups, that is, the null hypothesis is correct, we
would expect there to be just as much variation between the groups as there is within
the groups. If the between-group variation is significantly more than the within-group
variation, we conclude that the treatment has had an effect; this is the simple logic behind the
ANOVA test.
Table 12: Duration of labor in Primip women aged between
Twenty two at three different levels of fitness
Fitness level 1    Fitness level 2    Fitness level 3
20                 34                 16
32                 12                 15
14                 23                 22
15                 10                 10
Level 1: Low; Level 2: Medium; Level 3: High
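The partition of variability described above can be sketched for the Table 12 data. The function below is a minimal illustration written for this example (its name and return layout are assumptions), not output from SPSS, Minitab or Excel.

```python
def one_way_anova(groups):
    """Split total variation into between-group and within-group parts."""
    all_values = [value for group in groups for value in group]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Total variability: every case compared with the grand mean.
    ss_total = sum((value - grand_mean) ** 2 for value in all_values)
    # Between-group variability: each group mean compared with the grand mean.
    ss_between = sum(
        len(group) * (group_mean - grand_mean) ** 2
        for group, group_mean in zip(groups, group_means)
    )
    # Within-group variability: every case compared with its own group mean.
    ss_within = sum(
        (value - group_mean) ** 2
        for group, group_mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (df_between, df_within),
        "F": f_ratio,
    }


# Duration of labor at fitness levels 1, 2 and 3 (Table 12).
table_12 = [[20, 32, 14, 15], [34, 12, 23, 10], [16, 15, 22, 10]]
result = one_way_anova(table_12)
print(result)
```

For this small data set the between-group variance is smaller than the within-group variance, so F is below 1 and there is no evidence that fitness level affected the duration of labor.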
Table 13: Variation within and between groups
Fitness level 1    Fitness level 2    Fitness level 3
20                 34                 16
32                 12                 15
14                 23                 22
15                 10                 10
(Down each column: variability within a group. Across the columns: between-group variability.)
The test statistic produced by the ANOVA is F, a statistic we have seen before, and the
measure of variation we use is the variance; hence the name of the test, the analysis of
variance. If we compute the between-group variance and divide it by the within-group
variance, F will be close to 1 if the null hypothesis is correct. If F is significantly
greater than 1, we conclude that the means are significantly different and that the level of
fitness (the treatment) had an effect. The procedure for calculating the ANOVA by
hand is long-winded. It is probably worth doing by hand once or twice, as that will help
you grasp how the procedure works and how ANOVAs are presented.
Table 14: Distance walked (m) by patients with impaired hip mobility, following
various treatment regimes
                      Old frame           New frame           New frame
                      Exercise level 1    Exercise level 1    Exercise level 2
                      16.1                22.3                13.2
                      17.7                20.5                20.8
                      20.6                21.3                22.2
                      10.4                26.7                16.3
                      20.3                16.3                13.7
                      14.9                29.0                11.9
                      11.5                24.4                14.1
                      14.7                23.7                10.6
                      15.3                23.5                15.8
                      17.4                19.5                15.9
Mean                  15.89               22.72               15.46
Standard deviation    3.32                3.63                3.67
In this example Anna Fimbo, a physiotherapist, is interested in the impact of a
new walking frame on her clients with impaired hip mobility. She has decided to test the
new frame at two levels of exercise and to use her old frame with the normal level of
exercise as a control. She uses the distance the patient can walk unassisted as a
measure of the effectiveness of the treatments (see Table 14 above). Again, we will
assume that the data are normally distributed; remember that it would be normal to
plot out the data to look for any odd results and to get a 'feel' for your results. Place the
data into a table and, using a scientific calculator (if you have one), calculate the mean,
the standard deviation, the variance, the sum of the cases and the sum of the cases
squared (Table 15). Now we need to make sure that the variances of our sample groups
are not significantly different (see criterion 3).
Table 15: Statistical summary of the data from Table 14
                             Old frame        New frame           New frame
                             Group 1 (GP1)    Group 2 (GP2)       Group 3 (GP3)
                                              Exercise Level 1    Exercise Level 2
                             12.1             22.3                13.2
                             15.7             20.5                20.8
                             18.6             21.3                22.2
                             9.4              26.7                16.3
                             18.3             16.3                13.7
                             12.9             29.0                11.9
                             9.5              24.4                14.1
                             12.7             23.7                10.6
                             13.3             23.5                15.8
                             15.4             19.5                15.9
Sample number                10               10                  10
Mean                         13.79            22.72               15.46
Standard deviation (SD)      3.21             3.63                3.67
SD squared                   10.27            13.15               13.45
Sum of samples               137.9            227.2               154.6
Sum of samples, squared      19,016.4         51,619.8            23,890.8
Sum of squares of samples    1,994.11         5,280.36            2,762.4
Total number of samples, ntotal = 30
Sum of samples (all groups) = 519.7
Sum of squares of samples (all groups) = 10,036.9
To do this, select the largest and smallest variances and perform an F test. Here there is no
significant difference in the variances, so we can proceed with the test. The ANOVA test
uses the sums of squares as a measure of variation.
Calculations
Step 1: Calculate a correction factor (CF). This makes the calculations a little quicker:
CF = (sum of samples)² / ntotal = (519.7)² / 30 = 9,002.94
Step 2: Calculate the sums of squares (SS) for the whole sample:
SStotal = sum of squares of samples - CF = 10,036.9 - 9,002.94 = 1,034 (rounded)
Step 3: Calculate the between-groups sums of squares:
SSbetween = 449.67
Step 4: Calculate within-group sums of squares. A short cut can be used here because
we know the between-group sums of squares and the total and we know that the
between-groups and within-groups sums of squares must add up to the total:
So:
SStotal = SSbetween + SSwithin
SSwithin = SStotal - SSbetween
584.33 = 1,034 - 449.67
If you are forced to do ANOVAs by hand it’s probably best to calculate both the within-
group and the between-group sums of squares by longhand. This will allow you to
check your maths.
Step 5: Determine the degrees of freedom for both the within and between groups
using the following rules.
d.f. for SSbetween = number of groups - 1 (in this example 3 - 1 = 2)
d.f. for SSwithin = total number of cases - number of groups (in this example 30 - 3 = 27)
d.f. for SStotal = d.f. for SSbetween + d.f. for SSwithin (in this example 2 + 27 = 29)
Step 6: Calculate the variances for both the between- and the within-group sums of
squares:
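Each variance is a sum of squares divided by its degrees of freedom. A sketch of the arithmetic with the figures from Steps 3 to 5 follows; the symbol MS (mean square) for a variance is an assumed notation, not one used in the original.

```latex
\begin{align*}
MS_{\text{between}} &= \frac{SS_{\text{between}}}{df_{\text{between}}} = \frac{449.67}{2} \approx 224.83 \\
MS_{\text{within}} &= \frac{SS_{\text{within}}}{df_{\text{within}}} = \frac{584.33}{27} \approx 21.64
\end{align*}
```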
Step 7: Calculate F:
F = Variance between groups / Variance within groups = 224.83 / 21.64 = 10.38
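Steps 1 to 7 can also be run from the raw values as a check on the hand calculation. The following is a minimal sketch using only the Python standard library; the function name `one_way_anova` and the dictionary keys are hypothetical. Because it works directly from the ten values listed for each group in Table 15, its output should be compared against the hand-worked figures rather than assumed to equal them.

```python
# Raw values listed in Table 15 (names are hypothetical).
groups = {
    "GP1": [12.1, 15.7, 18.6, 9.4, 18.3, 12.9, 9.5, 12.7, 13.3, 15.4],
    "GP2": [22.3, 20.5, 21.3, 26.7, 16.3, 29.0, 24.4, 23.7, 23.5, 19.5],
    "GP3": [13.2, 20.8, 22.2, 16.3, 13.7, 11.9, 14.1, 10.6, 15.8, 15.9],
}


def one_way_anova(samples):
    """Return sums of squares, degrees of freedom, variances and F."""
    all_values = [x for sample in samples for x in sample]
    n_total = len(all_values)
    k = len(samples)

    # Step 1: correction factor.
    cf = sum(all_values) ** 2 / n_total
    # Step 2: total sum of squares.
    ss_total = sum(x * x for x in all_values) - cf
    # Step 3: between-groups sum of squares.
    ss_between = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    # Step 4: within-groups sum of squares, by subtraction.
    ss_within = ss_total - ss_between
    # Step 5: degrees of freedom.
    df_between = k - 1
    df_within = n_total - k
    # Step 6: variances (mean squares).
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    # Step 7: F ratio, between-groups variance on top.
    f_value = ms_between / ms_within

    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_value": f_value,
    }


result = one_way_anova(list(groups.values()))
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Calculating the within-groups sum of squares longhand as well, by summing each group's squared deviations from its own mean, gives an independent check on the subtraction in Step 4.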
Step 8: It is normal for the results from an ANOVA to be put in a table laid out in a
standard format like Table 16. The results of ANOVAs performed using statistical
packages are often presented in such tables. An alternative would be Table 17.
Table 16:
Source of variation   Sum of squares   d.f.   Variance   F
Between groups        449.67           2      224.83     10.38
Within groups         584.33           27     21.64
Total                 1,034            29
Table 17:
Source of variation   Sum of squares   d.f.   Variance   F
Between groups        449.67           2      224.83     10.38
Error                 584.33           27     21.64
Total                 1,034            29
Step 9: Look up the value of F in the appropriate statistical table. Note that the
variance between groups should always be on top, and larger than the within-group
variance. If the between-group variance is less than the within-group variance (so that
F is less than 1), the null hypothesis is automatically accepted.
The value of 10.38 is significant at the P < 0.01 level, so we can reject the null
hypothesis and say that the groups are significantly different. We
would express this result by saying that there was a significant difference between the
three treatment groups (ANOVA F2,27 = 10.38, n = 30, P < 0.01). Unlike the t test, we
also give the degrees of freedom for both within and between groups. They are given as
a subscript to the F statistic.
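Looking F up in a table amounts to comparing it with the critical value for the chosen significance level. A minimal sketch, assuming critical values of 3.35 (P = 0.05) and 5.49 (P = 0.01) for 2 and 27 degrees of freedom as read from a standard F table; these values and the function name are assumptions, not figures from the original.

```python
# F statistic and degrees of freedom as reported in the text.
f_value = 10.38
df_between, df_within = 2, 27

# Assumed critical values of F(2, 27) from a standard statistical table.
critical_values = {0.05: 3.35, 0.01: 5.49}


def smallest_significant_level(f, table):
    """Return the smallest tabulated P level at which f is significant, or None."""
    significant = [p for p, critical in table.items() if f > critical]
    return min(significant) if significant else None


level = smallest_significant_level(f_value, critical_values)
if level is None:
    print("Not significant at the tabulated levels")
else:
    print(f"F{df_between},{df_within} = {f_value}, P < {level}")
```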
10.4 Contrasting the Means
You may have noted that there is a slight problem with the ANOVA in that, whilst we can
say that there is a significant difference between the sample groups, we can’t say which
groups are different from each other and which are not.
Table 18: Comparing Means after an ANOVA test
Group means        Group 1: 13.79   Group 2: 22.72   Group 3: 15.46
Group 1: 13.79                      8.93             1.67
Group 2: 22.72                                       7.26
Group 3: 15.46
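The entries in Table 18 are the absolute differences between each pair of group means. A short sketch of that calculation, with hypothetical names and the means taken from the table:

```python
from itertools import combinations

# Group means as given in Table 18.
means = {"Group 1": 13.79, "Group 2": 22.72, "Group 3": 15.46}

# Absolute difference between every pair of group means.
differences = {
    (a, b): round(abs(means[a] - means[b]), 2)
    for a, b in combinations(means, 2)
}

for (a, b), diff in differences.items():
    print(f"{a} vs {b}: {diff}")
```

The differences show only how far apart the means are; deciding which of them are significant still needs a follow-up test.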
Thus in the first example we do not know whether both exercise regimes differ
from the control, whether they differ from each other, and so on. Fortunately we can do
follow-up tests that allow us to determine which sample groups are significantly different
from each other. For those using computer packages there is a range of these follow-up
test options with an assortment of names. The only one to avoid is the least
significant difference test, as you will make the same error as if you did multiple t tests.
The most conservative (tends towards a Type 2 error) is Scheffé's test; the least
conservative (tends towards a Type 1 error) is Duncan's multiple range test (Kerr, Hall
Health Care Statistics

  • 1. NAME: AMANI OMARI MASENYA ID: UB17186BHE24958 BACHELOR OF SCIENCE IN HEALTH CARE ADMINISTRATION HEALTH CARE STATISTICS HEALTH CARE STATISTICS (Essay-Paper) ATLANTIC INTERNATIONAL UNIVERSITY HONOLULU, HAWAII February, 2013
  • 2. TABLE OF CONTENTS COVER PAGE........................................................................................................1 TABLE OF CONTENTS 2-6 1 INTRDUCTION...................................................................................7 What are Statistics?...............................................................................7 2 THE STATISTICAL APPROACH..........................................................7 2.1 The Scientific Technique.................................................................7 2.1.1 Some Faiths Connected with Statistics Laid by Quantitative and Scientific Model:.................................................................................7-8 2.1.2 The Characteristics of Research Methods that Use Statistics......8-9 2.2 Research Method Basic parts..........................................................9 2.2.1 Question(s) Forming....................................................................9 2.2.2 Literature Review.........................................................................9 2.2.3 Conceptual and Theoretical Frameworks.................................9-10 2.2.4 Hypotheses and Variables..........................................................10 2.2.5 Research Design.......................................................................10 2.2.6 Sample and Population..............................................................10 2.2.7 Data compilation/Collection and Data Analysis............................10 2.2.8 Results and Conclusions......................................................10-11 3 MEASURING, SAMPLING AND ERROR.............................................11 3.1 Population...................................................................................11 3.2 Sample........................................................................................11 3.3 Cases.....................................................................................11-12 3.4 Statistical 
and Real Population.................................................12-13 3.5 Measurement Scales....................................................................13 3.5.1 The nominal Scale.....................................................................13 3.5.2 Ordinal Measurement Scale...................................................13-14 2
  • 3. 3.5.3 Ratio or Interval Measurement Scales.........................................14 3.6 Errors..........................................................................................14 3.6.1 Measurement Error...............................................................14-15 3.6.2 Consistency..............................................................................15 3.6.3 Design Errors.........................................................................15-16 3.6.4 Sampling Error..........................................................................16 4 QUESTIONNAIRES..........................................................................16 4.1 Questionnaire Design..............................................................16-17 4.2 Sample Questionnaire.............................................................17-19 4.3 Scales from the Sample Questionnaire..........................................20 5 THE STUDIES.................................................................................21 5.1 The questionnaire Study...............................................................21 5.2 The Questionnaire..................................................................22-25 5.3 The Clinical Trial...........................................................................25 6 DESCRIPTIVE STATISTICS.........................................................26-27 6.1 Measures of Central Tendency................................................27-29 6.1.1 Mean (Arithmetic)......................................................................29 6.1.2 Median......................................................................................29 6.1.3 Mode....................................................................................29-30 6.2 Choosing a Measure of Central Tendency.................................30-31 6.3 Measures of Dispersion................................................................31 6.3.1 
Range..................................................................................31-32 6.3.2 Inter-quartile Range...................................................................32 6.4 The Standard Deviation and Variance.......................................32-34 7 DISPLAYING DATA.....................................................................34-36 7.1 Table Types..................................................................................36 3
  • 4. 7.2 Types of Graph.............................................................................37 7.2.1 Frequency Charts......................................................................37 7.2.2 Pie Charts.................................................................................37 7.2.3 Bar Charts for Summary Statistical Information...........................38 7.2.4 Scatter Graph (Scatter Plot)........................................................38 7.2.5 Line Graphs...............................................................................38 8 HYPOTHESIS..................................................................................39 8.1 Forming the Hypothesis for the Experiment / Study........................40 8.1.1 Experimental Approaches..........................................................40 8.1.2 Non Experimental Methods or Quasi-experiments.......................41 8.1.3 Before-After Study................................................................41-42 8.2 Variables......................................................................................42 8.2.1 Dependent Variable....................................................................42 8.2.2 Independent Variable.................................................................42 8.2.3 Other Variables.....................................................................42-43 8.3 Errors and Statistics.....................................................................43 8.4 The statistical Hypothesis.............................................................43 8.5 Types of Interaction between Variables..........................................43 9 DISTRIBUTIONS AND PROBABILITIES.............................................44 9.1 Frequency Histograms..................................................................44 9.2 Probability and Statistics: The Link within Probabilities and 
Distributions........................................................................................45 9.3 The Normal Distribution Curve.................................................45-46 9.4 Making a Prediction......................................................................47 9.5 Deviations from the Normal Distribution........................................47 9.6 Random and Clumped Distributions.........................................48-49 4
  • 5. 10 ERRORS AND ANOVA..................................................................49 10.1 ERROS AND ANOVA (Statistical Errors)....................................49 10.2 The t-Tests, Errors and ANOVAs.............................................49-50 10.3 The ANOVA (Analysis of Variance)...............................................50 10.3.1 When to Use a One-Way ANOVA........................................51-56 10.4 Contrasting the Means................................................................57 10.4.1 Tukey Test..........................................................................57-58 10.5 Independence.............................................................................58 10.6 Repeat Measure Design of ANOVA..........................................58-60 11 TESTS FOR ASSOCIATION.......................................................60-61 11.1 Chi-square Test: One-way......................................................61-64 11.1.1 Restrictions on the Use of the Chi-square.................................64 11.1.2 Independence.....................................................................64-65 11.2 Chi-square Test for Association: Two-way....................................65 BIBLIOGRAPHY...................................................................................65 FIGURES Figure 1: Indices of depression recorded for a range of different patient types (Acklin and Bernat 1987) Figure 2: Diagram of an experimental study Figure 3: Diagram of a Quasi-experimental design with two groups Figure 4: Example of a Histogram Figure 5:The curve is bell-shaped and symmetrical Figure 6:The Standard Deviation Figure 7: Symmetrical and skewed distributions Figure 8: Different degrees of Kurtosis in frequency distribution TABLES Table 1: A range of different variables and their likely sampling units 5
  • 6. Table 2: A patient’s responses when asked to consider day surgery (questions asked pre-operatively) Table 3: The example of data from questionnaire Table 4: Time spent by an individual at the gym during July Table 5 : Choosing a Measure of Central Tendency Table 6: Age at first sexual experience (with another person) Table 7: Central tendency and measure of Dispersion Table 8: Acklin and Bernat (1987) data for the two indices of depression measures in patients with a range of conditions Table 9: Frequency of different ethnic groups of a sample of 178 Individuals interviewed in north-east London Table 10: Summary examination results for a group of 122 first-year students’ nurses Table 11: Types of graphs Summary Table 12: Duration of labor during Primp women aged between twenty-two at three different levels of fitness Table 13: Variation between and within Groups Table 14: Difference walked by patients (m) with impaired hip mobility, following various treatment regimes Table 15: Statistical summary of the data from Table 12.3 Table 16: Table 17: Table 18: Comparing Means after an ANOVA test Table 19: Diastolic blood pressure of four men aged between fifty-six and fifty-eight at the start of and during a prescribed exercise regime Table 20: Table 21: The ethnicity of individuals in a cohort entering nurse education Table 22: Observed and expected frequencies of four ethnic in a classroom of student nurses Table 23: A 2 x 3 contingency table showing the incidence of MDR TB inrelation to three East European Countries 1 INTRDUCTION What are Statistics? 6
  • 7. Statistics is a word obtained from the Latin status, denoting state, and historically submits to as the facts and figures that prove relations of countries or states demography (Bhattacharyya and Johnson 1977). However, the statistical approach engages describing phenomenon in terms of numerals and then utilizing the numerals to either entail or infer effect and cause since they are the research tool key for quantitative-researchers. Statistics today are utilized in a whole studies and investigations diversity as well as to summarize and describe data as of studies when those data are gathered in the numerals shape, and also they are utilized to look for examples and to discover the likelihood of surveillances having happened by chances, and so they are of course a very important instrument that strengthen every quantitative research (number based). 2 THE STATISTICAL APPROACH 2.1 The Scientific Technique The procedure of facts amassing methodically is a source of a concept for evidence based practice, since mainly the practice knowledge it predicate up-on the trust that the world and its population may be objectively sighted and forecast concerning obsessions that may either be confirmed or refuted. Consequently, to have a sight concerning the knowledge of how is generated and experimented is in general within sharing of the world populace which known as a world paradigm or view. 2.1.1 Some Faiths Connected with Statistics Laid by Quantitative and Scientific Model: Lucid positivism is as well found on the 18 century theoretical sympathetic of Hume (1888), that supported the facts can be obtained during people and things cautious surveillance, their behaviors, environments, customs, and by studying physical substance, for instance chemicals, substances, e.t.c., and find out how they act (Kerlinger 1986). 
Of course these sound positivists were distrustful of obsessions that couldn’t provide themselves to be heard or observed, like emotions and feelings (Burns 2000). Logical positivism which supported that knowledge may be obtained during careful observation of people, of things, their behaviors, customs, and environments and by 7
  • 8. physically matter observation, for instance, chemicals and other substances. Thus they were distrustful of things that might not provide themselves to be heard or observed, like emotions or feelings. The physical world that is governed by laws may be standing for universal laws, that if employed they may envisage results by a procedure of hypothesizing, experimenting, corroborating or disproving and utilized for environment management, for instance, the thermodynamics laws (Scott, 2006). Nevertheless, experiments utilize things observation to decide something else consequence in a managed conditions so as every factor that might influence the research conclusion are managed or described for. However a control-group to which naught has been done is required and then the surveillances are made of experimental group-behaviour for something has been done. There after observations for both groups are made and then contrasted and the consequence inferred, that is frequently known as deductive approach. The observations may then be analysed, organized coded and ranked, which means decreased to their least units mathematically making it probable to forecast data results, which may capable of then mathematically analyzed (Scott, 2006). 2.1.2 The Characteristics of Research Methods that Use Statistics There a lot of different character in research methods though not every of them will be seen in each study, since statistics-based studies will in general try to manage the factors influence i.e. variables which are un-significant in a real study they might bias the outcomes. Accordingly, statistics-based studies tend to engage the empirical evidence collection for a hypothesis refute. Generally, when we employ statistics, we attempt to create outcomes that are universally, which means the sample outcomes are valid on the whole individual’s population that is attention. Hence, the connection between the population and the sample is of numerous statistical tests focus. 
The Uses of Statistical Approach within Research:  To describe variables and their relationships.  To assist searching and discover the relationships nature amid variables. 8
  • 9.  To assist searching and discover the disparities flanked by samples and populations.  To assist examine the probability role in giving rise to measurements.  To assist explicate relationships flanked by data sets.  To forecast the relationships causes amid phenomena.  Controlling for (or accounting of) variables. 2.2 Research Method Basic parts 2.2.1 Question(s) Forming In forming questions it is vital to identify the phenomenon or problem of interest, i.e. deciding the study main shape, the inhabitants to be studied and questions to be looked at. On our own contemplations through searching literature and sketching, we have to choose which techniques are going to be employed in data gathering. Up to this phase a pilot-study (small) can be performed that will let possible troubles to be clearly and highlighted, if found any trouble there will require revisiting the addressed questions. 2.2.2 Literature Review There a huge literature body about numerous healthcare areas thus very necessary for each latest job to be in a setting of every prior or simultaneous job and should be reviewed such that we might study from that work before continuing, because literature review will presumably commencing the instant of an idea for a research as is imagined and the study will carry on all through. 2.2.3 Conceptual and Theoretical Frameworks Numerous investigation areas have different frameworks and ideas on which the new knowledge evolution is based, for that reason there are dissimilar ways of sighting problems thus it is imperative to be conscious for that meticulous study that we have to work on. Probable that a sociologist or biologist might utilize dissimilar ideas and attach varying importance levels to dissimilar data types, still both of these academic regulations might employ statistics a great-deal view the world. 2.2.4 Hypotheses and Variables 9
  • 10. A variable is a phenomenon i.e. thing, that varies. Within different studies there is a hypothesis, a forecast or sequence of forecasts created from a theory that are under test by evaluating pertinent variables and examine their narration and their inhabitants where they come. 2.2.5 Research Design Design gives guidelines how to perform the research as well as they direct the sampling method, and how to collect and analyze the data. However the main intention of the research-design is not only to reduce all the possible error sources but also should struggle to guarantee that every under test hypotheses are really tested, this means, the research-design ought to let the research aims to be met and must also considering the ethical working implications. 2.2.6 Sample and Population The population is make-up of all the objects or persons that might possibly take measurements from whereby the sample stand for those objects or individuals i.e. given the resources and constraints, that we measure from, thus, we normally map to get a sample that envoy the population to be studied, which is subject to the research design that has permitted by a suitable ethics-committee. 2.2.7 Data compilation/Collection and Data Analysis Data are collected by means of the most appropriate technique or instrument, however, data describing and summarizing, and statistical tests are the process of performing data analysis, and there are numerous packages for computer based statistics obtainable to assist. 2.2.8 Results and Conclusions After the data have already analyzed, there is a need to make decision what the consequences are proposing, whether or not any hypothesis under test has been verified or refused, so that they have to be narrated to those from prior studies as well as requires to be narrated to an existing theory body, also by considering whether we have moral duty to converse the findings of the study. 3 MEASURING, SAMPLING AND ERROR 10
  • 11. 3.1 Population
A population is made up of all the persons, objects or phenomena that we might potentially count or measure as part of the study. For example, when we are studying the reasons why nurses in Tanzania left the profession early, our population is made up of all the nurses in Tanzania who left the profession early. Similarly, when studying why nurses in a certain hospital left the profession early, our population is those nurses who left early in that particular hospital. It is also important to remember that the population we intend to study might not be exactly the same as the one we end up sampling from, since some of the persons in the population of interest might decline to be included. There can thus be a difference between the target population and the actual population.
3.2 Sample
A sample is made up of a proportion of the objects or persons from the total population. In most cases, because resources are scarce, it is unlikely that we can gather information from the whole population, so we must take a sample instead.
3.3 Cases
Every sample is made up of the objects or individuals under study, which may be referred to by a variety of names, such as sampling units. On each individual or object we measure one or more variables. These may be physical (e.g. blood pressure), may represent feelings or thoughts (e.g. anxiety) or may represent events in the life of the individual (e.g. number of visits to a health clinic). Each object or individual we take a measurement from is known as a case. It is important to note that all statistical variables must be measurable; otherwise the variable cannot be dealt with statistically. The measurement for each case is referred to as a value, and the collection of values is called the data.
For each case we ought to state the measurement, the variable being measured, the sampling unit, the actual sample and the population being studied. For instance, when studying hypertension in populations living near ironworks, 80 mm Hg will be the measurement, diastolic blood pressure will be the variable, the individuals living in the vicinity of the ironworks will be the sampling units, and the sample
  • 12. will be made up of those individuals from whom measurements were taken. The actual population is then defined as all individuals living in the vicinity of the ironworks who are willing and able to be measured.
3.4 Statistical and Real Population
The defined population includes the words "willing and able", which make clear that we cannot take measurements from those who do not consent to take part in the study. In the example above there is thus a slight difference between the population as defined and the one we originally intended to work on. There can be a difference between the population from which the measurements were sampled, frequently called the statistical population, and the real or biological population, that is, all those in the vicinity of the ironworks. The researcher therefore has to be aware that not all persons in a population are available to become sampling units. In some instances, what constitutes the sampling unit and the variable being measured is not immediately clear.
Table 1: A range of different variables and their likely sampling units
Variable | Sample | Sampling unit
Occurrences of MRSA on a ward | Wards which record occurrences of MRSA that you collected data from | Ward
Length of stay in hospital | Hospitals that record length of stay that you collected data from | Hospital
Pulse rate | Individuals that you record pulse rate from | Individual
Gender | Individuals whose gender you recorded | Individual
Hospital position in league table | Hospitals used to formulate the league table | Hospital
Number of live births | Units of area from which records of live births have been collected | Unit of area
Looking at the examples in Table 1, we find that the range of variables we can use is fairly wide. We also notice that some variables are expressed as numbers on a scale, while others, such as gender, are discrete categories.
This difference is important: before selecting a suitable statistical test we have to identify the type of variable and the type of measurement scale.
3.5 Measurement Scales
  • 13. In statistics four types of measurement scale are recognized for variables: (i) the nominal scale, (ii) the ordinal scale, (iii) the interval scale and (iv) the ratio scale.
3.5.1 The nominal Scale
The nominal scale is the lowest in the hierarchy of scales. Statisticians refer to a hierarchy of scales because each scale further up the order has the features of those below it. On the nominal scale the data fall into named groups that are mutually exclusive; an item of data cannot be in more than one group. A good example of a nominal variable is ethnic group or home town. Some nominal variables have only two categories and are known as dichotomous variables; others have more. Example: Gender, with Male (0) and Female (1).
3.5.2 Ordinal Measurement Scale
Ordinal-scale data are similar to nominal-scale data, but the names of the groups carry an idea of position or rank. An example is the grades of nurses within a health service, where group names such as staff nurse, charge nurse and matron convey an order or position. The positions, however, do not indicate how much higher or lower one is than another. The scale is used to arrange the measures from lowest to highest, but we would not say that a charge nurse is twice as high as a staff nurse. Ordinal scales can be expressed as names, as numbers (e.g. 1st and 2nd in a race) or as letters, as in the UK system for grading nurses (e.g. A-I).
Remember that a study may use more than one measurement scale. We should practise identifying the appropriate scale, because it is the scale on which a measurement is made that largely determines the type of statistical test to be applied.
The following are examples of ordinal-scaled variables:
Variable: Frequency. Never (0), Rarely (1), Sometimes (2), Frequently (4), Always (5)
Variable: Satisfaction. Dissatisfied (1), Satisfied (2), Very Satisfied (3)
Variable: Performance Assessment Attainment. Goal not attained (1), Goal Attained (2), Goal Exceeded (3)
3.5.3 Ratio or Interval Measurement Scales
  • 14. Interval and ratio scales also use positions, but on these scales the distance between positions is known, and the points between positions can be subdivided. Ratio scales are like interval scales but also have a true zero point. For instance, if we record temperature we know that the difference between 37°C and 38°C is 1°C, so the scale is an interval scale; but the centigrade scale does not have a true zero point, so we cannot say that 20°C is twice as hot as 10°C. An example of a ratio scale is weight, which does have a true zero. Many measures of biological phenomena are made on either the interval or the ratio scale.
3.6 Errors
When conducting a study, error can arise in many different ways. Statistics can be used to assess or account for some of this error, but not all of it. We start by looking at three kinds of error that we need to think about when conducting our own research or thinking critically about the studies of other researchers. All studies involve some error, and the more error that is present, the less the researcher can rely on the study.
3.6.1 Measurement Error
Measurement error arises because most of the instruments we use to measure things are not one hundred per cent accurate. Even when we are doing something as simple as recording the number of patients or clients who turn up at a GP surgery, or taking a measurement with a ruler, errors of measurement will happen. To some degree, error associated with the use of apparatus and instruments to measure phenomena is easier to account for than error associated with methods such as observation, used to record human behaviour. Measurement errors occur in most studies, even when the variable is simply a count of something, because those doing the counting often make errors.
These measurement errors may be random or systematic. A random error happens by chance and is equally likely to be lower or higher than the true value; random errors are not important if they are sufficiently small relative to the overall variation in the data. Systematic errors, on the other hand, are those that are consistently either higher or lower
  • 15. than the true value, so possible factors that lead to systematic errors have to be considered at the design stage. To deal with measurement errors we need to choose the measurement tool that gives the greatest level of accuracy, while taking into account the resources available; high accuracy may, for example, involve spending more money or time. If we are using surveyors or interviewers, we have to make sure that they are trained in how to use the data collection tools (e.g. the questionnaire).
3.6.2 Consistency
We also need to be aware that some errors arise because measurement tools are inconsistent. An instrument such as a peak flow meter will not always give the same reading for the same person or item being measured, so instruments need to be calibrated before they are used. It is advisable to measure how consistent the instruments are between measurements and between the persons using them. We can then take this error into account when discussing our results.
3.6.3 Design Error
Design error is likely to occur if the design of the study is flawed. There are of course varying degrees to which a study can be flawed, and most studies are flawed in some way. If the design error is large, however, the study will need to be abandoned, so it is very important to get the design right. It is important to be aware of the errors we are likely to come across. The most common is that the sample is not drawn truly randomly from the population, so the conclusions drawn are not appropriate for the population as a whole.
3.6.4 Sampling Error
Many statistical techniques try to take account of sampling error, which is the difference between the sample and the population.
If we take a sample, it is unlikely that a measure calculated from the individuals in the sample will exactly match that of the population. When we look at how representative a sample is of the population, or at whether two samples come from the same population, we need to take account of sampling error.
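Sampling error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the two populations of heights are invented, and the helper name `mean_sampling_error` is hypothetical. It draws repeated random samples and reports the average gap between the sample mean and the population mean.

```python
import random
import statistics


def mean_sampling_error(population, sample_size, repeats=500, seed=1):
    """Average absolute gap between sample means and the population mean."""
    rng = random.Random(seed)
    population_mean = statistics.mean(population)
    gaps = []
    for _ in range(repeats):
        # Draw a simple random sample without replacement.
        sample = rng.sample(population, sample_size)
        gaps.append(abs(statistics.mean(sample) - population_mean))
    return statistics.mean(gaps)


# Hypothetical populations of heights in cm (invented for illustration).
generator = random.Random(0)
narrow = [generator.gauss(200, 3) for _ in range(5000)]  # little variation
wide = [generator.gauss(170, 12) for _ in range(5000)]   # more variation

for size in (10, 100):
    print(size,
          round(mean_sampling_error(narrow, size), 2),
          round(mean_sampling_error(wide, size), 2))
```

In this sketch the gap is smaller for the population with little variation and shrinks for both populations when the sample size rises from 10 to 100.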
  • 16. Sampling error increases with the amount of variation within the sample being measured, and decreases as the sample size increases. The relationship between sample size, sample variation and sampling error is important, since one question we are bound to ask is: how large should my sample be? The answer is that it depends on the variation within the sample. If we were working with the variable height in a population of basketball players, the variation in height would be very small, and so the sample size needed would be small. If, however, we were working with height across the global population, the variation would be large and a larger sample would be needed.
4 QUESTIONNAIRES
Questionnaires are an approach recognized and adopted by numerous health-related studies. The subject of questionnaires and research design is, however, enormous and cannot be addressed in detail here. Questionnaires can be used to obtain information from people about a particular subject. This section describes the use of closed questions, where the range of answers is determined by the researcher, which lends itself to statistical analysis. Once a questionnaire has been constructed, it is usually referred to as a tool or an instrument.
4.1 Questionnaire Design
The term questionnaire usually means a form containing a set of predetermined questions used for gathering information (data) from and about people as part of a survey. (Survey describes a research approach that attempts to cover as wide a range of the population as possible in order to acquire information about a subject.) The use of questionnaires requires a great deal of time and effort in terms of careful planning, ordering and sequencing of the questions and responses in order to obtain relevant data, so questions of design need to be addressed early in the study.
The main questions to ask when thinking about designing a questionnaire are: What is it that I wish to find out? Is it to do with knowledge, attitudes or levels of understanding, or is it about behaviour, activities or decision making? And for what purpose do I wish to find it out?
  • 17. Questionnaires may be used in different kinds of research, e.g. descriptive, attitudinal and comparative. There are many different kinds of questionnaire, and the choice will depend on the research question being asked. Questionnaires may be used to produce both qualitative and quantitative data, as a means of describing a population, to investigate cause and effect, and to monitor change over time. Frequently, questionnaires attempt to describe a population's behaviour, attitudes and views with regard to a certain topic, or its understanding, or level of understanding, of an issue. A survey frequently uses questionnaires to get information, and sometimes an interviewer may also be used; the people taking part in the survey are often termed subjects, respondents or participants. When used as a quantitative tool, questionnaires help to put numerical indicators on these phenomena. They are a good method of data collection, although the data may tend to be superficial, as there is no room for probing the meaning of the responses. However, questionnaires are a cost-efficient way of collecting large quantities of data in a short period of time, and if the questionnaire is properly structured, large quantities of data can be collected and subjected to statistical analysis. Questionnaires can be administered not only by face-to-face interview but also via the World Wide Web, telephone, mail and e-mail.
4.2 Sample Questionnaire
I will describe a questionnaire that explores the behaviour of a group of people. Let us say, for instance, that I wanted to know more about the epidemiology of HIV and AIDS in a certain village. It may be useful to know more about the general sexual health of the population in that village.
Safe sex is much talked about in the media, the only way of keeping safe is to use barrier methods of contraception, and the government is spending a large amount of money on health promotion programmes to educate the general population about safe sex. Now, how many sexually active people are there in the village? Do all sexually active people in the village use a barrier method of contraception, such as condoms, to practise safe sex? A survey has to be used to give an indication of how many, and a questionnaire then has to be planned and distributed to sexually active adults so as to
  • 18. discover what types of sexual protection (i.e. femidom, condom or dental dam) are used. This type of survey will try to discover when those individuals used protection, when they were most likely and least likely to use protection, and how much money was spent on sexual protection per week or month. This might be helpful information in a health promotion context:
    • when running a safe-sex campaign within a village family planning clinic;
    • or in a village genito-urinary clinic;
    • when identifying specific health promotion needs, by recognizing trends in client/patient behaviour;
    • when considering social policy issues relating to such health matters as national trends in the development of HIV/AIDS.
First, I must define the target population that is to be sampled, for instance adults over eighteen years of age who give their permission. This means I am not going to get any information concerning those under eighteen, as they are not part of the target population.
The questionnaire example
This survey is attempting to find out about the use of barrier methods of contraception as protection. Your answers will be treated in confidence and will help us plan health care services. Please indicate your responses by placing a cross in the box next to the answer you think best represents your answer: (1) strongly agree, (2) agree, (3) not sure, (4) disagree, (5) strongly disagree.
Q1. I use the following types of barrier protection:
Always 1 2 3 4 5 Never
    None
    Condom
    Femidom
    Dental dam
    Cap (Dutch cap)
Q2. When are you most likely to use sexual protection? Agree / Disagree
  • 19. 1 2 3 4 5
    I never use barrier methods of sexual protection
    I sometimes use protection when I remember
    I use a condom every time I have anal penetrative sex
    To avoid pregnancy
    Allergic to latex
    To avoid getting a sexually transmitted disease
Q3. Please indicate your circumstances: which of the following categories applies to you? Tick those that do.
    Single
    Married or cohabiting
    Living with spouse
    I have many sexual partners
    I have one sexual partner
    • Thank you for completing this questionnaire.
    • Please return it in the enclosed pre-paid envelope.
    • Your responses will be treated in confidence.
4.3 Scales from the Sample Questionnaire
When asking respondents a question we quite often give them a range of choices so that they can indicate where their answer lies on a scale. The most commonly used scale is the Likert scale, which measures the extent to which a person agrees or disagrees with a statement. The scale usually runs from 1 to 5, for example (1) strongly disagree, (2) disagree, (3) not sure, (4) agree and (5) strongly agree. One common problem with a scale based on an odd number (5) of choices is that it always offers the option of choosing the middle point, and it is often easier for subjects to take this easy option than to make a decision. Another type of scale is the Thurstone scale, which uses only two points: agree and disagree. It is customary to ask numerous related questions that can be used to
  • 20. produce an overall score for an individual respondent, which can be compared with that for the sample as a whole or used to compare differing populations. A common example of the use of Thurstone scales is in psychometric tests of aptitude and attitude, such as a test taken as part of a job interview. An alternative approach is to use semantic differential scales, which are good for investigating phenomena such as attitudes and values, as they are based on opposite points of view or possible feelings about a subject or concept. For example, when investigating people's work environment, we might ask how they feel about aspects of the environment, and the response scale may range from helpful, nurturing, happy to unsupportive, blocking, toxic, dysfunctional. The concept in Table 2 below is day surgery; patients are asked to indicate where on each scale they lie. You would probably want to ask more questions to get a good overview of an individual's impressions of day surgery. Note that there is no consistent negative end of the scale; this helps to encourage respondents to think about their answers.
Table 2: A patient's responses when asked to consider day surgery (questions asked pre-operatively)
Exciting | 1 2 3 4 5 6 | Boring
Frightening | 1 2 3 4 5 6 | Calming
Useful | 1 2 3 4 5 6 | Useless
Fast | 1 2 3 4 5 6 | Slow
5 THE STUDIES
This section provides the background and data from two hypothetical studies, one a clinical trial and the other a questionnaire study. They enable us to complete a basic analysis of data sets, to draw some conclusions and to be in a better position to analyze other studies critically. They have been devised to be relevant to modern
  • 21. health care as well as to stimulate our interest in, and enjoyment of, data analysis. The questionnaire example follows.
5.1 The questionnaire Study
The following example questionnaire concerns the sexual health of individuals who presented for advice at a walk-in clinic in central Kigoma. Here we show the extended version, although for the sake of simplicity we have reduced the extent of some of the questions. The aim of this study is to provide basic information on the sexual behaviour of individuals of differing sex, age and ethnic group who used the walk-in clinic. It was initiated because the clinical leader considered that a large proportion of the clients were presenting with symptoms related to sexual health, and he was considering putting in a bid for funds to support the employment of a specialist in this area. The study is consequently largely descriptive and exploratory in nature. The population of the study is all those individuals who could potentially use the walk-in centre, and the sample is made up of those who actually did enter the clinic and complete the questionnaire.
5.2 The Questionnaire
This survey is attempting to find out about the use of the walk-in clinic in relation to sexual health. Your answers will be treated in confidence and will help us plan health care services. Please indicate your responses by placing a cross in the box next to the answer you think best represents your answer.
1. What made you attend the walk-in centre today? Agree / Disagree
    i. Location
    ii. Availability
    iii. Access to medical staff
    iv. Access to nursing staff
    v. Emergency treatment required
  • 22.
2. What symptoms are you experiencing? Please tick the one you feel best describes your symptoms.
    i. Headache
    ii. Chills/shakes
    iii. Temperature
    iv. Lack of appetite
    v. Feeling generally unwell all over
    vi. Cough
    vii. Pain/soreness in chest
    viii. Faintness
    ix. Collapse
    x. Pain or difficulty passing urine
    xi. Discharge from penis
    xii. Discharge from vagina
3. Which of the following descriptions best represents your sex life? (Please tick.)
    i. Very active (I have sex more than five times per week)
    ii. Active (I have sex between once and five times per week)
    iii. Not very active (I have sex less than once per week)
    iv. Non-existent
4. Please indicate how many sexual partners you have shared sex with in the past month.
5. My most frequent choice of barrier protection is: (Please tick one)
    i. None
    ii. Condom
    iii. Femidom
    iv. Dental dam
    v. Cap (Dutch cap)
6. When are you most likely to use sexual protection? Agree / Disagree
    i. I never use barrier methods of sexual protection
  • 23.
    ii. I sometimes use protection when I remember
    iii. I use protection (put a condom on) if I do not know my sexual partner very well
    iv. I use protection (put a condom on) when I think I am going to climax
    v. I use a condom every time I have oral sex
    vi. I use a condom every time I have vaginal penetrative sex
    vii. I use a condom every time I have anal penetrative sex
    viii. To avoid pregnancy
    ix. To avoid getting a sexually transmitted disease
7. What puts you off using sexual protection? Agree / Disagree
    i. Not very comfortable
    ii. Loss of sensation/cannot feel anything
    iii. Too fiddly to have to open packets
    iv. Cost
    v. By the time the packet is open the moment has gone
    vi. Need to use a lubricant as well
    vii. Have to plan sex in advance
    viii. Allergic to latex
8. How much money do you spend, on average, on sexual protection per week? Please indicate which category applies to you, using a tick:
    i. £0
    ii. £0–£5
    iii. £5–£10
    iv. £10–£15
    v. I get them free
9. My age group is: Please indicate which category applies to you using a tick.
    16–24
    25–34
    35–44
    45–54
    55–64
    65–74
    75–100
11. I consider my sexual orientation to be: (Please tick one)
    i. Heterosexual male
    ii. Heterosexual female
  • 24.
    iii. Gay (homosexual) male
    iv. Gay (lesbian) female
    v. Bisexual male
    vi. Bisexual female
    vii. Asexual
12. I consider my ethnic background to be (Please tick):
    i. African
    ii. Asian
    iii. European
The questionnaire has been simplified for the purposes of my assignment.
When entering data into charts or a computer, it is often easiest to use one line for every subject and to enter the values for the variables measured across the row; codes are used throughout to simplify data entry. In Table 3 the following codes have been used: M for male and F for female; a code from 1 (very active) to 4 (non-existent) for sexual activity; and codes 1–5 for barrier choice, where the participant selected the most used protection, 1 is the first possible response (none) and 5 the last (cap). For question 7 we have used a similar system to that used for barrier choice, where (i) is the first option and (vii) the last. For the presentation type we have also used the same system as for questions 5 and 7, with 1 equivalent to headache and 12 to discharge from vagina.
5.3 The Clinical Trial
This experimental trial looks at the ability of the novel drug Symphadiol (as Placebo) to help increase weight loss in individuals who are trying to lose weight using a calorie-controlled diet. The trial is being organized by Spinto, a drug company active in the field, which has recruited individuals through a network of dieters' groups. They were invited to take part by Spinto's clinical trials specialist nurse, who conducts the study with the support of local GPs and Spinto's dietician. The aim of the experiment is to test the hypothesis that a daily dose of Symphadiol enhances weight
  • 25. loss, in clinically obese individuals, compared with just using a calorie-controlled diet. It was decided to select men between the ages of twenty-two and forty for the study. It was also decided to look at the impact of exercise in conjunction with Symphadiol. The population in this study is all healthy obese male individuals who are sufficiently motivated to lose weight to join a diet network; they must not be taking any medication, except that required for minor ailments.
Table 3: The example of data from questionnaire
Question | 13 | 12 | 3 | 4 | 5 | 7 | 2 | 9
Individual | Sex | Ethnic group | Sexual activity | Partners | Barrier choice | Reason not | Presentation | Age
1 | F | E | 3 | 1 | 1 | 1 | 1 | 22
2 | F | E | 2 | 2 | 2 | 5 | 7 | 28
3 | F | E | 1 | 8 | 2 | 7 | 4 | 21
4 | F | AS | 2 | 1 | 2 | 1 | 9 | 26
5 | F | AS | 4 | 0 | 0 | 5 | 12 | 27
6 | F | AS | 2 | 1 | 2 | 7 | 6 | 30
7 | F | AS | 4 | 0 | 0 | 5 | 5 | 29
8 | F | E | 2 | 2 | 1 | 5 | 12 | 23
9 | F | E | 4 | 0 | 0 | 3 | 10 | 21
10 | M | E | 2 | 1 | 3 | 2 | 9 | 32
11 | M | E | 1 | 1 | 5 | 2 | 3 | 18
12 | M | A | 1 | 1 | 1 | 7 | 10 | 24
13 | M | A | 1 | 3 | 5 | 6 | 12 | 23
13 | M | AS | 2 | 0 | 2 | 1 | 7 | 23
14 | M | AS | 1 | 4 | 0 | 3 | 12 | 20
15 | M | A | 3 | 0 | 0 | 5 | 1 | 36
  • 26.
16 | M | E | 4 | 4 | 2 | 5 | 2 | 20
17 | F | E | 4 | 1 | 3 | 1 | 5 | 23
18 | M | E | 1 | 0 | 2 | 3 | 3 | 24
19 | M | E | 3 | 2 | 1 | 3 | 3 | 20
20 | F | A | 2 | 0 | 2 | 1 | 12 | 36
6 DESCRIPTIVE STATISTICS
Descriptive statistics are used in health care for many purposes, mostly to describe and summarize the detailed data collected in a study in a way that can be interpreted quickly and easily; they are the kind of statistic we are most likely to encounter. They are used to administer, monitor and evaluate health services and the people who work in a health facility or organization. Understanding descriptive statistics is therefore very important for individuals who work in health care settings. Descriptive statistics are usually of two kinds: (1) measures of central tendency (the typical value), namely the mean, the median and the mode; and (2) measures of variability about the typical value (measures of dispersion), namely the range, inter-quartile range, variance and standard deviation.
6.1 Measures of Central Tendency
A measure of central tendency is a single value that tries to describe a data set by identifying the position of the centre within that data set. Such measures are sometimes known as measures of central location, and they are also classed as summary statistics. The mean, often called the average, is probably the measure of central tendency we are most familiar with; there are also the median and the mode.
6.1.1 Mean (Arithmetic)
The arithmetic mean is usually known as the average. To calculate the average time spent at the gym in a month, we take the times spent during every visit, add them together and divide by the number of visits. Table 4 below, for example, shows the time that Amani spent at the gym over one month. The mean (average) length of time spent at the gym is 48.17 min. This score of 48.17 min falls roughly in the centre of the data, with six visit
  • 27. times greater and six visit times less than 48.17. This is not always the case with the mean, however.
Calculations:
Table 4: Time spent by an individual at the gym during July
Visit | Time spent (minutes)
1 | 62
2 | 34
3 | 50
4 | 40
5 | 58
6 | 48
7 | 38
8 | 60
9 | 58
10 | 45
11 | 53
12 | 32
To calculate the mean:
\[ \text{Mean} = \frac{\text{sum of the minutes spent in each visit}}{\text{total number of visits}} = \frac{32 + 34 + 38 + 40 + 45 + 48 + 50 + 53 + 58 + 58 + 60 + 62}{12} = \frac{578}{12} \approx 48.17 \]
Nearly all statistics involve computations that can be performed on a basic calculator, which lets us compute the basic statistics more quickly; if data sets become large, a computer package can be used. Because statistics is a branch of mathematics, statistical tests are frequently represented as formulae and rely heavily on symbols. Getting to know them is like learning a new language, which often persuades people that they cannot understand statistics. Here is an introduction to a few:
x   An individual case, such as a single visit to the gym.
n   The number of cases (x); twelve visits to the gym gives n = 12.
Σ   "The sum of", e.g. all the x values added up; in the case above, Σx = 578.
x̄   The symbol representing the mean.
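The same calculation can be checked in a few lines of Python. This is a minimal sketch that uses the visit times from Table 4; the variable names `times`, `n`, `sum_x` and `mean` are chosen here to mirror the symbols and are not part of the original text.

```python
# Visit times in minutes, in the order listed in Table 4.
times = [62, 34, 50, 40, 58, 48, 38, 60, 58, 45, 53, 32]

n = len(times)      # number of cases
sum_x = sum(times)  # sum of all the x values
mean = sum_x / n    # the mean: sum of the cases divided by their number

print(n, sum_x, round(mean, 2))  # prints: 12 578 48.17
```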
  • 28. Besides using words, another way to describe the mean is with the following formula:
\[ \bar{x} = \frac{\sum x}{n} \]
This formula is used when we want to calculate the mean for a sample obtained from a population. If a statistical formula is difficult to follow, it helps to write out every symbol in words. For the mean this gives:
\[ \text{Mean} = \frac{\text{the sum of the cases}}{\text{the number of cases}} \]
One disadvantage of the mean is that it can mislead if some of the values are unusually large or small. If, for example, Amani had one day been at the gym for only 5 min, adding this value to the data would affect the average score, giving an average time of 44.85 min (see below).
5, 32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62; Mean = 44.85
This unusual score of 5 minutes makes the mean much lower, so the mean is no longer a true reflection of the central point. Such extreme values are called outliers.
Symbols used when discussing populations:
N   The size of the population.
µ   The mean of the population.
σ   The standard deviation of the population.
6.1.2 Median
The median helps to solve the problem of outliers since, rather than using every value to calculate the measure of central tendency, it uses only the value found at the physical centre of the data. The data must be put in ascending order to calculate the median. Taking the two examples below of exercise time
  • 29. and including the low value, it is clear that an outlier has little impact on the median, even though it occurs at only one end.
5, 32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62; Median = 48
The extreme value shifts the median by only 1. Without it there are twelve values and the middle falls between two of them, so the two middle numbers must be added together and divided by 2:
32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62
Median = (48 + 50) / 2 = 49
6.1.3 Mode
The mode is the value that occurs most frequently; in the data set below it is 58.
32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62
The mode shows the most common value in a data set. One advantage is that it can be used on both nominal and continuous data, and for nominal data it is the only available measure of central tendency, because the mean and median cannot be used on nominal data. For instance, suppose one question in a lifestyle survey asks where students would primarily go for family-planning advice:
- GP: 5
- Practice nurse: 4
- Family planning clinic: 6
- Friends: 8
- Chemist: 2
- Nowhere: 2
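The median and mode examples above can be checked with a short Python sketch. It uses the standard library's statistics module and collections.Counter; the variable names are illustrative, and the survey counts are those listed above.

```python
from collections import Counter
from statistics import median, mode

# Exercise times from the gym example, in ascending order.
times = [32, 34, 38, 40, 45, 48, 50, 53, 58, 58, 60, 62]
with_outlier = [5] + times   # the same data plus the unusual 5-minute visit

median_12 = median(times)          # even count: average of the two middle values
median_13 = median(with_outlier)   # odd count: the single middle value
mean_13 = sum(with_outlier) / len(with_outlier)
most_common_time = mode(times)

# Nominal data: only the mode is meaningful here.
advice = Counter({
    "GP": 5,
    "Practice nurse": 4,
    "Family planning clinic": 6,
    "Friends": 8,
    "Chemist": 2,
    "Nowhere": 2,
})
top_source, top_count = advice.most_common(1)[0]

print(median_12, median_13, round(mean_13, 2), most_common_time)
print(top_source, top_count)
```

The outlier moves the median by only 1 (from 49 to 48), while it pulls the mean down from 48.17 to 44.85.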
  • 30. Consequently, Friends is the most commonly occurring category, with a count of 8; for the data listed above the mean and median would not be meaningful.
6.2 Choosing a Measure of Central Tendency
The mean is the most commonly used measure because it is the most sensitive: it takes account of the value of every case in the distribution. It is also mathematically based, so it can be used in later statistical computations. However, it can only be used on interval or ratio data, and it is easily distorted by outliers.
Table 5: Choosing a Measure of Central Tendency
Measure | When to use | When not to use
Mean | Interval or ratio data; for most data sets, where the cases are more or less symmetrically distributed about the mean; where the measure is going to be used in further calculations | Categorical data; ordinal data; when there are outliers or the data are heavily skewed
Median | Interval, ratio or ordinal data; data heavily skewed, mean distorted by outliers | When the measure will be used in further calculations
Mode | Categorical data; ordinal data |
In many situations there is no right or wrong measure of central tendency, and people tend to choose the mean because they are familiar with it. It is best to keep in mind, however, that the main point of descriptive statistics is to organize and communicate information, so a statistician ought to choose the measure that conveys the information in the best possible way and does not mislead the audience.
6.3 Measures of Dispersion
Having dealt with central values, we now have to look at statistics that describe how the scores within a sample differ. The term given to measures of the level of variability within the data is dispersion. If there were no variation in populations, there would be little need for statistics.
  • 31. For example, two groups of students, one part time and one full time, were asked at what age they had their first experience of sexual intercourse with another person.
Group A: 14, 15, 18, 22, 23; X̄A = 18.4 (the subscript tells us that this is the mean of group A)
Group B: 17, 18, 19, 19, 20; X̄B = 18.6
Whilst the means are almost the same, it is obvious that there is a difference in the variability of the values of the cases within the groups. In terms of using statistics to develop health care practices, knowing the variability of values is as important as knowing the mean. After all, if shoe manufacturers only made shoes for people with average-size feet they would soon go out of business. We want to know about the variability that we will encounter in patients' health and in their behaviors.
6.3.1 Range
The range is one of the measures of dispersion. To calculate it, subtract the lowest score from the highest, so the range for group A is:
23 - 14 = 9
The range for group B is:
20 - 17 = 3
This means that the scores for group A are spread over 9 units, whereas those for group B cover only 3; group A has a greater range of values in its cases. The range is a quick and easy way of approximating the level of variation within a sample, but care must be taken with outliers.
6.3.2 Inter-quartile Range
The inter-quartile range (IQR) is an alternative that can be used to deal with outliers. A quartile, as the name suggests, is obtained by quartering the data set: the data are placed in ascending order and divided into four quarters, and the values at the limits of these quarters are known as quartiles. The inter-quartile range is in fact the range of the middle 50% of the data, and it can be computed by locating the upper value of the first quartile, which bounds the first 25% of the cases, and subtracting it
  • 32. from the upper value of the third quartile. For example, a group of teenagers were asked for how many years they thought it was safe to take the contraceptive pill (Q = quartile).
2, 2, 4 | 5, 8, 10 | 10, 10, 12 | 15, 15, 30
(Q1 ends at 4, Q2 ends at 10, Q3 ends at 12)
The range of this data set is twenty-eight, but it is affected by the case with the value of thirty. There are twelve cases in the data set, so there are three cases in every quartile. At the end of the first quartile, i.e. the first 25% of the cases, the value is four, and at the end of the third quartile, i.e. the first 75% of the cases, the value is twelve. The difference between these two values (eight) is the inter-quartile range (IQR).
6.4 The Standard Deviation and Variance
The variance and the standard deviation are the two most common measures of dispersion; both indicate the amount of deviation from the mean. The standard deviation is the one most often quoted when describing data, while the variance is used in numerous statistical tests. The standard deviation measures the variation in the data by evaluating how much each case deviates from the mean. For example, if the mean is six and an individual case is eight, the deviation is two. That takes care of the 'deviation' part of the statistic, but it is the 'standard' part that is significant: the statistician has to take all the deviations and relate them to the size of the mean to obtain the standard deviation. This is important because a deviation of 2 is of much less significance if the mean is 110 than if the mean is 8.
Table 6: Age at first sexual experience (with another person)
Case | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Age at first sexual experience | 14 | 15 | 17 | 18 | 18 | 19 | 19 | 20 | 22 | 23
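The dispersion measures in this section can be sketched in Python using the data quoted in the text. The helper quartile_end_iqr is a hypothetical name introduced here; it follows the quartile rule described above (the value at the end of the first quarter subtracted from the value at the end of the third quarter) and assumes the number of cases divides evenly by four. Other quartile conventions, including the one in Python's statistics.quantiles, can give different answers.

```python
from statistics import mean, stdev, variance

# Ages from the two student groups in section 6.3.
group_a = [14, 15, 18, 22, 23]
group_b = [17, 18, 19, 19, 20]
range_a = max(group_a) - min(group_a)
range_b = max(group_b) - min(group_b)


def quartile_end_iqr(values):
    """IQR by the rule used in the text (hypothetical helper).

    Assumes the number of cases divides evenly by four.
    """
    ordered = sorted(values)
    quarter = len(ordered) // 4
    first_quartile_end = ordered[quarter - 1]
    third_quartile_end = ordered[3 * quarter - 1]
    return third_quartile_end - first_quartile_end


# Years the contraceptive pill was thought safe to take (section 6.3.2).
years = [2, 2, 4, 5, 8, 10, 10, 10, 12, 15, 15, 30]
iqr = quartile_end_iqr(years)

# Ages from Table 6.
ages = [14, 15, 17, 18, 18, 19, 19, 20, 22, 23]
sample_variance = variance(ages)   # sum of squares divided by n - 1
sample_sd = stdev(ages)            # square root of the sample variance

print(range_a, range_b, iqr)
print(mean(ages), round(sample_variance, 2), round(sample_sd, 2))
```

The library functions variance and stdev divide by n - 1, which matches a sample standard deviation; pvariance and pstdev would divide by n instead.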
  • 33. The standard deviation formula given below looks complex, but it actually involves quite straightforward mathematics; using the data in Table 6 above we can work through the formula step by step.
s = √( Σ(x - x̄)² / (n - 1) )
s is the symbol for the standard deviation and s² is the symbol for the variance; the standard deviation is simply the square root of the variance. The calculation steps are outlined below:
Step 1: Calculate the mean: x̄ = 185 / 10 = 18.5.
Step 2: Subtract the mean from each value (x - x̄):
14 - 18.5 = -4.5
15 - 18.5 = -3.5
17 - 18.5 = -1.5
18 - 18.5 = -0.5
18 - 18.5 = -0.5
19 - 18.5 = 0.5
19 - 18.5 = 0.5
20 - 18.5 = 1.5
22 - 18.5 = 3.5
23 - 18.5 = 4.5
Step 3: Square each answer obtained in step 2.
Step 4: Add up all the answers to step 3. This value is called the sum of squares: 70.5.
Step 5: Subtract 1 from the size of your sample (n - 1): 10 - 1 = 9.
Step 6: Divide the value found in step 4 by the value calculated in step 5. This is called the variance: 70.5 / 9 = 7.83.
Step 7: Find the square root of the value obtained in step 6 to determine the value of one standard deviation: √7.83 = 2.80.
Therefore our sample standard deviation is 2.80. When preparing to carry out a statistical test, be careful with data sets in which the variance (the square of the standard deviation) greatly exceeds the mean.
  • 34. The same caution applies if the variance is found to be small relative to the mean, or equal to it. These are signs that the data set has a more complex form and needs to be handled with care. Whenever a mean is quoted, a measure of dispersion should be given with it, since a mean without a measure of dispersion is not easy to interpret; the sample size should also be included.
Table 7: Central tendency and measure of dispersion
Type of data | Measure of central tendency | Measure of dispersion
For most data sets, where the cases are more or less symmetrically distributed about the mean | Mean; there should be no need to quote any other measure because all measures of central tendency for this type of data will be similar | Standard deviation; also consider giving the range
Interval, ratio or ordinal data; data heavily skewed, mean distorted by outliers | Median, though you might also quote the mean | Range and quartiles
Nominal | Mode | No measure
7 DISPLAYING DATA
The two most common forms of data display are graphs and tables. Both have the same aim: to summarize and present the data in a manner that is easy to understand and take in. Displaying your data is an essential part of analyzing the data. It allows you to establish how the data are distributed, to see unusual cases and generally to get a 'feel' for the data.
Table 8: Acklin and Bernat (1987) data for two indices of depression in patients with a range of conditions
Index | LBP | Depressives | Personality disorder | Non-patients
Egocentricity | 0.31 | 0.32 | 0.42 | 0.39
Sum morbid content | 0.82 | 3.47 | 0.99 | 0.70
(The columns are patient types.)
Tables present information in a text-based form, so much of the detail in the data can be retained. Unfortunately, taking in lots of different numbers and seeing emerging patterns is rather difficult, and this is where graphs come in.
When we present data in graphical form some of the detail tends to be lost but it becomes much easier to see the
  • 35. emerging patterns. In the example below we show some data from Acklin and Bernat (1987) in graphical form (Figure 1) and in tabular form (Table 8). Acklin and Bernat's study examined the relationship between chronic low back pain and depression. In the graph and the table we have taken two of the indices of depression that they used and plotted them against patient type.
Figure 1: Indices of depression recorded for a range of different patient types (Acklin and Bernat 1987). Legend: Egocentricity; Sum morbid content.
Which display form makes it easier to see the trend? Which allows you to see most detail? As a general rule, the more data are put into a table, the harder it becomes to read and the less likely it is to be read. Tables should be used when the data set is very simple or when the data need to be shown in great detail. The data we used for the graph are already summaries of the data collected; that is, the figures are averages.
7.1 Table Types
There are several different types of table that can be used. Your choice of table will depend on the type and number of variables that you have. In the example above (Table 8) there are two types of variable. Along the top of the table we have the nominal category 'patient type', whilst down the side of the table we have two interval scale variables, that is, the indices of depression.
Table 9: Frequency of different ethnic groups in a sample of 178 individuals interviewed in north-east London
Ethnic group | Frequency in sample
White European (EU) | 75
African | 35
  • 36. Indian | 32
Afro-Caribbean | 26
Other | 10
Other tables may have just one variable, which runs either along the top or down the side of the table; the measurement could then be the frequency of occurrence of that variable. In such a case the table becomes a frequency table, and it is normal to put the most commonly occurring category at the top (see Table 9). Sometimes tables include summary statistics (as does Table 8). In Table 10 the bottom row is a summary of the data within the table. Tables that report the frequency values of two nominal variables simultaneously, and that include totals, are often used to help look for associations between variables. These tables are known as contingency tables.
Table 10: Summary examination results for a group of 122 first-year student nurses
Exam paper | % (average)
1 | 53
2 | 46
3 | 58
4 | 43
Average | 50
7.2 Types of Graph
The guiding principle when using graphs is that less is more: if too much information is included in a graph, its meaning will be distorted. There are a variety of graph types. The main types you will see are frequency charts, histograms, bar charts, pie charts, scatter graphs and line graphs. The type of chart chosen will depend on the type and complexity of the data. Many graphs are plotted against two axes, the horizontal axis and the vertical axis. Bear in mind that a graph drawn incorrectly may mislead its audience.
7.2.1 Frequency Charts
Data are often divided into categories, or into counts of how frequently a certain value occurs; these counts are referred to as frequencies. One of the simplest types of data is nominal data, where things are categorized and we then count the number of things in each
  • 37. category. Examples of such categories are different types of disease, the ethnic group to which individuals belong, or whether individuals are male or female. If the data were originally measured on an interval or ratio scale, the most appropriate form of display is a histogram. Plotting data in a frequency histogram gives us an idea of how the data are distributed and a feel for the data. In a frequency histogram the x axis covers the range of values of the cases, and the distance covered by each bar on the x axis stands for a range of possible recorded values of the measure, so you need to decide the size of the range categories that you will use.
7.2.2 Pie Charts
Pie charts are an alternative form of frequency chart, best used with ordinal or nominal scale data, since they display the count in each category as a proportion of the total number of counts. The total data set is represented as a circle divided into segments, the size of each reflecting the frequency of that category.
7.2.3 Bar Charts for Summary Statistical Information
Bar charts can also be used to display summary statistics such as means and standard deviations. In Figure 1 the means of two depression indices are plotted on a bar chart. An indication of the variation in the data can also be plotted on the graph; this is frequently done with error bars, which are small vertical lines with horizontal bars at the top and bottom that mark the mean plus or minus one standard deviation or standard error.
7.2.4 Scatter Graph (Scatter Plot)
A scatter graph is used to examine whether there is a relationship between two variables, that is, whether the value of one variable is related to the value of the other; for instance, height and weight are quite often closely linked.
Scatter graphs can be used with interval, ratio or ordinal data that have been collected in pairs, for example where both the weight and the height of each participant have been measured.
  • 38. The x axis carries the scale for one of the variables and the y axis the scale for the other. For each sampling unit a point is plotted on the graph at the place where its values of x and y meet. If you have enough data, the graph will show the points scattered over its surface.
7.2.5 Line Graphs
A line graph is similar to a scatter graph, except that the points are plotted in sequence as the values increase along the x axis and a line is drawn between each point and the next. Line graphs are ideal for showing sequences, for example plots of patient measurements over time, or the growth of infants over time. In general, line graphs should be used only when there is good reason to presume that the line drawn between the points represents what will in all probability occur. Strictly, therefore, they ought not to be used for grouped data such as monthly means or counts, although in practice this is frequently done. Used in this manner, line graphs are in fact a rather good method of comparing trends across different groups of data.
Table 11: Summary of types of graph
Type | When to use
Histogram | For showing the frequency distribution of data measured on the interval or ratio scales.
Bar chart | For displaying frequencies of nominal or ordinal data; also for comparing measures of central tendency between groups of data measured on ordinal, interval or ratio scales.
Pie chart | Used largely for showing the frequency distribution of nominal data. Try to avoid using pie charts to compare different groups of data.
Scatter graph | Use with interval, ratio and ordinal data when you want to see whether two variables are linked. Two or more variables must be measured from each sampling unit.
Line graph | Used for data measured on interval, ratio or ordinal scales, particularly when you want to display a trend or change over time.
Particularly useful for displaying trends in several groups of data at once. Avoid joining points if there is no reason to do so.
8 HYPOTHESIS
  • 39. A hypothesis is a proposed explanation for an observation. It leads to one or more predictions that, through investigation and the use of statistics, we will either confirm or reject, and in so doing we test the validity of the hypothesis. Put simply, a hypothesis is a method of synthesizing an idea or an explanation. Hypotheses are central to most studies that involve the collection of quantitative data and statistics. They are usually built from a previous observation or experience that leads to a prediction that there is a relationship or link between two or more variables. For example, we might be interested in studying the relationship between sexual activity and sexual health, or, using a placebo trial, in obesity and how it affects post-operative recovery time; within these broad areas of study there are specific relationships we wish to explore.
Hypothesis building
- Observation: A manager of a sexual-health clinic reports that patients/clients from certain post code areas seem to be infrequent visitors to the clinic.
- Hypothesis: People who live farther away from the clinic are less likely to visit.
- Study: Make a detailed analysis of the distance people live from the clinic and the frequency of their visits.
8.1 Forming the Hypothesis for the Experiment / Study
One of the hypotheses from the first investigation is that males are less likely to use the walk-in clinic than females. One of the hypotheses from the second study is that obese patients treated with Placebo will lose weight faster than those not given the drug. In these hypotheses the predictions are highlighted in bold.
8.1.1 Experimental Approaches
Some investigations test these associations by experiment, an approach that keeps constant the variables we are not interested in. For example, in the second investigation we would divide the patients/clients into two groups and subject one group to treatment with Placebo.
Variables that we are not interested in (e.g. the patient's sex, age and socio-economic group) must be controlled carefully by making sure that the composition of the two groups is alike. Because we manipulate one of the groups of people that take part in the study, the investigation is an experiment. Individuals who take part in a study were formerly
  • 40. known as subjects; it is now more common to refer to them as participants, in acknowledgment that in most cases we have to get permission from people before we study them: they are not objects to be studied but participants in the study. The manipulated group is known as the experimental or treatment group, and the group that is not subject to the manipulation is known as the control group, which will receive a placebo. In statistical vocabulary, the experimental and control groups together are known as the treatment groups. The outcomes of the experiment are then subjected to statistical analysis in order to assess the probability of the results occurring by chance.
Figure 2: Diagram of an experimental study
Study population -> (sampling) -> Sample population -> (randomisation) -> Experimental group and Control group
Experimental group: first data collection (before intervention) -> period of intervention (manipulation) -> last data collection (after intervention)
Control group: first data collection (same time as in study group) -> no intervention (no manipulation) -> last data collection (same time as in study group)
The two groups are then compared.
8.1.2 Non Experimental Methods or Quasi-experiments
The hypothesis from the walk-in clinic study, that males are less likely to visit the walk-in clinic than females, would presumably not use an experiment as the basis of the investigation, but a study based on statistical analysis of data relating to the
  • 41. frequency of visits by males and females. The study would need to test whether the observed ratio of males to females visiting the clinic was likely to have occurred by chance.
Figure 3: Diagram of a quasi-experimental design with two groups
Study group -> Intervention -> Study group after
Control group before -> Control group after
The two groups are then compared.
8.1.3 Before-After Study
This is another kind of study design, frequently chosen because it is rather easy to set up; it uses only one group, on which the intervention is performed. The situation is analysed before and after the intervention to test whether there is any difference in the observed problem. It is considered a pre-experimental study design, quite different from a quasi-experimental study design, because it involves neither a control group nor randomisation.
8.2 Variables
We have established that a hypothesis is a prediction about two or more variables. It is important to remain aware of the role of each variable when a statistical test is applied, since we will have one dependent variable and one independent variable (and possibly confounding variables).
8.2.1 Dependent Variable
The dependent variable is a variable whose value is determined by, or dependent on, the value of another variable; in experimental designs it is always the variable that is measured. We may, for example, hypothesize that age and blood pressure are linked. In a study of this relationship the dependent variable would be blood pressure: we would not suggest that age was determined by blood pressure, but would instead predict that age was in some way important in determining blood pressure.
8.2.2 Independent Variable
  • 42. The independent variable, conversely, is the variable that the researcher considers to determine, at least in part, the value of the dependent variable. In considering the relationship between age and blood pressure, we can suggest that age might in some way account for the recorded level of blood pressure, and therefore age is the independent variable. In experimental designs the independent variable is the variable that is fixed or manipulated.
8.2.3 Other Variables
The confounding variable is another important kind of variable: it influences the value of the dependent variable but is not of interest with respect to the hypothesis being tested. For instance, in the test of the impact of Placebo it might be that the patient's age influences the effects of Placebo. If this is the case and we fail to ensure that the two treatment groups have participants of similar age, then age becomes a confounding variable and the results of our experiment may be difficult to interpret. Potential confounding variables therefore need to be taken into account through suitable and carefully thought-out research designs, particularly with respect to sample selection.
8.3 Errors and Statistics
Several types of error can give false results in statistics. They fall into four categories: random error, sampling error, measurement error and experimental error. Much of research design and statistics involves either trying to reduce error or trying to take account of it. One of the most significant uses of statistics is thus to help decide whether an observed result could be due to chance, that is, caused by sampling and other non-systematic errors.
8.4 The Statistical Hypothesis
A researcher has to establish an experimental hypothesis before performing an experiment or study to test it.
In the same way, when we test the results of the experiment to see whether they might have occurred by chance, we also establish a statistical hypothesis. The most common form of statistical hypothesis is the hypothesis of no difference, usually called the null hypothesis and given the symbol H₀; the experimental hypothesis is given the symbol H₁.
  • 43. 8.5 Types of Interaction between Variables
When conducting studies we are not always looking for the same kind of relationship between variables. There are generally three kinds of interaction: relationships, associations and (clear-cut) differences. Deciding which kind of interaction exists between the variables you are dealing with is an important aspect of statistics, since it will in part determine the statistical test that you use.
9 DISTRIBUTIONS AND PROBABILITIES
One of the more important concepts in statistics is the idea that numbers are distributed according to the frequency of occurrence of particular values. For example, a data set of the number of sexual partners that each individual has during a lifetime could contain just the values 4 or 3, but it is much more likely to be a mixture of different numbers, from high to low. The way numbers are mixed, or distributed, largely determines the type of statistical test to be used, and the easiest way to see how the data are distributed is to plot them in a frequency histogram (Figure 4).
9.1 Frequency Histograms
A frequency histogram is a kind of bar chart in which the y axis is the frequency of occurrence of a particular value, and the x axis carries a scale bounded by the lowest and highest values of the cases, with the values in between placed at appropriate intervals.
  • 44. Figure 4: Example of a Histogram
A bar is drawn that fills the whole of each of the intervals being measured; the sides of the bars are parallel and the width of the bars is held constant. This type of figure is normally used for variables that are recorded on an interval or ratio scale. If your data are on an interval or ratio scale, plotting them in this manner should be one of your very first steps. This is because the distribution of the data forms the basis of many statistical tests, and numbers can be distributed in numerous ways. Nevertheless, some distributions have features that researchers can exploit. One such distribution is the normal distribution, which forms the basis of numerous statistical tests.
9.2 Probability and Statistics: The Link between Probabilities and Distributions
Imagine a bag of laundry containing equal numbers of blue and pink towels, into which you cannot see. When you reach in and pull out a towel there are two possible results: the towel will be pink or the towel will be blue. You draw three towels from your bag. There are 8 possible outcomes (BPB, BPP, PBB, PBP, PPB, BBB, PPP or BBP), so the likelihood of each outcome is 1/8. There are four possible combinations: all blue, all pink, one pink and two blue, or two pink and one blue. What is the chance of obtaining each of these combinations? For PPP and BBB it is simple: as we have already said, the chance of each of these outcomes is 1/8. Three outcomes give one pink and two blue towels, so the chance of this combination is 1/8 + 1/8 + 1/8 = 3/8. Likewise, three outcomes give one blue and two pink towels, so the chance of this combination is also 1/8 + 1/8 + 1/8 = 3/8.
The type of distribution shown here is called the binomial distribution, a mathematical model that describes data whose distribution is determined by events that can fall into either of two categories.
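The towel example can be reproduced by enumerating the outcomes in Python. This is a minimal sketch which assumes, as the text does, that each draw is equally likely to be blue or pink, so that all eight sequences have the same chance; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every possible sequence of three towels: B = blue, P = pink.
outcomes = ["".join(draw) for draw in product("BP", repeat=3)]

# Group the sequences by how many pink towels they contain.
pink_counts = Counter(outcome.count("P") for outcome in outcomes)

# Each sequence has chance 1/8, so a combination's chance is
# (number of sequences giving that combination) / 8.
chance = {
    pinks: Fraction(ways, len(outcomes))
    for pinks, ways in pink_counts.items()
}

for pinks in sorted(chance):
    print(pinks, "pink:", chance[pinks])
```

The four chances (1/8, 3/8, 3/8 and 1/8) add up to 1, as they must, since one of the four combinations always occurs.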
  • 45. The distributions of numbers and probabilities are thus linked, which allows us to make predictions. Fortunately, it so happens that natural phenomena produce data sets whose distribution is similar to the one above. This distribution is known as the normal or Gaussian distribution, and it forms the basis of many of the most commonly used statistics. Statistics that rely on numbers being distributed in a particular way are called parametric statistics.
9.3 The Normal Distribution Curve
The normal distribution has mathematical properties that allow us to make predictions, just like the histogram. Now that we are aware of what is meant by the term probability and the ways in which probabilities may be expressed, we should also be aware that it is possible to make predictions using knowledge of how numbers are distributed. Imagine that the intervals on the x axis were infinitely small: instead of a bar chart with steps we would produce a curve. If we connected the tops of the bars with a line (see figure 9.2) and then removed the bars, the normal distribution would look like such a curve (see figure 9.3). As a defined distribution of numbers, the normal distribution has certain properties. The first is very clear: the curve is symmetrical, and it is sometimes referred to as 'bell-shaped'. The exact shape of the curve, however, depends on the standard deviation of the data.
Figure 5: The curve is bell-shaped and symmetrical
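One of the mathematical properties that makes prediction possible is that the proportion of a normal distribution lying within a given number of standard deviations of the mean is fixed. The sketch below is an illustration rather than part of the original essay: it uses the standard identity that, for a normal distribution, this proportion equals erf(k / √2), and the function name proportion_within is introduced here for the example.

```python
from math import erf, sqrt


def proportion_within(k):
    """Proportion of a normal distribution within k SD of the mean."""
    return erf(k / sqrt(2))


within_one_sd = proportion_within(1)
within_two_sd = proportion_within(2)

print(round(within_one_sd * 100, 2), round(within_two_sd * 100, 2))
```

The result does not depend on the mean or the standard deviation of the data, which is why the same percentages can be applied to any normally distributed variable.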
  • 46. The mean is always in the middle of the x axis. The tails of the normal curve (the rare values) tend to be short. Presumably the most significant feature of the normal distribution curve is that the point where the curve changes from concave to convex (the inflection point) is always 1 standard deviation (SD) away from the mean. This means that the area enclosed between the mean plus 1 SD and the mean minus 1 SD is always a constant proportion of the total area, namely 68.27 per cent. If we move two standard deviations either side of the mean we encapsulate about 95% of the total area. So if we took a large sample of patients' arm lengths, we would expect about 68% of the results to lie within ± 1 SD of the mean and 95% to lie within ± 2 SD of the mean (see figure below).
Figure 6: The Standard Deviation
9.4 Making a Prediction
I am interested in the number of Opsite dressings used on the average medical ward. I collect data from 102 wards and the data are normally distributed. How many wards will lie within ± 1 SD of the mean? Hint: in normally distributed data 68.27 per cent of the data lie within ± 1 SD of the mean, so we would expect about 70 of the 102 wards (0.6827 × 102 ≈ 69.6) to do so. I have now introduced a means by which, if I know the mean and SD of a data set and know that it is normally distributed, I can make predictions. This knowledge is the basis of what are often called parametric statistics, which are procedures that assume data are distributed in a particular way and share common characteristics.
9.5 Deviations from the Normal Distribution
  • 47. Sometimes we find that the data we have collected do not fit the normal distribution. The best way to get a rough idea of whether your data fit the distribution is to plot a frequency histogram. Some deviations have a particular shape and are given special names. A distribution is called negatively skewed when the mean lies to the left of the median (as you look at it) and positively skewed when the mean lies to the right of the median (figure 7). Skewed sets of data tend to occur when some values are much lower or much greater than the rest; as a result the frequency histogram is not symmetrical. In these distributions, the greater the difference between the mean and the median, the greater the skew. It is also possible, however, to have symmetrical distributions that do not conform to the normal distribution. The most common are the random distribution and the regular, or under-dispersed, distribution, examples of which are given below (figure 7).
Figure 7: Symmetrical and skewed distributions
9.6 Random and Clumped Distributions
Sets of data in which the variance is approximately equal to the mean are referred to as randomly distributed, and they tend to be uncommon. An example of a random distribution might be the number of incidences of certain diseases in geographically defined areas. True randomness is relatively uncommon, and the geographical distribution of many disease phenomena tends to be over-dispersed or clumped: in disease outbreaks, for example, we recognize that certain areas have a high incidence of a particular disease. When we call a phenomenon random we are saying that every occurrence (e.g. an incidence of cystic fibrosis) is unrelated to any other occurrence. When the distribution is clumped, this suggests that the episodes are related (for instance, in the case of a contagious disease, or a disease that is triggered by
  • 48. some ecological factor), and such data will tend to show a strong positive skew (the mean lies to the right of the median). The last distribution to be aware of is the regular distribution, which is in effect an extreme form of the normal distribution in which the SD is small relative to the mean; that is, there is very little spread in the data set. An example might be records of the numbers of fingers and toes within the population: individuals with fewer than 8 fingers and 10 toes are unusual, and thus the distribution would be regular. If the peak of the curve is flattened, or if a distribution is shaped like those shown below in Figure 8, it is said to demonstrate kurtosis.
Figure 8: Different degrees of Kurtosis in frequency distribution
It is important to differentiate between random and clumped distributions, since the manner in which data are distributed tells us about the basic properties of what we are studying and, as we have seen here, is very pertinent to epidemiological studies of the spread and distribution of diseases. We also need to know how data are distributed before we embark on many statistical tests.
10 ERRORS AND ANOVA
An ANOVA is an analysis of the variation present in an experiment, and is described as a powerful and robust technique. It is a test of the hypothesis that the variation in an experiment is no greater than that due to the normal variation of individuals' characteristics and error in their measurement. The following criteria must be met before doing an ANOVA test:
1. The data of each treatment group are derived from a normal distribution.
2. The data were measured on an interval/ratio scale.
3. The variances of the groups are not significantly different from each other (there are ways round this one).
4. The sample groups are measured independently of each other.
10.1 Statistical Errors
  • 49. There are several types of ANOVA, and they have evolved to deal with a certain type of statistical error. Say you have a control group and two different levels of a treatment. In that case you can't use a t test and must use a test that belongs to the ANOVA family (ANOVA is shorthand for analysis of variance). A Type 1 error occurs when the null hypothesis is rejected although it is true; a Type 2 error occurs when a false null hypothesis is accepted, that is, not rejected when it should be. The two errors pull in opposite directions: as we decrease the probability of a Type 1 error, the probability of a Type 2 error increases. Generally we tend to choose tests that reduce the possibility of a Type 1 error, so a cautious approach is adopted; for instance, in many medical studies the significance level is set at P = 0.01.
10.2 The t-Tests, Errors and ANOVAs
When we have more than two groups we need an ANOVA. Imagine we are doing a study where we have a control group (C) and two treatment groups (T1 and T2) and we want to find out whether their means are significantly different. If we use a t test then we must test:
- C against T1.
- T1 against T2.
- T2 against C.
10.3 The ANOVA (Analysis of Variance)
Perhaps this is not too much trouble if you are using a computer or even a calculator, but if you had five treatment groups you would need to do ten tests. Even if you are prepared to stand the boredom, and manage not to make any calculation errors, you risk committing a statistical error. This is because, if you set your significance level at the normally accepted value of P = 0.05 (5 per cent), then on average once in every twenty tests you will get it wrong and commit a Type 1 error. But if, as in the case above with five treatment groups, you perform ten t tests, the chance of at least one of them being wrong goes up to roughly one in two (that is,
  • 50. 0.05 × 10 = 0.5; strictly this is an upper bound, and for ten independent tests the chance is 1 - 0.95^10 ≈ 0.40). So we need a way round this problem, and the solution is to use an ANOVA. ANOVA allows us to compare the means of several treatment groups at the same time without having to worry about adjusting P values or increasing the chance of Type 2 errors. It does this by comparing all the treatment groups in a single test. As you can imagine, the number of calculations needed to perform an ANOVA is quite large. However, with the advent of computers the use of ANOVA has become much more common and many more ANOVA-type tests have been designed. In this chapter we will look at two types of ANOVA. In general it is better to use a computer, as computers make fewer errors than humans. It is suggested that you focus on the structure of the tests and on interpreting the output. The type of ANOVA that we will describe is the one-way analysis of variance. Computer statistics packages such as SPSS, Minitab and Microsoft Excel can all help to analyze data using the one-way ANOVA described here.
10.3.1 When to Use a One-Way ANOVA
- We are comparing the difference between more than two sample groups.
- The data in each of our groups are normally distributed.
- The data are measured on an interval scale.
- Each case is measured independently.
How does it work?
First, here are some data. The data set is smaller than would normally be used for ANOVA, but we will use it to help us examine the test. The data in Table 12 are from a study to examine whether the pre-natal fitness level of primiparous women significantly affects the duration of labor. The ANOVA test looks at the sources of variation in the overall data set and tries to apportion the variation to different aspects of the data. Once the variation has been allocated it is possible to see whether the differences between the sample groups are significant. The
  • 51. sources of variation in the data are the variability that occurs within a sample group and the variability that occurs between the groups (Table 13). We can say that ANOVA seeks to determine how much of the variation in a data set can be attributed to error and how much can be attributed to the factor or treatment under study. We are interested in the between-group variation, that is, the variation that has occurred because of the fitness level. The rest of the variation, that within the groups, we regard as error. The variability between groups will reflect the error that occurs within the groups plus any additional variability caused by the treatment (in this case, fitness level).
Total variability = Variability between the groups + Variability within the groups
If there is no difference between the groups, that is, if the null hypothesis is correct, we would expect there to be just as much variation between the groups as there is within the groups. If the between-group variation is significantly greater than the within-group variation, we conclude that the treatment has had an effect; this is the simple logic behind the ANOVA test.
Table 12: Duration of labor in Primip women aged between Twenty two at three different levels of fitness
Fitness level 1     Fitness level 2     Fitness level 3
20                  34                  16
32                  12                  15
14                  23                  22
15                  10                  10
Level 1, Low; Level 2, Medium; Level 3, High
Table 13: Variation within and between groups
Fitness level 1     Fitness level 2     Fitness level 3
20                  34                  16
32                  12                  15
14                  23                  22
15                  10                  10
Variability within a group: the spread of the values down each column. Between-group variability: the differences across the columns.
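The identity "Total variability = Variability between the groups + Variability within the groups" can be checked numerically. The following is a minimal sketch, assuming that variability is measured as a sum of squared deviations from the relevant mean and that the flattened values of Table 12 are read row by row into the three fitness levels. The function names are hypothetical.

```python
# Table 12 values, assigned to groups by reading the table row by row
# (an assumption about the flattened layout).
groups = {
    "Fitness level 1": [20, 32, 14, 15],
    "Fitness level 2": [34, 12, 23, 10],
    "Fitness level 3": [16, 15, 22, 10],
}

def mean(values):
    return sum(values) / len(values)

def partition_variability(groups):
    """Return (total, between, within) sums of squared deviations."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    # Total: every value compared with the grand mean.
    total = sum((x - grand_mean) ** 2 for x in all_values)
    # Between: each group mean compared with the grand mean,
    # weighted by the number of cases in the group.
    between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within: every value compared with its own group mean.
    within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return total, between, within

total, between, within = partition_variability(groups)
print(f"total={total:.2f} between={between:.2f} within={within:.2f}")
```

Whatever the data, the between and within figures printed by the sketch add up to the total, which is what allows the variation to be apportioned between treatment and error.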
  • 52. The test statistic produced by the ANOVA is F, a statistic we have seen before, and the measure of variation we use is the variance; hence the name of the test, the analysis of variance. If we compute the between-group variance and divide it by the within-group variance, F will be close to 1 if the null hypothesis is correct. If F is significantly greater than 1 we conclude that the means are significantly different and that the level of fitness (the treatment) had an effect. The procedure for calculating the ANOVA by hand is long-winded. It is probably worth doing by hand once or twice, as that will help you grasp how the procedure works and how ANOVAs are presented.
Table 14: Distance walked (m) by patients with impaired hip mobility, following various treatment regimes
                      Old frame           New frame           New frame
                      Exercise level 1    Exercise level 1    Exercise level 2
                      16.1                22.3                13.2
                      17.7                20.5                20.8
                      20.6                21.3                22.2
                      10.4                26.7                16.3
                      20.3                16.3                13.7
                      14.9                29.0                11.9
                      11.5                24.4                14.1
                      14.7                23.7                10.6
                      15.3                23.5                15.8
                      17.4                19.5                15.9
Mean                  15.89               22.72               15.46
Standard deviation    3.32                3.63                3.67
In this example Anna Fimbo, a physiotherapist, is interested in the impact of a new walking frame on her clients with impaired hip mobility. She has decided to test the new frame at two levels of exercise and to use her old frame with the normal level of
  • 53. exercise as a control. Anna uses the distance the patient can walk unassisted as a measure of the effectiveness of the treatments (see Table 14 above). Again, we will assume that the data are normally distributed; remember that it would be normal to plot the data to look for any odd results and to get a 'feel' for your results. Place the data into a table and, using a scientific calculator (if you have one), calculate the mean, the standard deviation, the variance, the sum of the cases and the sum of the squares of the cases (Table 15). Now we need to make sure that the variances of our sample groups are not significantly different (see criterion 3).
Table 15: Statistical summary of the data from Table 14
                            Group 1 (GP1)       Group 2 (GP2)       Group 3 (GP3)
                            Old frame           New frame           New frame
                                                Exercise Level 1    Exercise Level 2
                            12.1                22.3                13.2
                            15.7                20.5                20.8
                            18.6                21.3                22.2
                            9.4                 26.7                16.3
                            18.3                16.3                13.7
                            12.9                29.0                11.9
                            9.5                 24.4                14.1
                            12.7                23.7                10.6
                            13.3                23.5                15.8
                            15.4                19.5                15.9
Sample number               10                  10                  10
Mean                        13.79               22.72               15.46
Standard deviation (SD)     3.21                3.63                3.67
SD squared (variance)       10.27               13.15               13.45
Sum of samples              137.9               227.2               154.6
(Sum of samples) squared    19,016.4            51,619.8            23,890.8
  • 54. Sum of squares of samples   1,994.11            5,280.36            2,762.4
Total number of samples, n_total = 30
Sum of all samples, Σx = 519.7
Sum of squares of all samples, Σx² = 10,036.9
To do this, select the largest and smallest variances and perform an F test. There is no significant difference in the variances, so we can proceed with the test. The ANOVA test uses the sums of squares as a measure of variation.
Calculations
Step 1: Calculate a correction factor (CF). This makes the calculations a little quicker:
CF = (Σx)² / n_total = 519.7² / 30 = 9,002.94
Step 2: Calculate the sums of squares (SS) for the whole sample:
SStotal = Σx² - CF = 10,036.9 - 9,002.94 ≈ 1,034
Step 3: Calculate the between-groups sums of squares, by squaring the sum of each group, dividing by the number of samples in that group, adding the results and subtracting CF:
SSbetween = 449.67
Step 4: Calculate the within-group sums of squares. A short cut can be used here because we know the between-group sums of squares and the total, and we know that the between-groups and within-groups sums of squares must add up to the total. So:
SStotal = SSbetween + SSwithin
SSwithin = SStotal - SSbetween
584.33 = 1,034 - 449.67
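Steps 1 to 4 can be retraced with a few lines of code. This is a sketch only: it assumes the usual correction-factor form of the calculation, takes the group sums, sample numbers and overall sum of squares from the summary figures above, and uses a hypothetical function name. Because it squares the group sums without intermediate rounding, its between-group and within-group figures may differ slightly from the printed values.

```python
# Summary figures copied from Table 15 and the totals listed with it.
group_sums = [137.9, 227.2, 154.6]   # sum of samples for GP1, GP2, GP3
group_sizes = [10, 10, 10]           # sample number in each group
sum_of_squares = 10_036.9            # sum of squares of all samples

def anova_sums_of_squares(group_sums, group_sizes, sum_of_squares):
    """Return (CF, SS_total, SS_between, SS_within) from summary figures."""
    n_total = sum(group_sizes)
    grand_sum = sum(group_sums)
    # Step 1: correction factor.
    cf = grand_sum ** 2 / n_total
    # Step 2: total sum of squares.
    ss_total = sum_of_squares - cf
    # Step 3: between-groups sum of squares.
    ss_between = sum(s ** 2 / n for s, n in zip(group_sums, group_sizes)) - cf
    # Step 4: within-groups sum of squares, by subtraction.
    ss_within = ss_total - ss_between
    return cf, ss_total, ss_between, ss_within

cf, ss_total, ss_between, ss_within = anova_sums_of_squares(
    group_sums, group_sizes, sum_of_squares
)
print(f"CF={cf:.2f} SS_total={ss_total:.2f}")
print(f"SS_between={ss_between:.2f} SS_within={ss_within:.2f}")
```

Taking the within-group figure by subtraction mirrors the short cut in Step 4; computing it directly from the raw data would give an independent check.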
  • 55. If you are forced to do ANOVAs by hand it's probably best to calculate both the within-group and the between-group sums of squares longhand. This will allow you to check your maths.
Step 5: Determine the degrees of freedom (d.f.) for both the within-group and the between-group sums of squares using the following rules.
d.f. for SSbetween = number of groups - 1 (in this example 3 - 1 = 2)
d.f. for SSwithin = total number of cases - number of groups (in this example 30 - 3 = 27)
d.f. for SStotal = d.f. for SSbetween + d.f. for SSwithin (in this example 2 + 27 = 29)
Step 6: Calculate the variances for both the between- and the within-group sums of squares:
Variance between groups = SSbetween / d.f. for SSbetween = 449.67 / 2 = 224.83
Variance within groups = SSwithin / d.f. for SSwithin = 584.33 / 27 = 21.64
Step 7: Calculate F:
F = Variance between groups / Variance within groups = 224.83 / 21.64 ≈ 10.38
Step 8: It is normal for the results from an ANOVA to be put in a table laid out in a standard format like Table 16. The results of ANOVAs performed using statistical packages are often presented in such tables. An alternative would be Table 17.
Table 16:
Source of variation   Sum of squares   d.f.   Variance   F
Between groups        449.67           2      224.83     10.38
Within groups         584.33           27     21.64
Total                 1,034            29
Table 17:
Source of variation   Sum of squares   d.f.   Variance   F
Between groups        449.67           2      224.83     10.38
Error                 584.33           27     21.64
  • 56. Total                 1,034            29
Step 9: Look up the value of F in the appropriate statistical table. Note that the variance between groups should always be on top, and larger than the within-group variance. If the between-group variance is less than the within-group variance the null hypothesis is automatically accepted. The value of 10.38 is significant at the P < 0.01 level, so we can reject the null hypothesis and say that the difference between the groups is significant. We would express this result by saying that there was a significant difference between the three treatment groups (ANOVA F2,27 = 10.38, n = 30, P < 0.01). Unlike the t test, we give the degrees of freedom for both the between-group and the within-group variances. They are given as a subscript to the F statistic.
10.4 Contrasting the Means
You may have noted that there is a slight problem with the ANOVA: whilst we can say that there is a significant difference between the sample groups, we can't say which groups are different from each other and which are not.
Table 18: Comparing Means after an ANOVA test
Group means        Group 1: 13.79   Group 2: 22.72   Group 3: 15.46
Group 1: 13.79                      8.93             1.67
Group 2: 22.72                                       7.26
Group 3: 15.46
Thus in the first example we do not know whether both exercise regimes are different from the control, or whether they are different from each other, etc. Fortunately we can do follow-up tests that allow us to determine which sample groups are significantly different from each other. For those using computer packages there is a range of these follow-up test options with an assortment of names. The only one to avoid is the least significant difference test, as you will make the same error as if you did multiple t tests. The most conservative (tends towards a Type 2 error) is Scheffé's test; the least conservative (tends towards a Type 1 error) is Duncan's multiple range test (Kerr, Hall