Introduction to Statistics
We don’t have to actually do statistics ourselves, but we need
some understanding of what they are telling you.
The whole point is to be able to tell if research is
usable in practice
I know what that word means
Why they are doing it
Don’t have to understand the workings out!!!!
Not highfalutin mathematics
Why they have done what they have done.
The mathematics of the collection, organization,
and interpretation of numerical data, especially
the analysis of population characteristics by
inference from sampling.
about a population
a sample of the population
use numbers to help us in what we are
Statistics is the study of large numbers, such as
those produced by government departments,
with the aim of extracting some approximate
truth from them.
APPROXIMATE TRUTH – or as near as we can
Numerical values such as mean, median, and
mode which describe the chief features of a group
of scores, without regard to a larger population.
Do not necessarily infer from the results
Inferential statistics do not just describe numbers,
they infer causes. We use them to draw
inferences (informed guesses) about situations
where we have only gathered part of the
information that exists.
The part of the information is called a sample.
The whole body of information from which it
is taken is called the population. In a basic
statistical test we would have two samples and
would try to establish if they are significantly different.
One lot has something so you infer that it may be
applied to the whole population
2 samples – are they the same or different?
Then hope to infer that it applies to
the whole population
Looking at the two types and seeing what goes on in
Two types of summaries of this……
frequency tables, histograms, pie charts,
graphs, bar charts.
mean, median, mode, variance, range,
How many represented in each bar
You can see immediately what the
You can use a bar chart
(nominal (= name) data)
Like a bar chart
i.e. patient satisfaction
Make it easier
Not categories but different levels…..
Represent a %
Want to present it as a picture
Descriptive stats presented in the best
You get a whole list of data………..
Then the number of times it appeared in
If you are dealing with a thousand scores
You can do a lot more with it…
You can see where the majority of scores
Frequency Table: Patient satisfaction
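A minimal sketch of how a frequency table is built from a raw list of responses, assuming Python. The `responses` list is made up for illustration and the `frequency_table` helper is hypothetical; neither comes from the lecture.

```python
from collections import Counter

# Hypothetical patient-satisfaction responses (illustrative only, not real data).
responses = ["Good", "Fair", "Good", "Bad", "Good",
             "Fair", "Good", "Bad", "Fair", "Good"]

def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(category, count, 100 * count / total)
            for category, count in counts.most_common()]

table = frequency_table(responses)
for category, count, percent in table:
    print(f"{category:<6}{count:>4}{percent:>7.1f}%")
```

Each row gives the category, how many times it appeared, and the share of all responses, which is the same information a bar chart or pie chart would show as a picture.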
Representation of averages
For the average you usually add up
the ages and divide by the
But in research it is different
Use the term………..
To look at the middle of the
1. Mean (x̄, x with a bar over it): The
arithmetic average. Indicates the
frequency of occurrence and range (i.e.
the sum of scores divided by the total
number of scores).
•Problem: It can be distorted by
abnormally high or low scores.
You can skew your results….
2. Median: The middle point in a set of
data once the scores are put in order (i.e. half
the scores lie below it and half above).
Indication of the middle of the data…
A few very high or low scores barely move it….
Won’t tell you what most people scored
•Problem: Shows where the middle of the data lies,
but is largely unaffected by abnormally high or low
scores, so it gives no indication of the range. Gives no
indication of frequency of occurrence
3. Mode: The most frequently occurring
score in a set of data.
•Problem: Gives no indication of
highest/lowest scores i.e. range of data
So you need some combination
The mean is used the most…..
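A small sketch of the three averages, assuming Python's standard `statistics` module. The `scores` list is made up for illustration.

```python
import statistics

# Hypothetical scores (illustrative only).
scores = [12, 12, 13, 14, 19]

mean = statistics.mean(scores)      # sum of scores / number of scores
median = statistics.median(scores)  # middle value once the scores are sorted
mode = statistics.mode(scores)      # most frequently occurring score

print(f"mean={mean}, median={median}, mode={mode}")
```

The three figures differ even for this tiny list, which is why a combination of them usually says more than any one alone.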
Pic 1 = normal distribution
When the particular attribute is distributed
so that mean, mode and median are all in the same place
The “perfect world” ….
If you have a normal distribution curve you
can do lots of statistics
Pic 2 = positive skew….
Most people are scoring about 20 and only a
few scoring 100
(early mode… mean towards the upper end…
median is in the middle…)
Pic 3 = negative skew
Most people scoring high
(low mean… high mode… median still in the middle)
Calculate the mean
Middle of the range
So you want the score between 12 and 13 =
Mode is 12.
So they are fairly near each other
Adding in 2 × £50k
Mean = 19k
Median = 13k
Mode = 12
It increases the mean and median
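A sketch of the same effect in Python. The lecture's actual data set is not given, so the `salaries` list below is hypothetical: it was chosen only so that the summary figures line up with the ones quoted in the notes (median between 12 and 13, mode 12, then mean 19k and median 13k once two £50k salaries are added). The `summarise` helper is also an assumption.

```python
import statistics

# Hypothetical salaries in £k, chosen only so the summaries match the notes;
# the lecture's real data is not available.
salaries = [10, 11, 12, 12, 12, 13, 13, 14, 15, 16]
with_outliers = salaries + [50, 50]  # add two £50k salaries

def summarise(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))

before = summarise(salaries)
after = summarise(with_outliers)
print("before:", before)
print("after: ", after)
```

The two extra salaries pull the mean up a long way, nudge the median only slightly, and leave the mode alone.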
Being aware of where you think it is very
obvious but it may not be
Need to know more of what the mean is
We use some other form of measurement
Measurement of distribution….
How far were the other scores distributed
around the mean…
Measurement of distribution
Range: describes the difference
between the highest & lowest score.
Highlights extreme values only
Either between bottom and top
or the difference in the score…
Quartiles: A way of dividing the ordered data
into 4 parts. Each part will contain
25% of the data.
Inter-quartile range: The distance
between the lower and upper quartile in
the distribution curve, representing the middle 50%
of the data
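A short sketch of range, quartiles and inter-quartile range, assuming Python's `statistics.quantiles` (Python 3.8 or later). The `scores` list is made up. Different packages use slightly different rules for placing the quartile cut points, so other software may give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical scores (illustrative only).
scores = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 18]

value_range = max(scores) - min(scores)         # highest minus lowest
q1, q2, q3 = statistics.quantiles(scores, n=4)  # three cut points -> four parts
iqr = q3 - q1                                   # spread of the middle 50%

print(f"range={value_range}, Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```

The range reacts to the single highest and lowest scores, whereas the inter-quartile range only describes where the middle half of the data lies.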
Tells you where half the data lies…
Can plot your data and see where you
i.e. centile charts…
Mean weight where you are expected….
Do you fall within or without
Standard Deviation (SD)
Most often used after mean…
X bar =….
A measure of how the scores deviate from the
mean. This is the most common way of showing
Are they bunched or not….
Mean + SD = accurate description of data
Probably the most accurate…
In a normal distribution:
One SD above the mean + one SD below the
mean includes approx 68% of the scores
There is a formula but we don’t need to work it
Works it out from the mean and the numbers…..
Can show you where they are distributed and where
Two SD above the mean + two SD below the
mean includes approx 95% of the scores
i.e. mean = 100
SD = 10
So 1 SD below = 90
And 1 SD above = 110
Therefore approx 68% lie between 90 and 110
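The worked figures above (mean 100, SD 10) can be checked with a short sketch, assuming Python's `statistics.NormalDist` (Python 3.8 or later). The `share_within` helper is hypothetical, and the calculation only holds for a normal distribution.

```python
from statistics import NormalDist

# Figures from the notes: mean = 100, SD = 10.
mean, sd = 100, 10
dist = NormalDist(mu=mean, sigma=sd)

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

one_sd_band = (mean - sd, mean + sd)          # 90 to 110
two_sd_band = (mean - 2 * sd, mean + 2 * sd)  # 80 to 120

print(one_sd_band, f"{share_within(1):.1%}")
print(two_sd_band, f"{share_within(2):.1%}")
```

The percentages depend only on how many SDs you go out from the mean, not on the particular mean or SD, which is why the 68% and 95% figures are always the same.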
See the picture…
You don’t have to worry too much how they get it….
They have done the calculations and have come up
with a figure for what the standard deviation is….
But the percentages are always the same
68% between 1
Helps understanding of a set of data
Applies to the sample that you’ve got
Want to leap from what you’ve got to
The wider population……
So will they share the same thing?
So we talk about……….
Confidence Intervals (CI)
You sometimes get given a CI
A confidence interval is a range or
interval of values. In a normal distribution
95% of the area falls between + or - 1.96
SD from the mean. 95% of the time, the
population mean will fall within +/- 1.96
standard errors of a sample mean (standard
error = SD divided by the square root of the
sample size). It is this range
that is called a confidence interval.
The narrower the CI, the more certain
you can be about the true result.
Small is good!!!!!!!!!!!!!!!!!!!!
Wide is ……. Uncertain…..!!!!
May not be translatable to the wider
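A rough sketch of a 95% confidence interval for a mean, assuming Python. When the interval is built around a sample mean, the spread used is the standard error (the sample SD divided by the square root of the sample size). The `sample` values and the `confidence_interval_95` helper are hypothetical, and for a sample this small a t-based multiplier would normally replace 1.96.

```python
import math
import statistics

# Hypothetical sample of scores (illustrative only).
sample = [96, 104, 99, 110, 91, 102, 98, 107, 95, 108]

def confidence_interval_95(values):
    """Approximate 95% CI for the mean: mean +/- 1.96 * standard error."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    margin = 1.96 * standard_error
    return mean - margin, mean + margin

low, high = confidence_interval_95(sample)
print(f"mean = {statistics.mean(sample)}, 95% CI = ({low:.1f}, {high:.1f})")
```

Because the standard error shrinks as the sample grows, a bigger sample with the same spread gives a narrower interval, i.e. more certainty about the true result.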
Levels of measurement
Depending on what level you’ve got you
can use specific tests….
They just exist..
Not able to be put in order
Are or are not…..
Nominal: Non interval. Indicates
‘sameness’ or ‘differences.’
Male/female: blood groups: categories
You can do the least with it…
You can do some things but not much
Ordinal: Scale where size of
intervals is not known or not equal.
Measures ‘more’ or ‘less’
Good, Fair, Bad. Staff grades etc
You can put them in order but you can’t be
There is an order but not a measurable
Interval: Scale of equal intervals. No fixed zero.
Temperature (ºC) & (ºF)
You can do a lot more with it now…..
You have a scale but there is no fixed zero…
but there is a relationship.
Ratio: Scale of equal intervals with a fixed zero.
As interval but with a zero….
Look at the type of data we are looking
Like modes and medians
Or more complicated stuff……
Inferring from the data …..
Parametric Tests: Powerful tests used on
large, homogeneous, random, normally distributed samples
(We like them….)
Interval & Ratio level data
e.g. t-test; ANOVA; Pearson’s r (correlation)
We are not expected to know what each test is
we need to know
is it the right test…?
Has it been used appropriately?
What kind of test?
What kind of data…?
Non-parametric Tests: Used on small, random,
non-normally distributed samples
Nominal & Ordinal level data
eg. Chi square; Spearman Rank order
(correlation); Kendall Tau; Mann-Whitney U
For SKEWED samples….
Don’t have to work them out… expected to know
what they are and what they are used for……
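The "what kind of data, what kind of test?" check can be sketched as a simple lookup, assuming Python. The `suggest_test_family` helper is hypothetical and only mirrors the rule of thumb in these notes; it is not a complete rule book, and a real choice also depends on sample size and study design.

```python
# Hypothetical helper: mirrors the notes' rule of thumb, nothing more.
PARAMETRIC_LEVELS = {"interval", "ratio"}
NON_PARAMETRIC_LEVELS = {"nominal", "ordinal"}

def suggest_test_family(level, normally_distributed):
    """Suggest 'parametric' or 'non-parametric' from the level of measurement."""
    level = level.lower()
    if level in NON_PARAMETRIC_LEVELS:
        return "non-parametric"
    if level in PARAMETRIC_LEVELS:
        # Interval/ratio data still needs a roughly normal distribution.
        return "parametric" if normally_distributed else "non-parametric"
    raise ValueError(f"unknown level of measurement: {level}")

print(suggest_test_family("ratio", normally_distributed=True))
print(suggest_test_family("ordinal", normally_distributed=True))
print(suggest_test_family("interval", normally_distributed=False))
```

When reading a paper, the same question applies in reverse: given the data they collected, does the test they used belong to the right family?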
What is the probability of it happening by chance?
Was it a fluke?
Did it happen because it is a real
This is the likelihood that the event, situation or
pattern of numbers has occurred by chance.
Was it to do with the IV (independent variable) or just chance…
Is it applicable to the whole population…?
Is it generalisable?
Is it SIGNIFICANT…?
This may be expressed as a ratio; percentage or
50:50; 50%; 0.5
0.5 is the probability of tossing a coin and getting
Need a high degree of certainty in the
If we want to be sure that the research
then we want to have a higher
Statistical tests are needed to
determine how safe it is to attribute any
difference obtained to a real difference in
the phenomena as opposed to being due
to chance. The level of significance is
normally set at p = 0.01 or 0.05
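One way to see what "due to chance" means is a shuffle (permutation) test, sketched here in Python. This is an illustration, not the test the lecture has in mind: the two samples and the `permutation_p_value` helper are made up. The idea is to mix the two groups together, deal them out again at random many times, and count how often a difference as large as the observed one turns up.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate how often a difference in means this large arises by chance."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_a = pooled[:len(group_a)]
        shuffled_b = pooled[len(group_a):]
        diff = abs(statistics.mean(shuffled_a) - statistics.mean(shuffled_b))
        if diff >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical scores for two samples (illustrative only).
treated = [14, 15, 15, 16, 17, 18, 18, 19]
control = [10, 11, 12, 12, 13, 13, 14, 15]
p = permutation_p_value(treated, control)
print(f"p = {p:.4f}")
```

A small p here says that random re-dealing rarely produces a gap this big, so chance alone is an unlikely explanation for the difference between the two samples.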
Normally set 0.01 (one in 100)
Or very minimum which is
acceptable in research is 0.05
What level have they set the p
A significance of 0.05 is saying that 95
times out of 100 the result was not due to
BECAUSE it was a result of the IV not
A REAL finding
5/100 may be chance….
Still got to bear in mind 5% chance of
So you may want a higher confidence
And you may get an exact correlation..
p = 0.07 not significant
p = 0.000000001 significant.
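The comparison against the chosen level can be sketched in a couple of lines of Python. The `is_significant` helper is hypothetical. It assumes the common convention that a result counts as significant when p is below the level set in advance; some authors use "less than or equal to" instead.

```python
def is_significant(p_value, alpha=0.05):
    """True when the p-value is below the chosen significance level."""
    return p_value < alpha

# The two p-values quoted in the notes, judged at the 0.05 level.
print(is_significant(0.07))
print(is_significant(0.000000001))

# Hypothetical value: passes at 0.05 but not at the stricter 0.01 level.
print(is_significant(0.03), is_significant(0.03, alpha=0.01))
```

This is why it matters what level the researchers set: the same p-value can count as significant at 0.05 and not at 0.01.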
Sometimes then we want
0.0000000000000000001 levels of
Like going in a plane…..
“The result suggests that….”
Is there a relationship between x and y
Do a scatter diagram
Do a test
Is there a relationship?
The examination of relationships
between variables. Statistically this is
measured by the correlation coefficient, on
a scale of +1 to -1
+1 is perfect positive correlation
The more the more
0 is no association at all
-1 is perfect inverse correlation.
The more the less…
Absolute “1’s” not really possible…..
But it would be nice…
Over +/- 0.75 is acceptable in some cases
Inter-rater reliability
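A sketch of the correlation coefficient on the +1 to -1 scale, assuming Python and Pearson's r written out by hand so the steps are visible. The `pearson_r` helper and the paired values are made up, and the perfect +1 and -1 results only appear because the example data was built to fall exactly on a straight line.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together around their means.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical paired measurements (illustrative only).
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # the more, the more
print(pearson_r(x, [10, 8, 6, 4, 2]))  # the more, the less
print(pearson_r(x, [2, 1, 3, 1, 2]))   # no linear association
```

Real data almost never lands exactly on +1 or -1, so the figure is read as a matter of degree: the closer to either end, the stronger the relationship.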
Using a computer for
Statistical analysis can be conducted manually or
Advantages of computer software include:
Only having to input data once
You only need to know the tests you want to
conduct, not how to do them
No errors in calculation
Able to recalculate or do different tests, and
produce graphs, tables etc. once input is carried
SPSS, Minitab are both on UoP system
Not a test but a computer program..
Type it all in
And you get what happens….
Computer may not make a mistake
But you have to put the right things in…..
Still have to know which tests you have to do…
It will do what you ask it….
But you do need some different tests