Validity! We need to find out if our research is sound. Do our tests measure what they claim to measure?
Are the techniques used to collect data in tests, questionnaires, interviews and observations measuring what is claimed? For example, was the Strange Situation really measuring attachment style?
We need to be able to measure or observe something time after time and produce the same or similar results.
I want to measure intelligence. If the same person sits the test on several occasions and the results change each time, then that test lacks reliability.
The test also arguably lacks validity because the scores are meaningless.
If I test my participants again several months later and their scores remain consistent, I can say the test is reliable, but it might still lack validity.
Is an A level in Psychology a valid and reliable assessment of your performance in Psychology?
This measures consistency from one occasion to another – the same result should be found on different days, in different labs, observations or interviews, and by different researchers.
I exposed these teenage brain cells to 1000 PowerPoint slides last Monday and they’re all dead.
I thought that was a fluke, but they seem to be shrivelling after only five minutes!
Participants take the same test on different occasions – a high correlation between test scores indicates the test has good external reliability.
Timing is crucial. Why?
January / June
I hope that’s the right answer this time.
This refers to the consistency of a researcher’s behaviour. 
A researcher should produce similar test results, or make similar observations or 
carry out interviews in the same way on more than one occasion. 
Thanks for taking part today. Any problems and I’ll be right over. Take your time.
Right. Let’s get on. Fast as you can. How much longer before I can get in the pub and relax my facial muscles?
In observational studies this is known as inter-observer reliability – observers have to agree on what they see and carry out the same procedure.
Consistency between different researchers working on the same study is very important for reliability.
1. Increase reliability by standardising instructions.
2. Carry out a pilot study to improve procedures and materials.
3. You will be thoroughly trained in the use of materials and procedures prior to our study taking place.
This measures the extent to which a test or procedure is consistent within itself, i.e., questionnaire items or questions in an interview should all be measuring the same thing.
Do you like to keep to deadlines? 
Do you get impatient driving? 
Do you like cheese? 
Do you like doing several tasks at once? 
Do you like chocolate? 
Do you get easily irritated? 
Are you competitive? 
This interviewer seems a little confused about Type A personality traits.
Odds/Evens or Top/Bottom
Compares a participant’s performance on two halves of a test or questionnaire – 
there should be a close correlation between scores on both halves of the test. 
Questions in both halves should be of equal quality for good internal reliability.
Would you see this as bullying or horseplay in the playground?
You would see this from your own subjective viewpoint – we’re biased by experience and expectation.
Observers must agree about what they are observing – they need to use standardised behavioural categories.
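Inter-observer reliability can be checked by having two observers code the same events independently with the agreed behavioural categories and then working out how often they agree. A minimal Python sketch, assuming ten hypothetical playground observation intervals, three hypothetical categories (`bullying`, `horseplay`, `neither`) and simple percentage agreement; the 80% figure in the comment is an assumed rule of thumb, not a fixed standard.

```python
def percentage_agreement(observer_a, observer_b):
    """Percentage of intervals where both observers recorded the same category."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("Observers must code the same, non-empty set of intervals")
    matches = sum(1 for a, b in zip(observer_a, observer_b) if a == b)
    return 100 * matches / len(observer_a)


# Hypothetical standardised behavioural categories.
CATEGORIES = {"bullying", "horseplay", "neither"}

# Hypothetical codes for the same ten intervals from two observers.
observer_a = ["horseplay", "bullying", "neither", "horseplay", "neither",
              "bullying", "horseplay", "neither", "neither", "horseplay"]
observer_b = ["horseplay", "bullying", "neither", "bullying", "neither",
              "bullying", "horseplay", "neither", "neither", "horseplay"]

# Every code must come from the agreed category list.
assert set(observer_a) <= CATEGORIES and set(observer_b) <= CATEGORIES

agreement = percentage_agreement(observer_a, observer_b)
# Assumed rule of thumb: around 80% agreement or more is often treated as acceptable.
print(f"agreement = {agreement:.0f}%")
```

The one disagreement (horseplay versus bullying in the fourth interval) is exactly the kind of subjective judgement that clearer category definitions are meant to reduce.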
Measuring Reliability
Match the method of estimating reliability to the description.
Methods:
• Test-Retest reliability
• Split Half Reliability
• Inter-Rater reliability
Descriptions:
• If the measure depends upon interpretation of behaviour, we can compare the results from two or more raters.
• Splitting a test into two halves, and comparing the scores in both halves.
• The measure is administered to the same group of people twice.
• If the results in the two halves are similar, we can assume the test is reliable.
• If the results on the two tests are similar, we can assume the test is reliable.
• If there is high agreement between the raters, the measure is reliable.
Internal validity = the tool is measuring what it is intending to measure
External validity = the findings can be generalised beyond the context of the research situation
Does our measuring tool appear to be doing what it should?
Face validity: one or more judges assess whether the test seems appropriate and suggest changes if necessary.
Does the content of a test cover everything in the area of interest?
Content validity: more rigorous – experts in the field systematically examine the tool’s components and compare them with set standards. They have to agree the content is appropriate.
Improving internal validity
• Single blind procedure – reduces demand characteristics
• Double blind procedure ….
Population Validity
Can we generalise findings from our research participants to other population groups?
Ecological Validity
Can we apply our findings to other contexts and situations outside of the research setting?
Improving external validity
• Sample must be representative of target population and be unbiased…..
• Research situation must reflect real-life situations, e.g. debate over Milgram….Strange Situation

A" Research Methods Reliability and validity
