In daily life, we often use "reliability" to denote trustworthiness, but in research, reliability and validity are different concepts. Reliability in data analysis signifies the consistency of replicating an outcome, while validity relates to the accuracy of a measurement. For example, if you measure a cup of rice multiple times with consistent results, that's reliability, while validity assesses how well the measurement accurately represents its intended quantity.
2. RELIABILITY AND VALIDITY
Reliability and validity are both about how well a method measures something:
● Reliability refers to the consistency of a measure (whether the results can be reproduced
under the same conditions).
● Validity refers to the accuracy of a measure (whether the results really do represent what
they are supposed to measure).
● For example, if you measure a cup of rice three times, and you get the same result each
time, that result is reliable. Validity, on the other hand, refers to the accuracy of the
measurement. This means that if the standard weight for a cup of rice is 5 grams, and you
measure a cup of rice, it should be 5 grams.
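The rice example can be put into numbers. The sketch below is only an illustration: the readings for the two scales are made up, and using the standard deviation for consistency and the gap from the reference value for accuracy is one simple choice among several.

```python
from statistics import mean, stdev

# The 5 g reference value is the figure used in the rice example above.
REFERENCE_GRAMS = 5.0

# Hypothetical repeated weighings of the same cup of rice, in grams.
scale_a = [5.1, 4.9, 5.0, 5.0, 5.1]  # consistent and close to the reference
scale_b = [6.0, 6.1, 5.9, 6.0, 6.0]  # consistent but off target


def summarize(readings, reference):
    """Return (spread, bias) for a list of repeated readings."""
    spread = stdev(readings)            # small spread -> consistent (reliable)
    bias = mean(readings) - reference   # small bias -> accurate (valid)
    return spread, bias


for name, readings in [("scale A", scale_a), ("scale B", scale_b)]:
    spread, bias = summarize(readings, REFERENCE_GRAMS)
    print(f"{name}: spread={spread:.2f} g, bias={bias:+.2f} g")
```

Both scales give tightly grouped readings, so both look reliable, but only scale A stays close to the reference value.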
3. VALIDITY
A measurement or test is valid when it correlates with the expected result. It examines the accuracy of
your result.
Here’s where things get tricky: to establish the validity of a test, the results must be consistent. Looking
at most experiments (especially physical measurements), the standard value that establishes the
accuracy of a measurement is the outcome of repeating the test to obtain a consistent result.
For example, before I can conclude that all 12-inch rulers are one foot, I must repeat the experiment
several times and obtain very similar results, indicating that 12-inch rulers are indeed one foot.
Most scientific experiments are inextricably linked in terms of validity and reliability. For example, if
you’re measuring distance or depth, valid answers are likely to be reliable.
4. But for social experiments, one is not an indication of the other. For example, many
people believe that people who wear glasses are smart.
Of course, I’ll find examples of people who wear glasses and have high IQs (reliability),
but the truth is that most people who wear glasses simply need their vision to be better
(validity).
So reliable answers aren’t always correct, but valid answers are always reliable.
6. Types of Validity
Content Validity
This refers to determining validity by evaluating what is being measured. So content validity tests if your research
is measuring everything it should to produce an accurate result.
For example, if I were to measure what causes hair loss in women, I’d have to consider things like postpartum hair
loss, alopecia, hair manipulation, dryness, and so on. By omitting any of these critical factors, I risk
significantly reducing the validity of my research because I won’t be covering everything necessary to make
an accurate deduction. For example, a certain woman is losing her hair due to postpartum hair loss,
excessive manipulation, and dryness, but in my research, I only look at postpartum hair loss. My research
will show that she has postpartum hair loss, which isn’t the full picture.
Yes, my conclusion is correct as far as it goes, but it does not fully account for the reasons why this woman is losing her
hair.
7. Criterion Validity
This measures how well your measurement correlates with an external variable (the criterion) that
you compare it against. The two main classes of criterion validity are predictive and concurrent.
Predictive validity
It helps predict future outcomes based on the data you have. For example, if a large number of
students performed exceptionally well in a test, you can use this to predict that they understood the
concept on which the test was based and will perform well in their exams.
Concurrent validity
Concurrent validity, on the other hand, involves testing with different variables at the same time.
For example, you could set up a literature test for your students on two different books and assess them at the same time.
You’re measuring your students’ literature proficiency with these two books. If your students truly
understood the subject, they should be able to correctly answer questions about both books.
8. Face Validity
Face validity is concerned with whether the method used for measurement will produce
accurate results rather than the measurement itself.
If the method used for measurement doesn’t appear to test the accuracy of a
measurement, its face validity is low.
Here’s an example: suppose I want to check the claim that less than 40% of men over the age
of 20 in Texas, USA, are at least 6 feet tall. The most logical approach would be to collect
height data from men over the age of 20 in Texas, USA.
However, asking men over the age of 20 what their favorite meal is to determine their
height is pretty bizarre. That method is quite questionable because it has no
connection to what I want to measure.
9. Construct-Related Validity
Construct-related validity assesses the accuracy of your research by collecting multiple
pieces of evidence. It helps determine the validity of your results by comparing them to
evidence that supports or refutes your measurement.
Convergent validity
If you’re assessing evidence that strongly correlates with the concept, that’s convergent
validity.
Discriminant validity
This examines the validity of your research by determining what not to base it on. You are
removing elements that are not a strong factor to help validate your research. Being a
vegan, for example, does not imply that you are allergic to meat.
12. TYPES OF VALIDITY IN ASSESSMENT
Criterion-Related Validity requires that a test demonstrates a direct statistical
relationship between test scores and job performance (like the one represented by the blue
dot in the image above).
Content-Related Validity requires the content of the test to encompass all the areas of the
characteristic it is measuring. That is to say; if you give your candidate a personality test,
its content should cover all areas such as Openness, Conscientiousness, Extraversion,
Agreeableness and Neuroticism.
Construct-Related Validity: Say you have been using an employment assessment for
quite some time and made many good hiring decisions. This is proof enough that the test is
valid. Simply put, if there is enough factual evidence that the test works, it must be valid.
13. Importance of Validity
Validity is a direct indication of the usefulness of a test. Remember, reliability tells you how
consistent the scores are, but validity gives meaning to these scores. Without validity, there is no way
of correlating your candidate's test scores to their job performance.
Interpretation of Validity
Similar to reliability, validity is interpreted through a validity coefficient. Typically, when you
consider a single test, a coefficient value greater than 0.35 is considered very beneficial.
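To see how such a coefficient might be obtained, here is a minimal sketch. It assumes the validity coefficient is a Pearson correlation between test scores and job-performance ratings, which the slides do not state outright. The data and the names `pearson_r`, `test_scores` and `performance` are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: pre-employment test scores and later performance ratings.
test_scores = [55, 62, 70, 74, 81, 88, 93]
performance = [3.1, 2.8, 3.6, 3.2, 4.0, 3.7, 4.4]

VALIDITY_BENCHMARK = 0.35  # benchmark quoted in the text above

validity_coefficient = pearson_r(test_scores, performance)
print(f"validity coefficient = {validity_coefficient:.2f}")
if validity_coefficient > VALIDITY_BENCHMARK:
    print("above the benchmark")
else:
    print("below the benchmark")
```

With real hiring data the coefficient would usually be much lower than in this tidy made-up sample, which is why a benchmark as modest as 0.35 is already treated as useful.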
14. RELIABILITY
When a measurement is consistent, it’s reliable. But of course, reliability doesn’t mean your
outcome will be exactly the same every time; it just means it will be in the same range.
For example, if you scored 95% on a test the first time and 96% the next, your
results are reliable. So, even if there is a minor difference in the outcomes, as long as it is
within the error margin, your results are reliable.
Reliability allows you to assess the degree of consistency in your results. So, if you’re
getting similar results, reliability provides an answer to the question of how similar your
results are.
16. TYPES OF RELIABILITY
1. Test-Retest Reliability: Consistency across scores over time.
2. Parallel Form Reliability: Consistency across scores for various forms of the test.
3. Inter-Rater Reliability: Consistency across scores when rated by different
individuals.
4. Internal Consistency Reliability: The extent to which items on a test measure the
same thing.
17. Test-retest Reliability
Testing reliability over time does not imply changing the amount of time it takes to conduct an
experiment; rather, it means repeating the experiment multiple times in a short time.
For example, if I measure the length of my hair today, and tomorrow, I’ll most likely get the same result
each time.
A short period is relative in terms of reliability; two days for measuring hair length is considered short. But
that’s far too long to test how quickly water dries on the sand.
A test-retest correlation is used to compare the consistency of your results. This is typically shown as a scatter plot
that shows how similar your values are between the two experiments.
If your answers are reliable, the points on the scatter plot will cluster tightly along a straight line, but if they
aren’t, the points (values) will be spread across the graph.
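The same comparison can be done numerically instead of by eye. The sketch below assumes the test-retest correlation is a Pearson correlation between the two sessions. The hair-length figures are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical hair lengths (cm) for six people, measured on two consecutive days.
day_1 = [12.0, 25.5, 8.3, 40.1, 18.7, 30.2]
day_2 = [12.1, 25.4, 8.5, 39.8, 18.9, 30.0]

test_retest_r = pearson_r(day_1, day_2)
largest_gap = max(abs(a - b) for a, b in zip(day_1, day_2))

print(f"test-retest correlation = {test_retest_r:.3f}")
print(f"largest day-to-day difference = {largest_gap:.1f} cm")
```

A correlation close to 1 corresponds to the tight line of points described above. The largest day-to-day difference is a quick check that the readings also agree in absolute terms, not just in rank order.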
18. Internal Consistency
It refers to the consistency of results for various items when measured on the same scale. This is particularly
important in social science research, such as surveys, because it helps determine the consistency of people’s
responses when asked the same questions.
Most introverts, for example, would say they enjoy spending time alone and having few friends. However, if
some introverts claim that they either do not want time alone or prefer to be surrounded by many friends, it
doesn’t add up. Either these people are not really introverts, or this factor isn’t a reliable way of measuring
introversion.
Internal reliability helps you prove the consistency of a test by varying factors. It’s a little tough to measure
quantitatively, but you could use the split-half correlation. The split-half correlation simply means dividing
the items used to measure the underlying construct into two halves and plotting them against each other in the
form of a scatter plot.
Introverts, for example, are assessed on their need for alone time as well as their desire to have as few friends
as possible. If this plot is dispersed, it is likely that one of the traits does not indicate introversion.
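A split-half check might look like the sketch below. The survey responses are made up, and splitting the items into odd and even positions is just one common way to form the two halves. The slides do not say how the split should be made.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical introversion survey: one row per respondent,
# four items each, answered on a 1-5 agreement scale.
responses = [
    [5, 4, 5, 4],
    [2, 1, 2, 2],
    [4, 4, 3, 4],
    [1, 2, 1, 1],
    [3, 3, 4, 3],
    [5, 5, 4, 5],
]

# Odd/even split: items 1 and 3 form one half, items 2 and 4 the other.
half_a = [row[0] + row[2] for row in responses]
half_b = [row[1] + row[3] for row in responses]

split_half_r = pearson_r(half_a, half_b)
print(f"split-half correlation = {split_half_r:.2f}")
```

If the two halves track each other closely, the items appear to measure the same construct. A low value would suggest at least one item is measuring something else.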
19. Inter-rater reliability
Inter-rater reliability assessment helps judge outcomes from the different perspectives
of multiple observers.
A good example is if you ordered a meal and found it delicious. You could be biased
in your judgment for several reasons: your perception of the meal, your mood, and so
on. But it’s highly unlikely that six more people would agree that the meal is delicious
if it isn’t. Another factor that could lead to bias is expertise. Professional dancers, for
example, would perceive dance moves differently than non-professionals. So, if a
person dances and records it, and both groups (professional and non-professional dancers) rate
the video, there is a high likelihood of a significant difference in their ratings.
But if they both agree that the person is a great dancer, despite their differing viewpoints, the
person is likely a great dancer.
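Agreement between raters can also be given a number. The sketch below uses Cohen's kappa, a common agreement statistic that the slides do not mention. It is swapped in here purely as an illustration, and the ratings are invented. Kappa corrects the raw agreement rate for the agreement two raters would reach by chance.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Share of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1:  # both raters always use the same single label
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of eight dance videos by one rater from each group.
professional = ["great", "great", "poor", "great", "poor", "poor", "great", "poor"]
non_professional = ["great", "great", "poor", "poor", "poor", "poor", "great", "great"]

kappa = cohens_kappa(professional, non_professional)
print(f"Cohen's kappa = {kappa:.2f}")
```

A kappa near 1 means the raters agree far more often than chance would explain, while a value near 0 means their agreement is about what chance alone would produce.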
20. INTERPRETATION OF RELIABILITY
Reliability is quantified using the reliability coefficient (r). Typically, higher values of 'r' indicate
more reliability. For a test to be usable, you want the coefficient to be higher than 0.7 on a scale of 0 to 1.
Any test below this benchmark may not have much application in practical workplace scenarios.
21. How do Reliability and Validity apply to tests?
The objective is simple: strike the centre of the board. This
is similar to what you are trying to accomplish with a
pre-employment test: you want the test to do one thing, and you
want it to be done right. Let's consider the three possible
scenarios:
Not only are your shots off target, but none of them
consistently strike any point on the board. Similarly, tests that
are neither valid nor reliable do not have consistent test scores,
nor do they have any relation between the test score and job
performance.
Not reliable, not valid
22. You had a better game than before since you managed to
strike near the target consistently, but you still aren't hitting
the mark. With reliable tests, the scores are consistent, but
with no test validity, the relation with job performance
does not exist.
Reliable, but not Valid
23. This is the performance you want to have every game.
You consistently strike the target (or close to it). In this
scenario, the test scores are consistent, and the scores
relate closely to the candidate's job performance. Higher scores imply
better job performance, making filtering candidates a walk
in the park.
Reliable and Valid
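The three dartboard scenarios can be mimicked with a few lines of code. This is a rough sketch under stated assumptions: the shot coordinates are invented, the board centre is taken as (0, 0), and the limits of 1.0 for spread and offset are arbitrary cut-offs chosen for the illustration. Following the earlier point that valid answers are always reliable, a set of shots only counts as valid here if it is also reliable.

```python
from math import hypot
from statistics import mean


def describe_shots(shots, spread_limit=1.0, offset_limit=1.0):
    """Return (reliable, valid) for a list of (x, y) shots; the target is (0, 0)."""
    cx = mean(x for x, _ in shots)
    cy = mean(y for _, y in shots)
    # Reliability: how tightly the shots group around their own centre.
    spread = mean(hypot(x - cx, y - cy) for x, y in shots)
    # Validity: how far that centre sits from the target.
    offset = hypot(cx, cy)
    reliable = spread <= spread_limit
    valid = reliable and offset <= offset_limit
    return reliable, valid


# Hypothetical shots for the three scenarios.
scattered = [(-6, 5), (7, 6), (4, -7), (-5, -4), (0, 8)]
clustered_off_target = [(5.0, 5.2), (5.3, 4.9), (4.8, 5.1), (5.1, 4.7), (4.9, 5.0)]
on_target = [(0.1, -0.2), (-0.3, 0.1), (0.2, 0.2), (0.0, -0.1), (-0.1, 0.1)]

for name, shots in [
    ("scattered", scattered),
    ("clustered off target", clustered_off_target),
    ("on target", on_target),
]:
    reliable, valid = describe_shots(shots)
    print(f"{name}: reliable={reliable}, valid={valid}")
```

In testing terms, the spread plays the role of score consistency and the offset plays the role of the link between scores and job performance.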