Reliability and validity are important concepts for researchers to consider when developing and evaluating measurement tools and methods. Reliability refers to the consistency of a measure, while validity refers to its accuracy. There are different types of reliability, including test-retest and internal consistency (the latter commonly estimated using Cronbach's alpha). Validity includes face, content, construct, internal, external, statistical conclusion, and criterion-related validity. Researchers must ensure their measures are both reliable, providing consistent results, and valid, accurately measuring the intended construct.
Topic: What is Reliability and its Types?
Student Name: Kanwal Naz
Class: B.Ed 1.5
Project Name: “Young Teachers' Professional Development (TPD)”
Project Founder: Prof. Dr. Amjad Ali Arain
Faculty of Education, University of Sindh, Pakistan
Reliability
Reliability refers to the extent to which a scale produces consistent results if the measurements are repeated a number of times.
Reliability is a measure of the stability or consistency of test scores.
A measurement procedure is reliable when it yields consistent scores while the phenomenon being measured is not changing.
It is the degree to which scores are free of measurement error; in other words, the consistency of the measurement.
Example: a weighing scale used multiple times in a day by the same individual should give the same reading.
Types of reliability
Internal consistency reliability
Test-retest reliability
Split–half method
Inter-rater reliability
Internal consistency reliability
Also known as inter-item reliability.
It is the measure of how well the items on the test measure the same construct or idea.
Cronbach's Alpha
Cronbach's alpha is the statistic most commonly used to measure inter-item reliability, showing whether a questionnaire with multiple questions is reliable. Its value should be above 0.7.
Test-retest reliability
Test-retest reliability is a measure of reliability obtained by administering the same test twice, over a period of time, to the same group of individuals.
Test-retest reliability is the degree to which scores are consistent over time.
Same test- different times
Example: administering the same questionnaire, such as an IQ test, at two different times.
Split–half method
A method of determining the reliability of a test by dividing the whole test into two halves and scoring the two halves separately.
Especially appropriate when the test is very long.
The most common way to split the test in two is the odd-even strategy.
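As an illustrative sketch (not from the original slides), the odd-even split can be computed directly. The scores and the `split_half_reliability` helper below are hypothetical; the half-test correlation is stepped up to a full-test estimate with the Spearman-Brown formula, 2r / (1 + r).

```python
import numpy as np

def split_half_reliability(scores):
    """Odd-even split-half reliability with Spearman-Brown correction."""
    scores = np.asarray(scores, dtype=float)
    odd = scores[:, 0::2].sum(axis=1)   # sum of items 1, 3, 5, ...
    even = scores[:, 1::2].sum(axis=1)  # sum of items 2, 4, 6, ...
    r = np.corrcoef(odd, even)[0, 1]    # correlation between the two halves
    return 2 * r / (1 + r)              # Spearman-Brown full-test estimate

# Hypothetical data: 6 respondents x 6 items
data = [[4, 5, 4, 4, 5, 4],
        [2, 1, 2, 2, 1, 2],
        [3, 3, 4, 3, 3, 3],
        [5, 5, 5, 4, 5, 5],
        [1, 2, 1, 2, 2, 1],
        [4, 4, 3, 4, 4, 4]]
print(round(split_half_reliability(data), 3))
```

Because each half contains only half the items, the raw half-test correlation underestimates full-test reliability; the Spearman-Brown step adjusts for that.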
Inter-rater reliability
Inter-rater reliability is the extent to which two or more raters (or observers, coders, examiners) agree.
Inter-rater reliability is essential when making decisions in research and clinical settings.
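A common statistic for quantifying two-rater agreement is Cohen's kappa, which corrects raw percent agreement for chance. The sketch below is a hypothetical illustration, not part of the original slides; the ratings are invented.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    pa, pb = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category at random
    expected = sum((pa[c] / n) * (pb[c] / n) for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
b = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no"]
print(cohens_kappa(a, b))  # 0.5
```

Here the raters agree on 6 of 8 items (75%), but half of that agreement is expected by chance, so kappa is 0.5.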
References
Neuman, L. (2014). Social Research Methods: Qualitative and Quantitative Approaches. Pearson Education Limited.
Validity:
Validity refers to how well a test measures what it is purported to measure.
Types of Validity:
1. Logical validity:
Validity expressed in the form of theory and statements. It has two types.
I. Face Validity:
It is the extent to which the measurement method appears “on its face” to measure the construct of interest.
• Example: suppose you were taking an instrument that reportedly measures your attractiveness, but the questions asked you to identify the correctly spelled word in each list; the instrument lacks face validity.
II. Content Validity:
Measuring all the aspects contributing to the variable of interest.
Example:
If temperature, height, and stamina are supposed to be assessed for physical fitness, then a test of fitness must include content about temperature, height, and stamina.
2. Criterion validity:
It is the extent to which people’s scores are correlated with other variables or criteria that reflect the same construct.
Example:
An IQ test should correlate positively with school performance.
An occupational aptitude test should correlate positively with work performance.
Types of Criterion Validity
Concurrent validity:
• When the criterion is something that is happening or being assessed at the same time as the construct of interest, it is called concurrent validity.
• Example:
Beef test.
Predictive validity:
• When the criterion is something that will happen or be assessed in the future, this is called predictive validity. (By contrast, a new measure of self-esteem correlating positively with an old, established measure assessed at the same time is evidence of concurrent validity.)
• Example: admission tests such as the GAT and SAT, which are used to predict later academic performance.
Other types of validity
Internal Validity:
It is the extent to which a study is free from flaws, so that any differences in a measurement are due to the independent variable and nothing else.
External Validity
• It is the extent to which the results of a research study can be generalized to different situations, different groups of people, different settings, different conditions, etc.
In daily life, we often use "reliability" to denote trustworthiness, but in research, reliability and validity are different concepts. Reliability in data analysis signifies the consistency of replicating an outcome, while validity relates to the accuracy of a measurement. For example, if you measure a cup of rice multiple times with consistent results, that's reliability, while validity assesses how well the measurement accurately represents its intended quantity.
Promoting Reliability
Both McMillan and Dar provide suggestions on how to promote reliability in classroom assessments. Doing the things mentioned below helps control both external and internal sources of error, which in turn bolsters the reliability of test scores.
McMillan’s (2006, p. 51) suggestions for promoting reliability in classroom assessments:
• Motivate students to put forth their best efforts on assessments.
• Use a sufficient number of items or tasks; a minimum of 5 items is needed to assess a single trait or skill.
• Construct items, scoring criteria, and tasks that clearly differentiate students on what is being assessed, and make the criteria public.
• Make sure scoring procedures for constructed-response items are applied consistently to all students.
• Use independent raters or observers to score a sample of student responses, and check consistency with your evaluations.
• Build as much objectivity into scoring as possible while still maintaining the integrity of what is being assessed.
2. • Measurement involves assigning scores to individuals so that they represent some characteristic of the individuals. But how do researchers know that the scores actually represent the characteristic, especially when it is a construct like intelligence, self-esteem, depression, or working memory capacity? The answer is that they conduct research using the measure to confirm that the scores make sense based on their understanding of the construct being measured. This is an extremely important point. Psychologists do not simply assume that their measures work. Instead, they collect data to demonstrate that they work. If their research does not demonstrate that a measure works, they stop using it.
• As an informal example, imagine that you have been dieting for a month. Your clothes seem to be fitting more loosely, and several friends have asked if you have lost weight. If at this point your bathroom scale indicated that you had lost 10 pounds, this would make sense and you would continue to use the scale. But if it indicated that you had gained 10 pounds, you would rightly conclude that it was broken and either fix it or get rid of it. In evaluating a measurement method, psychologists consider two general dimensions: reliability and validity.
Why Reliability and Validity?
4. Reliability alone is not enough; measures need to be reliable as well as valid. For example, if a weighing scale is wrong by 4 kg (it deducts 4 kg from the actual weight), it can be described as reliable, because the scale displays the same weight every time we measure a specific item. However, the scale is not valid because it does not display the actual weight of the item.
5. Quantitative Data
•Reliability
1. Test/retest
2. Internal Consistency
• Validity
1. Face validity
2. Content validity
3. Construct validity
4. Internal validity
5. External validity
6. Statistical conclusion validity
7. Criterion-related validity
Reliability is about the consistency of a measure, and validity is about the accuracy of a measure.
6. Reliability
• Reliability is the consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects.
• In short, it is the repeatability of your measurement.
• A measure is considered reliable if a person's score on the same test given twice is similar. It is important to remember that reliability is not measured, it is estimated.
7. There are two ways that reliability is usually estimated:
1. Test/retest
This is the more conservative method of estimating reliability. Simply put, the idea behind test/retest is that you should get the same score on test 1 as you do on test 2.
• The three main components of this method are as follows:
1. Implement your measurement instrument at two separate times for each subject;
2. Compute the correlation between the two separate measurements;
3. Assume there is no change in the underlying condition (or trait you are trying to measure) between test 1 and test 2.
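The three steps above can be sketched in a few lines. The scores are made up for demonstration; the correlation computed in step 2 serves directly as the test-retest reliability estimate.

```python
import numpy as np

# Step 1: the same test administered twice to the same eight subjects
test1 = [88, 72, 95, 60, 81, 77, 90, 65]  # first administration
test2 = [85, 75, 93, 62, 80, 74, 92, 68]  # second administration

# Step 2: correlate the two sets of scores. Step 3 is the assumption that
# the underlying trait did not change between the two administrations.
r = np.corrcoef(test1, test2)[0, 1]
print(round(r, 3))
```

A coefficient close to 1 indicates that subjects kept nearly the same relative standing across administrations, i.e. high test-retest reliability.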
8. 2. Internal Consistency
• Internal consistency estimates reliability by grouping questions in a questionnaire that measure the same concept. One common way of computing correlation values among the questions on your instrument is Cronbach's alpha.
• The formula for Cronbach's alpha is:
α = (N · c̄) / (v̄ + (N − 1) · c̄)
where:
• N = the number of items;
• c̄ = the average covariance between item pairs;
• v̄ = the average variance.
• In short, Cronbach's alpha splits all the questions on your instrument every possible way and computes correlation values for them all (we use a computer program for this part).
• In the end, your computer output generates one number for Cronbach's alpha, and just like a correlation coefficient, the closer it is to one, the higher the reliability estimate of your instrument. Cronbach's alpha is a less conservative estimate of reliability than test/retest.
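As a minimal sketch of the alpha formula (with hypothetical Likert-scale scores), alpha can be computed from the item covariance matrix: N is the number of items, c̄ the average inter-item covariance, and v̄ the average item variance. The data and helper name below are invented for illustration.

```python
import numpy as np

def cronbach_alpha(items):
    """alpha = (N * c_bar) / (v_bar + (N - 1) * c_bar)."""
    items = np.asarray(items, dtype=float)   # rows: respondents, cols: items
    cov = np.cov(items, rowvar=False)        # item covariance matrix
    n = items.shape[1]
    v_bar = cov.diagonal().mean()            # average item variance
    # average of the off-diagonal (inter-item) covariances
    c_bar = (cov.sum() - cov.diagonal().sum()) / (n * (n - 1))
    return (n * c_bar) / (v_bar + (n - 1) * c_bar)

# Hypothetical data: 5 respondents x 4 Likert items
scores = [[4, 5, 4, 4],
          [2, 2, 3, 2],
          [3, 3, 3, 4],
          [5, 4, 5, 5],
          [1, 2, 1, 2]]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))
```

With these strongly correlated items the result is well above the conventional 0.7 threshold; items measuring unrelated things would drive c̄, and therefore alpha, toward zero.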
11. What is validity in research?
• Validity is about the accuracy of a measure.
• Research validity in surveys relates to the extent to which the survey measures the right elements that need to be measured. In simple terms, validity refers to how well an instrument measures what it is intended to measure.
• Validity and reliability make the difference between “good” and “bad” research reports. Quality research depends on a commitment to testing and increasing the validity as well as the reliability of your research results.
• Reliability alone is not enough; as in the weighing-scale example above, a measure can be perfectly consistent and still fail to reflect the true value, making it reliable but not valid.
12. Types of Validity:
Here are the 7 key types of validity in research:
1. Face validity
2. Content validity
3. Construct validity
4. Internal validity
5. External validity
6. Statistical conclusion validity
7. Criterion-related validity
13. 1. Face validity
• Face validity is how valid your results seem based on what they look like. This is the least scientific method of validity, as it is not quantified using statistical methods.
• Face validity is not validity in the technical sense of the term. It is concerned with whether it seems like we measure what we claim to.
• Face validity is one of the methods used to assess content validity; it is important to check whether an instrument is valid for a particular culture and whether it contains unclear or unrelated items. But statistically it is considered weak because the judgments may be subjective.
14. 2. Content validity
• Content validity is similar to face validity, but the two use different approaches to check for validity. Face validity is an informal way to check for validity: anyone could take a test at its “face value” and say it looks good. Content validity uses a more formal, statistics-based approach, usually with experts in the field. These experts judge the questions on how well they cover the material.
• Content validity and internal consistency are similar, but they are not the same thing. Content validity is how well an instrument (i.e. a test or questionnaire) measures a theoretical construct. Internal consistency measures how well some test items or questions measure particular characteristics or variables in the model. For example, you might have a ten-question customer satisfaction survey with three questions that test for “overall satisfaction with phone service.” Testing those three questions for satisfaction with phone service is an example of checking for internal consistency; taking the whole survey and making sure it measures “customer satisfaction” would be an example of content validity.
15. 3. Construct validity
• Construct validity is one way to test the validity of a test; it is used in education, the social sciences, and psychology. It demonstrates that the test is actually measuring the construct it claims to be measuring. For example, you might try to find out if an educational program increases emotional maturity in elementary-school-age children. Construct validity would assess whether your research is actually measuring emotional maturity.
• Examples are measurements of attributes of the mind, such as intelligence, level of emotion, proficiency, or ability.
• There are two types of construct validity:
• Convergent validity
• Discriminant validity
16. Convergent validity and discriminant validity are commonly regarded as subsets of construct validity.
• Convergent validity tests that constructs that are expected to be related are, in fact, related. Discriminant validity (or divergent validity) tests that constructs that should have no relationship do, in fact, have no relationship.
• The difference is that convergent validity tests whether constructs that should be related are related, while discriminant validity tests whether constructs believed to be unrelated are, in fact, unrelated.
• Imagine that a researcher wants to measure self-esteem (self-respect), but she also knows that four other constructs are related to self-esteem and have some overlap. The ultimate goal is to attempt to isolate self-esteem.
• In this example, convergent validity would test that the four other constructs are, in fact, related to self-esteem in the study. The researcher would also check that self-worth and confidence, and social skills and self-appraisal, are related.
• Discriminant validity would ensure that, in the study, the non-overlapping factors do not overlap. For example, self-esteem and intelligence should not relate (too much) in most research projects.
• As you can see, separating and isolating constructs is difficult, and it is one of the factors that makes social science extremely difficult.
• Social science rarely produces research that gives a yes-or-no answer, and the process of gathering knowledge is slow and steady, building on top of what is already known.
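To make the self-esteem example concrete, here is a hedged sketch with simulated data: a construct built to overlap with self-esteem (confidence) should correlate with it (convergent evidence), while an unrelated construct (intelligence) should not (discriminant evidence). All variables and numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated construct scores for n respondents
self_esteem  = rng.normal(50, 10, n)
confidence   = self_esteem + rng.normal(0, 5, n)  # related: shares signal
intelligence = rng.normal(100, 15, n)             # unrelated: independent

r_convergent   = np.corrcoef(self_esteem, confidence)[0, 1]
r_discriminant = np.corrcoef(self_esteem, intelligence)[0, 1]
print(round(r_convergent, 2), round(r_discriminant, 2))
```

The convergent correlation comes out high (the two scores share most of their variance), while the discriminant correlation hovers near zero, which is the pattern a construct-validity check looks for.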
17. Questionnaire
The THinK questionnaire included 16 items, using a 5-level Likert scale (yes, much, somewhat, little, no). Questions concerned general knowledge about vaccination (acceptance, administration, effectiveness), HPV and related risks, and acceptability of the vaccine. The age, birthplace, and education of each respondent were also requested.
18. Cronbach’s alpha
Internal consistency of the THinK questionnaire:
Whole questionnaire: 0.816
kHPV: 0.882
aHPV: 0.784
KV: 0.732
kHPV = knowledge of HPV infection
aHPV = attitude toward getting vaccinated against HPV
KV = knowledge about vaccines
19. 4. Internal validity
• Internal validity is the extent to which a study establishes a trustworthy cause-and-effect relationship between a treatment and an outcome. It also reflects whether a given study makes it possible to eliminate alternative explanations for a finding. For example, if you implement a smoking-cessation program with a group of individuals, how sure can you be that any improvement seen in the treatment group is due to the treatment that you administered?
• Internal validity depends largely on the procedures of a study and how rigorously it is performed.
• Internal validity is not a “yes or no” type of concept. Instead, we consider how confident we can be in the findings of a study, based on whether it avoids traps that may make the findings questionable.
• The less chance there is for confounding in a study, the higher the internal validity and the more confident we can be in the findings. Confounding refers to a situation in which other factors come into play that confuse the outcome of a study. For instance, confounding might make us unsure whether we can trust that we have identified the “cause-and-effect” scenario above.
20. 5. External validity
• External validity refers to how well the outcome of a study can be expected to apply to other settings. In other words, this type of validity refers to how generalizable the findings are. For instance, do the findings apply to other people, settings, situations, and time periods?
• Ecological validity, an aspect of external validity, refers to whether a study's findings can be generalized to the real world.
• While rigorous research methods can ensure internal validity, they may, on the other hand, limit external validity.
• Another term, transferability, relates to external validity and refers to qualitative research design. Transferability refers to whether results transfer to situations with similar characteristics.
21. External vs. Internal Validity
Internal validity is a way to gauge how strong your research methods were. External validity helps to answer the question: can the research be applied to the “real world”? If your research is applicable to other situations, external validity is high. If the research cannot be replicated in other situations, external validity is low.
22. 6. Statistical conclusion validity
Statistical conclusion validity (SCV), or just conclusion validity, is a measure of how reasonable a research or experimental conclusion is. For example, let's say you ran some research to find out whether two years of preschool is more effective than one. Based on the data, you conclude that there is a positive relationship between how well a child does in school and how many years of preschool they attended. Conclusion validity tells you how reliable that conclusion is.
Conclusion validity is only concerned with the question: based on the data, is there a relationship or isn't there? It doesn't delve into specifics (like reliability tests) about what kinds of relationships exist. It can be used for qualitative research as well as quantitative research. That said, if you use the term statistical conclusion validity, that is usually taken to mean there is some type of statistical data analysis involved (i.e. that your research has quantitative data).
23. 7. Criterion-related validity
• Criterion validity (or criterion-related validity) measures how well one measure predicts an outcome for another measure. A test has this type of validity if it is useful for predicting performance or behavior in another situation (past, present, or future). For example:
• A job applicant takes a performance test during the interview process. If this test accurately predicts how well the employee will perform on the job, the test is said to have criterion validity.
• A graduate student takes the GRE. The GRE has been shown to be an effective tool (i.e. it has criterion validity) for predicting how well a student will perform in graduate studies.
• The first measure (in the above examples, the job performance test and the GRE) is sometimes called the predictor variable or the estimator. The second measure is called the criterion variable, as long as the measure is known to be a valid tool for predicting outcomes.
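As a hypothetical sketch of the predictor/criterion idea, the code below correlates an invented hiring-test score (the predictor variable) with a later job-performance rating (the criterion variable); a high correlation is evidence of criterion validity. All names and numbers are made up.

```python
def pearson_r(x, y):
    """Pearson correlation between predictor and criterion scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

hiring_test = [62, 75, 80, 55, 90, 70, 85, 60]          # predictor variable
performance = [3.1, 3.6, 3.8, 2.9, 4.5, 3.4, 4.2, 3.0]  # criterion variable

print(round(pearson_r(hiring_test, performance), 2))
```

When the criterion is collected later, as here, the coefficient is an estimate of predictive validity; collected at the same time, the same calculation would estimate concurrent validity.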