July 2017 Professor Paul Irwing
Validity Analysis
Analysis of predictive validity for job performance and educational
attainment for the MAT80.
Paul Irwing
July, 2017
In order to estimate the predictive validity of the MAT80 for job performance, we used a synthetic validity approach based on meta-analyses of equivalent scales.
The validity of personality assessments for predicting job performance is
estimated at 0.27 by the most definitive meta-analysis (Barrick, Mount, &
Judge, 2001). This is useful but rather small. In contrast, the predictive
validities of other components of the MAT80 are generally much larger:
cognitive ability at 0.68 (Schmidt, Shaffer, & Oh, 2008) and creativity at
0.50 (Harari, Reaves, & Viswesvaran, 2016), although intrinsic motivation, at
0.26 (Cerasoli, Nicklin, & Ford, 2014), has a predictive validity similar to
personality. Because cognitive ability, creativity, personality, and intrinsic
motivation all contribute to the overall MAT80 score, their validities must be
combined in order to estimate its predictive validity.
We adjusted all of these predictive validities downwards to allow for
attenuation due to measurement error, and then combined them using multiple
regression. The resulting synthetic validity of the MAT80 for the prediction
of job performance was 0.78.
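The two steps just described can be sketched in code. The component validities below are the meta-analytic figures cited above; the scale reliabilities and the predictor intercorrelation matrix are illustrative placeholders, not the values actually used for the MAT80, so the result will not reproduce the reported 0.78 exactly.

```python
import math

# Meta-analytic validities for the four MAT80 components (from the text):
# cognitive ability, creativity, personality, intrinsic motivation
validities = [0.68, 0.50, 0.27, 0.26]

# Scale reliabilities: illustrative placeholders, NOT the published MAT80 values
reliabilities = [0.95, 0.97, 0.90, 0.88]

# Step 1: adjust each validity downwards for predictor measurement error
r = [v * math.sqrt(rel) for v, rel in zip(validities, reliabilities)]

# Predictor intercorrelations: again illustrative placeholders
Rxx = [[1.00, 0.30, 0.10, 0.10],
       [0.30, 1.00, 0.20, 0.25],
       [0.10, 0.20, 1.00, 0.30],
       [0.10, 0.25, 0.30, 1.00]]

def solve(A, b):
    """Solve A x = b by Gaussian elimination (no pivoting; fine for this matrix)."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for i in range(n):
        for j in range(i + 1, n):
            f = A[j][i] / A[i][i]
            for k in range(i, n):
                A[j][k] -= f * A[i][k]
            b[j] -= f * b[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][k] * x[k] for k in range(i + 1, n))) / A[i][i]
    return x

# Step 2: combine via multiple regression, R^2 = r' Rxx^-1 r
beta = solve(Rxx, r)
R = math.sqrt(sum(ri * bi for ri, bi in zip(r, beta)))
print(f"synthetic validity R = {R:.2f}")
```

Because the regression can always fall back on the single best predictor, the combined R is never lower than the largest attenuated component validity.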
At the same time we carried out a conventional validity study for the
prediction of educational attainment using a sample of MBA students (N =
1999). On the same basis as that used by Kuncel, Credé, and Thomas (2007), the
predictive validity of the MAT80 was estimated at 0.55 or 0.66, depending on
whether the Business Reasoning Test was included in the scoring, which
compares favourably with the validity of the GMAT at 0.47.
Why are these estimates of predictive validity so different? Clearly one reason
is that the outcome criteria differ. The main reason, however, is probably
that traditional validity studies underestimate true validities because they
are underpowered (see below).
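The power point can be made concrete. The 95% confidence interval for an observed correlation, computed via the standard Fisher z transform, is far wider in a typical single validity study than in a meta-analytic pool; the sample sizes below are illustrative.

```python
import math

def r_ci(r, n, z=1.96):
    """Approximate 95% confidence interval for a correlation (Fisher z transform)."""
    fz = 0.5 * math.log((1 + r) / (1 - r))   # r -> z (atanh)
    half = z / math.sqrt(n - 3)              # standard error of z is 1/sqrt(n - 3)
    return math.tanh(fz - half), math.tanh(fz + half)  # z -> r

# An observed validity of 0.30 in a small study vs. a meta-analytic sample
print(r_ci(0.30, 100))    # roughly (0.11, 0.47): very imprecise
print(r_ci(0.30, 7660))   # roughly (0.28, 0.32): precise
```

With N = 100 the interval spans everything from a trivial to a strong validity, which is why small single studies give such unstable estimates.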
A second question is why the MAT80 predicts so much better than other tests.
Obviously, the answer to this question depends on which comparison you
choose to make.
The most appropriate comparison is with the screening tests employed in much
large-volume recruitment. Usually these include a personality test based on
some variant of the Five Factor Model, and in some cases a small number of
cognitive ability tests.
What specific advantages does the MAT80 confer?
Firstly, the MAT80 has been designed as a customized screening test that
incorporates personality and ability measures together with the psychometric
basis for making decisions from its outputs.
Secondly, there are two elements of the MAT80 which are not normally found
in a screening test. The most important of these are six scales which measure
creativity originally taken from the Me2 diagnostic tool, but then subsequently
developed as part of the MAT80. These scales were developed in line with the
recommendations of my chapter on test development (Irwing & Hughes, in press)
in the forthcoming Wiley Handbook of Psychometric Testing, and the development
itself is described in the technical manuals for Me2 and the MAT80. These
scales therefore follow state-of-the-art procedures. We do not believe that
any comparable tests exist: that is, there are no other self-rating creativity
measures which directly rate creative performance, an omission highlighted by
Harari et al. (2016).
The importance of this is demonstrated by the findings of Harari et al.'s (2016)
meta-analysis with regard to the creative and innovative performance scales
currently in existence. Overall, on the basis of 28 studies with N = 7660,
the population relationship of such scales with task performance ratings was
0.55. When self-ratings were employed, as is the case with the MAT80, the
level of prediction dropped to 0.50. However, rating scales concerned with
creativity alone, as in the MAT80, achieved a higher validity of 0.59. On the
basis of the meta-analytic data, therefore, we can estimate the predictive
validity of self-rated creativity scales at 0.54. This, however, is the
maximum possible validity; operational validities depend on the reliability of
the tests employed. The composite reliability of the six MAT80 scales is 0.97,
matching the gold standard of reliability attained by cognitive ability tests
such as the WAIS-III and the Woodcock-Johnson III. At this level of
reliability, the operational validity of these scales, using the
meta-analytic findings, is 0.52, not far short of the theoretical maximum.
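The step from the meta-analytic 0.54 to the operational figure follows classical attenuation logic: operational validity = true-score validity × √reliability. With the numbers quoted, this simple sketch gives approximately 0.53, within rounding distance of the reported 0.52 (the exact correction used in the full computation may differ slightly).

```python
import math

true_validity = 0.54   # meta-analytic estimate for self-rated creativity scales
reliability = 0.97     # composite reliability of the six MAT80 creativity scales

# Classical attenuation: observed validity shrinks by the square root
# of the predictor's reliability
operational = true_validity * math.sqrt(reliability)
print(round(operational, 2))  # prints 0.53
```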
This level of validity is substantially higher than the operational validity of
any existing creativity rating scale, and in fact, as noted above, no such
scale suitable for self-rating is currently available. Of course, creativity
does correlate to a small degree with both cognitive ability and personality,
so the increment in predictive validity achieved by adding a highly reliable
creativity scale is slightly smaller than the 0.52 implies. Nevertheless, the
fact that the MAT80 provides a highly predictive measure of creativity gives it
a unique advantage over any screening test currently in use.
A second crucial decision which potentially has a massive impact on the
predictive validity of a test is how the specification equation is derived. The
specification equation is based on the scales included in the test battery. In
the case of the MAT80 there are 15 or 16 scale scores to choose from
depending on whether the Business Reasoning test is scored. There are three
decisions which must be made in order to derive a specification equation.
The first decision is which scales to include in the specification equation,
the second is what weights to apply, and the third is whether to use broad
traits or facets as predictors.
You might imagine that the question of which scales to include is
straightforward: surely you should include all the scales, for otherwise why
are they in the test at all? In practice, however, most specification
equations employ only a small subset of a test's potential predictors. The
reason is that specification equations are normally based on validity studies,
and because validity studies are typically quite small, they can only
accurately estimate the weights for a small number of predictors.
For the MAT80 we used a quite different approach, one which has only recently
become feasible. We based the weights for each scale in the specification
equation on the meta-analyses listed above plus Judge, Rodell, Klinger, Simon,
and Crawford (2013). Note that the most recent of these was only published in
2016, so until then such an approach was not viable. The advantage of using
meta-analytic data is that the sample sizes are very large, so accurate
weights can be calculated for all of the potential predictor scales. The
ability to include all scales with accurately calculated weights alone leads
to an appreciable increment in predictive validity compared with past
practice.
The third decision is whether to use broad traits or facets as predictors. For
example, in terms of the Five Factor Model (FFM), the decision is whether to
use the broad factors of openness-to-experience, conscientiousness,
extraversion, agreeableness, and emotional stability, or instead the facets of
personality which make up these broad factors: in the case of the FFM, six
facets per broad factor.
There is a long history of debate on this issue. One extremely influential article
which argued very strongly for the use of the broad factors as predictors was
Ones and Viswesvaran (1996). It is not possible to assess the extent to which
psychometric companies followed this recommendation, but the arguments
contained in this article were very powerful. However, recently a meta-
analysis has been able to directly compare the predictive validity of facets
versus broad factors (Judge et al., 2013). The comparative validities expressed
as R²s for the outcome of overall job performance were: Conscientiousness
(facets = 6.8%, broad factor = 6.7%), Agreeableness (facets = 3.7%, broad
factor = 2.7%), Neuroticism (facets = 5.2%, broad factor = 1.0%), Openness
(facets = 9.0%, broad factor = 0.6%), and Extraversion (facets = 16.5%, broad
factor = 4.0%). While some of these differences are relatively small, in the
case of Openness, Extraversion, and Neuroticism the gain in predictive
validity obtained by using facets rather than broad factors is strikingly
large. If one sums these R²s to provide an approximate comparison, facets
explain 41.2% of the variance in overall job performance and broad factors
15%. Both figures are of course overestimates because personality traits are
correlated; nevertheless it is apparent, even allowing for the somewhat lower
reliability of the shorter facet scales, that there is a considerable
advantage in using facets as predictors, provided valid weights are used. For
this reason the MAT80 uses facet-level prediction, which confers an advantage
over any test that uses broad traits.
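The summed comparison can be reproduced directly from the figures quoted (the sums are only a rough index, since both the facets and the broad factors are correlated predictors):

```python
# R² (%) for overall job performance, from Judge et al. (2013) as quoted above
facet_r2 = {"Conscientiousness": 6.8, "Agreeableness": 3.7, "Neuroticism": 5.2,
            "Openness": 9.0, "Extraversion": 16.5}
broad_r2 = {"Conscientiousness": 6.7, "Agreeableness": 2.7, "Neuroticism": 1.0,
            "Openness": 0.6, "Extraversion": 4.0}

print(round(sum(facet_r2.values()), 1))  # 41.2
print(round(sum(broad_r2.values()), 1))  # 15.0
```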
A final clear advantage of the MAT80 is that the development of each of its
scales followed state-of-the-art procedures as outlined in the Wiley Handbook
of Psychometric Testing (Irwing, Booth & Hughes, in press). The Handbook
advocates a ten-stage model of test development, as shown in Table 1. Each
stage involves numerous micro-decisions. With the possible exception of the
decisions in stage ten, each of these decisions, if made correctly, adds a
small increment to test reliability and validity. As an example, the item
development procedure used in the MAT80 followed the 13-step procedure
shown in Figure 1.
Table 1. Stages of Test Development
Stages and sub-stages
1. Construct definition, specification of test need, test structure.
2. Overall planning.
3. Item development.
a. Construct definition.
b. Item generation: theory versus sampling.
c. Item review.
d. Piloting of items.
4. Scale construction – factor analysis and Item Response Theory (IRT).
5. Reliability.
6. Validation.
7. Test scoring and norming.
8. Test specification.
9. Implementation and testing.
10. Technical Manual.
While undoubtedly some tests have followed some of the recommendations
outlined in the Handbook, no current test will have followed all of them
optimally, and many will have followed only a few. Used in concert, these
optimal test development procedures confer a substantial advantage on the
scales of the MAT80, contributing to the increased level of reliability and
validity the test evidences.
Figure 1. Item development process used in devising the MAT80.

More Related Content

What's hot

Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Matt Hansen
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Matt Hansen
 
Hypothesis Testing: Relationships (Compare 1:1)
Hypothesis Testing: Relationships (Compare 1:1)Hypothesis Testing: Relationships (Compare 1:1)
Hypothesis Testing: Relationships (Compare 1:1)
Matt Hansen
 
Chp5 - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
Chp5  - Research Methods for Business By Authors Uma Sekaran and Roger BougieChp5  - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
Chp5 - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
Hassan Usman
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:1)Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:1)
Matt Hansen
 
Basic Statistical Concepts & Decision-Making
Basic Statistical Concepts & Decision-MakingBasic Statistical Concepts & Decision-Making
Basic Statistical Concepts & Decision-MakingPenn State University
 
Hypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence IntervalsHypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence Intervals
Matt Hansen
 
Hypothesis Testing: Central Tendency – Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Normal (Compare 1:Standard)Hypothesis Testing: Central Tendency – Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Normal (Compare 1:Standard)
Matt Hansen
 
Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)
Matt Hansen
 
Hypothesis Testing: Relationships (Overview)
Hypothesis Testing: Relationships (Overview)Hypothesis Testing: Relationships (Overview)
Hypothesis Testing: Relationships (Overview)
Matt Hansen
 
Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)
Matt Hansen
 
Business Optimization via Causal Inference
Business Optimization via Causal InferenceBusiness Optimization via Causal Inference
Business Optimization via Causal Inference
Hanan Shteingart
 
Neuroecon Seminar Pres
Neuroecon Seminar PresNeuroecon Seminar Pres
Neuroecon Seminar Prestkvaran
 
Hypothesis Tests in R Programming
Hypothesis Tests in R ProgrammingHypothesis Tests in R Programming
Hypothesis Tests in R Programming
Atacan Garip
 
Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine Learning
Bill Liu
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
Amit Sharma
 
How to evaulate the unusualness (base rate) of WJ IV cluster or test score di...
How to evaulate the unusualness (base rate) of WJ IV cluster or test score di...How to evaulate the unusualness (base rate) of WJ IV cluster or test score di...
How to evaulate the unusualness (base rate) of WJ IV cluster or test score di...
Kevin McGrew
 
The Business Value of Reinforcement Learning and Causal Inference
The Business Value of Reinforcement Learning and Causal InferenceThe Business Value of Reinforcement Learning and Causal Inference
The Business Value of Reinforcement Learning and Causal Inference
Hanan Shteingart
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection Reliability
Julián Urbano
 
Lecture 7
Lecture 7Lecture 7
Lecture 7butest
 

What's hot (20)

Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors)
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors)
 
Hypothesis Testing: Relationships (Compare 1:1)
Hypothesis Testing: Relationships (Compare 1:1)Hypothesis Testing: Relationships (Compare 1:1)
Hypothesis Testing: Relationships (Compare 1:1)
 
Chp5 - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
Chp5  - Research Methods for Business By Authors Uma Sekaran and Roger BougieChp5  - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
Chp5 - Research Methods for Business By Authors Uma Sekaran and Roger Bougie
 
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:1)Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:1)
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:1)
 
Basic Statistical Concepts & Decision-Making
Basic Statistical Concepts & Decision-MakingBasic Statistical Concepts & Decision-Making
Basic Statistical Concepts & Decision-Making
 
Hypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence IntervalsHypothesis Testing: Statistical Laws and Confidence Intervals
Hypothesis Testing: Statistical Laws and Confidence Intervals
 
Hypothesis Testing: Central Tendency – Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Normal (Compare 1:Standard)Hypothesis Testing: Central Tendency – Normal (Compare 1:Standard)
Hypothesis Testing: Central Tendency – Normal (Compare 1:Standard)
 
Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)Hypothesis Testing: Spread (Compare 1:1)
Hypothesis Testing: Spread (Compare 1:1)
 
Hypothesis Testing: Relationships (Overview)
Hypothesis Testing: Relationships (Overview)Hypothesis Testing: Relationships (Overview)
Hypothesis Testing: Relationships (Overview)
 
Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)Hypothesis Testing: Proportions (Compare 1:1)
Hypothesis Testing: Proportions (Compare 1:1)
 
Business Optimization via Causal Inference
Business Optimization via Causal InferenceBusiness Optimization via Causal Inference
Business Optimization via Causal Inference
 
Neuroecon Seminar Pres
Neuroecon Seminar PresNeuroecon Seminar Pres
Neuroecon Seminar Pres
 
Hypothesis Tests in R Programming
Hypothesis Tests in R ProgrammingHypothesis Tests in R Programming
Hypothesis Tests in R Programming
 
Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine Learning
 
Dowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inferenceDowhy: An end-to-end library for causal inference
Dowhy: An end-to-end library for causal inference
 
How to evaulate the unusualness (base rate) of WJ IV cluster or test score di...
How to evaulate the unusualness (base rate) of WJ IV cluster or test score di...How to evaulate the unusualness (base rate) of WJ IV cluster or test score di...
How to evaulate the unusualness (base rate) of WJ IV cluster or test score di...
 
The Business Value of Reinforcement Learning and Causal Inference
The Business Value of Reinforcement Learning and Causal InferenceThe Business Value of Reinforcement Learning and Causal Inference
The Business Value of Reinforcement Learning and Causal Inference
 
On the Measurement of Test Collection Reliability
On the Measurement of Test Collection ReliabilityOn the Measurement of Test Collection Reliability
On the Measurement of Test Collection Reliability
 
Lecture 7
Lecture 7Lecture 7
Lecture 7
 

Similar to MAT80 - White paper july 2017 - Prof. P. Irwing

Statistics For Bi
Statistics For BiStatistics For Bi
Statistics For Bi
Angela Hays
 
Discretion-Related Validity
Discretion-Related ValidityDiscretion-Related Validity
Discretion-Related Validity
Kate Loge
 
Sample Size Determination
Sample Size DeterminationSample Size Determination
Ismail+Reid2016 - Ask The Experts - The Actuary (June 2016)
Ismail+Reid2016 - Ask The Experts - The Actuary (June 2016)Ismail+Reid2016 - Ask The Experts - The Actuary (June 2016)
Ismail+Reid2016 - Ask The Experts - The Actuary (June 2016)Raveem Ismail
 
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby AFOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
JeanmarieColbert3
 
DEDEVELOPMENT ASSESMENT PROCESS AND ASSESMENT CENTRE PROPO.docx
DEDEVELOPMENT ASSESMENT PROCESS AND ASSESMENT CENTRE PROPO.docxDEDEVELOPMENT ASSESMENT PROCESS AND ASSESMENT CENTRE PROPO.docx
DEDEVELOPMENT ASSESMENT PROCESS AND ASSESMENT CENTRE PROPO.docx
vickeryr87
 
Running head Organization behaviorOrganization behavior 2.docx
Running head Organization behaviorOrganization behavior 2.docxRunning head Organization behaviorOrganization behavior 2.docx
Running head Organization behaviorOrganization behavior 2.docx
toltonkendal
 
Week 6 Group Assignment
Week 6 Group AssignmentWeek 6 Group Assignment
Week 6 Group AssignmentKarli Stanley
 
1c6fbee33774bc2f422a5152ac43539d660e.pdf
1c6fbee33774bc2f422a5152ac43539d660e.pdf1c6fbee33774bc2f422a5152ac43539d660e.pdf
1c6fbee33774bc2f422a5152ac43539d660e.pdf
Vasun13
 
Statistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docxStatistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docx
darwinming1
 
Telco Stakeholders
Telco StakeholdersTelco Stakeholders
Telco Stakeholders
Tracy Clark
 
Reviewing quantitative articles_and_checklist
Reviewing quantitative articles_and_checklistReviewing quantitative articles_and_checklist
Reviewing quantitative articles_and_checklist
Lasse Torkkeli
 
Asking When, Not If in Predictive Modeling
Asking When, Not If in Predictive ModelingAsking When, Not If in Predictive Modeling
Asking When, Not If in Predictive Modeling
Andrea Kropp
 
Statistics for Business Decision-making
Statistics for Business Decision-makingStatistics for Business Decision-making
Statistics for Business Decision-making
Jason Martuscello
 
Characteristics of effective tests and hiring
Characteristics of effective tests and hiringCharacteristics of effective tests and hiring
Characteristics of effective tests and hiring
Binibining Kalawakan
 
Dynamic Stress Test Diffusion Model Considering The Credit Score Performance
Dynamic Stress Test Diffusion Model Considering The Credit Score PerformanceDynamic Stress Test Diffusion Model Considering The Credit Score Performance
Dynamic Stress Test Diffusion Model Considering The Credit Score Performance
GRATeam
 
2016 Symposium Poster - statistics - Final
2016 Symposium Poster - statistics - Final2016 Symposium Poster - statistics - Final
2016 Symposium Poster - statistics - FinalBrian Lin
 

Similar to MAT80 - White paper july 2017 - Prof. P. Irwing (20)

Statistics For Bi
Statistics For BiStatistics For Bi
Statistics For Bi
 
Discretion-Related Validity
Discretion-Related ValidityDiscretion-Related Validity
Discretion-Related Validity
 
Sample Size Determination
Sample Size DeterminationSample Size Determination
Sample Size Determination
 
Ismail+Reid2016 - Ask The Experts - The Actuary (June 2016)
Ismail+Reid2016 - Ask The Experts - The Actuary (June 2016)Ismail+Reid2016 - Ask The Experts - The Actuary (June 2016)
Ismail+Reid2016 - Ask The Experts - The Actuary (June 2016)
 
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby AFOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
FOUR TYPES OF BUSINESS ANALYTICS TO KNOWBUSINESS ANALYTICSby A
 
DEDEVELOPMENT ASSESMENT PROCESS AND ASSESMENT CENTRE PROPO.docx
DEDEVELOPMENT ASSESMENT PROCESS AND ASSESMENT CENTRE PROPO.docxDEDEVELOPMENT ASSESMENT PROCESS AND ASSESMENT CENTRE PROPO.docx
DEDEVELOPMENT ASSESMENT PROCESS AND ASSESMENT CENTRE PROPO.docx
 
Running head Organization behaviorOrganization behavior 2.docx
Running head Organization behaviorOrganization behavior 2.docxRunning head Organization behaviorOrganization behavior 2.docx
Running head Organization behaviorOrganization behavior 2.docx
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validity
 
Week 6 Group Assignment
Week 6 Group AssignmentWeek 6 Group Assignment
Week 6 Group Assignment
 
1c6fbee33774bc2f422a5152ac43539d660e.pdf
1c6fbee33774bc2f422a5152ac43539d660e.pdf1c6fbee33774bc2f422a5152ac43539d660e.pdf
1c6fbee33774bc2f422a5152ac43539d660e.pdf
 
Statistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docxStatistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docx
 
Telco Stakeholders
Telco StakeholdersTelco Stakeholders
Telco Stakeholders
 
Reviewing quantitative articles_and_checklist
Reviewing quantitative articles_and_checklistReviewing quantitative articles_and_checklist
Reviewing quantitative articles_and_checklist
 
Asking When, Not If in Predictive Modeling
Asking When, Not If in Predictive ModelingAsking When, Not If in Predictive Modeling
Asking When, Not If in Predictive Modeling
 
Statistics for Business Decision-making
Statistics for Business Decision-makingStatistics for Business Decision-making
Statistics for Business Decision-making
 
Seawell_Exam
Seawell_ExamSeawell_Exam
Seawell_Exam
 
Characteristics of effective tests and hiring
Characteristics of effective tests and hiringCharacteristics of effective tests and hiring
Characteristics of effective tests and hiring
 
SDM Mini Project Group F
SDM Mini Project Group FSDM Mini Project Group F
SDM Mini Project Group F
 
Dynamic Stress Test Diffusion Model Considering The Credit Score Performance
Dynamic Stress Test Diffusion Model Considering The Credit Score PerformanceDynamic Stress Test Diffusion Model Considering The Credit Score Performance
Dynamic Stress Test Diffusion Model Considering The Credit Score Performance
 
2016 Symposium Poster - statistics - Final
2016 Symposium Poster - statistics - Final2016 Symposium Poster - statistics - Final
2016 Symposium Poster - statistics - Final
 

MAT80 - White paper july 2017 - Prof. P. Irwing

  • 1. July 2017 Professor Paul Irwing Validity Analysis Analysis of predictive validity for job performance and educational attainment for the MAT80. Paul Irwing July, 2017 In order to estimate the predictive validity of the MAT80 for job performance we used a synthetic validity approach based on meta-analysis of equivalent scales. The useful validity of personality assessments for predicting job performance is estimated at 0.27 by the most definitive meta-analysis (Barrick, Mount & Judge, 2001). This is a useful but rather small predictive validity. In contrast, the predictive validity of other components of the MAT80 are generally much larger: with cognitive ability at 0.68 (Schmidt, Shaffer, & Oh, 2008), and creativity at 0.50 (Harari, Reaves, & Viswesvaran, 2016); although intrinsic motivation has a similar predictive validity to personality at 0.26 (Cerasoli, Nicklin, & Ford, 2014). It is the combination of assessments of cognitive ability, creativity, personality and intrinsic motivation, which contribute to the overall score on the MAT80, which need to be combined in order to estimate its predictive validity. We corrected all of these predictive validities downwards to allow for attenuation due to measurement error, and then combined the validities using multiple regression. The resultant synthetic validity of the MAT80 for the prediction of job performance was 0.78.
  • 2. July 2017 Professor Paul Irwing Validity Analysis At the same time we carried out a conventional validity study for the prediction of educational attainment using a sample of MBA students (N = 1999). On the same basis as used by Kunzel, Credo and Thomas (2007), the predictive validity of the MAT80 was estimated at 0.55 or 0.66 depending on whether the Business Reasoning Test was included in the scoring, which compares favourably to the validity for the GMAT at 0.47. Why are these estimates of predictive validity so different. Clearly one reason is that the outcome criteria are different. However, probably, the main reason is that traditional validity studies underestimate true validities because they are underpowered (see below). A second question is why the MAT80 predicts so much better than other tests. Obviously, the answer to this question depends on which comparison you choose to make. The most appropriate comparison is the screening test employed for much large volume recruitment. Usually this includes a personality test which is some variant of the Five Factor model, and in some cases a few cognitive ability tests will also be included. What specific advantages does the MAT80 confer? Firstly, MAT80 has been designed as a customized screening test incorporating personality, ability and the psychometric basis for making decisions based on the outputs. Secondly, there are two elements of the MAT80 which are not normally found in a screening test. The most important of these are six scales which measure creativity originally taken from the Me2 diagnostic tool, but then subsequently
  • 3. July 2017 Professor Paul Irwing Validity Analysis developed as part of the MAT80. These scales have been developed in line with recommendations in my (Irwing & Hughes, in press) chapter on test development, in the forthcoming Wiley Handbook of Psychometric Testing, and the description of this development is contained in the technical manuals for Me2 and the MAT80. These tests have therefore been developed in line with state-of-the-art procedures. We do not believe that there are any comparable tests in existence. That is there are no self-rating creativity measures which directly rate creative performance, an omission highlighted in Harari, et al., 2016. The importance of this is demonstrated in the findings of Harari, et al.’s (2016) meta-analysis with regard to those creative and innovative performance scales currently in existence. Overall, on the basis of 28 studies with N = 7660, the population relationship of such scales with task performance ratings was 0.55. When self-ratings were employed, which is the case with the MAT80 the level of prediction dropped to 0.50. However those rating scales which concerned creativity alone, as is the case in the MAT80, achieved a higher level of validity at 0.59. According to the meta-analytic data, therefore, we can estimate the predictive validity of self-rated creativity rating scales at 0.54. However this is the maximum possible validity, whereas operational validities will depend on the reliability of the tests employed. The composite reliability of the six MAT80 scales is 0.97. This level of reliability matches the gold standard of reliability attained by cognitive ability tests such as the WAIS III, and the Woodcock-Johnson III. This level of reliability means that the achieved operational validity of these scales, using the meta-analytic findings, is 0.52, not far short of the theoretical maximum.
  • 4. July 2017 Professor Paul Irwing Validity Analysis This level of validity is substantially higher than the operational validity of any currently existent creativity rating scales, and in fact, as noted above, there is no such scale currently available suitable for self-rating. Of course, creativity does correlate to a small degree with both cognitive ability and personality, so the increment in predictive validity achieved by adding a highly reliable creativity scale is slightly smaller than implied by the predictive validity of 0.52. Nevertheless, that the MAT80 provides a highly predictive measure of creativity gives it a unique advantage over any screening tests currently in use. A second crucial decision which potentially has a massive impact on the predictive validity of a test is how the specification equation is derived. The specification equation is based on the scales included in the test battery. In the case of the MAT80 there are 15 or 16 scale scores to choose from depending on whether the Business Reasoning test is scored. There are three decisions which must be made in order to derive a specification equation. The first decision is which scales to include in the specification equation, the second is what weight to apply and the third is whether to use broad traits or facets as predictors. You might imagine that the question of which scales to include is straightforward, surely you should include all the scales, and otherwise why are they present in the test. However, mostly specification equations only employ a small subset of the test’s potential predictors. The reason for this is that specification equations are normally based on validity studies. Because validity studies are typically quite small, they can only accurately assess the weights for a small number of predictors.
In the MAT80 we used a quite different approach, one which has only recently become feasible. We based the calculation of the weight for each scale in the specification equation on the meta-analyses listed above, plus Judge, Rodell, Klinger, Simon and Crawford (2013). Note that the most recent of these was only published in 2016, so until then such an approach was not viable. The advantage of using meta-analytic data is that the sample sizes are very large, so accurate weights can be calculated for all of the potential predictor scales. The ability to include all of the MAT80's scales and calculate accurate weights for them, by itself, yields an appreciable increment in predictive validity compared with past practice.

The third decision is whether to use broad traits or facets as predictors. In terms of the Five Factor Model (FFM), for example, the choice is between the broad factors of openness-to-experience, conscientiousness, extraversion, agreeableness, and emotional stability, and the facets which make up these broad factors: in the case of the FFM, six facets per broad factor. There is a long history of debate on this issue. One extremely influential article which argued very strongly for using the broad factors as predictors was Ones and Viswesvaran (1996). It is not possible to assess the extent to which psychometric companies followed this recommendation, but the arguments in that article were very powerful. Recently, however, a meta-analysis has been able to compare the predictive validity of facets versus broad factors directly (Judge et al., 2013). The comparative validities, expressed as R²s, for the outcome of overall job performance were: Conscientiousness (facets = 6.8%, broad factor = 6.7%), Agreeableness (facets = 3.7%, broad factor = 2.7%), Neuroticism (facets = 5.2%, broad factor = 1.0%), Openness
(facets = 9.0%, broad factor = 0.6%), and Extraversion (facets = 16.5%, broad factor = 4.0%). While some of these differences are relatively small, in the case of Openness, Extraversion and Neuroticism the gain in predictive validity obtained by using facets rather than broad factors is staggeringly large. If one sums these R²s to provide an approximate comparison, facets explain 41.2% of the variance in overall job performance while broad factors explain 15%. Both figures are overestimates, because personality traits are correlated; nevertheless it is apparent, even allowing for the somewhat lower reliability of the shorter facet scales, that there is a considerable advantage in using facets as predictors, provided valid weights are used. For this reason the MAT80 uses facet-level prediction, which confers an advantage over any test which uses broad traits.

A final clear advantage of the MAT80 is that the development of each of its scales followed the state-of-the-art procedures outlined in the Wiley Handbook of Psychometric Testing (Irwing, Booth & Hughes, in press). The Handbook advocates a ten-stage model of test development, as shown in Table 1. Each stage involves numerous micro-decisions, and with the possible exception of those in stage ten, each of these decisions, if made correctly, adds a small increment to test reliability and validity. As an example, the item development procedure used in the MAT80 followed the 13-step procedure shown in Figure 1.
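The facet versus broad-factor totals quoted from Judge et al. (2013) can be verified directly from the per-trait R² figures:

```python
# Comparative validity (R^2, in %) of facets vs broad factors for overall
# job performance, per-trait figures from Judge et al. (2013) as quoted.
r2 = {
    "Conscientiousness": (6.8, 6.7),
    "Agreeableness":     (3.7, 2.7),
    "Neuroticism":       (5.2, 1.0),
    "Openness":          (9.0, 0.6),
    "Extraversion":      (16.5, 4.0),
}

facet_total = sum(facet for facet, broad in r2.values())
broad_total = sum(broad for facet, broad in r2.values())

print(f"facets:        {facet_total:.1f}% of variance")  # 41.2%
print(f"broad factors: {broad_total:.1f}% of variance")  # 15.0%
```

As the text notes, both totals overstate absolute prediction because the traits are intercorrelated; the comparison between the two totals is the informative part.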
Table 1. Stages of Test Development

1. Construct definition, specification of test need, test structure.
2. Overall planning.
3. Item development.
   a. Construct definition.
   b. Item generation: theory versus sampling.
   c. Item review.
   d. Piloting of items.
4. Scale construction: factor analysis and Item Response Theory (IRT).
5. Reliability.
6. Validation.
7. Test scoring and norming.
8. Test specification.
9. Implementation and testing.
10. Technical manual.
While some tests have undoubtedly followed some of the recommendations outlined in the Handbook, no current test will have followed all of them optimally, and many will have followed only a few. Used in concert, optimal test development procedures confer a substantial advantage on the scales used in the MAT80, and will have contributed to the increased level of reliability and validity the MAT80 evidences.

Figure 1. Item development process used in devising the MAT80.