The document analyzes the predictive validity of the MAT80 test for job performance and educational attainment. It finds that the MAT80 has a synthetic validity of 0.78 for predicting job performance based on combining validities from meta-analyses of similar tests measuring cognitive ability, creativity, personality, and motivation. It also has a validity of 0.55-0.66 for predicting educational attainment based on a sample of MBA students. The MAT80 predicts better than other tests due to incorporating measures of creativity and using facet-level personality predictors from meta-analyses to derive weights, following state-of-the-art test development procedures.
Hypothesis Testing: Central Tendency – Normal (Compare 1:1), by Matt Hansen
An extension on a series about hypothesis testing, this lesson reviews the 2 Sample T & Paired T tests as central tendency measurements for normal distributions.
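The two tests named here can be sketched with stdlib Python on hypothetical data (in practice scipy.stats.ttest_ind and ttest_rel do this and also return p-values):

```python
import math
from statistics import mean, stdev

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic (assumes equal variances)."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def paired_t(x, y):
    """Paired t statistic computed on per-subject differences."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Hypothetical cycle times before and after a process change (same 5 operators)
before = [12.1, 11.8, 12.4, 12.0, 11.9]
after = [11.6, 11.5, 12.0, 11.7, 11.4]
print(two_sample_t(before, after))
print(paired_t(before, after))
```

The paired statistic is far larger here because pairing removes operator-to-operator variation, which is exactly why the lesson distinguishes independent from paired 1:1 comparisons.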
Hypothesis Testing: Central Tendency – Non-Normal (Compare 1:Standard), by Matt Hansen
An extension on hypothesis testing, this lesson reviews the 1 Sample Sign & Wilcoxon tests as central tendency measurements for non-normal distributions.
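A minimal sketch of both non-normal one-sample tests against a hypothesised standard (stdlib only; scipy.stats.wilcoxon provides the full version with p-values and tie handling):

```python
import math

def sign_test_p(sample, standard):
    """Two-sided exact sign test against a hypothesised median."""
    diffs = [x - standard for x in sample if x != standard]
    n, plus = len(diffs), sum(d > 0 for d in diffs)
    k = min(plus, n - plus)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def wilcoxon_w(sample, standard):
    """Wilcoxon signed-rank W: smaller of the positive/negative rank sums
    (assumes no ties among the absolute differences)."""
    diffs = sorted((x - standard for x in sample if x != standard), key=abs)
    w_plus = sum(r for r, d in enumerate(diffs, start=1) if d > 0)
    w_minus = sum(r for r, d in enumerate(diffs, start=1) if d < 0)
    return min(w_plus, w_minus)

# Hypothetical skewed response times tested against a target median of 5.0
sample = [5.1, 4.8, 6.2, 5.9, 4.5, 6.8, 5.4]
print(sign_test_p(sample, 5.0), wilcoxon_w(sample, 5.0))
```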
Hypothesis Testing: Central Tendency – Normal (Compare 2+ Factors), by Matt Hansen
An extension on a series about hypothesis testing, this lesson reviews the ANOVA test as a central tendency measurement for normal distributions. It also explains what residuals and boxplots are and how to use them with the ANOVA test.
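A stdlib sketch of the one-way ANOVA F statistic and the residuals the lesson mentions (hypothetical data; scipy.stats.f_oneway supplies the p-value):

```python
from statistics import mean

def one_way_anova_f(*groups):
    """One-way ANOVA F: ratio of between-group to within-group variance."""
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = len(groups) - 1, sum(len(g) for g in groups) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w)

def residuals(*groups):
    """Each observation minus its group mean; these are what you boxplot
    and check for normality after running the ANOVA."""
    return [[x - mean(g) for x in g] for g in groups]

a, b, c = [10, 12, 11], [14, 15, 16], [20, 19, 21]
print(one_way_anova_f(a, b, c))
print(residuals(a, b, c))
```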
Hypothesis Testing: Central Tendency – Non-Normal (Compare 2+ Factors), by Matt Hansen
An extension on hypothesis testing, this lesson reviews the Mood’s Median & Kruskal-Wallis tests as central tendency measurements for non-normal distributions.
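The Kruskal-Wallis statistic, the rank-based analogue of ANOVA, fits in a few stdlib lines (hypothetical untied data; in practice scipy.stats.kruskal and scipy.stats.median_test cover Kruskal-Wallis and Mood's median, including ties and p-values):

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H on pooled ranks (assumes no tied observations)."""
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i for i, x in enumerate(pooled, start=1)}
    n = len(pooled)
    h = 12 / (n * (n + 1)) * sum(
        sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return h - 3 * (n + 1)

a, b, c = [10, 12, 11], [14, 15, 16], [20, 19, 21]
print(kruskal_wallis_h(a, b, c))
```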
An extension on hypothesis testing, this lesson introduces the concepts of a correlation and regression as part of measuring statistical relationships.
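The two concepts can be sketched together on hypothetical data (stdlib only; scipy.stats.pearsonr and linregress add p-values and standard errors):

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def least_squares(x, y):
    """Intercept and slope of the simple regression line y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b * mx, b

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
print(pearson_r(x, y))
print(least_squares(x, y))
```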
Most data scientists focus on predictive (aka supervised) models, yet real growth depends on estimating the effect of an action and optimizing action policies. To this end, I will present causal inference and related packages.
There are three layers of analytics: descriptive (BI), predictive (supervised modeling), and prescriptive. The latter, less-known one focuses on answering the most important business questions, for example, "what was the effect of giving a discount?" or "who should we call first?" In this talk, we will first discuss which frameworks are used to answer these questions, namely causal inference and reinforcement learning. Then we will deep-dive into causal inference and why it is important. Last but not least, we will present some code.
Causal Inference in Data Science and Machine Learning, by Bill Liu
Event: https://learn.xnextcon.com/event/eventdetails/W20042010
Video: https://www.youtube.com/channel/UCj09XsAWj-RF9kY4UvBJh_A
Modern machine learning techniques are able to learn highly complex associations from data, which has led to amazing progress in computer vision, NLP, and other predictive tasks. However, there are limitations to inference from purely probabilistic or associational information. Without understanding causal relationships, ML models are unable to provide actionable recommendations, perform poorly in new, but related environments, and suffer from a lack of interpretability.
In this talk, I provide an introduction to the field of causal inference, discuss its importance in addressing some of the current limitations in machine learning, and provide some real-world examples from my experience as a data scientist at Brex.
DoWhy: An end-to-end library for causal inference, by Amit Sharma
In addition to efficient statistical estimators of a treatment's effect, successful application of causal inference requires specifying assumptions about the mechanisms underlying observed data and testing whether they are valid, and to what extent. However, most libraries for causal inference focus only on the task of providing powerful statistical estimators. We describe DoWhy, an open-source Python library that treats causal assumptions as first-class citizens, based on the formal framework of causal graphs to specify and test causal assumptions. DoWhy presents an API for the four steps common to any causal analysis: 1) modeling the data using a causal graph and structural assumptions, 2) identifying whether the desired effect is estimable under the causal model, 3) estimating the effect using statistical estimators, and finally 4) refuting the obtained estimate through robustness checks and sensitivity analyses. In particular, DoWhy implements a number of robustness checks including placebo tests, bootstrap tests, and tests for unobserved confounding. DoWhy is an extensible library that supports interoperability with other implementations, such as EconML and CausalML for the estimation step.
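The four-step pattern can be illustrated without DoWhy itself. The sketch below is a hypothetical, stdlib-only analogue (not DoWhy's actual API): the "model" is a single assumed binary confounder, "identification" is the backdoor criterion satisfied by stratifying on it, "estimation" is a stratified difference in means, and "refutation" is a placebo-treatment check:

```python
import random

random.seed(0)

# Step 1 (model): assume one binary confounder z that drives both the
# treatment t and the outcome y; the simulated true effect of t on y is 2.0.
n = 20000
z = [random.random() < 0.5 for _ in range(n)]
t = [random.random() < (0.7 if zi else 0.3) for zi in z]
y = [2.0 * ti + 3.0 * zi + random.gauss(0, 1) for ti, zi in zip(t, z)]

def diff_in_means(t, y):
    y1 = [yi for ti, yi in zip(t, y) if ti]
    y0 = [yi for ti, yi in zip(t, y) if not ti]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

# Steps 2-3 (identify + estimate): the backdoor criterion holds given z,
# so stratify on z and average the within-stratum treatment effects.
def backdoor_estimate(t, y, z):
    total, weight = 0.0, 0
    for stratum in (True, False):
        ts = [ti for ti, zi in zip(t, z) if zi == stratum]
        ys = [yi for yi, zi in zip(y, z) if zi == stratum]
        total += diff_in_means(ts, ys) * len(ts)
        weight += len(ts)
    return total / weight

naive = diff_in_means(t, y)            # confounded, biased upward
adjusted = backdoor_estimate(t, y, z)  # close to the simulated effect 2.0

# Step 4 (refute): a random placebo treatment should show ~zero effect.
placebo = [random.random() < 0.5 for _ in range(n)]
placebo_effect = backdoor_estimate(placebo, y, z)
print(naive, adjusted, placebo_effect)
```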
How to evaluate the unusualness (base rate) of WJ IV cluster or test score di..., by Kevin McGrew
The WJ IV provides two primary methods for comparing test or cluster scores. One is based on a predictive model (the variation and comparison procedures) and the other allows comparisons of SEM confidence bands, which takes into account each measure's reliability. A third method for comparing scores, one that takes into account the correlation between compared measures (ability cohesion model), is not provided but is frequently used by assessment professionals. The three types of score comparison methods are described, and new information, via a "rule of thumb" summary slide and nomograph, is provided to allow WJ IV users to evaluate scores via all three methods.
The Business Value of Reinforcement Learning and Causal Inference, by Hanan Shteingart
Israeli Reinforcement Learning Day 2021
A talk by Hanan Shteingart, VIANAI, about the business value of causal inference and reinforcement learning.
On the Measurement of Test Collection Reliability, by Julián Urbano
The reliability of a test collection is proportional to the number of queries it contains. But building a collection with many queries is expensive, so researchers have to find a balance between reliability and cost. Previous work on the measurement of test collection reliability relied on data-based approaches that contemplated random what-if scenarios and provided indicators such as swap rates and Kendall tau correlations. Generalizability Theory was proposed as an alternative, founded on analysis of variance, that provides reliability indicators based on statistical theory. However, these reliability indicators are hard to interpret in practice because they do not correspond to well-known indicators like the Kendall tau correlation. We empirically established these relationships based on data from over 40 TREC collections, thus filling the gap in the practical interpretation of Generalizability Theory. We also review the computation of these indicators and show that they are extremely dependent on the sample of systems and queries used, so much so that the required number of queries to achieve a certain level of reliability can vary by orders of magnitude. We discuss the computation of confidence intervals for these statistics, providing a much more reliable tool to measure test collection reliability. Reflecting upon all these results, we review a wealth of TREC test collections, arguing that they are possibly not as reliable as generally accepted and that the common choice of 50 queries is insufficient even for stable rankings.
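The generalizability coefficient at the heart of this approach can be sketched from a small hypothetical system-by-query score matrix: estimate variance components from the two-way crossed ANOVA, then project the coefficient for any prospective query-set size:

```python
from statistics import mean

def variance_components(scores):
    """Estimate system and residual variance components from a fully
    crossed system x query matrix (no replication)."""
    ns, nq = len(scores), len(scores[0])
    grand = mean(x for row in scores for x in row)
    srow = [mean(row) for row in scores]
    scol = [mean(scores[s][q] for s in range(ns)) for q in range(nq)]
    ms_s = nq * sum((m - grand) ** 2 for m in srow) / (ns - 1)
    ss_e = sum((scores[s][q] - srow[s] - scol[q] + grand) ** 2
               for s in range(ns) for q in range(nq))
    var_e = ss_e / ((ns - 1) * (nq - 1))
    var_s = max(0.0, (ms_s - var_e) / nq)
    return var_s, var_e

def e_rho2(var_s, var_e, n_queries):
    """Generalizability coefficient for a hypothetical collection size."""
    return var_s / (var_s + var_e / n_queries)

# Hypothetical effectiveness scores: 3 systems x 4 queries
scores = [[0.21, 0.35, 0.10, 0.24],
          [0.44, 0.48, 0.33, 0.39],
          [0.58, 0.72, 0.51, 0.63]]
vs, ve = variance_components(scores)
print(e_rho2(vs, ve, 4), e_rho2(vs, ve, 50))
```

More queries shrink the error term, so the coefficient rises toward 1; the paper's point is that such estimates are themselves highly sensitive to the sampled systems and queries.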
This presentation will address the issue of sample size determination for the social sciences. A simple example is provided so that everyone can understand and apply sample size determination.
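One common simple example is Cochran's formula for estimating a proportion, with an optional finite-population correction (a sketch; z = 1.96 corresponds to 95% confidence, and p = 0.5 is the most conservative guess):

```python
import math

def sample_size(z=1.96, margin=0.05, p=0.5, population=None):
    """Cochran's sample-size formula for a proportion; applies the
    finite-population correction when a population size is given."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

print(sample_size())                 # effectively infinite population
print(sample_size(population=1000))  # small finite population
```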
FOUR TYPES OF BUSINESS ANALYTICS TO KNOW
by Anushka Mehta, October 13, 2017
At different stages of business analytics, huge amounts of data are processed. Depending on the stage of the workflow and the requirements of the data analysis, there are four main kinds of analytics – descriptive, diagnostic, predictive and prescriptive. Together, these four types answer everything a company needs to know – from what is going on in the company to what solutions should be adopted to optimize its functions.
The four types of analytics are usually implemented in stages, and no one type is said to be better than another. They are interrelated, and each offers a different insight. With data being important to so many diverse sectors – from manufacturing to energy grids – most companies rely on one or all of these types of analytics. With the right choice of analytical techniques, big data can deliver richer insights for companies.
Before diving deeper into each of these, let’s define the four types of analytics:
1) Descriptive Analytics: Describing or summarizing the existing data using existing business intelligence tools to better understand what is going on or what has happened.
2) Diagnostic Analytics: Focuses on past performance to determine what happened and why. The result of the analysis is often an analytic dashboard.
3) Predictive Analytics: Emphasizes predicting possible outcomes using statistical models and machine learning techniques.
4) Prescriptive Analytics: A type of predictive analytics used to recommend one or more courses of action based on analysis of the data.
Let’s understand these in a bit more depth.
1. Descriptive Analytics
This can be termed the simplest form of analytics. The sheer size of big data is beyond human comprehension, so the first stage involves crunching the data into understandable chunks. The purpose of this analytics type is simply to summarize the findings and understand what is going on.
Among some frequently used terms, what people call advanced analytics or business intelligence is basically the usage of descriptive statistics (arithmetic operations, mean, median, max, percentage, etc.) on existing data. It is said that 80% of business analytics mainly involves descriptions based on aggregations of past performance. It is an important step in making raw data understandable to investors, shareholders and managers. This makes it easy to identify and address areas of strength and weakness, which helps in strategizing.
The two main techniques involved are data aggregation and data mining; this method is used purely to understand past behavior, not to make estimates. By mining historical data, companies can analyze consumer behaviors and engagement with their businesses, which can be helpful in targeted marketing, service improvement, etc. The tools used in this phase are MS Excel, MATLAB ...
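The kind of summary this stage produces can be sketched with Python's statistics module on hypothetical sales figures (MS Excel or MATLAB would be the tools named above):

```python
from statistics import mean, median, mode, stdev

# Hypothetical monthly sales (units) for one product line
sales = [12, 15, 11, 20, 18, 22, 15]
summary = {
    "mean": round(mean(sales), 2),
    "median": median(sales),
    "mode": mode(sales),
    "stdev": round(stdev(sales), 2),
}
print(summary)
```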
DEVELOPMENT ASSESSMENT PROCESS AND ASSESSMENT CENTRE PROPOSAL
Presented by Emma, Ana, Michelle & Evans
What will be covered today?
Our understanding of your requirements
Project Plan
First steps prior screening and assessment
Screening method
Assessment methods
Assessment Centre
Development Programme
What?
Design a development assessment process and assessment centre
Who?
High-potential talent
Why?
12-months intensive development programme
Our understanding of your requirements:
Project Plan (weeks commencing 04/03 through 22/07, running into August and September):
- Define job analysis participants and schedule job analysis
- Conduct job analysis (3 days)
- Competency framework design / scoring criteria design
- Realistic job preview and motivational fit inventory (system build)
- Assessment tools: evaluation
- Presentation to client: timetable; tool evaluation; process
- Assessment tools: purchase or design
- Realistic job preview and motivational fit inventory (online for applicant review)
- AC design (6 days)
- AC materials: briefing documents; scoring criteria; venue booking
- GMA/Yellow Hook Reef online (Arctic Shores) test
- GMA test results review
- GMA drop-out candidate review / ready-now discussion
- Personality NEO online test
- Personality NEO individual report review
- Candidate reports? Immediate automated feedback: check?
- Client check-in
- Client check-in (with GMA and Personality results and recommendations)
- Assessor training: design and delivery (group exercise & multi-assessor structured interview)
- ACs commence June
- AC wash-up: onsite on day
- Client check-in: post-AC and wash-up; applicant review
- Offer management
- Non-successful candidate management: career chat; other development routes/options
- Mobility support (visa, cultural sensitivity training, bank account, accommodation, buddy, maps, insurance etc.)
- Monthly coaching timetable: diarise with individuals and coaches
- Start date
What is going to happen in the next few months?
First steps prior screening and assessment
1. Job Analysis
We will identify knowledge, skills, and abilities required for high performance in the job.
2. Define a competency framework
Based on the results of the job analysis, we will determine the essential and desired competencies to assess candidates against.
What great looks like/success factors
Design assessment framework – realistic scenarios (face validity) on brand and behaviourally anchored scoring guidelines
Wording of what they are looking for
Specific areas to address
Design an assessment framework:
Realistic Preview
SJT
Gamification
Mention:
Blueprint threshold
Minimum criteria that need to be met
Leadership Blueprint
Foundational: measured by the GMA and Personality phase of the assessment process
Growth: partly by the personality questionnaire, but validated at the Assessment Centre – group exercise, multi-assessor structured interview and EI questionnaire
Running head: Organization behavior

Organization behavior
Name:
Institution:
Course:
Date:
Organizational behavior analyzes the environment from different perspectives in order to come up with policies that make the organization effective in its business operations. The organization must analyze the various factors that affect it in order to frame these policies. This means finding out the challenges or problems that an individual faces in an organization, as well as the problems that groups face in the organization. In this context, organizational behavior is simply the way an organization solves the problems in its environment (Kreitner 2012). This discussion will involve Apple Inc.
One of the challenges facing Apple Inc. is managing human resources. Human resources at Apple Inc. are an invaluable asset and are always associated with the organization. Apple has experienced problems in managing its human resources. Some of the issues it experienced include failing to retain employees' talents, not observing diverse recruitment to its fullest, non-performance among employees, and employees not getting their benefits appropriately (O'Grady 2015). This went hand in hand with violations of the rules governing employees, the code of conduct, and the features that keep the value of the team and organization high. The individual's and the organization's wellbeing depend highly on each other; what people do while in the organization should reflect what is in their minds. Organizational value depends highly on the social responsibility the organization portrays, and it should put up policies for protecting the organizational environment. These issues affected Apple's organizational behavior, and human resource management sorted them out (O'Grady 2015).
Managing human resources and employee ethics is a very important issue and the backbone of any organization. If managed well, the organization is likely to succeed easily. If not managed well, these issues will completely spoil the organization's reputation, and the organization may even face dissolution (Kreitner 2012).
References
Kreitner, Robert & Kinicki, Angelo. 2012. Organization behavior. New York: Wiley.
O'Grady, Jason D. 2015. Apple Inc. Westport, Conn: Greenwood Press.
Data (spreadsheet extract). Columns: ID, Salary, Compa, Midpoint, Age, Performance Rating, Service, Gender, Raise, Degree, Gender1, Gr.
Students: Copy the Student Data file data values into this sheet to assist in doing your weekly assignments. The ongoing question that the weekly assignments will focus on is: Are males and females paid the same for equal work (under the Equal Pay Act)? Note: to simplify the analysis, we will assume that jobs within each grade comprise equal work. The column labels in the table mean:
ID – Employee sample number
Salary – Salary in thousands
Age – Age in years
Performance Rating – Appraisal rating (employee evaluation score)
Service – Years of service (rounded)
Gender – 0 = male, 1 = female
Midpoi ...
Statistical Processes
Can descriptive statistical processes be used in determining relationships, differences, or effects in your research question and testable null hypothesis? Why or why not? Also, address the value of descriptive statistics for the forensic psychology research problem that you have identified for your course project. Read an article for additional information on descriptive statistics and pictorial data presentations.
300 words; follow APA rules for attributing sources.
Computing Descriptive Statistics
Computing Descriptive Statistics: “Ever Wonder What Secrets They Hold?” The Mean, Mode, Median, Variability, and Standard Deviation
Introduction
Before gaining an appreciation for the value of descriptive statistics in behavioral science environments, one must first become familiar with the type of measurement data these statistical processes use. Knowing the types of measurement data will help the decision maker make sure that the chosen statistical method will, indeed, produce the results needed and expected. Using the wrong type of measurement data with a selected statistical tool will result in erroneous results, errors, and ineffective decision making.
Measurement, or numerical, data is divided into four types: nominal, ordinal, interval, and ratio. By administering questionnaires, taking polls, conducting surveys, administering tests, and counting events, products, and a host of other numerical data instruments, the businessperson garners values of all four types.
Nominal Data
Nominal data is the simplest of all four forms of numerical data. Values are assigned arbitrarily to a characteristic, event, occasion, or phenomenon. For example, a human resources (HR) manager wishes to determine the differences in leadership styles between managers in different geographical regions. To compute the differences, the HR manager might assign the following values: 1 = West, 2 = Midwest, 3 = North, and so on. The numerical values are not descriptive of anything other than location and are not indicative of quantity.
Ordinal Data
In terms of ordinal data, the variables contained within the measurement instrument are ranked in order of importance. For example, a product-marketing specialist might be interested in how a consumer group would respond to a new product. To garner the information, the questionnaire administered to a group of consumers would include questions scaled as follows: 1 = Not Likely, 2 = Somewhat Likely, 3 = Likely, 4 = More Than Likely, and 5 = Most Likely. This creates a scale rank order from Not Likely to Most Likely with respect to acceptance of the new consumer product.
Interval Data
Oftentimes, in addition to being ordered, the differences (or intervals) between two adjacent measurement values on a measurement scale are identical. For example, the di ...
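The nominal and ordinal cases above can be made concrete in a few lines (hypothetical codes and responses; the point is which operations each scale supports):

```python
# Nominal: codes are labels only; arithmetic on them is meaningless.
region_codes = {"West": 1, "Midwest": 2, "North": 3}

# Ordinal: order is meaningful, but distances between codes are not,
# so medians and rank statistics are appropriate while means are suspect.
likert = {"Not Likely": 1, "Somewhat Likely": 2, "Likely": 3,
          "More Than Likely": 4, "Most Likely": 5}
responses = ["Likely", "Most Likely", "Somewhat Likely", "Likely"]
scores = sorted(likert[r] for r in responses)
median_score = scores[len(scores) // 2]  # upper median, safe for ordinal data
print(median_score)
```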
Asking When, Not If in Predictive Modeling, by Andrea Kropp
Teaching Talent Analytics executives how to use survival analysis to predict WHEN an employee will attrit from the organization. Most predictive modeling for employee attrition focuses on IF a person will leave and completely ignores the time frame.
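The "when" question is answered by a survival curve. Below is a stdlib Kaplan-Meier sketch on hypothetical tenure data (in practice a library such as lifelines would be used), where censored rows are employees who have not yet left:

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve: S(t) after each observed event time.
    durations: months until exit (or censoring); observed: True if the
    employee actually left, False if still employed (censored)."""
    events = sorted({d for d, o in zip(durations, observed) if o})
    surv, curve = 1.0, []
    for t in events:
        at_risk = sum(d >= t for d in durations)
        exits = sum(d == t and o for d, o in zip(durations, observed))
        surv *= 1 - exits / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical tenures in months; False = still employed at observation time
tenure = [6, 6, 12, 18, 24, 24, 30]
left = [True, False, True, True, False, True, False]
print(kaplan_meier(tenure, left))
```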
Primer on the application of statistical significance testing for business research purposes.
1) How to use statistics to make more informed decisions (and when not to use).
2) Highlight differences between statistics in science vs business.
3) Highlight assumptions, limitations and best practices.
Dynamic Stress Test Diffusion Model Considering the Credit Score Performance, by GRATeam
After the crisis of 2008, and the important losses and shortfall in capital that it revealed, regulators conducted massive stress testing exercises in order to test the resilience of financial institutions in times of stress conditions. In this context, and considering the impact of these exercises on the banks’ capital, organization and image, this white paper proposes a methodology that diffuses dynamically the stress on the credit rating scale while considering the performance of the credit score. Consequently, the aim is to more accurately reflect the impact of the stress on the portfolio by taking into account the purity of the score and its ability to precisely rank the individuals of the portfolio.
July 2017 Professor Paul Irwing
Validity Analysis
Analysis of predictive validity for job performance and educational attainment for the MAT80.
Paul Irwing
July, 2017
In order to estimate the predictive validity of the MAT80 for job performance, we used a synthetic validity approach based on meta-analysis of equivalent scales.
The useful validity of personality assessments for predicting job performance is estimated at 0.27 by the most definitive meta-analysis (Barrick, Mount & Judge, 2001). This is a useful but rather small predictive validity. In contrast, the predictive validities of other components of the MAT80 are generally much larger: cognitive ability at 0.68 (Schmidt, Shaffer, & Oh, 2008) and creativity at 0.50 (Harari, Reaves, & Viswesvaran, 2016); although intrinsic motivation has a predictive validity similar to personality, at 0.26 (Cerasoli, Nicklin, & Ford, 2014). Assessments of cognitive ability, creativity, personality and intrinsic motivation all contribute to the overall MAT80 score, so their validities must be combined in order to estimate its predictive validity.
We corrected all of these predictive validities downwards to allow for attenuation due to measurement error, and then combined the validities using multiple regression. The resultant synthetic validity of the MAT80 for the prediction of job performance was 0.78.
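The combination step can be sketched numerically. The criterion validities below come from the text, but the predictor intercorrelations are hypothetical (the report does not list the corrected inputs it actually used), so the resulting multiple correlation of roughly 0.8 only illustrates the mechanics behind the 0.78 figure:

```python
def solve(a, b):
    """Gauss-Jordan elimination for a small linear system a @ x = b."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(n):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [x - f * y for x, y in zip(m[r], m[i])]
    return [m[i][-1] / m[i][i] for i in range(n)]

# Criterion validities taken from the text (job performance):
# cognitive ability 0.68, creativity 0.50, personality 0.27, motivation 0.26.
r_cy = [0.68, 0.50, 0.27, 0.26]

# HYPOTHETICAL predictor intercorrelations; the report does not list them.
r_xx = [[1.00, 0.20, 0.10, 0.10],
        [0.20, 1.00, 0.15, 0.20],
        [0.10, 0.15, 1.00, 0.25],
        [0.10, 0.20, 0.25, 1.00]]

beta = solve(r_xx, r_cy)  # standardised regression weights
multiple_r = sum(b * r for b, r in zip(beta, r_cy)) ** 0.5
print(round(multiple_r, 3))
```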
At the same time, we carried out a conventional validity study for the prediction of educational attainment using a sample of MBA students (N = 1999). On the same basis as used by Kuncel, Credé and Thomas (2007), the predictive validity of the MAT80 was estimated at 0.55 or 0.66, depending on whether the Business Reasoning Test was included in the scoring, which compares favourably to the validity of the GMAT at 0.47.
Why are these estimates of predictive validity so different? Clearly, one reason is that the outcome criteria are different. However, the main reason is probably that traditional validity studies underestimate true validities because they are underpowered (see below).
A second question is why the MAT80 predicts so much better than other tests. Obviously, the answer to this question depends on which comparison you choose to make.
The most appropriate comparison is with the screening tests employed for much large-volume recruitment. Usually these include a personality test which is some variant of the Five Factor model, and in some cases a few cognitive ability tests will also be included.
What specific advantages does the MAT80 confer?
Firstly, the MAT80 has been designed as a customized screening test incorporating personality, ability, and the psychometric basis for making decisions based on the outputs.
Secondly, there are two elements of the MAT80 which are not normally found in a screening test. The most important of these are six scales which measure creativity, originally taken from the Me2 diagnostic tool but then subsequently
developed as part of the MAT80. These scales have been developed in line with recommendations in my chapter on test development (Irwing & Hughes, in press) in the forthcoming Wiley Handbook of Psychometric Testing, and the description of this development is contained in the technical manuals for Me2 and the MAT80. These tests have therefore been developed in line with state-of-the-art procedures. We do not believe that there are any comparable tests in existence. That is, there are no self-rating creativity measures which directly rate creative performance, an omission highlighted in Harari et al. (2016).
The importance of this is demonstrated in the findings of Harari et al.'s (2016)
meta-analysis with regard to the creative and innovative performance
scales currently in existence. Overall, on the basis of 28 studies with N = 7,660,
the population correlation of such scales with task performance ratings was
0.55. When self-ratings were employed, which is the case with the MAT80, the
level of prediction dropped to 0.50. However, those rating scales which
concerned creativity alone, as is also the case with the MAT80, achieved a
higher validity of 0.59. According to the meta-analytic data, therefore, we
can estimate the predictive validity of self-rated creativity scales at
0.54. However, this is the maximum possible validity, whereas operational
validities depend on the reliability of the tests employed. The composite
reliability of the six MAT80 scales is 0.97. This level of reliability matches the
gold standard attained by cognitive ability tests such as the WAIS-III
and the Woodcock-Johnson III, and it means that the achieved operational
validity of these scales, using the meta-analytic findings, is 0.52, not far
short of the theoretical maximum.
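The attenuation step above can be sketched numerically. This is a minimal illustration, assuming the classical correction in which observed validity equals true validity multiplied by the square root of predictor reliability; the reported figure of 0.52 may reflect additional corrections (for example, for criterion unreliability) that are not shown here.

```python
import math

true_validity = 0.54  # meta-analytic estimate for self-rated creativity scales
reliability = 0.97    # composite reliability of the six MAT80 creativity scales

# Classical attenuation: observed validity = true validity * sqrt(predictor reliability)
operational_validity = true_validity * math.sqrt(reliability)
print(round(operational_validity, 2))  # close to the 0.52 reported in the text
```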
This level of validity is substantially higher than the operational validity of any
currently existing creativity rating scale and, in fact, as noted above, there is
no such scale currently available that is suitable for self-rating. Of course,
creativity does correlate to a small degree with both cognitive ability and
personality, so the increment in predictive validity achieved by adding a highly
reliable creativity scale is slightly smaller than implied by the predictive
validity of 0.52. Nevertheless, the fact that the MAT80 provides a highly
predictive measure of creativity gives it a unique advantage over any screening
test currently in use.
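Why the increment is smaller than the zero-order validity follows from standard multiple-correlation algebra. The sketch below uses purely hypothetical correlations (r1, r2 and r12 are illustrative assumptions, not figures taken from this report) to show that a new predictor which correlates with an existing battery adds less than its own validity.

```python
import math

# Hypothetical values for illustration only (not from the report):
r1 = 0.50   # validity of the existing battery
r2 = 0.52   # operational validity of the new creativity scales
r12 = 0.30  # assumed correlation between the new scales and the battery

# Multiple correlation (squared) with two correlated predictors
R2 = (r1**2 + r2**2 - 2 * r1 * r2 * r12) / (1 - r12**2)
increment = math.sqrt(R2) - r1  # gain over the battery alone
print(round(math.sqrt(R2), 3), round(increment, 3))
```

Under these assumed values the increment (about 0.13) is well below 0.52, yet still a worthwhile gain in prediction.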
A second crucial decision which potentially has a massive impact on the
predictive validity of a test is how the specification equation is derived. The
specification equation is based on the scales included in the test battery. In
the case of the MAT80 there are 15 or 16 scale scores to choose from
depending on whether the Business Reasoning test is scored. There are three
decisions which must be made in order to derive a specification equation.
The first decision is which scales to include in the specification equation, the
second is what weight to apply to each, and the third is whether to use broad
traits or facets as predictors.
You might imagine that the question of which scales to include is
straightforward: surely you should include all the scales, for otherwise why
are they present in the test? In practice, however, most specification equations
employ only a small subset of the test's potential predictors. The reason for
this is that specification equations are normally based on validity studies.
Because validity studies are typically quite small, they can only accurately
estimate the weights for a small number of predictors.
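The instability of weights estimated from small validity studies can be illustrated by simulation. This is a sketch under stated assumptions (15 predictors, uniform true weights of 0.2, and the two sample sizes are all illustrative choices): the spread of estimated regression weights shrinks roughly with the square root of the sample size.

```python
import numpy as np

rng = np.random.default_rng(0)

def weight_spread(n, n_predictors=15, n_reps=200):
    """Average standard deviation of estimated OLS weights across replications."""
    true_beta = np.full(n_predictors, 0.2)  # illustrative true weights
    estimates = []
    for _ in range(n_reps):
        X = rng.standard_normal((n, n_predictors))
        y = X @ true_beta + rng.standard_normal(n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta_hat)
    return np.std(estimates, axis=0).mean()

# Typical small validity study versus a meta-analytic sample size
print(weight_spread(100), weight_spread(10_000))
```

With n = 100 the weights wander by roughly ±0.1, the same order as the true weights themselves, which is why small studies can support only a few predictors.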
In the MAT80 we used a quite different approach, which has only
become feasible very recently. We based the calculation of the weights for each
scale included in the specification equation on the meta-analyses listed
above plus Judge, Rodell, Klinger, Simon and Crawford (2013). You will note
that the most recent of these was only published in 2016, so until then such
an approach was not viable. The advantage of using meta-analytic data is
that the sample sizes are very large, and therefore accurate weights can be
calculated for all of the potential predictor scales. The ability of the MAT80 to
include all scales with accurately calculated weights alone leads to an
appreciable increment in predictive validity as compared with past practice.
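Given meta-analytic predictor intercorrelations and validities, weights for a specification equation can be obtained as standardized regression weights. The matrix below is purely hypothetical (three scales with made-up correlations), a sketch of the general technique rather than the MAT80's actual equation.

```python
import numpy as np

# Hypothetical meta-analytic inputs for three scales (illustrative only):
# R = predictor intercorrelations, r = validities against job performance
R = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.1],
              [0.2, 0.1, 1.0]])
r = np.array([0.50, 0.30, 0.25])

# Standardized regression weights: solve R * beta = r
beta = np.linalg.solve(R, r)
print(beta.round(3))
```

Because meta-analytic correlations rest on very large samples, the resulting weights are far more stable than those from a single small validity study.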
The third decision is whether to use broad traits or facets as predictors. For
example, in terms of the Five Factor Model (FFM) the decision is whether to
use the broad factors of openness-to-experience, conscientiousness,
extraversion, agreeableness, and emotional stability, or whether to use the
facets of personality which make up these broad factors; in the case of the
FFM, there are six facets per broad factor.
There is a long history of debate on this issue. One extremely influential article
which argued very strongly for the use of the broad factors as predictors was
Ones and Viswesvaran (1996). It is not possible to assess the extent to which
psychometric companies followed this recommendation, but the arguments
contained in this article were very powerful. However, recently a meta-
analysis has been able to directly compare the predictive validity of facets
versus broad factors (Judge et al., 2013). The comparative validities, expressed
as R2s for the outcome of overall job performance, were: Conscientiousness
(facets = 6.8%, broad factor = 6.7%), Agreeableness (facets = 3.7%, broad
factor = 2.7%), Neuroticism (facets = 5.2%, broad factor = 1.0%), Openness
(facets = 9.0%, broad factor = 0.6%), and Extraversion (facets = 16.5%, broad
factor = 4.0%). While some of these differences are relatively small, in the case
of Openness, Extraversion and Neuroticism the gain in predictive validity
obtained by using facets rather than broad factors is staggeringly large. If
one sums these R2s to provide an approximate comparison of the validities of
facets versus broad factors, facets explain 41.2% of the variance in overall job
performance whereas broad factors explain 15%. Of course, both figures are
overestimates because personality traits are correlated; nevertheless it is
apparent, even allowing for the somewhat lower reliability of the shorter facet
scales, that there is a considerable advantage in using facets as predictors,
provided valid weights are used. For this reason the MAT80 uses facet-level
prediction, which confers an advantage over any test which uses broad traits.
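The approximate comparison above can be reproduced directly from the R2 values cited from Judge et al. (2013):

```python
# R2 values (%) for overall job performance, from Judge et al. (2013) as cited above
facets = {"Conscientiousness": 6.8, "Agreeableness": 3.7, "Neuroticism": 5.2,
          "Openness": 9.0, "Extraversion": 16.5}
broad = {"Conscientiousness": 6.7, "Agreeableness": 2.7, "Neuroticism": 1.0,
         "Openness": 0.6, "Extraversion": 4.0}

# Summing across the five factors gives the rough totals quoted in the text
print(round(sum(facets.values()), 1), round(sum(broad.values()), 1))  # 41.2 15.0
```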
A final clear advantage of the MAT80 is that the development of each of its
scales followed state-of-the-art procedures as outlined in the Wiley Handbook
of Psychometric Testing (Irwing, Booth & Hughes, in press). The Handbook
advocates a ten-stage model of test development, as shown in Table 1. Each
stage involves numerous micro-decisions. With the possible exception of
decisions in stage ten, each of these decisions, if made correctly, adds a
small increment to test reliability and validity. As an example, the item
development procedure used in the MAT80 followed the 13-step procedure
shown in Figure 1.
Table 1. Stages of Test Development
Stages and sub-stages
1. Construct definition, specification of test need, test structure.
2. Overall planning.
3. Item development.
a. Construct definition.
b. Item generation: theory versus sampling.
c. Item review.
d. Piloting of items.
4. Scale construction – factor analysis and Item Response Theory (IRT).
5. Reliability.
6. Validation.
7. Test scoring and norming.
8. Test specification.
9. Implementation and testing.
10. Technical Manual.
While undoubtedly some tests have followed some of the recommendations
outlined in the Handbook, current tests are unlikely to have followed all of the
recommendations optimally, and many tests will have followed only a few of
them. Used in concert, these optimal test development procedures confer a
substantial advantage on the scales used in the MAT80, which has contributed
to the increased level of reliability and validity evidenced by the MAT80 test.
Figure 1. Item development process used in devising the MAT80.