This document summarizes the preliminary development of scales to measure cognitive ability and personality. It describes the development and testing of subscales measuring verbal reasoning, quantitative reasoning, psychology knowledge, and spirituality. The internal consistency of most scales was acceptable, though the psychology subscale was unreliable. Further analysis is recommended to ensure the validity of the scales before implementation.
Developing A Measure Of Cognitive Ability And Personality
By:
Paula Brown
Alicia Glatt
Submitted to Dr. Ashita Goswami, Ph.D.
Psy 790 - Psychometrics
SALEM STATE UNIVERSITY
9 May 2016
Executive Summary
This report documents the preliminary development of the PAL* Cognitive and Spirituality
Scales. The scales are composed of three dichotomous cognitive subscales measuring
quantitative reasoning, verbal reasoning, and psychology knowledge, as well as a non-
dichotomous personality subscale measuring an individual’s level of spirituality.
The spirituality scale showed very high levels of internal consistency. The overall cognitive
composite, and verbal and quantitative reasoning subscales also showed acceptable measures of
internal consistency. The acceptable internal consistency of the verbal reasoning, quantitative
reasoning, overall cognitive composite, and spirituality scales indicates that each is a good
measure of its respective construct. The internal consistency of the psychology subscale fell
outside the acceptable range, so it is not a good measure of psychology knowledge.
The spirituality scale and all facets of cognitive ability were further evaluated with item analysis.
Items that fell outside the acceptable difficulty range and showed poor discrimination
between high and low performers were removed; however, after doing so, the estimated internal
consistency improved only slightly for the quantitative subscale and combined cognitive test.
There were no items from the verbal subscale that qualified for removal. Due to a large number
of test questions being removed, the internal consistency of the psychology subscale decreased in
value, remaining far outside the acceptable range. Finally, various items were evaluated with
distractor analysis, and recommendations for improvement to individual questions are provided.
In conclusion, the PAL Cognitive and Spirituality Scales show promise as psychometric tools for
measuring cognitive reasoning, psychology knowledge and personality. Further analysis should
be conducted to ensure the validity of both the cognitive and spirituality scales. Implementation
of the recommendations made in the distractor analysis section should improve internal
consistency.
Introduction
As a field of study, psychometrics attempts to conceptualize human behavior and measure the
differences between individuals in terms of aptitudes, personality, values, skills, intelligence, and
attitudes. In addition to the development and refinement of theoretical approaches to
measurement, a major component of psychometrics is the construction of instruments and
procedures for measuring such constructs. This project provided an opportunity to apply the
theoretical components of classical test theory in the development of a series of scales measuring
cognitive ability, as well as a separate measure of personality. Through this exercise we gained
practical experience in the construction, evaluation, and interpretation of psychological tests.
*PAL is an acronym of the authors' names, Paula and Alicia.
Construct Development
The PAL cognitive scale included a 20-item measure of quantitative reasoning, a 20-item
measure of verbal reasoning, and a 24-item measure of psychology knowledge. In terms of both
format and difficulty, questions were modeled after those used by the Educational Testing
Service for assessing readiness for graduate education (i.e., the GRE general test and psychology
subject test).
Quantitative reasoning, verbal reasoning and psychology knowledge are all multifaceted
constructs; therefore, in order to ensure content validity, survey questions were designed to cover
a large spectrum of facets. Math questions included algebra, arithmetic, data interpretation and
geometry. Verbal questions included analogies, antonyms, reading comprehension and sentence
completion. Likewise, questions in the psychology knowledge section were representative of a
dozen different subject areas that included neuroscience, clinical/abnormal psychology, and
measurement and methodology. In addition to representing a wide variety of constructs,
questions were designed to span a wide range of abilities, with easier questions presented earlier
and increasing in difficulty over the course of the survey.
Unrelated to cognitive reasoning, the spirituality scale is a 20-item personality scale developed to
assess an individual’s personal spirituality. It was developed using a 5-point Likert scale for
participants to respond to statements such as, “Although lacking in material possessions, it is
possible to feel fulfilled,” and “I believe my life has a purpose.” The scale was developed in
response to the growing trend of spirituality in the workplace. Interest in workplace spirituality
and spiritual leadership is reflected in the steady increase in the number of books, publications,
and conferences on the topic over the past 20 years (Imel, 1998). Experts
have pointed to a number of factors behind this trend, including the rise in corporate layoffs and
downsizing, the decline of traditional support networks, efforts of individuals to find personal
fulfillment on the job, the need to reconcile personal values with those of the corporation, and the
rise of innovative organizational trends, such as learning organizations and the quality movement,
and corporate desires to help workers achieve more balanced lives (Fry, 2016; Laabs, 1995; Leigh,
1997; McLaughlin, 1998). Anticipating that many survey participants might not be currently
employed, researchers focused on spirituality as a general construct as opposed to spirituality in
the workplace.
Test Development
Two researchers were involved in the creation of the survey, which consisted of a total of 100
items. A measure of self-reported performance (Appendix C) had 7 questions; the spirituality
scale (Appendix D) was composed of 20 questions; the psychology subscale (Appendix E) had
24 questions; the quantitative reasoning subscale (Appendix F) had 20 questions; and the verbal
subscale (Appendix G) had 20 questions. The remainder of the questions solicited demographic
information. Cognitive items had between 4 and 5 multiple choice options (i.e., one correct
answer and 3 or 4 distractors, respectively). Spirituality items were structured on a 5-point
Likert Scale with one point being “strongly disagree” and 5 points being “strongly agree” with
the respective statement. Since the spirituality scale had 20 items, the maximum possible total
was 100 points. The total was then divided by the number of items answered to obtain an average
score between 1 and 5.
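The scoring rule just described can be sketched in a few lines of Python; the function name and the responses shown are hypothetical, for illustration only.

```python
# Sketch of the spirituality scoring rule: average the 1-5 Likert
# responses over the items actually answered.
def score_spirituality(responses):
    """responses: ints from 1 ("strongly disagree") to 5 ("strongly agree")."""
    if not responses:
        raise ValueError("no items answered")
    return sum(responses) / len(responses)

# A hypothetical respondent who answers "agree" (4) on all 20 items:
print(score_spirituality([4] * 20))  # -> 4.0
```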
Procedure and Sample
Procedure
Researchers solicited participation from their personal social networks of
contacts. Survey questions were entered into Survey Monkey, and the corresponding electronic
link to the survey was posted to the researchers’ Facebook pages. In addition, electronic links
were directly emailed to friends/acquaintances/coworkers who the researchers felt would be
amenable to participating. Participants were permitted to use calculators for the quantitative
section, as suggesting otherwise would be unenforceable. Likewise, no time limits were placed
on participants as enforcement was impossible to achieve.
Sample
Among the 45 respondents, the average age was 49 (SD = 15), with ages ranging from 23 to 70.
Sixty-seven percent were female, 96% were white, 42% had a Bachelor’s degree, 40% had a
Master’s degree, and 13% had a PhD or equivalent. The respondents reported grade point
averages (GPAs) ranging from 2.8 to 4.0, with a mean of 3.62. Sixteen percent of respondents
were in graduate school; the remainder were not students.
Descriptive Statistics Of Cognitive Tests
Table 1 provides descriptive statistics for the three subscales of the PAL cognitive test (verbal,
quantitative and psychology), as well as a combined total of all three subscales (combined
cognitive). Scoring of the cognitive portions of the test consisted of adding a point for each
correct answer. Therefore, the range of possible scores was 0-20 for verbal and quantitative
reasoning, 0-24 for psychology knowledge, and 0-64 for the combined cognitive score. Of the
three cognitive subscales, quantitative reasoning had the lowest mean (mean = 10.38, SD = 4.27),
while verbal reasoning had the highest (mean = 13.56, SD = 3.63).
Cronbach’s alpha is a statistic that measures the internal consistency of a scale (i.e., how
reliable the test is); values greater than 0.70 are considered acceptable. Notably, the
psychology subscale did not meet the predetermined threshold; therefore it was not considered
reliable. The combined cognitive score, verbal reasoning, and quantitative reasoning scales all
had very high values of internal consistency.
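As a rough illustration of how Cronbach's alpha is computed, the following Python sketch applies the standard formula to a small, hypothetical 0/1 response matrix; the function and data are ours, not part of the survey analysis.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)       # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)   # variance of the summed scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical dichotomous (0/1) responses for a 4-item scale:
scores = [[1, 1, 1, 0],
          [1, 0, 1, 0],
          [0, 0, 0, 0],
          [1, 1, 1, 1]]
print(round(cronbach_alpha(scores), 2))  # -> 0.84
```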
Table 1 - Descriptive Statistics For Cognitive Subscales And Combined Cognitive Score

                         Number     Minimum   Maximum            Standard    Internal
                         of Items   Score     Score      Mean    Deviation   Consistency (α)
Verbal Reasoning         20         4         20         13.56   3.63        0.75
Psychology Knowledge     24         5         21         12.67   3.63        0.64
Quantitative Reasoning   20         2         18         10.38   4.27        0.81
Combined Cognitive       64         15        53         36.60   8.01        0.81

N = 45. The combined cognitive subtotal was computed by adding the verbal, quantitative, and psychology subscores. Acceptable scores of internal consistency are bolded.
Table 2 shows the descriptive statistics and internal consistency (α or alpha) for psychology
knowledge, quantitative reasoning, verbal reasoning and combined cognitive with the
respondents separated by gender. With the exception of psychology knowledge amongst
females, all scales had strong internal consistency. Cohen’s d is a statistic that
expresses the standardized difference between two group means. A positive “d” in
Table 2 represents a stronger female performance, while a negative “d” represents a stronger
male performance. There was a small effect of gender on the psychology subscale, while the
moderate effects of gender on the quantitative (males scored higher) and verbal (females scored
higher) reasoning subscales effectively canceled each other out, leaving virtually no gender
effect on the combined cognitive scale.
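For reference, Cohen's d with a pooled standard deviation can be computed as in the following sketch; the group scores shown are hypothetical.

```python
import math

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled standard deviation.
    Positive when group1's mean is higher (here, females listed first)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

# Hypothetical subscale scores for two small groups:
print(cohens_d([14, 13, 15], [12, 11, 13]))  # -> 2.0
```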
Table 2 - Descriptive Statistics For Cognitive Subscales And Combined Cognitive By Gender

                         Female (N = 30)                         Male (N = 14)
                         Mean    SD     Min     Max     α       Mean    SD      Min     Max     α      Cohen’s d
Psychology Knowledge     12.87   3.33   5.00    21.00   0.59    12.27   4.27    5.00    20.00   0.75   0.16
Quantitative Reasoning   9.60    4.05   3.00    18.00   0.79    11.93   4.40    2.00    18.00   0.83   -0.55
Verbal Reasoning         14.17   3.52   6.00    20.00   0.74    12.33   3.64    4.00    17.00   0.75   0.51
Combined Cognitive       36.63   6.82   21.00   49.00   0.73    36.53   10.27   15.00   53.00   0.89   0.01

N = 45. The combined cognitive subtotal was computed by adding the verbal, quantitative, and psychology subscores. Verbal and quantitative subscales had 20 items each; psychology knowledge had 24 items. The combined cognitive had 64 items. Acceptable scores of internal consistency are bolded.
Descriptive Statistics For Spirituality, Quantitative, Verbal And Psychology
Subscales And Combined Cognitive By Ethnicity
Due to the homogeneity of respondents, the statistical software was unable to provide descriptive
statistics by race. Out of 45 test takers, only one was Asian, one was Hispanic, and the
remainder (96%) were white.
Table 3 displays descriptive analysis organized by academic degree. The psychology scale had
inconsistent reliability, with an alpha of only 0.29 for PhD-level respondents but an acceptable
value (α = 0.71) at the Master’s level. In addition, the verbal reasoning subscale at the Master’s
level (α = 0.67) fell below the acceptable threshold. All other alpha values were at or well above
the 0.70 threshold of reliability.
Regarding the cognitive scales, one might expect respondents with more education to perform
better on the survey, but that occurred only within the quantitative subscale. Respondents with
Bachelor’s degrees scored lowest (mean = 10.26, SD = 3.77), followed by those with Master’s
degrees (mean = 10.5; SD = 4.66), and those with PhD’s scored highest (mean = 11.33; SD =
5.47). In regards to the other cognitive subscales (verbal and psychology) and the combined
cognitive test, PhD-level respondents actually had the lowest scores of the three groups. It is
important to note that the sample size of PhD respondents was markedly lower than the other
groups and that different trends might emerge with a larger sample.
Table 3 - Descriptive Statistics For Quantitative, Verbal And Psychology Subscales And Combined Cognitive By Academic Degree

Education Level       Facet                    Minimum   Maximum   Mean    Standard    Alpha
                                                                           Deviation
Bachelors (N = 19)    Verbal Reasoning         4.00      20.00     13.42   3.76        0.77
                      Quantitative Reasoning   2.00      18.00     10.26   3.77        0.74
                      Psychology Knowledge     5.00      21.00     12.53   3.72        0.68
                      Combined Cognitive       15.00     46.00     36.21   7.61        0.79
Masters (N = 18)      Verbal Reasoning         8.00      18.00     14.00   3.12        0.67
                      Quantitative Reasoning   3.00      18.00     10.50   4.66        0.85
                      Psychology Knowledge     5.00      20.00     13.06   4.04        0.71
                      Combined Cognitive       23.00     53.00     37.56   7.46        0.78
PhD (N = 6)           Verbal Reasoning         6.00      18.00     12.50   5.09        0.87
                      Quantitative Reasoning   5.00      18.00     11.33   5.47        0.90
                      Psychology Knowledge     10.00     15.00     12.17   2.48        0.29
                      Combined Cognitive       21.00     49.00     36.00   12.59       0.93

Total N = 45. Bolded internal consistency indicates an acceptable reliability measure (> 0.70). Verbal and quantitative reasoning subscales had 20 items each; psychology knowledge had 24 items. The combined cognitive, calculated by adding together the scores of the three subscales, had 64 items.
Correlative And Validity Analysis Of Cognitive Scales
A test is said to have “construct validity” if it accurately measures a theoretical, non-observable
construct or trait -- in our case an individual’s aptitude in regards to the field of psychology, as
well as verbal and quantitative reasoning. One method of establishing a test’s construct validity
is called convergent/divergent validation. A test has convergent validity if it has a high
correlation with another measure of a similar construct or a construct you would expect to
mirror. In our case, we compared the subscales and the combined cognitive test with self-
reported GPAs. By contrast, a test’s divergent validity is demonstrated through a low correlation
with a test that we would expect to measure a different construct. In our case, we compared the
subscales and the combined cognitive test with individuals’ scores on the spirituality
scale. Evidence of high divergent validity is established by a low correlation coefficient.
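The convergent/divergent logic can be illustrated with a small Python sketch. The score vectors below are invented so that one correlation is high and the other is near zero; they do not reproduce the study data.

```python
import numpy as np

# Invented score vectors: "gpa" tracks "cognitive" closely (similar construct),
# while "spirituality" is constructed to be nearly uncorrelated with it.
cognitive    = np.array([36, 41, 28, 45, 33, 39])
gpa          = np.array([3.2, 3.8, 2.9, 4.0, 3.1, 3.6])
spirituality = np.array([3.8, 3.5, 3.3, 3.4, 3.4, 3.0])

r_convergent = np.corrcoef(cognitive, gpa)[0, 1]           # want high
r_divergent  = np.corrcoef(cognitive, spirituality)[0, 1]  # want near zero

print(round(r_convergent, 2), round(r_divergent, 2))  # -> 0.97 0.01
```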
Table 4 shows the correlation coefficients of each of the subscales, the combined cognitive test,
GPAs and scores on the spirituality scale. To establish convergent validity, we are looking for a
high correlation between the cognitive scales and GPA. Unfortunately, the data do not support
this. The correlation between verbal reasoning and GPA (r = -0.01) is almost non-
existent. Looking at cognitive scores, GPA correlates most highly with the psychology subscale
(r = 0.12), but the effect size is still considered small. As for the test’s divergent validity,
we are looking for a low correlation between the cognitive scales and the spirituality
scale. While we do observe a very low correlation between spirituality and the combined
cognitive score (r = 0.04), the correlations between each of the subscales and the spirituality
scale are higher than their correlations with GPA.
Table 4 - Correlations Between The Spirituality Scale, The Total Cognitive Test, All Three Subscales And College G.P.A.

                             1       2       3       4       5
1 - Spirituality             ---
2 - Verbal Reasoning         0.15    ---
3 - Quantitative Reasoning   -0.26   0.28    ---
4 - Psychology Knowledge     0.24    0.28    0.12    ---
5 - Combined Cognitive       0.04    0.729   0.711   0.643   ---
6 - GPA                      0.16    -0.01   -0.07   0.12    0.02

N = 45. Bolded correlations are significant at the 0.001 level (2-tailed).
Table 5 examines the correlations between the cognitive subscales and the transformed scores for
each subscale. Because the transformed scores are linear transformations of the raw scores, each
of the three subscales correlates perfectly with its own Z-score, IQ score, and T-score. A linear
transformation preserves the relative standing of every score, so the correlations are unchanged.
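The conventional linear transformations behind these standard scores can be sketched as follows; the raw scores are hypothetical.

```python
import statistics

# Z-, T-, and IQ-scores are linear transformations of the raw score,
# so they preserve every correlation with the raw score exactly.
# Conventional scales: Z (mean 0, SD 1), T (mean 50, SD 10), IQ (mean 100, SD 15).
raw = [10, 14, 8, 12, 16]            # hypothetical subscale scores
mean, sd = statistics.mean(raw), statistics.stdev(raw)

z_scores  = [(x - mean) / sd for x in raw]
t_scores  = [50 + 10 * z for z in z_scores]
iq_scores = [100 + 15 * z for z in z_scores]
```

A raw score equal to the sample mean (here, 12) maps to Z = 0, T = 50, and IQ = 100.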
Item Analysis Of Cognitive Test And Its Facets
Despite the cognitive subscales’ promising scores of internal consistency, it nevertheless is
important to perform an item analysis to ensure the quality of each item. Two statistics that help
us discern this are p-values and CITC. P-values range from 0 - 1 and represent the difficulty of a
test item: scores < .30 are considered too hard (i.e., less than 30% of respondents answered
correctly) and scores > .80 are considered too easy (i.e., less than 20% answered incorrectly).
CITC stands for “corrected item total correlation” and is a measure of how well an item
discriminates between respondents who are knowledgeable in the content area and those who are
not. Scores can range from -1 to 1, and test items with a CITC < .20 are considered poor
differentiators and should be considered for removal.
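As a sketch of how these two statistics can be computed from dichotomous (0/1) item responses, assuming a simple sum score for the total (the function and data below are illustrative, not the study's code):

```python
import numpy as np

def item_stats(responses):
    """Difficulty (p) and corrected item-total correlation (CITC) for a
    dichotomous (0/1) response matrix of shape (n_respondents, n_items)."""
    r = np.asarray(responses, dtype=float)
    p = r.mean(axis=0)                  # proportion answering each item correctly
    citc = []
    for j in range(r.shape[1]):
        rest = r.sum(axis=1) - r[:, j]  # total score excluding item j ("corrected")
        citc.append(np.corrcoef(r[:, j], rest)[0, 1])
    return p, np.array(citc)

# Hypothetical responses from 5 people to a 3-item test:
data = [[1, 1, 0],
        [1, 0, 0],
        [0, 1, 1],
        [1, 1, 1],
        [0, 0, 0]]
p, citc = item_stats(data)
```

Items with p < .30 or p > .80, or with CITC < .20, would then be flagged for review.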
Table 6 - Difficulty And Discrimination Of Quantitative Reasoning, Verbal Reasoning And Psychology Knowledge

         Math                     Verbal                   Psychology
Item     Difficulty               Difficulty               Difficulty
Number   (p)          CITC        (p)          CITC        (p)          CITC
1        0.53         0.27        0.49         -0.01       0.89         0.17
2        0.82         0.38        0.47         0.47        0.38         0.35
3        0.42         0.39        0.91         0.25        0.27         0.05
4        0.29         0.37        0.44         0.29        0.20         0.04
5        0.71         0.52        0.96         0.27        0.38         0.08
6        0.40         0.27        0.78         0.53        0.71         -0.01
7        0.44         0.36        0.67         0.35        0.87         0.11
8        0.38         0.37        0.91         0.31        0.42         0.18
9        0.24         0.14        0.76         -0.17       0.22         0.22
10       0.60         -0.04       0.69         0.31        0.73         0.26
11       0.69         0.36        0.78         0.44        0.69         0.38
12       0.38         0.42        0.87         0.51        0.56         0.20
13       0.60         0.31        0.73         0.16        0.62         0.09
14       0.07         -0.21       0.73         0.27        0.87         0.05
15       0.38         0.17        0.36         0.10        0.64         0.10
16       0.84         0.20        0.51         0.30        0.64         -0.04
17       0.69         0.38        0.31         0.03        0.53         0.38
18       0.87         0.04        0.78         0.23        0.42         0.09
19       0.44         0.44        0.71         0.32        0.29         0.14
20       0.58         0.33        0.71         0.36        0.60         0.23

N = 45. Difficulty (p) refers to the percentage of correct responses. Items that are too easy (p > .80) or too hard (p < .30) are bolded. CITC refers to Corrected Item-Total Correlations; items with a CITC < .20 are problematic and are bolded. Blue-shaded test items indicate that both statistics are outside the recommended range.
Each item from the cognitive scales is represented in Table 6, along with the corresponding p-
values and CITC scores. Math question 14 was the most difficult (p = .07), verbal question 5 was
the least difficult (p = .96), and both are bolded for falling outside the threshold values. Math
item 14 also had a corrected item-total correlation of -.21, meaning that participants who
performed poorly on the test as a whole had a higher rate of success on this question.
Conversely, item 2 on the verbal subscale has favorable values: a CITC of .47 assures us it is a
good discriminator, and a p-value of .47 tells us that nearly as many individuals in our sample
answered the item correctly as incorrectly.
The bolded values in Table 6 represent those that failed to meet one of the thresholds described
above. Blue-shaded test items indicate that both statistics are outside the recommended range,
and those questions will undergo further scrutiny through a “distractor analysis” and possibly be
removed from the survey. The verbal reasoning section of the test did not have any flagged
questions, while the psychology subscale had the greatest number of flagged questions (6).
Quality test items will correlate with their own composite scores to a greater degree than they do
with the scores of another subscale. Failure to meet this qualification is an indication that the
question may be a better predictor of a construct other than the one it is meant to measure.
Tables 7a, b, and c represent correlations between each test question and the subscale composite
scores.
Table 7a - Correlations Between Psychology Items And Composite Scores

Psychology     Psychology
Item Number    (CITCs)       Verbal   Quantitative
1              0.17          .17      -.07
2              0.35          .24      .17
3              0.05          .14      -.27
4              0.04          -.09     -.19
5              0.08          -.02     -.16
6              -0.01         -.09     .02
7              0.11          .19      -.14
8              0.18          .11      .06
9              0.22          .11      -.16
10             0.26          .26      .03
11             0.38          .31      .12
12             0.20          .00      .25
13             0.09          -.02     .04
14             0.05          -.05     .13
15             0.10          .15      -.02
16             -0.04         -.17     .15
17             0.38          .29      .34
18             0.09          -.02     .17
19             0.14          .11      .04
20             0.23          .24      .11

N = 45. Bolded items represent the highest correlation for each item. Blue-shaded items are problematic because they correlate more highly with subscales other than the psychology subscale.
Table 7a indicates psychology items 3, 6, 7, 12, 14, 15, 16, 18, and 20 are problematic because
they correlate more highly with the quantitative or verbal composite than with their own. All
other items correlate most strongly with the psychology composite.

Table 7b indicates verbal items 1, 5, 8, 9, 15, and 18 are problematic because they correlate
more highly with the quantitative or psychology composite than with their own. All other items
correlate most strongly with the verbal composite.
Table 7b - Correlations Between Verbal Reasoning Items And Composite Scores

Verbal Item    Verbal Reasoning
Number         (CITCs)             Psychology   Quantitative
1              -0.01               -0.35        0.28
2              0.47                0.28         0.29
3              0.25                0.19         0.04
4              0.29                0.04         0.03
5              0.27                0.30         0.13
6              0.53                0.30         0.32
7              0.35                0.32         0.07
8              0.31                0.32         -0.03
9              -0.17               -0.12        -0.24
10             0.31                0.13         0.10
11             0.44                0.16         0.34
12             0.51                0.19         0.38
13             0.16                0.01         0.15
14             0.27                0.15         0.00
15             0.10                0.27         -0.23
16             0.30                0.09         0.23
17             0.03                0.00         -0.07
18             0.23                0.02         0.26
19             0.32                0.12         0.17
20             0.36                0.20         0.17

N = 45. Bolded items represent the highest correlation for each item. Blue-shaded items are problematic because they correlate more highly with subscales other than the verbal subscale.
Table 7c indicates quantitative items 9, 10, and 14 are problematic because they correlate more
highly with the verbal or psychology composite than with their own. All other items correlate
most strongly with the quantitative composite.
Table 7c - Correlations Between Quantitative Reasoning Items And Composite Scores

Quantitative   Quantitative
Item Number    Reasoning (CITCs)   Verbal   Psychology
1              0.27                0.17     -0.06
2              0.38                0.28     0.07
3              0.39                0.17     -0.01
4              0.37                0.26     0.11
5              0.52                0.37     0.25
6              0.27                0.14     -0.03
7              0.36                0.19     0.12
8              0.37                0.05     0.05
9              0.14                0.19     -0.08
10             -0.04               0.01     0.08
11             0.36                0.05     0.08
12             0.42                0.25     0.11
13             0.31                -0.01    0.06
14             -0.21               -0.29    0.17
15             0.17                -0.15    0.11
16             0.20                0.15     0.03
17             0.38                0.18     -0.01
18             0.04                0.01     -0.20
19             0.44                0.26     0.18
20             0.33                0.16     0.02

N = 45. Bolded items represent the highest correlation for each item. Blue-shaded items are problematic because they correlate more highly with subscales other than the quantitative subscale.
Summary ofProblematic Cognitive Test Items
A summary of the problematic items, including why each item was considered for removal or
alteration, is presented in Table 8. Test items that fell outside both the acceptable difficulty
range (.30 < p < .80) and the acceptable discrimination range (CITC ≥ .20) were automatically deleted. Falling into
this category were items 1, 3, 4, 7, 14, and 17 from the psychology subscale and items 9, 10, 14
and 18 from the math subscale. None of the items in the verbal subscale failed both criteria.
Items that met only one of those criteria and/or displayed a higher correlation with a subscale
other than its own were also considered for distractor analysis, the next step in evaluating item
integrity. Falling into this category were items 2, 15, and 16 from the math subscale, items 6, 15,
and 16 from the psychology subscale, and items 3, 5, and 8 from the verbal subscale.
Table 8 - Summary Table Of Problematic Cognitive Test Items

Test Item        p-value   CITC   Reasons For Concern                         Removed From Survey?
Psychology 1     .89       .17    p, CITC                                     Yes
Psychology 3     .27       .05    p, high correlation with verbal             Yes
Psychology 4     .20       .04    p and CITC                                  Yes
Psychology 5     .38       .08    CITC                                        No
Psychology 6     .71       -.01   CITC, high correlation with math            No
Psychology 7     .87       .11    p, CITC, high correlation with verbal       Yes
Psychology 8     .42       .18    CITC                                        No
Psychology 9     .22       .22    p                                           No
Psychology 12    .56       .20    High correlation with math                  No
Psychology 13    .62       .09    CITC                                        No
Psychology 14    .87       .05    p, CITC, high correlation with math         Yes
Psychology 15    .64       .10    CITC, high correlation with verbal          No
Psychology 16    .64       -.04   CITC, high correlation with math            No
Psychology 18    .42       .09    CITC, high correlation with math            No
Psychology 19    .29       .14    p, CITC                                     Yes
Psychology 20    .60       .23    High correlation with verbal                No
Math 2           .82       .38    p                                           No
Math 4           .29       .37    p                                           No
Math 9           .24       .14    p, CITC, high correlation with verbal       Yes
Math 10          .60       -.04   CITC, high correlation with psychology      No
Math 14          .07       -.21   p, CITC, high correlation with psychology   Yes
Math 15          .38       .17    CITC                                        No
Math 16          .84       .20    p                                           No
Math 18          .87       .04    p, CITC                                     Yes
Verbal 1         .49       -.01   CITC, high correlation with math            No
Verbal 3         .91       .25    p                                           No
Verbal 5         .96       .27    p, high correlation with psychology         No
Verbal 8         .91       .32    p, high correlation with psychology         No
Verbal 9         .76       -.17   CITC, high correlation with psychology      No
Verbal 12        .87       .51    p                                           No
Verbal 13        .73       .16    CITC                                        No
Verbal 15        .36       .10    CITC, high correlation with psychology      No
Verbal 17        .31       .03    CITC                                        No
Verbal 18        .78       .23    High correlation with math                  No

The psychology subscale is composed of 24 items; verbal and quantitative subscales each have 20 items.
Distractor Analysis Of Cognitive Scale
As required by the assignment, distractor analyses were conducted on three problematic items
from each subscale to determine if the item should be removed or altered. In a distractor analysis,
the average composite score of individuals who select a particular response for an item is
calculated to determine two properties of a good item:
1. An equal distribution of participants selecting incorrect responses (distractors).
2. The mean score of the overall subscale should be higher for the correct response,
indicating that the question correctly discriminates between high and low performers
on that particular subscale.
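The calculation described above can be sketched as follows; the function, responses, and scores are hypothetical.

```python
from collections import defaultdict

def distractor_analysis(choices, subscale_scores):
    """For each response option, the mean subscale score and the count of
    respondents who chose it (choices and scores are paired by respondent)."""
    groups = defaultdict(list)
    for choice, score in zip(choices, subscale_scores):
        groups[choice].append(score)
    return {opt: (sum(s) / len(s), len(s)) for opt, s in sorted(groups.items())}

# Hypothetical item where 'C' is the correct answer:
choices = ['C', 'C', 'A', 'C', 'B', 'C', 'D', 'C']
scores  = [15, 14, 8, 16, 9, 13, 7, 12]
print(distractor_analysis(choices, scores))
# -> {'A': (8.0, 1), 'B': (9.0, 1), 'C': (14.0, 5), 'D': (7.0, 1)}
```

In a good item, the correct option draws the highest mean score, as 'C' does here.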
Psychology Knowledge Item Analysis
Psychology knowledge items 1, 3 and 4 were selected for analysis and removal.
Psychology Item 1
The distractor analysis of psychology Item 1 is represented in Table 9a. Table 8 indicates the
item had a low difficulty level, with a p value of .89, and a low CITC score of .17. The low
CITC score indicates that those with a strong performance on the psychology knowledge test
were not statistically more successful on this item.
Table 9a - Psychology Score Of Respondents Selecting Each Option For Item 1

Response   Psychology
Choice     Score        N    %
A          5            1    2.22%
B          9.5          2    4.44%
C          11           1    2.22%
D          9            1    2.22%
E          13.15        40   88.89%

N = 45. Correct response is E.
Table 9a indicates a very high percentage of participants selected the correct answer, E (88.89%).
The combination of information in Table 8 and Table 9a provides a valid reason to eliminate and
replace Item 1. The replacement should be moderately easy due to its placement in the test but
more difficult than the original question.
Psychology Item 3
The distractor analysis of Item 3 is represented in Table 9b. Table 8 indicates the item had a high
difficulty with a p value of .27 and also had a low CITC score of .05. The low CITC score
indicates that participants who performed well on the psychology knowledge test did not
statistically perform well on this item.
Table 9b - Psychology Score Of Respondents Selecting Each Option For Item 3

Response   Psychology
Choice     Score        N    %
A          9.88         8    17.78%
B          115.42       12   26.67%
C          10.86        7    15.56%
D          11.88        8    17.78%
E          13.50        10   22.22%

N = 45. Correct response is B.
As Table 9b demonstrates, the highest percentage of participants selected the correct answer, B
(26.67%), which is ideal. Furthermore, the incorrect answers are distributed relatively evenly
among the distractors, which is also ideal. But because the CITC score is so low, the item
warrants removal and replacement with an item that better discriminates between high and low
performers.
Psychology Item 4
The distractor analysis of Item 4 is represented in Table 9c. Table 8 indicates the item had a
high difficulty, with a p value of .20, and a very low CITC score of .04. The low CITC score
indicates that participants who performed well on the psychology knowledge test did not
statistically perform well on this item.
Table 9c - Psychology Score Of Respondents Selecting Each Option For Item 4

Response   Psychology
Choice     Score        N    %
A          11           7    15.56%
B          9.75         8    17.78%
C          11.88        8    17.78%
D          16.56        9    20%
E          13.15        13   28.89%

N = 45. Correct response is D.
Table 9c displays a lower percentage of participants selecting the correct answer, D (20%), and a
larger percentage selecting answer E (28.89%). The correct answer should draw the highest
percentage, and the responses should be more evenly distributed.

The combination of information in Table 9c and Table 8 provides a valid reason to eliminate and
replace Item 4. The replacement should be of moderate difficulty due to its placement in the test.
Verbal Reasoning Item Analysis
Verbal reasoning items 5, 8 and 9 were selected for analysis.
Verbal Item 5
The distractor analysis of Item 5 is represented in Table 9d. Table 8 indicates the item had a low
difficulty with a p value of .96 and also had an acceptable CITC score of .27. The acceptable
CITC score indicates that those who did well on this item also did well on the rest of the verbal
reasoning section.
Table 9d - Verbal Reasoning Score Of Respondents Selecting Each Option For Item 5

Response   Verbal Reasoning
Choice     Score              N    %
A          10.5               2    4.44%
C          13.70              43   95.56%

N = 45. Correct response is C.
Table 9d displays a very high percentage of participants selecting the correct answer, C
(95.56%). No one answered B, D, or E, and only two participants answered A. The responses
should have a much better distribution.
The combination of information in Table 9d and Table 8 provides a valid reason to revise Item 5.
The question is very easy; therefore, the revision should include more appealing distractors. Item
5 has been rewritten:
5. Love and beauty are best described as __________ concepts of the mind.
A) Physical
B) Concrete
C) Psychological
D) Factual
Verbal Item 8
The distractor analysis of Item 8 is represented in Table 9e. Table 8 indicates the item had a low
difficulty with a p value of .91 and an acceptable CITC score of .32. The acceptable CITC score
indicates that those who did well on this item also did well on the rest of the verbal reasoning
section.
Table 9e - Verbal Reasoning Score Of Respondents Selecting Each Option For Item 8

Response   Verbal Reasoning
Choice     Score              N    %
A          9.33               3    6.67%
C          14                 41   91.11%
D          8                  1    2.22%

N = 45. Correct response is C.
Table 9e displays a very high percentage of participants selecting the correct answer of C
(91.11%). None of the participants answered B or E and only one participant answered D. The
responses should have a much better distribution.
The combination of information in Table 9e and Table 8 provides a valid reason to revise Item 8.
Item 8 has been rewritten below:
8. It is our view that as a result of intercession from foreign stakeholders our company’s financial
difficulties have lamentably been ___________.
A) Curtailed
B) Ameliorated
C) Exacerbated
D) Mitigated
Verbal Item 9
The distractor analysis of Item 9 is represented in Table 9f. Table 8 indicates the item had an
acceptable p value of .76 but a negative CITC score of -.17. The negative CITC score indicates
that those with a strong performance on the verbal reasoning test were not statistically more
successful on this item.
Table 9f - Verbal Reasoning Score Of Respondents Selecting Each Option For Item 9

Response Choice   Verbal Reasoning Score   N    %
A                 14.00                    4    8.89%
B                 14.67                    3    6.67%
C                 11.75                    4    8.89%
D                 13.62                    34   75.56%
N=45. Correct response is D.
Table 9f displays a high percentage of participants selecting the correct answer D (75.56%). The
highest percentage should correspond to the correct answer, but the responses should have a much
better distribution. The mean of the correct response (13.62) is lower than that of responses A
(14.00) and B (14.67). This indicates the question does not discriminate between high and low
performers on the verbal reasoning subscale, which is why the CITC score is negative (-.17). The
item may be too easy, and high performers may be overthinking the answer.
The combination of information in Table 9f and Table 8 provides a valid reason to rewrite Item 9.
Item 9 has been rewritten below to make the other options more appealing.
9. Finish the analogy:
Grain : Rice ::
A. Cube : Ice
B. Rain : Precipitation
C. Cow : Milk
D. Flake : Snow
Quantitative Reasoning Item Analysis
Quantitative reasoning items 9, 14 and 18 were selected for analysis and removal.
Quantitative Item 9
The distractor analysis of Item 9 is represented in Table 9g. Table 8 indicates the item had high
difficulty with a p value of .24 and a CITC score of .14. The low CITC score indicates that those
with a strong performance on the quantitative test are not statistically more successful on this item.
Table 9g - Quantitative Reasoning Score Of Respondents Selecting Each Option For Item 9

Response Choice   Quantitative Score   N    %
A                 7.75                 4    9.52%
B                 7.86                 7    16.67%
C                 12.09                11   26.19%
D                 10.77                13   30.95%
E                 12.29                7    16.67%
N=42. Correct response is C.
As Table 9g displays, a higher percentage of participants selected answer D (30.95%) than the
correct answer C (26.19%). The correct answer should have the highest percentage of
respondents. The mean of the correct response (12.09) is also slightly lower than that of response
E (12.29). This indicates the question does not discriminate between high and low performers on
the quantitative reasoning subscale, which is why the CITC score is low (.14). The item may be
too difficult, or the distractors may be misleading even for high performers.
The combination of information in Table 8 and Table 9g provides a valid reason to eliminate and
replace Item 9.
Quantitative Item 14
The distractor analysis of Item 14 is represented in Table 9h. Table 8 indicates the item had
high difficulty with a p value of .07 and also a negative CITC score of -.21. The negative
CITC score indicates that those with a strong performance on the quantitative test are not
statistically more successful on this item.
Table 9h - Quantitative Reasoning Score Of Respondents Selecting Each Option For Item 14

Response Choice   Quantitative Score   N    %
A                 3.00                 1    2.38%
B                 2.00                 1    2.38%
C                 7.75                 4    9.52%
D                 7.00                 3    7.14%
E                 11.85                33   78.57%
N=42. Correct response is D.
Table 9h displays a low percentage of participants selecting the correct answer D (7.14%) and a
very large percentage selecting answer E (78.57%). The highest percentage should correspond to
the correct answer, and the distribution is also uneven across the incorrect choices. The mean of
the correct answer (7.00) should be higher than that of the other responses; the lower mean
indicates the question does not discriminate between high and low performers on the quantitative
subscale.
The combination of information in Table 8 and Table 9h provides a valid reason to eliminate or
replace Item 14. The replacement question should be of moderate difficulty because of the
placement of the item.
Quantitative Item 18
The distractor analysis of Item 18 is represented in Table 9i. Table 8 indicates the item had
low difficulty with a p value of .87 and also a low CITC score of .04. The low CITC score
indicates that those with a strong performance on the quantitative test are not statistically more
successful on this item.
Table 9i - Quantitative Reasoning Score Of Respondents Selecting Each Option For Item 18

Response Choice   Quantitative Score   N    %
A                 10.85                39   86.67%
B                 3.00                 1    2.22%
C                 12.00                2    4.44%
D                 5.50                 2    4.44%
E                 6.00                 1    2.22%
N=45. Correct response is A.
Table 9i displays a high percentage of participants selecting the correct answer A (86.67%). The
highest percentage should correspond to the correct answer, but the responses should be more
evenly distributed across the incorrect choices. The mean of the correct answer (10.85) should
be higher than that of every incorrect option, yet it is lower than the mean for answer C (12.00).
This indicates the question does not discriminate between high and low performers on the
quantitative subscale.
Revised Cognitive Scale
As a result of our item and distractor analyses, the following items were eliminated: Psychology
items 1, 3, 4, 7, 14 and 19 and Quantitative items 9, 14 and 18. No verbal reasoning items were
removed. Because the combined score is a composite of all three subscales, its number of items
decreased by 9. Removing the problematic items can alter the scale's reliability; therefore,
another analysis is required. The results shown in Table 10 indicate that the removal of the
problematic items lowered the reliability of the psychology scale (from .64 to .54). The
quantitative reasoning and combined cognitive scales improved slightly beyond their already
acceptable levels, meaning the internal consistency of these two measures has improved.
Table 10 - Descriptive Statistics For Cognitive Scale Corrected Totals

Scale                        Number of Items   Minimum   Maximum   Mean    SD     Internal Consistency (α)
Verbal Reasoning (Old)       20                4         20        13.56   3.63   .75
Verbal Reasoning (New)       20                4         20        13.56   3.63   .75
Quantitative (Old)           20                2         18        10.38   4.27   .81
Quantitative (New)           17                1         17        9.20    4.15   .82
Psychology (Old)             24                5         21        12.67   3.63   .64
Psychology (New)             18                3         16        9.29    2.94   .54
Combined Cognitive (Old)     64                15        53        36.60   8.01   .81
Combined Cognitive (New)     55                10        47        32.04   7.70   .82
N = 45. New acceptable internal consistencies (> .70) are bolded. "New" indicates the problem items have been removed: Psychology items 1,
3, 4, 7, 14 and 19 and Quantitative items 9, 14 and 18. The new combined cognitive subtotal was computed by adding the corrected verbal,
psychology and math subscales and correcting for guessing.
Descriptive Statistics For Spirituality Scale
Scoring of the spirituality portion of the test consisted of adding 1 point for each statement to
which individuals responded with “strongly disagree”, 2 points for “slightly disagree”, 3 points
for “neither agree nor disagree”, 4 points for “slightly agree”, and 5 points for “strongly agree.”
The total was then divided by the number of questions on the scale to obtain an individual
average. Therefore an individual’s score for the spirituality scale could range from 1 - 5, with
higher scores indicating higher degrees of spirituality.
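The scoring rule described above can be sketched as follows (the function and dictionary names are illustrative, not from the report):

```python
# Point values for each Likert anchor, as described in the scoring section.
LIKERT_POINTS = {
    "strongly disagree": 1,
    "slightly disagree": 2,
    "neither agree nor disagree": 3,
    "slightly agree": 4,
    "strongly agree": 5,
}

def spirituality_score(responses):
    """Average the Likert points across all answered items (range 1-5)."""
    points = [LIKERT_POINTS[r] for r in responses]
    return sum(points) / len(points)
```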
Table 11 provides descriptive statistics for the overall spirituality scale, as well as broken down
by gender. The scores obtained by our sample ranged from 1.95 to 4.90, with a mean of 3.60
and a standard deviation of .57. Women reported higher levels of spirituality (Mean = 3.79; SD
= .49) than men (Mean = 3.23; SD = .55). Cohen's d is a statistic that expresses the standardized
difference between two group means. A positive d in Table 11 represents a higher female score,
while a negative d represents a higher male score. There is a statistically large effect of gender
on the spirituality scale.
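Cohen's d can be computed from the summary statistics in Table 11. The sketch below uses the pooled-standard-deviation form of the formula; the small difference from the reported value of 1.08 likely reflects rounding of the summary statistics (or a simpler averaging of the two SDs).

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Summary statistics from Table 11 (female vs. male).
d = cohens_d(3.79, 0.49, 30, 3.23, 0.55, 15)
```

By Cohen's conventions, d around 0.8 or larger is considered a large effect, consistent with the report's interpretation.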
Table 11 - Descriptive Statistics For Spirituality Scale By Gender

                           Overall (N = 45)   Female (N = 30)   Male (N = 15)
Mean                       3.60               3.79              3.23
Standard Deviation         .57                .49               .55
Minimum Score              1.95               2.85              1.95
Maximum Score              4.90               4.90              3.95
Internal Consistency (α)   .90                .86               .91
Cohen's d                  1.08 (large effect)
Total N = 45. The bolded internal consistencies indicate acceptable reliability (> .70). The spirituality scale was composed of 20 items.
Cronbach’s alpha is a statistic that measures the internal consistency of a scale (i.e., how
reliable the test is); values greater than .70 are considered acceptable. At .90 overall, the
spirituality scale has remarkably high internal consistency. Although the reliability of the
spirituality scale for men (α = .91) is higher than for women (α = .86), the female value is
still well above the .70 threshold of acceptability.
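For reference, Cronbach's alpha can be computed directly from item-level responses. The following is a minimal sketch (names are our own), using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of totals):

```python
def cronbach_alpha(items):
    """items: list of equal-length response lists, one list per scale item."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        # Population variance.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Each respondent's total across the k items.
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / var(totals))
```

When the items are perfectly consistent with one another, alpha reaches its maximum of 1.0.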
Table 12 displays descriptive statistics organized by academic degree. The spirituality scale
shows impressive internal consistency at all academic levels. There were no discernible trends
relating scores on the spirituality scale to the academic degree of the respondents.
Table 12 - Descriptive Statistics For Spirituality Scale By Academic Degree

Education Level             Minimum   Maximum   Mean   SD     Alpha (α)
Bachelor's (N = 19)         1.95      4.90      3.63   0.71   .93
Master's (N = 18)           2.63      4.25      3.50   0.42   .80
PhD or equivalent (N = 6)   3.20      4.35      3.66   0.52   .88
Total N = 45. The bolded internal consistencies indicate acceptable reliability (> .70). The spirituality scale had 20 items.
Descriptive Statistics For Spirituality Scale By Ethnicity
Due to the homogeneity of respondents, the statistical software was unable to provide descriptive
statistics by race. Of the 45 test takers, only one was Asian, one was Hispanic, and the
remainder (96%) were white.
Validity And Correlation Analysis For Spirituality Scale
The spirituality score was correlated with the combined cognitive test and each of its three
subscales; the results are presented in Table 13. There is a negative correlation between
spirituality and quantitative reasoning (r = -0.26), which represents a small to medium effect.
Conversely, there
is a positive correlation between spirituality and the remainder of the scales. There is a small
correlation between spirituality and verbal reasoning (r = 0.15), a small to medium correlation
between spirituality and psychology knowledge (r = 0.24), and a very small correlation between
spirituality and the combined cognitive scale (r = 0.04).
Table 13 - Correlations Between The Spirituality Scale, The Total Cognitive Test And All Three Subscales

                        Verbal Reasoning   Quantitative Reasoning   Psychology Knowledge   Combined Cognitive
Spirituality            0.15               -0.26                    0.24                   0.04
Nature Of Effect Size   Small              Small/Medium             Small/Medium           Very Small
N = 45. None of the correlations was statistically significant.
Overall Analysis Of The Spirituality Scale
To assess the reliability of the spirituality scale, a reliability analysis was conducted. Again,
internal consistency above .70 is considered acceptable and suggests the items consistently
measure a single construct. The reliability analysis of the spirituality scale yielded an internal
consistency of α = .90, indicating that the scale is a good measure of the spirituality construct.
Even though it had high internal consistency, further analyses were conducted at the item level
of the spirituality scale, as specified by the class assignment.
Item Analysis Of The Spirituality Scale
Table 14 shows that all of the items in the spirituality scale had acceptable CITC scores, meaning
the respondents who scored high on an item were likely to score highly on the spirituality survey
as a whole. Item 1 (CITC = .22) and Item 4 (CITC = .21) had the lowest CITC scores. No items
fell below the CITC threshold of .20, so no items had respondents who scored highly on the item
score but low on the spirituality subscale.
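The corrected item-total correlation (CITC) correlates each item with the total of the *remaining* items, so that the item's own score does not inflate the correlation. A minimal sketch (names are our own):

```python
import statistics

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def citc(item_scores, all_items):
    """Correlate one item with the total of the remaining items.

    all_items must contain distinct list objects; the target item is
    excluded from the total by object identity.
    """
    rest_totals = [
        sum(other[i] for other in all_items if other is not item_scores)
        for i in range(len(item_scores))
    ]
    return pearson_r(item_scores, rest_totals)
```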
Table 14 - Descriptive Statistics And Discrimination Index (CITC) For The Spirituality Scale
Item Number Mean Score Standard Deviation CITC
1 4.26 0.54 0.22
2 4.36 0.53 0.36
3 4.17 0.73 0.47
4 3.88 0.77 0.21
5 4.31 0.78 0.57
6 3.29 0.86 0.65
7 3.74 0.83 0.54
8 4.12 1.09 0.61
9 3.33 1.30 0.79
10 3.14 1.05 0.37
11 3.81 1.07 0.68
12 3.21 1.09 0.84
13 4.02 0.78 0.32
14 3.62 1.04 0.61
15 4.07 0.84 0.60
16 3.88 0.71 0.34
17 2.31 1.16 0.35
18 2.45 1.27 0.71
19 3.60 1.13 0.23
20 2.60 1.29 0.76
N=45. Bolded CITCs indicate items with the lowest (but still acceptable) CITC scores.
To satisfy the requirements of the assignment, further analysis was conducted to determine the
relationship between the individual items on the spirituality scale and the other facets of the PAL
survey. A good item correlates more highly with its own facet than with any other facet; an item
that correlates more highly with another facet is a poor item. Table 15 presents the results of this
analysis. Only Items 1 and 4 correlate more highly with a facet other than their own.
Table 15 - Correlations Between Spirituality And Cognitive Scores

Item Number   Spirituality CITC   Verbal   Quantitative   Psychology
1 0.22 0.34 0.13 0.05
2 0.36 -0.05 -0.16 -0.13
3 0.47 0.00 -0.09 0.03
4 0.21 0.28 0.04 0.26
5 0.57 0.13 -0.05 0.04
6 0.65 -0.04 -0.11 0.27
7 0.54 0.22 -0.11 0.15
8 0.61 0.09 -0.05 0.17
9 0.79 -0.04 -0.34 0.08
10 0.37 -0.12 -0.17 0.14
11 0.68 0.18 -0.32 0.18
12 0.84 0.29 -0.14 0.20
13 0.32 0.03 -0.25 0.11
14 0.61 0.11 -0.20 0.18
15 0.60 0.23 -0.19 0.04
16 0.34 0.02 -0.27 0.16
17 0.35 -0.09 -0.15 0.22
18 0.71 0.80 -0.20 0.16
19 0.23 0.02 -0.09 0.03
20 0.76 0.26 -0.20 0.29
N=45. CITC refers to Corrected Item Total Correlations. Bolded correlations represent the highest correlation for the item among the subscales.
Distractor Analysis Of Spirituality Scale Items
The spirituality scale differs from the other subscales in that its answers are not right or wrong.
Items 1 and 4 have acceptable CITC scores, but they sit at the lower end of the acceptable range,
indicating a weaker relationship between scores on these items and scores on the rest of the
spirituality scale. Although these items fall within the acceptable range, they are the weakest of
the 20 questions. Tables 16a and 16b display the distractor analysis for the two items. None of
the questions in the spirituality scale will be removed.
Table 16a - Spirituality Score Of Respondents Selecting Each Option For Item 1

Response Choice         Spirituality Score   N    %
Slightly disagree (3)   3.78                 2    4.44%
Slightly agree (4)      3.50                 29   64.44%
Agree (5)               3.80                 14   31.11%
N=45. The spirituality scale included 20 questions.
Table 16b - Spirituality Score Of Respondents Selecting Each Option For Item 4

Response Choice         Spirituality Score   N    %
Disagree (2)            3.21                 4    8.89%
Slightly Disagree (3)   3.31                 4    8.89%
Slightly Agree (4)      3.64                 31   68.89%
Agree (5)               3.87                 6    13.33%
N=45. The spirituality scale included 20 questions.
Biases Of Spirituality Scale Items
Additional investigation into individual responses was undertaken by looking for response biases
among participants completing the spirituality personality measure. A response bias can occur on
an attitude measure when a person systematically chooses responses based on something other
than their true stance on a question. The tables below show three separate participants flagged for
three types of response bias. These respondents were reviewed and ultimately removed.
Central Tendency
Central tendency is the type of error in which a respondent tends to choose the answers closest to
the middle of the scale. In these cases respondents are reluctant to choose strongly agree or
strongly disagree, the extreme positive and negative choices. Case 45 is an example of central
tendency among the spirituality respondents. Table 17a displays the respondent's answers
clustered in the middle of the rating continuum: all twenty answers fell within the three central
responses, with half (10 of 20) being 'neither agree nor disagree'. The participant may not have
taken the time to read each question properly; therefore, this case will be deleted.
Table 17a - Spirituality Response Of Respondent 45: Central Tendency Error

Response                     Frequency   Percent
Disagree                     3           15%
Neither Agree Nor Disagree   10          50%
Agree                        7           35%
Severity Bias
By definition, severity bias is an error that occurs as the result of a rater's tendency to be overly
critical; however, that definition falsely suggests that there is a preferred way to answer the
questions on the spirituality scale. In this study, we arbitrarily assigned severity bias to the cases
in which most of an individual's responses were at the negative end of the Likert scale (i.e.,
disagreeing to some degree with almost all of the items on the survey). Case 9, displayed in
Table 17b, shows a severity-bias frequency distribution. Case 9 will be removed.
Table 17b - Spirituality Response Of Respondent 9: Severity Bias Error

Response                     Frequency   Percent
Strongly Disagree            8           40%
Disagree                     8           40%
Neither Agree Nor Disagree   1           5%
Agree                        3           15%
Leniency Error
By definition, leniency error occurs as the result of a rater's tendency to be too forgiving and
insufficiently critical; however, that definition falsely suggests that there is a preferred way to
answer the questions on the spirituality scale. In this study, we arbitrarily assigned leniency
error to the cases in which most of an individual's responses were at the positive end of the
Likert scale (i.e., agreeing to some degree with almost all of the items on the survey). Case 14
shows a strong leniency error, as seen in Table 17c: the respondent gave the same reply for the
vast majority of items; therefore, the case will be removed.
Table 17c - Spirituality Response Of Respondent 14: Leniency Error

Response         Frequency   Percent
Agree            2           10%
Strongly Agree   18          90%
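The three flagging patterns described above can be sketched as simple screening rules. The thresholds below are illustrative assumptions, not the report's exact criteria; responses are coded 1 (strongly disagree) through 5 (strongly agree).

```python
from collections import Counter

def flag_bias(responses):
    """responses: list of Likert codes (1-5). Returns a flag name or None.
    Thresholds are assumed for illustration, not taken from the report."""
    counts = Counter(responses)
    n = len(responses)
    # Severity: almost all responses at the negative end of the scale.
    if (counts[1] + counts[2]) / n >= 0.8:
        return "severity"
    # Leniency: almost all responses at the positive end of the scale.
    if (counts[4] + counts[5]) / n >= 0.8:
        return "leniency"
    # Central tendency: every response avoids both extremes.
    if all(r in (2, 3, 4) for r in responses):
        return "central tendency"
    return None
```

Applied to the three frequency distributions in Tables 17a-17c, these rules reproduce the flags assigned to Cases 45, 9 and 14.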
Revised Spirituality Scale
As a result of our response-bias analysis, respondent cases 9, 14 and 45 were eliminated. All
questions met the requirements for inclusion in the final results; no questions were eliminated.
The statistical results of eliminating the three cases are delineated in Table 18. The new estimate
of internal consistency decreased from 0.90 to 0.85; however, the corrected spirituality scale
remains a highly reliable instrument.
Table 18 - Descriptive Statistics For Spirituality Scale Corrected Totals

                           Original   Corrected
Minimum Score              1.95       2.63
Maximum Score              4.90       4.55
Mean                       3.60       3.62
Standard Deviation         0.57       0.49
Internal Consistency (α)   0.90       0.85
Original N = 45, Corrected N = 42. Acceptable internal consistencies (> .70) are bolded. "Corrected" indicates the problematic cases (9, 14
and 45) have been removed.
Correction For Guessing On Cognitive Test
In order to obtain a better representation of a participant's true score, a correction for guessing
was performed on the combined cognitive scores. (The spirituality measure was instead corrected
by removing respondents with severe response bias.) Table 19 shows that after correcting for
guessing, the correlation between the combined cognitive score and GPA decreased. The
spirituality correlation with GPA also decreased, from .16 to .11, so removing the biased
respondents did affect the criterion validity of the spirituality scale.
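The report does not state the exact formula used, but the classic correction for guessing subtracts a fraction of the wrong answers from the number right, assuming blind guessing among k options. A minimal sketch:

```python
def corrected_score(num_right, num_wrong, num_options):
    """Classic correction for guessing: penalize wrong answers by the
    chance rate 1/(k - 1). Omitted items are neither rewarded nor
    penalized. This is the standard formula, assumed for illustration."""
    return num_right - num_wrong / (num_options - 1)
```

For example, on a test of 5-option items, a respondent with 20 right and 8 wrong receives a corrected score of 20 − 8/4 = 18.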
Variance Explained
Table 19 indicates that the corrected combined cognitive test explained none of the variance in
GPA, as did the uncorrected combined cognitive test. The spirituality scale explained 2.46% of
the variance in GPA, and the corrected spirituality scale explained 1.10%. The nature of the
effect size for spirituality is considered small. The amount of variance explained in GPA is not
affected by the correction for guessing on the cognitive test.
Table 19 - Variance In GPA Explained, Effect Size, And GPA Correlation Of Corrected Items

                                         GPA Correlation (r)   GPA Explained Variance (r² × 100)   Nature of Effect Size
Original Combined Cognitive              0.02                  0.00%                               No Effect
Partially-Corrected Combined Cognitive   .00                   0.00%                               No Effect
Corrected Combined Cognitive             .00                   0.00%                               No Effect
Original Spirituality                    .16                   2.46%                               Small Effect
Corrected Spirituality                   .11                   1.10%                               Small Effect
Proportion of the variance explained: .01 = small effect, .09 = medium effect, .25 = large effect.
Correction For Attenuation
The relationship between two variables can be weakened by measurement error, so the correlation
between two measures is more accurately reflected after a correction for attenuation has been
performed. The correlations of the three subscales, the spirituality scale and the combined
cognitive score with the self-reported performance measure are reported in Table 20. The
correction improved the correlation coefficient of every measure except quantitative reasoning,
which had almost no original correlation. The psychology subscale had the largest increase in
effect size, from .29 to .39.
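Spearman's correction for attenuation divides the observed correlation by the square root of the product of the two measures' reliabilities. The values in Table 20 are consistent with correcting only for the test's own reliability (treating the self-report's reliability as 1), so the sketch below defaults to that single-variable form (function name is our own):

```python
import math

def disattenuate(r_observed, reliability_x, reliability_y=1.0):
    """Spearman's correction: r_true = r_xy / sqrt(r_xx * r_yy)."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Psychology subscale from Table 20: observed r = .29, alpha = .54.
r_psych = disattenuate(0.29, 0.54)
```

With the psychology subscale's low reliability (α = .54), the observed correlation of .29 disattenuates to roughly .39, matching Table 20.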
Table 20 - Correlations With Self Performance Reports, Corrected For Attenuation

                              Verbal Reasoning   Quantitative Reasoning   Psychology Knowledge   Combined Cognitive   Spirituality
Corrected for Attenuation     .21                .00                      .39                    .21                  .43
Self-Performance (observed)   .18                .00                      .29                    .19                  .40
Reliability (α)               .75                .82                      .54                    .82                  .85
The transformed scores still correlate perfectly with the untransformed scores, and the other
corrected correlations vary only slightly from the uncorrected ones. Because the standardized
scores are linear transformations of the raw scores, each of the three subscales correlates perfectly
with its own Z-score, IQ score and T-score: the values did not change in relation to one another,
so the correlations are preserved.
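This invariance under linear transformation is easy to demonstrate. The sketch below standardizes a set of hypothetical raw totals (the data values are illustrative, not from the report) into Z, T (mean 50, SD 10) and IQ-metric (mean 100, SD 15) scores, each of which correlates perfectly with the raw scores:

```python
import statistics

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

raw = [36, 41, 28, 33, 45, 30]            # hypothetical raw totals
mean, sd = statistics.mean(raw), statistics.pstdev(raw)
z  = [(x - mean) / sd for x in raw]       # Z-scores
t  = [50 + 10 * zi for zi in z]           # T-scores
iq = [100 + 15 * zi for zi in z]          # IQ-metric scores
# Each standardized form correlates perfectly (r = 1.00) with the raw scores.
```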
Table 21 - Correlation Between Corrected Raw And Standardized Scores Of The Combined Cognitive Test And Its Subscales

                                  1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16
1.  Corrected Combined Cognitive  ---
2.  Combined Cognitive Z-Score    1.00  ---
3.  Combined Cognitive T-Score    1.00  1.00  ---
4.  Combined Cognitive IQ         1.00  1.00  1.00  ---
5.  Psychology                    .64   .64   .64   .64   ---
6.  Psychology Z-Score            .64   .64   .64   .64   1.00  ---
7.  Psychology T-Score            .64   .64   .64   .64   1.00  1.00  ---
8.  Psychology IQ                 .64   .64   .64   .64   1.00  1.00  1.00  ---
9.  Quantitative                  .76   .76   .76   .76   .23   .23   .23   .23   ---
10. Quantitative Z-Score          .76   .76   .76   .76   .23   .23   .23   .23   1.00  ---
11. Quantitative T-Score          .76   .76   .76   .76   .23   .23   .23   .23   1.00  1.00  ---
12. Quantitative IQ               .76   .76   .76   .76   .23   .23   .23   .23   1.00  1.00  1.00  ---
13. Verbal                        .73   .73   .73   .73   .29   .29   .29   .29   .28   .28   .28   .28   ---
14. Verbal Z-Score                .73   .73   .73   .73   .29   .29   .29   .29   .28   .28   .28   .28   1.00  ---
15. Verbal T-Score                .73   .73   .73   .73   .29   .29   .29   .29   .28   .28   .28   .28   1.00  1.00  ---
16. Verbal IQ                     .73   .73   .73   .73   .29   .29   .29   .29   .28   .28   .28   .28   1.00  1.00  1.00  ---
N = 45. Bolded correlations are significant at the 0.01 level (2-tailed).
Overall Conclusions And Recommendations
The PAL Cognitive and Spirituality Scales show promise as psychometric tools for measuring
cognitive reasoning, psychology knowledge and personality. Overall the cognitive scale was
found to have high internal consistency, as did the verbal and quantitative subscales. The
psychology subscale was found not to have good internal consistency, however, and the
mandatory removal of items not meeting reliability standards further decreased its internal
consistency, an unfortunate consequence when a scale has only a small number of items. The
spirituality subscale showed extremely high levels of internal consistency and all items met the
required thresholds of reliability and were therefore retained.
This report has presented a number of recommendations to further improve the scales' validity
and reliability. Implementing the recommendations made in the distractor analysis sections
should improve internal consistency; for example, the development of several new questions for
the psychology subscale is recommended. Finally, further analysis should be conducted to
ensure the validity of both the cognitive and spirituality scales.