Article

Applied Psychological Measurement
36(2) 122-146
© The Author(s) 2012
Reprints and permission: sagepub.com/journalsPermissions.nav
DOI: 10.1177/0146621612438725
http://apm.sagepub.com
Using the Graded Response Model to Control Spurious Interactions in Moderated Multiple Regression

Brendan J. Morse¹, George A. Johanson², and Rodger W. Griffeth²
Abstract

Recent simulation research has demonstrated that using simple raw scores to operationalize a latent construct can result in inflated Type I error rates for the interaction term of a moderated statistical model when the interaction (or lack thereof) is proposed at the latent variable level. Rescaling the scores using an appropriate item response theory (IRT) model can mitigate this effect under similar conditions. However, this work has thus far been limited to dichotomous data. The purpose of this study was to extend this investigation to multicategory (polytomous) data using the graded response model (GRM). Consistent with previous studies, inflated Type I error rates were observed under some conditions when polytomous number-correct scores were used, and were mitigated when the data were rescaled with the GRM. These results support the proposition that IRT-derived scores are more robust to spurious interaction effects in moderated statistical models than simple raw scores under certain conditions.
Keywords
graded response model, item response theory, polytomous models, simulation
Operationalizing a latent construct such as an attitude or ability is a common practice in psychological research. Stine (1989) described this process as the creation of a mathematical structure (scores) that represents the empirical structure (construct) of interest. Typically, researchers will use simple raw scores (e.g., either as a sum or a mean) from a scale or test as the mathematical structure for a latent construct. However, much debate regarding the properties of such scores has ensued since S. S. Stevens's classic publication of the nominal, ordinal, interval, and ratio scales of measurement (Stevens, 1946). Although it is beyond the scope of this article to enter the scale of measurement foray, an often agreed-on position is that simple raw scores for latent constructs do not exceed an ordinal scale of measurement. This scale imbues such scores with limited mathematical properties and permissible transformations that are necessary for the appropriate application of parametric statistical models. Nonparametric, or distribution-free, statistics have been proposed as a solution for the scale of measurement problem. However, many researchers are reluctant to use nonparametric techniques because they are often associated with a loss of information pertaining to the nature of the variables (Gardner, 1975). McNemar (1969) articulated this point by saying, "Consequently, in using a non-parametric method as a short-cut, we are throwing away dollars in order to save pennies" (p. 432).

¹ Bridgewater State University, MA, USA
² Ohio University, Athens, USA

Corresponding author:
Brendan J. Morse, Department of Psychology, Bridgewater State University, 90 Burrill Avenue, 340 Hart Hall, Bridgewater, MA 02325, USA
Email: bmorse@bridgew.edu
Assuming that simple raw scores are limited to the ordinal scale of measurement and that researchers typically prefer parametric models to their nonparametric analogues, an empirical question arises regarding the robustness of various parametric statistical models to scale violations. Davison and Sharma (1988) and Maxwell and Delaney (1985) demonstrated through mathematical derivations that there is little cause for concern when comparing mean group differences with the independent samples t test when the assumptions of normality and homogeneity of variance are met. However, Davison and Sharma (1990) subsequently demonstrated that scaling-induced spurious interaction effects could occur with ordinal-level observed scores in multiple regression analyses. These findings suggest that scaling may become a problem when a multiplicative interaction term is introduced into a parametric statistical model.
Scaling and Item Response Theory (IRT)
An alternative solution to the scale of measurement issue for parametric statistics is to rescale the raw data itself into an interval-level metric, and a variety of methods for this rescaling have been proposed (see Embretson, 2006; Granberg-Rademacker, 2010; Harwell & Gatti, 2001). A potential method for producing scores with near interval-level scaling properties is the application of IRT models to operationalize number-correct scores into estimated theta scores—the IRT-derived estimate of an individual's ability or latent construct standing. Conceptually, the attractiveness of this method rests with the invariance property in IRT scaling, and such scores may provide a more appropriate metric for use in parametric statistical analyses.¹ Reise, Ainsworth, and Haviland (2005) stated that

    Trait-level estimates in IRT are superior to raw total scores because (a) they are optimal scalings of individual differences (i.e., no scaling can be more precise or reliable) and (b) latent-trait scales have relatively better (i.e., closer to interval) scaling properties. (p. 98, italics in original)

In addition, Reise and Haviland (2005) gave an elegant treatment of this condition by demonstrating that the log-odds of endorsing an item and the theta scale form a linearly increasing relationship. Specifically, the rate of change on the theta scale is preserved (for all levels of theta) in relation to the log-odds of item endorsement.
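This linearity can be made explicit. For a two-parameter logistic item (the same functional form as the GRM's boundary functions defined below), the log-odds of endorsement reduce to a line in theta:

\[
P(\theta) = \frac{e^{a(\theta - b)}}{1 + e^{a(\theta - b)}}
\quad \Longrightarrow \quad
\ln\!\left[\frac{P(\theta)}{1 - P(\theta)}\right] = a(\theta - b),
\]

so a one-unit change anywhere on the theta scale corresponds to the same a-unit change in the log-odds of endorsement, which is the sense in which the rate of change is preserved.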
Empirical Evidence of IRT Scaling
[Figure 1. A representation of the latent construct distribution and test information (reliability) distributions for appropriate assessments (top) and inappropriate assessments (bottom). Panels plot trait scores and test information against theta from -4 to 4.]

In a simulation testing the effect of scaling and test difficulty on interaction effects in factorial analysis of variance (ANOVA), Embretson (1996) demonstrated that Type I and Type II errors for the interaction term could be exacerbated when simple raw scores are used under nonoptimal psychometric conditions. Such errors occurred primarily due to the ordinal-level scaling limitations of simple raw scores, and the ceiling and floor effects imposed when an assessment is either too easy or too difficult for a group of individuals—a condition known as assessment inappropriateness (see Figure 1). Embretson fitted the one-parameter logistic (Rasch) model to the data and was able to mitigate the null hypothesis errors using the estimated theta scores rather than the simple raw scores. These results illuminated the
usefulness of IRT scaling for dependent variables in factorial models, especially under suboptimal psychometric conditions. Embretson argued that researchers are often unaware when these conditions are present and can benefit from using appropriately fitted IRT models to generate scores that are more appropriate for use with parametric analyses.
An important question that now arises is whether these characteristics extend to more complex IRT models such as the two- and three-parameter logistic models (dichotomous models that add a discrimination and a guessing parameter, respectively) and polytomous models. Although the Rasch model demonstrates desirable measurement characteristics (i.e., true parameter invariance; Embretson & Reise, 2000; Fischer, 1995; Perline, Wright, & Wainer, 1979), it is sometimes too restrictive to use in practical contexts. However, the consensus answer to whether non-Rasch models can achieve interval-level scaling properties is "yes" (Embretson & Reise, 2000; Hambleton, Swaminathan, & Rogers, 1991; Harwell & Gatti, 2001; Reise et al., 2005). Investigations into the scaling properties of these more complex IRT models are thus necessary.
In one extension of this sort, Kang and Waller (2005) simulated the scaling properties of simple raw scores, estimated theta scores derived from a two-parameter logistic IRT model, and assessment appropriateness with the interaction term in a moderated multiple regression (MMR) analysis. Similar to the findings of Embretson (1996), Kang and Waller discovered that using simple raw scores to operationalize a latent construct resulted in substantial inflations of the Type I error rate (above 50%, or p > .50) for the interaction term in MMR under conditions of assessment inappropriateness. However, the IRT-derived theta score estimates were found to mitigate the Type I error rate to acceptable levels (below 10%, or p < .10) under the same conditions. This extension demonstrated that the estimated theta scores from a non-Rasch IRT model could be used to better satisfy the assumptions of parametric statistical models involving an interaction term. Finally, Harwell and Gatti (2001) investigated the congruence of estimated (theta) and actual construct scores using a popular polytomous IRT model, the graded response model (GRM; Samejima, 1969, 1996). The authors posited that if the estimated construct (theta) scores were sufficiently similar to the actual construct (theta) scores, which can be defined (albeit arbitrarily) as a metric with interval-level scaling properties, then the GRM results in scores that are sufficiently interval level. The results of their study supported this relationship. However, a concrete endorsement of the scaling properties should be made with caution due to the inherently arbitrary metric of the theta scale in most (if not all) IRT models.
The preceding theoretical and simulation evidence suggests that scale of measurement violations accompanied by suboptimal psychometric conditions (i.e., assessment inappropriateness) may have nonnegligible effects on the accuracy of common parametric analyses. However, this evidence still has limited generalizability to the majority of psychological research due to the nature of the simulated data. Specifically, Embretson (1996) and Kang and Waller (2005) simulated dichotomous data and fit logistic IRT models appropriate for such data. However, the majority of psychological research uses polytomous data, or data with multiple response options, such as Likert-type scales. Therefore, the purpose of the current study is to extend the understanding of scaling and assessment appropriateness effects on parametric statistical analyses by simulating polytomous data and fitting an appropriate polytomous IRT model. The authors' primary null hypothesis, against which the performance of number-correct scores and estimated theta scores is compared, is that there is no significant interaction on the actual theta scale. Thus, any significant interaction identified in the simulation results represents a spurious observed effect (Type I error).
Method
This study used a Monte Carlo simulation to identify the psychometric conditions that lead to an elevated risk of Type I errors for interaction effects in MMR when the theta scale is considered to be the true metric. The simulation was similar to that conducted by Kang and Waller (2005) and extends this work to polytomous scales indicative of those commonly used in applied psychological research.
The GRM
The GRM (Samejima, 1969, 1996) is an IRT model suitable for modeling data with ordered
categories such as Likert-type scales, and is an extension of the two-parameter logistic model.
The GRM is considered a difference family model and was developed specifically to model
polytomous data that represent the psychological processes underlying multicategory decision
making (Ostini & Nering, 2006). In addition, theta estimates derived using the GRM may show
evidence of interval-level scaling properties (Harwell & Gatti, 2001).
Using the GRM, an individual's likelihood of responding in a particular response category is derived using a two-step process. First, category boundary response functions (CBRFs) are calculated to determine boundary decision probabilities of the $j - 1$ response categories for each item. The CBRFs in the GRM can be derived with Equation 1 (adapted from Embretson & Reise, 2000):

\[
P^{*}_{ix}(\theta) = \frac{e^{a_i(\theta - b_{ij})}}{1 + e^{a_i(\theta - b_{ij})}} \qquad (1)
\]

In Equation 1, $P^{*}_{ix}(\theta)$ is the probability that an individual with a trait (construct) level $\theta$ will respond positively at the boundary of category $j$ for item $i$, where $x = j = 1 \ldots m_i$. Theta ($\theta$) represents the individual's trait (construct) level, $a_i$ represents the item discrimination or slope, and $b_{ij}$ represents the category location or difficulty parameter with respect to the trait continuum. Importantly, in well-functioning items the values of $b_{ij}$ should be successively ordered, reflecting increased difficulty in progressing through the response options.

In the second step of the GRM, the probability of responding in a particular category is determined using category response functions (CRFs), which are derived by subtracting the CBRF of the following category. This process is illustrated in Equation 2 (adapted from Embretson & Reise, 2000):

\[
P_{ix}(\theta) = P^{*}_{ix}(\theta) - P^{*}_{i(x+1)}(\theta) \qquad (2)
\]

The probability of the first category is obtained by simply subtracting $P^{*}_{i1}(\theta)$ from 1.0, and the probability of the last category is simply $P^{*}_{im}(\theta)$. The GRM was used as a model for the number-correct score algorithm as well as to estimate theta scores in this study.
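To make the two-step computation concrete, the following is a minimal R sketch of Equations 1 and 2; the function name is illustrative, and this is not the authors' simulation code:

```r
# Category response probabilities under the GRM (Equations 1 and 2).
# b is the vector of j - 1 ordered category boundary difficulties for one item.
grm_category_probs <- function(theta, a, b) {
  # Step 1: category boundary response functions (CBRFs), Equation 1
  p_star <- 1 / (1 + exp(-a * (theta - b)))
  # Bound the cumulative probabilities: 1 below the first boundary, 0 above the last
  p_star <- c(1, p_star, 0)
  # Step 2: category response functions (CRFs), Equation 2 (adjacent differences)
  p_star[-length(p_star)] - p_star[-1]
}

# Example: a five-category item with a = 0.8 and ordered boundaries
grm_category_probs(theta = 0, a = 0.8, b = c(-1.5, -0.5, 0.5, 1.5))
```

The returned vector contains one probability per response category and sums to 1 for any value of theta.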
Independent Variables
The independent variables in this study were respondent sample size (n: two levels), scale length (k: two levels), item discrimination ($a_i$: two levels), item difficulty ($b_{i,1 \ldots j-1}$: three levels), scale bandwidth (fidelity: two levels), and the regression coefficients ($\beta_1$ and $\beta_2$: two levels). The structure of this study was therefore a 2 × 2 × 2 × 3 × 2 × 2 design comprising 96 conditions.
Sample size (n). Two respondent sample sizes were simulated according to recent evidence of the stability of parameter estimates in polytomous IRT, and actual sample sizes in MMR studies in applied psychology. Ostini and Nering (2006) reported that stable estimates for polytomous IRT models could be obtained with as few as 250 individuals, but that samples between 500 and 1,000 are still considered to be desirable. In addition, Aguinis, Beaty, Boik, and Pierce (2005) indicated that the average sample size for MMR studies in applied psychological research is $\bar{x}_n = 272$ with an average standard deviation of $s_n = 434$. These results indicate that the simulation outcomes for the n = 250 sample size will be the most relevant for the majority of applied psychological research; however, some studies do achieve sample sizes upward of n = 1,000 (see, for example, Witt, 1998; n = 979). Therefore, sample sizes included two levels of n = 250 and n = 750 respondents to maximize the generalizability for the majority of empirical MMR studies in applied psychology as well as for typical IRT studies.
Scale length (k). Two scale lengths of k = 15 and k = 30 items were simulated in this study to
model typical scales used in applied psychological research. In a review of validated scales used
in applied psychological research, Fields (2002) indicated a modal scale length of 15 items with
a mean of 15.43 and a standard deviation of 10.43 for validated scales in applied psychology.
The distribution related to these values is also slightly positively skewed, indicating the exis-
tence of several very long scales.
Discrimination ($a_i$). To derive the highest level of generalizability from this study, item parameter values were randomly selected from specified distributions as opposed to using constant values. Following the structure of Kang and Waller (2005), item discrimination values were selected from a uniform distribution between the values of 0.31 and 0.58 for moderate discrimination and between 0.58 and 1.13 for high discrimination. Estimating discrimination values from a uniform distribution has been demonstrated to appropriately represent empirically determined item discrimination values (Reise & Waller, 2003), and the particular cutoff values of 0.31, 0.58, and 1.13 have been demonstrated to appropriately represent low, moderate, and high factor loadings for items (Kang & Waller, 2005; Takane & De Leeuw, 1987). Because the GRM is a polytomous extension of the two-parameter logistic model, these values can be deemed appropriate for use in this study. Furthermore, the decision to retain the values from the Kang and Waller (2005) study was made to maintain a basis of comparison for the extension to polytomous data.
Item difficulty ($b_{i,1 \ldots j-1}$)/assessment appropriateness. Three item difficulty conditions were simulated to represent a "difficult," "moderate," and "easy" scale with respect to the simulated distribution of construct scores (see Figure 1). The item difficulty conditions are also analogous to the assessment appropriateness conditions, with the "difficult" and "easy" conditions representing assessment inappropriateness and the "moderate" condition representing assessment appropriateness. Item difficulty values were randomly selected from a N(-1.5, 1.0) distribution for the easy (inappropriate) conditions, a N(0.0, 1.0) distribution for the moderate (appropriate) conditions, and a N(1.5, 1.0) distribution for the difficult (inappropriate) conditions.

Four item difficulty parameters were randomly selected from the appropriate distribution for each item to represent the j - 1 CBRFs specified in Equation 1 for the GRM. An important aspect of the difficulty parameters in polytomous IRT models is that the difficulty parameters for each item's CBRFs must be sequentially ordered. Therefore, the difficulty parameters were modeled with the sequential ordering restriction imposed, similar to that implemented by Meade, Lautenschlager, and Johnson (2007).
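A simple way to honor that restriction when sampling is to sort the draws; the sketch below (illustrative names, not necessarily the exact procedure of Meade et al., 2007) shows the idea:

```r
# Draw j - 1 = 4 category difficulties for one item and impose the
# sequential ordering restriction by sorting the draws.
draw_item_difficulties <- function(mean_b, sd_b, n_boundaries = 4) {
  sort(rnorm(n_boundaries, mean = mean_b, sd = sd_b))
}

# "Difficult" (inappropriate) condition under normal fidelity: N(1.5, 1.0)
draw_item_difficulties(mean_b = 1.5, sd_b = 1.0)
```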
Scale fidelity. An assessment's fidelity is measured as the inverse of variability (i.e., bandwidth) in the difficulty of the items (Stocking, 1987). Fidelity contributes to assessment appropriateness by either restricting (high fidelity) or expanding (low fidelity) the width of the item difficulty distribution. The high-fidelity conditions were simulated in this study by generating a second set of item difficulty values from more restricted normal distributions: N(-1.50, 0.50) for easy scales, N(0.00, 0.50) for moderate scales, and N(1.50, 0.50) for difficult scales. These restricted distributions create the high-fidelity, low-bandwidth situation in which Kang and Waller (2005) observed the highest prevalence of Type I errors. As in the previous difficulty parameter selection, there were four (j - 1) difficulty values for each item, sampled from within the specified distribution with the sequential ordering restriction imposed.
Regression weights. In accordance with Kang and Waller (2005), regression weights were set at a value of 0.30 or 0.50 for both $\beta_1$ and $\beta_2$. An intercept of 0 is used and therefore omitted from the regression models. It should be noted that these regression weights are fixed only for the purposes of simulating the dependent variables.
Fixed Effects
Item response categories (j). Five item response categories were used to simulate a five-category Likert-type response scale. Fields (2002) identified 134 validated construct assessments used in applied psychological research, of which five-category Likert-type response scales were the most common (n = 57).
Regression models. The purpose of this study was to observe the prevalence of Type I errors in MMR in three different pairs of models. In the first regression model pair, actual latent trait scores $\theta$ were analyzed (see Equations 3a and 3b). In the second regression model pair, number-correct scores (X) were analyzed (see Equations 4a and 4b). In the third regression model pair, estimated theta scores $\hat{\theta}$ were analyzed (see Equations 5a and 5b). These three model pairs were expressed in accordance with Kang and Waller (2005) as follows:

\[
\theta_3 = \beta_1\theta_1 + \beta_2\theta_2 + e, \qquad (3a)
\]
\[
\theta_3 = \beta_1\theta_1 + \beta_2\theta_2 + \beta_3\theta_1\theta_2 + e, \qquad (3b)
\]
\[
X_3 = \beta_1 X_1 + \beta_2 X_2 + e, \qquad (4a)
\]
\[
X_3 = \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + e, \qquad (4b)
\]
\[
\hat{\theta}_3 = \beta_1\hat{\theta}_1 + \beta_2\hat{\theta}_2 + e, \qquad (5a)
\]
\[
\hat{\theta}_3 = \beta_1\hat{\theta}_1 + \beta_2\hat{\theta}_2 + \beta_3\hat{\theta}_1\hat{\theta}_2 + e. \qquad (5b)
\]

The first model of each pair is the additive model, and the second model in each pair contains a multiplicative (interaction) term. Each model pair was structured as a hierarchical regression analysis in which the interaction term is entered at the second step (Aiken & West, 1991; Cohen, Cohen, West, & Aiken, 2003). A significant change in variance accounted for ($\Delta R^2$) between the first and second model indicated the existence of a spurious interaction effect, based on the null hypothesis that the data were created with no significant interaction on the actual theta scale.
Regression Main Effects
Two continuous predictor variables were simulated for each regression model specified in Equations 3a through 5b. Predictor variables $\theta_1$ and $\theta_2$ were randomly selected for the number of observations (n) from normal distributions with a mean and standard deviation equal to N(0.00, 1.00). These variables served as the main effect scores in the regression models. It is important to note that $\theta_1$ and $\theta_2$ were sampled from identical but independent distributions; thus, no multicollinearity was modeled.
Regression Criterion Variables
One continuous criterion variable was calculated for each regression model specified in Equations 3a through 5b. In accordance with Kang and Waller (2005), the general form of the criterion variables is given by the following equation, which represents a multiple regression model with two significant main effects and no interaction on the actual theta scale:

\[
\theta_3 = \beta_1\theta_1 + \beta_2\theta_2 + \sqrt{1 - (\beta_1^2 + \beta_2^2)} \times e. \qquad (6)
\]

In Equation 6, $\beta_1$ and $\beta_2$ are the simulated regression weights and e is an error term. Note that the intercept term, $\beta_0$, was set to equal 0 and thus omitted from the model. The term $\sqrt{1 - (\beta_1^2 + \beta_2^2)}$ was included to represent an appropriate error variance component for each level of $\beta$. See the Appendix for a derivation of the error term.
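A minimal R sketch of this generating model (illustrative names) shows why the criterion has unit variance and, by construction, no interaction on the actual theta scale:

```r
# Generate theta1, theta2, and the criterion theta3 per Equation 6.
simulate_thetas <- function(n, beta1, beta2) {
  theta1 <- rnorm(n)  # N(0, 1) predictors, independent by construction
  theta2 <- rnorm(n)
  e      <- rnorm(n)
  theta3 <- beta1 * theta1 + beta2 * theta2 +
            sqrt(1 - (beta1^2 + beta2^2)) * e  # scaled error term
  data.frame(theta1, theta2, theta3)
}

thetas <- simulate_thetas(n = 250, beta1 = 0.3, beta2 = 0.3)
var(thetas$theta3)  # approximately 1: the error variance absorbs the remainder
```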
Number-Correct Scores

To generate the number-correct scores, X1, X2, and X3, the values of the previously defined construct scores $\theta_1$, $\theta_2$, and $\theta_3$ were entered into the GRM algorithm (Equations 1 and 2) for each simulated participant.

A matrix of response scores was generated by reporting the number-correct score (1, 2, 3, 4, or 5) corresponding to the highest-category response likelihood for each simulated participant on each item. These values were derived using an algorithm written by the first author in the R language based on the response probabilities calculated in Equations 1 and 2. Actual number-correct score responses were generated by comparing a randomly selected value from a uniform distribution, U(0.0, 1.0), with the relative response probabilities generated for each level of theta (or individual) and each item. This process can be thought of as determining the relative likelihood of a category response given the item and person parameters with a realistic level of decision-making error (Kang & Waller, 2005; Stone, 1992). This integration of response error is important so as to not assume perfect responding by simulated individuals. A mean score for X1, X2, and X3 for each simulated individual was calculated from the number-correct score response matrices for analysis in the regression models.
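A sketch of that comparison step in R, reusing grm_category_probs() from the earlier sketch (again with illustrative names):

```r
# Convert GRM category probabilities into one observed response (1-5) by
# comparing a U(0, 1) draw with the cumulative category probabilities,
# which injects the realistic decision-making error described above.
sample_response <- function(theta, a, b) {
  probs <- grm_category_probs(theta, a, b)
  u <- runif(1)
  which(u <= cumsum(probs))[1]  # first category whose cumulative prob. exceeds u
}

sample_response(theta = 0.5, a = 0.8, b = c(-1.5, -0.5, 0.5, 1.5))
```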
Estimated Theta Scores
Finally, theta scores $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}_3$ were estimated from the simulated raw data using PARSCALE 4.1 (Muraki & Bock, 2003). PARSCALE was set to derive the person (latent construct) scores and item parameters using the expected a posteriori (EAP) method. This method calculates $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}_3$ as the mean of the posterior distribution of theta given the observed response pattern (Baker & Kim, 2004) and is a preferred estimation method for assessments that are moderate to short in item length (Mislevy & Stocking, 1989).
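Although PARSCALE performed the estimation here, the principle of EAP scoring can be sketched in R by quadrature over a standard normal prior (illustrative names, assuming grm_category_probs() from the earlier sketch; this is not PARSCALE's implementation):

```r
# EAP estimate of theta for one response pattern under the GRM.
# responses: vector of observed categories (1-5), one per item
# a: vector of item discriminations; b_list: list of boundary vectors per item
eap_score <- function(responses, a, b_list,
                      nodes = seq(-4, 4, length.out = 81)) {
  prior <- dnorm(nodes)  # standard normal prior on theta
  # Likelihood of the observed response pattern at each quadrature node
  lik <- sapply(nodes, function(q) {
    prod(mapply(function(x, ai, bi) grm_category_probs(q, ai, bi)[x],
                responses, a, b_list))
  })
  posterior <- lik * prior
  sum(nodes * posterior) / sum(posterior)  # posterior mean = EAP estimate
}
```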
Iterations
For the purposes of estimating Type I error rates in Monte Carlo studies, Robey and Barcikowski (1992) specify that approximately 1,000 iterations will achieve a power equal to .90 when approximating an alpha level of $\alpha$ = .05 and using the interval $\alpha \pm \frac{1}{2}\alpha$ as a robustness interval. Therefore, 1,000 iterations per condition were conducted. This allowed for adequate reduction in sampling variance for the IRT parameter estimates (Harwell, Stone, Hsu, & Kirisci, 1996), achieves a power of .90 around the interval $.025 \le \alpha \le .075$ (Robey & Barcikowski, 1992), and doubles the number of iterations used by Kang and Waller (2005).
Simulation-Dependent Variables
Type I errors. The primary dependent variable for this study was the empirical Type I error rate (p) observed for the interaction term of the MMR models. The specific value of p was identified in a three-step process. First, in each iteration of the simulation, the variance in $\theta_3$ accounted for by $\theta_1$ and $\theta_2$ was recorded as the $R^2$ value for the additive and multiplicative regression models specified in Equations 3a through 5b. Second, the significance of the change in variance accounted for, $\Delta R^2$, between the respective additive and multiplicative models was tested at an alpha level of $p \le .05$ and recorded as 1 for a significant result and 0 for a nonsignificant result. Finally, the empirical alpha level p was recorded as the proportion (x/1,000) of iterations resulting in a significant $\Delta R^2$ for the actual latent trait scores $\theta_3$, the number-correct scores X3, and the estimated theta scores $\hat{\theta}_3$.
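Per iteration, the second step amounts to a comparison of nested linear models; in R it can be sketched as follows (illustrative names):

```r
# Flag a (spurious) interaction: test the change in R^2 between the additive
# and multiplicative models, which is the F test on the product term.
flag_interaction <- function(y, x1, x2, alpha = 0.05) {
  additive       <- lm(y ~ x1 + x2)
  multiplicative <- lm(y ~ x1 + x2 + x1:x2)
  anova(additive, multiplicative)$"Pr(>F)"[2] <= alpha
}

# The empirical Type I error rate for a condition is then the mean of these
# flags across the 1,000 iterations.
```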
Procedure
The simulation for the current study was conducted in the R environment (version 2.9.0; Ihaka & Gentleman, 1996; R Development Core Team, 2008) using a series of functions written by the authors, contributed code from Kang and Waller (2005), and PARSCALE 4.1 (Muraki & Bock, 2003). For ease of interpretation, four separate simulations were conducted, separated based on sample size (n = 250, 750) and scale fidelity (normal, high). In each simulation, the independent variables of scale length, regression weights, discrimination, and difficulty were systematically varied. Therefore, the summary statistics for each simulation are included in four tables, each with 24 rows.

Each simulation was run using the following process. First, using the pseudorandom number generator in R, theta vectors were sampled from a standard normal distribution N(0.0, 1.0) for $\theta_1$ and $\theta_2$. Next, corresponding vectors for $\theta_3$ were calculated using Equation 6. These vectors were saved as the actual latent construct scores. To calculate the number-correct score matrices X1, X2, and X3, each of these three score vectors was evaluated in an algorithm written by the first author that implements Equations 1 and 2 to determine the probability of a category response. Final number-correct score values were determined by the comparison of a randomly selected value from a uniform distribution, as previously described. Finally, the estimated theta scores $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}_3$ were derived using PARSCALE 4.1 (Muraki & Bock, 2003). To accomplish this task, the number-correct score matrices were "batched" out to PARSCALE with an accompanying syntax file following the structure identified by Gagne, Furlow, and Ross (2009). The estimated theta scores from PARSCALE were then returned to R as the vectors $\hat{\theta}_1$, $\hat{\theta}_2$, and $\hat{\theta}_3$.

Finally, the nine score vectors were entered into the corresponding additive and multiplicative regression models specified in Equations 3a through 5b, and the change in variance accounted for between the two corresponding models was recorded. The final summary statistics and tables were generated using portions of code provided by Niels Waller and used in the Kang and Waller (2005) study.
Results

Using the $\alpha \pm \frac{1}{2}\alpha$ criterion, the results indicated meaningfully inflated Type I error rates in 53 of 96 conditions (55%) when number-correct scores were used to operationalize the latent constructs, and in 33 of 96 conditions (34%) when estimated theta scores were used (see conditions marked with superscript b under the columns labeled $p_X$ and $p_{\hat{\theta}}$, respectively, in Tables 1-4). In addition, a binomial test was conducted as a measure of statistically significant departures from the nominal ($\alpha$ = .05) alpha level for each scoring method. The results of the binomial test were slightly more conservative and indicated significantly inflated Type I error rates in 63 of 96 conditions (66%) when number-correct scores were used to operationalize the latent constructs and in 44 of 96 conditions (46%) when estimated theta scores were used (see conditions marked with superscript a under the columns labeled $p_X$ and $p_{\hat{\theta}}$, respectively, in Tables 1-4).
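For a single condition, such a check can be written in R as below; the values are illustrative (88 significant iterations out of 1,000 is an observed rate of .088), and because the article does not state whether the test was one- or two-sided, a one-sided version is shown:

```r
# Is an observed count of significant iterations consistent with alpha = .05?
binom.test(x = 88, n = 1000, p = 0.05, alternative = "greater")
```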
In addition, Figure 2 represents the frequency of the empirical Type I error rates for the number-correct scores and estimated theta scores. For the number-correct scores, these data indicate a positively skewed distribution (skew = 2.04) ranging from 3.1% to 84.9%, with a mean empirical Type I error rate of 17.5%, a median of 8.7%, and a standard deviation of 20%. Restricting the summary to only those occurrences outside of the $\alpha \pm \frac{1}{2}\alpha$ interval, the mean empirical Type I error rate was 27.6%, with values ranging from 7.8% to 84.9%. The distribution for the estimated theta scores was also positively skewed (skew = 2.97) and ranged from 3.7% to 43.6%, with a mean empirical Type I error rate of 9.0%, a median of 5.9%, and a standard deviation of 8.0%. Restricting the summary to only those occurrences outside of the $\alpha \pm \frac{1}{2}\alpha$ interval, the mean empirical Type I error rate was 15.9%, with values ranging from 7.6% to 43.6%.
These results indicate that there were instances of spurious interactions regardless of the
scoring method. However, it is clear that the number-correct scores performed much worse than
the estimated theta scores in comparable conditions. Finally, an important finding to highlight
is that, of the conditions with meaningfully inflated Type I error rates for the estimated theta
scores, none were unique with regard to the number-correct scores (see Tables 1-4). In other
words, no meaningful inflations existed for the estimated theta scores that did not also exist for
the number-correct scores.
Assessment Appropriateness and Type I Errors

The results of this simulation also clearly indicate the anticipated effect of scoring method and assessment appropriateness on the occurrence of Type I errors for the interaction term of an MMR analysis. Figure 3 represents the mean and maximum Type I error rates for each scoring method collapsed across the 32 assessment appropriateness conditions. Under these conditions, there is no significant departure from the nominal Type I error rate, regardless of whether one uses simple raw scores or estimated theta scores in the MMR analysis. These results are consistent with previous findings related to scaling effects on Type I error rates for moderated statistical models (Davison & Sharma, 1990; Embretson, 1996; Kang & Waller, 2005).

However, striking differences in the empirical Type I error rate can be observed for each scoring method when the assessment is inappropriate for the individuals. Figure 4 represents the mean and maximum Type I error rates for each scoring method collapsed across the 64 assessment inappropriateness (easy/difficult) conditions. Number-correct scores resulted in empirical Type I error rates that were above the acceptable interval in 53 of the 64 (83%) inappropriate assessment conditions. At the iteration level, a direct logistic regression analysis indicated that the likelihood of committing a Type I error was 8.13 times greater when number-correct scores
Table 1. Results of Simulation 1 (Normal Fidelity, Distribution of Latent Construct Scores = Standard Normal N(0, 1))

c | n | b_{i,j-1} | a_i | β | k | p_θ | ΔR²_θ | p_X | ΔR²_X | p_θ̂ | ΔR²_θ̂ | α | RMSE | SW_θ | SW_X | SW_θ̂ | sk_X | sk_θ̂
1 | 250 | Easy | Low | .3 | 15 | 0.062 | 0.018 | 0.055 | 0.021 | 0.047 | 0.022 | .66 | 0.86 | 0.96 | 0.06 | 0.78 | 0.54 | 0.23
2 | 250 | Easy | Low | .3 | 30 | 0.062 | 0.018 | 0.068a | 0.020 | 0.052 | 0.023 | .80 | 0.84 | 0.96 | 0.06 | 0.77 | 0.56 | 0.22
3 | 250 | Easy | Low | .5 | 15 | 0.062 | 0.011 | 0.088a,b | 0.018 | 0.058 | 0.018 | .66 | 0.86 | 0.96 | 0.23 | 0.84 | 0.54 | 0.24
4 | 250 | Easy | Low | .5 | 30 | 0.062 | 0.011 | 0.113a,b | 0.017 | 0.061 | 0.017 | .80 | 0.83 | 0.96 | 0.30 | 0.82 | 0.57 | 0.24
5 | 250 | Easy | High | .3 | 15 | 0.062 | 0.018 | 0.055 | 0.021 | 0.047 | 0.022 | .66 | 1.44 | 0.96 | 0.06 | 0.78 | 0.54 | 0.23
6 | 250 | Easy | High | .3 | 30 | 0.061 | 0.018 | 0.105a,b | 0.022 | 0.082a,b | 0.016 | .92 | 1.44 | 0.96 | 0.00 | 0.41 | 1.01 | 0.34
7 | 250 | Easy | High | .5 | 15 | 0.061 | 0.011 | 0.296a,b | 0.020 | 0.089a,b | 0.012 | .86 | 1.44 | 0.96 | 0.01 | 0.57 | 0.99 | 0.34
8 | 250 | Easy | High | .5 | 30 | 0.062 | 0.011 | 0.372a,b | 0.019 | 0.098a,b | 0.014 | .92 | 1.44 | 0.96 | 0.02 | 0.61 | 1.01 | 0.34
9 | 250 | Moderate | Low | .3 | 15 | 0.062 | 0.018 | 0.046 | 0.020 | 0.056 | 0.020 | .69 | 0.70 | 0.96 | 0.83 | 0.86 | 0.02 | 0.01
10 | 250 | Moderate | Low | .3 | 30 | 0.062 | 0.018 | 0.042 | 0.021 | 0.048 | 0.020 | .82 | 0.70 | 0.96 | 0.86 | 0.86 | 0.02 | 0.02
11 | 250 | Moderate | Low | .5 | 15 | 0.062 | 0.011 | 0.041 | 0.016 | 0.052 | 0.017 | .69 | 0.70 | 0.96 | 0.91 | 0.86 | 0.02 | 0.02
12 | 250 | Moderate | Low | .5 | 30 | 0.062 | 0.011 | 0.037 | 0.015 | 0.037 | 0.018 | .82 | 0.70 | 0.96 | 0.93 | 0.86 | 0.02 | 0.03
13 | 250 | Moderate | High | .3 | 15 | 0.062 | 0.018 | 0.051 | 0.019 | 0.047 | 0.020 | .88 | 0.53 | 0.96 | 0.43 | 0.88 | 0.04 | 0.01
14 | 250 | Moderate | High | .3 | 30 | 0.062 | 0.018 | 0.048 | 0.019 | 0.059 | 0.019 | .94 | 0.52 | 0.96 | 0.50 | 0.88 | 0.03 | 0.00
15 | 250 | Moderate | High | .5 | 15 | 0.062 | 0.011 | 0.041 | 0.014 | 0.050 | 0.014 | .88 | 0.53 | 0.96 | 0.86 | 0.91 | 0.04 | 0.00
16 | 250 | Moderate | High | .5 | 30 | 0.061 | 0.011 | 0.039 | 0.013 | 0.055 | 0.014 | .94 | 0.52 | 0.96 | 0.89 | 0.91 | 0.03 | 0.00
17 | 250 | Difficult | Low | .3 | 15 | 0.061 | 0.018 | 0.064a | 0.020 | 0.053 | 0.021 | .65 | 0.89 | 0.96 | 0.03 | 0.72 | 0.58 | 0.21
18 | 250 | Difficult | Low | .3 | 30 | 0.061 | 0.018 | 0.067a | 0.021 | 0.052 | 0.022 | .79 | 0.88 | 0.96 | 0.04 | 0.70 | 0.60 | 0.22
19 | 250 | Difficult | Low | .5 | 15 | 0.061 | 0.011 | 0.106a,b | 0.019 | 0.075a | 0.018 | .65 | 0.89 | 0.96 | 0.15 | 0.79 | 0.58 | 0.21
20 | 250 | Difficult | Low | .5 | 30 | 0.061 | 0.011 | 0.138a,b | 0.017 | 0.055 | 0.019 | .79 | 0.88 | 0.96 | 0.23 | 0.78 | 0.60 | 0.20
21 | 250 | Difficult | High | .3 | 15 | 0.061 | 0.018 | 0.112a,b | 0.023 | 0.098a,b | 0.011 | .85 | 1.58 | 0.96 | 0.00 | 0.30 | 1.06 | 0.37
22 | 250 | Difficult | High | .3 | 30 | 0.061 | 0.018 | 0.123a,b | 0.022 | 0.070a | 0.014 | .92 | 1.59 | 0.96 | 0.00 | 0.26 | 1.08 | 0.38
23 | 250 | Difficult | High | .5 | 15 | 0.061 | 0.011 | 0.317a,b | 0.021 | 0.120a,b | 0.011 | .85 | 1.59 | 0.96 | 0.00 | 0.49 | 1.06 | 0.38
24 | 250 | Difficult | High | .5 | 30 | 0.061 | 0.011 | 0.386a,b | 0.020 | 0.102a,b | 0.012 | .92 | 1.59 | 0.96 | 0.01 | 0.47 | 1.08 | 0.38

Note: c = condition; n = number of individuals; b_{i,j-1} = item category difficulty distribution, Easy (assessment inappropriateness) = N(-1.5, 1), Moderate (assessment appropriateness) = N(0, 1), Difficult (assessment inappropriateness) = N(1.5, 1); a_i = item discrimination distribution, Low = U(.31, .58), High = U(.58, 1.13); β = regression weight; k = number of items; p_θ = empirical Type I error rate for actual theta scores; ΔR²_θ = average effect size for significant interactions for actual theta scores; p_X = empirical Type I error rate for number-correct scores; ΔR²_X = average effect size for significant interactions for number-correct scores; p_θ̂ = empirical Type I error rate for estimated theta scores; ΔR²_θ̂ = average effect size for significant interactions for estimated theta scores; α = average internal consistency for the number-correct scores; RMSE = root mean square error for the estimated theta scores; SW_θ, SW_X, SW_θ̂ = proportion of n.s. Shapiro–Wilk tests for the actual theta, number-correct, and estimated theta scores, respectively; sk_X, sk_θ̂ = |skewness| for the number-correct and estimated theta scores, respectively. Iterations per condition = 1,000.
a. Significant Type I error rate based on the results of a binomial test.
b. Significant Type I error rate based on the α ± ½α criterion.
Table 2. Results of Simulation 2 (Normal Fidelity, Distribution of Latent Construct Scores = Standard Normal N(0, 1))

c | n | b_{i,j-1} | a_i | β | k | p_θ | ΔR²_θ | p_X | ΔR²_X | p_θ̂ | ΔR²_θ̂ | α | RMSE | SW_θ | SW_X | SW_θ̂ | sk_X | sk_θ̂
25 | 750 | Easy | Low | .3 | 15 | 0.049 | 0.006 | 0.074a | 0.007 | 0.056 | 0.007 | .66 | 0.82 | 0.95 | 0.00 | 0.44 | 0.55 | 0.23
26 | 750 | Easy | Low | .3 | 30 | 0.049 | 0.006 | 0.069a | 0.007 | 0.057 | 0.007 | .80 | 0.84 | 0.95 | 0.00 | 0.45 | 0.57 | 0.24
27 | 750 | Easy | Low | .5 | 15 | 0.049 | 0.004 | 0.167a,b | 0.007 | 0.089a,b | 0.006 | .66 | 0.82 | 0.95 | 0.00 | 0.65 | 0.55 | 0.22
28 | 750 | Easy | Low | .5 | 30 | 0.049 | 0.004 | 0.222a,b | 0.006 | 0.079a,b | 0.006 | .80 | 0.84 | 0.95 | 0.01 | 0.66 | 0.56 | 0.24
29 | 750 | Easy | High | .3 | 15 | 0.049 | 0.006 | 0.162a,b | 0.008 | 0.084a,b | 0.006 | .86 | 1.31 | 0.95 | 0.00 | 0.11 | 1.00 | 0.31
30 | 750 | Easy | High | .3 | 30 | 0.049 | 0.006 | 0.142a,b | 0.007 | 0.066a | 0.006 | .92 | 1.31 | 0.95 | 0.00 | 0.11 | 1.01 | 0.32
31 | 750 | Easy | High | .5 | 15 | 0.049 | 0.004 | 0.627a,b | 0.009 | 0.173a,b | 0.006 | .86 | 1.31 | 0.95 | 0.00 | 0.52 | 1.00 | 0.31
32 | 750 | Easy | High | .5 | 30 | 0.049 | 0.004 | 0.710a,b | 0.009 | 0.158a,b | 0.006 | .92 | 1.31 | 0.95 | 0.00 | 0.52 | 1.00 | 0.32
33 | 750 | Moderate | Low | .3 | 15 | 0.049 | 0.006 | 0.056 | 0.006 | 0.052 | 0.006 | .69 | 0.68 | 0.95 | 0.32 | 0.76 | 0.02 | 0.01
34 | 750 | Moderate | Low | .3 | 30 | 0.049 | 0.006 | 0.046 | 0.007 | 0.043 | 0.007 | .82 | 0.66 | 0.95 | 0.44 | 0.77 | 0.02 | 0.02
35 | 750 | Moderate | Low | .5 | 15 | 0.049 | 0.004 | 0.046 | 0.006 | 0.055 | 0.005 | .69 | 0.68 | 0.95 | 0.69 | 0.82 | 0.02 | 0.02
36 | 750 | Moderate | Low | .5 | 30 | 0.049 | 0.004 | 0.043 | 0.005 | 0.038 | 0.006 | .82 | 0.66 | 0.95 | 0.81 | 0.82 | 0.02 | 0.02
37 | 750 | Moderate | High | .3 | 15 | 0.049 | 0.006 | 0.050 | 0.006 | 0.053 | 0.006 | .88 | 0.51 | 0.95 | 0.01 | 0.69 | 0.03 | 0.00
38 | 750 | Moderate | High | .3 | 30 | 0.049 | 0.006 | 0.044 | 0.006 | 0.044 | 0.006 | .94 | 0.53 | 0.95 | 0.01 | 0.71 | 0.03 | 0.01
39 | 750 | Moderate | High | .5 | 15 | 0.049 | 0.004 | 0.065a | 0.005 | 0.068a | 0.005 | .88 | 0.50 | 0.95 | 0.51 | 0.79 | 0.03 | 0.00
40 | 750 | Moderate | High | .5 | 30 | 0.049 | 0.004 | 0.056 | 0.004 | 0.055 | 0.005 | .94 | 0.53 | 0.95 | 0.67 | 0.80 | 0.03 | 0.02
41 | 750 | Difficult | Low | .3 | 15 | 0.048 | 0.006 | 0.075a | 0.007 | 0.058 | 0.007 | .66 | 0.86 | 0.95 | 0.00 | 0.34 | 0.58 | 0.21
42 | 750 | Difficult | Low | .3 | 30 | 0.049 | 0.006 | 0.081a,b | 0.008 | 0.043 | 0.007 | .79 | 0.87 | 0.95 | 0.00 | 0.36 | 0.60 | 0.20
43 | 750 | Difficult | Low | .5 | 15 | 0.049 | 0.004 | 0.164a,b | 0.007 | 0.075a | 0.007 | .66 | 0.86 | 0.95 | 0.00 | 0.56 | 0.58 | 0.18
44 | 750 | Difficult | Low | .5 | 30 | 0.049 | 0.004 | 0.269a,b | 0.007 | 0.059 | 0.006 | .79 | 0.87 | 0.95 | 0.00 | 0.60 | 0.60 | 0.19
45 | 750 | Difficult | High | .3 | 15 | 0.049 | 0.006 | 0.159a,b | 0.008 | 0.066a | 0.007 | .85 | 1.40 | 0.95 | 0.00 | 0.07 | 1.07 | 0.33
46 | 750 | Difficult | High | .3 | 30 | 0.049 | 0.006 | 0.180a,b | 0.008 | 0.065a | 0.007 | .92 | 1.41 | 0.95 | 0.00 | 0.07 | 1.09 | 0.33
47 | 750 | Difficult | High | .5 | 15 | 0.049 | 0.004 | 0.635a,b | 0.010 | 0.193a,b | 0.006 | .85 | 1.40 | 0.95 | 0.00 | 0.49 | 1.07 | 0.33
48 | 750 | Difficult | High | .5 | 30 | 0.049 | 0.004 | 0.765a,b | 0.010 | 0.192a,b | 0.006 | .92 | 1.41 | 0.95 | 0.00 | 0.48 | 1.09 | 0.33

Note: Column definitions are as in Table 1: Easy (assessment inappropriateness) = N(-1.5, 1), Moderate (assessment appropriateness) = N(0, 1), Difficult (assessment inappropriateness) = N(1.5, 1); a_i: Low = U(.31, .58), High = U(.58, 1.13). Iterations per condition = 1,000.
a. Significant Type I error rate based on the results of a binomial test.
b. Significant Type I error rate based on the α ± ½α criterion.
Table 3. Results of Simulation 3 (High Fidelity, Distribution of Latent Construct Scores = Standard Normal N(0, 1))

c | n | b_{i,j-1} | a_i | β | k | p_θ | ΔR²_θ | p_X | ΔR²_X | p_θ̂ | ΔR²_θ̂ | α | RMSE | SW_θ | SW_X | SW_θ̂ | sk_X | sk_θ̂
49 | 250 | Easy | Low | .3 | 15 | 0.062 | 0.018 | 0.067a | 0.021 | 0.055 | 0.020 | .64 | 0.7 | 0.96 | 0.01 | 0.78 | 0.64 | 0.23
50 | 250 | Easy | Low | .3 | 30 | 0.062 | 0.018 | 0.078a,b | 0.020 | 0.054 | 0.022 | .78 | 0.69 | 0.96 | 0.01 | 0.75 | 0.67 | 0.24
51 | 250 | Easy | Low | .5 | 15 | 0.062 | 0.011 | 0.099a,b | 0.018 | 0.064a | 0.018 | .64 | 0.70 | 0.96 | 0.10 | 0.89 | 0.64 | 0.23
52 | 250 | Easy | Low | .5 | 30 | 0.062 | 0.011 | 0.132a,b | 0.018 | 0.057 | 0.019 | .78 | 0.69 | 0.96 | 0.15 | 0.83 | 0.67 | 0.24
53 | 250 | Easy | High | .3 | 15 | 0.062 | 0.018 | 0.128a,b | 0.024 | 0.079a,b | 0.022 | .84 | 1.57 | 0.96 | 0.00 | 0.03 | 1.34 | 0.76
54 | 250 | Easy | High | .3 | 30 | 0.061 | 0.018 | 0.152a,b | 0.023 | 0.085a,b | 0.021 | .91 | 1.56 | 0.96 | 0.00 | 0.04 | 1.38 | 0.76
55 | 250 | Easy | High | .5 | 15 | 0.061 | 0.011 | 0.390a,b | 0.023 | 0.215a,b | 0.019 | .84 | 1.57 | 0.96 | 0.00 | 0.26 | 1.34 | 0.77
56 | 250 | Easy | High | .5 | 30 | 0.062 | 0.011 | 0.467a,b | 0.023 | 0.224a,b | 0.019 | .91 | 1.56 | 0.96 | 0.00 | 0.27 | 1.37 | 0.76
57 | 250 | Moderate | Low | .3 | 15 | 0.062 | 0.018 | 0.047 | 0.020 | 0.044 | 0.021 | .68 | 0.56 | 0.96 | 0.77 | 0.96 | 0.01 | 0.00
58 | 250 | Moderate | Low | .3 | 30 | 0.062 | 0.018 | 0.044 | 0.020 | 0.058 | 0.020 | .81 | 0.56 | 0.96 | 0.80 | 0.97 | 0.01 | 0.00
59 | 250 | Moderate | Low | .5 | 15 | 0.062 | 0.011 | 0.041 | 0.016 | 0.050 | 0.016 | .68 | 0.56 | 0.96 | 0.89 | 0.97 | 0.01 | 0.00
60 | 250 | Moderate | Low | .5 | 30 | 0.062 | 0.011 | 0.040 | 0.015 | 0.050 | 0.017 | .81 | 0.56 | 0.96 | 0.90 | 0.96 | 0.01 | 0.00
61 | 250 | Moderate | High | .3 | 15 | 0.062 | 0.018 | 0.047 | 0.019 | 0.056 | 0.018 | .88 | 0.38 | 0.96 | 0.11 | 0.93 | 0.02 | 0.01
62 | 250 | Moderate | High | .3 | 30 | 0.062 | 0.018 | 0.042 | 0.020 | 0.059 | 0.019 | .93 | 0.39 | 0.96 | 0.14 | 0.93 | 0.02 | 0.01
63 | 250 | Moderate | High | .5 | 15 | 0.062 | 0.011 | 0.031 | 0.014 | 0.042 | 0.014 | .88 | 0.38 | 0.96 | 0.79 | 0.95 | 0.02 | 0.01
64 | 250 | Moderate | High | .5 | 30 | 0.061 | 0.011 | 0.032 | 0.014 | 0.054 | 0.014 | .93 | 0.39 | 0.96 | 0.86 | 0.95 | 0.02 | 0.01
65 | 250 | Difficult | Low | .3 | 15 | 0.061 | 0.018 | 0.075a | 0.021 | 0.050 | 0.020 | .63 | 0.72 | 0.96 | 0.01 | 0.74 | 0.66 | 0.24
66 | 250 | Difficult | Low | .3 | 30 | 0.061 | 0.018 | 0.066a | 0.022 | 0.049 | 0.021 | .78 | 0.71 | 0.96 | 0.01 | 0.73 | 0.69 | 0.24
67 | 250 | Difficult | Low | .5 | 15 | 0.061 | 0.011 | 0.115a,b | 0.020 | 0.071a | 0.018 | .63 | 0.72 | 0.96 | 0.07 | 0.83 | 0.66 | 0.24
68 | 250 | Difficult | Low | .5 | 30 | 0.061 | 0.011 | 0.150a,b | 0.018 | 0.067a | 0.018 | .78 | 0.70 | 0.96 | 0.11 | 0.83 | 0.69 | 0.24
69 | 250 | Difficult | High | .3 | 15 | 0.061 | 0.018 | 0.141a,b | 0.025 | 0.098a,b | 0.022 | .83 | 1.65 | 0.96 | 0.00 | 0.03 | 1.39 | 0.80
70 | 250 | Difficult | High | .3 | 30 | 0.061 | 0.018 | 0.155a,b | 0.024 | 0.093a,b | 0.022 | .91 | 1.65 | 0.96 | 0.00 | 0.02 | 1.42 | 0.80
71 | 250 | Difficult | High | .5 | 15 | 0.061 | 0.011 | 0.411a,b | 0.025 | 0.246a,b | 0.021 | .83 | 1.65 | 0.96 | 0.00 | 0.22 | 1.40 | 0.81
72 | 250 | Difficult | High | .5 | 30 | 0.061 | 0.011 | 0.488a,b | 0.025 | 0.235a,b | 0.020 | .91 | 1.65 | 0.96 | 0.00 | 0.24 | 1.43 | 0.81

Note: Column definitions are as in Table 1, except that the item category difficulty distributions are Easy (assessment inappropriateness) = N(-1.5, 0.5), Moderate (assessment appropriateness) = N(0, 0.5), and Difficult (assessment inappropriateness) = N(1.5, 0.5); a_i: Low = U(.31, .58), High = U(.58, 1.13). Iterations per condition = 1,000.
a. Significant Type I error rate based on the results of a binomial test.
b. Significant Type I error rate based on the α ± ½α criterion.
Table 4. Results of Simulation 4 (High Fidelity, Distribution of Latent Construct Scores = Standard Normal N(0, 1))

c | n | b_{i,j-1} | a_i | β | k | p_θ | ΔR²_θ | p_X | ΔR²_X | p_θ̂ | ΔR²_θ̂ | α | RMSE | SW_θ | SW_X | SW_θ̂ | sk_X | sk_θ̂
73 | 750 | Easy | Low | .3 | 15 | 0.049 | 0.006 | 0.086a,b | 0.007 | 0.066b | 0.007 | .64 | 0.65 | 0.95 | 0.00 | 0.16 | 0.65 | 0.26
74 | 750 | Easy | Low | .3 | 30 | 0.049 | 0.006 | 0.080a,b | 0.007 | 0.056 | 0.007 | .78 | 0.65 | 0.95 | 0.00 | 0.16 | 0.67 | 0.26
75 | 750 | Easy | Low | .5 | 15 | 0.049 | 0.004 | 0.183a,b | 0.007 | 0.100a,b | 0.006 | .64 | 0.65 | 0.95 | 0.00 | 0.54 | 0.65 | 0.26
76 | 750 | Easy | Low | .5 | 30 | 0.049 | 0.004 | 0.268a,b | 0.006 | 0.086a,b | 0.006 | .78 | 0.65 | 0.95 | 0.00 | 0.55 | 0.67 | 0.26
77 | 750 | Easy | High | .3 | 15 | 0.049 | 0.006 | 0.216a,b | 0.008 | 0.109a,b | 0.007 | .84 | 1.36 | 0.95 | 0.00 | 0.00 | 1.35 | 0.62
78 | 750 | Easy | High | .3 | 30 | 0.049 | 0.006 | 0.199a,b | 0.009 | 0.095a,b | 0.007 | .91 | 1.36 | 0.95 | 0.00 | 0.00 | 1.38 | 0.62
79 | 750 | Easy | High | .5 | 15 | 0.049 | 0.004 | 0.740a,b | 0.011 | 0.407a,b | 0.007 | .84 | 1.36 | 0.95 | 0.00 | 0.07 | 1.35 | 0.62
80 | 750 | Easy | High | .5 | 30 | 0.049 | 0.004 | 0.842a,b | 0.012 | 0.388a,b | 0.007 | .91 | 1.35 | 0.95 | 0.00 | 0.08 | 1.38 | 0.62
81 | 750 | Moderate | Low | .3 | 15 | 0.049 | 0.006 | 0.047 | 0.006 | 0.049 | 0.006 | .68 | 0.56 | 0.95 | 0.15 | 0.90 | 0.01 | 0.00
82 | 750 | Moderate | Low | .3 | 30 | 0.049 | 0.006 | 0.045 | 0.007 | 0.046 | 0.006 | .81 | 0.56 | 0.95 | 0.23 | 0.89 | 0.01 | 0.00
83 | 750 | Moderate | Low | .5 | 15 | 0.049 | 0.004 | 0.043 | 0.005 | 0.047 | 0.005 | .68 | 0.56 | 0.95 | 0.61 | 0.93 | 0.01 | 0.00
84 | 750 | Moderate | Low | .5 | 30 | 0.049 | 0.004 | 0.046 | 0.005 | 0.042 | 0.006 | .81 | 0.56 | 0.95 | 0.74 | 0.94 | 0.01 | 0.00
85 | 750 | Moderate | High | .3 | 15 | 0.049 | 0.006 | 0.043 | 0.006 | 0.049 | 0.006 | .88 | 0.39 | 0.95 | 0.00 | 0.70 | 0.02 | 0.00
86 | 750 | Moderate | High | .3 | 30 | 0.049 | 0.006 | 0.041 | 0.006 | 0.042 | 0.006 | .93 | 0.39 | 0.95 | 0.00 | 0.72 | 0.02 | 0.00
87 | 750 | Moderate | High | .5 | 15 | 0.049 | 0.004 | 0.045 | 0.005 | 0.054 | 0.004 | .88 | 0.39 | 0.95 | 0.32 | 0.92 | 0.02 | 0.00
88 | 750 | Moderate | High | .5 | 30 | 0.049 | 0.004 | 0.040 | 0.004 | 0.047 | 0.005 | .93 | 0.39 | 0.95 | 0.46 | 0.92 | 0.02 | 0.00
89 | 750 | Difficult | Low | .3 | 15 | 0.048 | 0.006 | 0.080a,b | 0.008 | 0.059 | 0.007 | .64 | 0.68 | 0.95 | 0.00 | 0.12 | 0.67 | 0.26
90 | 750 | Difficult | Low | .3 | 30 | 0.049 | 0.006 | 0.094a,b | 0.007 | 0.041 | 0.007 | .78 | 0.67 | 0.95 | 0.00 | 0.12 | 0.69 | 0.26
91 | 750 | Difficult | Low | .5 | 15 | 0.049 | 0.004 | 0.180a,b | 0.007 | 0.094a,b | 0.007 | .64 | 0.68 | 0.95 | 0.00 | 0.50 | 0.67 | 0.26
92 | 750 | Difficult | Low | .5 | 30 | 0.049 | 0.004 | 0.315a,b | 0.007 | 0.076a,b | 0.006 | .78 | 0.67 | 0.95 | 0.00 | 0.51 | 0.69 | 0.27
93 | 750 | Difficult | High | .3 | 15 | 0.049 | 0.006 | 0.199a,b | 0.009 | 0.106a,b | 0.008 | .84 | 1.46 | 0.95 | 0.00 | 0.00 | 1.40 | 0.66
94 | 750 | Difficult | High | .3 | 30 | 0.049 | 0.006 | 0.236a,b | 0.009 | 0.107a,b | 0.008 | .91 | 1.46 | 0.95 | 0.00 | 0.00 | 1.43 | 0.65
95 | 750 | Difficult | High | .5 | 15 | 0.049 | 0.004 | 0.734a,b | 0.012 | 0.436a,b | 0.008 | .84 | 1.46 | 0.95 | 0.00 | 0.06 | 1.40 | 0.66
96 | 750 | Difficult | High | .5 | 30 | 0.049 | 0.004 | 0.849a,b | 0.013 | 0.404a,b | 0.008 | .91 | 1.46 | 0.95 | 0.00 | 0.05 | 1.43 | 0.65

Note: Column definitions are as in Table 1, except that the item category difficulty distributions are Easy (assessment inappropriateness) = N(-1.5, 0.5), Moderate (assessment appropriateness) = N(0, 0.5), and Difficult (assessment inappropriateness) = N(1.5, 0.5); a_i: Low = U(.31, .58), High = U(.58, 1.13). Iterations per condition = 1,000.
a. Significant Type I error rate based on the results of a binomial test.
b. Significant Type I error rate based on the α ± ½α criterion.
[Figure 3. Empirical Type I error rates for the interaction term of a simulated moderated multiple regression model under conditions of assessment appropriateness]

[Figure 2. Distribution of spurious interactions for number-correct scores and estimated theta scores]
from inappropriate assessments were used, χ²(1, N = 96,000) = 5,008.55, p < .001, odds ratio = 8.13. In addition, estimated theta scores resulted in empirical Type I error rates that were above the acceptable interval in 33 of the 64 (51%) inappropriate assessment conditions. A direct logistic regression analysis indicated that the likelihood of committing a Type I error was 2.4 times greater when estimated theta scores from inappropriate assessments were used, χ²(1, N = 96,000) = 918.15, p < .001, odds ratio = 2.40.
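The iteration-level analysis has this general shape in R; the data frame below is synthetic, built only from the mean rates in Table 5 (.04 vs. .24 for number-correct scores), so its odds ratio will not reproduce the reported 8.13 exactly:

```r
# Direct logistic regression of the Type I error indicator on assessment
# (in)appropriateness, with the odds ratio from the exponentiated slope.
set.seed(1)
iteration_results <- data.frame(
  inappropriate = rep(c(FALSE, TRUE), each = 500),
  type1_error   = rbinom(1000, 1, rep(c(0.04, 0.24), each = 500))
)
fit <- glm(type1_error ~ inappropriate, family = binomial,
           data = iteration_results)
exp(coef(fit)["inappropriateTRUE"])  # odds ratio for inappropriate assessments
```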
Impact of the Independent Variables on the Empirical Type I Error Rate
Table 5 represents the mean empirical Type I error rate as well as direct logistic regression tests for the levels of each independent variable. The dependent variable in the logistic regression analyses was the occurrence of a Type I error at the iteration level, coded as 1 if the $\Delta R^2$ between the additive and multiplicative model was significant ($p \le .05$) and 0 if it was not. All of the independent variables were entered into the model simultaneously as categorical predictors. A general pattern can be identified in these results such that higher empirical Type I error rates were observed for the stronger level of each independent variable. This pattern indicates that each psychometric characteristic varied in the simulations had an overall effect on the empirical Type I error rates for the interaction term.
Several important findings can be identified from these results. First, the psychometric characteristics that were manipulated in this simulation had a stronger overall effect on Type I errors when the variables were operationalized as number-correct scores than as estimated theta scores. These results suggest that number-correct scores are more sensitive to
[Figure 4. Empirical Type I error rates for the interaction term of a simulated moderated multiple regression model under conditions of assessment inappropriateness]
Table 5. Impact of Individual Predictors on Empirical Type I Error Rates

Predictor | Level | p_θ | p_X | p_θ̂ | Wald χ² (X)a | df | B | ORc | Wald χ² (θ̂)b | df | B | ORc
Appropriateness (difficulty) | Appropriate | 0.06 | 0.04 | 0.05 | 5,008.55*** | 1 | 2.096 | 8.13 | 918.15*** | 1 | 0.875 | 2.40
 | Inappropriate | 0.06 | 0.24 | 0.11 | | | | | | | |
Discrimination | Low | 0.06 | 0.10 | 0.06 | 4,437.19*** | 1 | 1.339 | 3.82 | 1,185.01*** | 1 | 0.836 | 2.30
 | High | 0.06 | 0.25 | 0.12 | | | | | | | |
Sample size | 250 | 0.06 | 0.16 | 0.09 | 154.09*** | 1 | 0.234 | 1.26 | 9.93*** | 1 | 0.073 | 1.08
 | 750 | 0.06 | 0.19 | 0.09 | | | | | | | |
Fidelity | Normal | 0.06 | 0.16 | 0.07 | 165.42*** | 1 | 0.242 | 1.27 | 364.77*** | 1 | 0.446 | 1.56
 | High | 0.06 | 0.19 | 0.11 | | | | | | | |
Items | 15 | 0.06 | 0.16 | 0.09 | 154.09*** | 1 | 0.234 | 1.26 | 9.93*** | 1 | 0.073 | 1.08
 | 30 | 0.06 | 0.19 | 0.09 | | | | | | | |
Beta weights | .3 | 0.06 | 0.09 | 0.06 | 4,876.13*** | 1 | 1.417 | 4.12 | 881.98*** | 1 | 0.710 | 2.03
 | .5 | 0.06 | 0.26 | 0.12 | | | | | | | |

Note: OR = odds ratio. The first set of Wald χ², B, and OR columns is from the direct logistic regression for number-correct score Type I errors; the second set is from the regression for estimated theta score Type I errors.
a. Omnibus full model, χ²(1, N = 96,000) = 17,157.51, p < .001, R² = .27.
b. Omnibus full model, χ²(1, N = 96,000) = 3,571.47, p < .001, R² = .08.
c. In each case, the OR reported corresponds to increases in the predictor variable (e.g., increased assessment inappropriateness results in higher likelihoods of Type I errors, increases in discrimination result in higher likelihoods of Type I errors, etc.).
***p < .001.
measurement effects in parametric analyses than are IRT-derived theta estimates. For both
dependent variables, assessment appropriateness was the most impactful predictor of Type I
errors, followed by item discrimination and regression weights. This result confirms and
extends the effects of assessment appropriateness identified by Kang and Waller (2005), as well
as arguments raised by Busemeyer (1980) on the role of assessment difficulty in parametric
statistics.
Strength of Spurious Interaction Effects
Finally, the authors were interested in understanding how assessment appropriateness affected the strength of spurious interactions for the different scoring methods. The columns labeled $\Delta R^2$ for each respective scoring method in Tables 1 through 4 indicate the average strength of the interaction when a spurious interaction was identified. Because sample size is known to affect the strength of interaction effects, the authors used a multivariate analysis of covariance (MANCOVA) to determine the effect of assessment appropriateness using sample size as a covariate. After adjusting for the effects of sample size, the results indicated a significant effect of assessment appropriateness on the strength of spurious interaction effects for number-correct scores, F(1, 93) = 51.92, p < .001, partial $\eta^2$ = .36, such that the average interaction strength in the inappropriate assessment conditions (M = 0.015, SD = 0.007) was significantly greater than in the appropriate assessment conditions (M = 0.011, SD = 0.006). A similar, albeit much weaker, result was also identified for estimated theta scores, F(1, 93) = 4.47, p < .05, partial $\eta^2$ = .05, such that the average interaction strength in the inappropriate assessment conditions (M = 0.013, SD = 0.006) was significantly greater than in the appropriate assessment conditions (M = 0.012, SD = 0.006). No significant difference was identified for actual theta scores. These results indicate that assessment appropriateness has an effect on the strength of spurious interaction effects for number-correct scores and estimated theta scores and that the effect is considerably stronger in the number-correct score conditions.
Discussion
Theoretical and empirical evidence has emerged to suggest that using IRT to operationalize an individual's standing on a latent construct has important measurement implications over the use of number-correct scores (Borsboom, 2008; Embretson, 1996, 2006; Embretson & DeBoeck, 1994; Harwell & Gatti, 2001; Kang & Waller, 2005; Perline et al., 1979; Reise & Haviland, 2005; Reise et al., 2005; Wainer, 1982). Specifically, IRT-derived theta scores have been demonstrated to be resistant to inflated Type I error rates in moderated statistical models, potentially due to achieving an interval, or nearly interval, scale of measurement (Embretson, 1996; Kang & Waller, 2005). This previous work has been limited to applications of dichotomous data and restrictive IRT models. Therefore, the authors' goal was to extend the understanding of these potentially beneficial measurement properties by modeling multicategory data and implementing a polytomous IRT model. These studies represent a generalizability trend such that each successive study branches further away from the measurement ideal and into the realities of psychological data.
It is imperative to point out that under certain conditions, significantly inflated Type I error rates were observed for both the estimated theta scores and the number-correct scores. This result for the number-correct scores was expected; however, this result for the estimated theta scores was somewhat unexpected. Strong psychometric influences still resulted in inflated Type I error rates for the estimated theta scores from the GRM. However, it was often the case that the Type I error rate of the number-correct scores far exceeded that of the estimated theta scores. For example, in conditions 7, 8, 23, and 24 in Table 1, the Type I error rates for the number-correct scores ranged from .296 to .386, whereas the respective Type I error rates for the estimated theta scores ranged from .089 to .120. Clearly, given the alternative, the estimated theta scores would be more attractive to researchers in these conditions. In addition, in cases where the Type I error rates were grossly inflated, such as conditions 31, 32, 47, and 48 in Table 2 and 79, 80, 95, and 96 in Table 4, the Type I error rate for the number-correct scores was approximately 200% to 450% higher than the Type I error rate for the estimated theta scores. An examination of Figure 2 clearly reveals these disparities between the number-correct scores and estimated theta scores. Although the estimated theta scores did not perform perfectly within the acceptable limits, these results demonstrate a clear preference for their use in applied research when certain psychometric conditions exist.
Another prominent result was the role of assessment appropriateness on Type I error rates for both number-correct scores and estimated theta scores. Assessment appropriateness is defined as the congruence between the reliability of an assessment and the latent construct distribution of the individuals responding to the assessment (see Figure 1). The results of these simulations demonstrated that, under conditions of assessment appropriateness, there is no concern as to unacceptable Type I error rates for any psychometric condition or scoring technique. The complementary result, that spurious interactions were only observed under some conditions of assessment inappropriateness, was also true. Embretson (1996) and Kang and Waller (2005) also identified assessment appropriateness and inappropriateness as the primary factor in their simulations of spurious interaction effects. Embretson (1996) determined that the degree and direction of the inappropriateness fully accounted for the nature of the interaction with regard to treatment groups in a simulated factorial ANOVA. Prior to these studies, Maxwell and Delaney (1985) demonstrated how various distributional shapes of latent constructs can interact with assessment difficulty (appropriateness) to produce artificial group mean differences when the observed scores and latent scores are related through a nonlinear, monotonic relationship.

However, the expectation that this effect would not occur when using estimated theta scores was not fully supported. This finding suggests that certain conditions may reduce the level of linearity in the theta–estimated theta relationship. Specifically, an examination of the root mean square error (RMSE) values in these conditions indicates less precise parameter recovery. This suggests that the degree of congruence between the actual and estimated theta scores was eroded in exceedingly easy or difficult assessments. Ferrando (2009) explored the issue of measurement inappropriateness, in which the appropriateness of an item can be constrained to a particular range of the trait continuum, thus degrading the fit of a linear model above and below the ceiling and floor limits. Indices for determining this range are provided for binary items in a unidimensional, two-parameter model as well as in multidimensional cases, and they can provide useful information for determining an acceptable range of measurement appropriateness.
These results also suggest a more complex relationship underlying data structures and the
assessment of moderators in MMR analyses than perhaps previously thought. Paunonen and
Jackson (1988) conducted a simulation in which Type I error rates were compared between ordi-
nary least squares regression (OLS) and principal components regression (PCR) in relation to
the multicollinearity of the predictors. Their results indicated that OLS performed much better
than did PCR with regard to accurate moderator detection, and that linear transformations of the
data had little effect on the Type I error rates for either procedure. In their study, the researchers
simulated random effects data from normal distributions just as in this study, but did not investi-
gate any influences of psychometric characteristics on the data (i.e., difficulty, discrimination,
assessment appropriateness, etc.). Conceptually, Paunonen and Jackson (1988) generated data
as if they were able to collect actual theta scores. It is not surprising, therefore, that their results
were well within the normal Type I error rate for MMR. An examination of the empirical Type I error rates for the actual theta scores in Tables 1 through 4 replicates these findings. The generalizability of the results from Paunonen and Jackson's simulation is therefore limited, however, because actual latent trait scores cannot be observed directly. The results of this study indicate
that psychometric characteristics such as assessment appropriateness and the operationalization
of the scores have a significant influence on the performance of MMR analyses.
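The baseline behavior just described, that MMR applied to the actual theta scores holds its nominal Type I error rate, can be illustrated in a few lines of R. This is a sketch of the generating model given in the Appendix (no latent interaction; beta = .5, sigma_e = .7071), not the simulation code used in the study; the sample size and replication count here are arbitrary:

```r
## Minimal Type I error check for MMR on true theta scores, following the
## Appendix generating model: theta3 has no interaction term at all.
set.seed(2012)
n.reps <- 2000; n <- 250; b <- .5
s.e <- sqrt(1 - 2 * b^2)                      # .7071 when b = .5 (Equation A1e)
p.int <- replicate(n.reps, {
  th1 <- rnorm(n); th2 <- rnorm(n)
  th3 <- b * th1 + b * th2 + rnorm(n, 0, s.e) # no latent interaction
  summary(lm(th3 ~ th1 * th2))$coefficients["th1:th2", "Pr(>|t|)"]
})
mean(p.int < .05)                             # empirical Type I error, near .05
```

Because the predictors are the latent scores themselves, any rejection of the interaction term is a false positive, and the empirical rate stays near the nominal .05 level, as in Tables 1 through 4.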
Limitations and Extensions
Violations of the normality assumption in MMR models were present in these simulations,
lending some caution to the interpretation of the results. Specifically, skewness was observed in
the simulated dependent variables, and the magnitude of the skew was positively related to the
empirical Type I error rate for both number-correct scores and estimated theta scores. In addi-
tion, the results of the Shapiro–Wilk tests indicated that cases of residual nonnormality also
increased with the empirical Type I error rate for both number-correct scores and estimated
theta scores (see Tables 1-4). To examine these results more closely, score distributions were
generated for three separate conditions in which the skew of the scores appeared to be strongly
related to their respective Type I error rates. Figure 5 contains distributions for the actual theta
scores and the derived scores (estimated theta and number-correct scores) from conditions 12,
44, and 96. These conditions represent cases in which number-correct scores and theta scores
performed equally well (condition 12), number-correct scores performed poorly but theta scores
performed well (condition 44), and neither number-correct scores nor theta scores performed
well (condition 96). The commonality that can be observed here is that the poor performance is
clearly aligned with the nonnormal score distributions, and the greatest amount of nonnormality
is observed under conditions of assessment inappropriateness in which the latent construct dis-
tribution is poorly matched with the test information function (reliability) of the assessment.
These findings create a potential confound such that the empirical Type I error rates may be
partially related to these violations of normality, which were observed as a result of the simula-
tions rather than a controlled factor. Kang and Waller (2005) identified a similar pattern in their
study involving dichotomous data, and in a brief investigation, the researchers were able to cor-
rect some of the effects of nonnormality using the Box-Cox transformation. Specifically, moder-
ate empirical Type I error rates responded well to this correction, but higher rates were not fully
mitigated to the nominal level of .05 (Kang & Waller, 2005).
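The diagnostics and the correction discussed above can be sketched as follows, assuming the MASS package and hypothetical variables `y` (a simulated criterion) and `x1`, `x2` (number-correct predictor scores); this is an illustration of the general Box-Cox procedure, not the specific correction reported by Kang and Waller (2005):

```r
## Residual-normality check and Box-Cox correction (a sketch).
library(MASS)

mod <- lm(y ~ x1 * x2)
shapiro.test(residuals(mod))          # flags residual nonnormality (n <= 5000)

## Box-Cox requires a strictly positive response, so shift the scores first
y.pos <- y - min(y) + 1
bc <- boxcox(y.pos ~ x1 * x2, plotit = FALSE)
lambda <- bc$x[which.max(bc$y)]       # lambda maximizing the profile likelihood
y.bc <- if (abs(lambda) < 1e-8) log(y.pos) else (y.pos^lambda - 1) / lambda

summary(lm(y.bc ~ x1 * x2))           # re-test the interaction on corrected scores
```

Consistent with Kang and Waller's report, a transformation of this kind can be expected to reduce, but not always eliminate, the inflation when the underlying nonnormality is severe.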
It is conceivable that variations in observed nonnormality are a result of variations in nonli-
nearity between the overall scale (test) response function and the latent trait. A preliminary
investigation into scatterplots of individual iterations confirmed this relationship (results avail-
able on request). Thus, these relationships could be used as an indicator of a risk for spurious
interactions.2
An additional indication of this effect is the pattern of parameter recovery values
(RMSE) reported in Tables 1 through 4. As nonnormality increased (and in inappropriate assess-
ment conditions), the IRT parameter estimation of the theta scores worsened.
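The scatterplot screen suggested above takes only a couple of lines; this sketch reuses the hypothetical `responses` and `theta.hat` objects from the earlier ltm example:

```r
## Eyeball the linearity of the scale response function: flattening of the
## trend at the extremes signals the ceiling/floor nonlinearity that
## accompanied inflated Type I error rates in these simulations.
nc <- rowSums(responses)                          # number-correct (summed) scores
plot(theta.hat, nc, xlab = expression(hat(theta)), ylab = "Number-correct score")
lines(lowess(theta.hat, nc), lwd = 2)             # smoothed trend
```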
Finally, it is pertinent to discuss the inherent arbitrariness of the scaling metrics and the use
of the GRM to generate the response score matrices. The actual theta scores were generated
without a significant interaction and derived scores were tested for a significant interaction.
Although this practice is common in measurement simulations involving the translation from
IRT scores to other scoring methods (Embretson, 1996; Harwell et al., 1996; Kang & Waller,
2005), an argument can be made that the authors are favoring the GRM scores as a nonarbitrary
trait scale and implicitly satisfying the assumptions of the IRT scale but not the number-correct
score scale. A limiting factor that may arise here is that the results only extend to empirical
situations in which the theta scale is regarded as the true metric. This limitation highlights the
possibility that the interaction null hypothesis is scale specific and the findings may not gener-
alize to situations in which another metric is considered to be the true measurement scale.3
Conclusion
Overall, the results of this study support the use of IRT for the operationalization of latent constructs. However, this support is not unqualified: the direct utility of IRT-derived theta scores for parametric statistics is limited to suboptimal psychometric conditions (i.e.,
assessment inappropriateness). These conditions resulted in ceiling and floor effects, which can
be identified in preliminary data analyses; however, some evidence indicates that extreme cases
will not be fully corrected with typical score transformations.
If a researcher can demonstrate that the reliability of the assessment and the distribution of
construct scores for the individuals are reasonably matched, there is no evidence here that the
Type I error rate will reach an unacceptable level for any scoring technique.

[Figure 5. Score distributions for conditions 12, 44, and 96]

However,
Embretson (1996) cautioned that these conditions are difficult to anticipate, and the use of theta
estimates could be justified as a default measurement technique to assuage any concerns.
Furthermore, De Boeck and Wilson (2004) suggested that experimental hypotheses can be
tested within an appropriate IRT model. This integration of measurement and experimental test-
ing could provide another avenue for greater decision-making accuracy.
The investigation of the performance of polytomous IRT models in a variety of contexts is
still an important avenue of measurement research. Harwell and Gatti (2001) specifically iden-
tified a need for a deeper understanding of the properties of the scores generated from complex
models such as the GRM as well as various item and scale properties that can influence those
scores. The simulations conducted in this study represent an important step toward addressing these needs.
Appendix
The error variance term can be derived in the following manner. First, the predictor variables $\theta_1$ and $\theta_2$ and the criterion variable $\theta_3$ are normally distributed as N(0.00, 1.00). Because the standard deviation is simply the square root of the variance, the variance of the predictor and criterion variables is equal to one. Given these conditions, the following derivation gives the error term for the regression models:

$$\sigma^2_{\theta_3} = \beta_1^2 \sigma^2_{\theta_1} + \beta_2^2 \sigma^2_{\theta_2} + \sigma^2_e, \tag{A1a}$$

where $\sigma^2_{\theta_1}$, $\sigma^2_{\theta_2}$, and $\sigma^2_{\theta_3}$ are all equal to 1.00; therefore,

$$1 = \beta_1^2 + \beta_2^2 + \sigma^2_e, \tag{A1b}$$

$$1 - \beta_1^2 - \beta_2^2 = \sigma^2_e, \tag{A1c}$$

$$1 - \left(\beta_1^2 + \beta_2^2\right) = \sigma^2_e, \tag{A1d}$$

$$\sqrt{1 - \left(\beta_1^2 + \beta_2^2\right)} = \sigma_e. \tag{A1e}$$
Finally, given that the regression weights $\beta_1$ and $\beta_2$ were set equal within a condition and simulated at two levels, .3 and .5, Equation 6 can be further reduced to the following forms:

$$\theta_3 = .5(\theta_1) + .5(\theta_2) + .7071(e), \tag{A2a}$$

$$\theta_3 = .3(\theta_1) + .3(\theta_2) + .9055(e). \tag{A2b}$$

In the simulation, the criterion variable $\theta_3$ was operationalized with Equations A2a and A2b for the two levels of the regression weights (.5 and .3, respectively). An alternative way of stating this derivation is that the error associated with each regression model was sampled from a N(0.00, 0.7071) or N(0.00, 0.9055) distribution for the corresponding level of $\beta$.
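As a quick numerical check of this algebra (a sketch, not part of the original study), the implied error standard deviations can be verified in base R:

```r
## At each level of the regression weight, the error SD from Equation A1e
## should reproduce a unit-variance criterion.
set.seed(1)
for (b in c(.3, .5)) {
  s.e <- sqrt(1 - (b^2 + b^2))                   # Equation A1e: .9055 and .7071
  th1 <- rnorm(1e5); th2 <- rnorm(1e5)
  th3 <- b * th1 + b * th2 + rnorm(1e5, 0, s.e)
  cat(sprintf("b = %.1f: sigma_e = %.4f, var(theta3) = %.3f\n",
              b, s.e, var(th3)))                 # var(theta3) is ~1.000
}
```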
Authors’ Note
The data presented in this study were generated as a part of the first author’s doctoral dissertation.
Acknowledgments
The authors would like to acknowledge Niels Waller for his assistance with the R programming code for
portions of the simulation. We would additionally like to thank Dr. Paula Popovich, Dr. Jeff Vancouver,
and Dr. Victor Heh for their comments on a previous version of this manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or pub-
lication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
1. Readers interested in the basic components of IRT models are referred to Embretson and Reise (2000).
2. We would like to thank an anonymous reviewer for calling our attention to this point.
3. We would like to thank Mark Davison for bringing this issue to light in his helpful role as the editor
for this article.
References
Aguinis, H., Beaty, J. C., Boik, R. J., & Pierce, C. A. (2005). Effect size and power in assessing moderating effects of categorical variables using multiple regression: A 30-year review. Journal of Applied Psychology, 90, 94-107.
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks, CA: SAGE.
Baker, F. B., & Kim, S. H. (2004). Item response theory parameter estimation techniques (2nd ed.). New York, NY: Marcel Dekker.
Borsboom, D. (2008). Latent variable theory. Measurement, 6, 25-53.
Busemeyer, J. R. (1980). Importance of measurement theory, error theory, and experimental design for testing the significance of interactions. Psychological Bulletin, 88, 237-244.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Davison, M. L., & Sharma, R. (1988). Parametric statistics and levels of measurement. Psychological Bulletin, 104, 137-144.
Davison, M. L., & Sharma, R. (1990). Parametric statistics and levels of measurement: Factorial designs and multiple regression. Psychological Bulletin, 107, 394-400.
De Boeck, P., & Wilson, M. (2004). A framework for item response models. In P. De Boeck & M. Wilson (Eds.), Explanatory item response models: A generalized linear and non-linear approach (pp. 3-39). New York, NY: Springer.
Embretson, S. E. (1996). Item response theory models and spurious interaction effects in factorial ANOVA designs. Applied Psychological Measurement, 20, 201-212.
Embretson, S. E. (2006). The continued search for non-arbitrary metrics in psychology. American Psychologist, 61, 50-55.
Embretson, S. E., & De Boeck, P. (1994). Latent trait theory. In R. J. Sternberg (Ed.), Encyclopedia of intelligence (pp. 644-647). New York, NY: Macmillan.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Ferrando, P. J. (2009). Difficulty, discrimination, and information indices in the linear factor analysis model for continuous item responses. Applied Psychological Measurement, 33, 9-24.
Fields, D. L. (2002). Taking the measure of work: A guide to validated scales for organizational research and diagnosis. Thousand Oaks, CA: SAGE.
Fischer, G. H. (1995). Derivations of the Rasch model. In G. Fischer & I. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications (pp. 15-38). New York, NY: Springer-Verlag.
Gagne, P., Furlow, C., & Ross, T. (2009). Increasing the number of replications in item response theory simulations. Educational and Psychological Measurement, 69, 79-84.
Gardner, P. L. (1975). Scales and statistics. Review of Educational Research, 45, 43-57.
Granberg-Rademacker, J. S. (2010). An algorithm for converting ordinal scale measurement data to interval/ratio scale. Educational and Psychological Measurement, 70, 74-90.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: SAGE.
Harwell, M., & Gatti, G. G. (2001). Rescaling ordinal data to interval data in educational research. Review of Educational Research, 71, 105-131.
Harwell, M., Stone, C. A., Hsu, T., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20, 101-125.
Ihaka, R., & Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5, 299-314.
Kang, S.-M., & Waller, N. G. (2005). Moderated multiple regression, spurious interaction effects, and IRT. Applied Psychological Measurement, 29, 87-105.
Maxwell, S., & Delaney, H. (1985). Measurement and statistics: An examination of construct validity. Psychological Bulletin, 97, 85-93.
McNemar, Q. (1969). Psychological statistics (4th ed.). New York, NY: Wiley.
Meade, A. W., Lautenschlager, G. J., & Johnson, E. C. (2007). A Monte Carlo examination of the sensitivity of the differential functioning of items and tests framework for tests of measurement invariance with Likert data. Applied Psychological Measurement, 31, 430-455.
Mislevy, R. J., & Stocking, M. L. (1989). A consumer's guide to LOGIST and BILOG. Applied Psychological Measurement, 13, 57-75.
Muraki, E., & Bock, R. D. (2003). PARSCALE: IRT item analysis and test scoring for rating-scale data (Version 4.1) [Computer software]. Lincolnwood, IL: Scientific Software International.
Ostini, R., & Nering, M. L. (2006). Polytomous item response theory models. Thousand Oaks, CA: SAGE.
Paunonen, S. V., & Jackson, D. N. (1988). Type I error rates for moderated multiple regression. Journal of Applied Psychology, 73, 569-573.
Perline, R., Wright, B. D., & Wainer, H. (1979). The Rasch model as additive conjoint measurement. Applied Psychological Measurement, 3, 237-255.
R Development Core Team. (2008). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org
Reise, S. P., Ainsworth, A. T., & Haviland, M. G. (2005). Item response theory: Fundamentals, applications, and promise in psychological research. Current Directions in Psychological Science, 14, 95-101.
Reise, S. P., & Haviland, M. G. (2005). Item response theory and the measurement of clinical change. Journal of Personality Assessment, 84, 228-238.
Reise, S. P., & Waller, N. G. (2003). How many IRT parameters does it take to model psychopathology items? Psychological Methods, 8, 164-184.
Robey, R. R., & Barcikowski, R. S. (1992). Type I error and the number of iterations in Monte Carlo studies of robustness. British Journal of Mathematical and Statistical Psychology, 45, 283-288.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika (Monograph Suppl. 17), 35, 139.
Samejima, F. (1996). The graded response model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 85-100). New York, NY: Springer.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.
Stine, W. W. (1989). Meaningful inference: The role of measurement in statistics. Psychological Bulletin, 105, 147-155.
Stocking, M. L. (1987). Two simulated feasibility studies in computerised adaptive testing. Applied Psychology: An International Review, 36, 263-277.
Stone, C. A. (1992). Recovery of marginal maximum likelihood estimates in the two-parameter logistic response model: An evaluation of MULTILOG. Applied Psychological Measurement, 16, 1-6.
Takane, Y., & De Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 392-408.
Wainer, H. (1982). Robust statistics: A survey and some prescriptions. In G. Keren (Ed.), Statistical and methodological issues in psychology and social sciences research (pp. 187-214). Hillsdale, NJ: Erlbaum.
Witt, L. A. (1998). Enhancing organizational goal congruence: A solution to organizational politics. Journal of Applied Psychology, 83, 666-674.