Concurrent Validity and Stability of the Maze
Task in a Sample of College Students
James M. Kuterbach
Dept. of Human Development and Family Studies
Penn State DuBois
DuBois, PA, USA
jmk110@psu.edu
Abstract: This paper examines the concurrent
validity and test-retest reliability of the maze task in a
sample of college students. The maze task is a form of
Curriculum Based Measurement that is typically used
to assess reading comprehension in elementary and
secondary students, but has not been used with college
students. Three versions of the maze task (one-, two-, and three-minute probes) were created and compared to student scores on the Nelson-Denny Reading Test (NDRT), students' self-reported GPA, and scores on the SAT-Reading, -Math, and -Writing tests. The one-minute probe was found to have the best psychometric properties, with high correlations with the NDRT, GPA, and SAT-Reading, as well as divergent validity with the SAT-Writing test. Implications for use with college students are discussed.
Keywords: Reading assessment, maze task, college
students, learning disabilities
1. INTRODUCTION
With the large influx of students with a learning disability into colleges and universities, administrators and evaluators need tools to help them determine which students need special accommodations. While current assessments are generally sound, they are lengthy and may deter students from seeking assistance. What is
needed is a brief screening tool that is both a valid and
reliable measure of a student’s academic skills. The
purpose of this study is to evaluate the reliability and
validity of the maze task, a commonly used reading
comprehension curriculum-based assessment tool, in a
sample of college students.
Of all disability categories, students with a learning disability make up the largest proportion of students continuing on to postsecondary education (Horn, Berktold, & Bobbitt, 1999). Students with a learning disability
also make up the fastest growing population of college
students with a disability (Henderson, 2001), with
more than twenty-seven percent of high school
students diagnosed as having a specific learning
disability continuing on for postsecondary education
(Wagner, Newman, Cameto, Garza, & Levine, 2005).
This increase in students with a learning disability at
the postsecondary level is creating new challenges for
educators, evaluators, and administrators.
College disability services administrators report that
students with a learning disability tend to want the
same accommodations at the postsecondary level that
they received in high school (e.g., oral essay exams,
no foreign language, extended time on tests), even
when the documentation of their disability may be
shaky, at best (McGuire, 2000). One author reports
that the number of overall requests for specific
accommodations had risen by more than 160%
between the years 2000 and 2001 (Ofiesh, Mather, &
Russell, 2005). The required type and
comprehensiveness of assessments at the secondary
and postsecondary levels, as well as the documentation
needed to demonstrate a history of services, vary greatly from institution to institution (Gregg,
Coleman, Davis, Lindstrom, & Hartwig, 2006). Some
students are required to produce documentation on
their own, while others are required to complete new
evaluations. With a comprehensive evaluation lasting six to eight hours, some students find this daunting (Canter, 2004).
The largest segment of students with a learning
disability is those with a reading disorder (Lorry,
2000). Both typically achieving students and students with a learning disability are often overwhelmed by the
amount of reading that is needed at the college level
(Du Boulay, 1999). Even though reading ability has
been found to be a significant predictor of college
freshman grades (Wood, 1982), reading is generally
not directly assessed beyond the secondary level.
While reading is not directly assessed, the results of
reading are indirectly assessed for college students
throughout their college career (Du Boulay, 1999).
Lack of reading skills, even for students without a
diagnosed disability, is one of the biggest problems in
postsecondary education, but problems in reading at
the postsecondary level are generally not identified
until the problem manifests itself in the classroom (Du
Boulay, 1999). Reading disabilities are also the most
likely of all learning disabilities to serve as a basis for
an accommodations claim in higher education (Lorry,
2000). Educators and administrators working with
postsecondary students with reading problems need a
quick, easy method to assess a student’s reading
ability.
A test currently in use in colleges around the country
is the Nelson-Denny Reading Test (NDRT; Brown,
Fishco, & Hanna, 1993). The NDRT is a widely used
test of reading, which includes a vocabulary and a
comprehension section, as well as a reading fluency
measure. The NDRT has been used with college
students in research on reading comprehension
(Nicaise & Gettinger, 1995; Bell & Perfetti, 1994;
Onwuegbuzie & Collins, 2002), as well as for
diagnostic and decision-making purposes (Norman,
Kemper, & Kynette, 1992). The NDRT has been
found to be a good predictor of college freshman
grades (Wood, 1982) and has been used as a criterion
measure for other tests of reading (Hannon &
Daneman, 2006; Wood, Nemeth, & Brooks, 1985),
memory, and cognitive processes (Carver, 1992;
Davis, Bardos, & Woodward, 2006; Millis, Magliano,
& Todaro, 2006). However, the NDRT takes a total of
35 minutes to administer, with the Comprehension
section taking 20 minutes.
An alternative to norm-referenced testing is curriculum-based measurement (CBM). Curriculum-based
measurement has been defined by Deno (1987) as
“any set of measurement procedures that use direct
observation and recording of a student’s performance
… as a basis for gathering information to make
instructional decisions” (as cited in Marston, 1989, p. 62).
Curriculum-based measurement has a long history in
the research literature (Shapiro, Keller, Lutz, Santoro,
& Hintze, 2006). Test-retest reliability coefficients for
reading CBM probes range from .82 to .96, with
interrater reliability of .99 and reliability coefficients
for parallel forms ranging from .84 to .96 (Marston,
1989). Validity studies for CBM in reading have found
correlation coefficients ranging from .73 to .91, with
most coefficients above .80 (Marston, 1989).
One example of a reading CBM is the maze task. The
maze task is a multiple-choice cloze task that students
complete while silently reading a short reading
passage that is rated at their grade level (Fuchs &
Fuchs, 1992). The first sentence of the passage is left
unchanged, but thereafter every seventh word of the
passage is replaced with a three-word forced choice
inside parentheses. Two of the words are distractors
and one is the correct word for the sentence. The
number of correct words chosen during the testing
time is the student’s score. The maze task is used to
determine a student’s functioning level, but it can also
be used as a progress-monitoring tool because it can
produce multiple data points in order to chart growth
across the school year. Another advantage of the maze task is that it can be given to several students
at the same time (Madelaine & Wheldall, 2004).
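To make the construction procedure concrete, the following minimal Python sketch builds a probe in the manner just described: the first sentence is left intact, and every seventh word thereafter is replaced with a parenthesized three-word choice containing the correct word and two distractors. This is an illustration only, not the instrument used in the present study; the function names, the slash-separated choice format, and the distractor_pool word list are assumptions introduced here.

import random

def build_maze_probe(passage, distractor_pool, seed=0):
    """Hypothetical maze-probe builder (an illustration, not the study's materials).
    Leaves the first sentence intact, then replaces every seventh word with a
    parenthesized three-word forced choice: the correct word plus two distractors
    drawn from distractor_pool, presented in shuffled order."""
    rng = random.Random(seed)
    first_sentence, sep, rest = passage.partition(". ")
    words = rest.split()
    for i in range(6, len(words), 7):  # every seventh word after the first sentence
        correct = words[i]
        distractors = rng.sample([w for w in distractor_pool if w != correct], 2)
        choices = [correct] + distractors
        rng.shuffle(choices)
        words[i] = "(" + " / ".join(choices) + ")"
    return first_sentence + sep + " ".join(words)

def score_maze(responses, answer_key):
    """The maze score is the number of correct choices made within the timed administration."""
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)

In an actual probe the distractors would be selected so that only the correct word fits the sentence; the random pool here simply keeps the sketch short.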
Timed tests have been used to distinguish college students with a learning disability from those who are typically achieving (Ofiesh, Mather, & Russell, 2005), and another study found similar results with adults in a clinic setting (Lesaux, Pearson, & Siegel, 2006). As noted above, weak reading skills are among the biggest problems in postsecondary education, yet reading problems at this level are generally not identified until they manifest in the classroom, and students are often overwhelmed by the amount of reading required in college (Du Boulay, 1999). Some researchers consider reading to be the most important skill in college (Onwuegbuzie & Collins, 2002).
2. OBJECTIVES
The purpose of this study is to evaluate the reliability
and validity of the maze task, a commonly used reading
comprehension curriculum-based assessment tool, in a
sample of college students. The maze task is a quick, easy reading comprehension test that can be used both for determining a student's functioning level and for progress monitoring. While the maze task is commonly used in both primary and secondary schools, it has not been used with college students. The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) dictate that when a test is used in a way in which it has not been validated, new evidence should be collected to justify the new use. Consequently, this study will investigate the
concurrent validity of the maze task in a college sample.
3. METHODOLOGY
Participants
Participants were 141 undergraduate college students
who were enrolled in two upper-level human
development classes. There were 16 male participants
and 125 female participants, between the ages of 18 and 50, with a median age of 21. Of those participants reporting a race, 78 percent were White, 7.8 percent
were African-American, 2.1 percent were Asian-
American, 2.1 percent were Latino, and 3.7 percent
indicated they were multi-racial. Participants’ semester
standing ranged from first-semester to ninth-semester,
with a median semester standing of seventh-semester.
Seventy-seven percent of participants were Human
Development and Family Studies majors, with the
remaining participants majoring in other social
sciences, including Psychology and Communication
Sciences and Disorders. Four participants (2.8 percent)
reported having received special education services at
some time during their education.
Materials
The Nelson-Denny Reading Test. The Nelson-Denny
Reading Test Form G (NDRT; Brown, Fishco, &
Hanna, 1993) is a group administered standardized
reading test that assesses a student’s vocabulary,
reading comprehension, and reading rate. The NDRT
is a commonly used measure of silent reading
comprehension (Cirino, Israelian, Morris, & Morris,
2005), with reported test-retest reliability scores
ranging from .76 to .81 (Brown et al., 1993), and has
established correlations between .60 and .69 with other
measures of reading (Murphy, 1995). The NDRT
Reading Comprehension section consists of seven
reading passages and 38 multiple-choice
comprehension questions. Administration time-limit
for the NDRT Reading Comprehension section is 20
minutes (Brown et al., 1993).
The maze task. The maze task is a curriculum-based measurement (CBM) commonly used to assess student reading comprehension at the elementary and secondary levels of education. Fuchs and Fuchs (1992)
reported correlations between the maze task and the
Reading Comprehension subtest of the Stanford
Achievement Test of .77 in elementary students, while
Espin and Foegen (1996) report correlations between
the maze task and three different measures of
comprehension to be between .56 and .62 in secondary
students. The creation and administration of the maze
task followed the procedure described by Fuchs and
Fuchs (1992). The maze task requires the participant
to read a passage which has been previously prepared
such that, following the first sentence, every seventh
word has been replaced with a forced-choice of three
possible words (Espin et al., 2001). This study used
three different passages to create probes of one, two, and three minutes in length. The passages used
in the study were taken from a textbook used in a
survey of Human Development course (Dacey &
Travers, 2004) and each had a Flesch-Kincaid Grade
Level of 12.0.
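The grade-level figure reported above reflects the standard Flesch-Kincaid Grade Level formula, which combines average sentence length with average syllables per word. A minimal sketch of that formula follows; the function name and the example counts are hypothetical and are not values taken from the study's passages.

def flesch_kincaid_grade(total_words, total_sentences, total_syllables):
    """Standard Flesch-Kincaid Grade Level formula:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)

# Hypothetical counts, not values from the study's passages:
# 600 words, 30 sentences, and 960 syllables give
# 0.39 * 20 + 11.8 * 1.6 - 15.59, roughly an eleventh- to twelfth-grade level.
print(round(flesch_kincaid_grade(600, 30, 960), 1))  # 11.1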
Procedure
Participants completed a questionnaire that included
demographic information, education experiences, and
recollection of grade point average (GPA) and scores on the Scholastic Aptitude Test (SAT). Both the maze
task and the NDRT were administered in a group
setting. Participants were given the maze task probes
first and were timed by the investigator for one, two, and three minutes. The NDRT
Reading Comprehension section was then
administered, with a 20-minute time-limit.
Approximately three weeks later, students from one
class (68 participants) completed a retest involving the
three maze task probes.
Analysis
Pearson Product Moment Correlations were performed
between the three maze task probes and the NDRT. In
addition, the maze task probes were correlated with
participants’ recollection of their GPA and SAT
scores. Finally, test-retest reliability was evaluated by
correlating the one-minute probe, two-minute probe,
and three-minute probe maze tasks from the initial
administration with those from the retest
administration. The significance level was initially set at α = .05; however, a Bonferroni correction for multiple comparisons was employed.
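As a rough illustration of this analysis, the sketch below computes the Pearson correlation between each maze probe and each criterion measure and flags significance against a Bonferroni-adjusted alpha (the nominal α divided by the number of comparisons); with three probes and five criterion measures, .05 / 15 ≈ .003, consistent with the adjusted level reported in the results. The function name, the dictionary-based data layout, and the use of SciPy are assumptions made for the example rather than part of the study's procedure.

from scipy import stats

def maze_criterion_correlations(maze_probes, criteria, alpha=0.05):
    """Pearson correlations between each maze probe and each criterion measure,
    flagged as significant against a Bonferroni-adjusted alpha.
    maze_probes and criteria map a label (e.g., "1-min probe", "NDRT") to a
    sequence of participant scores listed in the same participant order."""
    adjusted_alpha = alpha / (len(maze_probes) * len(criteria))
    results = {}
    for probe_label, probe_scores in maze_probes.items():
        for criterion_label, criterion_scores in criteria.items():
            r, p = stats.pearsonr(probe_scores, criterion_scores)
            results[(probe_label, criterion_label)] = (r, p, p < adjusted_alpha)
    return adjusted_alpha, results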
4. OUTCOMES
Independent-samples t-tests showed no significant differences between male and female participants in
terms of their scores on the NDRT, one-minute, two-
minute, or three-minute maze task probes. Correlations
above .50 are considered large (Cohen, 1988). Table 1
displays the correlations between the maze task probes
and the criterion measures. All correlations between
the maze task probes and the NDRT were significant at the α = .003 level, with the strongest correlation found
between the one-minute probe and the NDRT (r =
.606). In addition, both the one-minute and two-
minute probes correlated moderately with participant
recollection of their GPA (r = .390 and .341,
respectively). Finally, only the one-minute maze task
probe correlated significantly with participant
recollection of their SAT reading score (r = .502).
None of the maze task probes correlated significantly
with participant recollection of SAT mathematics
scores or SAT writing scores.
Test-retest reliabilities can be found in Table 2. All
three maze task probes had good test-retest reliability.
The one-minute probe showed the highest reliability,
with a correlation of .952, which is suitable reliability
for making diagnostic decisions (Salvia & Ysseldyke,
2003).
5. CONCLUSION
This research demonstrates that the maze task has the
psychometric properties necessary to be used to
determine reading comprehension ability in college
students. This study examined one-minute, two-
minute, and three-minute probes and found that the
one-minute probe had the best psychometric
properties, with the highest test-retest reliability,
highest correlation with the NDRT, self-reported
GPA, and SAT-Reading scores, and good divergent
validity with SAT-Writing scores. While the divergent validity between the maze task and the SAT-Math score was not as robust, the correlation still would not be described as large (Cohen, 1988). These results provide evidence that the maze task could be
used for determining reading comprehension levels in
college students, as well as for progress monitoring of
college students with a reading disability.
REFERENCES
Bell, L. C. & Perfetti, C. A. (1994). Reading skill:
Some adult comparisons. Journal of Educational
Psychology, 86, 244-255.
Brown, J. I., Fishco, V. V., & Hanna, G. (1993).
Nelson-Denny Reading Test: Manual for scoring
and interpretation. Chicago, IL: Riverside.
Canter, A. (2004). A problem-solving model for
improving student achievement. Principal
Leadership Magazine, 5. Retrieved from
http://www.naspcenter.org/principals/nassp_probsolve.html
Carver, R. P. (1992). Reliability and validity of the
speed of thinking test. Educational and
Psychological Measurement, 52, 125-134.
Cirino, P. T., Israelian, M. K., Morris, M. K., &
Morris, R. D. (2005). Evaluation of the double-
deficit hypothesis in college students referred for
learning difficulties. Journal of Learning
Disabilities, 38, 29-44.
Cohen, J. (1988). Statistical power analysis for the
behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Davis, A. S., Bardos, A. N., & Woodward, K. M.
(2006). Concurrent validity of the general ability
measure for adults (GAMA) with sudden-onset
neurological impairment. International Journal of
Neuroscience, 116, 1215-1221.
Du Boulay, D. (1999). Argument in reading: What
does it involve and how can students become better
critical readers? Teaching in Higher Education, 4,
147-162.
Fuchs, L.S., & Fuchs, D. (1992). Identifying a
measure for monitoring student reading progress.
School Psychology Review, 21, 45-58.
Gregg, N., Coleman, C., Davis, M., Lindstrom, W., &
Hartwig, J. (2006). Critical issues for the diagnosis
of learning disabilities in the adult population.
Psychology in the Schools, 43, 889-899.
Hannon, B. & Daneman, M. (2006). What do tests of
reading comprehension ability such as the VSAT
really measure?: A componential analysis. In A. V.
Mittel (Ed.) Focus on educational psychology (pp.
105-146). Hauppauge, NY: Nova Science
Publishers.
Henderson, C. (2001). College freshman with
disabilities: A statistical profile. HEATH Resource
Center, American Council on Education, U.S.
Department of Education.
Horn, L., Berktold, J. & Bobbitt, L. (1999). Students
with disabilities in postsecondary education: A
profile of preparation, participation, and
outcomes. Retrieved from
http://nces.ed.gov/pubs99/1999187.pdf
Lesaux, N. K., Pearson, M. R., & Siegel, L. S. (2006).
The effects of timed and untimed testing
conditions on the reading comprehension
performance of adults with reading disabilities.
Reading and Writing, 19, 21-48.
Lorry, B. J. (2000). Language-based learning
disabilities. In M. Gordon and S. Keiser (Eds.)
Accommodations in higher education under the
Americans with disabilities act (ADA): A no-
nonsense guide for clinicians, educators,
administrators, and lawyers. (pp. 20-45). New
York, NY: The Guilford Press.
Madelaine, A. & Wheldall, K. (2004). Curriculum-
based measurement of reading: Recent advances.
International Journal of Disability, Development
and Education, 51, 57-82.
Marston, D. B. (1989). A curriculum-based
measurement approach to assessing academic
performance: What is it and why do it. In M. R.
Shinn (Ed.) Curriculum-based measurement:
Assessing Special Children. (pp. 18-78). New
York, NY: The Guilford Press.
McGuire, J. (2000). Educational accommodations: A
university administrator’s view. In M. Gordon and
S. Keiser (Eds.) Accommodations in higher
education under the Americans with disabilities act
(ADA): A no-nonsense guide for clinicians,
educators, administrators, and lawyers. (pp. 20-
45). New York, NY: The Guilford Press.
Millis, K., Magliano, J., & Todaro, S. (2006).
Measuring discourse-level processes with verbal
protocols and latent semantic analysis. Scientific
Studies of Reading, 10, 225-240.
Murphy, S. (1995). An analysis of the construct and
predictive validity of the CPT-R and Nelson
Denny tests. Unpublished manuscript, Rose State
College, Midwest City, Oklahoma.
Nicaise, M. & Gettinger, M. (1995). Fostering reading
comprehension in college students. Reading
Psychology, 16, 283-337.
Norman, S., Kemper, S., & Kynette, D. (1992).
Adults’ reading comprehension: Effects of
syntactic complexity and working memory.
Journal of Gerontology, 47, 258-265.
Ofiesh, N., Mather, N., & Russell, A. (2005). Using
speeded cognitive, reading, and academic
measures to determine the need for extended test
time among university students with learning
disabilities. Journal of Psychoeducational
Assessment, 23, 35-52.
Onwuegbuzie, A. J. & Collins, K. M. (2002). Reading
comprehension among graduate students.
Psychological Reports, 90, 879-882.
Salvia, J. & Ysseldyke, J. E. (2003). Assessment in
special and inclusive education (9th ed.).
Boston, MA: Houghton Mifflin Company.
Shapiro, E. S., Keller, M. A., Lutz, J. G., Santoro,
L.E., & Hintze, J. M. (2006). Curriculum-based
measures and performance on state assessment and
standardized tests: Reading and math performance
in Pennsylvania. Journal of Psychoeducational
Assessment, 24, 19-35.
Wagner, M., Newman, L., Cameto, R., Garza, N., &
Levine, P. (2005). After high school: A first look at
the postschool experiences of youth with
disabilities. A report from the National
Longitudinal Transition Study-2. Menlo Park, CA:
SRI International.
Wood, P. H. (1982). The Nelson-Denny Reading Test
as a predictor of college freshman grades.
Educational and Psychological Measurement, 42,
575-583.
Wood, P. H., Nemeth, J. S., & Brooks, C. C. (1985).
Criterion-related validity of the Degrees of
Reading Power Test (Form CP-1A). Educational
and Psychological Measurement, 45, 965-969.