LONGITUDINAL ASSESSMENT OF CRITICAL THINKING IN COLLEGE:
WHAT MEASURES ASSESS CURRICULAR IMPACT?
Marcia Mentkowski
Glen Rogers
Office of Research & Evaluation
Alverno College
Findings reported here are based upon a study funded by a grant from the National Institute of Education (NIE-G-77-0058), as one part of a larger research project. Tamar Ben-Ur assisted with the statistical analyses reported in this paper.
Paper presented at the annual meeting of The Mid-Western Educational Research Association, Chicago, October 1985.
Longitudinal Assessment of Critical Thinking in College:
What Measures Assess Curricular Impact?

Marcia Mentkowski and Glen Rogers
Educational Research and Evaluation
Alverno College
Milwaukee, WI
This publication is available from:
Alverno College Institute
3400 South 43rd Street
PO Box 343922
Milwaukee, WI 53234-3922
Phone: 414-382-6000
www.alverno.edu
Graphic Design: Lynn Chabot-Long, Project Specialist, Educational Research and Evaluation
© Copyright 1985. Alverno College Institute, Milwaukee, Wisconsin. All rights reserved under U.S., International and Universal Copyright Conventions. Reproduction in part or whole by any method is prohibited by law.
ABSTRACT
Longitudinal results from a college outcomes study were used to judge one widely used and three new measures of critical thinking. Each measure was reviewed for how well it measured longitudinal versus cross-sectional change, for whether it was associated with progress in the curriculum, for its association with background variables (e.g., high school GPA), and for whether it assessed change in critical thinking for both traditional-age and older adult students. Two of the newer measures are better college outcomes measures on these criteria. Interrelationships among the measures suggest that critical thinking comprises multiple components, so researchers and educators are cautioned to improve both production and recognition measures.
INTRODUCTION
Objectives
This study explores (1) the validity of a set of four critical thinking measures, (2) the possibility of an
expanded domain of critical thinking abilities, (3) the degree and pattern of change in critical thinking
abilities in an undergraduate population, and (4) the relationship between progress in the curriculum,
which is outcome-centered and performance-based, and change in critical thinking abilities over the
course of four years.
Perspective and Theoretical Background
Educators and researchers alike are strengthening their commitment to critical thinking as a college outcome. The Association of American Colleges, in a recent report redefining the baccalaureate, has identified critical thinking as one of the outcomes all colleges should prepare students to demonstrate (Integrity in the College Curriculum, February 1985). In 1984, major conferences at Wingspread, Harvard, Sonoma State and The University of Chicago addressed critical thinking assessment. Accrediting agencies (e.g., North Central, COPA) are also calling for assessment of broad college outcomes. Recently, attention has turned both to the assessment of individual student learning and to the institutional evaluation of student outcomes (Ewell, 1984; Marchese, 1985; Mentkowski & Doherty, 1984).
The need for measures that assess broad college outcomes, and that are not limited to amount of knowledge, is critical. But how are administrators and faculty to assess such outcomes without instruments that contribute to cross-college measurement and that have face validity for liberal arts faculty? Many liberal arts faculty do not focus solely on knowledge measured by recognition tasks like the SAT or GRE, but instead on higher-order cognitive processes that are measured by production tasks and that have
been conceptualized as central to newer definitions of critical thinking (Lipman, 1984; Nickerson, 1984;
Paul, 1984; Sigel, 1984; Sternberg, 1983; Winter & McClelland, 1978). Moreover, there is a reciprocal
relation between assessment and teaching. For example, Frederiksen (1984) observes in the American
Psychologist that use of the newer production measures could also encourage teaching of higher level
cognitive skills and provide practice with feedback. Several researchers have designed newer measures,
but how well do they work as college outcomes measures?
METHOD
Critical Thinking Measures
A total of 12 cognitive-developmental, learning style and generic ability measures were administered in
a longitudinal study of college outcomes (Mentkowski & Doherty, 1983). Four of these are the focus of
this analysis of critical thinking measures. Of these, two involve, at least partially, the more typical
recognition tasks whereby the participant must select the correct answer:
(1) Watson and Glaser’s Critical Thinking Appraisal (Watson & Glaser, 1964) is a standardized, easily
used recognition measure of component critical thinking abilities. The Form ZM subscales
assessing quality of inferences, recognition of assumptions, and deductive reasoning were
administered in this study. These subscales are henceforth designated as the Inference,
Recognition of Assumptions, and Deduction subscales, respectively. The interpretation subscale
and the evaluation of argument subscale were not administered.
(2) Test of Cognitive Development (Renner, Fuller, Lockhead, Tomlinson-Keasey & Campbell, 1976)
is a newer series of paper and pencil tasks designed to measure formal operational thinking as
defined by Piaget. Although participant responses for several tasks are scored from multiple
choice answers, written justification of the answers often is considered in scoring as well.
Scoring of the flexibility of rods task focuses upon participants’ written explanation of their
reasoning. The theoretical justification for the test suggests sophisticated cognitive processes
are being measured.
The other two critical thinking measures administered are pure production measures:
(3) Analysis of Argument (Stewart, 1977a, 1977b) is designed to measure flexibility in arguing (with
consistency) for opposite positions on a specified issue. In response to the stimulus, an
emotional one-sided essay, participants write two essays. The “Attack” essay is scored for
whether it has a central organizing principle and whether it focuses on the faulty logic of the
stimulus essay. The “Defense” essay is scored for whether it reflects a modified or qualified
endorsement of the counter-attitudinal stimulus essay.
(4) Test of Thematic Analysis (Winter, 1976; Winter & McClelland, 1978) is designed to measure the
ability to form complex concepts and communicate them. The task requires participants to
compare and contrast two sets of essays according to their themes. Each set includes three stimulus essays, each about four sentences long. Appendix A gives the
brief titles of the nine scoring criteria.
Study Design and Inventory Administration
All undergraduates who entered a women’s college in 1976 and 1977 were recruited as volunteer
participants in the longitudinal study. In 1977, a Weekend College timeframe was offered for the first
time. Thus, the 1977 cohort includes students from both the weekend and weekday timeframes. The
longitudinal cohorts were assessed at entrance, two years after entrance, and three and one-half years
after entrance.
As a cross-sectional comparison group, the entire 1978 weekday graduating class was recruited for
assessment at graduation. In order to control for attrition effects, the entrance scores of the students in
the 1977 weekday longitudinal cohort who did not graduate were deleted when they were cross-
sectionally compared to the scores of the 1978 graduates. Measures were administered in large group
sessions.
Data Source, Attrition, and Participation
Alverno College is a women’s college. At the time of the study, the students were predominantly
Caucasian and from one midwest state. Many were first-generation college students. The college
traditionally has served working class students from a large urban area, and has not been highly
selective.
At the first assessment, all of the women who entered in 1976 and 1977 were recruited as participants.
To be considered eligible for recruitment for later longitudinal assessments, the students had to be
currently enrolled and to have completed the prior assessments on at least a subset of the inventories.
The weekend college attracted older students in increased numbers. Longitudinal data analyzed for this
study include all students who participated on all three occasions for a particular instrument. Between
83% and 99% of the students participated at each assessment for at least a subset of the inventories. The
longitudinal data pool, n = 208, included both traditional-age (17–19 years; n = 108) and older students
(20–55 years; n = 100). The graduating class used as a cross-sectional comparison group also included
both traditional-age (n = 45) and older students (n = 15).
Since not all participants completed all of the inventories, the number of observations per inventory
varies. The proportion of older and younger participants remains about the same. The lowest number of
replicated observations across three assessments occurred for the Analysis of Argument “Attack” essay,
n = 133, and “Defense” essay, n = 130. This is accounted for by a delayed decision to include the
inventory in the study, which resulted in a reduced number of observations at time 1 for this inventory.
For the other inventories reported here, between 181 and 194 participants completed all three
assessments for the particular inventory. The vast majority of the participants completed all of the
instruments. For the purposes of obtaining internal reliability estimates only, the Analysis of Argument
scores were used for all those who completed the Analysis of Argument inventory at the time of
assessment, even if they did not complete all three assessments. Thus, between 183 and 189
participants were available for these time 2 and time 3 reliability analyses.
Main Analyses Employed
Raw change over occasions of assessment was studied with multivariate analysis of variance for
repeated measures and unequal n’s in an Age X Time factorial design, with the repeated measures on
the Time of assessment. In this analysis, both the linear and the quadratic effects of Time were tested
with orthogonal polynomial contrasts. Weights were used to adjust for the unequal lengths of Time
between the intervals. The association of the critical thinking measures with rate of progress in the
curriculum was measured with correlational analyses. Alpha coefficients were computed to determine
the internal reliability of measures at each time of assessment. The amount of variance accounted for by
scores from the previous administration of each measure was computed to determine test-retest
predictability. Mentkowski and Strait (1983) have previously reported data analyses of this data set,
which are sometimes referenced here. The present analyses extend the previous analyses by exploring
the Age X Time interaction directly, by focusing entirely upon unadjusted scores, by exploring the
internal reliability of the summated measures, and by analyzing the component criteria for the Test of
Thematic Analysis and Analysis of Argument measures.
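To make these trend tests concrete, here is a minimal sketch (ours, not the study's original analysis; the scores generated below are hypothetical) of how orthogonal polynomial contrasts can be constructed for three unequally spaced assessments, at roughly 0, 2, and 3.5 years after entrance, and tested for linear and quadratic Time effects. It omits the Age factor and the unequal-n machinery of the full design.

```python
import numpy as np
from scipy import stats

times = np.array([0.0, 2.0, 3.5])         # years since entrance (unequal intervals)

# Gram-Schmidt (via QR) on [1, t, t^2] yields contrasts orthogonal to the
# constant: linear and quadratic polynomial contrasts weighted for unequal spacing.
X = np.vander(times, 3, increasing=True)   # columns: 1, t, t^2
Q, _ = np.linalg.qr(X)
linear, quadratic = Q[:, 1], Q[:, 2]

# Hypothetical data: n participants x 3 times of assessment.
rng = np.random.default_rng(0)
scores = rng.normal(loc=[11.5, 12.2, 12.4], scale=2.0, size=(191, 3))

for name, w in (("linear", linear), ("quadratic", quadratic)):
    contrast = scores @ w                  # one trend score per participant
    t, p = stats.ttest_1samp(contrast, 0.0)
    print(f"{name} Time effect: t = {t:.2f}, p = {p:.4f}")  # F = t**2 on 1 df
```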
RESULTS AND CONCLUSIONS
Reliability
Analysis of Argument post-test scores showed no predictability from Analysis of Argument pre-test
scores. The inter-item reliability analysis of the “Attack” essay’s five scoring criteria (one was dropped
because of multicollinearity) yielded unacceptably low reliability coefficients at times two and three (see
Table 1), even though at least 184 students participated at these administrations. Although we believe
researchers need to accept lower reliability estimates for production measures, the coefficients for the
“Attack” essay’s scoring criteria at time two and time three do not inspire confidence in the current
measurement. Since there appeared to be sufficient variability in the scores for each scoring criterion,
we surmise that our relatively low internal reliability coefficients might be improved by increasing the
number of scoring criteria. Alternatively, a consistently unitary construct may not yet underlie the six
scoring criteria for the “Attack” essay. The internal reliability of the “Defense” essay’s four scoring
criteria was relatively high at each time of assessment (see Table 1).
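The coefficients discussed here and reported in Table 1 are Cronbach's alpha with the scoring criteria treated as items. A minimal sketch of that computation (our illustration; the dichotomous criterion scores below are simulated, not study data):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a participants x items score matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of summated scale
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

rng = np.random.default_rng(1)
criteria = rng.integers(0, 2, size=(184, 5))  # hypothetical 0/1 scoring criteria
print(f"alpha = {cronbach_alpha(criteria):.2f}")  # near zero for random items
```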
The Test of Cognitive Development showed good internal reliability for the five-item scale (see Table 1). In addition, pretest scores for the Test of Cognitive Development accounted for a high proportion of the variance in the post-test scores. Averaging the percent of variance accounted for by time one in predicting time two and by time two in predicting time three, 29.8% of the variance (R²) was statistically predictable.
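Expressed as a formula (our notation, not the report's), the test-retest predictability reported here is the squared pre-post correlation averaged over the two assessment intervals:

$$\bar{R}^{2} = \tfrac{1}{2}\left(r_{12}^{2} + r_{23}^{2}\right)$$

so a value of 29.8% corresponds to an average pre-post correlation of roughly r = .55.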
The Test of Thematic Analysis appeared less internally reliable. The low internal reliability may be
partially due to the lack of variance in some of the nine scoring criteria, which were dichotomously
scored. The reliability coefficients for the scale were improved by removing just one scoring criterion,
“making exceptions or qualifications.” Another scoring criterion was removed because of its poor
correlation with the remaining seven criteria and its limited variance. This negatively scored criterion,
“Affect,” is scored if the participant makes a comparison that is based upon her emotional reaction to
the story. In order to prevent inappropriate extreme scores, two scoring criteria that yielded scores with
an extremely restricted variance were also removed, leaving 5 of 9 of the Test of Thematic Analysis
scoring criteria for a summated scale. These procedures yielded somewhat improved internal reliability
coefficients (see Table 1). Appendix A documents the scoring criteria included and excluded from this
exploratory 5 of 9 summated criteria-measure for the Test of Thematic Analysis.
Although internal reliability coefficients were not computed for the Critical Thinking Appraisal subscales,
the proportion of variance accounted for by pre-test scores for each interval confirms the test-retest
reliability of these subscales. Averaged across the two intervals, pretest scores account for 26.3% of the variance in the Inference post-test scores, for 19.1% in the Recognition of Assumptions post-test scores, and for 31.9% in the Deduction post-test scores. Although the relatively high predictability of the post-test scores from the pretest scores supports the reliability of this set of Critical Thinking Appraisal measures, the lower predictability of the post-test scores for the Test of Thematic Analysis may simply reflect individual differences in response to educational or other influences.
A Face Validity Issue
A qualitative examination of the “Defense” essays written by the students suggested that the students
were role playing the emotional and faulty arguments they were asked to defend. This face validity
concern is consistent with the scoring of the “Defense” essays. At each time of assessment, the scoring
of the “Defense” essays according to the four scoring criteria yielded very little variance among students. In the case of the first assessment, for example, 93% “totally” endorsed the presented argument (which was probably counter to their attitude); 91% presented new arguments supporting the counter-attitudinal statement; 4% made a modified endorsement of it; and 4% accepted a particular part of it. In other words, in the “Defense” essay, almost all of the entering freshmen did exactly the opposite of what would yield a high score in the standard scoring. Our qualitative observation that the students were role playing the bad arguments they were asked to defend is also consistent with the absence of instructions discouraging role playing in the Analysis of Argument inventory. We also observe that
our students encounter numerous role playing exercises in their education.
Trends through Time
Our concerns about the face validity of the “Defense” essays and about the internal reliability of the
“Attack” essay scoring criteria are supported by the previously conducted analyses on this data set (see
Mentkowski and Strait, 1983). We know from these analyses that neither the “Attack” essay’s
summated measure nor the “Defense” essay’s summated measure yielded any statistically significant
differences across the times of assessment. Mentkowski and Strait (1983) also found that these
summated measures were not related to measures of progress in the college’s curriculum. The lack of
internal reliability for the “Attack” essay’s scoring criteria suggested an exploratory analysis with the
individual criteria as the unit of analysis. None of these individual scoring criteria yielded statistically significant differences for the Time of assessment, however.
Given the exploratory nature of this research, the Test of Thematic Analysis was analyzed in more than
one way. Not only was a total score for the Test of Thematic Analysis included (in this case, the 5 of 9
criteria summated measure), but also, three of the scoring criteria derived from this measure were
separately analyzed. Thus, analysis of variance was also performed on (1) the criterion of making
exceptions or qualifications to one’s ascriptions, (2) the criterion of giving examples for observations,
and (3) the negatively scored criterion of affective reaction. It should be noted that the examples
criterion is a component of the Test of Thematic Analysis summated measure (5 of 9 criteria). As is
noted below, analysis of this summated measure and analysis of its component scoring criterion,
examples, yield very similar results. This is probably largely because the scores yielded by the examples
scoring criterion actually contribute the bulk of the variance to the summated score. Thus, like the
summated scale, the examples scoring criterion does not correlate positively with the exceptions
criterion at any of the times of measurement. Again, the direction of association is always negative,
even though, in this case, it is not statistically significant.
The Test of Thematic Analysis scoring criterion for affect-based comparisons was scored for only 16% of
the essay comparisons. Because of low variance, a Friedman’s distribution-free one-way analysis of
variance procedure for time of assessment was again conducted. It showed no statistically significant difference, χ²(2, N = 194) < 1.
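A distribution-free check of this kind can be run directly from the three related assessments per participant; a sketch with hypothetical dichotomous scores (ours, not study data):

```python
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(2)
# Hypothetical 0/1 scores on one criterion at the three times of assessment.
t1, t2, t3 = (rng.integers(0, 2, size=194) for _ in range(3))

chi2, p = friedmanchisquare(t1, t2, t3)  # chi-square on 2 df for 3 conditions
print(f"chi2(2, N = 194) = {chi2:.2f}, p = {p:.3f}")
```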
The Age X Time repeated measures analysis of variance procedure for unequal n’s and unequal intervals, which tested for the linear and quadratic effects of Time, was performed on the following seven
measurements: the Test of Cognitive Development summated measure, the Test of Thematic Analysis
summated measure (5 of 9 criteria), the examples and exceptions scoring criteria for the Test of
Thematic Analysis, and the three subscales of the Critical Thinking Appraisal. This subset of seven
measurements, which includes both summated scales and single item scores, yielded many effects. The
Age X Time analyses yielded main effects for Age on 4 of 7 of these measurements (see Table 2). They
also yielded main effects for linear Time on all seven of these measurements. A main effect for quadratic
Time was revealed on the Test of Thematic Analysis (5 of 9 criteria) and the Recognition of Assumption
subscale of the Critical Thinking Appraisal (see Table 2). The lack of any interaction effects between Age and Time (all F values less than 1) suggests that these main effects of linear and quadratic Time are relatively general across both age groups.
First, looking at those measures not associated with Age effects, we see that scores on the Test of
Cognitive Development and the scores on the Deduction subscale of the Critical Thinking Appraisal
improve linearly with Time (see Table 3). Recall that the analysis of variance procedure (see Table 2)
adjusted for unequal interval lengths.
Next, looking at those measures with Age main effects (see Table 4), older students show a fairly constant advantage on these measures at each time of assessment. There is no statistically significant
interaction between Age and Time for scores yielded by the exceptions scoring criterion despite the
apparent convergence of the means at time three for the two age groups.
The quadratic effect of Time for the Test of Thematic Analysis summated score (5 of 9) is reflected in a
drop on this measure at time three (see Table 4), which is confirmed by a posteriori analyses. There is
also a linear decrease with Time (see Table 2). The examples scoring criterion, which is a component of
this summated measure, likewise shows a linear (see Table 2) decrease with Time (see Table 4). In
contrast, the exceptions scoring criterion shows a linear (see Table 2) increase with Time (see Table 4).
Thus, by exploring the scoring criterion components of the Test of Thematic Analysis, we have found
trends in opposite directions.
The Inference subscale of the Critical Thinking Appraisal shows a linear (see Table 2) increase with Time
(see Table 4). Like the linear increases with Time for the Deduction subscale (see Table 3), these
increases occur generally across our two age cohorts. The Recognition of Assumptions subscale shows
both a linear and quadratic effect (see Table 2). Examination of Table 4 shows an increase on the
Recognition of Assumptions subscale at the second interval, which is confirmed by a posteriori analyses.
Relation of Measures to Curricular Progress
Are the critical thinking measures and exploratory single-item measures that have shown change as a function of time of assessment sensitive to curricular impact? In preparation for examining the relation between
curriculum progress and performance on the critical thinking measures, we tabulated the measures of
curriculum progress according to two separate cumulative subtotals: one subtotal for the first two years
since enrollment and the other subtotal for the subsequent year and a half. The one curriculum progress
subtotal corresponds to the interval of time between the first and second fieldings of the critical
thinking measures, and the other curriculum progress subtotal corresponds to the interval of time
between the second and third fieldings of the critical thinking measures.
We reasoned that students showing a high degree of curriculum progress in the intervals of the study
would at the end of the intervals show higher performance on the critical thinking measures. Thus,
progress in the curriculum up to the second assessment was correlated with the critical thinking
measures at the second assessment, and progress in the curriculum between the second and third
assessments was correlated with the critical thinking measures at the third assessment.
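In code, this interval matching reduces to correlating each cumulative progress subtotal with the scores at the assessment that closes its interval. A sketch with simulated data (all names and values are hypothetical, not study data):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(3)
n = 190
progress_t2 = rng.poisson(12.0, n)                  # CLUs earned, entrance to time 2
progress_t3 = rng.poisson(8.0, n)                   # CLUs earned, time 2 to time 3
score_t2 = 0.05 * progress_t2 + rng.normal(size=n)  # measure fielded at time 2
score_t3 = 0.05 * progress_t3 + rng.normal(size=n)  # measure fielded at time 3

for label, x, y in (("time 2", progress_t2, score_t2),
                    ("time 3", progress_t3, score_t3)):
    r, p = pearsonr(x, y)
    print(f"{label}: r = {r:.2f}, p = {p:.3f}")
```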
The measures of curriculum progress employed are conceptually well suited for showing the kind of
student development relevant to growth of critical thinking. The curriculum of Alverno College is
abilities-based; on over 100 assessments, Alverno students demonstrate to criteria each of eight broad abilities set by the faculty (e.g., communication, analysis, problem solving, and valuing). These
assessments lead to the credentialing of the student on the sequential and developmentally arranged
“competence level units.” The number of competence level units completed and the number of course
credits completed were employed as two separate measures of progress in the curriculum.
Examination of Table 5 shows that the Test of Cognitive Development scores at time two are positively
correlated with cumulative progress in the curriculum. At time three, the Test of Cognitive Development
scores do not correlate with the intervening progress in the curriculum (see Table 5).
The Test of Thematic Analysis showed a somewhat complex relation to progress in the curriculum. The
Test of Thematic Analysis summated scale (5 of 9 criteria) was negatively associated with curriculum
progress at both the second and third assessments.
While the examples scoring criterion performs similarly to the summated scale, of which it is a
component, the exceptions scoring criterion yields different results. Thus, the examples criterion is
negatively associated with curriculum progress at the second assessment, and, for at least one of the
measures of curriculum progress, is also negatively associated with curriculum progress at the third
assessment (see Table 5). Again in contrast, the exceptions criterion shows a small positive relation to
progress in the curriculum at the third assessment (see Table 5).
For the three Critical Thinking Appraisal subscales only one of the twelve correlations with progress in
the curriculum is statistically significant (see Table 5). The Test of Cognitive Development and the Test of
Thematic Analysis and its single item subcomponents were related to progress in the curriculum, but the
Critical Thinking Appraisal measures generally were not. Why not?
Level of Thinking, Mastery of That Level, and Habituality of Use
We believe the lack of relationship of the Critical Thinking Appraisal with the measures of curriculum progress flows from the multiple-choice format of the Critical Thinking Appraisal. Before we clarify why
we suspect this, we need to distinguish between some broad performance characteristics. From a
developmental perspective we might describe one performance as demonstrating a higher level of
thinking than another. Higher levels of thinking are perhaps more complex, broader in scope, and so on.
Also, a sophisticated thought process could be based upon the coordination of several thought
processes, and might be predicated upon the development of other thought processes.
We can conceptually distinguish the sophistication or level of a thought process from other performance
characteristics. An individual can be said to have mastered a particular thinking process when they can demonstrate it with some consistency upon explicit request. But,
even if an individual has mastered a level or kind of thinking, they might not habitually employ it. For
example, they may not be called on to use it, they may have opposing tendencies, or they may not know
it is expected of them. The distinctions between sophistication of a thinking process, mastery of a
thinking process, and habitual use of a thinking process may help explain why some Critical Thinking
measures that showed development during the college years also showed a relationship to curriculum
progress, while others did not.
Because of the recognition character of the Critical Thinking Appraisal, we do not feel confident in it as a
measure of well-practiced mastery. College students can probably recognize higher levels of thinking
than they can systematically produce. Performance on the Critical Thinking Appraisal may have more to
do with the breadth of the exposure to the thinking processes the test taps, than with a well-practiced
mastery of the thinking processes. Thus, although we are able to show gains on all of the Critical
Thinking Appraisal subscales, these robust gains may indicate that it is relatively easy to show progress
on these recognition based measures.
Even those students progressing slowly in the curriculum may have a breadth of exposure to the
thinking processes required by the Critical Thinking Appraisal. Indeed, the breadth of exposure of slower progressing students may be as valuable as that of faster progressing students for recognizing the “correct” answers on the Critical Thinking Appraisal. This assumption would help
account for the finding that rate of progression in the curriculum is not generally related to subsequent
performance on the Critical Thinking Appraisal subscales.
Nonetheless, we must note that the Recognition of Assumptions subscale gains were most strongly
associated with the second interval in the study. This suggests that a sophisticated critical thinking
process was being measured. And, theoretically, the sophistication required to recognize assumptions of
arguments does seem high enough to account for delayed development in the curriculum. If a later
curriculum exposure is responsible for the delayed development on the Recognition of Assumptions
subscale, it would seem that this argues for expecting to find a relationship with curriculum progress.
But, could it be that both the slower and the faster progressing students were far enough along in the curriculum to be exposed sufficiently to this thinking process as it was required by the recognition test?
We think so, and can note that the variance in the rate of progression of our students was not
inordinately large.
In explaining the relationship of the Test of Thematic Analysis to progress in the curriculum, we
conversely focus on its characteristic of being a production measure. We believe these curriculum-linked
changes on the Test of Thematic Analysis and its single item criterion-scores are consistent with an
interpretation that focuses on the assumed relationship between production measures and the
measurement of either broadly based mastery or habituality. More specifically, we suspect that mastery
of a way of thinking or habituality in a way of thinking develops incrementally and would not be
sensitive to minimal curriculum exposure. As a result, a range of progress in the curriculum would be
likely to be associated with differential performance on a production measure, which is what we found.
The response format of the Test of Cognitive Development is not purely either recognition or
production, and so it is inappropriate to speculate on how the results for this measure reflect upon our
thesis that production measures capture more habitual tendencies or more well-practiced abilities. We
do, however, gain greater confidence in this measure as a measure of college outcomes because of its
relation to progress in the curriculum. Furthermore, the historical development of this measure in the research on formal operations does suggest the instrument is a measure of sophisticated and well-practiced thinking.
Inter-Relation of Critical Thinking Measures/Background
Variables
More generally, we find other evidence that these various instruments are measuring different aspects
of higher-order thinking. First, the Test of Thematic Analysis (the purest production measure) is related
to Age, but the Test of Cognitive Development is not. Second, the Test of Cognitive Development and
the Critical Thinking Appraisal subscales are related to high school GPA at entrance, but the Test of
Thematic Analysis is not. Finally, although these measures are positively correlated with one another (see Table 6), previously reported factor analyses (see Mentkowski & Strait, 1983) support distinctions
between the summated measures of critical thinking for the first two assessments.
Comparison with Cross-Sectional Results
We are compelled to offer another word of caution. Our own cross-sectional analyses would have led to
different conclusions than our longitudinal analyses. The cross-sectional analyses, which controlled for attrition, showed a “gain” only on the “Defense” score for the Analysis of Argument instrument, F(1,125) = 7.26, p < .01, and the Inference subscale of the Critical Thinking Appraisal, F(1,127) = 12.75, p < .001.
Moreover, because the cross-sectional comparison involves the comparison of existing groups (which
differed on high school GPA, for example), even these “gains” would have been potentially spurious.
When we did control statistically for high school GPA in our cross-sectional comparison, the cross-
sectional Inference “gains” were no longer statistically significant. We place greater faith in our
longitudinal changes, of course. We point up this discrepancy between the longitudinal and cross-
sectional results, because we believe it is an object lesson in the problems of interpretation from cross-
sectional findings, even though each method has its place.
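The statistical control referred to here is, in effect, an analysis of covariance. A minimal sketch of such an adjustment (our illustration with simulated data, assuming the pandas and statsmodels packages; none of the numbers are study data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 130
df = pd.DataFrame({
    "group": np.repeat(["entrant", "graduate"], n // 2),
    "hs_gpa": rng.normal(2.9, 0.5, n),           # hypothetical covariate
})
# Simulated scores in which the apparent group "gain" rides partly on GPA.
df["score"] = 8.0 + 2.0 * df["hs_gpa"] \
              + 0.3 * (df["group"] == "graduate") + rng.normal(0.0, 1.5, n)

model = smf.ols("score ~ C(group) + hs_gpa", data=df).fit()
print(model.summary().tables[1])  # group coefficient is the GPA-adjusted difference
```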
Interpreting the New Production Measures
The task of more specifically interpreting changes on the Test of Thematic Analysis remains. We believe the decline on the Test of Thematic Analysis summated measure (5 of 9 criteria) can be attributed to the decline on the examples criterion. We interpret the tendency of students to give fewer examples, but to make more exceptions, as a positive outcome. The increasing tendency of students to make exceptions
in their analysis of the essays suggests that the thinking of the students may be becoming more abstract.
The decreasing tendency of the students to give examples of their abstractions seems somewhat
puzzling at first. In order to explain this, we make the following observations.
Faculty at Alverno College are pioneers in the writing across the curriculum movement. They explicitly
teach students to write for a particular audience, and they have developed standards of performance on
this dimension. Since the format of the Test of Thematic Analysis suggests that researchers are the audience, we imagine students increasingly reasoning as follows as they repeatedly encounter the instrument: “These researchers assuredly are aware of what the four-sentence essays say. I don’t
need to give examples from these essays to make my points, because they already know the examples.
They would be bored by the obviousness of the examples, which would be about as long as the set of
essays.”
We believe styles of writing developed across school programs, as well as across disciplines, may vary.
Thus, we encourage a complex interpretation of the decline and gain on the Test of Thematic Analysis
scores. Future modifications of the instrument should consider, at a minimum, providing directions that identify a more explicit audience.
Although the findings for the Analysis of Argument instrument cast some doubt upon its suitability for
use at this college as an outcomes measure, we believe modifications in the instrument may overcome
these difficulties.
If the instructions for the “Defense” essay advised against role playing, this might improve the essay’s measurement properties. In effect, students are penalized for role playing “unsophisticated” thinking, and for empathizing with the perspective of the essay. In regard to
the “Attack” essay’s instructions, we note that they do not specifically request students to analyze the
adequacy of the arguments in the stimulus essay. Instead the instructions ask the students to “argue
against” the stimulus essay. The stimulus essay was an emotional one-sided essay that presented an
unsupported position likely to be counter to the attitudes of the students. In their argument against this
stimulus essay, the students in the present study predominantly tended to present the opposing
position, which would be their own opinion, as opposed to attacking the logic of the essay. The
emotional one-sided stimulus essay may have elicited an emotional need to present the opposing
position.
Thus, with respect to the lack of change on the Analysis of Argument “Attack” scores, one might
speculate that these students have not specifically and habitually learned to give precedence to a logical
analysis of the flaws in their opponent’s arguments; instead, their predominant tendency may have
been to give a positive statement of the opposite position. If the instructions had specifically requested an analysis of the merits and flaws of the stimulus essay, we might have found developmental differences
on this measure. We are proposing a distinction here between the ability to write, when requested, an
intellectual critique in an emotional environment versus the habitual tendency to write a purely
intellectual critique in an emotional environment.
Implications for Standardization of Measures
We note that Winter, McClelland, and Stewart (1981) reported both longitudinal and cross-sectional
gains for the Analysis of Argument and Test of Thematic Analysis at “Ivy” College. These findings
contrast somewhat with our findings at Alverno College, even though we used the same scoring
procedures. We tend to discount the cross-sectional gains for Analysis of Argument we found because of
our lack of longitudinal results, while we feel encouraged by our perhaps unique bi-directional
longitudinal changes on the single criterion-scores of the Test of Thematic Analysis. Although we would
like to have been able to demonstrate the robustness of the standard scoring and administration procedures developed at “Ivy” College, we have not been able to show cross-college equivalence for these measures.
The likelihood that different educational strategies are used in different colleges has wide implications
for using scoring schemes that are assumed to be standard, and requires further consideration. For
instance, colleges may differ in how writing is taught and in how often role playing is pedagogically used,
and these and other stylistic differences in instruction may cause students to interpret differently the
instructions on these production measures. This confounding may greatly complicate the cross-
institutional measurement properties of these measures. Not only may differing educational contexts
elicit different interpretations of how to perform well on an instrument, but also these different
interpretations may be equally reasonable if the instrument itself does not communicate to the student
the preferred type of performance.
It is also possible that these differing educational strategies reflect differing conceptions of critical
thinking outcomes. For example, at Alverno, role playing is often explicitly used to encourage students
to take tolerantly the perspective of another person or culture. Students are encouraged to consider the
cultural or personal assumptions they make. In contrast, the scoring procedures standardized by “Ivy” College for the Analysis of Argument “Defense” essay are based upon another conception. The ideal student portrayed in these scoring criteria will insist upon maintaining a position consistent with her own, even as she obligingly defends the good points present in an opponent’s position. There is no
necessary recognition of the cultural supports for one’s own beliefs or of the perhaps equal validity of
another’s views. Another example of possibly different conceptions of critical thinking can be illustrated
with the scoring of the “Attack” essay. The “Ivy” college scoring procedures reflect a preference for a
logical analysis of the stimulus essay. The present college, however, also encourages students to express
their own values and positions as well as encouraging logical analysis.
If either the conception of critical thinking differs across colleges, or the students’ interpretation based upon their educational context differs, even the direction of “gain” or “decline” on a criterion may also differ. This concern suggests that a single college population, however reputable, may not serve
well as the major source for standardizing scoring criteria.
In many instances, it would be desirable to try to insure that these production measures elicit similar
interpretations of the task by students from different colleges. Here a distinction needs to be made
between the respondent’s ability to conform to a standard versus the respondent’s predominant
tendencies in unstructured situations. We do not rule out the development of several standardized
scoring systems, or even the particularization of scoring systems to a college’s conception of critical
thinking. What we earnestly recommend for production measures of ability, however, is that they offer
to the respondent a clear idea of the standards that will be applied to their productions, if the
researcher intends to represent their performance as their ability to conform to a standard.
A more unstructured stimulus may be used when the researcher is interested in the predominant
tendencies of the respondents. In such cases, the researcher needs to be especially sensitive to the
distinction between necessary habits supporting an ability and stylistic divergence. We feel that the
Analysis of Argument and Test of Thematic Analysis do elicit habitual tendencies, but are not yet
sensitive to stylistic divergence. Their instructions draw forth the respondent’s habitual tendencies, but
the standardized scoring reflects a preferred response that does not accommodate alternative modes
and styles of critical thinking.
In general, we recommend fielding both types of production measures, those measuring the ability to
demonstrate mastery to standards and those measuring habitual tendencies. We also recommend
keeping these types of measures separate and distinct. Otherwise, it may not be possible to unconfound
them.
Educational/Scientific Importance of the Study
Researchers are encouraged by these results to continue development of measures which ask students
to show the process of their thinking in addition to showing comprehension of knowledge, concepts or
generalizations. Critical thinking appears best understood as comprising several dimensions, which at this college develop differently across the college years. Researchers developing production measures
need to distinguish in their measurement between habitual tendencies versus the ability to show
mastery to a standard when requested. At present, the Analysis of Argument and Test of Thematic Analysis tend to be similar to projective measures in that they do not specifically guide responses and,
as a result, tend to measure habitual tendencies. Our results suggest that these instruments need
further refinement if they are to be used as standardized cross-college outcome measures. We were not
able to show their scoring criteria to be internally reliable.
We remain confident in the potential usefulness of the Test of Thematic Analysis and Analysis of
Argument to a wide variety of colleges once they have undergone further development to either tailor
them to the institution’s own educational goals and definitions, or to improve their generic cross-
institutional equivalence. Even at this early stage of development, we have found the findings from the
instruments useful in suggesting the habitual tendencies of our students. The Test of Thematic Analysis
appeared sensitive to changes in their habits of thought.
We urge educators to support researchers developing production measures and not to rely entirely on traditional, though efficient, measures that may not work well as measures of well-practiced mastery.
Production measures seem better suited to the measurement of the kind of mastery obtained by the
consistent practice of higher-order thought and of the kind of habituality of thought processes that
would lead to their use across situations. We have suggested that the Recognition of Assumptions
subscale may tap a relatively sophisticated thought process, but we also note that some sophisticated
thought processes, for example, as used in the integration of perspectives, may require a production
measure. We encourage researchers to broaden their scoring schemes to include the kind of critical
thinking practiced by adults already in the working world (Arlin, 1975; McClelland, 1973).
Educators should be encouraged by our conclusion that critical thinking develops in college. The Alverno
students not only showed gains on the Piagetian-based Test of Cognitive Development, but also their
performance on this measure has been linked with their prior progress in the curriculum (cf.
Mentkowski & Strait, 1983). These students also showed gains on each of the three subscales of the
Critical Thinking Appraisal that we fielded. This finding is tempered somewhat by our inability to show a
direct relationship between performance on this measure and progress in the curriculum. Performance
on the Test of Thematic Analysis did appear to be linked to progress in the curriculum. In this regard, the
Alverno students may have developed the habit of making more exceptions or qualifications to their
analyses. Although they also may have weakened in their habit of giving examples to their analyses, we
suspect this finding is context bound.
So far, we are able to support the usefulness of only two of the three new measures of critical thinking. We remain cautious in interpreting these study results. For example, although this study generally confirmed the usefulness of the Test of Thematic Analysis, a comparative cross-sectional study of seven colleges was able to demonstrate change on the Test of Thematic Analysis for only two of the seven: “Ivy” College, a famous and highly selective institution, and Alverno College (see Winter, McClelland, and Stewart, 1981). Why? We have already noted that the instrument criteria and instructions were standardized by a single institution, “Ivy” College. We also note that the findings for Alverno reported in that study, which
were based upon a cross-sectional comparison of entering versus graduating students, have not been
confirmed by our longitudinal results. But, even if the summated results from “Ivy” college and the
results from the present analysis of the scoring criteria for the Test of Thematic Analysis are taken as
suggestive of possible differences, we have yet to demonstrate robust cross-college findings. We note
that at Alverno College, faculty have identified critical thinking abilities as college outcomes and
developed teaching strategies and instruments to teach and assess them. Perhaps, colleges cannot
expect change on production measures without such an explicit curriculum or without highly selective
student bodies. If so, it means that researchers and educators must work together at both instrument
and curriculum development at a range of colleges.
REFERENCES
Arlin, P. (1975). Cognitive development in adulthood: A fifth stage? Developmental Psychology, 11(5), 602-606.
Ewell, P. (1984). The self-regarding institution: Information for excellence. Boulder, CO: National Center for Higher
Education Management Systems.
Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning. American Psychologist, 39(3), 193-202.
Lipman, M. (1984). Philosophy and the cultivation of reasoning. Paper presented at a conference on Teaching Thinking Skills, Wingspread Conference Center, Racine, WI.
Marchese, T. (1985). Learning about assessment. AAHE Bulletin, 38(1), 10-13.
McClelland, D. (1973). Testing for competence rather than for “intelligence.” American Psychologist, 28, 1-14.
Mentkowski, M., & Doherty, A. (1984). Abilities that last a lifetime: Outcomes of the Alverno experience. AAHE
Bulletin, 36(6), 1-6, 11-14.
Mentkowski, M., & Doherty, A. (1983, revised 1984). Careering after college: Establishing the validity of abilities
learned in college for later careering and professional performance. Final report to the National Institute of
Education: Overview and summary. Milwaukee, WI: Alverno Productions.
Mentkowski, M., & Strait, M. (1983). A longitudinal study of student change in cognitive development, learning
styles, and generic abilities in an outcome-centered liberal arts curriculum. Final report to the National
Institute of Education, research report number six. Milwaukee, WI: Alverno Productions.
Nickerson, R. (1984). Teaching thinking: What is being done and with what results? Cambridge, MA: Bolt Beranek
and Newman, Inc.
Paul, R. (1984). The concept of critical thinking: An analysis, a global strategy, and plea for emancipatory reason.
Rohnert Park, CA: Sonoma State University Center for Critical Thinking and Moral Critique.
Renner, J., Fuller, R., Lockhead, J., Johns, J., Tomlinson-Keasey, C. & Campbell, T. (1976). Test of Cognitive
Development. Norman, OK: University of Oklahoma.
Sigel, I. (1984). Reflection on thinking about thinking: The educational discovery of the 80’s? Paper for presentation
at a conference on Teaching Thinking Skills, Wingspread Conference Center, Racine, WI.
Sternberg, R. (1983). How can we teach intelligence? Philadelphia, PA: Research for Better Schools, Inc.
Stewart, A. (1977a). Analysis of argument: An empirically derived measure of intellectual flexibility. Boston: McBer and Company.
Stewart, A. (1977b). Scoring manual for stages of psychological adaptation to the environment. Unpublished
manuscript, Department of Psychology, Boston University.
Watson, G., & Glaser, E. (1964). Critical Thinking Appraisal. New York: Harcourt, Brace, Jovanovich.
Winter, D. (1976). The Test of Thematic Analysis. Boston: McBer and Company.
Winter, D., & McClelland, D. (1978). Thematic analysis: An empirically derived measure of the effects of liberal arts
education. Journal of Educational Psychology, 70, 8-16.
Winter, D., McClelland, D., & Stewart, A. (1981). A new case for the liberal arts: Assessing institutional goals and
student development. San Francisco: Jossey-Bass.
Table 1: Inter-Item Reliability, Cronbach’s Alpha, for Each Time of Assessment¹

Measure                                        Time 1   Time 2   Time 3
Test of Cognitive Development                    NA       .47²     .54
Test of Thematic Analysis (all 9 criteria)       .28      .11      .19
Test of Thematic Analysis (5 of 9 criteria)      .36      .17      .35
Analysis of Argument “Attack” items              .33      .17      .09
Analysis of Argument “Defense” items             .79      .46      .75

¹ For the reliability analyses, each scoring criterion was used as an item. For the Test of Cognitive Development, 5 scores were used as items. For the Analysis of Argument “Attack” essay, 5 scoring criteria were also available. For the Analysis of Argument “Defense” essay, 4 criteria were coded, but only 3 were used in the reliability analysis because of multicollinearity.

NA: The reliability coefficient is not available because only total scores were keypunched.

² At time 2, the Test of Cognitive Development reliability coefficient is based only on the 1977 cohort, because only the total score for the 1976 cohort was keypunched.
Table 2: Age by Time Repeated Measures ANOVAs for Linear and Quadratic Contrasts¹

Measure                                       Age Main Effect      Linear Time Main Effect   Quadratic Time Main Effect
Test of Cognitive Development                 F(1,189) < 1         F(1,189) = 14.7***        F(1,189) = 1.7
Test of Thematic Analysis (5 of 9 criteria)   F(1,192) = 22.4***   F(1,192) = 3.9*           F(1,192) = 5.1*
Test of Thematic Analysis, exception          F(1,192) = 3.6       F(1,192) = 14.0***        F(1,192) < 1
Test of Thematic Analysis, example            F(1,192) = 16.4***   F(1,192) = 16.6***        F(1,192) < 1
Inference Subscale of CTA                     F(1,180) = 6.6*      F(1,180) = 19.1***        F(1,180) < 1
Recognition of Assumptions Subscale of CTA    F(1,180) = 4.9*      F(1,180) = 4.1*           F(1,180) = 6.8*
Deduction Subscale of CTA                     F(1,179) < 1         F(1,179) = 19.6***        F(1,179) < 1

Note: CTA = Critical Thinking Appraisal.

¹ Multivariate analyses of variance testing the combined effects of linear and quadratic Time were statistically significant for all reported main effects. Both linear and quadratic tests of the Age by Time interaction failed to reach statistical significance (all F values less than 1).

* p < .05   ** p < .01   *** p < .001
Table 3: Means for Linear Time Main Effect Ignoring Age

Measure                                             Time 1   Time 2   Time 3
Test of Cognitive Development                        11.45    12.24    12.37
Deduction Subscale of Critical Thinking Appraisal    16.10    16.64    17.16
Table 4: Means for Time of Assessment Broken Down for the Age Main Effect

Measure                                              Time 1   Time 2   Time 3
Test of Thematic Analysis (5 of 9 criteria)
  Age 17 to 19                                        1.09     1.19      .91
  Age 20 to 55                                        1.56     1.57     1.32
Test of Thematic Analysis, exception
  Age 17 to 19                                         .25      .36      .48
  Age 20 to 55                                         .37      .48      .49
Test of Thematic Analysis, example
  Age 17 to 19                                         .32      .22      .16
  Age 20 to 55                                         .50      .41      .29
Inference Subscale of Critical Thinking Appraisal
  Age 17 to 19                                        8.97     9.56     9.58
  Age 20 to 55                                        9.87    10.45    10.93
Recognition of Assumptions Subscale of Appraisal
  Age 17 to 19                                       10.96    10.52    11.18
  Age 20 to 55                                       11.26    11.35    12.01
Table 5: Correlation of Critical Thinking Measures with Progress in the Curriculum on “Competence Level Units” (CLUs)¹ and on Credits Achieved

Measure                                        Time 2   Time 2    Time 3   Time 3
                                               CLUs     Credits   CLUs     Credits
Test of Cognitive Development                   .21**    .15*      .03      .08
Test of Thematic Analysis (5 of 9 criteria)    –.20**   –.31***   –.17**   –.29***
Test of Thematic Analysis, exception           –.04     –.09       .10      .13*
Test of Thematic Analysis, example             –.21**   –.29***   –.10     –.25***
Inference Subscale of CTA                       .05     –.07      –.08     –.15*
Recognition of Assumptions Subscale of CTA      .09     –.03      –.11     –.07
Deduction Subscale of CTA                       .04      .06      –.11     –.02

Note: CTA = Critical Thinking Appraisal.

¹ On over 100 assessments, Alverno students demonstrate to criteria each of 8 broad abilities set by the faculty (e.g., communication, analysis, problem solving). It is these assessments that lead to the credentialing of the student on the sequential and developmentally arranged “competence level units.”

* p < .05   ** p < .01   *** p < .001
Table 6: Correlations between the Critical Thinking Measures at Time One

Measure                                        TCD      CTA         CTA Recognition   CTA
                                                        Inference   of Assumptions    Deduction
Test of Cognitive Development                           .28***      .21**             .35***
Test of Thematic Analysis (5 of 9 criteria)    .38***   .20**       .16*              .29***
Test of Thematic Analysis, exception           –.05     .10         .17**             .06
Test of Thematic Analysis, example             .24**    .08        –.01               .17*
Inference Subscale of CTA                                           .29***            .29***
Recognition of Assumptions Subscale of CTA                                            .38***

Note: TCD = Test of Cognitive Development; CTA = Critical Thinking Appraisal. Blank cells are redundant entries of the correlation matrix.

* p < .05   ** p < .01   *** p < .001
Appendix A: Test of Thematic Analysis Criteria Included Versus Not Included

Criteria Included in 5 of 9 Total Score
  Making “Direct Compound Comparisons” (positively scored)
  Giving “Examples” (positively scored)
  Using an “Analytic Hierarchy” (positively scored)
  “Redefinition” for Scope or Clarity (positively scored)
  Comparing “Apples and Oranges” (negatively scored)

Criteria Excluded From 5 of 9 Total Score
  Making “Exceptions” or “Qualifications” (positively scored)
  “Subsuming Alternatives” (positively scored)
  “Affect” (negatively scored)
  “Subjective Reaction” (negatively scored)
Survey question bases
 
Organizing and Evaluating Results from Multiple Reading Assessments
Organizing and Evaluating Results from Multiple Reading AssessmentsOrganizing and Evaluating Results from Multiple Reading Assessments
Organizing and Evaluating Results from Multiple Reading Assessments
 
Impact Of Diagnostic Test For Enhancing Student Learning At Elementary Level
Impact Of Diagnostic Test For Enhancing Student Learning At Elementary LevelImpact Of Diagnostic Test For Enhancing Student Learning At Elementary Level
Impact Of Diagnostic Test For Enhancing Student Learning At Elementary Level
 
Pro questdocuments 2015-03-16(1)
Pro questdocuments 2015-03-16(1)Pro questdocuments 2015-03-16(1)
Pro questdocuments 2015-03-16(1)
 
Hots
HotsHots
Hots
 
EDUC 8102-6: Applied Research and Adult Learn
EDUC 8102-6: Applied Research and Adult LearnEDUC 8102-6: Applied Research and Adult Learn
EDUC 8102-6: Applied Research and Adult Learn
 
Assessing the use of language learning strategies worldwide with the ESL and ...
Assessing the use of language learning strategies worldwide with the ESL and ...Assessing the use of language learning strategies worldwide with the ESL and ...
Assessing the use of language learning strategies worldwide with the ESL and ...
 
Achievement Goal Orientation across Gender and Ethnicity in a Community Col...
Achievement Goal Orientation across Gender and Ethnicity in a Community Col...Achievement Goal Orientation across Gender and Ethnicity in a Community Col...
Achievement Goal Orientation across Gender and Ethnicity in a Community Col...
 
Homeschool study
Homeschool studyHomeschool study
Homeschool study
 
Pilot Study for Validity and Reliability of an Aptitude Test
Pilot Study for Validity and Reliability of an Aptitude TestPilot Study for Validity and Reliability of an Aptitude Test
Pilot Study for Validity and Reliability of an Aptitude Test
 
The Power Of Clarifying 03
The Power Of Clarifying 03The Power Of Clarifying 03
The Power Of Clarifying 03
 
Dr. Rebecca Duong, PhD Dissertation Defense, Dr. William Allan Kritsonis, Dis...
Dr. Rebecca Duong, PhD Dissertation Defense, Dr. William Allan Kritsonis, Dis...Dr. Rebecca Duong, PhD Dissertation Defense, Dr. William Allan Kritsonis, Dis...
Dr. Rebecca Duong, PhD Dissertation Defense, Dr. William Allan Kritsonis, Dis...
 
journal article
journal article journal article
journal article
 
Comparartive and non-Comparative study
Comparartive and non-Comparative studyComparartive and non-Comparative study
Comparartive and non-Comparative study
 
The effectiveness of evaluation style in improving lingual knowledge for ajlo...
The effectiveness of evaluation style in improving lingual knowledge for ajlo...The effectiveness of evaluation style in improving lingual knowledge for ajlo...
The effectiveness of evaluation style in improving lingual knowledge for ajlo...
 
11.the effectiveness of teaching physics through project method on academic a...
11.the effectiveness of teaching physics through project method on academic a...11.the effectiveness of teaching physics through project method on academic a...
11.the effectiveness of teaching physics through project method on academic a...
 
2014 yesulyurt
2014 yesulyurt2014 yesulyurt
2014 yesulyurt
 
8. brown &amp; hudson 1998 the alternatives in language assessment
8. brown &amp; hudson 1998 the alternatives in language assessment8. brown &amp; hudson 1998 the alternatives in language assessment
8. brown &amp; hudson 1998 the alternatives in language assessment
 

Viewers also liked

Reflektioner som utvärdering
Reflektioner som utvärderingReflektioner som utvärdering
Reflektioner som utvärdering
CeciliaFalk
 
Basics of Research and Bias
Basics of Research and BiasBasics of Research and Bias
Basics of Research and Bias
Brian Wells, MD, MS, MPH
 
Analysing Texts / Using Documents
Analysing Texts / Using DocumentsAnalysing Texts / Using Documents
Analysing Texts / Using Documents
DrKevinMorrell
 
Statistics
StatisticsStatistics
Quantitative Data - A Basic Introduction
Quantitative Data - A Basic IntroductionQuantitative Data - A Basic Introduction
Quantitative Data - A Basic Introduction
DrKevinMorrell
 
Data analysis powerpoint
Data analysis powerpointData analysis powerpoint
Data analysis powerpointjamiebrandon
 

Viewers also liked (11)

Reflektioner som utvärdering
Reflektioner som utvärderingReflektioner som utvärdering
Reflektioner som utvärdering
 
Naturkunskap na1a1 april 2013
Naturkunskap na1a1 april 2013Naturkunskap na1a1 april 2013
Naturkunskap na1a1 april 2013
 
Naturkunskap na1a1 april 2013b
Naturkunskap na1a1 april 2013bNaturkunskap na1a1 april 2013b
Naturkunskap na1a1 april 2013b
 
Basics of Research and Bias
Basics of Research and BiasBasics of Research and Bias
Basics of Research and Bias
 
Analysing Texts / Using Documents
Analysing Texts / Using DocumentsAnalysing Texts / Using Documents
Analysing Texts / Using Documents
 
Statistics
StatisticsStatistics
Statistics
 
Bias
BiasBias
Bias
 
Quantitative Data - A Basic Introduction
Quantitative Data - A Basic IntroductionQuantitative Data - A Basic Introduction
Quantitative Data - A Basic Introduction
 
Sample size calculation
Sample size calculationSample size calculation
Sample size calculation
 
Data analysis powerpoint
Data analysis powerpointData analysis powerpoint
Data analysis powerpoint
 
Chapter 10-DATA ANALYSIS & PRESENTATION
Chapter 10-DATA ANALYSIS & PRESENTATIONChapter 10-DATA ANALYSIS & PRESENTATION
Chapter 10-DATA ANALYSIS & PRESENTATION
 

Similar to Longitudinal Assessment of Critical Thinking

An Investigation Of Undergraduate Students Feelings And Attitudes Towards Gr...
An Investigation Of Undergraduate Students  Feelings And Attitudes Towards Gr...An Investigation Of Undergraduate Students  Feelings And Attitudes Towards Gr...
An Investigation Of Undergraduate Students Feelings And Attitudes Towards Gr...
Joaquin Hamad
 
ASSESSING POSTGRADUATE STUDENTS CRITICAL THINKING ABILITY
ASSESSING POSTGRADUATE STUDENTS  CRITICAL THINKING ABILITYASSESSING POSTGRADUATE STUDENTS  CRITICAL THINKING ABILITY
ASSESSING POSTGRADUATE STUDENTS CRITICAL THINKING ABILITY
Don Dooley
 
EMOTION DETECTION AND OPINION MINING FROM STUDENT COMMENTS FOR TEACHING INNOV...
EMOTION DETECTION AND OPINION MINING FROM STUDENT COMMENTS FOR TEACHING INNOV...EMOTION DETECTION AND OPINION MINING FROM STUDENT COMMENTS FOR TEACHING INNOV...
EMOTION DETECTION AND OPINION MINING FROM STUDENT COMMENTS FOR TEACHING INNOV...
ijejournal
 
An Evaluation Of Predictors Of Achievement On Selected Outcomes In A Self-Pac...
An Evaluation Of Predictors Of Achievement On Selected Outcomes In A Self-Pac...An Evaluation Of Predictors Of Achievement On Selected Outcomes In A Self-Pac...
An Evaluation Of Predictors Of Achievement On Selected Outcomes In A Self-Pac...
Zaara Jensen
 
A Review Of Scientific And Humanistic Approaches In Curriculum Evaluation
A Review Of Scientific And Humanistic Approaches In Curriculum EvaluationA Review Of Scientific And Humanistic Approaches In Curriculum Evaluation
A Review Of Scientific And Humanistic Approaches In Curriculum Evaluation
Lori Moore
 
An analysis of utilization of educational research findings for qualitative d...
An analysis of utilization of educational research findings for qualitative d...An analysis of utilization of educational research findings for qualitative d...
An analysis of utilization of educational research findings for qualitative d...
Alexander Decker
 
Analysis Of Students Critical Thinking Skill Of Middle School Through STEM E...
Analysis Of Students  Critical Thinking Skill Of Middle School Through STEM E...Analysis Of Students  Critical Thinking Skill Of Middle School Through STEM E...
Analysis Of Students Critical Thinking Skill Of Middle School Through STEM E...
Amy Cernava
 
Program evaluation 2013
Program evaluation 2013Program evaluation 2013
Program evaluation 2013lisawitteman
 
Program evaluation 2013
Program evaluation 2013Program evaluation 2013
Program evaluation 2013lisawitteman
 
Intro to philosophy Module1_Q1.pptx
Intro to philosophy Module1_Q1.pptxIntro to philosophy Module1_Q1.pptx
Intro to philosophy Module1_Q1.pptx
MARINELLERICALDE
 
Nicol & McFarlane-Dick (2006)
Nicol & McFarlane-Dick (2006)Nicol & McFarlane-Dick (2006)
Nicol & McFarlane-Dick (2006)nzcop2009
 
Assessment Of ELLs Critical Thinking Using The Holistic Critical Thinking Sc...
Assessment Of ELLs  Critical Thinking Using The Holistic Critical Thinking Sc...Assessment Of ELLs  Critical Thinking Using The Holistic Critical Thinking Sc...
Assessment Of ELLs Critical Thinking Using The Holistic Critical Thinking Sc...
Tracy Morgan
 
Assessing Students Critical Thinking And Approaches To Learning
Assessing Students  Critical Thinking And Approaches To LearningAssessing Students  Critical Thinking And Approaches To Learning
Assessing Students Critical Thinking And Approaches To Learning
Kim Daniels
 
Assessment Of Students Argumentative Writing A Rubric Development
Assessment Of Students  Argumentative Writing  A Rubric DevelopmentAssessment Of Students  Argumentative Writing  A Rubric Development
Assessment Of Students Argumentative Writing A Rubric Development
Sophia Diaz
 
A Procedure For Assessing Students Ability To Write Compositions
A Procedure For Assessing Students  Ability To Write CompositionsA Procedure For Assessing Students  Ability To Write Compositions
A Procedure For Assessing Students Ability To Write Compositions
Sabrina Green
 
Course ID BTM7300v2 Week 4 Outline of the Comprehensive .docx
Course ID BTM7300v2 Week 4 Outline of the Comprehensive .docxCourse ID BTM7300v2 Week 4 Outline of the Comprehensive .docx
Course ID BTM7300v2 Week 4 Outline of the Comprehensive .docx
voversbyobersby
 
An Analysis Of Student Reflections Of Semester Projects In Introductory Stati...
An Analysis Of Student Reflections Of Semester Projects In Introductory Stati...An Analysis Of Student Reflections Of Semester Projects In Introductory Stati...
An Analysis Of Student Reflections Of Semester Projects In Introductory Stati...
Jennifer Roman
 
A comparison of behaviorist and constructivist based teaching methods in psyc...
A comparison of behaviorist and constructivist based teaching methods in psyc...A comparison of behaviorist and constructivist based teaching methods in psyc...
A comparison of behaviorist and constructivist based teaching methods in psyc...Djami Olii
 
Effects of cornell, verbatim and outline note taking strategies on students’ ...
Effects of cornell, verbatim and outline note taking strategies on students’ ...Effects of cornell, verbatim and outline note taking strategies on students’ ...
Effects of cornell, verbatim and outline note taking strategies on students’ ...
Alexander Decker
 

Similar to Longitudinal Assessment of Critical Thinking (20)

An Investigation Of Undergraduate Students Feelings And Attitudes Towards Gr...
An Investigation Of Undergraduate Students  Feelings And Attitudes Towards Gr...An Investigation Of Undergraduate Students  Feelings And Attitudes Towards Gr...
An Investigation Of Undergraduate Students Feelings And Attitudes Towards Gr...
 
ASSESSING POSTGRADUATE STUDENTS CRITICAL THINKING ABILITY
ASSESSING POSTGRADUATE STUDENTS  CRITICAL THINKING ABILITYASSESSING POSTGRADUATE STUDENTS  CRITICAL THINKING ABILITY
ASSESSING POSTGRADUATE STUDENTS CRITICAL THINKING ABILITY
 
EMOTION DETECTION AND OPINION MINING FROM STUDENT COMMENTS FOR TEACHING INNOV...
EMOTION DETECTION AND OPINION MINING FROM STUDENT COMMENTS FOR TEACHING INNOV...EMOTION DETECTION AND OPINION MINING FROM STUDENT COMMENTS FOR TEACHING INNOV...
EMOTION DETECTION AND OPINION MINING FROM STUDENT COMMENTS FOR TEACHING INNOV...
 
An Evaluation Of Predictors Of Achievement On Selected Outcomes In A Self-Pac...
An Evaluation Of Predictors Of Achievement On Selected Outcomes In A Self-Pac...An Evaluation Of Predictors Of Achievement On Selected Outcomes In A Self-Pac...
An Evaluation Of Predictors Of Achievement On Selected Outcomes In A Self-Pac...
 
A Review Of Scientific And Humanistic Approaches In Curriculum Evaluation
A Review Of Scientific And Humanistic Approaches In Curriculum EvaluationA Review Of Scientific And Humanistic Approaches In Curriculum Evaluation
A Review Of Scientific And Humanistic Approaches In Curriculum Evaluation
 
Standard progressive matrices
Standard progressive matricesStandard progressive matrices
Standard progressive matrices
 
An analysis of utilization of educational research findings for qualitative d...
An analysis of utilization of educational research findings for qualitative d...An analysis of utilization of educational research findings for qualitative d...
An analysis of utilization of educational research findings for qualitative d...
 
Analysis Of Students Critical Thinking Skill Of Middle School Through STEM E...
Analysis Of Students  Critical Thinking Skill Of Middle School Through STEM E...Analysis Of Students  Critical Thinking Skill Of Middle School Through STEM E...
Analysis Of Students Critical Thinking Skill Of Middle School Through STEM E...
 
Program evaluation 2013
Program evaluation 2013Program evaluation 2013
Program evaluation 2013
 
Program evaluation 2013
Program evaluation 2013Program evaluation 2013
Program evaluation 2013
 
Intro to philosophy Module1_Q1.pptx
Intro to philosophy Module1_Q1.pptxIntro to philosophy Module1_Q1.pptx
Intro to philosophy Module1_Q1.pptx
 
Nicol & McFarlane-Dick (2006)
Nicol & McFarlane-Dick (2006)Nicol & McFarlane-Dick (2006)
Nicol & McFarlane-Dick (2006)
 
Assessment Of ELLs Critical Thinking Using The Holistic Critical Thinking Sc...
Assessment Of ELLs  Critical Thinking Using The Holistic Critical Thinking Sc...Assessment Of ELLs  Critical Thinking Using The Holistic Critical Thinking Sc...
Assessment Of ELLs Critical Thinking Using The Holistic Critical Thinking Sc...
 
Assessing Students Critical Thinking And Approaches To Learning
Assessing Students  Critical Thinking And Approaches To LearningAssessing Students  Critical Thinking And Approaches To Learning
Assessing Students Critical Thinking And Approaches To Learning
 
Assessment Of Students Argumentative Writing A Rubric Development
Assessment Of Students  Argumentative Writing  A Rubric DevelopmentAssessment Of Students  Argumentative Writing  A Rubric Development
Assessment Of Students Argumentative Writing A Rubric Development
 
A Procedure For Assessing Students Ability To Write Compositions
A Procedure For Assessing Students  Ability To Write CompositionsA Procedure For Assessing Students  Ability To Write Compositions
A Procedure For Assessing Students Ability To Write Compositions
 
Course ID BTM7300v2 Week 4 Outline of the Comprehensive .docx
Course ID BTM7300v2 Week 4 Outline of the Comprehensive .docxCourse ID BTM7300v2 Week 4 Outline of the Comprehensive .docx
Course ID BTM7300v2 Week 4 Outline of the Comprehensive .docx
 
An Analysis Of Student Reflections Of Semester Projects In Introductory Stati...
An Analysis Of Student Reflections Of Semester Projects In Introductory Stati...An Analysis Of Student Reflections Of Semester Projects In Introductory Stati...
An Analysis Of Student Reflections Of Semester Projects In Introductory Stati...
 
A comparison of behaviorist and constructivist based teaching methods in psyc...
A comparison of behaviorist and constructivist based teaching methods in psyc...A comparison of behaviorist and constructivist based teaching methods in psyc...
A comparison of behaviorist and constructivist based teaching methods in psyc...
 
Effects of cornell, verbatim and outline note taking strategies on students’ ...
Effects of cornell, verbatim and outline note taking strategies on students’ ...Effects of cornell, verbatim and outline note taking strategies on students’ ...
Effects of cornell, verbatim and outline note taking strategies on students’ ...
 

Longitudinal Assessment of Critical Thinking

  • 1. LONGITUDINAL ASSESSMENT OF CRITICAL THINKING IN COLLEGE WHAT MEASURES ASSESS CURRICULAR IMPACT? Marcia Mentkowski Glen Rogers Office of Research & Evaluation Alverno College
  • 2.
  • 3. Findings reported here are based upon a study funded by a grant from the National Institute of Education (NIE-G- 77-0058), as one part of a larger research project. Tamar Ben-Ur assisted with the statistical analyses reported in this paper. Paper presented at the annual meeting of The Mid-Western Educational Research Association in Chicago, October 1985. Longitudinal Assessment of Critical Thinking in College What Measures Assess Curricular Impact? Marcia Mentkowski Glen Rogers Educational Research and Evaluation ALVERNO COLLEGE Milwaukee, WI
  • 4. This publication is available from: Alverno College Institute 3400 South 43rd Street PO Box 343922 Milwaukee, WI 53234-3922 Phone: 414-382-6000 www.alverno.edu Graphic Design: Lynn Chabot-Long, Project Specialist, Educational Research and Evaluation  Copyright 1985. Alverno College Institute, Milwaukee, Wisconsin. All rights reserved under U.S., International and Universal Copyright Conventions. Reproduction in part or whole by any method is prohibited by law.
  • 5. Longitudinal Assessment of Critical Thinking in College Page i ABSTRACT Longitudinal results from a college outcomes study were used to judge one widely used and three new measures of critical thinking. Each measure was reviewed for how well it measured longitudinal vs. cross-sectional change, for whether it was associated with progress in the curriculum, for its association to background variables (e.g., High school GPA), and for whether it assessed change in critical thinking for both traditional-age and older adult students. Two of the newer measures are better college outcomes measures on these criteria. Interrelationships among the measures suggest that critical thinking is made up of multiple components, so researchers and educators are cautioned to improve both production and recognition measures.
  • 6. INTRODUCTION Objectives This study explores (1) the validity of a set of four critical thinking measures, (2) the possibility of an expanded domain of critical thinking abilities, (3) the degree and pattern of change in critical thinking abilities in an undergraduate population, and (4) the relationship between progress in the curriculum, which is outcome-centered and performance-based, and change in critical thinking abilities over the course of four years. Perspective and Theoretical Background Educators and researchers alike are strengthening their commitment to critical thinking as a college outcome. The American Association of Colleges, in a recent report redefining the baccalaureate, has identified critical thinking as one of the outcomes all colleges should prepare students to demonstrate (Integrity in the college curriculum, February 1985). In 1984, major conferences at Wingspread, Harvard, Sonoma State and The University of Chicago addressed critical thinking assessment. Accrediting agencies (e.g., North Central, COPA) are also calling for assessment of broad college outcomes. Recently, attention has turned to both the assessment of individual student learning and to the institutional evaluation of student outcomes (Marchese, 1985; Ewell, 1980; Marchese, 1985, Mentkowski & Doherty, 1984). The need for measures that assess broad college outcomes and that are not limited to amount of knowledge is critical. But, how are administrators and faculty to do so without instruments that contribute to cross-college measurement, and that have face validity for liberal arts faculty? Many liberal arts faculty do not just focus on knowledge measured by recognition tasks like SATs or GREs, but instead focus on higher order cognitive processes that are measured by production tasks and that have been conceptualized as central to newer definitions of critical thinking (Lipman, 1984; Nickerson, 1984; Paul, 1984; Sigel, 1984; Sternberg, 1983; Winter & McClelland, 1978). Moreover, there is a reciprocal relation between assessment and teaching. For example, Frederiksen (1984) observes in the American Psychologist that use of the newer production measures could also encourage teaching of higher level cognitive skills and provide practice with feedback. Several researchers have designed newer measures, but how well do they work as college outcomes measures? METHOD Critical Thinking Measures A total of 12 cognitive-developmental, learning style and generic ability measures were administered in a longitudinal study of college outcomes (Mentkowski & Doherty, 1983). Four of these are the focus of this analysis of critical thinking measures. Of these, two involve, at least partially, the more typical recognition tasks whereby the participant must select the correct answer:
  • 7. Longitudinal Assessment of Critical Thinking in College Page 2 (1) Watson and Glaser’s Critical Thinking Appraisal (Watson & Glaser, 1964) is a standardized, easily used recognition measure of component critical thinking abilities. The Form ZM subscales assessing quality of inferences, recognition of assumptions, and deductive reasoning were administered in this study. These subscales are henceforth designated as the Inference, Recognition of Assumptions, and Deduction subscales, respectively. The interpretation subscale and the evaluation of argument subscale were not administered. (2) Test of Cognitive Development (Renner, Fuller, Lockhead, Tomlinson-Keasey & Campbell, 1976) is a newer series of paper and pencil tasks designed to measure formal operational thinking as defined by Fiaget. Although participant responses for several tasks are scored from multiple choice answers, written justification of the answers often is considered in scoring as well. Scoring of the flexibility of rods task focuses upon participants’ written explanation of their reasoning. The theoretical justification for the test suggests sophisticated cognitive processes are being measured. The other two critical thinking measures administered are pure production measures: (3) Analysis of Argument (Stewart, 1977a, 1977b) is designed to measure flexibility in arguing (with consistency) for opposite positions on a specified issue. In response to the stimulus, an emotional one-sided essay, participants write two essays. The “Attack” essay is scored for whether it has a central organizing principle and whether it focuses on the faulty logic of the stimulus essay. The “Defense” essay is scored for whether it reflects a modified or qualified endorsement of the counter-attitudinal stimulus essay. (4) Test of Thematic Analysis (Winter, 1976; Winter & McClelland, 1978) is designed to measure the ability to form complex concepts and communicate them. The task requires participants to compare and contrast two sets of essays according to the themes in the two essay sets. Set A includes three stimuli essays about four sentences long, as does Set B. Appendix A gives the brief titles of the nine scoring criteria. Study Design and Inventory Administration All undergraduates who entered a women’s college in 1976 and 1977 were recruited as volunteer participants in the longitudinal study. In 1977, a Weekend College timeframe was offered for the first time. Thus, the 1977 cohort includes students from both the weekend and weekday timeframes. The longitudinal cohorts were assessed at entrance, two years after entrance, and three and one-half years after entrance. As a cross-sectional comparison group, the entire 1978 weekday graduating class was recruited for assessment at graduation. In order to control for attrition effects, the entrance scores of the students in the 1977 weekday longitudinal cohort who did not graduate were deleted when they were cross-
  • 8. Longitudinal Assessment of Critical Thinking in College Page 3 sectionally compared to the scores of the 1978 graduates. Measures were administered in large group sessions. Data Source, Attrition, and Participation Alverno College is a women’s college. At the time of the study, the students were predominantly Caucasian and from one midwest state. Many were first-generation college students. The college traditionally has served working class students from a large urban area, and has not been highly selective. At the first assessment, all of the women who entered in 1976 and 1977 were recruited as participants. To be considered eligible for recruitment for later longitudinal assessments, the students had to be currently enrolled and to have completed the prior assessments on at least a subset of the inventories. The weekend college attracted older students in increased numbers. Longitudinal data analyzed for this study include all students who participated on all three occasions for a particular instrument. Between 83% to 99% of the students participated at each assessment for at least a subset of the inventories. The longitudinal data pool, n = 208, included both traditional-age (17–19 years; n = 108) and older students (20–55 years; n = 100). The graduating class used as a cross-sectional comparison group also included both traditional age (n = 45) and older students (n = 15). Since not all participants completed all of the inventories, the number of observations per inventory varies. The proportion of older and younger participants remains about the same. The lowest number of replicated observations across three assessments occurred for the Analysis of Argument “Attack” essay, n = 133, and “Defense” essay, n = 130. This is accounted for by a delayed decision to include the inventory in the study, which resulted in a reduced number of observations at time 1 for this inventory. For the other inventories reported here, between 181 and 194 participants completed all three assessments for the particular inventory. The vast majority of the participants completed all of the instruments. For the purposes of obtaining internal reliability estimates only, the Analysis of Argument scores were used for all those who completed the Analysis of Argument inventory at the time of assessment, even if they did not complete all three assessments. Thus, between 183 and 189 participants were available for these time 2 and time 3 reliability analyses. Main Analyses Employed Raw change over occasions of assessment was studied with multivariate analysis of variance for repeated measures and unequal n’s in an Age X Time factorial design, with the repeated measures on the Time of assessment. In this analysis, both the linear and the quadratic effects of Time were tested with orthogonal polynomial contrasts. Weights were used to adjust for the unequal lengths of Time between the intervals. The association of the critical thinking measures with rate of progress in the curriculum was measured with correlational analyses. Alpha coefficients were computed to determine the internal reliability of measures at each time of assessment. The amount of variance accounted for by scores from the previous administration of each measure was computed to determine test-retest
  • 9. Longitudinal Assessment of Critical Thinking in College Page 4 predictability. Mentkowski and Strait (1983) have previously reported data analyses of this data set, which are sometimes referenced here. The present analyses extend the previous analyses by exploring the Age X Time interaction directly, by focusing entirely upon unadjusted scores, by exploring the internal reliability of the summated measures, and by analyzing the component criteria for the Test of Thematic Analysis and Analysis of Argument measures. RESULTS AND CONCLUSIONS Reliability Analysis of Argument post-test scores showed no predictability from Analysis of Argument pre-test scores. The inter-item reliability analysis of the “Attack” essay’s five scoring criteria (one was dropped because of multicollinearity) yielded unacceptably low reliability coefficients at times two and three (see Table 1), even though at least 184 students participated at these administrations. Although we believe researchers need to accept lower reliability estimates for production measures, the coefficients for the “Attack” essay’s scoring criteria at time two and time three do not inspire confidence in the current measurement. Since there appeared to be sufficient variability in the scores for each scoring criterion, we surmise that our relatively low internal reliability coefficients might be improved by increasing the number of scoring criteria. Alternatively, a consistently unitary construct may not yet underlie the six scoring criteria for the “Attack” essay. The internal reliability of the “Defense” essay’s four scoring criteria was relatively high at each time of assessment (see Table 1). The Test of Cognitive Development showed good internal reliability for the five item scale (see Table 1). In addition, pretest scores for the Test of Cognitive Development also accounted for a high proportion of the variance in the Test of Cognitive Development post-test scores. Averaging across the percent of variance accounted for by time one in predicting time two and by time two in predicting time three, 29.87 of the variance (R Square) was statistically predictable. The Test of Thematic Analysis appeared less internally reliable. The low internal reliability may be partially due to the lack of variance in some of the nine scoring criteria, which were dichotomously scored. The reliability coefficients for the scale were improved by removing just one scoring criterion, “making exceptions or qualifications.” Another scoring criterion was removed because of its poor correlation with the remaining seven criteria and its limited variance. This negatively scored criterion, “Affect,” is scored if the participant makes a comparison that is based upon her emotional reaction to the story. In order to prevent inappropriate extreme scores, two scoring criteria that yielded scores with an extremely restricted variance were also removed, leaving 5 of 9 of the Test of Thematic Analysis scoring criteria for a summated scale. These procedures yielded somewhat improved internal reliability coefficients (see Table 1). Appendix A documents the scoring criteria included and excluded from this exploratory 5 of 9 summated criteria-measure for the Test of Thematic Analysis. Although internal reliability coefficients were not computed for the Critical Thinking Appraisal subscales, the proportion of variance accounted for by pre-test scores for each interval confirms the test-retest
  • 10. Longitudinal Assessment of Critical Thinking in College Page 5 reliability of these subscales. Averaged across the two intervals, pretest scores account for 26.3% of the Inference post-test scores, for 19.17 of the Recognition of Assumptions post-test scores, and for 31.97 of the Deduction post-test scores. Although the relatively high predictability of the post-test scores from the pretest scores supports the reliability of this set of Critical Thinking Appraisal measures, the lower predictability of the post-test scores for the Test of Thematic of Analysis may be only due to individual differences in response to educational or other influences. A Face Validity Issue A qualitative examination of the “Defense” essays written by the students suggested that the students were role playing the emotional and faulty arguments they were asked to defend. This face validity concern is consistent with the scoring of the “Defense” essays. At each time of assessment, the scoring of the “Defense” essays according to the four scoring-criteria yielded very little variance among students. In the case of the first assessment, for example, 937 “totally” endorsed the presented argument (which was probably counter to their attitude); 917 presented new arguments supporting the counter-attitudinal statement; 47 made a modified endorsement of it; and 47 accepted a particular part of it. In other words, in the “Defense” essay, almost all of the entering freshman did exactly opposite of what would yield a high score in the standard scoring. Our qualitative observation that the students were role playing the bad arguments they were asked to defend is also consistent with the lack of instructions to dissuade against role playing in the Analysis of Argument inventory. We also observe that our students encounter numerous role playing exercises in their education. Trends through Time Our concerns about the face validity of the “Defense” essays and about the internal reliability of the “Attack” essay scoring criteria are supported by the previously conducted analyses on this data set (see Mentkowski and Strait, 1983). We know from these analyses that neither the “Attack” essay’s summated measure nor the “Defense” essay’s summated measure yielded any statistically significant differences across the times of assessment. Mentkowski and Strait (1983) also found that these summated measures were not related to measures of progress in the college’s curriculum. The lack of internal reliability for the “Attack” essay’s scoring criteria suggested an exploratory analysis with the individual criteria as the unit of analysis. None of these individual scoring criteria yield statistically significant differences for the Time of the assessment, however. Given the exploratory nature of this research, the Test of Thematic Analysis was analyzed in more than one way. Not only was a total score for the Test of Thematic Analysis included (in this case, the 5 of 9 criteria summated measure), but also, three of the scoring criteria derived from this measure were separately analyzed. Thus, analysis of variance was also performed on (1) the criterion of making exceptions or qualifications to one’s ascriptions, (2) the criterion of giving examples for observations, and (3) the negatively scored criterion of affective reaction. It should be noted that the examples criterion is a component of the Test of Thematic Analysis summated measure (5 of 9 criteria). 
As is noted below, analysis of this summated measure and analysis of its component scoring criterion,
  • 11. Longitudinal Assessment of Critical Thinking in College Page 6 examples, yield very similar results. This is probably largely because the scores yielded by the examples scoring criterion actually contribute the bulk of the variance to the summated score. Thus, like the summated scale, the examples scoring criterion does not correlate positively with the exceptions criterion at any of the times of measurement. Again, the direction of association is always negative, even though, in this case, it is not statistically significant. The Test of Thematic Analysis scoring criterion for affect-based comparisons was scored for only 16% of the essay comparisons. Because of low variance, a Friedman’s distribution-free one-way analysis of variance procedure for time of assessment was again conducted. It showed no statistically significant difference, Chi Square (2, 194), F < 1. The Age X Time repeated measures analysis of variance procedure for unequal n’s and unequal intervals, which tested for the linear and quadratic effects of Time, were performed on the following seven measurements: the Test of Cognitive Development summated measure, the Test of Thematic Analysis summated measure (5 of 9 criteria), the examples and exceptions scoring criteria for the Test of Thematic Analysis, and the three subscales of the Critical Thinking Appraisal. This subset of seven measurements, which includes both summated scales and single item scores, yielded many effects. The Age X Time analyses yielded main effects for Age on 4 of 7 of these measurements (see Table 2). They also yielded main effects for linear Time on all seven of these measurements. A main effect for quadratic Time was revealed on the Test of Thematic Analysis (5 of 9 criteria) and the Recognition of Assumption subscale of the Critical Thinking Appraisal (see Table 2). The lack of any interaction effects between Age and Time (all F values less than 1) suggests that these main effects of linear Time, and quadratic Time are relatively general for both age groups. First, looking at those measures not associated with Age effects, we see that scores on the Test of Cognitive Development and the scores on the Deduction subscale of the Critical Thinking Appraisal improve linearly with Time (see Table 3). Recall that the analysis of variance procedure (see Table 2) adjusted for unequal interval lengths. Next, looking at those measures with Age main effects (see Table 4), older age students show a fairly constant advantage on these measures at each time of assessment. There is no statistically significant interaction between Age and Time for scores yielded by the exceptions scoring criterion despite the apparent convergence of the means at time three for the two age groups. The quadratic effect of Time for the Test of Thematic Analysis summated score (5 of 9) is reflected in a drop on this measure at time three (see Table 4), which is confirmed by a posteriori analyses. There is also a linear decrease with Time (see Table 2). The examples scoring criterion, which is a component of this summated measure, likewise shows a linear (see Table 2) decrease with Time (see Table 4). In contrast, the exceptions scoring criterion shows a linear (see Table 2) increase with Time (see Table 4). Thus, by exploring the scoring criterion components of the Test of Thematic Analysis, we have found trends in opposite directions.
  • 12. Longitudinal Assessment of Critical Thinking in College Page 7 The Inference subscale of the Critical Thinking Appraisal shows a linear (see Table 2) increase with Time (see Table 4). Like the linear increases with Time for the Deduction subscale (see Table 3), these increases occur generally across our two age cohorts. The Recognition of Assumptions subscale shows both a linear and quadratic effect (see Table 2). Examination of Table 4 shows an increase on the Recognition of Assumptions subscale at the second interval, which is confirmed by a posteriori analyses. Relation of Measures to Curricular Progress Are these critical thinking and exploratory single item measures that have shown change as a function of time of assessment sensitive to curricular impact? In preparation for examining the relation between curriculum progress and performance on the critical thinking measures, we tabulated the measures of curriculum progress according to two separate cumulative subtotals: one subtotal for the first two years since enrollment and the other subtotal for the subsequent year and a half. The one curriculum progress subtotal corresponds to the interval of time between the first and second fieldings of the critical thinking measures, and the other curriculum progress subtotal corresponds to the interval of time between the second and third fieldings of the critical thinking measures. We reasoned that students showing a high degree of curriculum progress in the intervals of the study would at the end of the intervals show higher performance on the critical thinking measures. Thus, progress in the curriculum up to the second assessment was correlated with the critical thinking measures at the second assessment, and progress in the curriculum between the second and third assessments was correlated with the critical thinking measures at the third assessment. The measures of curriculum progress employed are conceptually well suited for showing the kind of student development relevant to growth of critical thinking. The curriculum of Alverno College is abilities-based; On over 100 assessments Alverno students demonstrate to criteria each of eight broad abilities set by the faculty (eg. communication, analysis, problem solving, and valuing). These assessments lead to the credentialing of the student on the sequential and developmentally arranged “competence level units.” The number of competence level units completed and the number of course credits completed were employed as two separate measures of progress in the curriculum. Examination of Table 5 shows that the Test of Cognitive Development scores at time two are positively correlated with cumulative progress in the curriculum. At time three, the Test of Cognitive Development scores do not correlate with the intervening progress in the curriculum (see Table 5). The Test of Thematic Analysis showed a somewhat complex relation to progress in the curriculum. The Test of Thematic Analysis summated scale (5 of 9 criteria) was negatively associated with curriculum progress at both the second and third assessments. While the examples scoring criterion performs similarly to the summated scale, of which it is a component, the exceptions scoring criterion yields different results. Thus, the examples criterion is negatively associated with curriculum progress at the second assessment, and, for at least one of the
  • 13. Longitudinal Assessment of Critical Thinking in College Page 8 measures of curriculum progress, is also negatively associated with curriculum progress at the third assessment (see Table 5). Again in contrast, the exceptions criterion shows a small positive relation to progress in the curriculum at the third assessment (see Table 5). For the three Critical Thinking Appraisal subscales only one of the twelve correlations with progress in the curriculum is statistically significant (see Table 5). The Test of Cognitive Development and the Test of Thematic Analysis and its single item subcomponents were related to progress in the curriculum, but the Critical Thinking Appraisal measures generally were not. Why not? Level of Thinking, Mastery of That Level, and Habituality of Use We believe the lack of relationship of the Critical thinking Appraisal with the measures of curriculum progress flows from the multiple choice format of the Critical Thinking Appraisal. Before we clarify why we suspect this, we need to distinguish between some broad performance characteristics. From a developmental perspective we might describe one performance as demonstrating a higher level of thinking than another. Higher levels of thinking are perhaps more complex, broader in scope, and so on. Also, a sophisticated thought process could be based upon the coordination of several thought processes, and might be predicated upon the development of other thought processes. We can conceptually distinguish the sophistication or level of a thought process from other performance characteristics. An individual can be said to have mastered a particular thinking process when, at a certain frequency, they can demonstrate the process of thinking when it is explicitly requested. But, even if an individual has mastered a level or kind of thinking, they might not habitually employ it. For example, they may not be called on to use it, they may have opposing tendencies, or they may not know it is expected of them. The distinctions between sophistication of a thinking process, mastery of a thinking process, and habitual use of a thinking process may help explain why some Critical Thinking measures that showed development during the college years also showed a relationship to curriculum progress, while others did not. Because of the recognition character of the Critical Thinking Appraisal, we do not feel confident in it as a measure of well-practiced mastery. College students can probably recognize higher levels of thinking than they can systematically produce. Performance on the Critical Thinking Appraisal may have more to do with the breadth of the exposure to the thinking processes the test taps, than with a well-practiced mastery of the thinking processes. Thus, although we are able to show gains on all of the Critical Thinking Appraisal subscales, these robust gains may indicate that it is relatively easy to show progress on these recognition based measures. Even those students progressing slowly in the curriculum may have a breadth of exposure to the thinking processes required by the Critical Thinking Appraisal. Indeed, the breadth of exposure that slower progressing students have compared to that of faster progressing students may be of equal value in recognizing the “correct” answers on the Critical Thinking Appraisal. This assumption would help
  • 14. Longitudinal Assessment of Critical Thinking in College Page 9 account for the finding that rate of progression in the curriculum is not generally related to subsequent performance on the Critical Thinking Appraisal subscales. Nonetheless, we must note that the Recognition of Assumptions subscale gains were most strongly associated with the second interval in the study. This suggests that a sophisticated critical thinking process was being measured. And, theoretically, the sophistication required to recognize assumptions of arguments does seem high enough to account for delayed development in the curriculum. If a later curriculum exposure is responsible for the delayed development on the Recognition of Assumptions subscale, it would seem that this argues for expecting to find a relationship with curriculum progress. But, could it be that both the slower and faster progressing students were both far enough along in the curriculum to be exposed sufficiently to this thinking process as it was required by the recognition test? We think so, and can note that the variance in the rate of progression of our students was not inordinately large. In explaining the relationship of the Test of Thematic Analysis to progress in the curriculum, we conversely focus on its characteristic of being a production measure. We believe these curriculum-linked changes on the Test of Thematic Analysis and its single item criterion-scores are consistent with an interpretation that focuses on the assumed relationship between production measures and the measurement of either broadly based mastery or habituality. More specifically, we suspect that mastery of a way of thinking or habituality in a way of thinking develops incrementally and would not be sensitive to minimal curriculum exposure. As a result, a range of progress in the curriculum would be likely to be associated with differential performance on a production measure, which is what we found. The response format of the Test of Cognitive Development is not purely either recognition or production, and so it is inappropriate to speculate on how the results for this measure reflect upon our thesis that production measures capture more habitual tendencies or more well-practiced abilities. We do, however, gain greater confidence in this measure as a measure of college outcomes because of its relation to progress in the curriculum. Furthermore, the historical development of this measure in the Research on Formal Operations does suggest the instrument as a measure of sophisticated and well- practiced thinking. Inter-Relation of Critical Thinking Measures/Background Variables More generally, we find other evidence that these various instruments are measuring different aspects of higher-order thinking. First, the Test of Thematic Analysis (the purest production measure) is related to Age, but the Test of Cognitive Development is not. Second, the Test of Cognitive Development and the Critical Thinking Appraisal subscales are related to high school GPA at entrance, but the Test of Thematic Analysis is not. Finally, although these measures are correlated with one another positively (see Table 6), previously reported factor analyses (see Mentkowski & Strait, 1983) supports distinctions between the summated measures of critical thinking for the first two assessments.
  • 15. Longitudinal Assessment of Critical Thinking in College Page 10 Comparison with Cross-Sectional Results We are compelled to offer another word of caution. Our own cross-sectional analyses would have led to different conclusions than our longitudinal analyses. The cross-sectional analyses, which controlled for attrition, showed a “gain” only on the “Defense” score for the Analysis of Argument instrument, F(1,125) - 7.26, p<.01, and the Inference subscale of the Critical Thinking Appraisal, F(1, 127) - 12.75, p<.001. Moreover, because the cross-sectional comparison involves the comparison of existing groups (which differed on high school GPA, for example), even these “gains” would have been potentially spurious. When we did control statistically for high school GPA in our cross-sectional comparison, the cross- sectional Inference “gains” were no longer statistically significant. We place greater faith in our longitudinal changes, of course. We point up this discrepancy between the longitudinal and cross- sectional results, because we believe it is an object lesson in the problems of interpretation from cross- sectional findings, even though each method has its place. Interpreting the New Production Measures The task of more specifically interpreting changes on Test of Thematic Analysis remains. We believe the decline on the Test of Thematic Analysis summated measure (5 of 9 criteria) can be attributed to the decline on the examples criterion. We interpret the tendency of students to give less examples, but to make more exceptions, as a positive outcome. The increasing tendency of students to make exceptions in their analysis of the essays suggests that the thinking of the students may be becoming more abstract. The decreasing tendency of the students to give examples of their abstractions seems somewhat puzzling at first. In order to explain this, we make the following observations. Faculty at Alverno College are pioneers in the writing across the curriculum movement. They explicitly teach students to write for a particular audience, and they have developed standards of performance on this dimension. Since the format of the Test of Thematic Analysis suggests that researchers are the audience, we imagine them increasingly reasoning as follows as they repeatedly encounter the Test of Thematic Analysis: “These researchers assuredly are aware of what the four sentence essays say. I don’t need to give examples from these essays to make my points, because they already know the examples. They would be bored by the obviousness of the examples, which would be about as long as the set of essays.” We believe styles of writing developed across school programs, as well as across disciplines, may vary. Thus, we encourage a complex interpretation of the decline and gain on the Test of Thematic Analysis scores. Future modifications of the instrument should consider, at a minimum, providing directions which identify a more explicit audience Although the findings for the Analysis of Argument instrument cast some doubt upon its suitability for use at this college as an outcomes measure, we believe modifications in the instrument may overcome these difficulties.
  • 16. Longitudinal Assessment of Critical Thinking in College Page 11 In regard to the instructions for the “Defense” essay, if they advised against role playing, this might improve the “Defense” essay’s measurement properties. In effect, students are penalized for role playing “unsophisticated” thinking, and for empathizing with the perspective of the essay. In regard to the “Attack” essay’s instructions, we note that they do not specifically request students to analyze the adequacy of the arguments in the stimulus essay. Instead the instructions ask the students to “argue against” the stimulus essay. The stimulus essay was an emotional one-sided essay that presented an unsupported position likely to be counter to the attitudes of the students. In their argument against this stimulus essay, the students in the present study predominantly tended to present the opposing position, which would be their own opinion, as opposed to attacking the logic of the essay. The emotional one-sided stimulus essay may have elicited an emotional need to present the opposing position. Thus, with respect to the lack of change on the Analysis of Argument “Attack” scores, one might speculate that these students have not specifically and habitually learned to give precedence to a logical analysis of the flaws in their opponent’s arguments; instead, their predominant tendency may have been to give a positive statement of the opposite position. If the instructions had specifically requested an analysis of the merits and flaws of the stimulus essay, we may have found developmental differences on this measure. We are proposing a distinction here between the ability to write, when requested, an intellectual critique in an emotional environment versus the habitual tendency to write a purely intellectual critique in an emotional environment. Implications for Standardization of Measures We note that Winter, McClelland, and Stewart (1981) reported both longitudinal and cross-sectional gains for the Analysis of Argument and Test of Thematic Analysis at “Ivy” College. These findings contrast somewhat with our findings at Alverno College, even though we used the same scoring procedures. We tend to discount the cross-sectional gains for Analysis of Argument we found because of our lack of longitudinal results, while we feel encouraged by our perhaps unique bi-directional longitudinal changes on the single criterion-scores of the Test of Thematic Analysis. Although we would like to have been able to demonstrate the robustness of the standard scoring and administration procedures developed at “Ivy” college, we have not been able to show cross-college equivalence for these measures. The likelihood that different educational strategies are used in different colleges has wide implications for using scoring schemes that are assumed to be standard, and requires further consideration. For instance, colleges may differ in how writing is taught and in how often role playing is pedagogically used, and these and other stylistic differences in instruction may cause students to interpret differently the instructions on these production measures. This confounding may greatly complicate the cross- institutional measurement properties of these measures. Not only may differing educational contexts elicit different interpretations of how to perform well on an instrument, but also these different interpretations may be equally reasonable if the instrument itself does not communicate to the student the preferred type of performance.
It is also possible that these differing educational strategies reflect differing conceptions of critical thinking outcomes. For example, at Alverno, role playing is often explicitly used to encourage students to take, tolerantly, the perspective of another person or culture. Students are encouraged to consider the cultural or personal assumptions they make. In contrast, the scoring procedures standardized by "Ivy" College for the Analysis of Argument "Defense" essay are based upon another conception. The ideal student portrayed in these scoring criteria will insist upon maintaining a position consistent with her own, even as she obligingly defends the good points present in an opponent's position. There is no necessary recognition of the cultural supports for one's own beliefs or of the perhaps equal validity of another's views. Another example of possibly different conceptions of critical thinking can be illustrated with the scoring of the "Attack" essay. The "Ivy" College scoring procedures reflect a preference for a logical analysis of the stimulus essay. The present college, however, encourages students to express their own values and positions as well as encouraging logical analysis. If either the conception of critical thinking differs across colleges, or the students' interpretation, shaped by their educational context, differs, then even the direction of "gain" or "decline" on a criterion may differ. This concern suggests that a single college population, however reputable, may not serve well as the major source for standardizing scoring criteria.

In many instances, it would be desirable to ensure that these production measures elicit similar interpretations of the task by students from different colleges. Here a distinction needs to be made between the respondent's ability to conform to a standard and the respondent's predominant tendencies in unstructured situations. We do not rule out the development of several standardized scoring systems, or even the particularization of scoring systems to a college's conception of critical thinking. What we earnestly recommend for production measures of ability, however, is that they offer the respondent a clear idea of the standards that will be applied to their productions, if the researcher intends to represent their performance as their ability to conform to a standard. A more unstructured stimulus may be used when the researcher is interested in the predominant tendencies of the respondents. In such cases, the researcher needs to be especially sensitive to the distinction between necessary habits supporting an ability and stylistic divergence. We feel that the Analysis of Argument and the Test of Thematic Analysis do elicit habitual tendencies, but are not yet sensitive to stylistic divergence. Their instructions draw forth the respondent's habitual tendencies, but the standardized scoring reflects a preferred response that does not accommodate alternative modes and styles of critical thinking. In general, we recommend fielding both types of production measures: those measuring the ability to demonstrate mastery to standards and those measuring habitual tendencies. We also recommend keeping these two types of measures separate and distinct; otherwise, it may not be possible to unconfound them.
Educational/Scientific Importance of the Study

These results encourage researchers to continue developing measures that ask students to show the process of their thinking in addition to showing comprehension of knowledge, concepts, or generalizations. Critical thinking appears best understood as comprising several dimensions, which at this college develop differently across the college years. Researchers developing production measures need to distinguish in their measurement between habitual tendencies and the ability to show mastery to a standard when requested. At present, the Analysis of Argument and the Test of Thematic Analysis are similar to projective measures in that they do not specifically guide responses and, as a result, tend to measure habitual tendencies.

Our results suggest that these instruments need further refinement if they are to be used as standardized cross-college outcome measures; we were not able to show their scoring criteria to be internally reliable. We remain confident in the potential usefulness of the Test of Thematic Analysis and the Analysis of Argument to a wide variety of colleges once they have undergone further development, either to tailor them to an institution's own educational goals and definitions or to improve their generic cross-institutional equivalence. Even at this early stage of development, we have found the findings from the instruments useful in suggesting the habitual tendencies of our students. The Test of Thematic Analysis appeared sensitive to changes in their habits of thought.

We urge educators to support researchers developing production measures and not to rely entirely on traditional, though efficient, measures that may not work well as measures of well-practiced mastery. Production measures seem better suited to measuring the kind of mastery obtained through the consistent practice of higher-order thought, and the kind of habituation of thought processes that would lead to their use across situations. We have suggested that the Recognition of Assumptions subscale may tap a relatively sophisticated thought process, but we also note that some sophisticated thought processes, for example those used in the integration of perspectives, may require a production measure. We encourage researchers to broaden their scoring schemes to include the kind of critical thinking practiced by adults already in the working world (Arlin, 1975; McClelland, 1973).

Educators should be encouraged by our conclusion that critical thinking develops in college. The Alverno students not only showed gains on the Piagetian-based Test of Cognitive Development, but their performance on this measure has also been linked with their prior progress in the curriculum (cf. Mentkowski & Strait, 1983). These students also showed gains on each of the three subscales of the Critical Thinking Appraisal that we fielded, although this finding is tempered somewhat by our inability to show a direct relationship between performance on this measure and progress in the curriculum. Performance on the Test of Thematic Analysis did appear to be linked to progress in the curriculum. In this regard, the Alverno students may have developed the habit of making more exceptions or qualifications in their analyses. Although they may also have weakened in their habit of giving examples in their analyses, we suspect this finding is context bound.
So far, we are able to support the usefulness of only two of the three new measures of critical thinking. We are cautious in interpreting these study results. For example, although this study generally confirmed the usefulness of the Test of Thematic Analysis, a comparative cross-sectional study of seven colleges was able to demonstrate change on the Test of Thematic Analysis at only two of the seven: "Ivy" College, a famous and highly selective institution, and Alverno College (see Winter, McClelland, & Stewart, 1981). Why? We have already noted that the instrument's criteria and instructions were standardized at a single institution, "Ivy" College. We also note that the findings for Alverno reported in that study, which were based upon a cross-sectional comparison of entering versus graduating students, have not been confirmed by our longitudinal results. But even if the summated results from "Ivy" College and the results from the present analysis of the scoring criteria for the Test of Thematic Analysis are taken as suggestive of possible differences, we have yet to demonstrate robust cross-college findings.

We note that at Alverno College, faculty have identified critical thinking abilities as college outcomes and have developed teaching strategies and instruments to teach and assess them. Perhaps colleges cannot expect change on production measures without such an explicit curriculum or without highly selective student bodies. If so, researchers and educators must work together on both instrument and curriculum development at a range of colleges.
REFERENCES

Arlin, P. (1975). Cognitive development in adulthood: A fifth stage? Developmental Psychology, 11(5), 602-606.

Ewell, P. (1984). The self-regarding institution: Information for excellence. Boulder, CO: National Center for Higher Education Management Systems.

Frederiksen, N. (1984). The real test bias: Influences of testing on teaching and learning. Paper presented at a conference on Teaching Thinking Skills, Wingspread Conference Center, Racine, WI.

Lipman, M. (1984). Philosophy and the cultivation of reasoning. Paper presented at a conference on Teaching Thinking Skills, Wingspread Conference Center, Racine, WI.

Marchese, T. (1985). Learning about assessment. AAHE Bulletin, 38(1), 10-13.

McClelland, D. (1973). Testing for competence rather than for "intelligence." American Psychologist, 28, 1-14.

Mentkowski, M., & Doherty, A. (1984). Abilities that last a lifetime: Outcomes of the Alverno experience. AAHE Bulletin, 36(6), 1-6, 11-14.

Mentkowski, M., & Doherty, A. (1983, revised 1984). Careering after college: Establishing the validity of abilities learned in college for later careering and professional performance. Final report to the National Institute of Education: Overview and summary. Milwaukee, WI: Alverno Productions.

Mentkowski, M., & Strait, M. (1983). A longitudinal study of student change in cognitive development, learning styles, and generic abilities in an outcome-centered liberal arts curriculum. Final report to the National Institute of Education, research report number six. Milwaukee, WI: Alverno Productions.

Nickerson, R. (1984). Teaching thinking: What is being done and with what results? Cambridge, MA: Bolt Beranek and Newman, Inc.

Paul, R. (1984). The concept of critical thinking: An analysis, a global strategy, and plea for emancipatory reason. Rohnert Park, CA: Sonoma State University Center for Critical Thinking and Moral Critique.

Renner, J., Fuller, R., Lockhead, J., Johns, J., Tomlinson-Keasey, C., & Campbell, T. (1976). Test of Cognitive Development. Norman, OK: University of Oklahoma.

Sigel, I. (1984). Reflection on thinking about thinking: The educational discovery of the 80's? Paper presented at a conference on Teaching Thinking Skills, Wingspread Conference Center, Racine, WI.

Sternberg, R. (1983). How can we teach intelligence? Philadelphia, PA: Research for Better Schools, Inc.

Stewart, A. (1977a). Analysis of argument: An empirically derived measure of intellectual flexibility. Boston: McBer and Company.

Stewart, A. (1977b). Scoring manual for stages of psychological adaptation to the environment. Unpublished manuscript, Department of Psychology, Boston University.

Watson, G., & Glaser, E. (1964). Critical Thinking Appraisal. New York: Harcourt, Brace, Jovanovich.
Winter, D. (1976). The Test of Thematic Analysis. Boston: McBer and Company.

Winter, D., & McClelland, D. (1978). Thematic analysis: An empirically derived measure of the effects of liberal arts education. Journal of Educational Psychology, 70, 8-16.

Winter, D., McClelland, D., & Stewart, A. (1981). A new case for the liberal arts: Assessing institutional goals and student development. San Francisco: Jossey-Bass.
Table 1: Inter-Item Reliability, Cronbach's Alpha, for Each Time of Assessment¹

Measure                                        Time 1   Time 2   Time 3
Test of Cognitive Development                    NA       .47²     .54
Test of Thematic Analysis (all 9 criteria)      .28       .11      .19
Test of Thematic Analysis (5 of 9 criteria)     .36       .17      .35
Analysis of Argument, Attack items              .33       .17      .09
Analysis of Argument, Defense items             .79       .46      .75

¹ For the reliability analyses, each scoring criterion was used as an item. For the Test of Cognitive Development, 5 scores were used as items. For the Analysis of Argument "Attack" essay, 5 scoring criteria were also available. For the Analysis of Argument "Defense" essay, 4 criteria were coded, but only 3 were used in the reliability analysis because of multicollinearity.
NA: The reliability coefficient is not available because only total scores were keypunched.
² At Time 2, the Test of Cognitive Development reliability coefficient is based only on the 1977 cohort, because only the total score for the 1976 cohort was keypunched.
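For readers who wish to reproduce coefficients of the kind shown in Table 1, the following is a minimal sketch of Cronbach's alpha with each scoring criterion treated as an item, as in our analyses. The data below are simulated, not the study's.

```python
# Minimal sketch of Cronbach's alpha as reported in Table 1, where each
# scoring criterion is treated as one item.
import numpy as np

def cronbach_alpha(items):
    """items: array of shape (n_respondents, k_items).
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - item_variances.sum() / total_variance)

rng = np.random.default_rng(0)
# e.g., 5 criteria scored 0, 1, or 2 for each of 50 respondents (simulated)
criteria_scores = rng.integers(0, 3, size=(50, 5))
print(round(cronbach_alpha(criteria_scores), 2))
```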
Table 2: Age by Time Repeated Measures ANOVAs for Linear and Quadratic Contrasts¹

Measure                                               Age Main Effect      Linear Time Main Effect   Quadratic Time Main Effect
Test of Cognitive Development                         F(1,189) < 1         F(1,189) = 14.7***        F(1,189) = 1.7
Test of Thematic Analysis (5 of 9 criteria)           F(1,192) = 22.4***   F(1,192) = 3.9*           F(1,192) = 5.1*
Test of Thematic Analysis, exception                  F(1,192) = 3.6       F(1,192) = 14.0***        F(1,192) < 1
Test of Thematic Analysis, example                    F(1,192) = 16.4***   F(1,192) = 16.6***        F(1,192) < 1
Inference Subscale of Critical Thinking Appraisal     F(1,180) = 6.6*      F(1,180) = 19.1***        F(1,180) < 1
Recognition of Assumptions Subscale of Critical
  Thinking Appraisal                                  F(1,180) = 4.9*      F(1,180) = 4.1*           F(1,180) = 6.8*
Deduction Subscale of Critical Thinking Appraisal     F(1,179) < 1         F(1,179) = 19.6***        F(1,179) < 1

¹ Multivariate analyses of variance testing the combined effects of linear and quadratic time were statistically significant for all reported main effects. Both linear and quadratic tests of the Age by Time interaction failed to reach statistical significance (all F values less than 1).
* p < .05   ** p < .01   *** p < .001
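The linear and quadratic contrasts reported in Table 2 can be illustrated with a small sketch. With three times of assessment, the orthogonal polynomial weights are (-1, 0, 1) for the linear trend and (1, -2, 1) for the quadratic trend; each student's contrast score may then be tested against zero for a time main effect. The data below are simulated, and the simple one-sample test ignores the age factor, so this approximates rather than reproduces the full Age by Time analysis.

```python
# Sketch of the within-subject polynomial contrasts behind Table 2.
# scores: (n_students, 3) array of one measure at Times 1-3 (simulated).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.normal(loc=[11.5, 12.2, 12.4], scale=2.0, size=(190, 3))

linear = scores @ np.array([-1.0, 0.0, 1.0])     # per-student linear trend
quadratic = scores @ np.array([1.0, -2.0, 1.0])  # per-student quadratic trend

# Time main effect: is the mean trend different from zero? (F = t squared)
for name, contrast in (("linear", linear), ("quadratic", quadratic)):
    t, p = stats.ttest_1samp(contrast, 0.0)
    print(f"{name} time effect: F(1,{contrast.size - 1}) = {t**2:.1f}, p = {p:.4f}")
```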
Table 3: Means for Linear Time Main Effect, Ignoring Age

Measure                                             Time 1   Time 2   Time 3
Test of Cognitive Development                        11.45    12.24    12.37
Deduction Subscale of Critical Thinking Appraisal    16.10    16.64    17.16
Table 4: Means for Time of Assessment, Broken Down for the Age Main Effect

Measure                                               Time 1   Time 2   Time 3
Test of Thematic Analysis (5 of 9 criteria)
  Age 17 to 19                                         1.09     1.19      .91
  Age 20 to 55                                         1.56     1.57     1.32
Test of Thematic Analysis, exception
  Age 17 to 19                                          .25      .36      .48
  Age 20 to 55                                          .37      .48      .49
Test of Thematic Analysis, example
  Age 17 to 19                                          .32      .22      .16
  Age 20 to 55                                          .50      .41      .29
Inference Subscale of Critical Thinking Appraisal
  Age 17 to 19                                         8.97     9.56     9.58
  Age 20 to 55                                         9.87    10.45    10.93
Recognition of Assumptions Subscale of Critical Thinking Appraisal
  Age 17 to 19                                        10.96    10.52    11.18
  Age 20 to 55                                        11.26    11.35    12.01
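Cell means of the kind reported in Tables 3 and 4 are straightforward to compute from a long-format file with one row per student per time of assessment. The sketch below is hypothetical in its file and column names.

```python
# Sketch of the Table 4 breakdown: mean scores at each assessment for
# the two age groups. File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("longitudinal_long.csv")  # one row per student per time
df["age_group"] = pd.cut(df["age_at_entry"], bins=[16, 19, 55],
                         labels=["17 to 19", "20 to 55"])
table4 = df.pivot_table(index="age_group", columns="time",
                        values="inference", aggfunc="mean", observed=True)
print(table4.round(2))
```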
Table 5: Correlation of Critical Thinking Measures with Progress in the Curriculum, on Competence Level Units (CLUs)¹ and on Credits Achieved

Measure                                               Time 2 CLUs   Time 2 Credits   Time 3 CLUs   Time 3 Credits
Test of Cognitive Development                            .21**          .15*            .03            .08
Test of Thematic Analysis (5 of 9 criteria)             -.20**         -.31***         -.17**         -.29***
Test of Thematic Analysis, exception                    -.04           -.09             .10            .13*
Test of Thematic Analysis, example                      -.21**         -.29***         -.10           -.25***
Inference Subscale of Critical Thinking Appraisal        .05           -.07            -.08           -.15*
Recognition of Assumptions Subscale of Critical
  Thinking Appraisal                                     .09           -.03            -.11           -.07
Deduction Subscale of Critical Thinking Appraisal        .04            .06            -.11           -.02

¹ Alverno students demonstrate each of 8 broad abilities set by the faculty (e.g., communication, analysis, problem solving) to criteria on over 100 assessments. It is these assessments that lead to crediting the student with the sequentially and developmentally arranged "competence level units."
* p < .05   ** p < .01   *** p < .001
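A sketch of the correlational analyses in Table 5 (and, with different columns, Table 6) follows; the file and column names are hypothetical.

```python
# Sketch of the Table 5 correlations: a critical thinking score against
# progress in the curriculum. Columns ("thematic_5of9", "clu", "credits")
# are hypothetical placeholders for the study's variables.
import pandas as pd
from scipy import stats

df = pd.read_csv("time2.csv")  # hypothetical file: one row per student at Time 2

for progress in ("clu", "credits"):
    r, p = stats.pearsonr(df["thematic_5of9"], df[progress])
    print(f"Thematic Analysis (5 of 9) vs {progress}: r = {r:+.2f}, p = {p:.3f}")
```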
Table 6: Correlations between the Critical Thinking Measures at Time 1

                                               Test of       CTA          CTA              CTA
                                               Cognitive     Inference    Recognition of   Deduction
Measure                                        Development   Subscale     Assumptions      Subscale
Test of Cognitive Development                                  .28***       .21**            .35***
Test of Thematic Analysis (5 of 9 criteria)      .38***        .20**        .16*             .29***
Test of Thematic Analysis, exception            -.05           .10          .17**            .06
Test of Thematic Analysis, example               .24**         .08         -.01              .17*
Inference Subscale of CTA                                                   .29***           .29***
Recognition of Assumptions Subscale of CTA                                                   .38***
Deduction Subscale of CTA

Note: CTA = Critical Thinking Appraisal.
* p < .05   ** p < .01   *** p < .001
Appendix A: Test of Thematic Analysis Criteria Included Versus Not Included

Criteria included in the 5-of-9 total score:
  Making "Direct Compound Comparisons" (positively scored)
  Giving "Examples" (positively scored)
  Using an "Analytic Hierarchy" (positively scored)
  "Redefinition" for Scope or Clarity (positively scored)
  Comparing "Apples and Oranges" (negatively scored)

Criteria excluded from the 5-of-9 total score:
  Making "Exceptions" or "Qualifications" (positively scored)
  "Subsuming Alternatives" (positively scored)
  "Affect" (negatively scored)
  "Subjective Reaction" (negatively scored)