Argumentative Writing And Academic Achievement A Longitudinal Study

Argumentative writing and academic achievement: A longitudinal study
David D. Preiss a,
⁎, Juan Carlos Castillo a
, Elena L. Grigorenko b
, Jorge Manzi a
a
Escuela de Psicología, MIDE UC Measurement Center, Pontificia Universidad Católica de Chile, Chile
b
Child Study Center, Department of Epidemiology and Public Health, Department of Psychology, Yale University, USA
a b s t r a c t
a r t i c l e i n f o
Article history:
Received 15 March 2012
Received in revised form 31 October 2012
Accepted 22 December 2012
Keywords:
Writing
Writing assessment
University admissions
High-stakes testing
Academic achievement
Capitalizing on the implementation of a writing assessment initiative implemented at a major Chilean uni-
versity, we test how predictive writing is of subsequent academic achievement. First, using a multilevel an-
alytic approach (n=2597), the study shows that, after controlling for socio-demographic variables and the
university admission tests, writing skills significantly predict first-year university grades. Second, using infor-
mation about the performance of students during their first eight semesters in the university (n=1616), a
longitudinal hierarchical analysis showed that writing remains a significant predictor of university grades
over time, also after controlling socio-demographic variables and university admissions tests. Moreover, lan-
guage skills retain or improve their predictive role over time, whereas mathematics skills seem to decrease in
their importance. Our results show that writing, and the cognitive skills involved in writing, play a critical
role in advanced stages of academic training, consequently offering additional support for the consideration
of this ability for university admission purposes.
© 2013 Elsevier Inc. All rights reserved.
1. Introduction
Currently there is a growing interest in the inclusion of alternative
assessments to the conventional ones used in university admission
processes (Atkinson & Geiser, 2009; Kyllonen, 2012; Manzi, Flotts, &
Preiss, 2012; Stemler, 2012; Sternberg, 2004, 2010). In this context,
writing assessment has taken a prominent place as an additional
high stakes measure. Part of this interest originates in a concern for
the quality of writing high-school graduates and university level
students are capable of producing (Kellogg & Raulerson, 2007;
Kellogg & Whiteford, 2009; Lee & Stankov, 2012; Manzi et al., 2012;
The National Commission on Writing in America's schools and
colleges, 2003). Thus, several North American university admission
tests such as the GRE, SAT and ACT1
incorporated writing measures
(Jeffery, 2009; Norris, Oppler, Kuang, Day, & Adams, 2004). The ana-
lytical part of the GRE was transformed in a writing assessment that
evaluates both critical thinking and analytical writing skills; the SAT
has a section assessing writing by means of a short essay; the ACT
has an optional section assessing students' level of understanding of
the conventions of standard written English as well as their ability
to produce text.
In Chile, where this study was carried out, high stakes writing as-
sessment has never been implemented. Yet a growing awareness of
entry-level university students' deficits in writing has prompted
higher education institutions to develop writing assessment initia-
tives, although they do not play a role in admissions yet. Specifically,
in 2003, the Pontificia Universidad Católica de Chile developed a test
of argumentative writing: The Writing Communication Test (WCT)
(Manzi et al., 2012). The WCT differs from the abovementioned
American examinations in two critical dimensions. On the one hand,
it is a test that is taken once the student has been accepted to his or
her undergraduate programs —.i.e., the test does not play a role in
admission decisions. Yet, it is still a high-stakes test since taking and
passing the test is a graduation requirement. Indeed, the score
obtained in the test is used as a criterion to determine whether the
student must or must not attend additional classes to improve his
or her writing communication skills (hereafter, WCS). On the other
hand, the test uses an analytic rubric that assigns independent scores
for several dimensions of the essay. In the study reported in this
paper, we evaluate how predictive WCT scores are of subsequent
academic achievement during students' undergraduate education.
Large-scale assessment of WCS has many educational advantages.
First, it has an impact on teaching and learning, as educational sys-
tems are more likely to teach what is finally assessed (Grigorenko,
Jarvin, Nui, & Preiss, 2008; Sternberg, 2010). Second, it informs public
Learning and Individual Differences 28 (2013) 204–211
⁎ Corresponding author at: Escuela de Psicología, Pontificia Universidad Católica de
Chile, Av. Vicuña Mackenna - 4860. Macul, Santiago 7820436, Chile. Tel.: +56 2
3544605; fax: +56 2 3544844.
E-mail addresses: davidpreiss@uc.cl, daviddpreiss@gmail.com (D.D. Preiss).
1
The GRE (Graduate Record Examinations) is a standardized test created and ad-
ministered by the Educational Testing Service (ETS). It has three subtests in verbal rea-
soning, quantitative reasoning and analytical writing. The GRE is required as an
admission requirement for graduate or business school in the United States and in
some other English speaking countries. The SAT is a standardized assessment of critical
reading, mathematical reasoning, and writing skills administered by the College Board
and used for college placement in the USA. The ACT is a curriculum- and standards-
based standardized test administered by ACT Inc., which works as an alternative to
the SAT. It has subtests in English, Mathematics, Reading, and Science Reasoning as
well as an optional writing test.
1041-6080/$ – see front matter © 2013 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.lindif.2012.12.013
Contents lists available at ScienceDirect
Learning and Individual Differences
journal homepage: www.elsevier.com/locate/lindif

policy and decision-making processes targeting the development of
writing within the educational system. Moreover, writing assessment
has a broad appeal because it has some attributes that make it distin-
guishable from other assessment tools (O'Neill, Moore, & Huot, 2009;
Powers, Fowles, & Willard, 1994). In fact, writing tests are commonly
labeled as direct writing assessment (hereafter, DWA) since the skills
that are the target of measurement are assessed directly. In contrast
to multiple-choice tests, which measure latent constructs, DWA
does not measure a latent ability. Provided that measurement stan-
dards are established precisely, writing abilities can be assessed in a
straightforward way.
DWA involves relevant assessment challenges, which include
those related to the generation of writing prompts, the definition of
the construct, and the rating process (Manzi et al., 2012). Because
topic knowledge affects text quality (McCutchen, Teske, & Bankston,
2008), a bad choice of a thematic prompt may bias the measurement
process by giving a relative advantage to a group more versed on the
topic, independently of its writing abilities. In addition to different
topics, when generating the writing prompt, test designers may opt
between different genres: the narrative, descriptive, argumentative,
and expository genres are those most commonly mentioned. The
option for a specific genre has an impact on writing performance
and the scoring process (Beck & Jeffery, 2007; Kellogg & Whiteford,
2012; Lee & Stankov, 2012). Writing argumentative and expository
text is more cognitively demanding than writing narrative and
descriptive text (Weigle, 2002). Because of its relevance in academic
discourse, the argumentative genre has been one of the most favored
genres in writing assessment. Additionally, benchmarks to assess the
written products must be aligned to the genre demanded by the test
prompts. This is not always the case. Beck and Jeffery (2007) assessed
the genre demands related to writing examinations existing in the
three most populated American states and found that the bench-
marks used were not aligned to the type of genre stimulated by the
prompts and, consequently, the examinations faced several validity
issues. Last but not least, there are challenges related to the definition
of the construct, which materialize at the moment of setting up the
rubrics used to assess the writing samples.
The composition literature distinguishes three types of rating
scales: primary trait scales, holistic scales and analytic scales (Lee &
Stankov, 2012; Weigle, 2002). The option for any of these procedures
involves an implicit definition of what quality writing is. Primary trait
assessment involves the identification of one or more primary traits
relevant for a specific writing task and related to its purpose, assign-
ment and audience. Holistic assessment is based on the overall
impression the rater has about the written product. This impression
is based on a scoring rubric that is complemented by benchmarks.
This scoring strategy guides the writing assessment made for the
National Assessment of Educational Progress (Lee & Stankov, 2012).
Although holistic scoring is a practical option, it does not allow diag-
nosing strengths and weaknesses in writing (Weigle, 2002). More de-
tailed information about writing is provided by analytical assessment.
It involves assessing different features relevant for good writing.
Some analytical scales weight these attributes, so certain attributes
are considered more relevant (e.g., the global organization of the
text) and have a larger weight in the final score than others (e.g., or-
thography). Besides its utility in providing a more detailed profile of
students' writing, analytical scoring is more instrumental in rater
training as it provides inexperienced raters with more specific guide-
lines about the assessed construct (Weigle, 2002).
The use of writing measures has allowed researchers to start
assessing how predictive writing is of subsequent academic success.
Recent studies show that the ability to produce good argumentative
text is the best predictor of academic success during the first year of
the university (Geiser & Studley, 2002; Kobrin, Patterson, Shaw,
Mattern, & Barbuti, 2008). Specifically, a study which explored discrep-
ant SAT Critical Reading and Writing Scores found, after controlling for
student characteristics and prior academic performance, that students
who had relatively higher writing scores, as compared with their critical
reading scores, obtained higher grades in their first year of college and
in their first-year English course (Shaw, Mattern, & Patterson, 2011).
In Chile, the only evidence available concerning the predictive value of
writing assessments is related to the test described here. Specifically,
for the 2008 round of assessment, WCT performance was positively cor-
related to academic achievement in most undergraduate programs at
the Pontificia Universidad Catolica de Chile with an average correlation
of .12 (Manzi et al., 2012). Studies assessing how predictive writing is
beyond the first year at the university are scarce. Here, capitalizing on
the implementation of the Chilean assessment, we intend to address
that issue by assessing how predictive WCT scores are of academic
achievement during the students' subsequent eight semesters of uni-
versity study.
2. Methods
2.1. Sample
Data from the Pontificia Universidad Católica de Chile's 2007 freshmen
cohort of students were analyzed in this study. Students entering the
university graduated from three different types of high schools: some of
them graduated from schools entirely funded by the State (public
schools), some others graduated from schools that are privately managed
but receive public funding and, in many cases, charge the parents an
additional fee (voucher schools), and some others from entirely private
schools. In Chile, the school of origin is a proxy of the socioeconomic back-
ground of the families: students from more affluent families go to private
institutions whereas those from disadvantaged backgrounds go to public
institutions (Drago & Paredes, 2011; Elacqua, Schneider, & Buckley, 2006).
Although voucher schools as a whole recruit a relatively diverse pool of
students, there are significant socioeconomic gaps between voucher
schools as they select their students depending upon their educational
and financial goals (Mizala & Torche, 2012). A large part of the students
participating in the study were undergoing professional training as in
the Chilean university system the majority of the students enroll in
professional programs directly, that is, professional training is part of
the core of their undergraduate curriculum. Table 3 summarizes the back-
ground information for the sample.
The students were automatically registered to take the WCT when
enrolling for their courses and were recruited by their respective
academic units. Students were expected to take the test during their
first year, although they were given the opportunity of taking the
test two more times during their studies. From the 3760 students
enrolled in 2007, 2879 (76.56%) took the test during their first year.
Missing data were handled in the analysis by means of listwise dele-
tion. For the cross sectional multilevel estimation, which was made
on all the students having data available at the end of their first
year of studies, the final sample was 2597. For the longitudinal esti-
mation, which was made on all the students enrolled in programs
that had at least 15 students who had completed their fourth year
of studies by the second term of 2010, the number of participants
was 1616.
2.2. Measures and procedures
2.2.1. The Written Communication Test
The WCT presented the students with three topics and asked them
to produce a 2-page essay on a theme of their preference, among
three possible alternatives. These themes were related to issues of
general interest, excluding themes related to specific disciplines to
avoid possible biases. The topics were presented as an opinion that
could be challenged or defended by the student. An example of the
proposed themes is the following: Some people think that freedom of
speech, or of the press, is an essential value that is severally damaged
205
D.D. Preiss et al. / Learning and Individual Differences 28 (2013) 204–211

when the rights to privacy of those affected by the dissemination of infor-
mation to the public are accorded a higher importance than freedom of
speech.
The specific guidelines students received to write the essay indicat-
ed the different dimensions of writing that were going to be taken into
account in the evaluation process. These aspects were the development
of an argument line that includes a thesis, arguments that sustain it and
a conclusion. Additionally, students were required to include at least
one alternative point of view or counterargument. In addition to call
students' attention to formal aspects such as orthography, use of ac-
cents and punctuation, as well as vocabulary use, the guidelines explic-
itly asked the students to organize the exposition in such a way that a
reader could follow a clear and coherent argument line.
An analytic rubric was used for grading the essays. The rubric
distinguished five performance levels in the different dimensions
that were object of assessment. Some of the assessed dimensions
were related to formal aspects of written discourse whereas others
were related to content, quality and coherence of argumentative rea-
soning. Specifically, the rubric considered the following dimensions
(the first five are related to the formal aspects of written discourse):
• Orthography: good use of Spanish basic orthographic rules (literal
spelling, punctuation and accents).
• Vocabulary: amplitude, precision and appropriateness to academic use.
• Text structure: presence of introduction, development and conclusion.
• Textual cohesion: adequate use of phrase grammar and connectors.
• Use of paragraphs: sentence use and inclusion of main ideas in each
one of the paragraphs.
• Argument quality: global coherence of arguments, that is, variety and
quality of arguments as they relate to a thesis.
• Thesis: presence of an explicit thesis about the topic.
• Counterarguing: Coherence of arguments based on presentation of one
or more counterarguments.
• Global assessment: general assessment of content and text quality by
the rater, who gives his or her overall appreciation of the text, after
grading all the other dimensions.
In each scoring dimension, five performance levels were distin-
guished: insufficient performance, limited performance, acceptable
performance, good performance, and outstanding performance. To
illustrate these performance levels, we present the rubric used for
the dimension on use of paragraphs:
• Insufficient performance. The text does not present clearly identifiable
paragraphs. Or: Most of the paragraphs are not well implemented as
they reiterate the same idea of the previous paragraph or advance
more than one core idea.
• Limited performance. The text presents clearly identifiable paragraphs;
however, more than one paragraph can either reiterate the same idea of
the previous paragraph or advance more than one core idea.
• Acceptable performance. The text presents clearly identifiable
paragraphs; however, one paragraph can either reiterate the same
idea of the previous paragraph or advance more than one core idea.
• Good performance: The text presents clearly identifiable paragraphs;
each paragraph identifies a core idea that is different from the core
idea advanced in the previous paragraph.
• Outstanding performance: The text presents clearly identifiable par-
agraphs; each paragraph identifies a core idea that is different from
the core idea advanced in the previous paragraph. Additionally,
there is evidence of progression from one paragraph to the next.
A professional team including 26 specialists in Spanish language
trained in the use of the rubric rated the essays. These raters were or-
ganized in three groups and each one of them was in charge of a su-
pervisor whose main task was to maintain the raters calibrated. Use
of benchmarks helped to calibrate raters by illustrating the way rubrics
had to be applied. At the beginning of the rating process, a number of
essays were scored by all raters to check the level of consistency and
consensus they reach. After this stage, one rater scored all essays. In
order to assess the level of agreement between raters, 20% of the essays
were assigned to a second rater. In all those cases involving double
rating, the final score was the average between raters. As shown in
Table 1, 95% of discrepancies were below 1 point, which was
established as the threshold for agreement/disagreement. In a small
proportion of essays (32 cases), the discrepancy between raters
exceeded that threshold. In those cases the supervisor of the rating
process proceeded to rate the essay and his or her score was consid-
ered as the final score. The final score for each essay was computed
as the mean for each one of the nine dimensions assessed in the rubric.
2.2.2. PSU
The PSU is the Chilean university entrance examination, mandatory
for all applicants interested in admission to universities receiving public
funding. The examination includes a Language and Communication as
well as a Mathematics test. The PSU in Language and Communication
measures three competencies: the competency to extract information,
the competency to interpret explicit and implicit information and the
competency to assess explicit and implicit information. The topics
considered include knowledge of Spanish language, knowledge of liter-
ature, and knowledge of mass media and communication. The PSU in
Mathematics measures three sets of cognitive skills: recognition of
facts, terms and procedures; understanding of information in a mathe-
matical context, application of mathematical knowledge to known and
unknown situations, and analyzing, synthesizing and evaluating math-
ematical relations in problem solving. The topics considered include
high-school level number and proportionality, algebra and functions,
geometry and probability and statistics (DEMRE, 2012).
2.3. Data analysis
For reference, tables with descriptive statistics and intercorrela-
tions of the eight WCT dimensions and the raters' global assessment
are included in the Appendices. The raters' global assessment had
significant positive correlations (pb0.01) with all the WCT dimensions.
The highest correlation coefficients observed for the raters' global assess-
ment are those it has with the two dimensions related with argumenta-
tive writing (Argument quality, r=.59; and Counterarguing, r=.49).
The smallest correlations observed for the raters' global assessment are
those it has with the two dimensions related with the formal aspects of
writing (Orthography, r=.20; and Vocabulary, r=.16). The intercorrela-
tions between the WCT dimensions are all significant but for Vocabulary,
which has non-significant correlations with Text structure, Argument
quality, Thesis and Counterarguing. For the WCT dimensions, the
smallest significant correlation is that between Orthography and Thesis
(r=0.05) and the highest is that between Argument quality and Thesis
(r=0.45). Most of the other correlations are in the .1–.3 range
The WCT average scores were merged with a database providing
background information of the students, their scores in the national
university admission test (PSU), their high school grades as well as
their GPAs for each academic semester until 2010. Thus, each student
had information for eight GPAs (two per year from 2007 to 2010).
Chilean GPAs range from 1.0 to 7.0, 7.0 being the highest grade. De-
scriptive statistics for these variables are summarized in Table 2.
Table 1
Distribution of the discrepancy levels in the Writing Communication Test final scores
between raters.
Discrepancy Percentage of cases Accumulated percentage
0 to 0.5 points 65.47% 65.47%
0.6 to 1.0 points 28.98% 94.46%
1.1 to 1.5 points 4.74% 99.19%
More than 1.5 points 0.82% 100.00%
Data are based in double rated essays only (N=866).
206 D.D. Preiss et al. / Learning and Individual Differences 28 (2013) 204–211

The association of the independent variables with the GPAs was
estimated in a multilevel modeling framework. The academic pro-
grams were defined as level two units, since it was plausible that
they might have an influence in the variations of the GPAs and, there-
fore, violate the ordinary regression models' independence assump-
tion. The variables corresponding to the individual background
information (level one) were sex and parents' educational level, as
well as the administrative dependency of the high school (School Ad-
ministration), which as mentioned above in Chile is a proxy of
socio-economic status. As level two variables we included the aca-
demic programs' average PSU score, which was a proxy for selectivity
of the programs, and the domain of academic program, which we
classified in seven different groups, as described in Table 3.
3. Results
We first present a cross-sectional analysis focused on the first year
GPAs. After, we analyze the whole range of GPAs for the 2007 cohort
from 2007 to 2010 in a longitudinal modeling framework. Table 4
shows the correlations between the individual level variables to be
considered in the explanatory models. WCT scores have a significant
positive but moderate association with GPAs, although they are as
well moderately associated with other predictors such as PSU
(Language) scores, High School Grades (NEM), Parents' Education and
Sex, which could reduce its influence when considered simultaneously.
3.1. Multilevel estimation
Next, using a multilevel modeling framework, we assess whether
the WCT score is related to GPA after controlling for other academic
predictors and socio-demographic variables. Furthermore, we account
for possible differences in GPA according to academic programs, since
it is plausible that the distribution of grades is not uniform across
them. Thus, we present the results of a series of multilevel models
for the 2007 GPAs: student data are the first level and the academic
program variables are the second level. The intra-class correlation
coefficient obtained for the model without predictors indicates the
percentage of variance of the GPAs associated with the academic pro-
grams. For the 2007 GPAs, this coefficient reaches 43%, supporting the
assumption that there are considerable differences among academic
programs in terms of their grade distribution.
Table 5 shows the result of the estimation of the fixed and random
effect component for several models for the 2007 GPAs. The models
differ in the number and type of predictors included. Model I uses
only WCT as a predictor, revealing a significant relationship between
WCT and GPAs. The following models test the effect of WCT on GPAs
after controlling for other academic and socio-demographic variables.
In Model II we control for PSU Language, High School Grades, Parents'
Education, School Administration and Sex. PSU Language and High
School Grades have significant effects, and we observe that with
these variables in the model, the parameter associated with the
WCT is lower than in Model I but still significant. Interestingly, School
Administration, a variable with clear socioeconomic implications in
the context of the Chilean school system, does not show significant
effects in Model II. In Model III we were also interested in the interac-
tions of WCT and PSU Language admission scores with Sex, but these
interactions were not significant. Model IV added level two predic-
tors, namely the domain of the academic program and the average ac-
ademic programs' admission test score as a proxy for selectivity. The
results revealed that, in addition to the previous effects, academic
programs show significant differences in their grades: education pro-
grams have the best GPAs and engineering has the worst (when com-
pared with Social Sciences as a reference category). Considered
together, level two predictors account for almost half of the between
academic programs' variance (R2
=49%). WCT, PSU Language score
and High School Grades were significant in all four models. School
Administration is significant in Models III and IV (with students com-
ing from voucher schools showing lower GPAs than their public
school classmates).
Models V, VI and VII follow the same structure of models II, III and
IV, but now PSU Mathematics replaces PSU Language as an admission
score predictor. Results are very similar for the main variables. WCT,
PSU Mathematics and High School Grades are significant predictors
in all cases. School Administration is now significant in all cases
(with students coming from voucher schools showing lower GPAs
than their public school classmates). The other interesting difference
is that in model VI, where interactions are tested, a significant Sex by
PSU Mathematics interaction was found, which shows that PSU Math-
ematics underpredicts females' achievement in GPAs. In sum, the re-
sults of cross sectional multilevel modeling reveals that WCT is a
significant predictor of first year grades regardless of the variables
controlled in the analyses. Next, we will test whether the WCT re-
mains significant when we look into GPAs beyond first year.
3.2. Longitudinal analysis
In this part of the analyses we focused our attention on the influ-
ence of WCT across time, testing whether WCT remains a significant
predictor as students advance in their undergraduate programs. In
this analysis, we used the GPAs at each semester as the dependent
variables (8 points were considered, using data from the first semes-
ter of 2007 through the second semester of 2010). For this longitudi-
nal analysis we also applied a multilevel perspective (Singer &
Willett, 2003), whereby the GPAs constitute the first level units that
are nested in individuals (level two) that at the same time are nested
in academic programs (level three). The cases selected for the
Table 3
Background variables.
Values N
Parents' highest
educational level
0. Less than university education 651
1. University education 2102
Sex 0. Male 1404
1. Female 1475
School administration 0. Public 390
1. Prívate 1820
2. Voucher private 656
PSU average
Academic area 0. Social sciences 429 687.17
1. Engineering 732 734.19
2. Education 202 623.21
3. Sciences 628 681.26
4. Art 193 692.88
5. Health 228 725.70
6. Law and humanities 467 691.83
Note: The numbers of subjects in each variable were obtained with pairwise deletion
between the independent variables and WCT score.
Table 2
Written Communication Test (WCT) and students' academic achievement indicators.
Variable
name
Description Min Max Mean SD
WCT Written Communication Test
score
1.00 4.89 3.29 0.48
PSU Language University admission language
test score
483.00 850.00 692.00 62.94
PSU Math University admission
mathematics test score
428.00 850.00 704.00 72.71
PSU Average score in PSU language
and PSU mathematics
520.00 838.00 697.00 53.19
NEM High school grades converted
into admission scores
435.00 826.00 694.00 53.19
207

analysis were academic programs that had at least 15 students who
had completed the second term of 2010 (1616 students belonging
to 26 academic programs).
Table 6 presents the results of the estimation. In the analyses we
followed a similar plan as the one used with the cross-sectional
case. Starting with WCT as the only predictor in Model I, we added
Table 5
Multilevel regression models of grade point average (GPA) 2007 on individual and academic programs' variables.
I II III IV V VI VII
Fixed effects
Level 1 — indiv.
WCT 0.16⁎⁎ 0.08⁎⁎ 0.08⁎ 0.08⁎⁎ 0.12⁎⁎ 0.14⁎⁎ 0.12⁎⁎
(6.42) (3.24) (2.33) (3.23) (5.27) (4.52) (5.27)
PSU Language 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎
(8.31) (6.64) (8.39)
PSU Math. 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎
(13.62) (13.77) (13.82)
NEM 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎
(15.17) (15.14) (15.28) (15.59) (15.68) (15.75)
Parents' highest educ. level 0.05 0.05 0.05 0.04 0.04 0.05
(1.60) (1.60) (1.64) (1.48) (1.49) (1.51)
School administration
(ref: public)
Private 0.04 0.07 0.07 0.03 0.02 0.03
(1.12) (1.75) (1.78) (0.70) (0.68) (0.70)
Voucher private −0.07 −0.11⁎⁎ −0.11⁎⁎ −0.08⁎ −0.08⁎ −0.08⁎
(−1.95) (2.85) (2.90) (2.10) (2.19) (2.14)
Sex (female) −0.03 −0.03 −0.06⁎ −0.01 1.07⁎⁎ −0.01
(−1.08) (0.10) (2.28) (0.21) (3.90) (0.32)
Sex×PSU Lang. −0.00
(0.09)
Sex×PSU Math. −0.00⁎⁎
(3.97)
Sex×WCT −0.00 −0.04
(0.03) (0.92)
(Intercept) 4.45⁎⁎ 1.28⁎⁎ 1.27⁎⁎ 2.06 −0.08 −0.62⁎ 2.41
(36.68) (5.90) (5.25) (1.43) (0.31) (2.14) (1.55)
Level 2 — ac. prog.
PSU average −0.00 −0.00
(0.57) (1.58)
Academic area
(ref: social sciences)
Engineering −0.51 −0.75⁎⁎
(1.94) (2.65)
Education 1.00⁎⁎ 0.88⁎⁎
(3.19) (2.63)
Sciences −0.16 −0.39
(0.83) (1.86)
Art 0.04 0.03
(0.18) (0.12)
Health 0.39 0.24
(1.49) (0.84)
Law and humanities 0.02 0.09
(0.07) (0.39)
Random effects
Variance between 0.25 0.25 0.25 0.12 0.34 0.33 0.14
Variance within 0.34 0.30 0.30 0.30 0.28 0.28 0.28
Log likelihood −2338 −2168 −2169 −2158 −2114 −2105 −2100
Maximum likelihood estimation, unstandardized coefﬁcients, t values in parentheses. Intra class correlation of null model: 0.43. N level 1=2597, N level 2=31.
⁎ pb0.05.
⁎⁎ pb0.01.
Table 4
Correlation matrix of grade point average with admission scores and sociodemographic variables.
1 2 3 4 5 6 7 8
1. GPA –
2. WCT 0.13⁎⁎ –
3. PSU (average) 0.09⁎⁎ 0.17⁎⁎ –
4. PSU-Language 0.18⁎⁎ 0.28⁎⁎ 0.74⁎⁎ –
5. PSU-Math −0.03 0.01 0.79⁎⁎ 0.22⁎⁎ –
6. NEM 0.20⁎⁎ 0.14⁎⁎ 0.43⁎⁎ 0.35⁎⁎ 0.37⁎⁎ –
7. Parents' ed. 0.11⁎⁎,a
0.12⁎⁎,a
0.44⁎⁎,a
0.35⁎⁎,a
0.35⁎⁎a
0.20⁎⁎,a
–
8. Sex (female) 0.22⁎⁎,a
0.12⁎⁎,a
−0.29⁎⁎,a
−0.01a
−0.41⁎⁎,a
0.08⁎⁎,a
0.03b
–
N=2877.
a
Point-biserial correlations.
b
Tetrachoric correlations.
⁎ pb0.05.
⁎⁎ pb0.01.

the same individual and academic program level variables in the fol-
lowing models. In model I, we observed that WCT was a significant
predictor of GPAs, as it was after one year of university studies. In
Model II we controlled for PSU Language, High School Grades,
Parents' Education, School Administration and Sex. PSU Language
and High School Grades presented a highly significant effect. At the
same time, WCT remained significant. Unlike the cross-sectional anal-
ysis focused on the first year GPAs, the longitudinal analysis show
that having a parent with university education does make a difference
in student performance. This effect remains significant throughout
the models. The inclusion of the academic programs in Model III indi-
cates that university grades are significantly different among academ-
ic areas, with engineering having the lowest grades, and education
the highest ones. In Model IV we were interested in modeling the
effect of time by estimating whether the predictive capacity of WCT,
PSU Language and High School Grades changes as students advance
in their academic programs. Thus, a time variable was introduced:
the semester, coded from 1 (first semester 2007) to 8 (second semes-
ter 2010). Further, the slope of the time variable was allowed to vary
(random slopes) and an interaction term with time was included for
each one of the main predictors: WCT, PSU Language and High School
Grades. The results of model IV indicate that WCT did not interact
with the semesters, indicating that the significant role of WCT does
not change over time. However, PSU Language shows a significant
positive interaction with semester, indicating that this test increases
its predictive capacity over time. Models V, VI and VII replicate the
Table 6
Longitudinal multilevel regression models.
I II III IV V VI VII
Fixed effects
Level 1 — time
Semester −0.07⁎⁎ 0.06⁎⁎
(2.94) (2.72)
(Intercept) 4.78⁎⁎ 2.15⁎⁎ 1.90⁎⁎ 2.19⁎⁎ 2.10⁎⁎ 2.11⁎⁎ 1.93⁎⁎
(49.71) (11.92) (3.38) (3.96) (9.92) (3.63) (3.39)
Level 2 — indiv.
WCT 0.16⁎⁎ 0.09⁎⁎ 0.09⁎⁎ 0.07⁎⁎ 0.12⁎⁎ 0.12⁎⁎ 0.08⁎⁎
(7.31) (4.49) (4.46) (2.78) (6.01) (5.99) (3.18)
PSU Language 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎
(6.90) (6.94) (3.60)
PSU Math. 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎
(5.26) (5.63) (7.09)
NEM 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎ 0.00⁎⁎
(15.28) (15.58) (16.09) (15.43) (15.76) (16.34)
School admin. (ref: pub.)
Private 0.07⁎ 0.07⁎ 0.06 0.06 0.06 0.05
(2.15) (2.11) (1.90) (1.94) (1.87) (1.57)
Voucher private −0.04 −0.05 −0.06 −0.04 −0.04 −0.05
(1.29) (1.39) (1.70) (1.07) (1.14) (1.43)
Parents highest educ. 0.06⁎ 0.06⁎ 0.06⁎ 0.06⁎ 0.06⁎ 0.06⁎
(2.12) (2.19) (2.19) (2.15) (2.17) (2.12)
Sex −0.01 −0.02 −0.03 0.01 0.01 −0.01
(0.45) (0.70) (1.42) (0.51) (0.33) (0.37)
Lev. 3 — acad. pr.
PSU average 0.00 0.00 −0.00 −0.00
(0.45) (0.21) (0.03) (0.51)
Academic area (ref: social sciences)
Engineering −0.53⁎⁎ −0.50⁎⁎ −0.64⁎⁎ −0.62⁎⁎
(5.20) (5.04) (6.08) (6.05)
Education 0.61⁎⁎ 0.68⁎⁎ 0.55⁎⁎ 0.62⁎⁎
(5.07) (5.78) (4.40) (5.13)
Sciences −0.25⁎⁎ −0.21⁎ −0.34⁎⁎ −0.32⁎⁎
(2.74) (2.44) (3.71) (3.52)
Art −0.01 0.00 −0.02 −0.01
(0.07) (0.04) (0.17) (0.06)
Health −0.06 0.00 −0.12 −0.06
(0.53) (0.01) (1.13) (0.61)
Law and humanities 0.08 0.09 0.12 0.14
(0.89) (1.04) (1.26) (1.48)
Interactions
WCT×semester 0.01 0.01⁎⁎
(1.15) (2.64)
PSU Lang.×semester 0.00⁎⁎
(4.40)
PSU Math.×semester −0.00⁎⁎
(2.76)
Random effects
Var between lev 3 0.09 0.08 0.01 0.01 0.10 0.02 0.01
Var between lev 2 0.15 0.12 0.12 0.00 0.12 0.12 0.00
Var within 0.16 0.16 0.16 0.16 0.16 0.16 0.15
Sem. random slope 0.00 0.00
Cov. slope intercept −0.01 −0.01
Log likelihood −8214 −8056 −8036 −7202 −8066 −8044 −7212
Maximum likelihood estimation, unstandardized coefficients, t values in parenthesesIntraclass correlation of null model: 0.38 (level 2), 0.23 (level 3). N level 1=12,808; N level 2=1616;
N level 3=26.
⁎ pb0.05.
⁎⁎ pb0.01.
209

structure of models II, III and IV using now PSU Mathematics instead
of PSU Language. The effects are similar in magnitude and direction
when compared with the models with PSU Language. The main
difference appears in model VII when testing interactions, because
we now found that WCT significantly interacts with time, showing
that the role of WCT on grades improves over time. In contrast, the
interaction of PSU Mathematics with time indicates that the predic-
tive validity of this test decreases as students advance in their
programs.
In summary, the longitudinal analyses indicate that WCT remains
a significant predictor of university grades over time, even after con-
trolling for a number of individual and academic level variables.
Moreover, the pattern of interactions with time is relevant. Compar-
ing models IV and VII, we observed that WCT did not change its pre-
dictive role when coupled with PSU Language, but it did improve that
role when combined with PSU Mathematics. This seems to indicate
that language skills (as measured by PSU Language and WCT) retain
or improve their predictive role over time, whereas mathematics
skills seem to decrease in their importance over time. This observa-
tion probably reflects the increasing importance of language skills
(reading and writing) in the advanced semesters of the academic
programs.
4. Discussion and conclusions
This paper presents empirical evidence regarding the importance
of writing skills for academic success in university level studies.
First, this study shows that writing skills significantly predict first
year university grades. Moreover, writing remains a significant pre-
dictor even after controlling for socioeconomic background variables,
which in Chile are strongly correlated with educational outcome vari-
ables (Garcia-Huidobro, 2000; Manzi & Preiss, in press). When the
analysis added the factors currently used for university admission
purposes in Chile (standardized scores in a mandatory university en-
trance test and high school grades; i.e. PSU scores), writing was still a
significant predictor of first year university grades. This evidence is
consistent with the predictive power of writing in the US context
(Kobrin et al., 2008). Second, this study generated new information
with regard to the importance of writing over the course of university
training. Writing remains a significant predictor of university grades
longitudinally, after controlling for the scores in the university entry
examinations, high school grades, and background variables such as
the different undergraduate programs, the type of school of origin,
and the students' parents' education. Specifically, the longitudinal
hierarchical analysis showed that writing remains a significant
predictor beyond the first year, and even more when its impact is
complemented with the scores in the university entry examination
in mathematics instead of the scores in the university entry examina-
tion in language. That is, as students progress in their undergraduate
programs, their language abilities gain in predictive capacity in con-
trast with their mathematical abilities. The differentials in prediction
between these two sets of academic skills require further investiga-
tion. Of particular relevance is whether these differences are specific
to Chile or characteristic of the relevance that writing acquires at
most advanced levels of university training.
The study has a number of limitations. One of them relates to sam-
ple attrition. As noted in the description of the sample, not all the
students had data for the eight semesters of the study. This may
have been caused for a number of reasons such as student desertion,
temporal suspension of studies, or students shifting to other pro-
grams inside or outside the university. We believed that because of
the lack of studies assessing the impact of writing developmentally
at the university, it was first necessary to understand its impact on
students progressing normally in their programs. Consequently, we
focused our models on those students progressing in their undergrad-
uate programs in a timely fashion. Future models should take into
consideration the way interruptions in the studies impact this devel-
opment and address the issue of the impact of writing on students'
failure. In order to do so, alternative models such as survival analysis
should be adopted. Last but not least, the data presented here were
collected from only one cohort of students studying at only one highly
selective university. So the degree of generalizability of these findings
could be limited. The study should be replicated in a number of new
cohorts and in different institutions recruiting a more diverse sample
of students. Before concluding, it is worth noting that perhaps the
WCT works best under precisely the conditions and stakes in which
it is used, that is, as a condition of graduating and a diagnostic for
receiving support. Changing the measure to a high-stakes entry
requirement will require a new empirical study of the utility and pre-
diction of the measure because of its impact on the selection process
of the students.
In spite of these limitations, we believe that this work expands the
literature on argumentative writing and educational assessment.
Based on the results summarized above, we believe that this study
presents empirical evidence favoring the use of writing measures as
a graduation requirement. In addition, although these results cannot
be directly generalized to a situation where writing is used as a
high-stakes entry requirement, we believe that our results show
that writing, and the higher order cognitive abilities involved in effec-
tive writing, play a critical role in advanced stages of academic train-
ing, consequently offering additional support for the consideration of
this ability for university admission purposes.
Acknowledgments
This study was supported by Grant 1100580 from FONDECYT-
CONICYT. The authors would like to thank the Pontificia Universidad
Catolica de Chile's Vicerrectoría Académica, which provided academic
support to this study.
Appendix 1. Descriptive statistics of WCT dimensions
Appendix 2. Correlation matrix among WCT dimensions
Variable Mean Std. dev. Min Max
1. Orthography 2.39 1.17 1 5
2. Vocabulary 2.77 0.90 1 5
3. Text structure 4.05 0.73 1 5
4. Textual Cohesion 3.66 0.87 1 5
5. Use of paragraphs 3.36 0.79 1 5
6. Argument quality 3.47 0.82 1 5
7. Thesis 4.02 0.86 1 5
8. Counterarguing 3.03 1.15 1 5
9. Global assessment 2.89 0.74 1 5
N=2,877.
1 2 3 4 5 6 7 8
1. Orthography –
2. Vocabulary 0.20** –
3. Text structure 0.07** 0.03 –
4. Textual cohesion 0.25** 0.22** 0.15** –
5. Use of paragraphs 0.14** 0.14** 0.26** 0.20** –
6. Argument quality 0.10** 0.01 0.36** 0.16** 0.16** –
7. Thesis 0.05** 0.00 0.42** 0.10** 0.13** 0.45** –
8. Counterarguing 0.10** 0.03 0.26** 0.09** 0.12** 0.35** 0.33** –
9. Global assessment 0.20** 0.16** 0.42** 0.25** 0.27** 0.59** 0.42** 0.49**
N=2,877, * pb0.05, ** pb0.01.

References
Atkinson, R. C., & Geiser, S. (2009). Reflections on a century of college admissions tests.
Educational Researcher, 38, 665–676.
Beck, S. W., & Jeffery, J. V. (2007). Genres of high-stakes writing assessments and the
construct of writing competence. Assessing Writing, 12, 60–79.
DEMRE (2012). Temario de Pruebas de Selección Universitaria-Procesos de Admisión 2012
[Contents of Tests for University Admissions—Admission Process 2012]. Retrieved
January 30th, 2011, from http://www.demre.cl/temario.htm
Drago, J. L., & Paredes, R. D. (2011). La brecha de calidad en la educación chilena [The
educational gap in Chilean education]. Revista CEPAL, 104, 167–180.
Elacqua, G., Schneider, M., & Buckley, J. (2006). School choice in Chile: Is it class or the
classroom? Journal of Policy Analysis and Management, 25, 577–601.
Garcia-Huidobro, J. E. (2000). Educational policies and equity in Chile. In F. Reimers
(Ed.), Unequal schools, unequal chances: the challenges to equal opportunity in the
Americas (pp. 160–181). Cambridge, MA: Harvard University Press.
Geiser, S., & Studley, R. (2002). UC and the SAT: Predictive validity and differential impact
of the SAT I and SAT II at the University of California. Educational Assessment, 8, 1–26.
Grigorenko, E. L., Jarvin, L., Nui, W., & Preiss, D. D. (2008). Is there a standard for stan-
dardized testing? Four sketches of the applicability (or lack thereof) of standardized
testing in different educational systems. Extending intelligence: Enhancement and
new constructs.. Mahwah, NJ: Lawrence Erlbaum Associates.
Jeffery, J. V. (2009). Constructs of writing proficiency in US state and national writing
assessments: Exploring variability. Assessing Writing, 14, 3–24.
Kellogg, R. T., & Raulerson, B. A., III (2007). Improving the writing skills of college
students. Psychonomic Bulletin & Review, 14, 237–242.
Kellogg, R. T., & Whiteford, A. P. (2009). Training advanced writing skills: The case for
deliberate practice. Educational Psychologist, 44, 250–266.
Kellogg, R. T., & Whiteford, A. P. (2012). The development of writing expertise.
In E. L. Grigorenko, E. Mambrino, & D. D. Preiss (Eds.), Writing: A mosaic of perspectives
(pp. 109–124). New York: Psychology Press.
Kobrin, J. L., Patterson, B. F., Shaw, E. J., Mattern, K. D., & Barbuti, S. M. (2008). Validity of
the SAT for predicting first-year college grade point average. New York: The College
Board.
Kyllonen, P. C. (2012). The importance of higher education and the role of noncognitive
attributes in college success. Pensamiento Educativo. Revista de Investigación Educacional
Latinoamericana, 49, 84–100.
Lee, J., & Stankov, L. (2012). Large-scale online writing assessments: New approaches
adopted in the National Assessment of Educational Progress. In E. L. Grigorenko, E.
Mambrino, & D. D. Preiss (Eds.), Writing: A mosaic of new perspectives (pp. 371–384).
New York: Psychology Press.
Manzi, J., Flotts, P., & Preiss, D. D. (2012). Design of a college-level test of written
communication. Theorethical and methodological challenges. In E. L. Grigorenko, E.
Mambrino, & D. D. Preiss (Eds.), Writing. A Mosaic of perspectives (pp. 385–400).
New York: Psychology Press.
Manzi, J., & Preiss, D. D. (in press). Educational assessment and educational achievement
in South America. In J. A. C. Hattie & E. M. E.M. Anderman (Eds.), The International
Handbook of Student Achievement. New York: Routledge.
McCutchen, D., Teske, P., & Bankston, C. (2008). Writing and cognition: Implications of
the cognitive architecture for learning to write and writing to learn. In C. Bazerman
(Ed.), Handbook of research on writing. History, society, school, individual, text
(pp. 451–470). New York: Lawrence Erlbaum Associates.
Mizala, A., & Torche, F. (2012). Bringing the schools back in: The stratification of educa-
tional achievement in the Chilean voucher system. International Journal of Educational
Development, 32, 132–144.
Norris, D., Oppler, S., Kuang, D., Day, R., & Adams, K. (2004). The College Board SAT
writing validation research study: An assessment of the predictive and incremental
validity. Washington, DC: American Institutes for Research.
O'Neill, P., Moore, C., & Huot, B. (2009). A guide to college writing assessment. Logan, UT:
Utah State University Press.
Powers, D. E., Fowles, M. E., & Willard, A. E. (1994). Direct assessment, direct valida-
tion? An example from the assessment of writing. Educational Assessment, 2, 89.
Shaw, E. J., Mattern, K. D., & Patterson, B. F. (2011). Discrepant SAT critical reading and
writing scores: Implications for college performance. Educational Assessment, 16,
145–163.
Singer, J., & Willett, J. (2003). Applied longitudinal data analysis: Modeling change and
event occurrence. New York: Oxford University Press.
Stemler, S. E. (2012). What should university admissions tests predict? Educational
Psychologist, 47, 5–17.
Sternberg, R. J. (2004). Theory-based university admissions testing for a new millenni-
um. Educational Psychologist, 39, 185–198.
Sternberg, R. J. (2010). College admissions for the 21st Century. Cambridge, MA: Harvard
University Press.
The National Commission on Writing in America's schools and colleges (2003). The neglected
R: The need for a writing revolution. New York: The College Board.
Weigle, S. C. (2002). Assessing writing. New York: Cambridge University Press.
211

Argumentative Writing And Academic Achievement A Longitudinal Study

Recommended

Recommended

More Related Content

Similar to Argumentative Writing And Academic Achievement A Longitudinal Study

Similar to Argumentative Writing And Academic Achievement A Longitudinal Study (20)

More from Sarah Morrow

More from Sarah Morrow (20)

Recently uploaded

Recently uploaded (20)

Argumentative Writing And Academic Achievement A Longitudinal Study