Journal of Interactive Learning Research (2012) 23(1), ?-?
Automated Formative Assessment as a
Tool to Scaffold Student Documentary Writing
BILL FERSTER
University of Virginia
bferster@virginia.edu
THOMAS C. HAMMOND
Lehigh University
hammond@lehigh.edu
R. CURBY ALEXANDER
University of North Texas
curbyalexander@gmail.com
HUNT LYMAN
The Hill School
huntlyman@thehillschool.org
The hurried pace of the modern classroom does not permit
formative feedback on writing assignments at the frequency
or quality recommended by the research literature. One so-
lution for increasing individual feedback to students is to in-
corporate some form of computer-generated assessment. This
study explores the use of automated assessment of student
writing in a content-specific context (history) on both tradi-
tional and non-traditional tasks. Four classrooms of middle
school history students completed two projects, one cul-
minating in an essay and one culminating in a digital docu-
mentary. From the total set of completed projects, approxi-
mately 70 essays and 70 digital documentary scripts were
then scored by human raters and by an automated evaluation
system. The student essays were used to test the comparison
of human and computer-generated feedback in the context of
history education, and the digital documentary scripts were
used to test feedback given on a non-traditional task. The
results were encouraging with very high correlation and reli-
ability factors within and across both sets of documents, sug-
gesting the possibility of new forms of formative assessment
of student writing for content-area instruction in a variety of
emerging formats.
Keywords: Automated formative assessment, writing, history educa-
tion, digital documentaries
Among the many possible strategies for social studies instruction,
writing-intensive activities stand out as a promising but challenging teach-
ing tool. On the one hand, student writing is a powerful mechanism for im-
proving student learning outcomes in social studies (Greene, 1994; Nelms,
1987; Risinger, 1987, 1992; Smith & Niemi, 2001; Sundberg, 2006; Van
Nostrand, 1979). On the other hand, implementing effective student writing
tasks is difficult. Writing tasks are time-consuming, especially when mea-
sured against an already crowded social studies curriculum (Beyer & Brost-
off, 1979b; Nash, Crabtree, & Dunn, 2000). Social studies teachers typically
receive very little instruction in scaffolding students’ writing and providing
effective feedback (Jolliffe, 1987). Furthermore, some students are reluctant
writers, approaching any act of writing—and particularly writing-for-assess-
ment—with anxiety or even dread (Pajares, 2003). Organizing their ideas
or even the act of getting started can be overwhelming (Beyer & Brostoff,
1979a). The use of writing in social studies education deserves continued
scrutiny, and any new strategies must address these existing barriers.
A promising point of focus for exploring student writing in social stud-
ies is the use of formative feedback to the writer. Cognitive scientists and
educators have demonstrated that rapid and appropriate feedback on student
projects has a strong positive effect on the quality of student work (Mory,
2004). Formative feedback can encourage and guide struggling writers, re-
fine students’ content mastery, and develop social studies skills (Beyer,
1979; Nelson, 1990; Olina & Sullivan, 2002). As an instructional best prac-
tice, therefore, social studies teachers should provide students with forma-
tive feedback at several stages in the composition process.
Unfortunately, the majority of our students live in a world where teach-
ers have up to 5 sections with an average of over 23 pupils in each (Gruber,
Broughman, Strizek, & Burian-Fitzgerald, 2002). Combining the realities
of class size with the content-coverage pressures of the social studies cur-
riculum explains why so little writing takes place in social studies class-
rooms—in one study of more than 600 classrooms, less than 6% of instruc-
tional time was devoted to writing tasks other than note-taking or answering
test questions (Gilstrap, 1991). A teacher who engages students in process-
driven writing tasks and provides formative feedback is therefore the excep-
tion, not the rule. Even under the ideal conditions of small class sizes and a
slower-paced curriculum, however, the delay time between a request for as-
sessment and the response is necessarily longer than the optimal times sug-
gested by the cognition and learning literature (e.g., Gagné, 1970).
One possible solution to these contextual challenges is to automate the
assessment process using some form of computer-mediated intervention
(Frase, Kiefer, Smith, & Fox, 1985). Of course, some disciplines may lend
themselves to automated assessment more readily than others. Mathemat-
ics and science, for example, are considered more empirical, and writing
assignments in these disciplines may be scored more easily than work in
the humanities (e.g., Chapman, 2001). However, even writing in the more
subjective content areas such as social studies and language arts can take
advantage of a collection of computer-based assessment techniques known
as Automated Essay Scoring.
Automated Essay Scoring (AES) is a technique developed by compu-
tational linguists to look at a student writing example, compare it with hun-
dreds of other essays on the same topic that have been scored by human
scorers, and return the likely score that essay may yield when graded by a
teacher. More sophisticated AES systems can offer precise feedback to the
student regarding what to change to improve the essay, indicating the tool’s
potential for use as a source of formative feedback.
An additional area to explore is the use of an AES to evaluate student
writing in non-traditional formats. Research by the Pew Internet and Ameri-
can Life Project indicates that young people write a significant amount, but
much of it is in the form of email messages, blog postings, comments on so-
cial networking sites, and other emerging formats (Lenhart, Arafeh, Smith,
& Macgill, 2008). While much of this writing is taking place outside the
context of curricular instruction, some educators are experimenting with in-
tegrating these non-traditional platforms into their classrooms (e.g., Ikpeze,
2009; Kajder, 2007). One emerging format for integrating student writing
into history education is the digital documentary (Author, 2006 & 2008;
Hofer & Owings-Swan, 2005; Hofer & Swan, 2008). Digital documentaries
are brief digital movies that consist of a montage of images, text or graphics
accompanied by a narration done in the student’s voice. History educators
can integrate digital documentary projects into their instruction to develop
students’ content knowledge, historical thinking skills, and expression skills
(Author, 2009).
BACKGROUND
To provide the context for this study, three areas will be examined: (1)
the tool that provides the framework and context for exploring automated
formative assessment, (2) the role that feedback has in effective student
learning, and (3) the nature and efficacy of automated essay scoring re-
search efforts.
Context: Online Digital Documentary Tool (PrimaryAccess)
This study explored the feasibility of integrating automated assessment
into PrimaryAccess (www.primaryaccess.org), a suite of free, web-based
applications that allows teachers to draw upon thousands of indexed histori-
cal images to create customized activities for their students (Author, 2006
& 2008). The most common use of PrimaryAccess is the creation of digi-
tal documentaries. The images used are typically online archival resources,
such as photographs, paintings, engravings, maps, and documents from
sites such as the Virginia Center for Digital History. (See Figure 1.) How-
ever, teachers and students can incorporate any online images, including
their own work. The narration that accompanies the image stream is based
on a student-authored script (Figure 1b). These scripts share many of the
same characteristics as traditional essays in terms of their expository nature,
length, and internal structure.
Figure 1. Steps involved in creating a digital documentary with PrimaryAccess: (a) select resources, (b) write script, (c) set motion, (d) show movie. (Image source: National Archives and Records Administration)
The script-composition process in PrimaryAccess takes place in a sim-
ple text editor. Students can save iterative versions of the script, often re-
vising and expanding them as prompted by teacher feedback—either asyn-
chronously, in the form of text or audio notes, or synchronously, as in-class
discussions (Author, 2007). The script becomes the basis for the visual pro-
duction stages: students annotate the script by adding primary source im-
ages and then set these images in motion to create the documentary’s vi-
sual sequence. A voice-over narration, recorded with a built-in audio editor,
completes the documentary-making process. This sequence of iterative refinement of text, visual arrangement, and narration reinforces the concept
of writing-as-process—not product—to improve student learning and per-
formance outcomes, as suggested by the research on student writing (e.g.,
Faigley, Cherry, Jolliffe, & Skinner, 1985).
Writing these scripts is therefore a critical step in the process. However,
during our field testing, we have observed—and participating teachers have
confirmed—that the writing is typically the students’ least favorite element
of digital documentary-making as compared to image selection and editing
(Author, 2009). Researchers across multiple institutions are exploring ways
to scaffold the writing process, but one possible support is to provide some
formative assessment in the form of automated feedback during the script
writing stage.
The Role of Feedback
A key component of improving students’ writing is feedback on stu-
dents’ scripts at multiple points in the process (Author, 2007). Such forma-
tive assessment provides students opportunities to revise their work and im-
prove their metacognitive skills as they monitor their progress (Bransford,
Brown, & Cocking, 2000). To maximize the benefit to student learning, the
feedback must meet two criteria. First, the nature of feedback needs to be
appropriate to the work being evaluated. Irrelevant or shallow feedback, or
feedback that exceeds the processing capabilities of the student, is wasteful
(Bruner, 1966). As a demonstration of the significance of targeted feedback, in a study comparing student learning under contrasting scaffolding conditions, Author (2007) found that the students who received the highest quality feed-
back from the teacher produced more accurate final projects and ultimately
demonstrated greater content knowledge as indicated by higher scores on
end-of-semester tests. Second, immediate feedback, rather than delayed
feedback, has a stronger impact on learning outcomes (Gagné, 1970; Mag-
er, 1997; Mory, 2004). Receiving quick and targeted comments during the
composition process is therefore a critical support for learning from written
assignments.
Possibilities of Automated Feedback: Research to Date
An automated feedback system can provide support to students’ writing
process and lead to improved outcomes (Frase, Kiefer, Smith, & Fox, 1985).
This notion of immediate feedback provided by a machine is not a recent
concept. B.F. Skinner wrote, “Like a good tutor the machine presents just
that material for which the student is ready. It asks him to take only that step
which he is at the moment best equipped and most likely to take” (1958, p.
972). The capability of the machine Skinner described to provide feedback
was primitive, especially when compared to the rich responses modern com-
puters and software can deliver.
Today’s automated essay scoring (AES) is a technique developed by
computational linguists to look at a student writing example, compare it
with hundreds of other essays on the same topic that have been scored by
human scorers, and return the likely score that essay may yield when graded
by a teacher. Students can use this “preview” of a summative evaluation on
their in-progress document to direct their revision process, making the sum-
mative evaluation a form of formative feedback. More sophisticated AES
systems can even offer precise feedback about what to change to improve
the essay.
Automated essay scoring was pioneered by Ellis Page, who developed
the Project Essay Grader (PEG) in the mid-1960s. PEG applied statistical
techniques such as multiple linear regression to essays and considered such
factors as essay length, number of commas, prepositions, and uncommon
words in a weighted model of what he thought approximated the internal
structures used by human raters. Page found high (.78) correlations be-
tween the PEG system and human raters of the same essays, compared to a
.85 correlation between any two human scorers (Kukich, 2000).
The next 30 years led to vigorous research into the automatic scoring
of essays using a wide range of mathematical techniques and factors within
the essays, including Bayesian Inference, Latent Semantic Analysis, Neural
Networks, and others. Although these systems use a variety of computa-
tional modeling approaches, the overall mechanisms are similar. Typically,
hundreds of exemplar essays are hand-scored by human raters. This scor-
ing is put through rigorous inter-rater reliability testing to ensure the accu-
racy of the human ratings. The essays, with scores reflecting the full range
of possible quality levels, are entered into the AES system to train it on the
essay topic. Once trained, the system develops an internal model of what an
arbitrary essay written on the same topic might score in a matter of seconds.
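As a concrete illustration of this train-then-score workflow, the sketch below (Python, with scikit-learn assumed available) fits a PEG-style multiple linear regression to a few surface features (essay length, comma count, and a crude count of uncommon words) drawn from hand-scored training essays, and then predicts a holistic score for a new essay. The feature set, word list, and training data are illustrative placeholders, not Page's original model or any production AES.

```python
import re
from sklearn.linear_model import LinearRegression

COMMON_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "were", "is"}

def features(essay):
    """PEG-style surface features: word count, commas, and 'uncommon' words."""
    words = re.findall(r"[a-zA-Z']+", essay.lower())
    uncommon = sum(1 for w in words if w not in COMMON_WORDS)
    return [len(words), essay.count(","), uncommon]

# Hand-scored training essays (placeholders standing in for hundreds of exemplars).
train_essays = [
    "The Great Migration was a movement of African Americans to northern cities.",
    "Many African Americans moved north, seeking jobs, housing, and freedom from Jim Crow laws.",
]
train_scores = [2.0, 4.0]

model = LinearRegression().fit([features(e) for e in train_essays], train_scores)

new_essay = "During the Great Migration, families left the South for cities such as Chicago."
predicted = model.predict([features(new_essay)])[0]
print(f"Predicted holistic score: {predicted:.1f}")
```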
The Educational Testing Service (ETS) began experimenting with
natural-language-processing and information retrieval techniques in the
1990s to provide automated scoring of essays within the Analytical Writing Assessment portion of the Graduate Management Admission Test (GMAT). Their e-rater system used a step-wise linear regression of
over 100 essay features to provide a high degree of agreement with human
raters (Wang & Brown, 2007). Valenti, Neri, and Cucchiarelli (2003) compared the performance of ten AES systems in terms of (a) accuracy of scoring, (b) multiple regression correlation, and (c) agreement with human scoring. The systems performed at levels between .80 and .96 on these three measures, and the ETS e-rater system yielded an 87-94% agree-
ment with human scored essays. These correlations are comparable with
those researchers would expect among essays scored by two or more human
scorers (Wang & Brown, 2007).
The effectiveness of AES systems in relation to human raters is well
documented in the literature. A number of studies have cited very high cor-
relations between AES and human scoring, typically with an 85-90% agree-
ment (Attali & Burstein, 2006; Burstein, 2003; Hearst, 2000). Most studies
were performed using essays from the GMAT exams, expository language
arts essays, or science assessments (Valenti, Neri, & Cucchiarelli, 2003).
There is little research on AES in the contexts of social studies instruction
and/or non-traditional writing formats.
The nature of the feedback received should be contextualized to be
valuable. Simply providing a feedback score without specific details of what
comprised that score can be frustrating to learners. Researchers who devel-
oped a set of Web-based case studies for preparing teachers to use technol-
ogy (ETIPS; see http://www.etips.info) added AES to provide formative
feedback on the decision essays composed by preservice teachers at the cul-
mination of their case studies. An initial study of 27 preservice teachers us-
ing the AES found that the nature of the feedback was not sophisticated or
detailed enough to guide students in improving their writing (Scharber &
Dexter, 2004). After revising the feedback, researchers studied 70 preservice
teachers and found a moderate impact on the quality of the essays. Sixty-
three percent answered in a survey that the AES encouraged them to com-
plete more drafts of their answers than they might have otherwise (Riedel,
Dexter, Scharber & Doering, 2006).
At the K-12 level, Vantage Learning (2007) has developed a web-based
instructional writing product (MyAccess!) designed for students in grades
4 and higher. Among other features, the software provides automated feed-
back to students during the essay writing process, as well as upon comple-
tion, via its IntelliMetric Essay Scoring System. The software provides both
a holistic score and analytical scores in the areas of “Focus and Meaning;
Content and Development; Organization; Language, Use and Style; and
Mechanics and Conventions” (p. 1). The developer has performed a num-
ber of studies indicating that the automated scoring of students’ writing is
comparable to scoring provided by expert human raters, although not all in-
dependent studies have agreed with their results (e.g., see Brown & Wang,
in press). We could find no independent studies on the use of its AES with
K-12 students.
Our ultimate interest in AES is its use in a formative manner—to guide the activity and encourage revision based on specific feedback—rather than in a summative manner. An AES system may be able to scaffold students’
script writing in our documentary-making tool. The literature includes few
studies by independent researchers examining the use of AES as formative
feedback with K-12 students, and none we could find in the context of so-
cial studies learning or digital documentary creation. Before testing AES
with students as they work on authoring digital documentaries, however, we
must answer two initial questions:
1. In the context of history education, does an automated essay
scoring system provide feedback on student essays that is
similar to the feedback provided by a human grader?
2. Does an automated essay scoring system provide feedback
on student digital documentary scripts that is similar to the
feedback provided by a human grader?
METHOD
To complete this initial test of the feasibility of using AES as a for-
mative feedback and writing scaffold for history documentary scripts, we
needed to see if an automated assessment system could perform as well as
human scoring of the student essays and digital documentary scripts. If the
assessments are close, it stands to reason that adding an automated capa-
bility that assesses students’ scripts could be a powerful tool for formative
assessment to improve student engagement and learning outcomes. This is
not a criticism of the educational system or an attempt to “teacher-proof”
the classroom but an experiment to see if a technological intervention might
augment existing classroom relationships.
The data were collected as part of a larger study funded by the Jes-
sie Ball duPont Foundation to test the content-based learning outcomes of
students when using the digital documentary tool as compared to their re-
ceiving more traditional instruction. The research took place during a sin-
gle unit of instruction on early 20th-century American history. Within this
unit, students spent three days exploring the Great Migration and three days
exploring the Harlem Renaissance. For each topic, the students spent one
day working through activity stations to learn about the topic (e.g., primary
source texts and photos of emigrants from the South, videos of Lindy Hop
dancers, audio clips of blues and jazz). The students then spent two days
making their own account about the time period: either an essay (a tradition-
al format for student writing) or a scripted digital documentary (an emerg-
ing medium for history education).
The essays and digital documentary scripts were comparable in terms
of word length, writing style, content, and factual exposition. Although a
professionally-produced documentary narration would look very different
from an essay, in our experience K-12 students tend to write their narrations
in an essay format because that is the writing form with which they are most
familiar. As such, the documentary scripts written by students possess many
of the same criteria identifying a good five-paragraph essay in terms of ba-
sic expository structure, persuasive ability, adherence to conventions of me-
chanics and grammar, and accurate and germane content.
Participants
Participants were 87 seventh-grade American History students at a pub-
lic middle school located in a small urban area of a mid-Atlantic state. This
student group was racially and ethnically diverse, with approximately equal
numbers of boys and girls. The majority of the students were from low- to
middle-income socio-economic status. The participating students repre-
sented a wide variety of ability and engagement levels. The students experi-
enced instruction and project work as members of four classes, all taught by
two teachers. One participating teacher was a 25-year veteran of the school
system, and the other was a novice in her first teaching assignment. Due to
the design of learning stations followed by project work, all students, re-
gardless of class or teacher, experienced the same instruction.
Procedure
Over the course of six days—three on the Great Migration and three
on the Harlem Renaissance—the students created a total of 144 student-
authored documents. Each student experienced both the experimental and
the control condition: on the Great Migration portion of the unit, the student
created either a digital documentary or a traditional essay. For the following
topic, the condition was reversed. (Due to absences and incomplete work,
not all 87 students produced both a digital documentary script and an essay.)
The final pool of documents for analysis contained 73 essays and 71 digital
documentary scripts.
Two former readers for the Advanced Placement (AP) language arts
exam scored the students’ documents. The scoring was conducted blind—
the raters did not know whether an individual document was an essay or a
script. The raters used a standard 6+1 rubric designed for use with middle
school students. The 6+1 rubric first asks raters to score each essay in terms
of six characteristics, or traits: ideas and content, organization, voice, word
choice, sentence fluency, and conventions. Each of the six factors is rated
individually with a score from 1 to 5.
1. NOT YET: a bare beginning; writer not yet showing any control.
2. EMERGING: need for revision outweighs strengths.
3. DEVELOPING: strengths and need for revision are about equal.
4. COMPETENT: on balance, the strengths outweigh the weaknesses.
5. STRONG: shows control and skill in this trait; many strengths
present.
Following the scoring of the six components, the rubric calls for a ho-
listic score, ranging from 1 to 4, to assess overall quality. Indicators of qual-
ity are whether the student addressed the prompt, how sophisticated the
writing was, how precisely the facts and arguments were presented and the
relevance of those facts to the prompt, and the level of logical thinking in
the student’s arguments.
The scorers followed the protocols used in AP exam scoring. First, they
worked independently to score the same 20 essays. Next, they compared
their results and discussed any divergence to encourage rating agreement.
The raters then worked alone, each scoring the entire remaining set of 124
documents, containing both standard essays and scripts. This double-scoring
of the entire set is a departure from AP practices, in which essays are typi-
cally read by a single reader with only 1 in 60 receiving a second reading
(Venkateswaran & Morgan, 2002).
We chose Educational Testing Service’s (ETS) Criterion™ online essay evaluation service as the comparison scorer for the two human evaluators. The Criterion system, described above, has a long track record of successful use in multiple contexts, such as college or graduate school admissions. The tool is strongly recommended by the literature on AES (cite). The research-
ers entered each of the 144 documents and then recorded and analyzed the auto-
mated response. The AES provides two substantive forms of feedback: a ho-
listic score, ranging from 1 to 4, and verbose responses over five domains:
grammar, usage, mechanics, style, and organization. Within each domain,
between 6 and 11 categories of potential problems are evaluated. The re-
sponses for grammar, usage, and mechanics identify errors such as pronoun
and possessive errors, nonstandard word forms, and missing articles, many
of which are similarly flagged by the grammar-checking features in programs
such as Microsoft Word. The style and organization categories provide ad-
ditional feedback in areas not usually addressed through automated respons-
es. Students are alerted to stylistic problems such as repeated words, many
short sentences, and many long sentences. Students are also provided with
non-substantive descriptive statistics, such as the number of sentences and
average number of words per sentence.
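The sketch below (Python) shows one hypothetical way to record each document's automated response for later analysis. The class and field names are our own labels for the holistic score and the five verbose-feedback domains described above; this is not the Criterion service's actual data format or API.

```python
from dataclasses import dataclass, field

@dataclass
class AutomatedFeedback:
    """Hypothetical container for one document's automated response."""
    document_id: str
    holistic_score: int                                   # 1-4
    domain_comments: dict = field(default_factory=dict)   # domain -> list of comments

record = AutomatedFeedback(
    document_id="essay_017",        # placeholder identifier
    holistic_score=3,
    domain_comments={
        "grammar": ["possible subject-verb agreement error in sentence 4"],
        "usage": [],
        "mechanics": ["missing comma after introductory phrase"],
        "style": ["several consecutive short sentences"],
        "organization": ["provides a clear sequence of information"],
    },
)

flagged = sum(len(v) for v in record.domain_comments.values())
print(f"{record.document_id}: holistic {record.holistic_score}, {flagged} flagged items")
```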
RESULTS
The first step was to examine the similarities in scoring of the student
history essays. These essays are a traditional format for evaluation by AES.
However, the essays used in this case were prepared for the purpose of mas-
tering historical content (i.e., the Great Migration and the Harlem Renais-
sance), not the demonstration of writing ability. The comparison between
human- and computer-generated ratings on students’ essays was encourag-
ing, yielding a .88 Cronbach’s Alpha reliability coefficient and a statistical-
ly-significant .79 correlation coefficient (p < .01) between the human- and
machine-graded holistic scores on the essays. In the context of essays writ-
ten for the purposes of history education, the AES provided scores very sim-
ilar to the human evaluators’ (see Table 1).
Table 1
Descriptive Statistics for Traditional Writing Context (Essay)

                      Mean    SD       N
Human-scored essays   2.67    0.987    73
AES-scored essays     3.202   1.1783   73
The second step was to compare the humans’ scores against the AES
scores across the non-traditional set of documents, the 71 digital documen-
tary scripts. Again, the two sets of scores showed a tight correspondence
(see Table 2). The high Cronbach’s Alpha reliability coefficient (.84) and
correlation coefficient (.73, p < .01) again indicate that the computer-gener-
ated evaluation closely matches that of humans, even in a format other than
a traditional essay.
Table 2
Descriptive Statistics for Non-traditional Writing Context (Digital Documentary Script)

                       Mean    SD       N
Human-scored scripts   2.49    1.040    71
AES-scored scripts     3.169   0.9169   71
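For readers who want to reproduce this kind of agreement analysis, the sketch below (Python, with NumPy and SciPy assumed available) computes a Pearson correlation and a two-rater Cronbach's alpha between human and automated holistic scores, as reported for Tables 1 and 2. The score arrays are placeholders rather than the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

def cronbach_alpha(ratings):
    """ratings: (n_documents, n_raters) array of holistic scores."""
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)         # variance of each rater's scores
    total_var = ratings.sum(axis=1).var(ddof=1)     # variance of the summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Placeholder scores; in the study each set held 73 essays (or 71 scripts) scored 1-4.
human = np.array([2, 3, 4, 2, 3, 1, 4, 3], dtype=float)
aes = np.array([3, 3, 4, 2, 4, 2, 4, 3], dtype=float)

r, p = pearsonr(human, aes)
alpha = cronbach_alpha(np.column_stack([human, aes]))
print(f"Pearson r = {r:.2f} (p = {p:.3f}); Cronbach's alpha = {alpha:.2f}")
```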
The next step was to more closely examine the relationship between
the traits scores (i.e., scores for ideas and content, organization, voice, word
choice, sentence fluency, and conventions) and the holistic scores from both
the human raters and the AES. For the human raters, their holistic score var-
ied directly with their scoring of the 6 traits, F(6,137) = 173.7, p < .001. The
same relationship existed between the human-generated traits scores and the
AES’s holistic score, F(6,137) = 35.71, p < .001. This correspondence sug-
gests that the holistic scores (whether human-generated or computer-gen-
erated) and the scores of the 6 individual traits were measuring analogous
internal constructs (see Tables 3-4).
Table 3
ANOVA of Human Scorers’ 6 Traits and Holistic Scores

             Sum of Squares   df    Mean Square   F       Sig.
Regression         129.922     6         21.654   173.7   .000
Residual            17.078   137           .125
Total              147.000   143
Table 4
ANOVA of Human Scorers’ 6 Traits and Automated Holistic Scores

             Sum of Squares   df    Mean Square   F       Sig.
Regression          96.889     6         16.148   35.71   .000
Residual            61.955   137           .452
Total              158.843   143
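The sketch below (Python, with statsmodels assumed available) shows the kind of regression and ANOVA that underlie Tables 3 and 4: the six trait scores serve as predictors of a holistic score, and the overall F statistic tests that relationship. The generated data are placeholders, not the study's scores.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 144                                                    # documents in the combined pool
traits = rng.integers(1, 6, size=(n, 6)).astype(float)     # six trait scores, each 1-5
holistic = np.clip(np.round(traits.mean(axis=1) * 0.8 + rng.normal(0, 0.4, n)), 1, 4)

# Columns: ideas, organization, voice, word choice, sentence fluency, conventions.
X = sm.add_constant(traits)
fit = sm.OLS(holistic, X).fit()
print(f"F({int(fit.df_model)}, {int(fit.df_resid)}) = {fit.fvalue:.1f}, p = {fit.f_pvalue:.4f}")
```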
During our analysis, we observed a relationship between the length of
a document (i.e., a word count for the essay or digital documentary script)
and the holistic scoring. For both the human and the automated scoring,
there was a statistically significant correlation between the number of words
in a document and its holistic score: a .67 correlation with the human-gen-
erated holistic scores and a .81 correlation with the AES-generated holis-
tic score. While some correspondence between length and quality is math-
ematically probable (i.e., a more fully-developed essay will tend to have
more words than a less well-developed essay), the gap between the human
and computer-generated correlation coefficients raised a concern: a student
might be able to “game” the automated assessment by writing a longer es-
say or script and thus obtaining a higher score. This possibility directed our
attention to the verbose feedback provided by the AES along with its holis-
tic score.
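A minimal sketch of this length check follows (Python, SciPy assumed available): each document's word count is correlated with both the human and the AES holistic scores, so the gap between the two coefficients can be inspected directly. The inputs are placeholder values, not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder values standing in for the 144 documents' word counts and holistic scores.
word_counts = np.array([180, 240, 310, 150, 275, 205], dtype=float)
human_holistic = np.array([2, 3, 4, 2, 3, 3], dtype=float)
aes_holistic = np.array([2, 3, 4, 2, 4, 3], dtype=float)

r_human, _ = pearsonr(word_counts, human_holistic)
r_aes, _ = pearsonr(word_counts, aes_holistic)
print(f"length vs. human score: r = {r_human:.2f}")
print(f"length vs. AES score:   r = {r_aes:.2f}")
```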
To explore the quality of the verbose feedback provided by the AES,
we compared the system’s comments to the students’ scripts to see whether
these comments were meaningful to the reader. Most of these comments
were accurate but phrased in very generic terms. For example, the response
for a “good” essay (scoring 3 out of 4 possible points) included the state-
ment that the essay “provides a clear sequence of information; provides
pieces of information that are generally related to each other.” This state-
ment was correct but did not provide guidance for further revision by the
student. We then searched for specific instances in which the automated
feedback represented a misunderstanding of the writing, offering feedback
that no competent human reviewer would make. Across the 144 sets of re-
sponses, we identified fewer than 10 examples of these errors, all grammati-
cal. For example, the following sentence was flagged as containing a sub-
ject-verb agreement error: “Throughout the 20th Century, the segregation of
blacks and whites was abolished.” In this case, the AES read “whites” as
the subject; the subject is actually “segregation.” Earlier in the same essay,
a sentence that begins, “In the early 1900’s” was also flagged as having an
extraneous article (“the”), when it is in fact required.
DISCUSSION
The high correlation between the automated and human scores in both
sets of documents (essays and scripts) and the overall high quality of the
feedback suggests that adding an option for students to submit their writing
for automated feedback could be a useful formative assessment tool, even
in the context of history education and in a non-traditional format such as a
digital documentary. Given an AES module integrated into PrimaryAccess,
students will be able to access an instant, consequence-free first round of
feedback on the style, mechanics, and structure of their scripts. This feed-
back can lead to improved student engagement and multiple revisions of
scripts, resulting in higher quality end products and increased student learn-
ing.
The results, however, underscored the significance of students receiv-
ing human feedback and not just computer-generated evaluations. As not-
ed, some of the students’ human touches in their writing eluded the AES
programmers’ heuristics. In our raters’ opinion, the false flag “errors” were
departures from convention that improved the quality of the document. An
improved AES can reduce the number of instances of such errors, but they
cannot be wholly eliminated. Additionally, substantive feedback about the
content of the scripts—the accurate portrayal of historical facts and not
merely their expression—will still need to be provided by the teacher. For
example, the student statement that “Throughout the 20th Century, the seg-
regation of blacks and whites was abolished” is grammatically correct, but
the historical understandings can be improved: 20th-century desegregation
was not a unified, completed process but rather an on-going mix of policy
decisions (Executive Order 9981, 1948), legal actions (e.g., Brown v. Board
of Education of Topeka, 1954), and personal choices (James Meredith’s de-
cision to apply to the University of Mississippi, 1961). Building an AES that could provide this level of content-specific feedback in the social studies is both theoreti-
cally and practically impossible; a teacher will have to make the judgment
call as to which nuances to introduce to the student’s thinking. However,
any automated assistance to the student regarding his or her writing should
give the teacher greater latitude to focus on students’ content understandings
and thought processes.
This study faces several limitations. First, this was a relatively small-
scale study with only two human raters following an approved protocol. A
larger pool of documents and additional human raters would strengthen the
interpretability of the quantitative analysis. Second, the participants were
middle school history students; the results do not generalize to other groups
or other uses of the AES, especially not to high-stakes assessments such as
the SATs or end-of-year, summative assessments of student achievement.
Finally, the AES used was designed to grade essays, and the human graders
in this study were trained experts in grading essays written by high-school-level students taking Advanced Placement exams. If teachers had the time and inclination to teach students the fine points of documentary making, students’ scripts might bear little resemblance to essays.
FUTURE RESEARCH
The ETS Criterion system appeared capable of delivering high qual-
ity contextual feedback on the essays, but more research needs to be done to address the content-area knowledge required for assessing these digital documentaries
and other forms of writing in the social studies. What value does the scoring
provide to the teaching and learning of history? Could the automated scor-
ing process inhibit or standardize students’ writing? What interaction effects
exist between automated scoring and different teachers’ teaching styles or
levels of expertise?
While this study sought to confirm the reliability of automated scoring relative to human scoring, future studies will investigate the value
of automated formative assessment to students and their learning outcomes.
Future research opportunities could include testing the use of automated as-
sessment in classrooms to compare its efficacy versus no feedback or lim-
ited teacher feedback and looking at the differences in number of revisions,
time on task, and engagement. These differences can be correlated with the
quality of the students’ final products and/or changes on pre/post assess-
ments of writing or content knowledge. Furthermore, the handling of false
flags, such as the example cited above, needs to be more fully explored.
For an AES to be effective, as Shute (1994) noted, “the system [must]
behave intelligently, not actually be intelligent, like a human being” (p. 50).
Most people have had the experience of mistyping a word while entering a Google search—school deform, for example—and having Google’s web application return the message, “Did you mean: school reform?” The original search term is not immediately identifiable as erroneous: the words weren’t misspelled, and deform is a verb that can take that subject. However, the Google database does know that most people who typed in school deform ultimately searched for school reform. By drawing on the large number of people who use its search engine and some shrewd programming decisions, Google has been able to make its system appear more intelligent.
Because all the PrimaryAccess web applications are instrumented to create a time-stamped log of a student’s activity while writing and creating a project, it may be possible to assess some of these student projects without needing to test the student directly to determine what they know. If enough information about how a project was created can be captured and compared with a large enough pool of similarly instrumented, human-scored projects, this process data may offer another assessment option to combine with automated essay scoring feedback.
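As a hypothetical illustration of what such instrumentation might capture, the sketch below (Python) defines a time-stamped activity event and derives simple process measures from a small log. The event names and fields are invented for illustration, not PrimaryAccess's actual logging schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ActivityEvent:
    """One time-stamped entry in a (hypothetical) PrimaryAccess activity log."""
    student_id: str
    timestamp: datetime
    action: str      # e.g., "script_saved", "image_added", "narration_recorded"
    detail: str

log = [
    ActivityEvent("s042", datetime(2011, 3, 14, 9, 5), "script_saved", "draft 1, 112 words"),
    ActivityEvent("s042", datetime(2011, 3, 14, 9, 21), "image_added", "archival photo (placeholder id)"),
    ActivityEvent("s042", datetime(2011, 3, 15, 10, 2), "script_saved", "draft 2, 187 words"),
]

# Simple process measures that could later be compared against human-scored projects.
revisions = sum(1 for e in log if e.action == "script_saved")
time_span = log[-1].timestamp - log[0].timestamp
print(f"script revisions: {revisions}; elapsed time: {time_span}")
```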
References
Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater v.2.
Journal of Technology, Learning, and Assessment, 4(3), n.p. Retrieved
November 13, 2009 from http://escholarship.bc.edu/cgi/viewcontent.
cgi?article=1049&context=jtla
Beyer, B.K. (1979). Pre-writing and re-writing to learn. Social Education, 43(3),
187-189, 197.
Beyer, B.K., & Brostoff, A. (1979a). Writing to learn in Social Studies: Intro-
duction. Social Education, 43(3), 176-177.
Beyer, B.K., & Brostoff, A. (1979b). The time it takes: Managing/evaluating
writing and Social Studies. Social Education, 43(3), 194-197.
Bransford, J., Brown, A., & Cocking, R. (2000). How people learn: Brain, mind,
experience, and school. Committee on Learning Research and Educational
Practice, National Research Council. Washington DC: National Academy
Press.
Brown, M. S., & Wang, J. (in press). Automated essay scoring versus human
scoring: A correlational study. Contemporary Issues in Technology and
Teacher Education.
Bruner, J. (1966). Toward a theory of instruction. New York: Norton.
Burstein, J. (2003). The e-rater scoring engine: Automated essay scoring with
natural language processing. In M. D. Shermis & J. Burstein (Eds.), Auto-
mated essay scoring: A cross-disciplinary perspective (pp. 113-121). Mah-
wah, NJ: Lawrence Erlbaum Associates.
Chapman, O. (2001). Calibrated Peer Review: A writing and critical thinking in-
structional tool. Retrieved November 13, 2009 from http://cpr.molsci.ucla.
edu/cpr/resources/documents/misc/CPR_White_Paper.pdf
Faigley, L., Cherry, R.D., Jolliffe, D.A., & Skinner, A.M. (1985). Assessing writ-
ers’ knowledge and processes of composing. Norwood, NJ: Ablex.
Frase, L.T., Kiefer, K.E., Smith, C.R., & Fox, M.L. (1985). Theory and practice
in computer-aided composition. In S.W. Freedman (Ed.), The acquisition
of written language: Response and revision (pp. 195-210). Norwood, NJ:
Ablex.
Gagné, R. (1970). The conditions of learning. New York: Holt, Rinehart and Winston.
Gilstrap, R.L. (1991). Writing for the social studies. In J.P. Shaver (Ed.), Hand-
book of research on Social Studies teaching and learning (pp. 578-587).
New York, NY: Macmillan.
Greene, S. (1994). Students as authors in the study of history. In G. Leinhardt,
I. Beck, & K. Stainton (Eds.), Teaching and Learning in History (pp. 133-
168). Hillsdale, NJ: Lawrence Erlbaum.
Gruber, K., Wiley, S., Broughman, S. P., Strizek, G. A., & Burian-Fitzgerald, M.
(2002). Schools and staffing survey, 1999-2000: Overview of the data for
public, private, public charter, and Bureau of Indian Affairs elementary and
secondary schools (Report No. NCES 2002313). Washington, DC: National
Center for Educational Statistics.
Hearst, M. (2000). The debate on automated essay grading. IEEE Intelligent Sys-
tems, 15(5), 22-37.
Ikpeze, C. (2009, May). Writing for real purpose. Learning and Leading with
Technology, 36, 36-37.
Jolliffe, D.A. (1987). A social educator’s guide to teaching writing. Theory and
Research in Social Education, 15(2), 89-104.
Kajder, S. (2007). Plugging in to 21st century writers. In T. Newkirk & R. Kent
(Eds.), Teaching the neglected “R”: Rethinking writing instruction in sec-
ondary classrooms (pp. 149-161). Portsmouth, NH: Heinemann.
Kukich, K. (2000). Beyond automated essay scoring. IEEE Intelligent Systems,
15(5), 22–27.
Lenhart, A., Arafeh, S., Smith, A., & Macgill, A. (2008). Writing, Technology,
and Teens. Washington, DC: Pew Internet & American Life Project.
Mager, R. (1997). Making instruction work. Atlanta, GA: Center for Effective
Performance.
Mory, E. H. (2004). Feedback research revisited. In D. H. Jonassen (Ed.), Hand-
book of research on educational communications and technology (pp. 745-
783). Mahwah, NJ: Lawrence Erlbaum.
Nash, G. B., Crabtree, C., & Dunn, R. E. (2000). History on trial: Culture
wars and the teaching of the past. New York: Vintage Books.
Nelms, B.F. (1987). Response and responsibility: Reading, writing, and Social
Studies. The Elementary School Journal, 87(5), 571-589.
Nelson, J. (1990). This was an easy assignment: Examining how students inter-
pret academic writing tasks. Research in the Teaching of English, 24, 362-
396.
Olina, Z., & Sullivan, H.J. (2002). Effects of classroom evaluation on student
achievement and attitudes. Educational Technology Research & Develop-
ment, 50(3), 61-75.
Pajares, F. (2003). Self-efficacy beliefs, motivation, and achievement in writing:
A review of the literature. Reading and Writing Quarterly, 19, 139-158.
Risinger, C.F. (1987). Improving writing skills through social studies (ERIC
Digest No. 40). Bloomington, IN: ERIC Clearinghouse for Social Studies/
Social Science Education. (ERIC Document Reproduction Service No. ED
285829).
Risinger, C.F. (1992). Current directions in K-12 Social Studies. Boston,
MA: Houghton Mifflin Co. (ERIC Document Reproduction Service No.
ED359130)
Riedel, E., Dexter, S., Scharber, C., & Doering, A. (2006). Experimental evi-
dence on the effectiveness of automated essay scoring in teacher education
cases. Journal of Educational Computing Research, 35(3), 267-287.
Scharber, C., & Dexter, S. (2004, March). Automated essay score predictions as
a formative assessment tool. Paper presented at the 15th international con-
ference of the Society for Information Technology and Teacher Education,
Atlanta, GA.
Shute, V.J. (1994). Regarding the I in ITS: Student modeling. In T. Ottmann & I.
Tomek (Eds.), Proceedings of Educational Multimedia and Hypermedia 94
(pp. 50-57). Charlottesville, VA: Association for the Advancement of Com-
puting in Education.
Skinner, B.F. (1958). Teaching machines. Science, 128(3300), 969-977.
Smith, J., & Niemi, R. (2001). Learning history in school: The impact of course
work and instructional practices on achievement. Theory and Research in
Social Education, 29(1), 18-42.
Sundberg, S.B. (2006). An investigation of the effects of exam essay questions
on student learning in United States History survey classes. The History
Teacher, 40(1). Retrieved November 13, 2009 from http://www.historycooperative.org/cgi-bin/cite.cgi?=ht/40.1/sundberg.html
Valenti, S., Neri, F., & Cucchiarelli, A. (2003). An overview of current research
on automated essay grading. Journal of Information Technology Education,
2, 319-330.
Van Nostrand, A.D. (1979). Writing and the generation of knowledge. Social
Education, 43(3), 178-180.
Venkateswaran, U., & Morgan, R. (2002). Assessing historical thinking skills:
Scoring the AP U.S. History Document-Based Question. Organization of
American Historians Newsletter. Retrieved November 13, 2009 from http://
www.oah.org/pubs/nl/nov02/ets.html
Wang, J., & Brown, M. (2007). Automated essay scoring versus human scoring:
A comparative study. Journal of Technology, Learning, and Assessment,
6(2). Retrieved November 13, 2009 from http://escholarship.bc.edu/cgi/
viewcontent.cgi?article=1100&context=jtla

 
College Essay Examples Of College Essays
College Essay Examples Of College EssaysCollege Essay Examples Of College Essays
College Essay Examples Of College Essays
 
Writing A TOK Essay. Online assignment writing service.
Writing A TOK Essay. Online assignment writing service.Writing A TOK Essay. Online assignment writing service.
Writing A TOK Essay. Online assignment writing service.
 
How To Write A Good Classific. Online assignment writing service.
How To Write A Good Classific. Online assignment writing service.How To Write A Good Classific. Online assignment writing service.
How To Write A Good Classific. Online assignment writing service.
 

Recently uploaded

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...KokoStevan
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterMateoGardella
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 

Recently uploaded (20)

The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
SECOND SEMESTER TOPIC COVERAGE SY 2023-2024 Trends, Networks, and Critical Th...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Gardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch LetterGardella_PRCampaignConclusion Pitch Letter
Gardella_PRCampaignConclusion Pitch Letter
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 

Automated Formative Assessment As A Tool To Scaffold Student Documentary Writing

As an instructional best practice, therefore, social studies teachers should provide students with formative feedback at several stages in the composition process.

Unfortunately, the majority of our students live in a world where teachers have up to 5 sections with an average of over 23 pupils in each (Gruber, Broughman, Strizek, & Burian-Fitzgerald, 2002). Combining the realities
of class size with the content-coverage pressures of the social studies curriculum explains why so little writing takes place in social studies classrooms—in one study of more than 600 classrooms, less than 6% of instructional time was devoted to writing tasks other than note-taking or answering test questions (Gilstrap, 1991). A teacher who engages students in process-driven writing tasks and provides formative feedback is therefore the exception, not the rule. Even under the ideal conditions of small class sizes and a slower-paced curriculum, however, the delay time between a request for assessment and the response is necessarily longer than the optimal times suggested by the cognition and learning literature (e.g., Gagné, 1970).

One possible solution to these contextual challenges is to automate the assessment process using some form of computer-mediated intervention (Frase, Kiefer, Smith, & Fox, 1985). Of course, some disciplines may lend themselves to automated assessment more readily than others. Mathematics and science, for example, are considered more empirical, and writing assignments in these disciplines may be scored more easily than work in the humanities (e.g., Chapman, 2001). However, even writing in the more subjective content areas such as social studies and language arts can take advantage of a collection of computer-based assessment techniques known as Automated Essay Scoring.

Automated Essay Scoring (AES) is a technique developed by computational linguists to look at a student writing example, compare it with hundreds of other essays on the same topic that have been scored by human scorers, and return the likely score that essay may yield when graded by a teacher. More sophisticated AES systems can offer precise feedback to the student regarding what to change to improve the essay, indicating the tool’s potential for use as a source of formative feedback.

An additional area to explore is the use of an AES to evaluate student writing in non-traditional formats. Research by the Pew Internet and American Life Project indicates that young people write a significant amount, but much of it is in the form of email messages, blog postings, comments on social networking sites, and other emerging formats (Lenhart, Arafeh, Smith, & Macgill, 2008). While much of this writing is taking place outside the context of curricular instruction, some educators are experimenting with integrating these non-traditional platforms into their classrooms (e.g., Ikpeze, 2009; Kajder, 2007). One emerging format for integrating student writing into history education is the digital documentary (Author, 2006 & 2008; Hofer & Owings-Swan, 2005; Hofer & Swan, 2008). Digital documentaries are brief digital movies that consist of a montage of images, text, or graphics accompanied by a narration done in the student’s voice. History educators
can integrate digital documentary projects into their instruction to develop students’ content knowledge, historical thinking skills, and expression skills (Author, 2009).

BACKGROUND

To provide the context for this study, three areas will be examined: (1) the tool that provides the framework and context for exploring automated formative assessment, (2) the role that feedback has in effective student learning, and (3) the nature and efficacy of automated essay scoring research efforts.

Context: Online Digital Documentary Tool (PrimaryAccess)

This study explored the feasibility of integrating automated assessment into PrimaryAccess (www.primaryaccess.org), a suite of free, web-based applications that allows teachers to draw upon thousands of indexed historical images to create customized activities for their students (Author, 2006 & 2008). The most common use of PrimaryAccess is the creation of digital documentaries. The images used are typically online archival resources, such as photographs, paintings, engravings, maps, and documents from sites such as the Virginia Center for Digital History. (See Figure 1.) However, teachers and students can incorporate any online images, including their own work. The narration that accompanies the image stream is based on a student-authored script (Figure 1b). These scripts share many of the same characteristics as traditional essays in terms of their expository nature, length, and internal structure.
Figure 1. Steps involved in creating a digital documentary with PrimaryAccess: (a) select resources, (b) write script, (c) set motion, (d) show movie. (Image source: National Archives and Records Administration)

The script-composition process in PrimaryAccess takes place in a simple text editor. Students can save iterative versions of the script, often revising and expanding them as prompted by teacher feedback—either asynchronously, in the form of text or audio notes, or synchronously, as in-class discussions (Author, 2007). The script becomes the basis for the visual production stages: students annotate the script by adding primary source images and then set these images in motion to create the documentary’s visual sequence. A voice-over narration, recorded with a built-in audio editor, completes the documentary-making process. This sequence of iterative refinement of text, visual arrangement, and narration reinforces the concept of writing-as-process—not product—to improve student learning and performance outcomes, as suggested by the research on student writing (e.g., Faigley, Cherry, Jolliffe, & Skinner, 1985).

Writing these scripts is therefore a critical step in the process. However, during our field testing, we have observed—and participating teachers have confirmed—that the writing is typically the students’ least favorite element of digital documentary-making as compared to image selection and editing (Author, 2009). Researchers across multiple institutions are exploring ways to scaffold the writing process, but one possible support is to provide some formative assessment in the form of automated feedback during the script writing stage.
The Role of Feedback

A key component of improving students’ writing is feedback on students’ scripts at multiple points in the process (Author, 2007). Such formative assessment provides students opportunities to revise their work and improve their metacognitive skills as they monitor their progress (Bransford, Brown, & Cocking, 2000). To maximize the benefit to student learning, the feedback must meet two criteria. First, the nature of the feedback needs to be appropriate to the work being evaluated. Irrelevant or shallow feedback, or feedback that exceeds the processing capabilities of the student, is wasteful (Bruner, 1966). As a demonstration of the significance of targeted feedback, in a study comparing student learning under conditions of contrasting scaffolding, Author (2007) found that the students who received the highest quality feedback from the teacher produced more accurate final projects and ultimately demonstrated greater content knowledge as indicated by higher scores on end-of-semester tests. Second, immediate feedback, rather than delayed feedback, has a stronger impact on learning outcomes (Gagné, 1970; Mager, 1997; Mory, 2004). Receiving quick and targeted comments during the composition process is therefore a critical support for learning from written assignments.

Possibilities of Automated Feedback: Research to Date

An automated feedback system can provide support to students’ writing process and lead to improved outcomes (Frase, Kiefer, Smith, & Fox, 1985). This notion of immediate feedback provided by a machine is not a recent concept. B.F. Skinner wrote, “Like a good tutor the machine presents just that material for which the student is ready. It asks him to take only that step which he is at the moment best equipped and most likely to take” (1958, p. 972). The capability of the machine Skinner described to provide feedback was primitive, especially when compared to the rich responses modern computers and software can deliver.

Today’s automated essay scoring (AES) is a technique developed by computational linguists to look at a student writing example, compare it with hundreds of other essays on the same topic that have been scored by human scorers, and return the likely score that essay may yield when graded by a teacher. Students can use this “preview” of a summative evaluation on their in-progress document to direct their revision process, making the summative evaluation a form of formative feedback.
More sophisticated AES systems can even offer precise feedback about what to change to improve the essay.

Automated essay scoring was pioneered by Ellis Page, who developed the Project Essay Grader (PEG) in the mid-1960s. PEG applied statistical techniques such as multiple linear regression to essays and considered such factors as essay length, number of commas, prepositions, and uncommon words in a weighted model of what he thought approximated the internal structures used by human raters. Page found high (.78) correlations between the PEG system and human raters of the same essays, compared to a .85 correlation between any two human scorers (Kukich, 2000).

The next 30 years led to vigorous research into the automatic scoring of essays using a wide range of mathematical techniques and factors within the essays, including Bayesian Inference, Latent Semantic Analysis, Neural Networks, and others. Although these systems use a variety of computational modeling approaches, the overall mechanisms are similar. Typically, hundreds of exemplar essays are hand-scored by human raters. This scoring is put through rigorous inter-rater reliability testing to ensure the accuracy of the human ratings. The essays, with scores reflecting the full range of possible quality levels, are entered into the AES system to train it on the essay topic. Once trained, the system can estimate, in a matter of seconds, what an arbitrary essay written on the same topic is likely to score.
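The train-then-predict recipe just described is straightforward to picture in code. The fragment below is a minimal, illustrative sketch of a PEG-style scorer, not a reconstruction of PEG, e-rater, or Criterion (whose feature sets and models are proprietary); the four surface features follow the description of Page's system, and the training essays and human scores are assumed to be supplied by raters.

# Minimal sketch of a PEG-style scorer: hand-scored training essays ->
# surface features -> linear regression -> predicted score for a new essay.
# Illustrative only; production AES systems use far richer NLP features.
import re
import numpy as np
from sklearn.linear_model import LinearRegression

PREPOSITIONS = {"in", "on", "at", "by", "for", "with", "from", "to", "of"}
COMMON = {"the", "a", "an", "and", "or", "but", "is", "was", "were", "it"}

def features(essay):
    words = re.findall(r"[A-Za-z']+", essay.lower())
    return [
        len(words),                                          # essay length
        essay.count(","),                                    # number of commas
        sum(w in PREPOSITIONS for w in words),               # prepositions
        sum(w not in COMMON and len(w) > 7 for w in words),  # "uncommon" words
    ]

def train_scorer(essays, human_scores):
    X = np.array([features(e) for e in essays])
    return LinearRegression().fit(X, np.array(human_scores))

# Usage with hypothetical data (scores 1-4 assigned by human raters):
# model = train_scorer(training_essays, training_scores)
# predicted = model.predict([features(new_essay)])[0]

A real system would, as the text notes, validate the human ratings for inter-rater reliability before training and would model many more than four features.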
The Educational Testing Service (ETS) began experimenting with natural-language-processing and information retrieval techniques in the 1990s to provide automated scoring of essays within the Analytical Writing Assessment portion of the Graduate Management Admission Test (GMAT). Their e-rater system used a step-wise linear regression of over 100 essay features to provide a high degree of agreement with human raters (Wang & Brown, 2007). Valenti, Neri, and Cucchiarelli (2003) compared the performance of ten AES systems in terms of (a) accuracy of scoring, (b) multiple regression correlation, and (c) agreement with human scoring. The systems performed at levels between .80 and .96 on these three measures, and the ETS e-rater system yielded an 87-94% agreement with human-scored essays. These correlations are comparable with those researchers would expect among essays scored by two or more human scorers (Wang & Brown, 2007).

The effectiveness of AES systems in relation to human raters is well documented in the literature. A number of studies have cited very high correlations between AES and human scoring, typically with an 85-90% agreement (Attali & Burstein, 2006; Burstein, 2003; Hearst, 2000). Most studies were performed using essays from the GMAT exams, expository language arts essays, or science assessments (Valenti, Neri, & Cucchiarelli, 2003). There is little research on AES in the contexts of social studies instruction and/or non-traditional writing formats.

The feedback a student receives also needs to be contextualized to be valuable. Simply providing a feedback score without specific details of what comprised that score can be frustrating to learners. Researchers who developed a set of Web-based case studies for preparing teachers to use technology (ETIPS; see http://www.etips.info) added AES to provide formative feedback on the decision essays composed by preservice teachers at the culmination of their case studies. An initial study of 27 preservice teachers using the AES found that the nature of the feedback was not sophisticated or detailed enough to guide students in improving their writing (Scharber & Dexter, 2004). After revising the feedback, the researchers studied 70 preservice teachers and found a moderate impact on the quality of the essays. Sixty-three percent reported in a survey that the AES encouraged them to complete more drafts of their answers than they might have otherwise (Riedel, Dexter, Scharber, & Doering, 2006).

At the K-12 level, Vantage Learning (2007) has developed a web-based instructional writing product (MyAccess!) designed for students in grades 4 and higher. Among other features, the software provides automated feedback to students during the essay writing process, as well as upon completion, via its IntelliMetric Essay Scoring System. The software provides both a holistic score and analytical scores in the areas of “Focus and Meaning; Content and Development; Organization; Language, Use and Style; and Mechanics and Conventions” (p. 1). The developer has performed a number of studies indicating that the automated scoring of students’ writing is comparable to scoring provided by expert human raters, although not all independent studies have agreed with their results (e.g., see Brown & Wang, in press). We could find no independent studies on the use of its AES with K-12 students.

Our ultimate interest in AES is its use in a formative manner—to guide the activity and encourage revision based on specific feedback—rather than a summative manner. An AES system may be able to scaffold students’ script writing in our documentary-making tool. The literature includes few studies by independent researchers examining the use of AES as formative feedback with K-12 students, and none we could find in the context of social studies learning or digital documentary creation. Before testing AES with students as they work on authoring digital documentaries, however, we must answer two initial questions:
1. In the context of history education, does an automated essay scoring system provide feedback on student essays that is similar to the feedback provided by a human grader?
2. Does an automated essay scoring system provide feedback on student digital documentary scripts that is similar to the feedback provided by a human grader?

METHOD

To complete this initial test of the feasibility of using AES as a formative feedback and writing scaffold for history documentary scripts, we needed to see whether an automated assessment system could perform as well as human scoring of the student essays and digital documentary scripts. If the assessments are close, it stands to reason that adding an automated capability that assesses students’ scripts could be a powerful tool for formative assessment to improve student engagement and learning outcomes. This is not a criticism of the educational system or an attempt to “teacher-proof” the classroom but an experiment to see if a technological intervention might augment existing classroom relationships.

The data were collected as part of a larger study funded by the Jessie Ball duPont Foundation to test the content-based learning outcomes of students when using the digital documentary tool as compared to their receiving more traditional instruction. The research took place during a single unit of instruction on early 20th century American history. Within this unit, students spent three days exploring the Great Migration and three days exploring the Harlem Renaissance. For each topic, the students spent one day working through activity stations to learn about the topic (e.g., primary source texts and photos of emigrants from the South, videos of Lindy Hop dancers, audio clips of blues and jazz). The students then spent two days making their own account of the time period: either an essay (a traditional format for student writing) or a scripted digital documentary (an emerging medium for history education).

The essays and digital documentary scripts were comparable in terms of word length, writing style, content, and factual exposition. Although a professionally produced documentary narration would look very different from an essay, in our experience K-12 students tend to write their narrations in an essay format because that is the writing form with which they are most familiar. As such, the documentary scripts written by students exhibit many of the same criteria that identify a good five-paragraph essay in terms of basic expository structure, persuasive ability, adherence to conventions of mechanics and grammar, and accurate and germane content.
Participants

Participants were 87 seventh-grade American History students at a public middle school located in a small urban area of a mid-Atlantic state. This student group was racially and ethnically diverse, with approximately equal numbers of boys and girls. The majority of the students were from low- to middle-income socio-economic backgrounds. The participating students represented a wide variety of ability and engagement levels. The students experienced instruction and project work as members of four classes, all taught by two teachers. One participating teacher was a 25-year veteran of the school system, and the other was a novice in her first teaching assignment. Due to the design of learning stations followed by project work, all students, regardless of class or teacher, experienced the same instruction.

Procedure

Over the course of six days—three on the Great Migration and three on the Harlem Renaissance—the students created a total of 144 student-authored documents. Each student experienced both the experimental and the control condition: for the Great Migration portion of the unit, the student created either a digital documentary or a traditional essay. For the following topic, the condition was reversed. (Due to absences and incomplete work, not all 87 students produced both a digital documentary script and an essay.) The final pool of documents for analysis contained 73 essays and 71 digital documentary scripts.

Two former readers for the Advanced Placement (AP) language arts exam scored the students’ documents. The scoring was conducted blind—the raters did not know whether an individual document was an essay or a script. The raters used a standard 6+1 rubric designed for use with middle school students. The 6+1 rubric first asks raters to score each essay in terms of six characteristics, or traits: ideas and content, organization, voice, word choice, sentence fluency, and conventions. Each of the six traits is rated individually with a score from 1 to 5:

1. NOT YET: a bare beginning; writer not yet showing any control.
2. EMERGING: need for revision outweighs strengths.
3. DEVELOPING: strengths and need for revision are about equal.
4. COMPETENT: on balance, the strengths outweigh the weaknesses.
5. STRONG: shows control and skill in this trait; many strengths present.
Following the scoring of the six components, the rubric calls for a holistic score, ranging from 1 to 4, to assess overall quality. Indicators of quality are whether the student addressed the prompt, how sophisticated the writing was, how precisely the facts and arguments were presented and the relevance of those facts to the prompt, and the level of logical thinking in the student’s arguments.

The scorers followed the protocols used in AP exam scoring. First, they worked independently to score the same 20 essays. Next, they compared their results and discussed any divergence to encourage rating agreement. The raters then worked alone, each scoring the entire remaining set of 124 documents, containing both standard essays and scripts. This double-scoring of the entire set is a departure from AP practices, in which essays are typically read by a single reader with only 1 in 60 receiving a second reading (Venkateswaran & Morgan, 2002).

We chose Educational Testing Service’s (ETS) Criterion online essay evaluation service as the comparison scorer for the two human evaluators. The Criterion system, described above, has a long track record of successful use in multiple contexts: college or graduate school admissions,
… The tool is strongly recommended by the literature on AES (cite). The researchers entered each of the 144 documents and then recorded and analyzed the automated response. The AES provides two substantive forms of feedback: a holistic score, ranging from 1 to 4, and verbose responses over five domains: grammar, usage, mechanics, style, and organization. Within each domain, between 6 and 11 categories of potential problems are evaluated. The responses for grammar, usage, and mechanics identify errors such as pronoun and possessive errors, nonstandard word forms, and missing articles, many of which are similarly flagged by grammar-checking tools in programs such as Microsoft Word. The style and organization categories provide additional feedback in areas not usually addressed through automated responses. Students are alerted to stylistic problems such as repeated words, many short sentences, and many long sentences. Students are also provided with non-substantive descriptive statistics, such as the number of sentences and the average number of words per sentence.
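To make the shape of this feedback concrete, it can be pictured as a holistic score plus per-domain lists of flagged issues and a few descriptive statistics. The sketch below is our own illustrative data model under assumed field names; it is not Criterion's actual response format.

# Illustrative data model for the kind of AES feedback described above:
# a 1-4 holistic score, verbose per-domain comments, and simple statistics.
from dataclasses import dataclass, field

@dataclass
class AESFeedback:
    holistic_score: int                                    # 1 (low) to 4 (high)
    grammar: list = field(default_factory=list)            # e.g., subject-verb agreement
    usage: list = field(default_factory=list)              # e.g., nonstandard word forms
    mechanics: list = field(default_factory=list)          # e.g., missing articles
    style: list = field(default_factory=list)              # e.g., repeated words
    organization: list = field(default_factory=list)       # e.g., unclear sequence of ideas
    sentence_count: int = 0
    words_per_sentence: float = 0.0

# Example with hypothetical values:
# fb = AESFeedback(holistic_score=3,
#                  grammar=["possible subject-verb agreement error in sentence 4"],
#                  style=["too many short sentences"],
#                  sentence_count=18, words_per_sentence=14.2)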
RESULTS

The first step was to examine the similarities in scoring of the student history essays. These essays are a traditional format for evaluation by AES. However, the essays used in this case were prepared for the purpose of mastering historical content (i.e., the Great Migration and the Harlem Renaissance), not the demonstration of writing ability. The comparison between human- and computer-generated ratings on students’ essays was encouraging, yielding a .88 Cronbach’s Alpha reliability coefficient and a statistically significant .79 correlation coefficient (p < .01) between the human- and machine-graded holistic scores on the essays. In the context of essays written for the purposes of history education, the AES provided scores very similar to the human evaluators’ (see Table 1).

Table 1
Descriptive Statistics for Traditional Writing Context (Essay)

                       Mean    SD       N
Human-scored essays    2.67    0.987    73
AES-scored essays      3.202   1.1783   73

The second step was to compare the humans’ scores against the AES scores across the non-traditional set of documents, the 71 digital documentary scripts. Again, the two sets of scores showed a tight correspondence (see Table 2). The high Cronbach’s Alpha reliability coefficient (.84) and correlation coefficient (.73, p < .01) again indicate that the computer-generated evaluation closely matches that of humans, even in a format other than a traditional essay.

Table 2
Descriptive Statistics for Non-traditional Writing Context (Digital Documentary Script)

                        Mean    SD       N
Human-scored scripts    2.49    1.040    71
AES-scored scripts      3.169   0.9169   71
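The agreement statistics used here can be reproduced on any pair of score vectors with a few lines of code. The sketch below is a generic illustration (the paper's figures were presumably produced with a standard statistics package); the score arrays passed in are hypothetical stand-ins for the human and AES holistic scores.

# Sketch: agreement between two sets of holistic scores, using Pearson
# correlation and a two-rater Cronbach's alpha. Data are placeholders.
import numpy as np
from scipy.stats import pearsonr

def cronbach_alpha(ratings):
    """Cronbach's alpha for an array of shape (n_documents, n_raters)."""
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)       # per-rater score variance
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of summed ratings
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def agreement(human, aes):
    """Return (Pearson r, p-value, Cronbach's alpha) for two score vectors."""
    r, p = pearsonr(human, aes)
    alpha = cronbach_alpha(np.column_stack([human, aes]))
    return r, p, alpha

# Usage with hypothetical 1-4 holistic scores for a handful of documents:
# r, p, alpha = agreement(np.array([2, 3, 4, 2, 3]), np.array([3, 3, 4, 2, 3]))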
The next step was to more closely examine the relationship between the trait scores (i.e., scores for ideas and content, organization, voice, word choice, sentence fluency, and conventions) and the holistic scores from both the human raters and the AES. For the human raters, the holistic score varied directly with their scoring of the six traits, F(6,143) = 173.7, p < .001. The same relationship existed between the human-generated trait scores and the AES’s holistic score, F(6,143) = 35.71, p < .001. This correspondence suggests that the holistic scores (whether human-generated or computer-generated) and the scores on the six individual traits were measuring analogous internal constructs (see Tables 3-4).

Table 3
ANOVA of Human Scorers’ 6 Traits and Holistic Scores

             Sum of Squares   df    Mean Square   F       Sig.
Regression   129.922          6     21.654        173.7   .000
Residual     17.078           137   .125
Total        147.000          143

Table 4
ANOVA of Human Scorers’ 6 Traits and Automated Holistic Scores

             Sum of Squares   df    Mean Square   F       Sig.
Regression   96.889           6     16.148        35.71   .000
Residual     61.955           137   .452
Total        158.843          143

During our analysis, we observed a relationship between the length of a document (i.e., the word count of the essay or digital documentary script) and the holistic scoring. For both the human and the automated scoring, there was a statistically significant correlation between the number of words in a document and its holistic score: a .67 correlation with the human-generated holistic scores and a .81 correlation with the AES-generated holistic scores. While some correspondence between length and quality is mathematically probable (i.e., a more fully developed essay will tend to have more words than a less well-developed essay), the gap between the human and computer-generated correlation coefficients raised a concern: a student might be able to “game” the automated assessment by writing a longer essay or script and thus obtaining a higher score. This possibility directed our attention to the verbose feedback provided by the AES along with its holistic score.
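Both analyses in this passage, the trait-to-holistic regression summarized in Tables 3 and 4 and the length-score correlation, can be stated compactly. The sketch below is a generic restatement under hypothetical arrays; it is not the study's actual analysis script or data.

# Sketch of the two analyses described above:
# (1) regress holistic scores on the six trait scores and report the overall
#     F test, as in Tables 3-4;
# (2) correlate document word counts with holistic scores.
import numpy as np
import statsmodels.api as sm
from scipy.stats import pearsonr

def traits_vs_holistic(trait_scores, holistic):
    """trait_scores: array of shape (n_docs, 6); holistic: shape (n_docs,)."""
    X = sm.add_constant(np.asarray(trait_scores))   # intercept + six trait predictors
    fit = sm.OLS(np.asarray(holistic), X).fit()
    return fit.fvalue, fit.f_pvalue                 # overall F and its p-value

def length_correlation(word_counts, holistic):
    """Pearson correlation between document length and holistic score."""
    return pearsonr(word_counts, holistic)

# Usage (hypothetical): F, p = traits_vs_holistic(traits, human_holistic)
#                       r, p = length_correlation(word_counts, aes_holistic)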
To explore the quality of the verbose feedback provided by the AES, we compared the system’s comments to the students’ scripts to see whether these comments were meaningful to the reader. Most of these comments were accurate but phrased in very generic terms. For example, the response for a “good” essay (scoring 3 out of 4 possible points) included the statement that the essay “provides a clear sequence of information; provides pieces of information that are generally related to each other.” This statement was correct but did not provide guidance for further revision by the student.

We then searched for specific instances in which the automated feedback represented a misunderstanding of the writing, offering feedback that no competent human reviewer would make. Across the 144 sets of responses, we identified fewer than 10 examples of these errors, all grammatical. For example, the following sentence was flagged as containing a subject-verb agreement error: “Throughout the 20th Century, the segregation of blacks and whites was abolished.” In this case, the AES read “whites” as the subject; the subject is actually “segregation.” Earlier in the same essay, a sentence that begins, “In the early 1900’s” was also flagged as having an extraneous article (“the”), when it is in fact required.

DISCUSSION

The high correlation between the automated and human scores in both sets of documents (essays and scripts) and the overall high quality of the feedback suggest that adding an option for students to submit their writing for automated feedback could be a useful formative assessment tool, even in the context of history education and in a non-traditional format such as a digital documentary. Given an AES module integrated into PrimaryAccess, students will be able to access an instant, consequence-free first round of feedback on the style, mechanics, and structure of their scripts. This feedback can lead to improved student engagement and multiple revisions of scripts, resulting in higher quality end products and increased student learning.

The results, however, underscored the significance of students receiving human feedback and not just computer-generated evaluations. As noted, some of the students’ human touches in their writing eluded the AES programmers’ heuristics. In our raters’ opinion, the false-flag “errors” were departures from convention that improved the quality of the document. An improved AES can reduce the number of instances of such errors, but they cannot be wholly eliminated. Additionally, substantive feedback about the
content of the scripts—the accurate portrayal of historical facts and not merely their expression—will still need to be provided by the teacher. For example, the student statement that “Throughout the 20th Century, the segregation of blacks and whites was abolished” is grammatically correct, but the historical understanding can be improved: 20th century desegregation was not a unified, completed process but rather an on-going mix of policy decisions (Executive Order 9981, 1948), legal actions (e.g., Brown v. Board of Education of Topeka, 1954), and personal choices (James Meredith’s decision to apply to the University of Mississippi, 1961). An AES that could provide this level of content-specific feedback in the social studies is both theoretically and practically impossible; a teacher will have to make the judgment call as to which nuances to introduce to the student’s thinking. However, any automated assistance to the student regarding his or her writing should give the teacher greater latitude to focus on students’ content understandings and thought processes.

This study faces several limitations. First, this was a relatively small-scale study with only two human raters following an approved protocol. A larger pool of documents and additional human raters would strengthen the interpretability of the quantitative analysis. Second, the participants were middle school history students; the results do not generalize to other groups or other uses of the AES, especially not to high-stakes assessments such as the SATs or end-of-year, summative assessments of student achievement. Finally, the AES used was designed to grade essays, and the human graders in this study were trained experts in grading essays written by high-school-level students taking Advanced Placement exams. If teachers had the time and inclination to teach students the fine points of documentary making, their scripts might bear little resemblance to essays.

FUTURE RESEARCH

The ETS Criterion system appeared capable of delivering high quality contextual feedback on the essays, but more research needs to be done to provide the content-area knowledge required for these digital documentaries and other forms of writing in the social studies. What value does the scoring provide to the teaching and learning of history? Could the automated scoring process inhibit or standardize students’ writing? What interaction effects exist between automated scoring and different teachers’ teaching styles or levels of expertise?

While this study looked to confirm the reliability of automated scoring as compared to human scoring, future studies will investigate the value of automated formative assessment to students and their learning outcomes. Future research opportunities could include testing the use of automated assessment in classrooms to compare its efficacy against no feedback or limited teacher feedback and looking at differences in the number of revisions, time on task, and engagement. These differences can be correlated with the quality of the students’ final products and/or changes on pre/post assessments of writing or content knowledge. Furthermore, the handling of false flags, such as the example cited above, needs to be more fully explored.

For an AES to be effective, as Shute (1994) noted, “the system [must] behave intelligently, not actually be intelligent, like a human being” (p. 50). Most people have had the experience of mistyping a word while entering a Google search—school deform, for example—and having Google’s web application return the message, “Did you mean: school reform?” The original search term is not immediately identifiable as erroneous: the words were not misspelled, and deform is a verb that can take that subject. However, the Google database does know that most people who typed in school deform ultimately searched for school reform. By drawing on the large numbers of people who use its search engine and some shrewd programming decisions, Google has been able to make its system appear more intelligent.

Because all the PrimaryAccess web applications are instrumented to create a time-stamped log of a student’s activity in writing and creating a project, it may be possible that some of these student projects can be assessed without needing to actually test the student to determine what they know. If enough information about how these projects were created can be captured and then compared with a large enough number of projects that have been similarly instrumented and also human-scored, this may offer another assessment option to combine with automated essay scoring feedback.
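As a concrete picture of what such instrumentation might feed into, the sketch below logs time-stamped editing events, reduces a student's log to simple process features (revision count, time on task), and fits a regression against human scores. This is our own illustration of the idea under assumed event names; it is not a description of the actual PrimaryAccess logging schema or any planned feature.

# Illustrative sketch: process features mined from a time-stamped activity
# log, fit against human scores. Event names and features are assumptions.
from dataclasses import dataclass
import numpy as np
from sklearn.linear_model import LinearRegression

@dataclass
class Event:
    timestamp: float      # seconds since project start
    kind: str             # e.g., "save_script", "add_image", "record_audio"

def process_features(log):
    saves = [e for e in log if e.kind == "save_script"]               # script revisions
    stamps = [e.timestamp for e in log]
    time_on_task = (max(stamps) - min(stamps)) if stamps else 0.0     # rough time on task
    return [len(saves), len(log), time_on_task]

def train_process_model(logs, human_scores):
    X = np.array([process_features(log) for log in logs])
    return LinearRegression().fit(X, np.array(human_scores))

# A trained model could then estimate a score for a new project from its log
# alone, complementing (not replacing) text-based AES feedback and teacher review.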
References

Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater v.2. Journal of Technology, Learning, and Assessment, 4(3), n.p. Retrieved November 13, 2009 from http://escholarship.bc.edu/cgi/viewcontent.cgi?article=1049&context=jtla
Beyer, B.K. (1979). Pre-writing and re-writing to learn. Social Education, 43(3), 187-189, 197.
Beyer, B.K., & Brostoff, A. (1979a). Writing to learn in Social Studies: Introduction. Social Education, 43(3), 176-177.
Beyer, B.K., & Brostoff, A. (1979b). The time it takes: Managing/evaluating writing and Social Studies. Social Education, 43(3), 194-197.
Bransford, J., Brown, A., & Cocking, R. (2000). How people learn: Brain, mind, experience, and school. Committee on Learning Research and Educational Practice, National Research Council. Washington, DC: National Academy Press.
Brown, M. S., & Wang, J. (in press). Automated essay scoring versus human scoring: A correlational study. Contemporary Issues in Technology and Teacher Education.
Bruner, J. (1966). Toward a theory of instruction. New York: Norton.
Burstein, J. (2003). The e-rater scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 113-121). Mahwah, NJ: Lawrence Erlbaum Associates.
Chapman, O. (2001). Calibrated Peer Review: A writing and critical thinking instructional tool. Retrieved November 13, 2009 from http://cpr.molsci.ucla.edu/cpr/resources/documents/misc/CPR_White_Paper.pdf
Faigley, L., Cherry, R.D., Jolliffe, D.A., & Skinner, A.M. (1985). Assessing writers' knowledge and processes of composing. Norwood, NJ: Ablex.
Frase, L.T., Kiefer, K.E., Smith, C.R., & Fox, M.L. (1985). Theory and practice in computer-aided composition. In S.W. Freedman (Ed.), The acquisition of written language: Response and revision (pp. 195-210). Norwood, NJ: Ablex.
Gagné, R. (1970). The conditions of learning. New York: Holt, Rinehart and Winston.
Gilstrap, R.L. (1991). Writing for the social studies. In J.P. Shaver (Ed.), Handbook of research on Social Studies teaching and learning (pp. 578-587). New York, NY: Macmillan.
Greene, S. (1994). Students as authors in the study of history. In G. Leinhardt, I. Beck, & K. Stainton (Eds.), Teaching and learning in history (pp. 133-168). Hillsdale, NJ: Lawrence Erlbaum.
Gruber, K., Wiley, S., Broughman, S. P., Strizek, G. A., & Burian-Fitzgerald, M. (2002). Schools and staffing survey, 1999-2000: Overview of the data for public, private, public charter, and Bureau of Indian Affairs elementary and secondary schools (Report No. NCES 2002313). Washington, DC: National Center for Education Statistics.
Hearst, M. (2000). The debate on automated essay grading. IEEE Intelligent Systems, 15(5), 22-37.
Ikpeze, C. (2009, May). Writing for real purpose. Learning and Leading with Technology, 36, 36-37.
Jolliffe, D.A. (1987). A social educator's guide to teaching writing. Theory and Research in Social Education, 15(2), 89-104.
Kajder, S. (2007). Plugging in to 21st century writers. In T. Newkirk & R. Kent (Eds.), Teaching the neglected "R": Rethinking writing instruction in secondary classrooms (pp. 149-161). Portsmouth, NH: Heinemann.
Kukich, K. (2000). Beyond automated essay scoring. IEEE Intelligent Systems, 15(5), 22-27.
Lenhart, A., Arafeh, S., Smith, A., & Macgill, A. (2008). Writing, technology, and teens. Washington, DC: Pew Internet & American Life Project.
Mager, R. (1997). Making instruction work. Atlanta, GA: Center for Effective Performance.
Mory, E. H. (2004). Feedback research revisited. In D. H. Jonassen (Ed.), Handbook of research on educational communications and technology (pp. 745-783). Mahwah, NJ: Lawrence Erlbaum.
Nash, G. B., Crabtree, C., & Dunn, R. E. (2000). History on trial: Culture wars and the teaching of the past. New York: Vintage Books.
Nelms, B.F. (1987). Response and responsibility: Reading, writing, and Social Studies. The Elementary School Journal, 87(5), 571-589.
Nelson, J. (1990). This was an easy assignment: Examining how students interpret academic writing tasks. Research in the Teaching of English, 24, 362-396.
Olina, Z., & Sullivan, H.J. (2002). Effects of classroom evaluation on student achievement and attitudes. Educational Technology Research & Development, 50(3), 61-75.
Pajares, F. (2003). Self-efficacy beliefs, motivation, and achievement in writing: A review of the literature. Reading and Writing Quarterly, 19, 139-158.
Riedel, E., Dexter, S., Scharber, C., & Doering, A. (2006). Experimental evidence on the effectiveness of automated essay scoring in teacher education cases. Journal of Educational Computing Research, 35(3), 267-287.
Risinger, C.F. (1987). Improving writing skills through social studies (ERIC Digest No. 40). Bloomington, IN: ERIC Clearinghouse for Social Studies/Social Science Education. (ERIC Document Reproduction Service No. ED 285829)
Risinger, C.F. (1992). Current directions in K-12 Social Studies. Boston, MA: Houghton Mifflin Co. (ERIC Document Reproduction Service No. ED 359130)
Scharber, C., & Dexter, S. (2004, March). Automated essay score predictions as a formative assessment tool. Paper presented at the 15th international conference of the Society for Information Technology and Teacher Education, Atlanta, GA.
Shute, V.J. (1994). Regarding the I in ITS: Student modeling. In T. Ottmann & I. Tomek (Eds.), Proceedings of Educational Multimedia and Hypermedia 94 (pp. 50-57). Charlottesville, VA: Association for the Advancement of Computing in Education.
Skinner, B.F. (1958). Teaching machines. Science, 128(3300), 969-977.
Smith, J., & Niemi, R. (2001). Learning history in school: The impact of course work and instructional practices on achievement. Theory and Research in Social Education, 29(1), 18-42.
Sundberg, S.B. (2006). An investigation of the effects of exam essay questions on student learning in United States History survey classes. The History Teacher, 40(1). Retrieved November 13, 2009 from http://www.historycooperative.org/cgi-bin/cite.cgi?=ht/40.1/sundberg.html
Valenti, S., Neri, F., & Cucchiarelli, A. (2003). An overview of current research on automated essay grading. Journal of Information Technology Education, 2, 319-330.
Van Nostrand, A.D. (1979). Writing and the generation of knowledge. Social Education, 43(3), 178-180.
Venkateswaran, U., & Morgan, R. (2002). Assessing historical thinking skills: Scoring the AP U.S. History Document-Based Question. Organization of American Historians Newsletter. Retrieved November 13, 2009 from http://www.oah.org/pubs/nl/nov02/ets.html
Wang, J., & Brown, M. (2007). Automated essay scoring versus human scoring: A comparative study. Journal of Technology, Learning, and Assessment, 6(2). Retrieved November 13, 2009 from http://escholarship.bc.edu/cgi/viewcontent.cgi?article=1100&context=jtla