Journal of Interactive Learning Research (2012) 23(1), ?-?
Automated Formative Assessment as a
Tool to Scaffold Student Documentary Writing
BILL FERSTER
University of Virginia
bferster@virginia.edu
THOMAS C. HAMMOND
Lehigh University
hammond@lehigh.edu
R. CURBY ALEXANDER
University of North Texas
curbyalexander@gmail.com
HUNT LYMAN
The Hill School
huntlyman@thehillschool.org
The hurried pace of the modern classroom does not permit
formative feedback on writing assignments at the frequency
or quality recommended by the research literature. One so-
lution for increasing individual feedback to students is to in-
corporate some form of computer-generated assessment. This
study explores the use of automated assessment of student
writing in a content-specific context (history) on both tradi-
tional and non-traditional tasks. Four classrooms of middle
school history students completed two projects, one cul-
minating in an essay and one culminating in a digital docu-
mentary. From the total set of completed projects, approxi-
mately 70 essays and 70 digital documentary scripts were
then scored by human raters and by an automated evaluation
system. The student essays were used to test the comparison
of human and computer-generated feedback in the context of
history education, and the digital documentary scripts were
used to test feedback given on a non-traditional task. The
results were encouraging, with very high correlation and reliability coefficients within and across both sets of documents, sug-
gesting the possibility of new forms of formative assessment
of student writing for content-area instruction in a variety of
emerging formats.
Keywords: Automated formative assessment, writing, history educa-
tion, digital documentaries
Among the many possible strategies for social studies instruction,
writing-intensive activities stand out as a promising but challenging teach-
ing tool. On the one hand, student writing is a powerful mechanism for im-
proving student learning outcomes in social studies (Greene, 1994; Nelms,
1987; Risinger, 1987, 1992; Smith & Niemi, 2001; Sundberg, 2006; Van
Nostrand, 1979). On the other hand, implementing effective student writing
tasks is difficult. Writing tasks are time-consuming, especially when mea-
sured against an already crowded social studies curriculum (Beyer & Brost-
off, 1979b; Nash, Crabtree, & Dunn, 2000). Social studies teachers typically
receive very little instruction in scaffolding students' writing and providing
effective feedback (Jolliffe, 1987). Furthermore, some students are reluctant
writers, approaching any act of writing, and particularly writing-for-assessment, with anxiety or even dread (Pajares, 2003). Organizing their ideas
or even the act of getting started can be overwhelming (Beyer & Brostoff,
1979a). The use of writing in social studies education deserves continued
scrutiny, and any new strategies must address these existing barriers.
A promising point of focus for exploring student writing in social stud-
ies is the use of formative feedback to the writer. Cognitive scientists and
educators have demonstrated that rapid and appropriate feedback on student
projects has a strong positive effect on the quality of student work (Mory,
2004). Formative feedback can encourage and guide struggling writers, refine
students' content mastery, and develop social studies skills (Beyer,
1979; Nelson, 1990; Olina & Sullivan, 2002). As an instructional best prac-
tice, therefore, social studies teachers should provide students with forma-
tive feedback at several stages in the composition process.
Unfortunately, the majority of our students live in a world where teach-
ers have up to 5 sections with an average of over 23 pupils in each (Gruber,
Broughman, Strizek, & Burian-Fitzgerald, 2002). Combining the realities
can integrate digital documentary projects into their instruction to develop
students' content knowledge, historical thinking skills, and expression skills
(Author, 2009).
BACKGROUND
To provide the context for this study, three areas will be examined: (1)
the tool that provides the framework and context for exploring automated
formative assessment, (2) the role that feedback has in effective student
learning, and (3) the nature and efficacy of automated essay scoring re-
search efforts.
Context: Online Digital Documentary Tool (PrimaryAccess)
This study explored the feasibility of integrating automated assessment
into PrimaryAccess (www.primaryaccess.org), a suite of free, web-based
applications that allows teachers to draw upon thousands of indexed histori-
cal images to create customized activities for their students (Author, 2006
& 2008). The most common use of PrimaryAccess is the creation of digi-
tal documentaries. The images used are typically online archival resources,
such as photographs, paintings, engravings, maps, and documents from
sites such as the Virginia Center for Digital History. (See Figure 1.) How-
ever, teachers and students can incorporate any online images, including
their own work. The narration that accompanies the image stream is based
on a student-authored script (Figure 1b). These scripts share many of the
same characteristics as traditional essays in terms of their expository nature,
length, and internal structure.
Figure 1. Steps involved in creating a digital documentary with PrimaryAccess: (a) select resources, (b) write script, (c) set motion, (d) show movie. (Image source: National Archives and Records Administration)
The script-composition process in PrimaryAccess takes place in a sim-
ple text editor. Students can save iterative versions of the script, often re-
vising and expanding them as prompted by teacher feedback, delivered either asynchronously, in the form of text or audio notes, or synchronously, as in-class
discussions (Author, 2007). The script becomes the basis for the visual production stages: students annotate the script by adding primary source images and then set these images in motion to create the documentary's visual sequence. A voice-over narration, recorded with a built-in audio editor,
completes the documentary-making process. This sequence of iterative refinement of text, visual arrangement, and narration reinforces the concept
of writing as process, not product, to improve student learning and performance outcomes, as suggested by the research on student writing (e.g.,
Faigley, Cherry, Jolliffe, & Skinner, 1985).
Writing these scripts is therefore a critical step in the process. However,
during our field testing, we have observed, and participating teachers have
confirmed, that the writing is typically the students' least favorite element
of digital documentary-making as compared to image selection and editing
(Author, 2009). Researchers across multiple institutions are exploring ways
to scaffold the writing process, but one possible support is to provide some
formative assessment in the form of automated feedback during the script
writing stage.
systems can even offer precise feedback about what to change to improve
the essay.
Automated essay scoring (AES) was pioneered by Ellis Page, who developed
the Project Essay Grader (PEG) in the mid-1960s. PEG applied statistical
techniques such as multiple linear regression to essays and considered such
factors as essay length, number of commas, prepositions, and uncommon
words in a weighted model of what he thought approximated the internal
structures used by human raters. Page found high (.78) correlations be-
tween the PEG system and human raters of the same essays, compared to a
.85 correlation between any two human scorers (Kukich, 2000).
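To make this concrete, the sketch below (ours, not Page's actual model) computes a few PEG-style surface features from an essay's text; the word lists and feature choices are illustrative placeholders only.

```python
import re

# Illustrative only: a few of the surface features PEG weighted (length,
# commas, prepositions, uncommon words). The word lists are placeholders.
PREPOSITIONS = {"in", "on", "at", "by", "of", "for", "with", "from", "to"}
COMMON_WORDS = {"the", "a", "an", "and", "or", "but", "is", "was", "were",
                "in", "on", "of", "to", "it", "that", "this"}

def surface_features(essay: str) -> dict:
    """Extract PEG-style surface features from an essay's text."""
    words = re.findall(r"[A-Za-z']+", essay.lower())
    return {
        "length": len(words),                      # raw word count
        "commas": essay.count(","),
        "prepositions": sum(w in PREPOSITIONS for w in words),
        "uncommon_words": sum(w not in COMMON_WORDS for w in words),
    }

print(surface_features("The Great Migration, which began around 1916, "
                       "reshaped cities in the North."))
```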
The next 30 years saw vigorous research into the automatic scoring
of essays using a wide range of mathematical techniques and essay features,
including Bayesian Inference, Latent Semantic Analysis, Neural Networks, and others. Although these systems use a variety of computa-
tional modeling approaches, the overall mechanisms are similar. Typically,
hundreds of exemplar essays are hand-scored by human raters. This scor-
ing is put through rigorous inter-rater reliability testing to ensure the accu-
racy of the human ratings. The essays, with scores reflecting the full range
of possible quality levels, are entered into the AES system to train it on the
essay topic. Once trained, the system applies its internal model to estimate,
in a matter of seconds, the score an arbitrary essay written on the same topic would receive.
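The train-then-score workflow described above can be sketched as an ordinary least-squares fit over surface features such as those in the previous example; the essays, scores, and feature set here are hypothetical, and production systems use far richer models.

```python
import numpy as np

def fit_weights(feature_rows, human_scores):
    """Least-squares fit of a weighted feature model to hand-scored exemplars."""
    names = sorted(feature_rows[0])
    X = np.array([[row[n] for n in names] for row in feature_rows], dtype=float)
    X = np.column_stack([np.ones(len(X)), X])        # intercept term
    y = np.array(human_scores, dtype=float)
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    return names, weights

def predict_score(features, names, weights):
    """Apply the fitted model to a new essay's features."""
    x = np.array([1.0] + [float(features[n]) for n in names])
    return float(x @ weights)

# Hypothetical usage: `training_set` holds hundreds of (essay text, human score)
# pairs; `surface_features` is the feature extractor sketched earlier.
# names, weights = fit_weights([surface_features(e) for e, _ in training_set],
#                              [s for _, s in training_set])
# print(predict_score(surface_features(new_essay), names, weights))
```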
The Educational Testing Service (ETS) began experimenting with
natural-language-processing and information retrieval techniques in the
1990s to provide automated scoring of essays within the Analytical Writing Assessment portion of the Graduate Management Admission Test (GMAT). Their e-rater system used a step-wise linear regression of
over 100 essay features to provide a high degree of agreement with human
raters (Wang & Brown, 2007). Valenti, Neri, and Cucchiarelli (2003) compared the performance of ten AES systems in terms of (a)
accuracy of scoring, (b) multiple regression correlation, and (c) agreement
with human scoring. The systems performed at levels between .80 and .96
on these three measures, and the ETS e-rater system yielded 87-94% agreement with human-scored essays. These correlations are comparable to
those researchers would expect among essays scored by two or more human
scorers (Wang & Brown, 2007).
The effectiveness of AES systems in relation to human raters is well
documented in the literature. A number of studies have cited very high cor-
relations between AES and human scoring, typically with an 85-90% agree-
ment (Attali & Burstein, 2006; Burstein, 2003; Hearst, 2000). Most studies
were performed using essays from the GMAT exams, expository language
arts essays, or science assessments (Valenti, Neri, & Cucchiarelli, 2003).
There is little research on AES in the contexts of social studies instruction
and/or non-traditional writing formats.
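Agreement figures such as those above are typically reported as the percentage of documents on which two sets of scores match exactly or within one point; a minimal sketch of that calculation, using hypothetical scores, is shown below.

```python
def agreement(scores_a, scores_b, tolerance=0):
    """Percentage of documents on which two raters agree within `tolerance` points."""
    matches = sum(abs(a - b) <= tolerance for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)

# Hypothetical holistic scores on a 1-4 scale (not the study's data).
human = [3, 2, 4, 3, 1, 2, 3, 4]
aes   = [3, 3, 4, 2, 1, 2, 4, 4]
print(agreement(human, aes))               # exact agreement (%)
print(agreement(human, aes, tolerance=1))  # exact-plus-adjacent agreement (%)
```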
To be valuable, feedback must be contextualized: a score alone, without specific details of what produced it, can frustrate learners. Researchers who devel-
oped a set of Web-based case studies for preparing teachers to use technol-
ogy (ETIPS; see http://www.etips.info) added AES to provide formative
feedback on the decision essays composed by preservice teachers at the cul-
mination of their case studies. An initial study of 27 preservice teachers us-
ing the AES found that the nature of the feedback was not sophisticated or
detailed enough to guide students in improving their writing (Scharber &
Dexter, 2004). After revising the feedback, researchers studied 70 preservice
teachers and found a moderate impact on the quality of the essays. Sixty-three percent reported in a survey that the AES encouraged them to complete more drafts of their answers than they might have otherwise (Riedel,
Dexter, Scharber & Doering, 2006).
At the K-12 level, Vantage Learning (2007) has developed a web-based
instructional writing product (MyAccess!) designed for students in grades
4 and higher. Among other features, the software provides automated feed-
back to students during the essay writing process, as well as upon comple-
tion, via its IntelliMetric Essay Scoring System. The software provides both
a holistic score and analytical scores in the areas of "Focus and Meaning;
Content and Development; Organization; Language, Use and Style; and
Mechanics and Conventions" (p. 1). The developer has performed a number of studies indicating that the automated scoring of students' writing is
comparable to scoring provided by expert human raters, although not all in-
dependent studies have agreed with their results (e.g., see Brown & Wang,
in press). We could find no independent studies on the use of its AES with
K-12 students.
Our ultimate interest in AES is its use in a formative manner, to guide
the activity and encourage revision based on specific feedback, rather than
a summative manner. An AES system may be able to scaffold students'
script writing in our documentary-making tool. The literature includes few
studies by independent researchers examining the use of AES as formative
feedback with K-12 students, and none we could find in the context of so-
cial studies learning or digital documentary creation. Before testing AES
with students as they work on authoring digital documentaries, however, we
must answer two initial questions:
1. In the context of history education, does an automated essay
scoring system provide feedback on student essays that is
similar to the feedback provided by a human grader?
2. Does an automated essay scoring system provide feedback
on student digital documentary scripts that is similar to the
feedback provided by a human grader?
METHOD
To complete this initial test of the feasibility of using AES as a for-
mative feedback and writing scaffold for history documentary scripts, we
needed to see if an automated assessment system could perform as well as
human scoring of the student essays and digital documentary scripts. If the
assessments are close, it stands to reason that adding an automated capa-
bility that assesses students' scripts could be a powerful tool for formative
assessment to improve student engagement and learning outcomes. This is
not a criticism of the educational system or an attempt to "teacher-proof"
the classroom but an experiment to see if a technological intervention might
augment existing classroom relationships.
The data were collected as part of a larger study funded by the Jes-
sie Ball duPont Foundation to test the content-based learning outcomes of
students when using the digital documentary tool as compared to their re-
ceiving more traditional instruction. The research took place during a single unit of instruction on early 20th-century American history. Within this
unit, students spent three days exploring the Great Migration and three days
exploring the Harlem Renaissance. For each topic, the students spent one
day working through activity stations to learn about the topic (e.g., primary
source texts and photos of emigrants from the South, videos of Lindy Hop
dancers, audio clips of blues and jazz). The students then spent two days
making their own account about the time period: either an essay (a tradition-
al format for student writing) or a scripted digital documentary (an emerg-
ing medium for history education).
The essays and digital documentary scripts were comparable in terms
of word length, writing style, content, and factual exposition. Although a
professionally-produced documentary narration would look very different
from an essay, in our experience K-12 students tend to write their narrations
in an essay format because that is the writing form with which they are most
familiar. As such, the documentary scripts written by students exhibit many
of the same characteristics that mark a good five-paragraph essay: basic expository structure, persuasive ability, adherence to conventions of mechanics and grammar, and accurate and germane content.
Participants
Participants were 87 seventh-grade American History students at a pub-
lic middle school located in a small urban area of a mid-Atlantic state. This
student group was racially and ethnically diverse, with approximately equal
numbers of boys and girls. The majority of the students were from low- to
middle-income socio-economic status. The participating students repre-
sented a wide variety of ability and engagement levels. The students experi-
enced instruction and project work as members of four classes, all taught by
two teachers. One participating teacher was a 25-year veteran of the school
system, and the other was a novice in her first teaching assignment. Due to
the design of learning stations followed by project work, all students, re-
gardless of class or teacher, experienced the same instruction.
Procedure
Over the course of six days (three on the Great Migration and three
on the Harlem Renaissance), the students created a total of 144 student-
authored documents. Each student experienced both the experimental and
the control condition: on the Great Migration portion of the unit, the student
created either a digital documentary or a traditional essay. For the following
topic, the condition was reversed. (Due to absences and incomplete work,
not all 87 students produced both a digital documentary script and an essay.)
The final pool of documents for analysis contained 73 essays and 71 digital
documentary scripts.
Two former readers for the Advanced Placement (AP) language arts
exam scored the students' documents. The scoring was conducted blind:
the raters did not know whether an individual document was an essay or a
script. The raters used a standard 6+1 rubric designed for use with middle
school students. The 6+1 rubric first asks raters to score each essay in terms
of six characteristics, or traits: ideas and content, organization, voice, word
choice, sentence fluency, and conventions. Each of the six factors is rated
individually with a score from 1 to 5.
1. NOT YET: a bare beginning; writer not yet showing any control.
2. EMERGING: need for revision outweighs strengths.
3. DEVELOPING: strengths and need for revision are about equal.
4. COMPETENT: on balance, the strengths outweigh the weaknesses.
5. STRONG: shows control and skill in this trait; many strengths
present.
Following the scoring of the six components, the rubric calls for a ho-
listic score, ranging from 1 to 4, to assess overall quality. Indicators of qual-
ity are whether the student addressed the prompt, how sophisticated the
writing was, how precisely the facts and arguments were presented and the
relevance of those facts to the prompt, and the level of logical thinking in
the student's arguments.
The scorers followed the protocols used in AP exam scoring. First, they
worked independently to score the same 20 essays. Next, they compared
their results and discussed any divergence to encourage rating agreement.
The raters then worked alone, each scoring the entire remaining set of 124
documents, containing both standard essays and scripts. This double-scoring
of the entire set is a departure from AP practices, in which essays are typi-
cally read by a single reader with only 1 in 60 receiving a second reading
(Venkateswaran & Morgan, 2002).
We chose Educational Testing Service's (ETS) Criterion™ online essay
evaluation service as the comparison scorer for the two human evaluators.
The Criterion system, described above, has a long track record of successful use in multiple contexts: college or graduate school admissions, …. The
tool is strongly recommended by the literature on AES (cite). The researchers entered each of the 144 documents and then recorded and analyzed the auto-
mated response. The AES provides two substantive forms of feedback: a ho-
listic score, ranging from 1 to 4, and verbose responses over five domains:
grammar, usage, mechanics, style, and organization. Within each domain,
between 6 and 11 categories of potential problems are evaluated. The re-
sponses for grammar, usage, and mechanics identify errors such as pronoun
and possessive errors, nonstandard word forms, and missing articles, many
of which are similarly flagged by the grammar-checking tools in programs
such as Microsoft Word. The style and organization categories provide ad-
ditional feedback in areas not usually addressed through automated respons-
es. Students are alerted to stylistic problems such as repeated words, many
short sentences, and many long sentences. Students are also provided with
non-substantive descriptive statistics, such as the number of sentences and
average number of words per sentence.
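The shape of this feedback can be illustrated with a simple record combining the holistic score, the verbose domain comments, and the descriptive statistics; the field names below are ours, not Criterion's actual output schema.

```python
from dataclasses import dataclass, field

@dataclass
class AutomatedFeedback:
    """Hypothetical container mirroring the feedback categories described above."""
    holistic_score: int                                   # 1-4 overall rating
    domain_comments: dict = field(default_factory=dict)   # five verbose domains
    sentence_count: int = 0                               # descriptive statistics
    avg_words_per_sentence: float = 0.0

feedback = AutomatedFeedback(
    holistic_score=3,
    domain_comments={
        "grammar": ["possible subject-verb agreement error in sentence 4"],
        "usage": ["missing article in sentence 2"],
        "mechanics": [],
        "style": ["several consecutive short sentences"],
        "organization": ["provides a clear sequence of information"],
    },
    sentence_count=18,
    avg_words_per_sentence=14.2,
)
print(feedback.holistic_score, len(feedback.domain_comments))
```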
RESULTS
The first step was to examine the similarities in scoring of the student
history essays. These essays are a traditional format for evaluation by AES.
However, the essays used in this case were prepared for the purpose of mas-
tering historical content (i.e., the Great Migration and the Harlem Renais-
sance), not the demonstration of writing ability. The comparison between
human- and computer-generated ratings on students' essays was encouraging, yielding a .88 Cronbach's alpha reliability coefficient and a statistically significant .79 correlation coefficient (p < .01) between the human- and
machine-graded holistic scores on the essays. In the context of essays written for the purposes of history education, the AES provided scores very similar to the human evaluators' (see Table 1).
Table 1
Descriptive Statistics for Traditional Writing Context (Essay)

                      Mean    SD      N
Human-scored essays   2.67    0.987   73
AES-scored essays     3.202   1.1783  73
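For readers who wish to reproduce this kind of comparison on their own data, the sketch below computes a Pearson correlation and a two-rater Cronbach's alpha; the score vectors are hypothetical placeholders, not the study's data.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two score vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def cronbach_alpha(ratings):
    """Cronbach's alpha for a (documents x raters) score matrix."""
    k = ratings.shape[1]
    item_variances = ratings.var(axis=0, ddof=1).sum()
    total_variance = ratings.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical placeholder scores (not the study's data).
human = np.array([2, 3, 4, 2, 3, 1, 4, 3], dtype=float)
aes   = np.array([3, 3, 4, 2, 4, 2, 4, 3], dtype=float)
print(pearson_r(human, aes))
print(cronbach_alpha(np.column_stack([human, aes])))
```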
The second step was to compare the humansâ scores against the AES
scores across the non-traditional set of documents, the 71 digital documen-
tary scripts. Again, the two sets of scores showed a tight correspondence
(see Table 2). The high Cronbach's alpha reliability coefficient (.84) and
correlation coefficient (.73, p < .01) again indicate that the computer-gener-
ated evaluation closely matches that of humans, even in a format other than
a traditional essay.
Table 2
Descriptive Statistics for Non-traditional Writing Context (Digital Documentary Script)

                       Mean    SD      N
Human-scored scripts   2.49    1.040   71
AES-scored scripts     3.169   0.9169  71
The next step was to more closely examine the relationship between
the traits scores (i.e., scores for ideas and content, organization, voice, word
choice, sentence fluency, and conventions) and the holistic scores from both
the human raters and the AES. For the human raters, their holistic score varied directly with their scoring of the 6 traits, F(6, 137) = 173.7, p < .001. The
same relationship existed between the human-generated traits scores and the
AES's holistic score, F(6, 137) = 35.71, p < .001. This correspondence sug-
gests that the holistic scores (whether human-generated or computer-gen-
erated) and the scores of the 6 individual traits were measuring analogous
internal constructs (see Tables 3-4).
Table 3
ANOVA of Human Scorers' 6 Traits and Holistic Scores

             Sum of Squares   df    Mean Square   F       Sig.
Regression   129.922          6     21.654        173.7   .000
Residual     17.078           137   .125
Total        147.000          143

Table 4
ANOVA of Human Scorers' 6 Traits and Automated Holistic Scores

             Sum of Squares   df    Mean Square   F       Sig.
Regression   96.889           6     16.148        35.71   .000
Residual     61.955           137   .452
Total        158.843          143
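The F statistics in Tables 3 and 4 come from regressing the holistic score on the six trait scores; the sketch below shows that computation on hypothetical data (it is not the study's dataset).

```python
import numpy as np

def regression_anova(traits, holistic):
    """Regress holistic scores on trait scores; return the ANOVA quantities."""
    n, p = traits.shape
    X = np.column_stack([np.ones(n), traits])
    beta, *_ = np.linalg.lstsq(X, holistic, rcond=None)
    fitted = X @ beta
    ss_reg = float(((fitted - holistic.mean()) ** 2).sum())
    ss_res = float(((holistic - fitted) ** 2).sum())
    df_reg, df_res = p, n - p - 1
    f_stat = (ss_reg / df_reg) / (ss_res / df_res)
    return {"ss_reg": ss_reg, "ss_res": ss_res,
            "df": (df_reg, df_res), "F": f_stat}

# Hypothetical data: 12 documents, six trait scores (1-5) and a holistic score (1-4).
rng = np.random.default_rng(0)
traits = rng.integers(1, 6, size=(12, 6)).astype(float)
holistic = np.clip(np.rint(traits.mean(axis=1) + rng.normal(0, 0.5, 12)), 1, 4)
print(regression_anova(traits, holistic))
```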
During our analysis, we observed a relationship between the length of
a document (i.e., a word count for the essay or digital documentary script)
and the holistic scoring. For both the human and the automated scoring,
there was a statistically significant correlation between the number of words
in a document and its holistic score: a .67 correlation with the human-gen-
erated holistic scores and a .81 correlation with the AES-generated holis-
tic score. While some correspondence between length and quality is math-
ematically probable (i.e., a more fully-developed essay will tend to have
more words than a less well-developed essay), the gap between the human
and computer-generated correlation coefficients raised a concern: a student
might be able to "game" the automated assessment by writing a longer essay or script and thus obtaining a higher score. This possibility directed our
attention to the verbose feedback provided by the AES along with its holis-
tic score.
To explore the quality of the verbose feedback provided by the AES,
we compared the system's comments to the students' scripts to see whether
these comments were meaningful to the reader. Most of these comments
were accurate but phrased in very generic terms. For example, the response
for a "good" essay (scoring 3 out of 4 possible points) included the statement that the essay "provides a clear sequence of information; provides
pieces of information that are generally related to each other." This state-
ment was correct but did not provide guidance for further revision by the
student. We then searched for specific instances in which the automated
feedback represented a misunderstanding of the writing, offering feedback
that no competent human reviewer would make. Across the 144 sets of responses, we identified fewer than 10 examples of these errors, all grammatical. For example, the following sentence was flagged as containing a subject-verb agreement error: "Throughout the 20th Century, the segregation of
blacks and whites was abolished." In this case, the AES read "whites" as
the subject; the subject is actually "segregation." Earlier in the same essay,
a sentence that begins, "In the early 1900's," was also flagged as having an
extraneous article ("the"), when it is in fact required.
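This false flag is consistent with a shallow heuristic that treats the noun nearest the verb as its subject; the toy check below, with hand-tagged tokens rather than a real parser, reproduces the misreading. It is only an illustration, not Criterion's actual method.

```python
# Toy illustration (not Criterion's actual method): a shallow checker that
# takes the noun nearest the verb as its subject misparses this sentence.
# Tokens are hand-tagged here; a real system would use a tagger or parser.
tokens = [("Throughout", "IN"), ("the", "DT"), ("20th", "JJ"), ("Century", "NN"),
          (",", ","), ("the", "DT"), ("segregation", "NN"), ("of", "IN"),
          ("blacks", "NNS"), ("and", "CC"), ("whites", "NNS"),
          ("was", "VBD"), ("abolished", "VBN"), (".", ".")]

verb_index = next(i for i, (word, tag) in enumerate(tokens) if tag == "VBD")
# Naive heuristic: the subject is the nearest preceding noun.
naive_subject = next(word for word, tag in reversed(tokens[:verb_index])
                     if tag.startswith("NN"))
print(naive_subject)  # 'whites' (plural) + 'was' (singular) -> spurious flag

# A parser that skips the prepositional phrase "of blacks and whites" would
# instead identify the singular head noun "segregation" as the subject.
```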
DISCUSSION
The high correlation between the automated and human scores in both
sets of documents (essays and scripts) and the overall high quality of the
feedback suggest that adding an option for students to submit their writing
for automated feedback could be a useful formative assessment tool, even
in the context of history education and in a non-traditional format such as a
digital documentary. Given an AES module integrated into PrimaryAccess,
students would be able to access an instant, consequence-free first round of
feedback on the style, mechanics, and structure of their scripts. This feed-
back can lead to improved student engagement and multiple revisions of
scripts, resulting in higher quality end products and increased student learn-
ing.
The results, however, underscored the significance of students receiv-
ing human feedback and not just computer-generated evaluations. As noted, some of the students' human touches in their writing eluded the AES
programmers' heuristics. In our raters' opinion, the false-flag "errors" were
departures from convention that improved the quality of the document. An
improved AES can reduce the number of instances of such errors, but they
cannot be wholly eliminated. Additionally, substantive feedback about the
content of the scripts (the accurate portrayal of historical facts and not
merely their expression) will still need to be provided by the teacher. For
example, the student statement that "Throughout the 20th Century, the segregation of blacks and whites was abolished" is grammatically correct, but
the historical understandings can be improved: 20th-century desegregation
was not a unified, completed process but rather an on-going mix of policy
decisions (Executive Order 9981, 1948), legal actions (e.g., Brown v. Board
of Education of Topeka, 1954), and personal choices (James Meredith's decision to apply to the University of Mississippi, 1961). An AES able to provide
this level of content-specific feedback in the social studies is both theoretically and practically impossible; a teacher will have to make the judgment
call as to which nuances to introduce to the student's thinking. However,
any automated assistance to the student regarding his or her writing should
give the teacher greater latitude to focus on students' content understandings
and thought processes.
This study faces several limitations. First, this was a relatively small-
scale study with only two human raters following an approved protocol. A
larger pool of documents and additional human raters would strengthen the
interpretability of the quantitative analysis. Second, the participants were
middle school history students; the results do not generalize to other groups
or other uses of the AES, especially not to high-stakes assessments such as
the SATs or end-of-year, summative assessments of student achievement.
Finally, the AES used was designed to grade essays, and the human graders
in this study were trained experts in grading essays written by high-school-level students taking Advanced Placement exams. If teachers had the
time and inclination to teach students the fine points of documentary making, the resulting scripts might bear little resemblance to essays.
FUTURE RESEARCH
The ETS Criterion system appeared capable of delivering high-quality contextual feedback on the essays, but more research needs to be done to
provide the content-area knowledge required for these digital documentaries
and other forms of writing in the social studies. What value does the scoring
provide to the teaching and learning of history? Could the automated scor-
ing process inhibit or standardize students' writing? What interaction effects
exist between automated scoring and different teachers' teaching styles or
levels of expertise?
While this study sought to confirm the reliability of automated scoring relative to human scoring, future studies will investigate the value
of automated formative assessment to students and their learning outcomes.
Future research opportunities include testing the use of automated assessment in classrooms to compare its efficacy against no feedback or limited teacher feedback, and examining differences in number of revisions,
time on task, and engagement. These differences can be correlated with the
quality of the students' final products and/or changes on pre/post assess-
ments of writing or content knowledge. Furthermore, the handling of false
flags, such as the example cited above, needs to be more fully explored.
For an AES to be effective, as Shute (1994) noted, "the system [must]
behave intelligently, not actually be intelligent, like a human being" (p. 50).
Most people have had the experience of mistyping a word while entering a
Google search (school deform, for example) and having Google's web application return the message, "Did you mean: school reform?" The original
search term is not immediately identifiable as erroneous: the words weren't
misspelled, and deform is a verb that can take school as its subject. However, the
Google database does know that most people who typed in school deform
ultimately searched for school reform. By drawing on the large number of people who use its search engine and some shrewd programming decisions,
Google has been able to make its system appear more intelligent.
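The general idea can be sketched with a made-up query log: suggest the query that most users who typed the same string ultimately searched for. Google's actual system is, of course, far more sophisticated.

```python
from collections import Counter

# Made-up query log: (what the user typed, what they ultimately searched for).
query_log = [
    ("school deform", "school reform"),
    ("school deform", "school reform"),
    ("school deform", "school deform"),
    ("school reform", "school reform"),
]

def did_you_mean(query, log, min_share=0.5):
    """Suggest the query most users ended up running after typing `query`."""
    outcomes = Counter(final for typed, final in log if typed == query)
    if not outcomes:
        return None
    best, count = outcomes.most_common(1)[0]
    share = count / sum(outcomes.values())
    return best if best != query and share >= min_share else None

print(did_you_mean("school deform", query_log))  # -> 'school reform'
```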
Because all of the PrimaryAccess web applications are instrumented to
create a time-stamped log of a student's activity while writing and creating a
project, it may be possible to assess some student projects without
needing to test the student separately to determine what they
know. If enough information about how these projects were created can be
captured and compared against a large enough pool of similarly instrumented, human-scored projects, this process data could serve as another assessment source alongside automated essay scoring feedback.
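The log format and process features below are purely illustrative (they are not PrimaryAccess's actual schema), but they suggest the kind of process data that could be compared against human-scored projects.

```python
from datetime import datetime

# Illustrative log records (not PrimaryAccess's actual schema): each entry is a
# time-stamped action a student took while building a documentary.
log = [
    {"time": "2011-03-01T10:02:00", "action": "save_script", "words": 120},
    {"time": "2011-03-01T10:21:00", "action": "save_script", "words": 180},
    {"time": "2011-03-01T10:40:00", "action": "add_image"},
    {"time": "2011-03-02T10:05:00", "action": "save_script", "words": 240},
    {"time": "2011-03-02T10:35:00", "action": "record_narration"},
]

def process_features(entries):
    """Summarize process data that could be compared against human-scored projects."""
    times = [datetime.fromisoformat(e["time"]) for e in entries]
    saves = [e for e in entries if e["action"] == "save_script"]
    return {
        "revisions": len(saves),
        "final_word_count": saves[-1]["words"] if saves else 0,
        "elapsed_minutes": (max(times) - min(times)).total_seconds() / 60,
        "distinct_actions": len({e["action"] for e in entries}),
    }

print(process_features(log))
```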
References
Attali, Y., & Burstein, J. (2006). Automated essay scoring with e-rater v.2.
Journal of Technology, Learning, and Assessment, 4(3), n.p. Retrieved
November 13, 2009 from http://escholarship.bc.edu/cgi/viewcontent.
cgi?article=1049&context=jtla
Beyer, B.K. (1979). Pre-writing and re-writing to learn. Social Education, 43(3),
187-189, 197.
Beyer, B.K., & Brostoff, A. (1979a). Writing to learn in Social Studies: Intro-
duction. Social Education, 43(3), 176-177.
Kukich, K. (2000). Beyond automated essay scoring. IEEE Intelligent Systems,
15(5), 22-27.
Lenhart, A., Arafeh, S., Smith, A., & Macgill, A. (2008). Writing, Technology,
and Teens. Washington, DC: Pew Internet & American Life Project.
Mager, R. (1997). Making instruction work. Atlanta, GA: Center for Effective
Performance.
Mory, E. H. (2004). Feedback research revisited. In D. H. Jonassen (Ed.), Hand-
book of research on educational communications and technology (pp. 745-
783). Mahwah, NJ: Lawrence Erlbaum.
Nash, G. B., Crabtree, C., & Dunn, R. E. (2000). History on trial: Culture
wars and the teaching of the past. New York: Vintage Books.
Nelms, B.F. (1987). Response and responsibility: Reading, writing, and Social
Studies. The Elementary School Journal, 87(5), 571-589.
Nelson, J. (1990). This was an easy assignment: Examining how students inter-
pret academic writing tasks. Research in the Teaching of English, 24, 362-
396.
Olina, Z., & Sullivan, H.J. (2002). Effects of classroom evaluation on student
achievement and attitudes. Educational Technology Research & Develop-
ment, 50(3), 61-75.
Pajares, F. (2003). Self-efficacy beliefs, motivation, and achievement in writing:
A review of the literature. Reading and Writing Quarterly, 19, 139-158.
Risinger, C.F. (1987). Improving writing skills through social studies (ERIC
Digest No. 40). Bloomington, IN: ERIC Clearinghouse for Social Studies/
Social Science Education. (ERIC Document Reproduction Service No. ED
285829).
Risinger, C.F. (1992). Current directions in K-12 Social Studies. Boston,
MA: Houghton Mifflin Co. (ERIC Document Reproduction Service No.
ED359130)
Riedel, E., Dexter, S., Scharber, C., & Doering, A. (2006). Experimental evi-
dence on the effectiveness of automated essay scoring in teacher education
cases. Journal of Educational Computing Research, 35(3), 267-287.
Scharber, C., & Dexter, S. (2004, March). Automated essay score predictions as
a formative assessment tool. Paper presented at the 15th international con-
ference of the Society for Information Technology and Teacher Education,
Atlanta, GA.
Shute, V.J. (1994). Regarding the I in ITS: Student modeling. In T. Ottmann & I.
Tomek (Eds.), Proceedings of Educational Multimedia and Hypermedia 94
(pp. 50-57). Charlottesville, VA: Association for the Advancement of Com-
puting in Education.
Skinner, B.F. (1958). Teaching machines. Science, 128(3300), 969-977.
Smith, J., & Niemi, R. (2001). Learning history in school: The impact of course
work and instructional practices on achievement. Theory and Research in
Social Education, 29(1), 18-42.
Sundberg, S.B. (2006). An investigation of the effects of exam essay questions
on student learning in United States History survey classes. The History
Teacher, 40(1). Retrieved November 13, 2009 from http://www.historycooperative.org/cgi-bin/cite.cgi?=ht/40.1/sundberg.html
Valenti, S., Neri, F., & Cucchiarelli, A. (2003). An overview of current research
on automated essay grading. Journal of Information Technology Education,
2, 319-330.
Van Nostrand, A.D. (1979). Writing and the generation of knowledge. Social
Education, 43(3), 178-180.
Venkateswaran, U., & Morgan, R. (2002). Assessing historical thinking skills:
Scoring the AP U.S. History Document-Based Question. Organization of
American Historians Newsletter. Retrieved November 13, 2009 from http://
www.oah.org/pubs/nl/nov02/ets.html
Wang, J., & Brown, M. (2007). Automated essay scoring versus human scoring:
A comparative study. Journal of Technology, Learning, and Assessment,
6(2). Retrieved November 13, 2009 from http://escholarship.bc.edu/cgi/
viewcontent.cgi?article=1100&context=jtla