GLOBAL CHALLENGE ISSN 1857-8934
International Journal of Linguistics, Literature and Translation, Vol: III, issue 1
Automated Essay Scoring versus Human Scoring: A Reliability Check
Ali Doğan, Ph.D.c., Associate Professor Azamat A. Akbarova, Ph.D., Hakan Aydoğan, Ph.D.c.,
Kemal Gönen, Ph.D.c., Enes Tuncdemir, Ph.D.c.
zirvealidogan@gmail.com, azamatakbar@yahoo.com, aydoganh@hotmail.com,
kemalgon@gmail.com, etuncdemir@ymail.com
Abstract
New materials are continually being added to the set of assessment instruments used in ELT.
The question of whether writing assessment in ELT can be carried out via E-Rater® was first
addressed in 1996, and systems of this kind, commonly known as “Automated Essay Scoring”
(AES) systems, particularly in America and Europe, have since developed steadily and taken
their place among the assessment instruments of ELT. The purpose of this study is to find out
whether AES can supersede the writing assessment procedure currently used at the School of
Foreign Languages at Zirve University, where the study was carried out. The participants were
a group of 50 students in level C, the equivalent of B1. The quantitative study began with the
assessment of essays written by these C-level students by three human raters and by E-Rater®.
The study found that the writing assessment procedure currently used at the School of Foreign
Languages at Zirve University costs more energy and more time and is more expensive. AES,
which proved to be more practicable, was therefore suggested for use at the School of Foreign
Languages at Zirve University.
Introduction
Automated Essay Scoring (AES) has become increasingly popular in academic institutions as an
alternative way to assess writing skills quickly, efficiently, and accurately. AES engines “employ
computer technology to evaluate and score written prose. While some forms of writing, such as
poetry, may never be covered, we estimate that approximately 90 percent of required writing in a
typical college classroom can be evaluated using AES.” Many universities and colleges are
implementing “one of the most innovative instructional technologies in use in college classrooms
today: an online automated essay-scoring service from ETS called Criterion®” (Williamson, 2003).
Shermis and Barrera (2002) defined Automated Essay Scoring (AES) as the use of computer
technology to evaluate and score essays. Burstein also emphasizes that AES is used mainly
because of reliability, validity, and cost issues in writing assessment. It is attracting great interest
from public schools, colleges, assessment companies, researchers, and educators
(Shermis & Burstein, 2003). Project Essay Grader (PEG™), E-Rater®, Criterion®, Intelligent
Essay Assessor™ (IEA), MYAccess®, Bayesian Essay Test Scoring System™ (BETSY), and
IntelliMetric® are among the automated essay scoring systems in common use today.
Although AES has become a more mainstream method of assessing essays, there are critics who
voice their skepticism and continuously point out the fallacies of such a grading system. Many
individuals believe that such software programs are not sophisticated enough to imitate human
intelligence, which leads to faults in the assessment of each paper (as cited in Williamson, 2003).
Although criticisms continue to be voiced, the broader assessment community suggests that
automated scoring does have valid applications for the assessment of writing (as cited in
Williamson, 2003).
The assessment of the writing skills of individuals in the world of academia can be a long and
tedious process that may, in fact, disrupt the performance of teachers and their ability to teach
effectively. As a result, artificial intelligence has entered the field of education and English
language teaching as a way to ease this often repetitive and time-consuming task. Such Artificial
Intelligence (A.I.) based scoring systems claim to bring relative efficiency to the automation of
essay scoring by simulating human intelligence and behavior; by doing so, the software attempts
to relieve educators of the burden of grading. This helps to
enhance human performance in more significant areas, such as educators’ effectiveness in
teaching the material that will be assessed in the essays. Automated scoring as a form of writing
assessment has been highly controversial; while critics focus on the movement from indirect to
direct measurement of writing, many institutions are adapting their assessment methods to
involve some sort of scoring system processed by artificial intelligence (Simon & Bennett,
2007).
Automated scoring technologies are finding wider acceptance among educators (Williamson,
2003). The software program IntelliMetric®, a scoring engine that assesses students’ skills in
essays, has been implemented by the Commonwealth of Pennsylvania. This engine is used to
score the writing of students on the state achievement examinations of Pennsylvania (as cited in
Williamson, 2003). Numerous states have followed Pennsylvania as a model and have begun to
implement their own automated essay scoring software similar to IntelliMetric®
(Williamson, 2003).
Many individuals who support such dramatic changes in assessing essays believe that such
automated scoring software not only helps relieve educators of the grading burden but also adds
a level of consistency that humans sometimes cannot achieve. Many factors wear on individuals
who are assessing papers, including stress and exhaustion, and these factors produce a degree of
inconsistency on the part of the grader. These concerns revolve around the unreliability of the
evaluation of essay answers and of the individuals who are assessing them (Shermis & Barrera,
2004). The automated software is not affected by the factors that human graders are prone to
suffer from, which increases consistency in the grading of each individual student. The primary
concern in the case of automation is whether the task itself is computable, rather than whether
the software can perform to the standards of the academic institution implementing the system.
In most cases, when it comes to the assessment of essays and of students’ writing skills, the task
itself is computable, and thus implementing an automated essay scoring system is possible
(Attali & Burstein, 2006). Although such automated scoring technologies are finding wider
acceptance among academic institutions, the drive to find new tools and more accurate AES
software must not diminish. Criterion® has recently been updated to Version 4.0, which “marks
a turning point in the history and practice of writing instruction and remediation” (Williamson,
2003). The evolution of such software will only help to advance the
validity of such programs and solidify their place in the world of academia as a popular method
for assessing the writing skills of students.
METHODOLOGY
Participants and procedure
There were 50 participants and their essays (49 of which were valid for grading by the electronic
rater). Three human raters and one electronic rater graded the essays. The human raters were
chosen for their similarity (e.g., the same gender, the same teaching profession, working in the
same institution in the same branch), so that we could control, as far as possible, other factors
that might affect the final grades and their averages. The range of grades (marks) was
from 1 to 6.
Instruments
We used an electronic rater, which had the same range of grades as the human raters and applied
the same assessment criteria.
RESULTS AND DISCUSSIONS
First, we calculated the average values, standard deviations, minimum and maximum results, and
ranges of the human raters’ and the electronic rater’s grades. The results are shown in Table 1.
Table 1. Descriptive statistical values
Rater M SD min max range
first human 3.98 .97 2 6 4
second human 3.78 1.03 2 6 4
third human 3.92 .95 2 6 4
electronic 3.78 .98 1 6 5
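For readers who wish to reproduce such descriptive statistics, a minimal Python sketch is given below. It is not the authors' code; the column names and the small grade vectors are hypothetical placeholders for the actual 49 scores per rater, which the paper does not publish.

```python
# Minimal sketch (not the authors' code): descriptive statistics for rater grades.
# The grade lists below are placeholders; substitute the actual scores per rater.
import pandas as pd

grades = pd.DataFrame({
    "first_human":  [4, 3, 5, 2, 4, 6],   # hypothetical example values
    "second_human": [4, 3, 4, 2, 5, 6],
    "third_human":  [4, 4, 5, 2, 4, 6],
    "electronic":   [4, 3, 5, 1, 4, 6],
})

summary = pd.DataFrame({
    "M":     grades.mean().round(2),
    "SD":    grades.std().round(2),        # sample standard deviation (ddof=1)
    "min":   grades.min(),
    "max":   grades.max(),
    "range": grades.max() - grades.min(),
})
print(summary)
```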
As we can see from Table 1, the highest mean value (arithmetic mean) is that of the first human
rater (M = 3.98), and the lowest mean values belong to the second human rater and the electronic
rater (M = 3.78). The theoretical average value for the 6-point scale is 3.5, so these arithmetic
means are close to it. The most variable grades are those of the second human rater (SD = 1.03)
and the least variable are those of the third human rater (SD = .95). The theoretical range for the 6-point scale
is 5, and only the electronic rater exhibits this full range (the human raters have a smaller range
of four units). In order to test our first hypothesis, we calculated Pearson’s product-moment
correlation coefficients; these data are provided in Table 2. Beforehand, we calculated the
average of the human raters’ grades for each essay.
The most important cell of this table is the one where the average of the human raters and the
electronic rater intersect. The intercorrelations of the human raters give information about their
concordance, i.e. the similarity of their grading processes and results. They can also be useful for
conclusions about the leniency or severity of our human raters, but they must be combined with
their average results and with an examination of the shape of their distributions. If the correlation
between the average results of the human raters and E-Rater® is high enough and statistically
significant, then we can say that our first hypothesis is supported.
Table 2. Correlation matrix between grades of human raters and electronic rater
Raters                   first human  second human  third human  average of human raters  electronic
first human rater        1            .583**        .652**       .857**                   .716**
second human rater       .583**       1             .641**       .862**                   .712**
third human rater        .652**       .641**        1            .879**                   .890**
average of human raters  .857**       .862**        .879**       1                        .890**
electronic rater         .716**       .712**        .890**       .890**                   1
* correlation coefficients are significant at the .05 level
** correlation coefficients are significant at the .01 level
As we can see from Table 2, all coefficients are statistically significant at the .01 level, so the
grades of all human raters correlate significantly with one another. The first human rater’s grades
correlate more highly with those of the third human rater (r = .652, p < .01) than with those of the
second human rater (r = .583, p < .01). The second human rater’s grades also correlate more
highly with the grades of the third human rater (r = .641, p < .01). The average of the human
raters’ grades, taken together, correlates most highly with the grades of the third human rater
(r = .879, p < .01).
The third human rater’s grades are also the closest of all the raters’ grades to the estimates
provided by the electronic rater (r = .890, p < .01). The coefficient that is the most relevant
indicator for the acceptance of our first hypothesis is thus very high and statistically significant
(r = .890, p < .01). Therefore, we can conclude that the electronic rater’s grades are significantly
correlated with the average grades of the human raters. The common variance of these two
''measures'', i.e. the coefficient of determination, is r² = .890² = .7921, which means that the two
measures share 79.21% of their variability. Hence, our first hypothesis is supported.
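As a rough illustration of this step, the correlation between the averaged human grades and the electronic grades, and the coefficient of determination derived from it, could be computed as sketched below. The arrays are placeholders, not the study's data, and the variable names are hypothetical.

```python
# Minimal sketch (not the authors' code): inter-rater correlation and r-squared.
import numpy as np
from scipy.stats import pearsonr

# Placeholder arrays; replace with the 49 actual grades per rater.
h1 = np.array([4, 3, 5, 2, 4, 6])
h2 = np.array([4, 3, 4, 2, 5, 6])
h3 = np.array([4, 4, 5, 2, 4, 6])
electronic = np.array([4, 3, 5, 1, 4, 6])

human_avg = (h1 + h2 + h3) / 3           # average of the three human raters per essay

r, p = pearsonr(human_avg, electronic)   # Pearson product-moment correlation
r_squared = r ** 2                       # shared variance (coefficient of determination)
print(f"r = {r:.3f}, p = {p:.3f}, r^2 = {r_squared:.4f}")
```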
Our second hypothesis is defined in terms of the difference between the average of the human
raters’ grades and the grades provided by the electronic rater. Because of that, and because our
variables are dependent (paired), we conducted a paired-samples t-test. If its results show that the
average values do not differ statistically from each other, we can retain our null hypothesis. The
results of the t-test are given in Table 3. The first two columns of the table contain the arithmetic
means (M) and standard deviations (SD). The next four columns contain the standard errors of
the arithmetic means (SEM), the difference of means (Mdiff), the standard deviation of the
differences (SDdiff) and the standard error of the mean difference (SEMdiff). The last three
columns contain the value of the t-statistic, the degrees of freedom (df) and the significance
value (p-value).
Table 3. The results of the paired-samples t-test
Pair                     M     SD   SEM  Mdiff  SDdiff  SEMdiff  t       df  p
Electronic rater         3.78  .98  .14  -.11   .45     .06      -1.803  48  .078
Average of human raters  3.89  .85  .12
As we can see from Table 3, the mean value for the variable ''average of human raters'' is slightly
higher than that of the electronic rater (3.89 vs. 3.78), but this difference is not statistically
significant (t = -1.803, df = 48, p > .05). We can conclude that the average values of these raters
do not differ significantly; therefore, the electronic rater’s average assessment is similar to the
average of the human raters’ assessments. Hence, our second hypothesis is confirmed.
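A paired-samples t-test of this kind can be reproduced with scipy, as in the hedged sketch below; the arrays are again placeholders rather than the study's data.

```python
# Minimal sketch (not the authors' code): paired-samples t-test,
# electronic rater vs. average of the human raters.
import numpy as np
from scipy.stats import ttest_rel

electronic = np.array([4, 3, 5, 1, 4, 6])            # placeholder grades
human_avg = np.array([4.0, 3.3, 4.7, 2.0, 4.3, 6.0])  # placeholder averaged human grades

t_stat, p_value = ttest_rel(electronic, human_avg)
df = len(electronic) - 1                              # degrees of freedom for paired data
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
```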
Our third hypothesis is based on discriminativity, one of the four metric characteristics of a
measure (the others being reliability, validity and objectivity). So, we checked whether the electronic rater
evaluates essays in a manner that can detect small differences between them. It is not desirable
for all students (essays) to receive similar marks (grades), whether uniformly low or uniformly
high. The most plausible outcome is that the rater assigns more average grades than low or high
grades. We expect approximately the same number of very low and very high marks (estimates,
grades), but their number should be smaller than the number of average marks (e.g. 3 and 4 in
this case). In order to test whether the distribution of the electronic rater’s grades differs from the
normal curve, we conducted the Kolmogorov-Smirnov test (see Table 4).
Table 4. The results of normality tests for electronic rater's grades
Test of normality Statistic df p
Kolmogorov-Smirnov .213 49 .000
Shapiro-Wilk .891 49 .000
As we can see, the value of the Kolmogorov-Smirnov test is statistically significant (K-S z = .213,
df = 49, p < .001). We also applied the Shapiro-Wilk test, because our sample is small, and its
results are consistent with those of the K-S test (Sh-W statistic = .891, df = 49, p < .001). Hence,
our distribution differs statistically from the normal curve.
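The two normality tests reported here can be approximated with scipy as sketched below. Note that scipy's `kstest` against a normal distribution with estimated parameters only approximates SPSS's Lilliefors-corrected Kolmogorov-Smirnov statistic, so this is an assumption-laden illustration rather than an exact reproduction, and the data are placeholders.

```python
# Minimal sketch (not the authors' code): normality checks for the electronic rater's grades.
import numpy as np
from scipy.stats import kstest, shapiro

grades = np.array([4, 3, 5, 1, 4, 6])                 # placeholder for the 49 grades

# One-sample K-S test against a normal distribution with the sample's mean and SD.
# (SPSS applies the Lilliefors correction, so its p-value will differ somewhat.)
ks_stat, ks_p = kstest(grades, "norm", args=(grades.mean(), grades.std(ddof=1)))

# Shapiro-Wilk test, better suited to small samples.
sw_stat, sw_p = shapiro(grades)

print(f"K-S: statistic = {ks_stat:.3f}, p = {ks_p:.3f}")
print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.3f}")
```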
Figure 1 below shows the distribution of these grades and helps us judge its asymmetry.
[Histogram of the electronic rater's grades (elr), frequencies over the grade scale 1.00-6.00; Mean = 3.78, Std. Dev. = 0.98, N = 49]
Figure 1. Distribution of electronic rater's grades
Examining Figure 1, we can see that the distribution is negatively skewed, i.e. the number of low
grades is noticeably smaller than the number of high grades. So, one of two things may be true: the
electronic rater is a lenient estimator of essays, or the distribution of grades was affected by the
fact that the students in our sample form a group that is above average at writing essays. Hence,
our last hypothesis has been rejected.
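The asymmetry discussed here can also be quantified directly with a skewness coefficient, as in the brief sketch below (placeholder data; a negative value indicates the left-tailed, high-grade-heavy shape described above).

```python
# Minimal sketch (not the authors' code): skewness of a grade distribution.
import numpy as np
from scipy.stats import skew

grades = np.array([4, 3, 5, 1, 4, 6, 5, 4])      # placeholder grades
g1 = skew(grades, bias=False)                    # sample skewness; < 0 means a longer left tail
print(f"skewness = {g1:.3f}")
```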
We also examined the shape of the distribution of the human raters’ average grades. The results
are displayed in Table 5.
Table 5. The results of normality tests for human raters' average grades
Test of normality Statistic df p
Kolmogorov-Smirnov .163 49 .002
Shapiro-Wilk .949 49 .034
In this case, the situation is similar to the previous one. The Kolmogorov-Smirnov Z statistic is
statistically significant (K-S z = .163, df = 49, p < .01) and the Shapiro-Wilk statistic is also
significant (Sh-W = .949, df = 49, p < .05). Figure 2 shows the histogram for this variable.
[Histogram of the human raters' average grades (hrAVE), frequencies over the grade scale 2.50-6.00; Mean = 3.89, Std. Dev. = 0.85, N = 49]
Figure 2. Distribution of human raters' average grades
From Figure 2, we can conclude that this distribution is positively skewed, i.e. there are more
low grades than high grades. So, the human raters were more severe than the electronic rater.
To conclude, our electronic rater provides results similar to those of the human raters (taken
together, using the average of their estimates). However, the distribution of its grades shows
that it gives students somewhat higher grades than they deserve (i.e. the electronic rater, to some
degree, overestimates the quality of their essays).
Conclusion
There are some requirements for using AES at universities to assess writing: seminars
on how to use AES, access to the Internet, personal computers, and a computer laboratory at the
school. Laboratory sessions must be carefully planned if the system is used in computer
laboratories at schools, because tight timing may be demotivating for students if they are forced
to complete writing activities in a limited time. The second issue is that institutions or
universities have to buy the system if they want to use it effectively. In this study, Criterion®
was used as the AES tool to assess the students’ writing. It is commonly used in America and
Europe. It enables students to brainstorm about a given topic, share their ideas, outline their
ideas, write, and get feedback from their teachers. Upon finishing their final draft, students
upload it and receive detailed results in a report prepared by the system. This final step takes a
few minutes, so it is time-saving rather than time-consuming. The system is beneficial for both
students and teachers. It allows teachers to follow and monitor their students’ writing process
and to give individual, detailed feedback using the report generated by the system. Students can
get immediate, detailed feedback on their writing, so they can track their improvement by
comparing teacher and system feedback, and they can also improve their writing by analysing
the teacher’s feedback on their first draft and the system’s feedback on their final draft. In
addition, the system minimizes the time required for assessing students’ writing. The three
human raters in this study assessed and evaluated fifty pieces of writing in twenty days, whereas
Criterion® did the same job in one hour. To conclude, if the requirements of the system are met
and how the system works is understood, AES can be used effectively at the university level.
References:
Anson, C. M. (2003). Responding to and assessing student writing: The uses and limits of
technology. In P. Takayoshi & B. Huot (Eds.), Teaching writing with computers: An
introduction (pp. 234–245). New York: Houghton Mifflin Company.
Attali, Y., Bridgeman, B., & Trapani, C. (2010). Performance of a generic approach in automated
essay scoring. Journal of Technology, Learning, and Assessment, 10. Retrieved from
http://ejournals.bc.edu/ojs/index.php/jtla/article/view/1603/1455
Burstein, J. (2003). The E-Rater® scoring engine: Automated essay scoring with natural
language processing. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A
cross-disciplinary perspective (pp. 113–122). Mahwah, NJ: Lawrence Erlbaum
Associates, Inc.
Burstein, J., & Marcu, D. (2000). Benefits of modularity in an Automated Essay Scoring System
(ERIC reproduction service no TM 032 010).
Burstein, J., Chodorow, M., & Leacock, C. (2003, August). Criterion®: Online essay evaluation:
an application for automated evaluation of student essays. Proceedings of the 15th
Annual Conference on Innovative Applications of Artificial Intelligence, Acapulco,
Mexico.
Burstein, J., Kukich, K., Wolff, S., Lu, C., & Chodorow, M. (1998, April). Computer analysis of
essays. Proceedings of the NCME Symposium on Automated Scoring, Montreal, Canada.
Burstein, J., Leacock, C., & Swartz, R. (2001). Automated evaluation of essays and short
answers. Proceedings of the 5th International Computer Assisted Assessment Conference
(CAA 01), Loughborough University.
Carlson, J. G., Bridgeman, B., Camp, R., & Waanders, J. (1985). Relationship of admission test
scores to writing performance of native and nonnative speakers of English (TOEFL
Research Report No. 19). Princeton, NJ: Educational Testing Service.
Carlson, S. B., & Bridgeman, B. (1986). Testing ESL student writers. In K. L. Greenberg, H. S.
Wiener, & R. A. Donovan (Eds.), Writing assessment: Issues and strategies (pp. 126–152).
NY: Longman.
China Papers. (n.d.). Retrieved September 16, 2012, from http://www.chinapapers.com.
Chung, K. W. K. & O’Neil, H. F. (1997). Methodological approaches to online scoring of essays
(ERIC reproduction service no ED 418 101).
Cooper, C. R., & Odell, L. (Eds.). (1977). Evaluating writing: Describing, measuring, judging.
Urbana, IL: NCTE.
Diederich, P. B., French, J. W., & Carlton, S. T. (1961). Factors in judgments of writing ability
(ETS research bulletin RB-61-15). Princeton, NJ: Educational Testing Service.
Dikli, S. (2006). An overview of automated scoring of essays. Journal of Technology, Learning,
and Assessment, 5(1). Retrieved July 15, 2012, from http://www.jtla.org.
Educational Testing Service (ETS). (n.d.). Retrieved on August 17, 2012, from
http://www.ets.org.
Elliot, S. (2003). IntelliMetricÂŽ: from here to validity. In Mark D. Shermis and Jill C. Burstein
(Eds.). Automated essay scoring: a cross disciplinary approach. Mahwah, NJ: Lawrence
Erlbaum Associates.
Greer, G. (2002). Are computer scores on essays the same as essay scores from human experts?
In Vantage Learning, Establishing WritePlacer Validity: A summary of studies (pp. 10–
11). (RB-781). Yardley, PA: Author.
Gregory, K. (1991). More than a decade’s highlight? The holistic scoring consensus and the need
for change. (ERIC Document Reproduction Service No. ED 328 594).
Hamp-Lyons, L. (1990). Basic concepts. In L. Hamp-Lyons (Ed.), Assessing second language
writing in academic contexts (pp. 5-15). Norwood, NJ: Ablex.
Herrington, A., & Moran, C. (2001). What happens when machines read our students’ writing?
College English, 63(4), 480–499.
Hughes, A. (1989). Testing for language teachers. NY: Cambridge University Press.
Huot, B. (1990). Reliability, validity and holistic scoring: What we know and what we need to
know. College Composition and Communication, 41(2), 201-213.
Huot, B. (2002). (Re)Articulating writing assessment for teaching and learning. Logan, UT: Utah
State University Press.
Huot, B., & Neal, M. (2006). Writing assessment: A techno-history. In Charles MacArthur,
Steve Graham, & Jill Fitzgerald (Eds.), Handbook of writing research (pp. 417–432).
New York: Guilford Publications.
Jacobs, H. L., Zinkgraf, S. A., Wormuth, D. R., Hartfiel, V. F., & Hughey, J. B. (1981). Testing
ESL composition: A practical approach. Rowley, MA: Newbury House.
Klobucar, A., Deane, P., Elliot, N., Ramineni, C., Deess, P., & Rudniy, A. (2012). Automated
essay scoring and the search for valid writing assessment. In C. Bazerman, C. Dean, J.
Early, K. Lunsford, S. Null, P. Rogers, & A. Stansell (Eds.), International advances in
writing research: Cultures, places, measures (pp. 103–119). Fort Collins,
Colorado/Anderson, SC: WAC Clearinghouse/Parlor Press.
http://wac.colostate.edu/books/wrab2011/chapter6.pdf
Kukich, K. (2000). Beyond Automated Essay Scoring. IEEE Intelligent Systems, 15(5), 22–27.
Lloyd-Jones, R. (1987). Test of writing ability. In G. Tate (Ed.), Teaching composition (pp. 155–
176). Texas Christian University Press.
Mann, R. (1988). Measuring writing competency. (ERIC Document Reproduction Service No.
ED 295 695).
Mitchell, K., & Anderson, J. (1986). Reliability of holistic scoring for the MCAT essay.
Educational and Psychological Measurement, 46, 771–775.
Nivens-Bower, C. (2002). Faculty-WritePlacer Plus score comparisons. In Vantage Learning,
Establishing WritePlacer Validity: A summary of studies (p. 12). (RB-781). Yardley, PA:
Author.
Norusis, M. J. (2004). SPSS 12.0 guide to data analysis. Upper Saddle River, NJ: Prentice Hall.
Page, E. B. (1994). Computer Grading of Student Prose, Using Modern Concepts and Software,
Journal of Experimental Education, 62, 127–142.
Page, E. B. (2003). Project Essay Grade: PEG. In M. D. Shermis & J. Burstein (Eds.), Automated
essay scoring: A cross-disciplinary perspective (pp. 43–54). Mahwah, NJ: Lawrence
Erlbaum Associates.
Page, E. B. & Petersen, N. S. (1995). The computer moves into essay grading: Updating the
ancient test. Phi Delta Kappan, 76, 561–565.
Page, E. B., Poggio, J. P., & Keith, T. Z. (1997, March). Computer analysis of student essays:
Finding trait differences in student profile. Paper presented at the Annual Meeting of the
American Educational Research Association, Chicago, IL. (ERIC Document
Reproduction Service No. ED411316).
Perkins, K. (1983). On the use of composition scoring techniques, objective measure, and
objective tests to evaluate ESL writing ability. TESOL Quarterly, 17(4), 651-671.
Powers, D. E., Burstein, J. C., Fowles, M. E., & Kukich, K. (2001, March). Stumping E-Rater®:
Challenging the validity of automated essay scoring. (GRE Board Research Report No.
98-08bP). Princeton, NJ: Educational Testing Service. Retrieved November 15, 2012,
from http://www.ets.org/Media/Research/pdf/RR-01-03-Powers.pdf.
Rudner, L. & Gagne, P. (2001). An overview of three approaches to scoring written essays by
computer (ERIC Digest number ED 458 290).
Rudner, L. M., Garcia, V., & Welch, C. (2006). An evaluation of the IntelliMetric® essay
scoring system. Journal of Technology, Learning, and Assessment, 4(4). Retrieved
August 6, 2012, from http://escholarship.bc.edu/jtla/vol4/4/
Shermis, M., & Barrera, F. (2002). Exit assessments: Evaluating writing ability through
Automated Essay Scoring (ERIC document reproduction service no ED 464 950).
Shermis, M. D., Raymat, M. V., & Barrera, F. (2003). Assessing writing through the curriculum
with Automated Essay Scoring (ERIC document reproduction service no ED 477 929).
Shermis, M. D., Shneyderman, A., & Attali, Y. (2008). How important is content in the ratings of
essay assessments? Assessment in Education: Principles, Policy and Practice, 15, 91–105.
Shermis, M.D., Koch, C.M., Page, E. B., Keith, T.Z., & Harrington, S. (2002). Trait ratings for
automated essay grading. Educational and Psychological Measurement, 62(1), 5–18.
Vantage Learning (2001a). Applying IntelliMetric® to the scoring of entry-level college
student essays. (RB-539). Retrieved September 1, 2012, from
http://www.vantagelearning.com/content_pages/research.html.
Vantage Learning (2002). A study of expert scoring, standard human scoring and IntelliMetric®
scoring accuracy for state-wide eighth grade writing responses. (RB-726). Retrieved
September 10, 2012, from http://www.vantagelearning.com/content_pages/research.html.
Vantage Learning. (2000a). A study of expert scoring and IntelliMetricÂŽ scoring accuracy for
dimensional scoring of Grade 11 student writing responses (RB-397). Newtown, PA:
Vantage Learning.
Vantage Learning. (2000b). A true score study of IntelliMetric® accuracy for holistic and
dimensional scoring of college entry-level writing program (RB-407). Newtown, PA:
Vantage Learning.
Vantage Learning. (2001b). Applying IntelliMetricÂŽ Technology to the scoring of 3rd and 8th
grade standardized writing assessments (RB-524). Newtown, PA: Vantage Learning.
Vantage Learning. (2003). A comparison of IntelliMetric® and expert scoring for the evaluation
of literature essays. (RB-793). Retrieved December 2, 2012, from
http://www.vantagelearning.com/content_pages/research.html.
Vantage Learning. (2003a). Assessing the accuracy of IntelliMetricÂŽ for scoring a district-wide
writing assessment (RB-806). Newtown, PA: Vantage Learning.
Vantage Learning. (2003b). How does IntelliMetricÂŽ score essay responses? (RB-929).
Newtown, PA: Vantage Learning.
Vantage Learning. (2003c). A true score study of 11th grade student writing responses using
IntelliMetric® Version 9.0 (RB-786). Newtown, PA: Vantage Learning.
Wang, J., & Brown, M. S. (2008). Automated essay scoring versus human scoring: A
correlational study. Contemporary Issues in Technology and Teacher Education, 8(4).
Retrieved from http://www.citejournal.org/vol8/iss4/languagearts/article1.cfm.
Wang, J., & Brown, M. S. (2007). Automated essay scoring versus human scoring: A
comparative study. Journal of Technology, Learning, and Assessment.
Warschauer, M., & Ware, P. (2006). Automated writing evaluation: Defining the classroom
research agenda. Language Teaching Research, 10(2), 1–24. Retrieved June 18, 2012,
from http://www.gse.uci.edu/faculty/markw/awe.pdf.
White, E. M. (1994). Teaching and assessing writing: Recent advances in understanding,
evaluating, and improving student performance. San Francisco, CA: Jossey-Bass.
Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies
for validating computer automated scoring. Applied Measurement in Education, 15(4),
391–412.
More Related Content

Similar to Automated Essay Scoring Versus Human Scoring A Reliability Check

Automated Evaluation of Essays and Short Answers.pdf
Automated Evaluation of Essays and Short Answers.pdfAutomated Evaluation of Essays and Short Answers.pdf
Automated Evaluation of Essays and Short Answers.pdfKathryn Patel
 
Computer generated feedback
Computer generated feedbackComputer generated feedback
Computer generated feedbackMagdy Mahdy
 
A Comprehensive Review Of Automated Essay Scoring (AES) Research And Development
A Comprehensive Review Of Automated Essay Scoring (AES) Research And DevelopmentA Comprehensive Review Of Automated Essay Scoring (AES) Research And Development
A Comprehensive Review Of Automated Essay Scoring (AES) Research And DevelopmentAnna Landers
 
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
 
Automated Writing Evaluation
Automated Writing EvaluationAutomated Writing Evaluation
Automated Writing EvaluationClaire Webber
 
assingnment.docx
assingnment.docxassingnment.docx
assingnment.docxwrite12
 
assingnment.docx
assingnment.docxassingnment.docx
assingnment.docxwrite12
 
Cyprus_paper-final_D.O.Santos_et_al
Cyprus_paper-final_D.O.Santos_et_alCyprus_paper-final_D.O.Santos_et_al
Cyprus_paper-final_D.O.Santos_et_alVictor Santos
 
Module 6 Powerpoint
Module 6 PowerpointModule 6 Powerpoint
Module 6 Powerpointguest5a2e6f
 
Module 6 Powerpoint Presentation
Module 6 Powerpoint PresentationModule 6 Powerpoint Presentation
Module 6 Powerpoint Presentationguest5a2e6f
 
Automated Question Paper Generator And Answer Checker Using Information Retri...
Automated Question Paper Generator And Answer Checker Using Information Retri...Automated Question Paper Generator And Answer Checker Using Information Retri...
Automated Question Paper Generator And Answer Checker Using Information Retri...Sheila Sinclair
 
An Evaluation Of The IntelliMetric
An Evaluation Of The IntelliMetricAn Evaluation Of The IntelliMetric
An Evaluation Of The IntelliMetricAaron Anyaakuu
 
Automated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisAutomated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisGina Rizzo
 
Automated Thai Online Assignment Scoring
Automated Thai Online Assignment ScoringAutomated Thai Online Assignment Scoring
Automated Thai Online Assignment ScoringMary Montoya
 
A Framework Of A Computer-Based Essay Marking System For ESL Writing.
A Framework Of A Computer-Based Essay Marking System For ESL Writing.A Framework Of A Computer-Based Essay Marking System For ESL Writing.
A Framework Of A Computer-Based Essay Marking System For ESL Writing.Carmen Pell
 
Automatic Generation of Multiple Choice Questions using Surface-based Semanti...
Automatic Generation of Multiple Choice Questions using Surface-based Semanti...Automatic Generation of Multiple Choice Questions using Surface-based Semanti...
Automatic Generation of Multiple Choice Questions using Surface-based Semanti...CSCJournals
 
An Evaluation Of The IntelliMetric SM Essay Scoring System
An Evaluation Of The IntelliMetric SM  Essay Scoring SystemAn Evaluation Of The IntelliMetric SM  Essay Scoring System
An Evaluation Of The IntelliMetric SM Essay Scoring SystemJoe Andelija
 

Similar to Automated Essay Scoring Versus Human Scoring A Reliability Check (17)

Automated Evaluation of Essays and Short Answers.pdf
Automated Evaluation of Essays and Short Answers.pdfAutomated Evaluation of Essays and Short Answers.pdf
Automated Evaluation of Essays and Short Answers.pdf
 
Computer generated feedback
Computer generated feedbackComputer generated feedback
Computer generated feedback
 
A Comprehensive Review Of Automated Essay Scoring (AES) Research And Development
A Comprehensive Review Of Automated Essay Scoring (AES) Research And DevelopmentA Comprehensive Review Of Automated Essay Scoring (AES) Research And Development
A Comprehensive Review Of Automated Essay Scoring (AES) Research And Development
 
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS
 
Automated Writing Evaluation
Automated Writing EvaluationAutomated Writing Evaluation
Automated Writing Evaluation
 
assingnment.docx
assingnment.docxassingnment.docx
assingnment.docx
 
assingnment.docx
assingnment.docxassingnment.docx
assingnment.docx
 
Cyprus_paper-final_D.O.Santos_et_al
Cyprus_paper-final_D.O.Santos_et_alCyprus_paper-final_D.O.Santos_et_al
Cyprus_paper-final_D.O.Santos_et_al
 
Module 6 Powerpoint
Module 6 PowerpointModule 6 Powerpoint
Module 6 Powerpoint
 
Module 6 Powerpoint Presentation
Module 6 Powerpoint PresentationModule 6 Powerpoint Presentation
Module 6 Powerpoint Presentation
 
Automated Question Paper Generator And Answer Checker Using Information Retri...
Automated Question Paper Generator And Answer Checker Using Information Retri...Automated Question Paper Generator And Answer Checker Using Information Retri...
Automated Question Paper Generator And Answer Checker Using Information Retri...
 
An Evaluation Of The IntelliMetric
An Evaluation Of The IntelliMetricAn Evaluation Of The IntelliMetric
An Evaluation Of The IntelliMetric
 
Automated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic AnalysisAutomated Essay Scoring Using Generalized Latent Semantic Analysis
Automated Essay Scoring Using Generalized Latent Semantic Analysis
 
Automated Thai Online Assignment Scoring
Automated Thai Online Assignment ScoringAutomated Thai Online Assignment Scoring
Automated Thai Online Assignment Scoring
 
A Framework Of A Computer-Based Essay Marking System For ESL Writing.
A Framework Of A Computer-Based Essay Marking System For ESL Writing.A Framework Of A Computer-Based Essay Marking System For ESL Writing.
A Framework Of A Computer-Based Essay Marking System For ESL Writing.
 
Automatic Generation of Multiple Choice Questions using Surface-based Semanti...
Automatic Generation of Multiple Choice Questions using Surface-based Semanti...Automatic Generation of Multiple Choice Questions using Surface-based Semanti...
Automatic Generation of Multiple Choice Questions using Surface-based Semanti...
 
An Evaluation Of The IntelliMetric SM Essay Scoring System
An Evaluation Of The IntelliMetric SM  Essay Scoring SystemAn Evaluation Of The IntelliMetric SM  Essay Scoring System
An Evaluation Of The IntelliMetric SM Essay Scoring System
 

More from Brittany Brown

Minimalist Neutral Floral Lined Printable Paper Digit
Minimalist Neutral Floral Lined Printable Paper DigitMinimalist Neutral Floral Lined Printable Paper Digit
Minimalist Neutral Floral Lined Printable Paper DigitBrittany Brown
 
Project Concept Paper. Online assignment writing service.
Project Concept Paper. Online assignment writing service.Project Concept Paper. Online assignment writing service.
Project Concept Paper. Online assignment writing service.Brittany Brown
 
Writing Paper Clipart Free Writing On Paper
Writing Paper Clipart Free Writing On PaperWriting Paper Clipart Free Writing On Paper
Writing Paper Clipart Free Writing On PaperBrittany Brown
 
Best Friend Friendship Day Essay 015 Friendship Essay Examples
Best Friend Friendship Day Essay 015 Friendship Essay ExamplesBest Friend Friendship Day Essay 015 Friendship Essay Examples
Best Friend Friendship Day Essay 015 Friendship Essay ExamplesBrittany Brown
 
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay Quotes
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay QuotesBest QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay Quotes
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay QuotesBrittany Brown
 
Cheap Essay Writing Service Uk. Online assignment writing service.
Cheap Essay Writing Service Uk. Online assignment writing service.Cheap Essay Writing Service Uk. Online assignment writing service.
Cheap Essay Writing Service Uk. Online assignment writing service.Brittany Brown
 
Case Study Science. Case Study The Art And Scienc
Case Study Science. Case Study The Art And SciencCase Study Science. Case Study The Art And Scienc
Case Study Science. Case Study The Art And SciencBrittany Brown
 
Best Paper Writing Service By Bestewsreviews On DeviantArt
Best Paper Writing Service By Bestewsreviews On DeviantArtBest Paper Writing Service By Bestewsreviews On DeviantArt
Best Paper Writing Service By Bestewsreviews On DeviantArtBrittany Brown
 
My Father Essay - Write An Essay On My Father My Hero (DAD) In English
My Father Essay - Write An Essay On My Father My Hero (DAD) In EnglishMy Father Essay - Write An Essay On My Father My Hero (DAD) In English
My Father Essay - Write An Essay On My Father My Hero (DAD) In EnglishBrittany Brown
 
My Mother Essay 1000 Words. Online assignment writing service.
My Mother Essay 1000 Words. Online assignment writing service.My Mother Essay 1000 Words. Online assignment writing service.
My Mother Essay 1000 Words. Online assignment writing service.Brittany Brown
 
Definition Essay Examples Love. Online assignment writing service.
Definition Essay Examples Love. Online assignment writing service.Definition Essay Examples Love. Online assignment writing service.
Definition Essay Examples Love. Online assignment writing service.Brittany Brown
 
How To Write A Paper For College Besttoppaperessay
How To Write A Paper For College BesttoppaperessayHow To Write A Paper For College Besttoppaperessay
How To Write A Paper For College BesttoppaperessayBrittany Brown
 
Proposal Samples - Articleeducation.X.Fc2.Com
Proposal Samples - Articleeducation.X.Fc2.ComProposal Samples - Articleeducation.X.Fc2.Com
Proposal Samples - Articleeducation.X.Fc2.ComBrittany Brown
 
Critical Analysis Essay Examples For Students
Critical Analysis Essay Examples For StudentsCritical Analysis Essay Examples For Students
Critical Analysis Essay Examples For StudentsBrittany Brown
 
Homeschool Research Paper. Online assignment writing service.
Homeschool Research Paper. Online assignment writing service.Homeschool Research Paper. Online assignment writing service.
Homeschool Research Paper. Online assignment writing service.Brittany Brown
 
Awesome Why Buy An Essay. Online assignment writing service.
Awesome Why Buy An Essay. Online assignment writing service.Awesome Why Buy An Essay. Online assignment writing service.
Awesome Why Buy An Essay. Online assignment writing service.Brittany Brown
 
Sample Essay Questions - Leading For Change Final
Sample Essay Questions - Leading For Change FinalSample Essay Questions - Leading For Change Final
Sample Essay Questions - Leading For Change FinalBrittany Brown
 
Access To Justice Essay - Introduction Justice Is Def
Access To Justice Essay - Introduction Justice Is DefAccess To Justice Essay - Introduction Justice Is Def
Access To Justice Essay - Introduction Justice Is DefBrittany Brown
 
Groundhog Day Writing Paper Teaching Resources
Groundhog Day Writing Paper  Teaching ResourcesGroundhog Day Writing Paper  Teaching Resources
Groundhog Day Writing Paper Teaching ResourcesBrittany Brown
 
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.Brittany Brown
 

More from Brittany Brown (20)

Minimalist Neutral Floral Lined Printable Paper Digit
Minimalist Neutral Floral Lined Printable Paper DigitMinimalist Neutral Floral Lined Printable Paper Digit
Minimalist Neutral Floral Lined Printable Paper Digit
 
Project Concept Paper. Online assignment writing service.
Project Concept Paper. Online assignment writing service.Project Concept Paper. Online assignment writing service.
Project Concept Paper. Online assignment writing service.
 
Writing Paper Clipart Free Writing On Paper
Writing Paper Clipart Free Writing On PaperWriting Paper Clipart Free Writing On Paper
Writing Paper Clipart Free Writing On Paper
 
Best Friend Friendship Day Essay 015 Friendship Essay Examples
Best Friend Friendship Day Essay 015 Friendship Essay ExamplesBest Friend Friendship Day Essay 015 Friendship Essay Examples
Best Friend Friendship Day Essay 015 Friendship Essay Examples
 
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay Quotes
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay QuotesBest QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay Quotes
Best QUOTES For ESSAY Writing QUOTATIONS For Essay UPSC Essay Quotes
 
Cheap Essay Writing Service Uk. Online assignment writing service.
Cheap Essay Writing Service Uk. Online assignment writing service.Cheap Essay Writing Service Uk. Online assignment writing service.
Cheap Essay Writing Service Uk. Online assignment writing service.
 
Case Study Science. Case Study The Art And Scienc
Case Study Science. Case Study The Art And SciencCase Study Science. Case Study The Art And Scienc
Case Study Science. Case Study The Art And Scienc
 
Best Paper Writing Service By Bestewsreviews On DeviantArt
Best Paper Writing Service By Bestewsreviews On DeviantArtBest Paper Writing Service By Bestewsreviews On DeviantArt
Best Paper Writing Service By Bestewsreviews On DeviantArt
 
My Father Essay - Write An Essay On My Father My Hero (DAD) In English
My Father Essay - Write An Essay On My Father My Hero (DAD) In EnglishMy Father Essay - Write An Essay On My Father My Hero (DAD) In English
My Father Essay - Write An Essay On My Father My Hero (DAD) In English
 
My Mother Essay 1000 Words. Online assignment writing service.
My Mother Essay 1000 Words. Online assignment writing service.My Mother Essay 1000 Words. Online assignment writing service.
My Mother Essay 1000 Words. Online assignment writing service.
 
Definition Essay Examples Love. Online assignment writing service.
Definition Essay Examples Love. Online assignment writing service.Definition Essay Examples Love. Online assignment writing service.
Definition Essay Examples Love. Online assignment writing service.
 
How To Write A Paper For College Besttoppaperessay
How To Write A Paper For College BesttoppaperessayHow To Write A Paper For College Besttoppaperessay
How To Write A Paper For College Besttoppaperessay
 
Proposal Samples - Articleeducation.X.Fc2.Com
Proposal Samples - Articleeducation.X.Fc2.ComProposal Samples - Articleeducation.X.Fc2.Com
Proposal Samples - Articleeducation.X.Fc2.Com
 
Critical Analysis Essay Examples For Students
Critical Analysis Essay Examples For StudentsCritical Analysis Essay Examples For Students
Critical Analysis Essay Examples For Students
 
Homeschool Research Paper. Online assignment writing service.
Homeschool Research Paper. Online assignment writing service.Homeschool Research Paper. Online assignment writing service.
Homeschool Research Paper. Online assignment writing service.
 
Awesome Why Buy An Essay. Online assignment writing service.
Awesome Why Buy An Essay. Online assignment writing service.Awesome Why Buy An Essay. Online assignment writing service.
Awesome Why Buy An Essay. Online assignment writing service.
 
Sample Essay Questions - Leading For Change Final
Sample Essay Questions - Leading For Change FinalSample Essay Questions - Leading For Change Final
Sample Essay Questions - Leading For Change Final
 
Access To Justice Essay - Introduction Justice Is Def
Access To Justice Essay - Introduction Justice Is DefAccess To Justice Essay - Introduction Justice Is Def
Access To Justice Essay - Introduction Justice Is Def
 
Groundhog Day Writing Paper Teaching Resources
Groundhog Day Writing Paper  Teaching ResourcesGroundhog Day Writing Paper  Teaching Resources
Groundhog Day Writing Paper Teaching Resources
 
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.
Examples Of 6Th Grade Persuasive Essays. Online assignment writing service.
 

Recently uploaded

APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.MateoGardella
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 

Recently uploaded (20)

Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.Gardella_Mateo_IntellectualProperty.pdf.
Gardella_Mateo_IntellectualProperty.pdf.
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 

Automated Essay Scoring Versus Human Scoring A Reliability Check

  • 1. GLOBAL CHALLENGE ISSN 1857-8934 International Journal of Linguistics, Literature and Translation, Vol: III, issue 1 Automated Essay Scoring versus Human Scoring: A Reliability Check Ali Doğan, Ph.D.c., Associate Professor Azamat A. Akbarova, Ph.D., HakanAydoğan, Ph.D.c., Kemal GĂśnen, Ph.D.c., EnesTuncdemir, Ph.D.c zirvealidogan@gmail.com,azamatakbar@yahoo.com,aydoganh@hotmail.com, kemalgon@gmail.com,etuncdemir@ymail.com Abstract New materials have continuously been added to the assessment instruments in ELT day by day. The question of whether writing assessment in ELT can be done via E-RaterÂŽ was first addressed in 1996, and this system, which is commonly called “Automated Essay Scoring Systems” in especially America and Europe in recent years, has taken part in the field of assessment instruments of ELT with steady development. The purpose of this study is to find out whether AES can supersede the writing assessment system that is used at The School of Foreign Languages at Zirve University. It is performed at The School of Foreign Languages at Zirve University. The participants of the study were a group of 50 students in level C that is the equivalent of B1. The beginning of the quantitative study includes the assessment of essays written by C level students at The School of Foreign Languages at Zirve University by three human raters and E-RaterÂŽ. After the study it was found that the writing assessment has been currently used at The School of Foreign Languages at Zirve University costs more energy, more time and it is more expensive. Thus, AES was suggested for use at The School of Foreign Languages at Zirve University, which has proven to be more practicable.
  • 2. GLOBAL CHALLENGE ISSN 1857-8934 International Journal of Linguistics, Literature and Translation, Vol: III, issue 1 Introduction Automated Essay Scoring (AES) has become increasingly popular in academic institutions as an alternative solution to assess the writing skills in a fast, efficient, and accurate way. Automated essay scoring (AES) engines “employ computer technology to evaluate and score written prose. While some forms of writing, such as poetry, may never be covered, we estimate that approximately 90 percent of required writing in a typical college classroom can be evaluated using AES. Many universities and colleges are implementing, “one of the most innovative instructional technologies in use in college classrooms today: an online automated essay-scoring service from ETS called Criterion®” (Williamson, 2003). As Shermis&Barrera (2002) defined Automated Essay Scoring (AES) as using computer technology to evaluate and score the essays. Burstein also emphasizes that AES is mainly used because of reliability, validity, and cost issues in writing assessment. It is taking a great attraction from public schools, colleges, assessment companies, researchers and educators (Shermin&Brurstein, 2003). Project Essay Grader (PEG™), E-RaterÂŽ, CriterionÂŽ, Intelligent Essay Assessor™ (IEA), MYAccessÂŽ, and Bayesian Essay Test Scoring System™ (BETSY), IntelliMetricÂŽ are recently used common automated essay scoring systems. Although AES has become a more mainstream method of assessing essays there are critics who voice their skepticism and continuously point out the fallacies of such a grading system. Many individuals believe that such software programs are not sufficiently sophisticated enough to imitate human intelligence thus leading to faults in the grade assessment of each paper (as cited in Williamson, 2003). Although criticisms continue to be voiced the broader assessment community suggests that automated scoring does have valid applications for the assessment of writing (as cited in Williamson, 2003). The assessment of the writing skills of individuals in the world of academia can be a long and tedious process that may in fact, disrupt the performance of teachers and their ability to teach effectively. As a result, artificial intelligence has entered into the field of education and English as a method to resolve this sometimes repetitive and time-consuming process. Such Artificial Intelligence (A.I.) based scoring systems claims to bring relative efficiency to the automation of scoring essay by simulating human intelligence and behavior in the electronic system of a computer such software attempts to relieve the burden of grading from educators. This helps to
enhance human performance in more significant areas, such as educators' effectiveness in teaching the material that will be assessed in the essays.
Automated scoring as a form of writing assessment has been highly controversial. While critics focus on the movement from indirect to direct measurement of writing, many institutions are adapting their assessment methods to incorporate some form of scoring system processed by artificial intelligence (Simon & Bennett, 2007), and automated scoring technologies are finding wider acceptance among educators (Williamson, 2003). The Commonwealth of Pennsylvania has implemented IntelliMetric®, a scoring engine that assesses students' essay-writing skills; the engine is used to score students' writing on Pennsylvania's state achievement examinations (as cited in Williamson, 2003). Numerous states have followed Pennsylvania's model and have begun to implement their own automated essay scoring software similar to IntelliMetric® (Williamson, 2003).
Many who support such dramatic changes in essay assessment believe that automated scoring software not only relieves educators of the grading burden but also adds a level of consistency that human raters sometimes cannot achieve. Many factors, including stress and exhaustion, wear on those who assess papers and produce inconsistency in their grading. These concerns revolve around the unreliability of the evaluation of essay answers and of the individuals who assess them (Shermis & Barrera, 2004). Automated software is not affected by the factors that human graders are prone to, which increases consistency in the assessment of each individual student.
The primary concern with automation is whether the task itself is computable, rather than whether the software can perform to the standards of the academic institution implementing it. In most cases, the assessment of students' essays and writing skills is a computable task, so implementing an automated essay scoring system is feasible (Attali & Burstein, 2006). Although automated scoring technologies are finding wider acceptance among academic institutions, the drive to find new tools and more accurate AES software must not diminish. Criterion® has recently been updated to Version 4.0, which “marks a turning point in the history and practice of writing instruction and remediation” (Williamson, 2003). The evolution of such software will only help to advance the
validity of such programs and solidify their place in the world of academia as a popular method for assessing students' writing skills.
METHODOLOGY
Participants and procedure
We had 50 participants and their essays (49 essays were valid for grading by the electronic rater). Three human raters and one electronic rater graded the essays. The human raters were chosen for their similarity (e.g., the same gender, the same teaching profession, and employment in the same institution and the same department), so that we could control, as far as possible, other factors that might affect the final grades and their averages. The range of grades (marks) was from 1 to 6.
Instruments
We used an electronic rater that had the same range of grades as the human raters and applied the same criteria.
RESULTS AND DISCUSSIONS
First, we calculated the average values, standard deviations, minimal and maximal results, and ranges of the human raters' and the electronic rater's grades. The results are shown in Table 1.
Table 1. Descriptive statistical values
Rater          M     SD    min  max  range
first human    3.98  .97   2    6    4
second human   3.78  1.03  2    6    4
third human    3.92  .95   2    6    4
electronic     3.78  .98   1    6    5
As we can see from Table 1, the highest mean value (arithmetic mean) is that of the first human rater (M = 3.98), and the lowest mean values belong to the second human rater and the electronic rater (M = 3.78). The theoretical average value for the 6-point scale is 3.5, so these arithmetic means are close to it. The most variable grades are those of the second human rater (SD = 1.03) and the least variable are those of the third human rater (SD = .95). The theoretical range for the 6-point scale is 5, and only the electronic rater covers this full range (the human raters have a smaller range of four units).
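For readers who wish to reproduce this kind of summary, the short Python sketch below computes the same descriptive statistics (M, SD, min, max, range) per rater. It is only an illustration: the original analysis was run in a standard statistics package, and the grade arrays here are hypothetical placeholders rather than the study's actual data.

import numpy as np

rng = np.random.default_rng(0)
grades = {                                    # hypothetical 1-6 grades for 49 essays
    "first human": rng.integers(2, 7, 49),
    "second human": rng.integers(2, 7, 49),
    "third human": rng.integers(2, 7, 49),
    "electronic": rng.integers(1, 7, 49),
}

for rater, g in grades.items():
    g = np.asarray(g, dtype=float)
    # sample standard deviation (ddof=1), as reported in Table 1
    print(f"{rater:12s} M={g.mean():.2f} SD={g.std(ddof=1):.2f} "
          f"min={g.min():.0f} max={g.max():.0f} range={g.max() - g.min():.0f}")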
In order to test our first hypothesis, we calculated Pearson's product-moment coefficients of correlation; these data are provided in Table 2. Before that, we calculated the average result of the human raters for each essay. The most important cell of the table is the one where the average of the human raters and the electronic rater cross. The intercorrelations of the human raters provide information about their concordance, i.e., the similarity of their grading processes and results. They can also be useful for conclusions about the leniency or severity of our human raters, but they must be combined with their average results and with an examination of the shape of their distributions. If the correlation between the average results of the human raters and E-Rater® is high enough and statistically significant, then we can say that our first hypothesis is supported.
Table 2. Correlation matrix between the grades of the human raters and the electronic rater
Raters                   first human  second human  third human  human average  electronic
first human rater        1            .583**        .652**       .857**         .716**
second human rater       .583**       1             .641**       .862**         .712**
third human rater        .652**       .641**        1            .879**         .890**
average of human raters  .857**       .862**        .879**       1              .890**
electronic rater         .716**       .712**        .890**       .890**         1
* correlation coefficients are significant at the .05 level
** correlation coefficients are significant at the .01 level
As we can see from Table 2, all coefficients are statistically significant at the .01 level, so the results of all human raters correlate significantly with one another. The first human rater's results correlate more strongly with those of the third human rater (r = .652, p < .01) than with those of the second human rater (r = .583, p < .01). The second human rater's results also correlate more strongly with the grades of the third human rater (r = .641, p < .01). The average estimates of the human raters taken altogether correlate most strongly with the estimates of the third human rater (r = .879, p < .01). The grades of the third human rater are also the closest of all raters to the estimates provided by the electronic rater (r = .890, p < .01). The coefficient that is the most relevant indicator for accepting our first hypothesis is very high and statistically significant (r = .890, p < .01). Therefore, we can conclude that the electronic rater's grades are significantly correlated with the average results of all human raters. The common variance of these two "measures", i.e., the coefficient of determination, is r² = .7921, which means that the measures share 79.21% of their variability. Hence, our first hypothesis is supported.
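The correlation that carries the first hypothesis (human-rater average versus electronic rater) and its coefficient of determination can be computed as sketched below. The snippet continues from the hypothetical grades dictionary in the previous sketch; scipy.stats.pearsonr stands in here for the statistics package actually used in the study.

import numpy as np
from scipy.stats import pearsonr

# Average of the three (hypothetical) human raters, essay by essay.
human_avg = np.mean([grades["first human"],
                     grades["second human"],
                     grades["third human"]], axis=0)
electronic = np.asarray(grades["electronic"], dtype=float)

r, p = pearsonr(human_avg, electronic)   # Pearson product-moment correlation
print(f"r = {r:.3f}, p = {p:.4f}, shared variance r^2 = {r**2:.4f}")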
Our second hypothesis is defined in terms of the difference between the human raters' average estimates and those provided by the electronic rater. Because of that, and because our variables are dependent (paired), we conducted a paired-samples t-test. If its results show that the average values do not differ statistically, we can retain the null hypothesis. The results of the t-test are shown in Table 3. The first two columns contain the arithmetic means (M) and standard deviations (SD). The next four columns contain the standard errors of the arithmetic means (SEM), the difference of means (Mdiff), the standard deviation of the differences (SDdiff), and the standard error of the mean difference (SEMdiff). The last three columns contain the value of the t-statistic, the degrees of freedom (df), and the significance value (p).
Table 3. The results of the paired-samples t-test
Pair                     M     SD   SEM  Mdiff  SDdiff  SEMdiff  t       df  p
Electronic rater         3.78  .98  .14  -.11   .45     .06      -1.803  48  .078
Average of human raters  3.89  .85  .12
As we can see from Table 3, the mean value for the variable "average of human raters" is slightly higher than that of the electronic rater (3.89 vs. 3.78), but this difference is not statistically significant (t = -1.803, df = 48, p > .05). We can conclude that the average values of these raters do not differ meaningfully; the electronic rater's average assessment is therefore similar to the average of the human raters' assessments. Hence, our second hypothesis is confirmed.
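A paired-samples t-test of the kind reported in Table 3 can be run as in the sketch below, again on the hypothetical human_avg and electronic arrays defined above rather than the study's data.

from scipy.stats import ttest_rel

diff = electronic - human_avg                 # per-essay difference of the paired scores
t_stat, p_value = ttest_rel(electronic, human_avg)
print(f"Mdiff = {diff.mean():.2f}, SDdiff = {diff.std(ddof=1):.2f}")
print(f"t = {t_stat:.3f}, df = {len(diff) - 1}, p = {p_value:.3f}")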
Our third hypothesis is based on discriminativity, one of the four metric characteristics (the others being reliability, validity, and objectivity). We therefore checked whether the electronic rater evaluates essays in a manner that can detect small differences between them. It is not desirable for all students (essays) to receive similar marks, whether uniformly low or uniformly high. The most plausible outcome is that the rater assigns more average grades than low or high ones: we expect approximately equal numbers of very low and very high marks, but fewer of them than of average marks (e.g., 3 and 4 in this case). To test whether the distribution of the electronic rater's estimates differs from the normal curve, we conducted a Kolmogorov-Smirnov test (see Table 4).
Table 4. The results of normality tests for the electronic rater's grades
Test of normality   Statistic  df  p
Kolmogorov-Smirnov  .213       49  .000
Shapiro-Wilk        .891       49  .000
As we can see, the value of the Kolmogorov-Smirnov test is statistically significant (K-S z = .213, df = 49, p < .001). We also applied the Shapiro-Wilk test because our sample is small, and its results are consistent with those of the K-S test (Sh-W statistic = .891, df = 49, p < .001). Hence, the distribution differs statistically from the normal curve.
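The normality checks can be approximated as below. Note that SPSS-style output, which Table 4 resembles, typically applies a Lilliefors correction to the Kolmogorov-Smirnov statistic, whereas this sketch simply fits a normal distribution to the hypothetical sample and tests against it, so the numbers are only indicative.

from scipy.stats import kstest, shapiro

ks_stat, ks_p = kstest(electronic, "norm",
                       args=(electronic.mean(), electronic.std(ddof=1)))
sw_stat, sw_p = shapiro(electronic)           # recommended for small samples
print(f"Kolmogorov-Smirnov: statistic = {ks_stat:.3f}, p = {ks_p:.4f}")
print(f"Shapiro-Wilk:       W = {sw_stat:.3f}, p = {sw_p:.4f}")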
Figure 1 below shows the histogram of the electronic rater's grades, which helped us to judge the asymmetry of the distribution.
Figure 1. Distribution of the electronic rater's grades (histogram; M = 3.78, SD = 0.98, N = 49).
Examining Figure 1, we can see that the distribution is negatively asymmetric, i.e., low grades are noticeably less frequent than high grades. This allows two interpretations: either the electronic rater is a lenient evaluator of essays, or the distribution of grades reflects the fact that the students in our sample are above average in essay writing. Hence, our last hypothesis is rejected.
We also examined the shape of the distribution of the human raters' average grades. The results are displayed in Table 5.
Table 5. The results of normality tests for the human raters' average grades
Test of normality   Statistic  df  p
Kolmogorov-Smirnov  .163       49  .002
Shapiro-Wilk        .949       49  .034
The situation here is similar to the previous one: the Kolmogorov-Smirnov Z statistic is statistically significant (K-S z = .163, df = 49, p < .01) and the Shapiro-Wilk statistic is also significant (Sh-W = .949, df = 49, p < .05). Figure 2 shows the histogram for this variable.
Figure 2. Distribution of the human raters' average grades (histogram; M = 3.89, SD = 0.85, N = 49).
From Figure 2, we can conclude that this distribution is positively asymmetric, i.e., there are more low grades than high grades, so the human raters were more severe than the electronic rater. To conclude, our electronic rater provides results similar to those of the human raters (taken together, using the average of their estimates). However, the distribution of its grades shows that it gives students somewhat higher grades than they deserve, i.e., the electronic rater overestimates, to some degree, the quality of their essays.
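The direction of asymmetry read off Figures 1 and 2 can also be checked numerically with a sample skewness coefficient (negative values indicate a tail toward low grades, positive values a tail toward high grades). The sketch continues to use the hypothetical arrays from the snippets above.

from scipy.stats import skew

print(f"electronic rater skewness:      {skew(electronic):.3f}")
print(f"human raters' average skewness: {skew(human_avg):.3f}")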
Conclusion
There are some requirements for using AES to assess writing at universities: seminars on how to use AES, Internet access, personal computers, and a computer laboratory at the school. If the system is used in school computer laboratories, laboratory times must be carefully planned, because timing may be demotivating for students who are forced to complete writing activities within a limited time. A second issue is that institutions or universities have to buy the system if they want to use it effectively.
In this study, Criterion® was used as the AES tool to assess the students' writing. It is commonly used in America and Europe. It enables students to brainstorm about a given topic, share their ideas, outline them, write, and receive feedback from their teachers. Upon finishing the final draft, students upload it and receive detailed results in a report prepared by the system. This final step takes a few minutes, so it is time-saving rather than time-consuming, and it benefits both students and teachers. Teachers can follow and monitor their students' writing process and give individual, detailed feedback using the report generated by the system. Students receive immediate, detailed feedback on their writing, so they can see their improvement by comparing teacher and system feedback, and can improve further by analysing the teacher's feedback on their first draft and the system's feedback on their final draft. The system also minimizes the time required to assess students' writing: the three human raters in this study assessed and evaluated fifty pieces of writing in twenty days, whereas Criterion® did the same job in one hour. To conclude, if the requirements of the system are met and it is understood how the system works, AES can be used effectively at the university level.
References:
Anson, C. M. (2003). Responding to and assessing student writing: The uses and limits of technology. In P. Takayoshi & B. Huot (Eds.), Teaching writing with computers: An introduction (pp. 234–245). New York: Houghton Mifflin Company.
Attali, Y., Bridgeman, B., & Trapani, C. (2010). Performance of a generic approach in automated essay scoring. Journal of Technology, Learning, and Assessment, 10. Retrieved from http://ejournals.bc.edu/ojs/index.php/jtla/article/view/1603/1455
Burstein, J. (2003). The E-Rater® scoring engine: Automated essay scoring with natural language processing. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 113–122). Mahwah, NJ: Lawrence Erlbaum Associates.
Burstein, J., & Marcu, D. (2000). Benefits of modularity in an automated essay scoring system (ERIC Reproduction Service No. TM 032 010).
Burstein, J., Chodorow, M., & Leacock, C. (2003, August). Criterion®: Online essay evaluation: An application for automated evaluation of student essays. Proceedings of the 15th Annual Conference on Innovative Applications of Artificial Intelligence, Acapulco, Mexico.
Burstein, J., Kukich, K., Wolff, S., Lu, C., & Chodorow, M. (1998, April). Computer analysis of essays. Proceedings of the NCME Symposium on Automated Scoring, Montreal, Canada.
Burstein, J., Leacock, C., & Swartz, R. (2001). Automated evaluation of essays and short answers. Proceedings of the 5th International Computer Assisted Assessment Conference (CAA 01), Loughborough University.
Carlson, J. G., Bridgeman, B., Camp, R., & Waanders, J. (1985). Admission test scores to writing performance of native and nonnative speakers of English (TOEFL Research Report No. 19). Princeton, NJ: Educational Testing Service.
Carlson, S. B., & Bridgeman, B. (1986). Testing ESL student writers. In K. L. Greenberg, H. S. Wiener, & R. A. Donovan (Eds.), Writing assessment: Issues and strategies (pp. 126–152). NY: Longman.
China Papers. (n.d.). Retrieved September 16, 2012, from http://www.chinapapers.com.
Chung, K. W. K., & O'Neil, H. F. (1997). Methodological approaches to online scoring of essays (ERIC Reproduction Service No. ED 418 101).
Cooper, C. R., & Odell, L. (Eds.). (1977). Evaluating writing: Describing, measuring, judging. Urbana, IL: NCTE.
Diederich, P. B., French, J. W., & Carlton, S. T. (1961). Factors in judgments of writing ability (ETS Research Bulletin RB-61-15). Princeton, NJ: Educational Testing Service.
Dikli, S. (2006). An overview of automated scoring of essays. Journal of Technology, Learning, and Assessment, 5(1). Retrieved July 15, 2012, from http://www.jtla.org.
Educational Testing Service (ETS). (n.d.). Retrieved August 17, 2012, from http://www.ets.org.
Elliot, S. (2003). IntelliMetric®: From here to validity. In M. D. Shermis & J. C. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective. Mahwah, NJ: Lawrence Erlbaum Associates.
Greer, G. (2002). Are computer scores on essays the same as essay scores from human experts? In Vantage Learning, Establishing WritePlacer validity: A summary of studies (pp. 10–11) (RB-781). Yardley, PA: Author.
Gregory, K. (1991). More than a decade's highlight? The holistic scoring consensus and the need for change (ERIC Document Reproduction Service No. ED 328 594).
Hamp-Lyons, L. (1990). Basic concepts. In L. Hamp-Lyons (Ed.), Assessing second language writing in academic contexts (pp. 5–15). Norwood, NJ: Ablex.
Herrington, A., & Moran, C. (2001). What happens when machines read our students' writing? College English, 63(4), 480–499.
Hughes, A. (1989). Testing for language teachers. NY: Cambridge University Press.
Huot, B. (1990). Reliability, validity and holistic scoring: What we know and what we need to know. College Composition and Communication, 41(2), 201–213.
Huot, B. (2002). (Re)Articulating writing assessment for teaching and learning. Logan, UT: Utah State University Press.
Huot, B., & Neal, M. (2006). Writing assessment: A techno-history. In C. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of writing research (pp. 417–432). New York: Guilford Publications.
Jacobs, H. L., Zinkgraf, S. A., Wormuth, D. R., Hartfiel, V. F., & Hughey, J. B. (1981). Testing ESL composition: A practical approach. Rowley, MA: Newbury House.
Klobucar, A., Deane, P., Elliot, N., Ramineni, C., Deess, P., & Rudniy, A. (2012). Automated essay scoring and the search for valid writing assessment. In C. Bazerman, C. Dean, J. Early, K. Lunsford, S. Null, P. Rogers, & A. Stansell (Eds.), International advances in writing research: Cultures, places, measures (pp. 103–119). Fort Collins, CO / Anderson, SC: WAC Clearinghouse / Parlor Press. Retrieved from http://wac.colostate.edu/books/wrab2011/chapter6.pdf
Kukich, K. (2000). Beyond automated essay scoring. IEEE Intelligent Systems, 15(5), 22–27.
Lloyd-Jones, R. (1987). Test of writing ability. In G. Tate (Ed.), Teaching composition (pp. 155–176). Texas Christian University Press.
Mann, R. (1988). Measuring writing competency (ERIC Document Reproduction Service No. ED 295 695).
Mitchell, K., & Anderson, J. (1986). Reliability of holistic scoring for the MCAT essay. Educational and Psychological Measurement, 46, 771–775.
Nivens-Bower, C. (2002). Faculty-WritePlacer Plus score comparisons. In Vantage Learning, Establishing WritePlacer validity: A summary of studies (p. 12) (RB-781). Yardley, PA: Author.
Norusis, M. J. (2004). SPSS 12.0 guide to data analysis. Upper Saddle River, NJ: Prentice Hall.
Page, E. B. (1994). Computer grading of student prose, using modern concepts and software. Journal of Experimental Education, 62, 127–142.
Page, E. B. (2003). Project Essay Grade: PEG. In M. D. Shermis & J. Burstein (Eds.), Automated essay scoring: A cross-disciplinary perspective (pp. 43–54). Mahwah, NJ: Lawrence Erlbaum Associates.
Page, E. B., & Petersen, N. S. (1995). The computer moves into essay grading: Updating the ancient test. Phi Delta Kappan, 76, 561–565.
Page, E. B., Poggio, J. P., & Keith, T. Z. (1997, March). Computer analysis of student essays: Finding trait differences in student profile. Paper presented at the Annual Meeting of the American Educational Research Association, Chicago, IL (ERIC Document Reproduction Service No. ED 411 316).
Perkins, K. (1983). On the use of composition scoring techniques, objective measures, and objective tests to evaluate ESL writing ability. TESOL Quarterly, 17(4), 651–671.
Powers, D. E., Burstein, J. C., Fowles, M. E., & Kukich, K. (2001, March). Stumping E-Rater®: Challenging the validity of automated essay scoring (GRE Board Research Report No. 98-08bP). Princeton, NJ: Educational Testing Service. Retrieved November 15, 2012, from http://www.ets.org/Media/Research/pdf/RR-01-03-Powers.pdf
Rudner, L., & Gagne, P. (2001). An overview of three approaches to scoring written essays by computer (ERIC Digest No. ED 458 290).
Rudner, L. M., Garcia, V., & Welch, C. (2006). An evaluation of the IntelliMetric® essay scoring system. Journal of Technology, Learning, and Assessment, 4(4). Retrieved August 6, 2012, from http://escholarship.bc.edu/jtla/vol4/4/
Shermis, M., & Barrera, F. (2002). Exit assessments: Evaluating writing ability through Automated Essay Scoring (ERIC Document Reproduction Service No. ED 464 950).
Shermis, M. D., Raymat, M. V., & Barrera, F. (2003). Assessing writing through the curriculum with Automated Essay Scoring (ERIC Document Reproduction Service No. ED 477 929).
Shermis, M. D., Shneyderman, A., & Attali, Y. (2008). How important is content in the ratings of essay assessments? Assessment in Education: Principles, Policy and Practice, 15, 91–105.
Shermis, M. D., Koch, C. M., Page, E. B., Keith, T. Z., & Harrington, S. (2002). Trait ratings for automated essay grading. Educational and Psychological Measurement, 62(1), 5–18.
Vantage Learning. (2000a). A study of expert scoring and IntelliMetric® scoring accuracy for dimensional scoring of Grade 11 student writing responses (RB-397). Newtown, PA: Vantage Learning.
Vantage Learning. (2000b). A true score study of IntelliMetric® accuracy for holistic and dimensional scoring of college entry-level writing program (RB-407). Newtown, PA: Vantage Learning.
Vantage Learning. (2001a). Applying IntelliMetric® to the scoring of entry-level college student essays (RB-539). Retrieved September 1, 2012, from http://www.vantagelearning.com/content_pages/research.html
Vantage Learning. (2001b). Applying IntelliMetric® technology to the scoring of 3rd and 8th grade standardized writing assessments (RB-524). Newtown, PA: Vantage Learning.
Vantage Learning. (2002). A study of expert scoring, standard human scoring and IntelliMetric® scoring accuracy for state-wide eighth grade writing responses (RB-726). Newtown, PA: Vantage Learning. Retrieved September 10, 2012, from http://www.vantagelearning.com/content_pages/research.html
Vantage Learning. (2003a). Assessing the accuracy of IntelliMetric® for scoring a district-wide writing assessment (RB-806). Newtown, PA: Vantage Learning.
Vantage Learning. (2003b). How does IntelliMetric® score essay responses? (RB-929). Newtown, PA: Vantage Learning.
Vantage Learning. (2003c). A true score study of 11th grade student writing responses using IntelliMetric® Version 9.0 (RB-786). Newtown, PA: Vantage Learning.
Vantage Learning. (2003). A comparison of IntelliMetric® and expert scoring for the evaluation of literature essays (RB-793). Retrieved December 2, 2012, from http://www.vantagelearning.com/content_pages/research.html
Wang, J., & Brown, M. S. (2007). Automated essay scoring versus human scoring: A comparative study. The Journal of Technology.
Wang, J., & Brown, M. S. (2008). Automated essay scoring versus human scoring: A correlational study. Technology and Teacher Education, 8(4). Retrieved from http://www.citejournal.org/vol8/iss4/languagearts/article1.cfm
Warschauer, M., & Ware, P. (2006). Automated writing evaluation: Defining the classroom research agenda. Language Teaching Research, 10(2), 1–24. Retrieved June 18, 2012, from http://www.gse.uci.edu/faculty/markw/awe.pdf
White, E. M. (1994). Teaching and assessing writing: Recent advances in understanding, evaluating, and improving student performance. San Francisco, CA: Jossey-Bass.
Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies for validating computer automated scoring. Applied Measurement in Education, 15(4), 391–412.