A Framework Of A Computer-Based Essay Marking System For ESL Writing.

A Framework of a Computer-Based Essay Marking System for ESL Writing
Saadiyah Darus
adi@pkrisc.cc.ukm.my
Abdullah Mohd Zin
amz@ftsm.ukm.my
Universiti Kebangsaan Malaysia
Abstract
The main objective of this research is to develop the framework of a Computer-Based Essay
Marking (CBEM) system for writing in ESL (English as a Second Language) at Institutions
of Higher Learning (IHLs) in Malaysia. An initial study shows that a number of CBEM
systems are available. In order to determine whether they are suitable for marking students’
writing in ESL, the study investigated lecturers’ and students’ expectations of the CBEM
systems using questionnaire surveys. The study also uses Criterion to mark students’ essays.
The results of this study suggest that existing CBEM systems are not suitable for marking
ESL writing at IHLs in Malaysia. This is due to the fact that lecturers and students have
certain expectations these CBEM systems failed to meet. In this paper, we will describe the
proposed framework of a CBEM system for ESL writing at IHLs in Malaysia. Since this
framework is intended to be used by the software designer in designing and implementing the
system, we will describe the framework in the form of the software requirements.
Keywords: second language writing, software requirement, assessment
Introduction
Essay writing occupies a central place at IHLs because it serves two main purposes: as a tool
of assessment and as an avenue to learning (Hounsell 1997). However, essay marking poses a
real challenge to lecturers as there are numerous aspects that can be evaluated, and lecturers
encounter several problems when dealing with written assignments; for example, there are
difficulties since there are deadlines to meet; stressful since lecturers need to give prompt
feedback to students, and time-consuming. With an increase in enrollment, other problems
also surface: the number of essays assigned may decrease; frequency of feedback given to
students may decrease; and in the end, less feedback is given. Apart from these issues, rater
or marker (as it is known in Malaysia) reliability is particularly important because differences
in judgment exist. Inter-rater reliability problems exist when differences in judgment exist
between different raters or markers, while intra-rater reliability problems may arise when the
same rater marks different papers using different judgments. According to Gamaroff (2000:
32), “Interrater reliability consists of two major kinds of judgments: (a) the order of priority
for individual raters of performance criteria (such as grammatical accuracy, content relevance
and spelling), and (b) the agreement between raters on the ratings that should be awarded if
or when agreement is reached on what importance to attach to different criteria”.
AsiaCall Online Journal (ISSN 1936-9859) Vol. 2 No. 1 November 2007 Saadiyah Darus, Abdullah Mohd Zin
1

Based on these issues, it is necessary to develop a CBEM system where marking can be
carried out automatically, feedback can be given to students on time, and the marks given are
more reliable. The research reported in this paper is part of a wider research project involving
developing the software for marking essays in ESL for proficiency level at IHLs in Malaysia.
The objective of the initial phase of the research is to develop the framework of a CBEM
system for this purpose. The research questions formulated for this study are as follows:
1. What are the expectations of the lecturers at IHLs in Malaysia towards the CBEM
system?
2. What are the expectations of students at IHLs towards the CBEM system? To what
extent are available CBEM systems useful in marking essays and providing feedback
to students?
3. What are the dimensions of writing that must be incorporated into the framework for a
CBEM system for marking students’ essays at proficiency level at IHLs in Malaysia?
Can these dimensions be refined further into attributes?
4. How do we measure the level of achievement for each of these dimensions of writing
quantitatively?
Literature review
A number of CBEM systems are available; for example, Methodical Assessment of Reports
by Computer (Marshall and Baron 1987), Markin 32 (Holmes 1996), Project Essay Grader
(Page, Fisher and Fisher 1968; Hiller 1998; Page and Petersen 1995; Shermis et al.),
Intelligent Essay Assessor (Landauer, Foltz and Laham 1998), and Criterion (Burstein,
Chodorow and Leacock 2003). However, these CBEM systems are not developed for the
Malaysian educational environment. Therefore, there is a need to investigate whether these
CBEM systems are suitable for marking essays in ESL for Malaysian IHLs.
A series of studies was carried out to investigate the suitability of these systems. One such
study investigated the lecturers’ expectations of a CBEM system (Darus and Stapa 2001). In
this study, the following issues were addressed: (a) Are the lecturers aware of the availability
of CBEM systems? (b) What are the lecturers’ opinions about a CBEM system? (c) What are
the expected functions that lecturers look for in a CBEM system?
Another related study investigated the students’ expectations of a CBEM system (Darus,
Hussin and Stapa 2001). The issues addressed in this study were as follows: (a) In what areas
of essay writing would the students prefer to receive feedback (b) Which level of feedback is
most useful to students (c) Are available CBEM systems able to provide feedback in these
areas of essay-writing?
From both of these investigations, we conclude that currently available CBEM systems are
not suitable for the Malaysian educational environment since they do not address all of the
needs and expectations of the lecturers and students. However, one CBEM system, Criterion,
comes quite close to their expectations.
The study by Darus, Hussin and Stapa (2004) determined whether feedback given by
Criterion is useful to Malaysian students. The issues addressed in this study were as follows:
(a) In which areas of essay-writing do students find feedback provided by Criterion useful to
2

them? (b) Is feedback given by Criterion useful in revising their essays? (c) Is feedback given
by Criterion more informative than feedback normally given by their lecturers?
The results of these three studies further strengthened the conclusion that there is a need to
develop a new CBEM system for ESL writing in Malaysian IHLs. In developing the
framework of the new CBEM system, we will have to take into consideration the functions
expected by the lecturers and students.
The next part of the research is concerned with development of the framework of a CBEM
system that is suitable to be used for writing courses at IHLs in Malaysia. This part of the
study involved analysis of scoring rubrics for writing courses at proficiency level (Darus
2006). The issues addressed in this study were as follows: (a) What are the dimensions of
writing in ESL that are frequently assessed at proficiency level at various IHLs? (b) Can these
dimensions be refined further into attributes?
Textual analysis of students’ essays followed next. This includes readability, concordance,
error, and coherence analysis. This part of the study attempted to answer the following
research issues: What are the measurable characteristics of Malaysian students’ writing? For
readability analysis, we used the readability tools provided by Microsoft Word for Windows.
We also analyzed the types of sentences in the students’ essays manually; for example,
percentage of simple, basic, loose, periodic and combination sentences. Students’ essays were
analyzed for function words and for different number of words used by using a concordance.
Analysis of errors is carried out based on the error classification scheme developed by Ho
Peng (1974). For coherence analysis, we employed TSA (Topical Structure Analysis)
developed by Lautamatti (1987) and the coding guidelines for TSA as proposed by Schneider
and Connor (1991). The framework was then formalized based on the results of these five
consecutive studies.
Background of the study
Eighty Malaysian university lecturers participated in the study investigating lecturers’
expectations of a CBEM system (Darus and Stapa 2001). The lecturers were from Universiti
Kebangsaan Malaysia (UKM), Universiti Putra Malaysia (UPM), Universiti Malaya (UM),
Universiti Malaysia Sarawak (UNIMAS) and Universiti Sains Malaysia (USM). The
instrument for this survey was a questionnaire, which consisted of 18 questions. The results
of the study show that quite a substantial number of lecturers (40.0%) have heard about
CBEM systems. Lecturers, however, had mixed opinions as to whether computers were able
to mark essays effectively. Although most of the lecturers (75.0%) did not believe that
computers can presently mark essays effectively, 67.0 % of them believed that computers
would be beneficial to them if computers could mark essays. From the lecturers’ point of
view, the most important functions that need to be supported by a CBEM system are the
ability to indicate errors, mark syntax, provide error statistics, produce a letter grade, and
mark organization of ideas.
A total number of nine-hundred and eighty-one students participated in the study
investigating students’ expectations of a CBEM system (Darus, Hussin and Stapa 2001).
These students sat for proficiency courses at two universities; namely, UKM and Open
University Malaysia (OUM). The instrument used was a questionnaire consisting of 12
questions. The results show that the three most important areas of feedback that students
would expect to receive in essay writing related to errors in the essay topic (74.1%), errors
3

in English (53.3%) and organization of ideas (50.1%). A plausible reason for this
expectation is that these were the most difficult areas in essay writing for ESL learners in
Malaysia. Most students particularly would like to know why their answers are correct or
incorrect. This implies that the students were more interested in the higher level of
diagnostic feedback as this type of feedback gives more insight into the strengths and
weaknesses in their essays. The most desirable function from the students’ point of view is
indicating errors in essays. The next most desirable functions in descending order are as
follows: marking related to errors in English, marking related to errors in the organization of
ideas, style of writing, knowledge content, creativity, rhetorical structure, syntax, and
coherence of text.
A study with Criterion (Darus, Hussin and Stapa 2004) was carried out with students’ essays.
The sample consisted of essays written by seventy-one second year B.A. ELS (English
Language Studies) students at UKM. (46 of these 71 students completed all requirements for
the study, and our final sample size thus became 46.) The students were asked to write an
essay consisting of approximately 500 words in a 4-5 paragraph format using the following
prompt:
Many adults become upset when young people break with the
traditions of the past. Do you think that these adults are justified in
reacting this way? Why or why not? Support your position with
evidence from your own experience or the experience of people you
know.
The instrument for this study was a questionnaire that consisted of 13 questions. The students
were given two hours to complete their essays. At the end of the 2-hour session, students
submitted their essays written on paper. These essays were then transferred into computer
readable format as they were written. No attempt was made to correct these essays. These
essays were then submitted to Criterion and the report and diagnostic feedback were printed.
After a lapse of 3 weeks, essays were given back to students in the form of soft copy. The
printed report and diagnostic report were also returned to them. The students were required to
revise their essays by making use of the printed report as well as the diagnostic report. The
revised essays were saved in the same diskettes clearly identified by their matriculation
number. After revising their essays, the students answered the questionnaire, and these
diskettes were collected from them. These revised essays were re-submitted to Criterion for
marking.
The results of the study show that the most useful area of feedback from Criterion is feedback
on errors in their essay (16.0 %). The next useful feedback area is topic or knowledge content
(14.0%). The less useful areas of feedback are: syntax (13.0%), style of writing and
coherence of text (12.0%), rhetorical structure, organization of ideas and creativity (11.0%).
31 students (67.4%) found that Criterion feedback was only useful to some extent in revising
their essay. Eleven students (23.9%) found Criterion feedback to be very useful, and three
students (6.5%) found that it was not very useful. Fifteen students (32.6%) found that
Criterion feedback was more informative than feedback normally given by their lecturers
while thirty students (65.2%) did not find that it was so. The results also show that although
students faithfully revised their essays based on the feedback given by Criterion, the revisions
made were not able to increase the scores of revised essays significantly for the majority of
the students.
4

The next phase of the study involves development of the framework. In developing the
framework, the study has analyzed several scoring rubrics for writing proficiency; namely,
scoring rubrics for College Board SAT (2001), International English Language Testing
System (IELTS 2005), Teaching of English as a Foreign Language (TOEFL 2001), ESL
Composition Profile (Jacobs et al. 1985), and writing proficiency courses at UKM and OUM.
Most of these scoring rubrics are given in holistic format. For a human marker, holistic
marking is indeed very practical. However, Cohen (1994: 316) has noted that holistic scoring
is not suitable for marking second language writing because, “…the rating scale may
confound writing ability and language proficiency”. Therefore, an analytic marking scheme is
much more welcomed for marking ESL writing. In addition, for a computer to be able to
understand the holistic scoring, it must be converted into quantitative representation.
The process of identifying dimensions of writing from the scoring rubrics is carried out by
analyzing carefully each of the statements in the rubrics against Cohen’s (1994: 307)
dimensions of ESL writing at proficiency level. These are: (a) content – depth and breadth of
coverage; (b) rhetorical structure – clarity and unity of the thesis; (c) organization – sense of
pattern for the development of ideas; (d) register – appropriateness of level of formality; (e)
style – sense of control and grace; (f) economy – efficiency of language use; (g) accuracy of
meaning – selection and use of vocabulary; (h) appropriateness of language conventions –
grammar, spelling, punctuation; (i) reader’s understanding – inclusion of sufficient
information to allow meaning to be conveyed; and (j) reader’s acceptance – efforts made in
the text to solicit the reader’s agreement, if so desired. In our analysis of scoring rubrics, we
have identified that there are six dimensions of writing at proficiency level. These are
content, rhetorical structure, organization, economy, accuracy of meaning, and
appropriateness of language use. The dimensions of writing are further broken up into
attributes so that the marking process by computer is made possible. We have also identified
the attributes for each of these dimensions. A more detailed description of this study is
available in Darus’s (2006) study.
In order to propose a technique to measure each of these attributes, the research needs to
proceed further. This involves obtaining some empirical data about Malaysian students’
writing that could be used to indicate the students’ achievement or performance level for each
of the attributes. In order to obtain empirical data, we carried out textual analysis of students’
essays. The sample for the study is the same sample as the one used in Darus, Hussin and
Stapa (2004). It consisted of seventy-one essays written by second year B.A. ELS (English
Language Studies) students at UKM for the same prompt.
The results of readability analysis show that majority of the students essays (62.0%) reached
a minimum value of 60.0 for Flesch Reading Ease, while 27 students’ essays (38.0%) fell
below this level. The Flesch Reading Ease for 12 students’ essays (16.9%) is more than 70.0.
For usage of simple sentences, 16.0% of essays contain more than 75.0% of simple
sentences. 20.0% of essays contain 65-74% of simple sentences, 17% of essays contain
55-64% of simple sentences, 20% of essays contain 45-54% of simple sentences, 13% of
essays contain 35-44% of simple sentences, and 14% of essays contain less than 35% of
simple sentences.
Each of the seventy-one essays was analyzed for basic, loose, periodic and combination
sentences. Thus, each of the essays has four values corresponding to the percentage of basic,
5

loose, periodic and combination sentences. The essays that students submitted can be
categorized into no variety, extremely limited variety, very limited variety, limited variety,
moderate variety, high variety, and very high variety of sentence structure. 11.3% of essays
submitted have a very high variety of sentence structure, 33.8% of essays have a high variety
of sentence structure, 39.4% have moderate variety, 11.3% have low variety, and 4.2% have
very low variety.
A concordance was used to analyze students’ essays for function words and for different
number of words. It was found that these students, being ESL learners, overuse and under-use
some function words. The most prominent function words that were overused are ‘their’,
‘our’, ‘when’, ‘so’, ‘all’, and ‘no’. Function words that were underused were ‘be’, ‘have’, ‘it’,
‘do’, ‘on’, and ‘at’. Most students’ essays (41) used 30-40 function words. 15 essays used
20-30 function words, 12 essays used 40-50 function words, and 3 essays used 10-20
function words. For percentage of number of different words/total number of words in essay,
the majority of essays (47) show a percentage of 40.0-49.9. 17 essays show a percentage of
30.0-39.9. 5 essays show a percentage of 50.0-59.9 while 2 essays show a percentage of
60.0-69.9.
Analysis of errors was carried out based on the error classification scheme developed by Ho
Peng (1974). The most frequent types of errors that the students made were lexical, tenses,
and errors in mechanics. The errors of average value were word order, use of pronoun,
agreement and article. Other less frequent types of errors that students made were negative
construction, incomplete sentences, typical use of Malaysian words, possessive and
attributive structures, miscellaneous unclassifiable errors, and infinitive and gerundive
constructions. The majority of the students (86.0%) exhibit between 6 and 9 types of errors,
11.3% of students display between 4 and 5 types of errors, while 2.8% of students display
between 11 and 12 types of errors.
For coherence analysis, a majority of the essays (24) show 65.0-74.9% of SP (Sequential
progression) and ExP (Extended parallel Progression). 19 essays show 55.0-64.9% of SP and
ExP, 16 essays show more than 75.0% of SP and ExP, 10 sows 45.0-54.9% of SP and ExP,
while 2 essays show 35.0-44.9% SP and ExP.
Based on the results of all these studies, we will describe the system requirements for a
CBEM system for marking essays in ESL for proficiency level at IHLs in Malaysia.
System requirements
Required inputs to the system
The main role of a CBEM system is to mark students’ essays one by one based on a given
marking scheme. In consequence, the following aspects are addressed.
1. What is the format of the students’ essays?
2. What is the expected length of the essays?
3. What are the types of essays that need to be marked?
Format of students’ essays
In order to submit an essay for computerized marking, it must be in a computer readable
form. The most common method in preparing a computer readable document is to use a text
6

editor or a word processor. In the survey conducted, most of the students used word
processors in preparing their computer readable essay. Most word processors allow users to
save a document either in text format or a rich text format.
Some CBEM systems, for example Criterion, insist that all essays submitted must be in text
format. This is considered an unnecessary constraint on the user, as documents that are saved
in rich text format must be reformatted into a document in text format. Thus, requirement for
the system can be stated as follows:
Non-Functional Requirement 1.1: The CBEM system must be able to accept essays in text or
rich text format.
Expected length of essays
The results of the study carried out by (Darus and Stapa 2001) show that most of the lecturers
assigned essays up to 1,000 words. Some lecturers assigned essays of more than 3,000 words
in length. Thus, we can say that
Non-Functional Requirement 1.2: The CBEM system must be able to accept essays up to
5,000 words.
Types of essays
There are two types of essays: close-ended and open-ended essays. The result of the survey
(Darus and Stapa 2001) shows that 40.0 percent of lecturers assigned open-ended essays most
of the time while 10.0 percent gave close-ended essays. Since both types of essays are
important, we can state the following as another requirement of the system:
Non-Functional Requirement 1.3: The system must be able to mark both open-ended and
close-ended essays.
Required outputs of the system
Results from these studies (Darus and Stapa 2001) and (Darus, Hussin and Stapa 2001) show
that lecturers as well as students expect the following output:
1. Grade or score obtained
2. Error statistics
3. Various types of common feedback
4. Varying levels of feedback
5. Individualized feedback
Types of grade
A study of the scoring rubrics shows that four types of grades are used:
1. Letter grade. For example, Written Communication at UKM uses the following letter
grades: A (15-20 marks), B+/B/B- (13-14.9 marks), C+/C/C- (11-12.9), D+/D (9-10.9)
and E (0-8.9).
2. Numerical value. The College Board SAT uses 1-6 as numerical values.
7

3. Marks – for example ESL Composition Profile uses 18-20 (very good to excellent),
14-17 (average to good), 10-13 (poor to fair) and 7-9 (very poor) for organization
4. Range of performance – for example IELTS uses ten levels of competencies, namely
Expert user, Very good user, Good user, Competent user, Modest user, Limited user,
Extremely limited user, Intermittent user, Non-user and Did not attempt the test.
The use of the different types of grades suggests that these grades are equally acceptable,
depending on what the assessor considers as satisfactory in indicating the students’
achievement level. Thus, we can state our next requirement as follows:
Functional Requirement 2.1: The CBEM system must be able to indicate the grade obtained
in an essay by a letter grade, numerical value, marks or range of performance as indicated by
the marking scheme used.
Error statistics
As indicated by this study (Darus and Stapa 2001), lecturers consider error statistics to be
important. The reason is that lecturers can make use of these statistics to prepare further
lessons in order to improve students’ writing. The study also reveals that common errors of
ESL writing made by students at IHLs in this particular country are as follows:
1. Lexical errors (word choice, word form, informal usage, idiom error, pronoun error)
2. Morphological errors (tenses, article, agreement)
3. Mechanics (punctuation and spelling)
4. Word order
5. Infinitive and gerundive constructions
6. Possessive and attributive structures
7. Typical use of Malaysian words
8. Incomplete sentences
9. Negative construction
10. Miscellaneous errors
Thus, we can state the next requirement of the system as follows:
Functional Requirement 2.2: The system must be able to produce error statistics for the
following types of errors; namely, lexical errors (word choice, word form, informal usage,
idiom error, pronoun error), morphological errors (tenses, article, agreement), mechanics
(punctuation and spelling), word order, infinitive and gerundive constructions, possessive and
attributive structures, typical use of Malaysian words, incomplete sentences, negative
construction, and miscellaneous errors.
Feedback of various types
Feedback is an important element in the learning process. However, the amount and types of
feedback given must conform to the needs of the learners. Types of feedback expected by
both lecturers and students are: errors in the essay, and organization. Thus, we can conclude
that the CBEM system for ESL writing in tertiary education in Malaysia must be able to give
feedback for the following areas: errors and organization. The following two statements can
be drawn up as a requirement of the system.
8

Functional Requirement 2.3: The system must be able to provide feedback on errors in the
essay.
Functional Requirement 2.4: The system must be able to provide feedback on organization of
the essay.
Varying levels of feedback
There are basically two levels of feedback, namely diagnostic feedback and prescriptive
feedback. Most of the students expect diagnostic feedback rather than prescriptive feedback.
Thus, we can state this as the next functional requirement of the system.
Functional Requirement 2.5: The system must be able to provide a diagnostic level of
feedback to the students.
Individualized feedback
It is also observed that lecturers expect the system to provide individualized feedback. Thus,
the next requirement of the system is as follows:
Functional Requirement 2.6: The system must be able to provide individualized feedback to
students.
Knowledge that needs to be provided to the system
Marking scheme
The most important knowledge that needs to be provided is the marking scheme. Only by
having the appropriate marking scheme will lecturers be able to mark students’ essays
efficiently. Since the system is performing a similar task, the marking scheme is also needed
by the system. Therefore, one way for a lecturer to inform the scoring rubric about a
particular writing task is by indicating the attributes that are being assessed. For each of the
attributes, the lecturer needs to state the level of achievement. The importance of each of the
attributes differs from one writing task to another. The lecturer can indicate the degree of
importance of each attribute by giving a suitable weight for that particular attribute. Thus, the
marking scheme for a writing task can be given by filling out the information in Table 1.
Since the output grade can be given in various forms, a lecturer also needs to indicate the
grade that will be used. One way of doing this is by filling in the values in Table 2.
Table 1: Information for marking scheme
Dimensions Attributes Number of
achievement
levels
Weight
(%)
Content Relevance and knowledge of
the writing task
Rhetorical StructureAbility to use complex language
Development of thesis
9

Organization Cohesiveness
Clarity of ideas
Economy Consistent use of language
Variety of sentence structure
Accuracy of Mean-
ing
Ability to use suitable vocabu-
lary or word choice
Appropriateness of
language use
Frequency of errors
Command of language
Table 2: Information for marks and grade
Range of marks Grade
Thus, we can state the next requirement of the system as follows:
Functional Requirement 2.7: The system must provide a user interface for the lecturer to key
in the information about the marking scheme. The interface must allow the user to indicate
the dimensions and attributes as shown in Table 3. It should also provide a means for the
lecturer to indicate the grade to be given as shown in Table 4.
Process that needs to be undertaken by the system
The process of assigning a grade for a particular piece of writing can be done by adding
together the marks obtained for each of the attributes assessed for that particular writing task.
Suppose that for the ith attribute, the weight is given as wi, the number of achievement level
is li and the student’s achievement for that particular attribute is ai, the marks obtained for that
particular attribute are
Marks for the ith attribute = wi . ai/ li
The total marks for that particular piece of writing can then be calculated as follows:
Total marks = ∑ wi . ai/ li
For example, suppose that a piece of writing is assessed by using the marking scheme as
given in Table 5 and suppose that the achievement levels obtained by each of the attributes
are 6 (for ability to use complex language), 5 (for ability to use vocabulary or word choice)
and 4 (for command of language). So, the total marks obtained are
Total marks = 25 (6/6) + 25 (5/6) + 50 (4/6) = 25 + 20.8 + 33.3 = 79.1
Based on Table 6, a writing assignment that obtains 79.1 marks is given Band 8 as a score.
Accordingly, we can state the next requirement of the system as follows:
10

Functional Requirement 2.8: The total marks obtained by a student for a particular writing
task are given by the following formulae.
where wi is the weight
ai is the student’s achievement for that particular attribute and
li is the number of achievement level
(a) Measuring relevance and knowledge of the writing task
Criteron has provided sample essays of different scores. The number of content words relate
to relevance and knowledge of he writing ask. Thus, the number of content words for each of
these essays is determined and based on the number of overlapped content words, the score of
a particular essay can be given according to Table 3.
Table 3: Relevance and knowledge of the writing task
Score 6
(highest)
Score 5 Score 4 Score 3 Score 2
(lowest)
No. of
content
words
182 176 90 67 36
(b) Measuring use of complex language
The percentage of simple sentences in an essay determines the level of complex language
used as shown in Table 4.
Table 4: Level of complex language use
Score 6
(highest)
5 4 3 2 1
(lowest)
% Simple
sentences
<35% 35-44% 45-54% 55-64% 65-74% >75%
(c) Measuring development of the thesis
Development of the thesis is measured by using outlines as show in Table 5.
11

Table 5: Marks obtained by student essay
Full marks Marks obtained
First Outline 5% 5%
Second Outline 30% 0%
Third Outline 60% 60%
Conclusion 5% 5%
Total marks 100% 70%
We can then translate the marks obtained in percentage into a numerical score. A suitable
grading that can be used is shown in Table 6.
Table 6: Conversion of marks into score
Score 6 (A) 5 (A-) 4 (B) 3 (C) 2(D) 1(F)
Marks >80% >70% >60% >50% >40% 0-39
(d) Measuring cohesiveness
The percentage of SP and ExP are used to measure the level of cohesiveness of sentences in
an essay as shown in Table 7.
Table 7: Level of cohesiveness
Score 6(highest) 5 4 3 2 1(lowest)
% SP and ExP >75% 65-74% 55-64% 45-54% 35-44% <35%
*SP=Sequential Progression
ExP=Extended Parallel progression
(e) Measuring clarity of ideas
Flesch Reading Ease is used to measure the level of clarity of ideas as shown in Table 8.
12

Table 8: Level of clarity of ideas
6
(highest)
5 4 3 2 1
(lowest)
Flesch Reading
Ease
30.1-
40.1
40.1-5
0.1
50.1-60.
1
60.1-70.
1
70.1-80
.1
80.1-
90.1
(f) Measuring consistent use of language
The number of errors in tenses and subject-verb agreement are used to measure the level of
consistent use of language as shown in Table 9.
Table 9: Level of consistent use of language
6
(highest)
5 4 3 2 1
(lowest)
Errors in
tenses and
subject-verb
agreement
No
errors
A small
no. of
errors
(<5)
Some er-
rors
(5-14)
A signifi-
cant no. of
errors
(15-24)
A lot of
errors
(25-34)
Full of
errors
(>35)
(g) Measuring variety of sentence structure
Variety of sentence structure is measured by the percentage of basic, loose, periodic and
combination sentences as shown in Table 10.
Table 10: Description of variety of sentence structure
Variety of sentence structure
Marks Abbreviation Description
0 NV No variety:
At least one value is 100%
1 EL Extremely limited variety:
At least one value is more than 90%
2 VL Very limited variety:
All four values are less than 90%, but at least one
value is more than 75%
3 L Limited variety:
13

4 M Moderate variety:
5 H High variety:
6 VH Very high variety:
All four values are less than 30%
*Four values correspond to the percentage of basic, loose, periodic and combination
sentences.
(h) Measuring use of suitable vocabulary or word choice
The percentage of the number of different words/total number of words is used to measure
level of suitable vocabulary or word choice as shown in Table 11.
Table 11: Level of suitable vocabulary or word choice used
6
(highest)
5 4 3 2 1
(lowest)
Percentage of no.
of different words
divided by total
number of words
>70% 60-70% 50-60% 40-50% 30-40% <30%
(i) Measuring frequency of errors
Frequency of errors is measured by the number of areas of errors made in an essay as shown
in Table 12.
Table 12: Level of errors
6
(highest)
5 4 3 2 1 0
(lowest)
Areas of
errors
1 or 2
areas only
3 or 4
areas
5 or 6
areas
7 or 8
areas
9 or 10
areas
11 or 12
areas
All 13 ar-
eas
(j) Measuring command of language
Command of language is measured by the number of function words used in an essay as
shown in Table 13.
14

Table 13: Level of command of language
6
(highest)
5 4 3 2 1
(lowest)
Number of
function
words
1 or 2
areas only
3 or 4
areas
5 or 6
areas
7 or 8
areas
9 or 10
areas
11 or 12
areas
Using the framework
To demonstrate the use of the framework, a student essay is chosen at random. This essay is
written based on the writing task as stated in the background to the study. The student’s essay
is shown in Figure 1. For the sake of discussion, the scoring rubric used by Criterion as
shown in Appendix A is used for marking this essay.
Traditions play an important role to every human being without concern of
the races, colours and living style. We attach and dedicate ourselves hoping that we
can have a better life. The traditions helped us to be wise, spiritual, pyshically and
mentally aware at the errors of our life. Hence, nowadays, it is too upset to see
when young people break and intend to ignore with the traditions in the past. They
break the past traditions let say for instance, the marriage tradition, festivals and
respected old people.
The old tradition conveys marriage as when a couple decides to give their
life to each other and willing to live together forever and ever in any circumstances
by tight their knots infront of their parents. As usual the ceremony used to be held
by the both parents of the beride and the groom with the permission they have
already given to them. But what happened now is that the young people did not
border about this tradition. There fore, they did not asked the parents permission to
release but instead they prefer to live together without confessions or marriage
ceremony. What I mean here is that there is no marriage proposal to be add in, as
long as the couples are happy which other. This kind of relationships is more than
enough their marriage. It should not happened in this way because we are going to
lose our traditions, where we suppose to valued the relationship by devoting ourself
to God, to the people we care so much too. Somethings are hard to understand but
can we imagine what it would be like when suddenly we broke off after staying and
living together for so long?
The festivals let say for examples, Gawai Festival, and Chinese new years. In
the past Gawai festival used to celebrated in a grand in a traditional way. But what
happened nowadays is Gawai celebrated in such a western style. The food was
being served more on western style rather than traditional food. Young people tend
to wear modern clothes, preferring western dance rather than wearing traditional
costumes and traditional food. It's too bad the trend of respecting old people during
this period has gone. It goes the same to the Chinese New Year too. When I was in
my childhood, my Chinese friends use to gave food to the guest. Now I never see it
happen anymore. In my point of view, I believe this sign is going to lead us into a
world where we intend to be hypnotise by other's people cultural without knowing
that our own cultural and traditions are more worthy.
Young people now days have a new tradition where there prefer to behave
unexpectedly to the old people. Life is easy for them. They depending on their
pleasure and the excitement they have without knowing what they have is just a
temporary. They do not respect the old people especially in terms of giving advises
15

and comfort them. Respecting old people is important because from this people we
can earn blessing from them. But unfortunately this tradition has been mislead by
the young people who always think that they are perfect.
In conclusion, I may not convey my message here in detail but I believe,
traditions is a kind of lighthouse to guide us and to lead us into a right way. When
we lost or feel down it helps us to be aware of the bad situations.
Figure 1: Student essay (A83882)
Description of the marking scheme
The following steps show how the framework can be utilized to mark the above essay.
Step 1: The Criterion scoring rubric is firstly converted from a holistic to analytic description
following the dimensions of ESL writing proposed by Cohen (1994: 307).
Step 2: After converting the scoring rubrics from holistic to analytic format, the marking
scheme can be described as shown in Table 14. The marks and grades used by Criterion can
be represented as shown in Table 15.
Table 14: Criterion marking scheme
Dimensions Attributes Number of
achievement
levels
Weight
(%)
Content Relevance and knowledge of the
writing task
6 15
Rhetorical Structure Ability to use complex language - -
Development of thesis 6 15
Organisation Cohesiveness 6 15
Clarity of ideas 6 15
Economy Consistent use of language - -
Variety of sentence structure 6 10
Accuracy of meaning Use of suitable vocabulary or
word choice
5 10
Appropriateness of lan-
guage use
Frequency of errors 5 10
Command of language 3 10
Table 15: Criterion marks and grade
Range of marks Grade
80-100 6
70-80 5
60-70 4
50-60 3
40-50 2
Less than 40 1
16

Measuring relevance and knowledge of the writing task
The score for relevance and knowledge of the writing task can be obtained by comparing the
number of overlapped content words with the number of words for sample essays as shown in
Table 16.
Table 16: Relevance and knowledge of the writing task for student’s essay
Student
essay
Total no.
of content
words
Overlapped
content
words
Score 2 Score 3 Score 4 Score 5 Score 6 Final
score
A83882 253 57 36 67 90 176 181 3
Measuring development of thesis
The method for measuring development of the thesis is by giving marks to the number of
available outlines in a student’s essay. Following a similar manner, the score for the student’s
essay can be obtained as shown in Table 17.
Table 17: Development of thesis
Sample es-
say
First out-
line
(%)
Second out-
line
(%)
Third
outline
(%)
Last out-
line
(%)
Marks ob-
tained
Score
A83882 0 0 60 5 65% 4
Measuring cohesiveness
Table 18 illustrates the measurement for level of cohesiveness in students’ writing based on the
technique as shown in Table 7.
Table 18: Level of cohesiveness
Sample
essay
Total no. of
sentences
(TS)
No. of SP No. of ExP (SP+ExP)/TS
(%)
Score
A83882 31 12 12 77.4 6
Measuring clarity of ideas
Based on the measurement for level of clarity of ideas as shown Table 12, the level of clarity
of ideas for the student’s essay can be derived as shown in Table 19.
17

Table 19: Level of clarity of ideas
Sample essay Flesch Reading
Ease
Level of clarity
of ideas
A83882 63.2 3
Measuring variety of sentence structure
In measuring variety of sentence structure, we have to find the number of basic sentences,
loose sentences, periodic sentences and combination sentences. We can then use the rule as
described in Table 10 to determine the score obtained by the student’s essay.
Table 20: Level of achievement for variety of sentence structure
Sample essayNo. of basic
sentences
No. of loose
sentence
No. of peri-
odic sentences
No. of combi-
nation sen-
tences
Score
A83882 9.7 58.1 16.1 12.9 4
Measuring use of suitable vocabulary or word choice
Use of suitable vocabulary or word choice can be measured by measuring the percentage of
the number of different words in the essays divided by the total number of words. The score
can be obtained based on the rule described in Table 11.
Table 21: Level of suitable vocabulary or word choice
Sample essay Total no. of
words
(W)
No. of different
words (NDF)
NDF/W
%
Score
A83882 578 253 43.8 3
Measuring frequency errors
Errors can be measured by using error analysis. Based on error analysis carried out, we can
illustrate the occurrence of errors in the student’s essay as shown in Table 22. The score for
each essay is determined by using a method that is described in Table 12.
Table 22: Occurrence of errors in sample essay
Types of error A83882
Tense 19
Article 7
Agreement 2
Infinitive Gerundive 1
Pronoun 1
18

Positive Attributive 2
Word order 3
Incomplete 1
Negative construction 0
Lexical 16
Mechanics 10
Typical 0
Miscellaneous 0
Number of areas of errors 10
Score 2
Measuring command of language
Command of language is measured by calculating the number of function words in the essay.
Table 13 is then used to determine the score for each of the essays. Table 23 shows the level
of command of language for the student’s essay.
Table 23: Level of command of language
Sample essay No. of function words Score
A83882 37 4
Calculating the mark
Following the formula
where wi is the weight
ai is the student’s achievement for that particular attribute and
li is the number of achievement level
The calculation for student’s essay (A83882) is as follows:
Total marks for sample essay 1 = 15(3/5) + 15(4/6) + 15(6/6) + 15(3/6) +
10(4/6) + 10(3/6) + 10(2/6) + 10(4/6)
= 9 + 10 + 15 + 7.5 + 6.7 + 5 + 3.3 + 6.7
= 63.2
Table 24 shows the marks and grades for the sample essays calculated by using the
framework based on Criterion marking scheme.
19

Table 24: Marks and score for sample essay
Attribute Weight A83882
Relevance and knowledge of
the writing task
15 3
Development of thesis 15 4
Cohesiveness 15 6
Clarity of ideas 15 3
Variety of sentence structure 10 4
Vocabulary or word choice 10 3
Frequency of errors 10 2
Command of language 10 4
Marks 100 63.2
Grade 6 4
Conclusion
This study has revealed a framework that can be successfully applied to marking ESL writing
at IHLs in Malaysia. In order to use this framework, the lecturer initially needs to decide the
marking scheme to be used in assessing students’ essays, specifically which dimensions of
ESL writing, the number of achievement levels and weight for each of the attributes. The
range of marks for each of the grades needs to be indicated. Marks for each of the attributes
are then given following the techniques for marking of each of the attributes as described in
the section entitled Using the Framework. The total marks for each essay is then generated
by using a mathematical formula based on the marks obtained by each of the attributes being
assessed. The marks obtained by using the framework are more precise and reliable than the
marks given holistically by lecturers.
Suggestions for further studies
In retrospect, in order to ensure that the proposed framework that we have developed will be
more beneficial to the majority of Malaysian students at IHLs, a number of future studies are
suggested. The future studies are listed below.
Refining the research approach
The research was carried out despite time and resources constraints; thus, it has a limitation in
terms of the samples investigating the lecturers’ and students’ expectations of CBEM systems
and textual analysis of students’ essays. The outcome of the first part of the research can be
further refined if the following steps are carried out:
Firstly, the sample in the study concerning lecturers’ and students’ expectations of
CBEM systems should include more lecturers and students from many more public and
private IHLs in Malaysia.
20

Secondly, the textual analysis of students’ essays needs to be carried out on sample
essays taken from other IHLs in Malaysia in order to determine conclusively the
characteristics of Malaysian students writing.
Thirdly, other CBEM systems need to be experimented with in marking Malaysian
students’ essays in ESL.
Refining the framework
In this study, we have considered six scoring rubrics; namely, scoring rubrics for College
Board SAT, IELTS, TOEFL, the ESL Composition Profile, and the Written Communication
course at UKM and OUM. The analysis of these scoring rubrics indicates that only six
dimensions of ESL writing need to be evaluated. These dimensions are content, rhetorical
structure, organization, economy, accuracy of meaning and appropriateness of language use.
The other four dimensions that are proposed by Cohen (1994: 307), which are register, style,
reader’s understanding and reader’s acceptance are not covered by these scoring rubrics. It
may be necessary to carry out studies to analyze other scoring rubrics that take the other four
dimensions into consideration.
Validation
The framework needs to be validated by a panel of experts in order to ensure that it is really
suitable as the basis for the development of a CBEM system. This validation process may
involve ESL writing experts whose opinion will give further insights into the open-ended and
close-ended essay topics that the Malaysian ESL learners need to be able to write.
Furthermore, their opinions may be important in determining the validity of some of the
measures that have been proposed. The second group of experts is that of the computer-
assisted language testing experts. Their opinions are needed for ensuring the suitability of the
method of evaluation of writing that has been proposed. The third category of experts is that
of the software development experts who will be able to contribute in ensuring the framework
is properly understood by the software developer, who will be responsible for developing the
software.
Implementing the framework into a system
This framework only provides the specification for a CBEM system for ESL writing at
Malaysian IHLs. The next challenging task is to develop the framework into a software
system. This process requires the support of software developers who will be responsible for
designing, coding and testing the system.
References
Burstein, Jill, Martin Chodorow, and Claudia Leacock. "Criterion Online Essay Evaluation:
An Application for Automated Evaluation of Student Essays." Paper presented at the
Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence,
Acupulco, Mexico, 2003.
Cohen, Andrew D. Assessing Language Ability in the Classroom. Boston, Massachusetts:
Heinle and Heinle, 1994.
21

College Board SAT. Real Test Prep for the Test Makers: College Entrance Examination
Board, 2001.
Darus, Saadiyah, Supyan Hussin, and Siti Hamin Stapa. "Students' Expectations of a
Computer-Based Essay Marking System." Reflections, Visions & Dreams of
Practice: Selected Papers from the IEC 2001 International Education Conference. Ed.
J. Mukundan. Kuala Lumpur: ICT Learning, 2001: 197-204.
Darus, Saadiyah. "Identifying Dimensions and Attributes of Writing Proficiency:
Development of a Framework of a Computer-Based Essay Marking System for
Malaysian ESL Learners." Internet Journal of e-Language Learning and Teaching 3.1
(2006): 1-25.
Darus, Saadiyah, Supyan Hussin, and Siti Hamin Stapa. "Different Learners and Different
Feedback: Computer-Based Essay Marking System in Development for ESL Learners
in Malaysia." Computer-Assisted Language Learning: Concepts, Contexts, and
Practices. Ed. Jeong-Bae Son. New York: iUniverse Inc, 2004: 165-82.
Darus, Saadiyah, Supyan Hussin, and Siti Hamin Stapa. "Students' Expectations of a
Computer-Based Essay Marking System." Reflections, Visions & Dreams of
Practice: Selected Papers from the IEC 2001 International Education Conference Ed.
J. Mukundan. Kuala Lumpur: ICT Learning, 2001: 197-204.
Darus, Saadiyah, and Siti Hamin Stapa. "Lecturers' Expectations of a Computer-Based Essay
Marking System." Journal of the Malaysian English Language Teaching Association
30 (2001): 47-56.
Gamaroff, Raphael. "Rater Reliability in Language Assessment: The Bug of All Bears."
System 28.1 (2000): 31-53.
Hiller, Jack H. "Applying Computerized Text Measurement Strategies from Project Essay
Grade (PEG) to Military and Civilian Organizational Needs." Paper presented at the
Annual Meeting of the American Educational Research Association 1998.
Holmes, Martin. Markin 32. Version 1.2. Computer software, 1996.
Ho Peng, Lim. "An Error Analysis of English Compositions Written by Malaysian-Speaking
High School Students." M.A. thesis. University of California, 1974.
Hounsell, D. "Contrasting Conceptions of Essay Writing." The Experience of Learning:
Implications for Teaching and Studying in Higher Education. Eds. F. Marton, D.
Hounsell and N. Entwistle. Edinburgh: Scottish Academic Press, 1997: 106-25.
IELTS. "IELTS Handbook." 2005.
Jacobs, Holly L., S. Zinkgraf, D. Wormuth, V.F. Hartfiel, J. Hughey. Testing EFL
Composition: A Practical Approach. Rowley, Mass: Newbury House, 1981.
22

Landauer, Thomas K., Peter W. Foltz, and Darrell Laham. "Introduction to Latent Semantic
Analysis." Discourse Processes 25 (1998): 259-84.
Lautamatti, L. "Observations in the Development of the Topic in Simplified Discourse."
Writing across Languages: Analysis of L2 Text. Eds. Ulla Connor and Robert B.
Kaplan. Reading, MA: Addison-Wesley, 1987: 87-114.
Marshall, Stewart, and Colin Baron. "MARC - Methodical Assessment of Reports by
Computer." System 15.2 (1987): 161-67.
Page, Ellis Batten, G.A. Fisher, and Mary Ann Fisher. "Project Essay Grade: A Fortran
Program for Statistical Analysis of Prose." British Journal of Mathematical and
Statistical Psychology 21 (1968): 139.
Page, Ellis Batten, and Nancy S. Petersen. "The Computer Moves into Essay Grading:
Updating the Ancient Test." Phi Delta Kappan 76.7 (1995): 561-65.
Schneider, M., and Ulla Connor. "Analyzing Topical Structure in ESL Essays: Not All Topics
Are Equal." Studies in Second Language Acquisition 12 (1991): 411-27.
Shermis, Mark D., Howard R. Mzumara, Jennifer Olson, and Susanmarie Harrington. "On-
Line Grading of Student Essays: PEG Goes on the World Wide Web." Assessment and
Evaluation in Higher Education 26.3 (2001): 247-59.
TOEFL. Product and Services Catalog: Educational Testing Service, 2001.
Appendix A
Criterion scoring rubric
You have put together a convincing argument. Here are some of the strengths evident
in your writing.
Your essay:
Looks at the topic from a number of angles and responds to all aspects of what you
were asked to do.
Responds thoughtfully and insightfully to the issues in the topic.
Develops with a superior structure and apt reasons or examples (each one adding sig-
nificantly to the reader’s understanding of your view).
Uses sentence styles and language that have impact and energy and keep the reader
with you.
Demonstrates that you know the mechanics of correct sentence structure, and Ameri-
can English usage – virtually free of errors.
Score 6
23

You have solid writing skills and something interesting to say. Look at the sample
essay to get ideas on how to develop your ideas more fully or use language more per-
suasively and consistently.
Your essay:
Responds more effectively to some parts of the topic or task than to other parts.
Shows some depth and complexity in your thinking.
Organizes and develops your ideas with reasons and examples that are appropriate.
Uses the range of language and syntax available to you.
Uses grammar, mechanics, or sentence structure with hardly any error.
Score 5
Your writing is good, but you need to know how to be more persuasive and more
skillful at communicating your ideas. Look at the 5 and 6 sample essays to see how
you could be more persuasive and use language more effectively.
Your essay:
Slights some parts of the task.
Treats the topic simplistically or repetitively.
Is organized adequately, but you need more fully to support your position with discus-
sion, reasons, or examples.
Shows that you can say what you mean, but you could use language more precisely or
vigorously.
Demonstrates control in terms of grammar, usage, or sentence structure, but you may
have some errors.
Score 4
Your writing is a mix of strengths and weaknesses. Working to improve your writing
will definitely earn you more satisfactory results because your writing shows promise.
In one or more of the following areas, your essay needs improvement. Your essay:
Neglects or misinterprets important parts of the topic or task.
Lacks focus or is simplistic or confused in interpretation.
Is not organized or developed carefully from point to point.
Provides examples without explanation, or generalizations without completely sup-
porting them.
Uses mostly simple sentences or language that does not serve your meaning.
Demonstrates errors in grammar, usage, or sentence structure.
Score 3
You have work to do to improve your writing skills. You probably have not addressed
the topic or communicated your ideas effectively. Your writing may be difficult to
understand.
In one or more of the following areas, your essay:
Misunderstands the topic or neglects important parts of the task.
Does not coherently focus or communicate your ideas.
Is organized very weakly or does not develop ideas enough.
Generalizes and does not provide examples or support to make your points clear.
Uses sentences and vocabulary without control, which sometimes confuses rather
than clarifies your meaning.
Contains too many errors in grammar, word usage, and sentence structure.
Score 2
24

You have much work to do in order to improve your writing skills. You are not writ-
ing with complete understanding of the task, or you do not have much of a sense of
what you need to do to write better. You need advice from a writing instructor and lots
of practice.
In one or more of the following areas, your essay:
Misunderstands the topic or does not show that you comprehend the task fully.
Lacks focus, logic, or coherence.
Is undeveloped – there is no elaboration of your position.
Lacks support that is relevant.
Shows poor choices in language, mechanics, usage, or sentence structure which make
your writing confusing.
Score 1
25

A Framework Of A Computer-Based Essay Marking System For ESL Writing.

Recommended

Recommended

More Related Content

Similar to A Framework Of A Computer-Based Essay Marking System For ESL Writing.

Similar to A Framework Of A Computer-Based Essay Marking System For ESL Writing. (20)

More from Carmen Pell

More from Carmen Pell (20)

Recently uploaded

Recently uploaded (20)

A Framework Of A Computer-Based Essay Marking System For ESL Writing.