Topic: An Analysis of the Final English Test at Elementary Schools in Ho Chi Minh City

MINISTRY OF EDUCATION AND TRAINING
UNIVERSITY OF ECONOMICS AND FINANCE
RESEARCH REPORT
AN ANALYSIS OF FINAL ENGLISH
TEST AT ELEMENTARY SCHOOLS
IN HO CHI MINH
Major: ENGLISH LANGUAGE
Minor: ENGLISH LANGUAGE TEACHING
Ho Chi Minh City, 2018
Supervisor(s) :
Student’s name :
Student ID :
Class :


Abstract
Testing has become the norm in education systems. However, few people ask how
reliable the tests in use actually are. To address that question, I carried out
a small study of the reliability and validity of a test taken by the
participants of the study, a class at a public elementary school.
From the data gathered from the test, I calculated the facility values of the
items in the test, together with its reliability and validity. These data helped
me understand how to write a good test item for evaluating students' language
ability.

Acknowledgements
First, I would like to express my special gratitude to the lecturers who have
taught me over the past year and a half, and whose support helped me complete
my project. I am especially grateful to Mr. Le Nguyen Lan, a lecturer of English
at the University of Economics and Finance, who has been supportive of my study
and gave me a great deal of information to help me reach my goal. As my lecturer
and mentor, he has taught me more than I could ever give him credit for here.
I am also thankful to all of those with whom I had the pleasure of working
during this project. Each of the participants gave me an unforgettable
experience.

THE SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness
--------------------
SUPERVISOR’S COMMENTS
Student’s full name :
Student ID :
Intake :
1. Internship duration
2. Major and minor
3. Comments
Supervisor’s signature

List of Tables
Table 3.4 Project Timetable
Table 4.1 Construction of the test
Table 4.2 Differences between marked and remarked
Table 4.3.1 The raw scores after marking for the first time
Table 4.3.2 The raw scores after remarking
Table 4.4.1 Face validity of items in Listening section
Table 4.4.2 Face validity of items in Reading section
Table 4.4.3 Face validity of items in Writing section

Table of Contents
Chapter 1: Introduction
1.1. The research problem
1.2. Rationale for the Present Study
1.3. Aim of the Study
1.4. Significance of the Study
Chapter 2: Literature Review
Chapter 3: Methodology
3.1. Data gathering instrument
3.2. Setting for the present study
3.3. Participants information
3.4. Data collection procedures
3.5. Data analysis
Chapter 4: Findings
4.1. Content analysis
4.2. Score analysis
4.3. Calculating
Chapter 5: Discussion
Chapter 6: Conclusion
Bibliography

Chapter One
Introduction
This chapter presents the research problem, the rationale for the study, the
aim of the study, and its significance.
1.1. The research problem
The study aims to understand the causes of the mistakes that grade one students
make in English tests. Based on the results, it assesses the validity of these
tests.
1.2. Rationale for the Present Study
This research grew out of the mistakes I observed grade one learners making at
some primary schools. The school where I conducted the test is a well-known
primary school in Binh Thanh District, which makes it a reliable source for the
study. I would also like to answer the following two questions:
1. What are the validity and reliability of the tests?
2. How can we improve the tests to measure the testees accurately?
1.3. Aim of the Study
This study set out to identify some of the challenges that grade one students
face in learning English. Based on the results, the researcher will propose
solutions for each difficulty in order to improve the students' learning. Thus,
I aim to examine:
• The strengths of a test with respect to validity and reliability
• The weaknesses of a test with respect to validity and reliability
1.4. Significance of the Study

The analyzed results of this research indicate the validity of the test used in
the examination. On that basis, testing remains the best way to evaluate the
level of English learners. The results can also serve as a premise for planning
teaching strategies and for adapting the way content is delivered to each
learner.

Chapter Two
Literature Review
Making fair and systematic evaluations of others' performance can be a
challenging task for anyone. According to Sax (1989, p. 150), judgment cannot be
made solely on the basis of intuition, haphazard guessing, or custom. Teachers,
employers, and others in evaluative positions use many tools to assist them in
their evaluations. Knowing the terms used in testing therefore helps one write a
good test for the examinees. In this chapter, I discuss several terms used in
test evaluation.
What is a test?
According to Thissen and Wainer (2001, Test Scoring, Mahwah, NJ: Erlbaum, p. 1):
A test or examination (informally, exam or evaluation) is an assessment intended to
measure a test-taker's knowledge, skill, aptitude, physical fitness, or classification in many
other topics (e.g., beliefs). A test may be administered verbally, on paper, on a computer,
or in a confined area that requires a test taker to physically perform a set of skills. Tests
might vary in style, rigor and requirements.
There are four main kinds of test:
Diagnostic test
These tests are used to diagnose how much a learner knows and what a learner
knows. They help a teacher identify what needs reviewing or reinforcing in
class. They also enable the students to identify areas of weakness.
Placement test
These tests are used to place students in the appropriate class or level. After
establishing the student’s level, the student is placed in the appropriate class to suit
his/her needs.

Progress or Achievement test
Achievement or progress tests measure students' improvement in relation to
their syllabus. These tests contain items which the students have been taught in
class. There are two types of progress tests: short-term and long-term tests.
Short-term progress tests check how well students have understood or
learned material covered in specific units or chapters. They enable the teacher to
decide if remedial or consolidation work is required.
Long-term progress tests are also called Achievement Tests because they check
the learner's progress over the entire course. They enable the students to judge
how well they have progressed. Administratively, they are often the sole basis
of decisions to promote students to a higher level.
Proficiency test
These tests check learner levels in relation to general standards. They
provide a broad picture of knowledge and ability.
Although tests are used to examine the ability of learners, making a fair test
raises several problems. Test analysis examines how the test items perform as a
set. Item analysis "investigates the performance of items considered
individually either in relation to some external criterion or in relation to the
remaining items on the test" (Thompson & Levitov, 1985, p. 163). These analyses
are used to evaluate the quality of the items and of the test as a whole, and to
revise and improve both. The tools involved include validity, item difficulty,
standard deviation, and test reliability, along with the formulas used to
calculate these values.
Item
An item is the basic unit of interaction on a test. What we often call a test
question is more properly known as an item, since it may not be worded as an
actual question. The student's feedback is also more properly known as a
response rather than an answer, but we won't get too particular on that point.
Items can be written in various formats, including multiple choice, matching,
true/false, short answer, and essay.
Validity
According to Anastasi and Urbina (1997, p. 109), validity concerns what a test
measures and how well it does so. Validity is a crucial consideration in
evaluating a test. Cronbach's (1949, p. 55) concept of validity has remained
consistent over time, with minor modifications. He stated that validity is the
extent to which a test measures what it purports to measure, and that a test is
valid to the degree that what it measures or predicts is known.
There are two kinds of validity:
Face validity estimates whether a test measures what it claims to measure. It
is the extent to which a test seems relevant, important, and interesting. It is
the least rigorous measure of validity.
Content validity is the degree to which a test matches a curriculum and
accurately measures the specific training objectives on which a program is
based.
In line with that definition, the Standards for Educational and Psychological
Testing (American Educational Research Association, American Psychological
Association, & National Council on Measurement in Education, 1985) stated:
Validity is the most important consideration in test evaluation. The concept refers
to the appropriateness, meaningfulness, and usefulness of the specific inferences made
from test scores. Test validation is a process of accumulating evidence to support such
inferences. A variety of inferences may be made from scores produced by a given test, and
there are many ways of accumulating evidence to support any particular inference.
Validity, however, is a unitary concept. Although evidence may be accumulated in many
ways, validity always refers to the degree to which that evidence supports the inferences
that are made from the scores. The inferences regarding specific uses of a test are
validated, not the test itself. (p. 9).

Beyond these accepted views, some authors argue that the notion of falsification
exists in the validation process because of spurious correlations. Messick (1989)
discussed the importance of considering the consequences of test use in drawing
inferences about validity, and added the term consequential validity to the list.
He noted:
Validity is an overall evaluative judgment, founded on empirical evidence and
theoretical rationales, of the adequacy and appropriateness of inferences and actions
based on test scores. As such validity is an inductive summary of both the adequacy of
existing evidence for and the appropriateness of potential consequences of test
interpretation and use (Messick, 1988, p. 33-34).
Item difficulty
Item difficulty is simply the proportion of students taking the test who
answered the item correctly. It is the relative frequency with which examinees
choose the correct response (Thorndike, Cunningham, Thorndike, & Hagen, 1991,
p. 45). Its index ranges from a low of 0 to a high of +1.00. The larger the
proportion getting an item right, the easier the item: the higher the difficulty
index, the easier the item is understood to be (Wood, 1960).
Item difficulty is a characteristic of both the item and the sample that takes
the test. It offers a way to compare items that measure different domains, such
as questions in statistics and sociology, making it possible to determine which
item is more difficult for the same group of examinees. According to Thorndike
et al. (1991), item difficulty has a powerful effect on both the variability of
test scores and the precision with which test scores discriminate among groups
of examinees.
Facility value, also called the difficulty index or p value, is basically a
behavioural measure. Rather than defining difficulty in terms of some intrinsic
characteristic of the item, difficulty is defined in terms of the relative
frequency with which those taking the test choose the correct response
(Thorndike et al., 1991). To compute the item difficulty, divide the number of
people answering the item correctly by the total number of people answering the
item. The proportion for the item is usually denoted as p and is called item
difficulty (Crocker & Algina, 1986). An item answered correctly by 85% of the
examinees would have an item difficulty, or p value, of .85, whereas an item
answered correctly by 50% of the examinees would have a lower item difficulty,
or p value, of .50.
A p value from 0.3 to 0.7 is considered acceptable, and the ideal is 0.5.
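The calculation described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming each response is recorded as 1 (correct) or 0 (incorrect). The function names and the sample responses are hypothetical and are not taken from the test analysed in this study.

```python
def item_difficulty(responses):
    """Return the p value: correct answers divided by total answers."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses) / len(responses)


def is_acceptable(p, low=0.3, high=0.7):
    """Check a p value against the 0.3 to 0.7 range quoted in the text."""
    return low <= p <= high


# Hypothetical item answered correctly by 17 of 20 examinees.
easy_item = [1] * 17 + [0] * 3
# Hypothetical item answered correctly by 10 of 20 examinees.
balanced_item = [1] * 10 + [0] * 10

p_easy = item_difficulty(easy_item)          # 17 / 20
p_balanced = item_difficulty(balanced_item)  # 10 / 20

print(p_easy, is_acceptable(p_easy))
print(p_balanced, is_acceptable(p_balanced))
```

In this sketch the first item falls outside the acceptable range because it is too easy, while the second sits at the ideal value of 0.5.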
Mean
The mean is a measure of central tendency; it gives an indication of the
average value of a distribution of figures. The mean is the arithmetic average
of a group of scores; that is, the scores are added up and divided by the number
of scores. The mean is sensitive to extreme scores when samples are small. For
example, in a class of 20 students, if two students score well above the others,
the mean will be skewed higher than the rest of the scores would indicate. Means
are better used with larger sample sizes.
The mean is calculated with the formula:
μ = (Σ Xi) / N
where the symbol ‘μ’ represents the population mean, ‘Σ Xi’ represents the sum
of all scores in the population (X1, X2, X3, and so on), and ‘N’ represents the
total number of individuals or cases in the population.
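The effect of extreme scores on the mean can be illustrated with a short Python sketch. The class of 20 students and their scores are invented for illustration only, and the function name is hypothetical.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by their number."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


# Hypothetical class: 18 students score 5 and two students score 10.
typical_scores = [5] * 18
all_scores = typical_scores + [10, 10]

print(mean(typical_scores))  # mean of the 18 typical students
print(mean(all_scores))      # mean of all 20 students, pulled up by two scores
```

Here the two high scores raise the class mean from 5.0 to 5.5, even though 18 of the 20 students scored exactly 5.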
Standard deviation
Standard deviation (SD) is a measure used to quantify the amount of variation or
dispersion in a set of data values. A low standard deviation indicates that the
data points tend to be close to the mean of the set, while a high standard
deviation indicates that the data points are spread out over a wider range of
values.
Standard deviations are often used in norm-referenced tests to diagnose language
impairment. Scores that fall within one SD of the mean are considered to
indicate typical development. Disability is often diagnosed at 1.5 to 2.0 SD
below the mean. However, research has demonstrated that using standard
deviations to diagnose language impairment does not identify it with acceptable
specificity and sensitivity (Spaulding, Plante, & Farinella, 2006).
Standard deviation is calculated based on the formula:
SD = √( Σ (x - x̄)² / (n - 1) )
where x represents each value in the sample, x̄ is the mean value of the sample,
Σ is the summation (or total), and n - 1 is the number of values in the sample
minus 1.
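The sample standard deviation, with n - 1 in the denominator as described above, can be sketched in Python as follows. The scores are hypothetical and the function name is an assumption made for this illustration. The result is compared with `statistics.stdev` from the Python standard library, which also uses the n - 1 denominator.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation using the n - 1 denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    x_bar = sum(values) / n
    # Sum of squared deviations of each value from the sample mean.
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical scores: mean 8, squared deviations sum to 40, and 40 / 4 = 10.
scores = [4, 6, 8, 10, 12]

print(sample_sd(scores))
print(statistics.stdev(scores))
```

Both calls should print the square root of 10 for these scores, which gives a quick check that the hand-written formula agrees with the library function.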
Reliability of test
In 1904, Spearman described the true-score-and-error model, which was accepted
as the "classical" reliability theory for the next 50 years. In classical
reliability theory, a test score consists of a true score plus random error. The
true score is a theoretical, dependable measure of a person's performance,
uninfluenced by chance events or conditions. It is the average score over
identical tests administered repeatedly without limit. "Identical" implies that
the person is not affected by the testing procedure, which is unlikely to occur.
A raw score, which is the number of points a person obtains on a test, is the
best estimate of the true score. Chance conditions, such as test quality and
examinee motivation, may cause the raw score to underestimate or overestimate
the true score (Thorndike, 1982).
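The idea that an observed score is a true score plus random error, and that the true score is the average over unlimited repeated administrations, can be illustrated with a small simulation. This is only a sketch under stated assumptions: the errors are assumed to be normally distributed with mean zero, and the true score, the error spread, and the function names are all hypothetical.

```python
import random


def simulate_observed_scores(true_score, error_sd, administrations, seed=0):
    """Generate observed scores as true score plus random error.

    Assumption for this sketch: errors are normal with mean zero.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0, error_sd) for _ in range(administrations)]


def average(scores):
    """Arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


true_score = 7.0  # hypothetical true score of one examinee
few = simulate_observed_scores(true_score, error_sd=1.5, administrations=5)
many = simulate_observed_scores(true_score, error_sd=1.5, administrations=100_000)

print(average(few))   # average over a handful of administrations
print(average(many))  # average over many administrations
```

With many simulated administrations the average tends to settle very close to the true score, whereas a single raw score, or the average of only a few, can differ from it by chance.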
In 1953, Lindquist described a multifaceted reliability model that was adapted
by Cronbach, Gleser, Nanda, and Rajaratnam (1972). More recently, reliability
has been expanded to include generalizability theory, domain mastery, and
criterion-referenced testing.
