The following PPT was submitted and presented in partial fulfillment of the Research Methodology in English Language Teaching course, under the guidance of Dr. H. Nur Samsu, M.Pd.
After formulating the research questions and selecting the sample, the next step in the research chain is developing the data collection instruments (research instruments).
These are measurement tools (e.g., tests, questionnaires, or interviews).
They can be designed by the researcher or adopted from instruments previously developed by other researchers.
What does ‘Reliability’ mean?
Types of reliability.
Factors which can affect test scores (reliability).
What does ‘Validity’ mean?
Understanding the differences between reliability and validity.
Reliability and Validity of Research Data
1. Reliability and Validity of Research Data
Course Supervisor: Dr. H. Nur Samsu, M.Pd.
Presented by: Aminah Ibrahim Abbad, Lolita Febridonata, Zuraida
Graduate Students of TBI 1A, IAIN TULUNGAGUNG
2. What do Reliability and Validity deal with?
• Both terms relate to the scores, or results, of assessments of language skills.
• The assessment may take the form of conventional testing or alternative classroom assessment.
• Conventional testing covers multiple choice, matching, short answer, essay, and the like.
• Alternative assessment refers to any activity that involves systematic collection of information about a language skill, such as learners’ journals, notebooks, writing, etc.
3. What is Reliability?
• Reliability refers to the preciseness with which the result of a language skill assessment represents the examinee’s actual level of skill.
• The result of a language skill assessment has high reliability if it precisely represents (is very close to, gives a good estimate of) the true level of the skill being assessed.
• The distance between the true level of the skill and the assessment result therefore determines the degree of reliability.
• A bigger distance (bigger error) means lower reliability.
4. The Equation for Reliability
X = T + E
Where:
X: the skill assessment result
T: the true level of the skill being assessed
E: the error
Every language skill assessment result (X) contains a mixture of the true level of the skill being assessed (T) and error (E). The amount of E determines the degree of reliability of X.
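The X = T + E model can be illustrated with a small simulation (an editor's sketch, not part of the original deck; all numbers are invented). If observed scores are simulated as true scores plus random error, reliability can be viewed as the proportion of observed-score variance contributed by true-score variance:

```python
import random

random.seed(42)

# Simulate X = T + E for 1,000 hypothetical examinees.
# T: true skill level; E: random (unsystematic) error with mean 0.
true_scores = [random.gauss(70, 10) for _ in range(1000)]
errors = [random.gauss(0, 5) for _ in range(1000)]
observed = [t + e for t, e in zip(true_scores, errors)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = var(T) / var(X): the share of observed-score variance
# that reflects the true skill rather than error.
reliability = variance(true_scores) / variance(observed)
# With sd(T) = 10 and sd(E) = 5, this lands near 100 / 125 = 0.8.
```

Shrinking the error standard deviation pushes the ratio toward 1 (X close to T, high reliability); growing it pushes the ratio toward 0.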
5. X = T + E
Is reliability the same as consistency?
• No, the two are different.
• Reliability refers to preciseness; consistency is an indicator of reliability.
• Reliability means the closeness of X to T: if X is close to T, the result is reliable.
• If the assessment result (score) is consistent from one assessment to another, the result has high reliability; in that sense, consistent means reliable.
• The evidence of reliability is the consistency of the scores.
6. Factors affecting the degree of reliability
Sources of error: the examinee, the examiner, the test instrument, and the environment.
7. Factors affecting the degree of reliability
Error from Examinees
• Not the examinees’ best performance, due to physical or emotional factors: sick, low motivation, tired, hungry, too happy, overactive.
• Cheating in the assessment. If the examinees are not strictly watched during the assessment process, they might copy each other’s answers or copy from prepared notes.
8. Factors affecting the degree of reliability
Error from the Examiner
• Not the rater’s most objective judgement.
• Caused by the examiner’s physical or emotional condition, and by his or her expertise in constructing and/or conducting the assessment procedure.
9. Factors affecting the degree of reliability
Error from the Assessment Instrument
• The instrument is too short.
• The instrument is heterogeneous.
• The assessment questions are too easy or too difficult.
• The type and quality of the assessment instrument.
10. Factors affecting the degree of reliability
Error from the Environment
• The assessment room is too hot or too cold.
• The assessment room is too windy.
• The assessment room is too small and crowded.
11. Which factor determines reliability the most?
• Error due to the test itself.
• This is called systematic error, and it relates to validity.
• It is the biggest problem because it makes the test unreliable.
12. Estimating the degree of reliability
There are four methods of evaluating the reliability of research data:
1. Split-half reliability: determines how much error in a test score is due to poor test construction. To calculate: administer a test once, then calculate a reliability index with coefficient alpha, Kuder-Richardson 20 (KR-20), or the Spearman-Brown formula.
2. Test-retest reliability: determines how much error in a test score is due to problems with test administration. To calculate: administer the same test to the same participants on two different occasions, then correlate the scores from the two administrations.
3. Parallel-forms reliability: determines how comparable two different versions of the same measure are. To calculate: administer the two tests to the same participants within a short period of time, then correlate the scores from the two tests.
4. Inter-rater reliability: determines how consistent two separate raters of the instrument are. To calculate: give the results from one test administration to two evaluators, then correlate the two sets of markings from the different raters.
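Method 2 can be illustrated directly in code (an editor's sketch; the function name and the scores below are invented for demonstration): administer the same test twice and take the Pearson correlation between the two sets of scores.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores of five examinees on two administrations
# of the same test, a few weeks apart.
first_administration = [78, 85, 62, 90, 70]
second_administration = [80, 83, 65, 92, 68]

# A coefficient near 1 indicates the scores held steady across occasions.
test_retest_reliability = pearson(first_administration, second_administration)
```

The same correlation step serves parallel-forms reliability (correlate the two forms) and inter-rater reliability (correlate the two raters' markings); only the pair of score lists changes.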
13. Split-half Reliability
• When you are validating a measure, you will most likely be interested in evaluating the split-half reliability of your instruments.
• This method tells you how consistently your measure assesses the construct of interest.
• If you have dichotomous items (e.g., right-wrong answers), as you would with multiple-choice exams, the KR-20 formula is the best-accepted statistic.
• If you have Likert-scale or other types of items, use the Spearman-Brown formula.
14. [image-only slide]
15. Split-Half Reliability: KR-20
Example:
• I administered a 10-item spelling test to 15 children.
• To calculate the KR-20, I entered the data in an Excel spreadsheet.
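The spreadsheet steps can also be coded directly. The function below is an editor's sketch of the standard KR-20 formula, KR-20 = k/(k−1) · (1 − Σ p·q / σ²), using population variances throughout; the small data set is invented for demonstration and is not the presenter's spelling-test data.

```python
def kr20(item_scores):
    """Kuder-Richardson 20 for dichotomous (0/1) item scores.

    item_scores: one list of 0/1 item scores per examinee.
    Population variance (divide by n) is used for the total scores,
    matching the p * (1 - p) item-variance terms.
    """
    n = len(item_scores)       # number of examinees
    k = len(item_scores[0])    # number of items
    totals = [sum(person) for person in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of p*q over items: p = proportion answering the item correctly.
    pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in item_scores) / n
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

# Four hypothetical examinees, three right/wrong items (1 = right, 0 = wrong).
sample_data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(sample_data))  # prints 0.75
```

Note that some texts divide the total-score variance by n − 1 instead of n, which gives slightly different values on small samples.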
16.–25. [image-only slides: the KR-20 calculation worked through in the spreadsheet]
26. What is meant by r = 0.70?
• When the reliability coefficient (r) is close to 1, the data is reliable.
• Conversely, if r is far from 1 (close to 0), the data is not reliable.
• So r = 0.70 means that the data has high reliability: the result of the spelling test precisely represents the true level of the students’ spelling skill.
27. Split-Half Reliability (Likert Test)
• If you administer a Likert scale or another measure that does not have just one correct answer, the preferable statistic for calculating split-half reliability is coefficient alpha (Cronbach’s alpha).
• However, Cronbach’s alpha is difficult to calculate by hand. Use it only if you have access to SPSS.
• If you must calculate by hand, use the Spearman-Brown formula instead. It is less accurate, but much easier to calculate.
28. Validity
To make predictions close to the actual skills and knowledge of the students, we have to provide the prediction with validity evidence. Validity is something abstract, so it can only be predicted.
29. Defining Validity
Valid means correct.
Validity denotes the extent to which an instrument measures what it is supposed to measure.
It is an agreement between a test score or measure and the quality it is believed to measure.
The correctness of an assessment is called validity, and the evidence supporting that correctness is called validity evidence.
Example: Do tests really measure what students learn?
Example: Do college GPAs accurately predict on-the-job success?
30. The Place of Validity
• Validity is not a characteristic of the assessment instrument used to collect the data; it is attached to the result of the assessment (the score).
• Whether an instrument is valid depends on the purpose for which it is used.
• Example: Is it valid to use the scores resulting from this instrument to predict the students’ writing skills?
31. Predicting the Validity of Research Data
• Validity evidence can be provided from the assessment instrument used and from empirical data.
• There are four kinds of supporting validity evidence: construct, content, concurrent, and predictive.
• Construct and content evidence can be provided from the assessment instrument; concurrent and predictive evidence can be provided from empirical data.
32. Construct Validity Evidence
• Construct means the match between the task and the purpose of an assessment.
• An assessment to measure the students’ writing skill will not be valid unless it requires the students to perform a writing activity.
• Therefore, it is important to state the task clearly, so that the students whose skill is to be measured know exactly what they have to perform.
• The validity evidence is thus derived from the task the students perform.
33. Content Validity Evidence
• Refers to how well the items of the assessment instrument cover the skill being assessed.
• A scale should measure the true meaning of the concept being studied.
• To develop a test with high content-related evidence of validity, you need good logic, intuitive skills, and perseverance.
• You must also consider wording and reading level.
• Example: an assessment instrument used to measure grammar must contain items that cover the grammar knowledge learned by the students.
34. Concurrent Validity Evidence
• The extent to which a procedure correlates with the current behavior of subjects.
• Infers that the test produces similar results to a previously validated test.
• Concurrent validity evidence is about “forecasting the present”: how well a test predicts current, similar outcomes.
• Job samples and alternative tests are used to demonstrate concurrent validity evidence.
• Concurrent validity estimates are generally higher than predictive validity estimates.
• Do the results from one measure correspond with those of related measures?
• Example: scores from a classroom English proficiency test have a strong, positive correlation with scores from the TOEFL.
35. Predictive Validity Evidence
• The extent to which a procedure allows accurate predictions about a subject’s future behavior.
• Infers that the test provides a valid reflection of future performance on a similar test.
• Predictive validity evidence is about “forecasting the future”: how well a test predicts future outcomes (e.g., the SAT predicting first-year GPA).
• Most tests do not have great predictive validity; it decreases due to time and method variance.
• Example: scores from a university entrance test are used to predict the students’ future scores.
36. References
1. Ary, D., Jacobs, L.C., & Sorensen, C.K. (2010). Introduction to Research in Education (8th ed.). California: Wadsworth.
2. Korb, K.A. Calculating Reliability of Quantitative Measures. https://www.korbedpsych.com, accessed December 1, 2017.
3. Mackey, A., & Gass, S.M. (2005). Second Language Research: Methodology and Design. New Jersey: Lawrence Erlbaum Associates.
4. Latief, M.A. (2016). Research Methods on Language Learning: An Introduction. Malang: Universitas Negeri Malang.