The following PPT was submitted and presented in partial fulfillment of the Research Methodology in English Language Teaching course, under the guidance of Dr. H. Nur Samsu, M.Pd.
After formulating the research questions and selecting the sample, the next step in the research chain is developing the data collection instruments (research instruments).
These are measurement tools (e.g., tests, questionnaires, or interviews).
They can be designed by the researcher or adopted from instruments previously developed by other researchers.
What does ‘Reliability’ mean?
Types of reliability.
Factors which can affect test scores (reliability).
What does ‘Validity’ mean?
Understanding the differences between reliability and validity.
Reliability and Validity of Research Data
1. Reliability and Validity of Research Data
Course Supervisor: Dr. H. Nur Samsu, M.Pd.
Presented by: Aminah Ibrahim Abbad, Lolita Febridonata, Zuraida
Graduate Students of TBI 1A, IAIN TULUNGAGUNG
2. What do Reliability and Validity deal with?
• Both terms relate to the scores, or results, of assessments of language skills.
• The assessment may take the form of conventional testing or alternative classroom assessment.
• Conventional testing covers multiple choice, matching, short answer, essay, and the like.
• Alternative assessment refers to any activity that involves systematic collection of information about a language skill, such as learners’ journals, notebooks, writing, etc.
3. What is Reliability?
• Reliability refers to the preciseness with which the result of a language skill assessment represents the examinee’s actual level of skill.
• The result of a language skill assessment has high reliability if it precisely represents (is very close to, gives a good estimate of) the true level of the skill being assessed.
• The distance between the true level of the skill and the assessment result therefore determines the degree of reliability.
• A bigger distance (bigger error) means lower reliability.
4. The Equation for Reliability
X = T + E
Where:
X: the skill assessment result
T: the true level of the skill being assessed
E: the error
Every language skill assessment result (X) contains a mixture of the true level of the skill being assessed (T) and error (E). The amount of E determines the degree of reliability of X.
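The X = T + E model can be illustrated with a small simulation (an editor's sketch, not part of the original deck; all numbers are invented). If observed scores are simulated as true scores plus random error, reliability can be viewed as the proportion of observed-score variance contributed by true-score variance:

```python
import random

random.seed(42)

# Simulate X = T + E for 1,000 hypothetical examinees.
# T: true skill level; E: random (unsystematic) error with mean 0.
true_scores = [random.gauss(70, 10) for _ in range(1000)]
errors = [random.gauss(0, 5) for _ in range(1000)]
observed = [t + e for t, e in zip(true_scores, errors)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = var(T) / var(X): the share of observed-score variance
# that reflects the true skill rather than error.
reliability = variance(true_scores) / variance(observed)
# With sd(T) = 10 and sd(E) = 5, this lands near 100 / 125 = 0.8.
```

Shrinking the error standard deviation pushes the ratio toward 1 (X close to T, high reliability); growing it pushes the ratio toward 0.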
5. X = T + E
Is reliability the same as consistency?
• No, the two are different.
• Reliability refers to preciseness; consistency is an indicator of reliability.
• Reliability means the closeness of X to T: if X is close to T, the result is reliable.
• If the assessment result (score) is consistent from one assessment to another, the result has high reliability; in that sense, consistent means reliable.
• The evidence of reliability is the consistency of the scores.
6. Factors affecting the degree of reliability
Sources of error: the examinee, the examiner, the test instrument, and the environment.
7. Factors affecting the degree of reliability
Error from Examinees
• Not the examinees’ best performance, due to physical or emotional factors: sick, low motivation, tired, hungry, too happy, overactive.
• Cheating in the assessment. If the examinees are not strictly watched during the assessment process, they might copy each other’s answers or copy from prepared notes.
8. Factors affecting the degree of reliability
Error from the Examiner
• Not the rater’s most objective judgement.
• Caused by the examiner’s physical or emotional condition, and by his or her expertise in constructing and/or conducting the assessment procedure.
9. Factors affecting the degree of reliability
Error from the Assessment Instrument
• The instrument is too short.
• The instrument is heterogeneous.
• The assessment questions are too easy or too difficult.
• The type and quality of the assessment instrument.
10. Factors affecting the degree of reliability
Error from the Environment
• The assessment room is too hot or too cold.
• The assessment room is too windy.
• The assessment room is too small and crowded.
11. Which factor determines reliability the most?
• Error due to the test itself.
• This is called systematic error, and it relates to validity.
• It is the biggest problem because it makes the test unreliable.
12. Estimating the degree of reliability
There are four methods of evaluating the reliability of research data:
1. Split-half reliability: determines how much error in a test score is due to poor test construction. To calculate: administer a test once, then calculate a reliability index with coefficient alpha, Kuder-Richardson 20 (KR-20), or the Spearman-Brown formula.
2. Test-retest reliability: determines how much error in a test score is due to problems with test administration. To calculate: administer the same test to the same participants on two different occasions, then correlate the scores from the two administrations.
3. Parallel-forms reliability: determines how comparable two different versions of the same measure are. To calculate: administer the two tests to the same participants within a short period of time, then correlate the scores from the two tests.
4. Inter-rater reliability: determines how consistent two separate raters of the instrument are. To calculate: give the results from one test administration to two evaluators, then correlate the two sets of markings from the different raters.
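Method 2 can be illustrated directly in code (an editor's sketch; the function name and the scores below are invented for demonstration): administer the same test twice and take the Pearson correlation between the two sets of scores.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores of five examinees on two administrations
# of the same test, a few weeks apart.
first_administration = [78, 85, 62, 90, 70]
second_administration = [80, 83, 65, 92, 68]

# A coefficient near 1 indicates the scores held steady across occasions.
test_retest_reliability = pearson(first_administration, second_administration)
```

The same correlation step serves parallel-forms reliability (correlate the two forms) and inter-rater reliability (correlate the two raters' markings); only the pair of score lists changes.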
13. Split-half Reliability
• When you are validating a measure, you will most likely be interested in evaluating the split-half reliability of your instruments.
• This method tells you how consistently your measure assesses the construct of interest.
• If you have dichotomous items (e.g., right-wrong answers), as you would with multiple-choice exams, the KR-20 formula is the best-accepted statistic.
• If you have Likert-scale or other types of items, use the Spearman-Brown formula.
14. [image-only slide]
15. Split-Half Reliability: KR-20
Example:
• I administered a 10-item spelling test to 15 children.
• To calculate the KR-20, I entered the data in an Excel spreadsheet.
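The spreadsheet steps can also be coded directly. The function below is an editor's sketch of the standard KR-20 formula, KR-20 = k/(k−1) · (1 − Σ p·q / σ²), using population variances throughout; the small data set is invented for demonstration and is not the presenter's spelling-test data.

```python
def kr20(item_scores):
    """Kuder-Richardson 20 for dichotomous (0/1) item scores.

    item_scores: one list of 0/1 item scores per examinee.
    Population variance (divide by n) is used for the total scores,
    matching the p * (1 - p) item-variance terms.
    """
    n = len(item_scores)       # number of examinees
    k = len(item_scores[0])    # number of items
    totals = [sum(person) for person in item_scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of p*q over items: p = proportion answering the item correctly.
    pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in item_scores) / n
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

# Four hypothetical examinees, three right/wrong items (1 = right, 0 = wrong).
sample_data = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(sample_data))  # prints 0.75
```

Note that some texts divide the total-score variance by n − 1 instead of n, which gives slightly different values on small samples.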
16.–25. [image-only slides: the KR-20 calculation worked through in the spreadsheet]
26. What is meant by r = 0.70?
• When the reliability coefficient (r) is close to 1, the data is reliable.
• Conversely, if r is far from 1 (close to 0), the data is not reliable.
• So r = 0.70 means that the data has high reliability: the result of the spelling test precisely represents the true level of the students’ spelling skill.
27. Split-Half Reliability (Likert Test)
• If you administer a Likert scale or another measure that does not have just one correct answer, the preferable statistic for calculating split-half reliability is coefficient alpha (Cronbach’s alpha).
• However, Cronbach’s alpha is difficult to calculate by hand. Use it only if you have access to SPSS.
• If you must calculate by hand, use the Spearman-Brown formula instead. It is less accurate, but much easier to calculate.
28. Validity
To make predictions close to the actual skills and knowledge of the students, we have to provide the prediction with validity evidence. Validity is something abstract, so it can only be predicted.
29. Defining Validity
Valid means correct.
Validity denotes the extent to which an instrument measures what it is supposed to measure.
It is an agreement between a test score or measure and the quality it is believed to measure.
The correctness of an assessment is called validity, and the evidence supporting that correctness is called validity evidence.
Example: Do tests really measure what students learn?
Example: Do college GPAs accurately predict on-the-job success?
30. The Place of Validity
• Validity is not a characteristic of the assessment instrument used to collect the data; it is attached to the result of the assessment (the score).
• Whether an instrument is valid depends on the purpose for which it is used.
• Example: Is it valid to use the scores resulting from this instrument to predict the students’ writing skills?
31. Predicting the Validity of Research Data
• Validity evidence can be provided from the assessment instrument used and from empirical data.
• There are four kinds of supporting validity evidence: construct, content, concurrent, and predictive.
• Construct and content evidence can be provided from the assessment instrument; concurrent and predictive evidence can be provided from empirical data.
32. Construct Validity Evidence
• Construct means the match between the task and the purpose of an assessment.
• An assessment to measure the students’ writing skill will not be valid unless it requires the students to perform a writing activity.
• Therefore, it is important to state the task clearly, so that the students whose skill is to be measured know exactly what they have to perform.
• The validity evidence is thus derived from the task the students perform.
33. Content Validity Evidence
• Refers to how well the items of the assessment instrument cover the skill being assessed.
• A scale should measure the true meaning of the concept being studied.
• To develop a test with high content-related evidence of validity, you need good logic, intuitive skills, and perseverance.
• You must also consider wording and reading level.
• Example: an assessment instrument used to measure grammar must contain items that cover the grammar knowledge learned by the students.
34. Concurrent Validity Evidence
• The extent to which a procedure correlates with the current behavior of subjects.
• Infers that the test produces similar results to a previously validated test.
• Concurrent validity evidence is about “forecasting the present”: how well a test predicts current, similar outcomes.
• Job samples and alternative tests are used to demonstrate concurrent validity evidence.
• Concurrent validity estimates are generally higher than predictive validity estimates.
• Do the results from one measure correspond with those of related measures?
• Example: scores from a classroom English proficiency test have a strong, positive correlation with scores from the TOEFL.
35. Predictive Validity Evidence
• The extent to which a procedure allows accurate predictions about a subject’s future behavior.
• Infers that the test provides a valid reflection of future performance on a similar test.
• Predictive validity evidence is about “forecasting the future”: how well a test predicts future outcomes (e.g., the SAT predicting first-year GPA).
• Most tests do not have great predictive validity; it decreases due to time and method variance.
• Example: scores from a university entrance test are used to predict the students’ future scores.
36. References
1. Ary, D., Jacobs, L.C., & Sorensen, C.K. (2010). Introduction to Research in Education (8th ed.). California: Wadsworth.
2. Korb, K.A. Calculating Reliability of Quantitative Measures. https://www.korbedpsych.com, accessed December 1, 2017.
3. Mackey, A., & Gass, S.M. (2005). Second Language Research: Methodology and Design. New Jersey: Lawrence Erlbaum Associates.
4. Latief, M.A. (2016). Research Methods on Language Learning: An Introduction. Malang: Universitas Negeri Malang.