METHODS OF MEASURING
TEST RELIABILITY
GROUP DISCUSSION SCRIPT
EDM-202
PRESENTED BY-
NAMRATA GUPTA
Method of measuring test reliability
RELIABILITY:
Reliability in statistics and psychometrics is the overall consistency of a measure. A measure
is said to have a high reliability if it produces similar results under consistent conditions. For
example, measurements of people's height and weight are often extremely reliable.
Reliability is the degree to which an assessment tool produces stable and consistent results.
In short, reliability is a measure of the consistency of a test.
Here are some of the most common ways of measuring reliability for any empirical
method or metric:
• inter-rater reliability
• test-retest reliability
• parallel forms reliability
• internal consistency reliability
• split-half method
METHODS OF MEASURING RELIABILITY
Inter-Rater or Inter-Observer Reliability
This is used to assess the degree to which different raters or observers give consistent
estimates of the same phenomenon.
Whenever you use humans as a part of your measurement procedure, you have to worry
about whether the results you get are reliable or consistent. People are notorious for their
inconsistency. We are easily distractible. We get tired of doing repetitive tasks. We
daydream. We misinterpret.
For example, we found that the average inter-rater reliability of usability experts rating the severity
of usability problems was r = .52. You can also measure intra-rater reliability, whereby you correlate
multiple scores from one observer. In that same study, we found that the average intra-rater reliability
when judging problem severity was r = .58 (which is generally considered low reliability).
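As a rough illustration of how an inter-rater correlation such as the r values above can be obtained, the sketch below correlates the ratings of two raters. The rating data, the 1 to 4 severity scale, and the helper name pearson_r are assumptions made for this example, not part of the study quoted above.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical severity ratings (1 = cosmetic, 4 = critical) that two
# raters gave to the same six usability problems.
rater_a = [1, 2, 3, 4, 2, 3]
rater_b = [2, 2, 3, 3, 1, 4]

inter_rater_r = pearson_r(rater_a, rater_b)
print(round(inter_rater_r, 2))
```

The same function gives intra-rater reliability if both lists come from one rater scoring the same problems on two occasions.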
Test-Retest Reliability
We estimate test-retest reliability when we administer the same test to the same sample on two
different occasions.
The correlation depends on the time gap between the two occasions: the shorter the gap, the higher
the correlation; the longer the gap, the lower the correlation. This is because the two observations
are related over time.
Do customers provide the same set of responses when nothing about their experience or their attitudes
has changed? You don't want your measurement system to fluctuate when all other things are static.
Have a set of participants answer a set of questions (or perform a set of tasks). Later (by at least a few
days, typically), have them answer the same questions again. When you correlate the two sets of
measures, look for very high correlations (r > 0.7) to establish retest reliability.
As you can see, there's some effort and planning involved: you need participants to agree to
answer the same questions twice. Few questionnaires measure test-retest reliability (mostly because of
the logistics), but with the proliferation of online research, we should encourage more of this type of
measure.
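The retest procedure described above can be sketched as follows. The scores are invented for illustration, and the names pearson_r and RETEST_THRESHOLD are assumptions of this example; only the 0.7 cutoff comes from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

RETEST_THRESHOLD = 0.7  # cutoff quoted in the text above

# Hypothetical questionnaire totals for the same five participants,
# collected a few days apart (same order in both lists).
first_session = [10, 14, 18, 22, 26]
second_session = [12, 13, 19, 21, 27]

retest_r = pearson_r(first_session, second_session)
is_stable = retest_r > RETEST_THRESHOLD
print(round(retest_r, 2), is_stable)
```

The two lists must be paired by participant; a correlation computed on unmatched scores says nothing about retest reliability.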
Parallel Forms Reliability
Getting the same or very similar results from slight variations on the question or evaluation method
also establishes reliability. One way to achieve this is to have 20 items that measure one construct,
split them into two 10-item forms, administer both forms to the same group of participants, and then
correlate the two sets of scores. You're looking for high correlations and no systematic difference in
scores between the forms.
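A minimal sketch of the two checks mentioned above (a high correlation and no systematic difference in scores) is given below. It assumes, as is usual for parallel forms, that every participant has a score on both forms; the data and the names form_a, form_b, and pearson_r are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical scores of five participants on two 10-item forms.
form_a = [7, 5, 9, 6, 8]
form_b = [6, 5, 9, 7, 8]

forms_r = pearson_r(form_a, form_b)            # agreement between forms
mean_difference = mean(form_a) - mean(form_b)  # systematic shift, ideally near 0
print(round(forms_r, 2), mean_difference)
```

A high correlation with a large mean difference would suggest that one form is systematically harder than the other, so both numbers are worth reporting.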
Internal Consistency Reliability
It measures how consistently participants respond to one set of items. This is by far the most
commonly used measure of reliability in applied settings. It's popular because it's the easiest to
compute using software: it requires only one sample of data to estimate the internal consistency
reliability. This measure of reliability is described most often using Cronbach's alpha (sometimes
called coefficient alpha).
The more items you have, the more internally reliable the instrument, so to increase internal
consistency reliability, you would add items to your questionnaire. Since there's often a strong need to
have few items, however, internal reliability usually suffers. When you have only a few items, and
therefore usually lower internal reliability, having a larger sample size helps offset the loss in
reliability.
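The link between test length and reliability can be made concrete with the Spearman-Brown prophecy formula, which is not named in the text above and is brought in here only as an illustration. It assumes the added items are comparable in quality to the existing ones; the function name and the starting reliability of 0.6 are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    length_factor = 2 means twice as many items, 0.5 means half as many.
    Assumes the added or removed items behave like the remaining ones.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

current_reliability = 0.6  # hypothetical starting value

doubled = spearman_brown(current_reliability, 2)    # lengthening the test
halved = spearman_brown(current_reliability, 0.5)   # shortening the test
print(round(doubled, 2), round(halved, 2))
```

Under these assumptions, doubling the number of items raises the predicted reliability and halving it lowers the predicted reliability, which matches the trade-off described above.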
A. Average inter-item correlation is a subtype of internal consistency
reliability. It is obtained by taking all of the items on a test that probe the
same construct (e.g., reading comprehension), determining the correlation
coefficient for each pair of items, and finally taking the average of all of these
correlation coefficients. This final step yields the average inter-item
correlation.
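The three steps just described (take the items, correlate every pair, average the coefficients) can be sketched as below. The item scores and function names are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_inter_item_correlation(items):
    """items: one list of scores per item, respondents in the same order."""
    pairs = list(combinations(items, 2))  # every unordered pair of items
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)

# Hypothetical scores of four respondents on three items
# that probe the same construct.
items = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [1, 3, 2, 4],
]

avg_r = average_inter_item_correlation(items)
print(round(avg_r, 2))
```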
B. Split-half reliability is another subtype of internal consistency
reliability. The process begins by “splitting in half” all items of a test
that are intended to probe the same area of knowledge (e.g., World War II)
in order to form two “sets” of items. The entire test is administered to a
group of individuals, the total score for each “set” is computed, and finally
the split-half reliability is obtained by determining the correlation between
the two total “set” scores.
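A minimal sketch of the split-half procedure follows. The text does not say how the items should be divided, so the odd/even split used here is an assumption, as are the response data and the function names.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(responses):
    """responses: one row per respondent, one column per item."""
    first_set = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second_set = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson_r(first_set, second_set)

# Hypothetical right/wrong (1/0) answers of five people on a four-item test.
responses = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

split_half_r = split_half_reliability(responses)
print(round(split_half_r, 2))
```

Different ways of splitting the items can give different coefficients, so the chosen split should be reported alongside the result.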
CRONBACH’S ALPHA METHOD
Internal consistency is usually measured with Cronbach's alpha, a statistic calculated from the
pairwise correlations between items. Coefficient alpha ranges between negative infinity and
one. It will be negative whenever there is greater within-subject variability than
between-subject variability. A commonly accepted rule of thumb for describing internal
consistency is as follows:
Cronbach's alpha    Internal consistency
α ≥ 0.9             Excellent
0.9 > α ≥ 0.8       Good
0.8 > α ≥ 0.7       Acceptable
0.7 > α ≥ 0.6       Questionable
0.6 > α ≥ 0.5       Poor
0.5 > α             Unacceptable
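To show how the rule of thumb might be applied, the sketch below computes coefficient alpha with the common variance-based formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), and then looks up the label from the table. The response data and the function names cronbach_alpha and describe_alpha are assumptions made for this example.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per item."""
    k = len(responses[0])  # number of items
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def describe_alpha(alpha):
    """Map an alpha value to the rule-of-thumb label from the table."""
    if alpha >= 0.9:
        return "Excellent"
    if alpha >= 0.8:
        return "Good"
    if alpha >= 0.7:
        return "Acceptable"
    if alpha >= 0.6:
        return "Questionable"
    if alpha >= 0.5:
        return "Poor"
    return "Unacceptable"

# Hypothetical ratings (1 to 5) from five respondents on three items.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
    [3, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2), describe_alpha(alpha))
```

The labels are only a rule of thumb; a very small sample like this one gives an unstable estimate of alpha.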
