Steps In Developing A Valid And Reliable Scale of Measurement
BY:
Omnia Samir Elseifi
Assistant Professor of Public Health and Community Medicine.
Faculty of Medicine
Zagazig University
23 January 2020
Scale development process
• Measurement scales are useful tools for obtaining scores on health aspects that cannot be measured directly, such as quality of life.
• The researcher must pass through many steps to reach the ultimate goal: a valid and reliable scale that supports the application of the test results.
Phase I: Item Development
1- Identification of domain
2- Item generation
3- Content validity
Phase II: Scale Development
4- Pretesting (pilot testing of the items)
5- Item reduction
6- Extraction of factors
Phase III: Scale Evaluation
7- Test of dimensionality
8- Test of reliability
9- Test of validity
(1,2,3)
Scale development process Scheme
1- Identification of domain(s):
1- Purpose
2- Justification
3- Describing domains
4- Specify the dimensions
5- Define each dimension
• To specify the boundaries of the domain.
2- Item generation:
1- Appropriate questions
2- Number of items
3- Item wording
4- Translation of items
5- Types of questions
6- Response to items
• To select which items to ask.
3- Content Validity:
CVR
CVI
Face Validity
• To assess if the items adequately measure the content of the domain of interest.
Scale development process Scheme
4- Pretesting:
1- Interview with target population
2- Sample size
3- Distribution of scale
• To gather enough data from the right people.
5- Item reduction:
1- Item difficulty index
2- Item discrimination index
3- Item-item correlation and item-total correlation
4- Distractor efficiency analysis
• To identify items that are not related to the domain, so they can be deleted or modified.
6- Extraction of factors:
Exploratory Factor Analysis (EFA)
Confirmatory Factor Analysis (CFA)
• To explore the number of latent constructs that fit the observed data.
Scale development process Scheme
7- Test of Dimensionality:
Using factor analysis (unidimensional or multidimensional scale)
• To identify the number of latent variables that are measured by the scale.
8- Test of Reliability:
1- Test-retest reliability
2- Internal consistency
3- Parallel form reliability
4- Inter-rater reliability
• To establish if responses are consistent when repeated.
9- Test of Validity:
Criterion validity: concurrent validity, predictive validity
Construct validity: convergent validity, divergent validity, known-group validity
• To ensure the scale measures the intended latent dimension.
Example Of Validated Scale Development Research
A study conducted in Pakistan, “Development of a stress scale for pregnant women in the South Asian context: the A–Z Stress Scale,” will be used as the example for most of the steps.
Phase 1: Item development
Step 1: Identification of the Domain(s)
The purpose: To specify the boundaries of the domain and facilitate item generation.
1- The purpose: to develop a scale based on stressors to measure stress among pregnant women in developing countries.
2- Justification: preexisting scales record the somatic and psychological symptoms of the stressors, not the stressors themselves.
3- Describing domains: the authors agreed on definitions of the different stressors that pregnant women are exposed to.
4- Specify the dimensions: they decided the scale would consist of three dimensions: daily, life event, and pregnancy-related stressors.
5- Define each dimension.
(4,5)
Phase 1: Item development
Step 1: Identification of the Domain(s)
Pitfalls
1. This step is often neglected or dealt with in a superficial manner.
2. Construct underrepresentation (focusing on a narrow aspect of the domain).
These problems lead to a significant number of difficulties later in the validation process (6,7).
Phase 1: Item development
Step 2: Item Generation
The purpose: To create appropriate questions that fit the identified domain.
1- Appropriate questions:
Deductive methods: literature review.
Inductive methods: interviews with 25 experts from different specialties (Psychiatry, Gynecology and Sociology), and interviews with 79 pregnant women about the possible stressors.
2- Number of items (must be 2-5 times the number in the final scale): item pool of 235 items.
3- Item wording
4- Translation of the items
5- Types of questions
6- Response to questions
(5,8-11)
Phase 1: Item development
Step 2: Item Generation
Pitfalls
1. Items that are irrelevant to the defined domain can lead to failure of scale validation, poor quality of data, and invalid conclusions regarding the results and the relationship with other constructs.
2. Improper response formats: a scale that is too short can affect the reliability of the instrument, and so can too many response options (more than 7) (12).
Phase 1: Item development
Step 3: Content Validity
Content validity:
• Content validity ensures that the items of the generated scale measure what they are presumed to measure (the whole content of the domain of interest) (2).
Content validity is assessed by:
• Experts,
• Target population (2)
Phase 1: Item development
Step 3: Content Validity
Purpose: To evaluate the items constituting the domain regarding content relevance and technical quality.
Expert evaluation:
• Content Validity Ratio (CVR)
• Content Validity Index (CVI): I-CVIs and S-CVI
• Kappa coefficient:
- >0.74 is considered excellent.
- Between 0.60 and 0.74 is considered good.
- Between 0.40 and 0.59 is considered fair.
(2)
Phase 1: Item development
Step 3: Content Validity
Content Validity Ratio (CVR):
• The experts are asked to specify whether an item is necessary for the construct or not:
- Score 1 for a [not necessary] item.
- Score 2 for a [useful but not essential] item.
- Score 3 for an [essential] item.
CVR = (number of experts indicating "essential" − total number of experts / 2) / (total number of experts / 2).
• For the minimum number of experts (5 or 6 experts), CVR must be not less than 0.99;
• for 8 experts, not less than 0.85;
• for 10 experts, not less than 0.62;
otherwise the item should be eliminated from the scale.
(13)
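The CVR arithmetic is easy to script. Below is a minimal Python sketch (not part of the original slides) that assumes expert ratings are coded 1 = not necessary, 2 = useful but not essential, 3 = essential, and applies Lawshe's formula to one item.

```python
def content_validity_ratio(ratings):
    """CVR = (n_essential - N/2) / (N/2), where N = number of experts."""
    n = len(ratings)                                 # total number of experts
    n_essential = sum(1 for r in ratings if r == 3)  # experts rating "essential"
    return (n_essential - n / 2) / (n / 2)

# Hypothetical example: 8 experts, 7 of whom rate the item "essential".
ratings = [3, 3, 3, 2, 3, 3, 3, 3]
print(round(content_validity_ratio(ratings), 2))     # 0.75
```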
Phase 1: Item development
Step 3: Content Validity
Content Validity Index (CVI):
Panel members are asked to rate instrument items in terms of clarity and relevance to the construct on a 4-point scale:
- Score 1 for [not relevant or not clear] items.
- Score 2 for [somewhat relevant, or somewhat clear and needing some revision] items.
- Score 3 for [quite relevant or quite clear] items.
- Score 4 for [highly relevant or highly clear] items.
I-CVIs (for each item): number of experts giving a score of 3 or 4 / total number of experts.
• >79%: the item is appropriate and retained within the scale.
• Between 70 and 79%: the item needs revision.
• <70%: the item is eliminated from the scale.
S-CVI/UA: number of items rated relevant by agreement of all experts / total number of items. Should be not less than 0.80.
S-CVI/Ave: sum of the I-CVIs for the items / total number of items. Should be not less than 0.90.
(14)
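As a companion to the definitions above, here is a minimal Python sketch (illustrative data only, not from the study) computing the I-CVI per item, S-CVI/UA and S-CVI/Ave from 4-point relevance ratings, where a rating of 3 or 4 counts as relevant.

```python
# Hypothetical ratings: item -> one 4-point relevance rating per expert.
ratings = {
    "item1": [4, 4, 3, 4, 3],
    "item2": [4, 3, 4, 4, 4],
    "item3": [2, 3, 4, 3, 2],
}

# I-CVI: experts giving a 3 or 4 / total number of experts, per item.
i_cvi = {item: sum(r >= 3 for r in rs) / len(rs) for item, rs in ratings.items()}

# S-CVI/UA: proportion of items rated relevant (3 or 4) by ALL experts.
s_cvi_ua = sum(all(r >= 3 for r in rs) for rs in ratings.values()) / len(ratings)

# S-CVI/Ave: average of the I-CVIs across items.
s_cvi_ave = sum(i_cvi.values()) / len(i_cvi)

print(i_cvi)                # {'item1': 1.0, 'item2': 1.0, 'item3': 0.6}
print(round(s_cvi_ua, 2))   # 0.67 -> below the 0.80 threshold
print(round(s_cvi_ave, 2))  # 0.87 -> below the 0.90 threshold
```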
Phase 1: Item development
Step 3: Content Validity
Face Validity: readability, feasibility, layout, clarity of words.
Face validity is the degree to which the designed measuring instrument is apparently appropriate for and related to the domain under study.
The target population shares with the experts in evaluating the face validity of the scale of measurement (15).
Phase 1: Item development
Step 3: Content Validity
Example for this step:
In the study developing a stress scale for pregnant women in the South Asian context (the A–Z Stress Scale) (5), the researchers stated that they evaluated the content validity of the scale by experts and by the target pregnant women (face validity). Accordingly, 78 items were selected from the item pool.
Pitfalls
• Some researchers fail to assess content validity, perhaps because of a lack of resources or skills. This is expected to affect the final data collected with the scale and the statistical analysis.
• Only a limited number of developing scales undergo target population evaluation, which is an important step because that population is the target of the newly developed scale (16).
Phase 2: Scale Development
Step 4: Pre-testing Questions
The purpose: To ensure the availability of sufficient data for scale development with a minimum level of error.
1- Cognitive interviews with the target population (here, pregnant women).
2- Sample size: a golden rule of thumb is 10 respondents per survey item (10:1). They interviewed 70 pregnant women.
3- Distribution of the scale: paper-based survey or online survey (they used a paper-based, face-to-face interview).
(5,17,18)
Pitfalls
• The sample size in many validation studies falls short of the golden rule, perhaps because this type of study can be difficult to fund.
• Missing data increase the risk of inaccurate conclusions due to the increasing occurrence of errors.
Phase 2: Scale Development
Step 5: Item Reduction
The purpose: To identify items that are not related to the domain under study so they can be deleted or modified.
Techniques:
• Item difficulty index
• Item discrimination test
• Inter-item and item-total correlations
• Distractor efficiency analysis
(5)
Phase 2: Scale Development
Step 5: Item Reduction
Inter-item and Item-Total Correlations
Purpose: To determine the correlations between scale items, as well as the correlations between each item and the sum score of the scale items.
Inter-item correlations: examine the correlation between each item in the scale and the other items.
Item-total correlations: examine the relationship between each item score and the total scale score.
In both techniques, items with low correlations (r < 0.30) are less desirable and could be deleted.
(19,20)
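Both correlation checks can be run in a few lines with pandas. The sketch below uses made-up response data (not from the study) and flags items whose item-total correlation falls below the r = 0.30 guideline.

```python
import pandas as pd

# Hypothetical responses: rows = respondents, columns = items.
df = pd.DataFrame({
    "q1": [1, 2, 4, 5, 3, 4],
    "q2": [2, 2, 5, 4, 3, 5],
    "q3": [5, 4, 1, 2, 3, 1],
})

inter_item = df.corr()                          # item-by-item correlation matrix
total = df.sum(axis=1)                          # total scale score per respondent
item_total = df.apply(lambda col: col.corr(total))

print(inter_item.round(2))
print(item_total.round(2))
print("Items to review:", item_total[item_total < 0.30].index.tolist())
```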
Phase 2: Scale Development
Step 5: Item Reduction
Example:
In the study developing a stress scale for pregnant women in the South Asian context (the A–Z Stress Scale) (5), the researchers conducted an item-total analysis; correlations ranged from r = 0.2 to r = 0.8. As a result, the items were reduced to a final set of 30 items.
Phase 2: Scale Development
Step 5: Item Reduction
Item Difficulty Index
Purpose: To assess the difficulty level of the scale's test items.
Item difficulty index = number of correct answers for the item / total number of answers for that item. It ranges from 0.0 to 1.0.
Item difficulty index and difficulty level:
• 0.86 and above: Very easy
• 0.71 to 0.85: Easy
• 0.30 to 0.70: Moderate
• 0.15 to 0.29: Difficult
• 0.14 and below: Very difficult
A high difficulty index means a greater proportion of the sample answered the question correctly; a low difficulty index means a smaller proportion understood the question and answered correctly.
(2,21)
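A minimal Python sketch of the difficulty index for a dichotomously scored item (illustrative data), using the cut-offs listed above:

```python
def difficulty_index(responses):
    """responses: 0/1 scores for one item across all respondents."""
    return sum(responses) / len(responses)

def difficulty_level(p):
    if p >= 0.86: return "Very easy"
    if p >= 0.71: return "Easy"
    if p >= 0.30: return "Moderate"
    if p >= 0.15: return "Difficult"
    return "Very difficult"

item_scores = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]    # 7 of 10 respondents correct
p = difficulty_index(item_scores)
print(p, "->", difficulty_level(p))              # 0.7 -> Moderate
```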
Phase 2: Scale Development
Step 5: Item Reduction
Item Discrimination Test
Purpose: To identify the degree to which an item correctly differentiates between respondents.
Item discrimination index = proportion of respondents in the upper group (with high scores) who got the item correct − proportion of respondents in the lower group (with low scores) who got the item correct. It ranges from −1 to +1.
Item discrimination index and discrimination level:
• 0.19 and below: Poor item; should be eliminated or revised.
• 0.20 to 0.29: Marginal item; needs revision.
• 0.30 to 0.39: Good item; may need some improvement.
• 0.40 or above: Very good item.
(22,23)
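The discrimination index can be computed by splitting respondents into upper and lower scoring groups on the total score. The sketch below (hypothetical 0/1 item scores, not the study's data) uses the top and bottom thirds as the two groups; other splits, such as top and bottom 27%, are also common.

```python
import numpy as np

# Hypothetical 0/1 item scores: rows = respondents, columns = items.
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
])

totals = scores.sum(axis=1)            # total score per respondent
order = np.argsort(totals)             # respondents sorted from lowest to highest
k = len(scores) // 3                   # size of each extreme group
lower, upper = order[:k], order[-k:]

# Discrimination index per item: upper-group proportion correct
# minus lower-group proportion correct (range -1 to +1).
discrimination = scores[upper].mean(axis=0) - scores[lower].mean(axis=0)
print(discrimination)
```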
Phase 2: Scale Development
Step 5: Item Reduction
Distractor Efficiency Analysis:
Purpose: To determine the distribution of the incorrect options ("distractors") and how they contribute to the quality of the items.
For an appropriate item, the correct option is chosen by:
• 100% of participants in the upper group (with high scores),
• about 50% of participants in the middle group (with middle scores),
• few or none of those in the lower group (with low scores).
If those with adequate knowledge (the upper group) cannot differentiate between the right option of the item and the distractors, the question may need to be modified or deleted.
(24,25)
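As a rough illustration (hypothetical choices, not the study's data), the tally below shows how the keyed answer and the distractors of one multiple-choice item are distributed across the upper, middle and lower groups; a distractor chosen by almost nobody, or mainly by the upper group, signals that the option or the item needs revision.

```python
from collections import Counter

keyed = "B"                                     # correct option for this item
choices_by_group = {
    "upper":  ["B", "B", "B", "B", "A"],
    "middle": ["B", "B", "C", "A", "D"],
    "lower":  ["A", "C", "D", "D", "C"],
}

for group, choices in choices_by_group.items():
    counts = Counter(choices)                   # how often each option was chosen
    share_correct = counts[keyed] / len(choices)
    print(group, dict(counts), f"correct: {share_correct:.0%}")
```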
Phase 2: Scale Development
Step 6: Extraction of Factors
Factor analysis:
It is a method for explaining the structure of data through the correlations between variables. It summarizes the data into a few dimensions by condensing many variables into a smaller set of latent variables or factors.
• Exploratory Factor Analysis (EFA) examines the interrelations between the items of the construct. It is used to reduce the set of observed variables to a smaller, more coherent set of latent variables.
• Confirmatory Factor Analysis (CFA) is used to confirm the factor structure by statistically testing the hypothesized factor loadings (FL) of the observed items on the underlying (latent) factors and the correlations between latent variables.
• Items with factor loadings (slope coefficients) below 0.30 are considered inadequate ("unrelated items") and should be eliminated.
• Items with cross-loadings > 0.4 should be eliminated.
(4,23,26)
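For orientation only, here is a bare-bones exploratory extraction in Python using numpy alone on simulated data: it eigendecomposes the item correlation matrix, keeps factors with eigenvalue > 1 (Kaiser criterion) and prints unrotated loadings, to which the 0.30 cut-off above would be applied. Real analyses would normally use dedicated EFA/CFA software with rotation and fit statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated data: 200 respondents x 6 items driven by two latent factors.
latent = rng.normal(size=(200, 2))
weights = np.array([[0.8, 0.0], [0.7, 0.1], [0.9, 0.0],
                    [0.0, 0.8], [0.1, 0.7], [0.0, 0.9]])
items = latent @ weights.T + 0.4 * rng.normal(size=(200, 6))

corr = np.corrcoef(items, rowvar=False)         # item correlation matrix
eigvals, eigvecs = np.linalg.eigh(corr)         # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]               # reorder: largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

n_factors = int(np.sum(eigvals > 1))            # Kaiser criterion
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])

print("factors retained:", n_factors)
print(np.round(loadings, 2))                    # |loading| < 0.30 flags weak items
```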
Phase 2: Scale Development
Step 6: Extraction of Factors
Example:
In a study developing a disease-specific tool for assessment of quality of life in patients with hepatitis C virus associated chronic liver disease (27), the researchers conducted CFA and calculated the factor loadings; any item with a factor loading of less than 0.3 was eliminated.
Pitfalls:
Many scale developers hesitate to use factor analysis, either because:
• it needs a large sample size to be conducted, or
• it involves many confusing and complicated steps and interpretations (16).
Phase 3: Scale Evaluation
Step 7: Test of Dimensionality
• Purpose: To identify the number of latent variables that are measured by the scale (the scale's dimensionality).
• It usually depends on the factor extraction and analysis.
(12)
Phase 3: Scale Evaluation
Step 7: Test of Dimensionality
Example:
In the study developing a stress scale for pregnant women in the South Asian context (the A–Z Stress Scale) (5), the researchers stated that, by multidimensional scaling, their scale has two dimensions:
1- a socioenvironmental-related hassles dimension (items 1-26),
2- a chronic illness dimension (items 27-30).
Pitfalls
• Failure to calculate EFA and CFA effectively leads to misclassification of the dimensions of the construct.
• Many researchers rely on the literature and expert views to divide the dimensions of the construct rather than using factor analysis (12).
Phase 3: Scale Evaluation
Step 8: Tests of Reliability
Reliability is the ability to reproduce the same result consistently under the same conditions.
Purpose: To measure reliability regarding stability, internal consistency, equivalence and inter-rater reliability.
1- Stability (test-retest reliability): the test is administered twice or more to the same participants to ensure that the same results are obtained. Example: the developing scale was tested on 43 pregnant women twice, with a one-week interval (r = 0.86).
2- Internal consistency: measures whether items measuring the same general construct produce the same scores (homogeneity). It is assessed by:
• Cronbach's α (value 0-1; ≥0.7 is acceptable)
• Kuder-Richardson
• Split-halves reliability (the scale is split into two equal halves, which are then compared).
Example: Cronbach's alpha was 0.82 for the scale and ranged between 0.75 and 0.86 for the different items.
3- Equivalence (parallel form reliability): determines the correlation or level of agreement between two or more instruments at the same point of time.
4- Inter-rater reliability: assesses the degree of agreement between two or more raters assessing a certain phenomenon at the same point of time. Example: the developing scale was applied to 50 pregnant women by two interviewers (r = 0.91).
(22, 28, 29)
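A minimal Python sketch (made-up scores, not the study's data) of two of the reliability checks above: Cronbach's alpha for internal consistency and a Pearson correlation for test-retest stability.

```python
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = respondents, columns = items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)         # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

items = np.array([[3, 4, 3, 4],
                  [2, 2, 3, 2],
                  [4, 5, 4, 5],
                  [1, 2, 1, 2],
                  [3, 3, 4, 3]])
print(round(cronbach_alpha(items), 2))             # >= 0.70 is usually acceptable

# Test-retest: correlate total scores from two administrations
# (the retest scores here are hypothetical).
time1 = items.sum(axis=1)
time2 = time1 + np.array([0, 1, -1, 0, 1])
print(round(float(np.corrcoef(time1, time2)[0, 1]), 2))
```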
Phase 3: Scale Evaluation
Step 8: Tests of Reliability
Pitfalls:
• Test-retest reliability should be used with caution, as scores can change over time in some types of studies (e.g., intervention studies); in that case the change is not due to an unreliable measure but is a true change in the participants.
• Fewer than 10 items in the scale can lower Cronbach's alpha.
• Lack of standardization between observers decreases inter-rater agreement (1,2).
Phase 3: Scale Evaluation
Step 9: Tests of Validity
Validity: the ability of the measuring scale to evaluate the domain that was intended to be measured.
1- Content validity (including face validity).
2- Criterion validity:
• Concurrent validity: the new measure is compared with a gold standard at the same time, in the same group.
• Predictive validity: the new measure predicts a gold standard or a behavior after a period of time, in the same group.
3- Construct validity:
• Convergent validity: two related measures give the same result.
• Divergent (discriminant) validity: two different measures give different results.
• Known-groups validity: the same measurement gives different results in two different groups.
(22, 28, 30)
Phase 3: Scale Evaluation
Step 9: Tests of Validity
Example: In the study developing a stress scale for pregnant women in the South Asian context (the A–Z Stress Scale) (5), criterion (concurrent) validity was assessed by comparing the new A–Z Stress Scale, at the same time, with a multicultural validated depression scale; there was a moderate correlation between the two scales (r = 0.56).
Pitfalls for validity calculation:
1- Criterion validity cannot be assessed with a small sample size because of sampling error.
2- Criterion validity cannot be used in all circumstances, especially in the social sciences, where a relevant criterion ("gold standard") may not exist; so it is usually ignored and not calculated in most validation studies.
3- Lack of sufficient resources or skills for its calculation and assessment (22).
Phase 3: Scale Evaluation
Step 9: Tests of Validity
Pitfalls for validity calculation (cont.):
4- Scale developers usually use a homogeneous group from the population in the pilot study, which limits the calculation of construct validity; recruiting a heterogeneous group or a random sample of the population is therefore recommended.
5- A single calculation of validity is inaccurate if the variable under study changes with time, as it can produce spurious correlations between variables; it is recommended to conduct longitudinal studies during scale development to obtain accurate validity measures, especially for predictive validity.
6- Social desirability bias: a systematic error in self-report measures in which the participants want to maintain a good image. This is considered one of the important threats to validity (22).
Conclusion
• Valid research results begin with valid and reliable measurement. This can be achieved if a systematic, scientifically based process is followed.
• Developing a valid and reliable scale is a multiphasic procedure that needs a researcher with adequate knowledge and a proper level of skills.
• Poor scale development harms the validity and reliability of the results and, therefore, their applicability in practice. So, the availability of a comprehensive guide for scale development is essential.
References
1. Fabrigar LR., Ebel-Lam A. Questionnaires. In N. J. Salkind (Ed.), Encyclopedia of Measurement and Statistics (2007).Thousand Oaks, CA: Sage. pp. 808-812.
2. DeVellis RF. Scale Development:Theory and Application. (3rd ed.). Los Angeles, CA: Sage Publications (2012).
3. Hinkin TR.A review of scale development practices in the study of organizations. J Manag. 1995; 21:967–88. doi:10.1016/01492063(95)90050-0
4. McCoach DB, Gable RK, Madura, JP. Instrument Development in the Affective Domain. School and Corporate Applications, 3rd Edn. NewYork, NY: Springer (2013).
5. Kazi A, Fatmi Z, Hatcher J, Niaz U, Aziz A. Development of a stress scale for pregnant women in the South Asian context: the A-Z Stress Scale. East Mediterr Health J. 2009 Mar-
Apr;15(2):353-61. PMID: 19554982.
6. Messick S. Validity of psychological assessment: validation of inferences from persons’ responses and performance as scientific inquiry into score meaning. Am Psychol. (1995) 50:741–9. doi: 10.1037/0003-066X.50.9.741
7. MacKenzie, S. B. 2003.“The Dangers of Poor Construct Conceptualization,” Journal of the Academy of Marketing Science (31:3), pp. 323-326.
8. Streiner, D. L., Norman, G. R., & Cairney, J. (2015). Health Measurement Scales:A Practical Guide to Their Development and Use (5th ed.). Oxford, UK: Oxford University Press.
9. Schinka JA,VelicerWF,Weiner IR. Handbook of Psychology, Research Methods in Psychology. Hoboken, NJ: JohnWiley & Sons, Inc. 2012.
10. DeVellis RF. Scale Development:Theory and Applications (4th ed.).Thousand Oaks, CA: Sage. 2017.
11. Price LR. Psychometric Methods:Theory into Practice. NewYork:The Guilford Press. 2017. pp: 190-191.
12. Furr RM. Scale Construction and Psychometrics for Social and Personality Psychology. New Delhi, IN: Sage Publications. 2011.
13. Streiner, DL, Norman GR, Cairney J. Health Measurement Scales:A Practical Guide to Their Development and Use (5th ed.). Oxford, UK: Oxford University Press. 2015.
14. Polit DF, Beck CT, Owen SV. Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Res Nurs Health 2007;30(4):459-67.
15. Haynes SN, Richard DCS, Kubany ES. Content validity in psychological assessment: a functional approach to concepts and methods. Pyschol Assess. 1995; 7:238–47
16. Morgado FFR, Meireles JFF, Neves CM, Amaral ACS, Ferreira MEC. Scale development: ten main limitations and recommendations to improve future research practices. Psicol Reflex E
Crítica 2018; 30:3.
17. Greenlaw C, Brown-Welty S.A Comparison of web-based and paper-based survey methods: testing assumptions of survey mode and response cost. EvalRev. 2009; 33:464–80.
18. Fanning J, McAuley E.A Comparison of tablet computer and paper-based questionnaires in healthy aging research. JMIR Res Protoc. 2014; 3:e38.
19-Raykov T, Marcoulides GA. Introduction to Psychometric Theory. NewYork, NY: Routledge,Taylor & Francis Group 2011.
20. Cohen RJ, Swerdlik ME. Psychological testing and assessment:An introduction to tests and measurement (6th ed.). NewYork: McGraw-Hill, 2005.
21. Si-Mui Sim, Rasiah RI. Relationship between item difficulty and discrimination indices in true/false type multiple choice questions of a para-clinical multidisciplinary paper. Ann Acad
Med Singapore 2006; 35: 67-71
22- Whiston SC. Principles and Applications of Assessment in Counseling. Cengage Learning 2008.
23. Zubairi AM, Kassim NLA. Classical and Rasch analysis of dichotomously scored reading comprehension test items. Malaysian J of ELT Res 2006; 2: 1-20.
24- Tarrant M,Ware J, Mohammed AM.An assessment of functioning and nonfunctioning distractors in multiple-choice questions: a descriptive analysis. BMC Med Educ. 2009; 9:40.
25-Fulcher G, Davidson F.The Routledge Handbook of LanguageTesting. NewYork, NY: Routledge 2012.
26- Polit DF Beck CT. Nursing Research: Generating and Assessing Evidence for Nursing Practice, 9th ed. Philadelphia, USA:Wolters Klower Health, Lippincott Williams & Wilkins, 2012.
27- Sobhi SA, Ibrahim AS, Serwah AA, Tawfik MY. In a research for Developing a disease-specific tool for assessment of quality of life of patients with hepatitis C virus associated chronic
liver disease. Suez canal university medical journal.2008; 11(2):207-214.
28. Boateng GO, Neilands TB, Frongillo EA, Melgar-Quiñonez HR and Young SL Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer.
Front. Public Health 2018; 6:149.
29.Wong KL, Ong SF, Kuek TY. Constructing a survey questionnaire to collect data on service quality of business academics. Eur J Soc Sci 2012; 29:209-21.
30. Sackett PR, Lievens F, Berry CM, Landers RN. A cautionary note on the effects of range restriction on predictor intercorrelations. Journal of Applied Psychology 2007; 92(2): 538–544.