
Development of health measurement scales - part 1


  1. 1. Development of health measurement scales – part I Dr. Rizwan S A, M.D.
  2. 2. OUTLINE OF PRESENTATION • Introduction -Basic concepts • Devising the items • Scaling responses • Selecting the items • Biases in responding • From items to scale • Article
  4. 4. Some terms • Scale • Subscales • Items • Responses
  5. 5. History of scales • Initially, mortality and morbidity indicators were used • Once these rates came down, they were no longer representative or sensitive, creating a need for new health indices such as happiness, quality of life (QOL) and sadness • WWII provided the impetus • Scaling techniques were developed by psychologists • Sampling techniques were developed by political scientists • Developments in data analysis
  6. 6. Psychophysics and psychometrics • Power law – humans can make consistent numerical estimates of sensory stimuli • Extrapolating this evidence to the concept that people can make subjective judgments about health in a consistent manner
  7. 7. 32 degree Celsius Depression score - 32
  8. 8. Basic Steps in Scale development 1. Searching the literature • Awareness of existing scales for the same purpose 2. Critical review • Reliability • Validity
  9. 9. Basic Steps Reliability - Reliability refers to the degree to which the results obtained by a measurement procedure can be replicated. Assessing Reliability • Internal consistency • The average correlation among all the items in the measure. • It is calculated by Cronbach’s alpha, Kuder-Richardson or split halves • Stability • Reproducibility of a measure on different occasions. • Inter-observer reliability • Intra-observer reliability • Test-retest reliability
  10. 10. Basic Steps Validity • An expression of the degree to which a measurement measures what it purports to measure Types: 1. Face Validity: the relevance of measurement may appear obvious to the investigator 2. Content Validity: the extent to which the measurement incorporates the domain of the phenomenon under study. 3. Construct Validity : the extent to which the measurement corresponds to theoretical concepts 4. Criterion Validity : the extent to which the measurement correlates with an external criterion of the phenomenon under study -Concurrent Validity -Predictive Validity
  11. 11. Reliability Vs. Validity
  12. 12. Basic Steps Traditions of assessment: categorical model (e.g. DSM-III-R) versus dimensional model (e.g. CES-D) • Categorical: diagnosis requires multiple criteria, each with threshold values. Dimensional: occurrence of some features at high intensity can compensate for non-occurrence of others • Categorical: differences between cases and non-cases are implicit in the definition. Dimensional: differences between cases and non-cases are less clearly delineated • Categorical: severity is lowest in instances that minimally satisfy diagnostic criteria. Dimensional: severity is lowest among non-disturbed individuals • Categorical: one diagnosis often precludes others. Dimensional: a person can have varying amounts of different disorders • Multidimensional scaling is a bridge between them
  13. 13. Basic Steps • Reduction of measurement error In clinical observation • Through training • Interviewing skills • Clinical experience In the psychometric tradition • Items are screened to meet certain criteria • Consistency of answers across many items • The scale as a whole is checked to see whether it meets other criteria • The two solitudes can be merged using the Diagnostic Interview Schedule (DIS) in psychiatry. It is derived from the clinical examination used to diagnose psychiatric patients but can be administered by trained lay people
  14. 14. OUTLINE OF PRESENTATION • Introduction • Basic concepts • Devising the items • Scaling responses • Selecting the items • Biases in responding • From items to scale • Article
  15. 15. Devising the items • Item • Refers to an individual question or response phrase in any health measurement. • The first step in writing a scale is devising the items • By exploring various sources • Identifying the strengths and weaknesses of each of them • Items may be reused from previous scales • Advantages • saves work and the necessity of constructing new items • proof of being useful and psychometrically sound • may be the only way of asking about a specific problem • Disadvantages • outdated terminology • inadequate or incomplete for the domain under study
  16. 16. Devising the items • Sources of items 1. Focus group 2. Key informant interviews • Clinical observation • Patients 3. Theory 4. Research findings 5. Expert opinion • A scale may consist of items derived from some or all of these sources.
  17. 17. Devising the items • After generation of items, content validity should be addressed • Content relevance: each item on the test should relate to one of the course objectives • Content coverage: each part of the syllabus should be represented by one or more questions.
  18. 18. Devising the items • Generic versus specific scales (fidelity versus bandwidth issue) • Generic scale (bandwidth): allows comparison across different disorders, severity of disease, interventions, demographic and cultural groups; psychometric properties well established • Specific scale (fidelity): questions will be relevant and appropriate for the specific problem; short
  19. 19. Devising the items • Translation • Translating each item into the other language. • Done by a person who is fluent in both English and the target tongue, knowledgeable about the content area and aware of the intent of each item and of the scale as a whole • Back translation • Done by another knowledgeable bilingual person, who translates it back into English • Re-establishing reliability and validity within the new context
  20. 20. OUTLINE OF PRESENTATION • Introduction • Devising the items • Scaling responses • Selecting the items • Biases in responding • From items to scale • Article
  21. 21. Scaling Responses • A method by which responses will be obtained • Divided into categorical or continuous variables • The level of measurement is decided • Nominal, ordinal, interval, ratio
  22. 22. Scaling Response 1. Dichotomous scale: one that arranges items into either of two mutually exclusive categories, e.g. yes/no, alive/dead. 2. Nominal scale: classification into unordered qualitative categories, e.g. race, religion and country of birth. 3. Ordinal scale: classification into ordered qualitative categories, e.g. social class (I, II, III, etc.). 4. Interval scale: an (equal) interval scale involves assignment of values with a natural distance between them, so that a particular distance (interval) between two values in one region represents the same distance between two values in another region of the scale. Examples include Celsius and Fahrenheit temperature. 5. Ratio scale: an interval scale with a true zero point, so that ratios between values are meaningfully defined and one value can be described as so many times greater or less than another. Examples are absolute temperature, weight and height.
  23. 23. Scaling Response • Categorical judgment • Required when response to a question is either yes or no/simple check • Problems • Uncertainty and confusion on the part of respondents • Potential loss of information and reduced reliability • Loss of efficiency of the instrument
  24. 24. Scaling Response • Continuous judgment • Required when the response to a question is a continuous variable • Methods to quantify it are • Direct Estimation Technique • Comparative Methods • Econometric Method
  25. 25. Scaling Response Direct estimation methods 1. Visual analogue scale
  26. 26. Scaling Response Direct estimation methods 2.Adjectival scale
  27. 27. Scaling Response Direct estimation methods 3. Specific scaling methods A. Likert scale • The rater expresses an opinion by rating his or her agreement with a series of statements, wherein responses are framed on an agree-disagree continuum.
  28. 28. Scaling Response B. Semantic differential scale • Used to define a number of related dimensions of a characteristic on a series of continuous bipolar scales
  29. 29. Scaling Response • General issues in construction of continuous scales • How many steps should there be? • Is there a maximum number of categories?
  30. 30. Scaling Response • Should there be an even or odd number of categories? • Should all the points on the scale be labelled or only the ends? • Do adjectives always convey the same meaning? • Do numbers placed under the boxes influence the responses? • Should the order of successive question responses change? • Can it be assumed that the data are interval?
  31. 31. Scaling Response • Critique of Direct Estimation Methods • Subjective judgment • Easy to design • Little pre-testing • Easily understood • Halo effect • End-aversion bias • Positive skew
  32. 32. Scaling Response • Comparative methods • These methods scale the value of each description before obtaining responses, to ensure that the response values are on an interval scale • Types • Thurstone’s method • Paired comparison technique • Guttman method
  33. 33. Thurstone’s method of equal-appearing intervals 1. Select 100-200 statements 2. A number of judges are asked to sort them into a single pile from lowest to highest 3. The median rank of each statement is computed; this is the scale value of that statement 4. Select a limited number of statements (about 25) having equal intervals between successive items and spanning the entire range of values 5. Apply the scale to respondents, who are asked to indicate the statements which apply to them 6. The respondent's score is the average scale value of the items selected
  34. 34. Scaling Response • Paired comparison technique • Similar to Thurstone’s method • Except that here judges are asked to compare each item, one at a time, with each of the remaining items • Compute the proportion of times each alternative is chosen over each other option • Convert the values to z-scores using the properties of the normal curve
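Steps 3 and 6 of Thurstone's method can be sketched in a few lines of Python. This is only an illustration: the statement labels, the judges' ranks and the function names are hypothetical, and the median and mean are taken exactly as the slide describes them.

```python
from statistics import mean, median

def scale_values(judge_ranks):
    """Step 3: the scale value of a statement is the median rank given by the judges.

    judge_ranks maps a statement label to the list of ranks assigned by each judge.
    """
    return {statement: median(ranks) for statement, ranks in judge_ranks.items()}

def respondent_score(endorsed, values):
    """Step 6: a respondent's score is the average scale value of the endorsed statements."""
    return mean(values[statement] for statement in endorsed)

# Hypothetical data: three judges rank four statements from lowest (1) to highest (4).
judge_ranks = {
    "S1": [1, 2, 1],
    "S2": [2, 1, 3],
    "S3": [3, 3, 2],
    "S4": [4, 4, 4],
}
values = scale_values(judge_ranks)
score = respondent_score(["S2", "S4"], values)  # respondent endorses S2 and S4
```

With real data the judge panel would be far larger, and step 4 (picking about 25 evenly spaced statements) would be applied to `values` before any respondent is scored.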
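A minimal sketch of the paired comparison calculation follows. It assumes the common Thurstone-style convention of averaging the z-scores across comparisons and shifting the lowest item to zero, which the slide does not spell out; the proportions and the function name are hypothetical.

```python
from statistics import NormalDist, mean

def paired_comparison_scale(prop):
    """Convert preference proportions into scale values.

    prop[i][j] is the proportion of judges who chose item i over item j.
    Proportions must lie strictly between 0 and 1, otherwise inv_cdf raises an error.
    """
    to_z = NormalDist().inv_cdf  # inverse of the standard normal cumulative distribution
    items = list(prop)
    raw = {i: mean(to_z(prop[i][j]) for j in items if j != i) for i in items}
    lowest = min(raw.values())
    # Assumption: anchor the lowest-valued item at 0 so all scale values are non-negative.
    return {i: raw[i] - lowest for i in items}

# Hypothetical proportions for three health states A, B and C.
prop = {
    "A": {"B": 0.7, "C": 0.9},
    "B": {"A": 0.3, "C": 0.6},
    "C": {"A": 0.1, "B": 0.4},
}
scale = paired_comparison_scale(prop)
```

In this made-up example A is preferred most often, so it receives the highest scale value and C the lowest.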
  35. 35. Scaling Response Paired comparison technique
  36. 36. Scaling Response • Guttman method • Differs from Thurstone’s method in starting with a small sample of 10-20 items • No calibration is done • Items are tentatively ranked according to the increasing amount of the attribute assessed, and responses are displayed in a subject-by-item matrix where 1 is an endorsed item and 0 is any other item • It is an ordinal scale, not an interval scale • The coefficient of reproducibility and the coefficient of scalability are used to reflect deviation from perfect cumulativeness • Best suited to behaviours which are developmentally determined, where mastery of one behaviour virtually guarantees mastery of lower-order behaviours
  37. 37. Scaling Response - Guttman method • e.g. assessment of lower limb function in people with osteoarthritis • A=4, B=3, C=2, D=2, E=1
  38. 38. Scaling Response - Guttman method • The indices which reflect how much an actual scale deviates from perfect cumulativeness are • Coefficient of reproducibility: the degree to which a person’s scale score is a predictor of his or her response pattern. Varies between 0 and 1; should be > 0.9 • Coefficient of scalability: reflects whether the scale is unidimensional and cumulative. Varies between 0 and 1 and should be at least 0.6
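The coefficient of reproducibility can be illustrated with a short sketch. The slides do not say how errors are counted; the version below assumes one common convention, in which each response pattern is compared with the ideal cumulative pattern that has the same total score. The matrix and function name are hypothetical.

```python
def coefficient_of_reproducibility(matrix):
    """1 - (number of errors) / (number of responses).

    matrix: one row per subject, one column per item, with items ordered from
    the most to the least frequently endorsed; 1 = endorsed, 0 = not endorsed.
    Assumption: an error is any cell that differs from the ideal cumulative
    pattern (all 1s first, then all 0s) with the same total score.
    """
    n_items = len(matrix[0])
    errors = 0
    for row in matrix:
        total = sum(row)
        ideal = [1] * total + [0] * (n_items - total)
        errors += sum(actual != expected for actual, expected in zip(row, ideal))
    return 1 - errors / (len(matrix) * n_items)

# Hypothetical subject-by-item matrix: the third subject breaks the cumulative pattern.
matrix = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
cr = coefficient_of_reproducibility(matrix)
```

Here 2 of the 20 responses deviate from the ideal pattern, giving a coefficient of 0.9, which sits right at the boundary of the "> 0.9" criterion quoted above.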
  39. 39. Scaling Response • Critique of Comparative method • Requires more time for their development • Thurstone’s and paired comparison guarantee interval level measurement
  40. 40. Scaling Response • Goal attainment scaling (GAS) • An attempt to construct scales which are tailored to specific individuals, yet can yield results on a common ratio scale across all people • If the intervention worked as intended, the subject should score 0 • A higher mean score across all subjects indicates that goals were set too low • Not all subjects need have the same goals Critique • Ability to tailor the scale to the specific goals of the individual • Each subject has his or her own scale, with a different number of goals and varying criteria for each one • Extremely labour intensive GAS is useful when • The objective is to evaluate the intervention as a whole, goals for each person are different, and adequate resources for training goal setters and raters are present.
  41. 41. Scaling Response • Econometric method • Required to scale benefits along a numerical scale so that cost/benefit ratios can be determined • A health state is rated by averaging judgements from a large number of individuals to create a utility score for the state. • Here the focus of measurement is the described health state, not the characteristics of the individual respondent. • e.g. the choice between medical management and CABG in managing angina can be approached by the following methods: • Von Neumann-Morgenstern standard gamble • Time trade-off technique
  42. 42. Scaling Response - Econometric method Von Neumann-Morgenstern standard gamble: You have been suffering from angina for several years. As a result of your illness you have chest pain after even minor physical exertion. You have been forced to quit your job and spend most days at home watching TV. Imagine you are offered the possibility of an operation that will result in complete recovery, though the operation carries some risk: there is a probability p that you will die during the operation. How large must p be before you will decline the operation and choose to remain in your present state? The closer the present state is to perfect health, the smaller the risk of death one would be willing to entertain. Having obtained an estimate of p from subjects, the value of the present state can be directly converted to a 0-1 scale as 1 - p. Time trade-off technique: Imagine living the remainder of your lifespan, 40 years, in your present state. Contrast this with an operation after which you return to perfect health, but for fewer years. How many years would you sacrifice to have perfect health? So the respondent is presented with the alternative of 40 years in her present state versus 0 years of complete health.
  43. 43. Scaling Response • Critique of Econometric method • Difficult to administer • Requires a trained interviewer • Multidimensional scaling • A technique to examine the similarities of different objects which may vary along a number of separate dimensions • Begins with some index of how close each object is to every other object and then tries to determine how many dimensions underlie these evaluations of closeness
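The two utility calculations can be sketched as follows. The standard gamble conversion (1 - p) is the one stated on the slide. The time trade-off ratio (years in perfect health judged equivalent, divided by years in the present state) is the conventional formula and is an assumption here, since the slide stops before giving it. Function names and the example figures are hypothetical.

```python
def standard_gamble_utility(p_death):
    """Utility of the present state on a 0-1 scale: 1 - p, where p is the largest
    probability of operative death the respondent would still accept."""
    if not 0 <= p_death <= 1:
        raise ValueError("p_death must be a probability between 0 and 1")
    return 1 - p_death

def time_tradeoff_utility(years_perfect_health, years_present_state):
    """Assumed conventional formula: years in perfect health that the respondent
    considers equivalent to the full period in the present state, as a ratio."""
    if not 0 <= years_perfect_health <= years_present_state:
        raise ValueError("equivalent healthy years must lie between 0 and the full period")
    return years_perfect_health / years_present_state

# Hypothetical respondent: accepts up to a 20% risk of death, and regards
# 30 years of perfect health as equivalent to 40 years with angina.
sg = standard_gamble_utility(0.2)
tto = time_tradeoff_utility(30, 40)
```

In this example the respondent gives up 10 of the 40 years, so the time trade-off utility of the present state is 0.75, close to the standard gamble value of 0.8.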
  44. 44. Scaling Response-Multidimensional scaling
  45. 45. OUTLINE OF PRESENTATION • Introduction -Basic concepts • Devising the items • Scaling responses • Selecting the items • Biases in responding • From items to scale • Article
  46. 46. Selecting the items A. Pre-test the items to ensure that they 1. are comprehensible to the target population 2. are unambiguous 3. ask only a single question B. Eliminate or rewrite any items which do not meet the criteria above and pre-test again C. Discard items endorsed by very few (or very many) subjects
  47. 47. Selecting the items D. Check the internal consistency of the scale using 1. Item-total correlation a) Correlate each item with the scale total, omitting that item b) Eliminate or rewrite any item with Pearson r < 0.20 c) Rank order the remaining ones and select items starting with the highest correlation
  48. 48. Selecting the items 2. Coefficient α or KR-20 a) Calculate α, eliminating one item at a time. b) Discard any item whose removal significantly increases α. E. For a multi-scale questionnaire, check that each item is in the ‘right’ scale by a) Correlating it with the totals of all the scales, eliminating items which correlate more highly with scales other than the one they belong to b) Factor-analysing the questionnaire, eliminating items which load more highly on a factor other than the one they should belong to.
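Step D.1 can be sketched in plain Python. The response data and function names are hypothetical; the 0.20 cut-off is the one given on the slide, and each item is correlated with the total of the remaining items, as step (a) requires.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (undefined if either has zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def corrected_item_total(responses):
    """Correlate each item with the scale total, omitting that item from the total."""
    n_items = len(responses[0])
    result = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        result.append(pearson(item, rest))
    return result

def select_items(responses, threshold=0.20):
    """Drop items with r below the threshold, then rank the rest by descending r."""
    r = corrected_item_total(responses)
    kept = [i for i, value in enumerate(r) if value >= threshold]
    return sorted(kept, key=lambda i: r[i], reverse=True)

# Hypothetical data: 5 respondents x 4 items; the last item runs against the others.
responses = [
    [1, 1, 2, 5],
    [2, 2, 2, 4],
    [3, 3, 4, 3],
    [4, 4, 4, 2],
    [5, 5, 5, 1],
]
item_total = corrected_item_total(responses)
selected = select_items(responses)  # indices of retained items, best first
```

The fourth item correlates negatively with the rest of the scale, so it is the one flagged for elimination or rewriting.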
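The α calculation in step 2 can be sketched as below, using the standard formula α = k/(k-1) × (1 - Σ item variances / variance of total). The data and function names are hypothetical, and the sketch only reports how α changes when an item is dropped; judging whether an increase is "significant" is left to the analyst.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondents' item-score lists."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def alpha_if_item_deleted(responses):
    """Recalculate alpha with each item removed in turn."""
    k = len(responses[0])
    return [
        cronbach_alpha([row[:i] + row[i + 1:] for row in responses])
        for i in range(k)
    ]

# Hypothetical data: 5 respondents x 4 items; the last item runs against the others.
responses = [
    [1, 1, 2, 5],
    [2, 2, 2, 4],
    [3, 3, 4, 3],
    [4, 4, 4, 2],
    [5, 5, 5, 1],
]
alpha = cronbach_alpha(responses)
deleted = alpha_if_item_deleted(responses)
```

In this made-up example α for the full scale is very poor, and removing the fourth item raises it sharply, which is the pattern step (b) looks for.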
  49. 49. OUTLINE OF PRESENTATION • Introduction -Basic concepts • Devising the items • Scaling responses • Selecting the items • Biases in responding • From items to scale • Article
  50. 50. Biases in Responding • The people who develop a scale, those who use it in their work, and the ones who are asked to fill it out all approach scales from different perspectives and for different reasons • Optimizing • Describes performing a task in a careful and comprehensive manner: respondents a) try to interpret the meaning of the question itself b) try to retrieve all the relevant information from their memories c) use this information to form a single integrated summary judgement d) try to convey that judgement on the answer sheet.
  51. 51. Biases in Responding • Satisficing • Giving an answer which is satisfactory but not optimal, which may include selecting the first response option (written form) or the last option (verbal form), agreeing with every statement, answering either true or false to each option, or choosing "keep things as they are" or "I don’t know" as a response • It can be minimized by keeping the task simple, using words that are short and easy, offering response options that cover all possibilities, and motivating respondents
  52. 52. Biases in Responding • Social desirability (SD) and faking good • When the subject does not deliberately try to deceive or lie but still gives a socially desirable answer, it is called SD • When the subject is aware and intentionally attempts to create a false positive impression, it is called faking good • SD depends on the individual, sex, culture, the question and its context. • Assessed by the Differential Reliability Index (DRI), Social Desirability Scale, Desirability Scale and Social Relation Scale. • Faking good, being volitional, is easier to modify through instructions and careful wording of the items than social desirability
  53. 53. Biases in Responding • Deviation and faking bad • The tendency to respond to test items with deviant responses is the opposite of social desirability and is known as deviation • Faking bad: when the subject is aware and intentionally attempts to create a false negative impression; it is the opposite of faking good. • Biases can be minimized by • Disguising the intent of the test • Using subtle items, ones where the respondent is unaware of what is being measured • Random response technique
  54. 54. Biases in Responding • Yea-saying (acquiescence) and nay-saying • The tendency to give a positive response (such as yes, like, true) or a negative response (such as no, dislike, false) irrespective of the content of the item is called yea-saying and nay-saying respectively. • It can be reduced by having an equal number of items keyed in the positive and negative directions. • End-aversion bias or central tendency bias • It is the reluctance of some people to use the extreme categories of a scale. • It is reduced by avoiding absolute statements at the end points and including throw-away categories at the ends.
  55. 55. Biases in Responding • Positive skew • When responses are distributed more toward the favourable end. It produces a ceiling effect. • It can be minimized by not placing "average" in the middle of the scale, or by expanding the middle.
  56. 56. Biases in Responding • Halo • When judgements made on individual aspects of a person’s performance are influenced by the rater’s overall impression of the person. • It can be minimized by training raters, basing the evaluation on large samples of behaviour and using more than one evaluator
  57. 57. Biases in Responding • Framing • When a person’s choice between two alternative states depends on how these states are framed. • People are RISK AVERSE when a gain is involved and RISK TAKERS in loss situations. • A. 200 people will be saved. • B. There is a one-third probability that 600 people will be saved, and a two-thirds probability that nobody will be saved. OR • A. 400 will die. • B. There is a one-third probability that nobody will die, and a two-thirds probability that 600 will die. • The safest strategy for the test developer is to assume that all of these biases are operative and take the necessary steps to minimize them whenever possible.
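A quick calculation shows why the framing example is a pure wording effect: all four options have the same expected outcome. The sketch below assumes the 600 people at risk implied by the slide's figures; variable names are illustrative.

```python
from fractions import Fraction

AT_RISK = 600  # total number of people, as implied by the options on the slide

# Gain frame: expected number of people saved.
saved_a = 200
saved_b = Fraction(1, 3) * 600 + Fraction(2, 3) * 0

# Loss frame: expected number of deaths.
deaths_a = 400
deaths_b = Fraction(1, 3) * 0 + Fraction(2, 3) * 600

# Express the loss frame as expected survivors so the frames can be compared.
survivors_loss_a = AT_RISK - deaths_a
survivors_loss_b = AT_RISK - deaths_b
```

Every option works out to an expected 200 survivors, so a preference for the sure option in the gain frame and for the gamble in the loss frame reflects how the choice is presented rather than any difference in outcome.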
  58. 58. OUTLINE OF PRESENTATION • Introduction -Basic concepts • Devising the items • Scaling responses • Selecting the items • Biases in responding • From items to scale • Article
  59. 59. From items to scales • Differential weighting of items is rarely worth the trouble • For a test being developed for local use, the total score can be obtained by adding up all the items • For general use, and to be comparable, transform the scores into percentiles, z scores or T scores • For attributes which differ between males and females or which show developmental changes, separate age or age-sex norms can be developed.
  60. 60. From items to scales • Combining items into a scale and expressing the final score 1. Add the scores of the individual items when the items contribute equally to the total score 2. Weight the items when some items may be more important; each item is given either the same weight or a different weight by different subjects 3. Transform the final scores when comparing scores on different scales, using • Percentiles • Standard and standardized scores • Normalized scores
  61. 61. From items to scales • A percentile is the percentage of people who score below a certain value, the lowest being the 0th percentile and the highest the 99th. Percentiles are easy to understand, but they require many scores, are not normally distributed and, being ordinal data, cannot be analysed by parametric statistics. • To address the problems with percentiles, z scores and T scores can be calculated: a z score transforms the scale to a mean of 0 and a standard deviation of 1, and a T score transforms z scores using a new, arbitrarily chosen mean and standard deviation. • To ensure that z and T scores are normally distributed, a normalized standard score is used.
  62. 62. From items to scales • Establishing the cut points • Receiver operating characteristic (ROC) curves • Requires the true positive rate (sensitivity) and true negative rate (specificity) • A graph is plotted with 1 - specificity (false positive rate) on the X axis and sensitivity (true positive rate) on the Y axis. The diagonal running from (0,0) in the lower left-hand corner to (1,1) in the upper right reflects the characteristics of a test with no discriminating ability. The better the test is at dividing cases from non-cases, the closer its curve approaches the upper left corner. An index of the goodness of the test is the area under the curve, D′
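The transformations described above can be sketched as follows. Several choices are assumptions rather than anything stated on the slide: the population standard deviation is used, the T score uses the conventional mean of 50 and standard deviation of 10, and the normalized score uses a mid-rank percentile so that the inverse normal function never receives 0 or 1. The raw scores and function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def z_scores(raw):
    """Rescale to a mean of 0 and a standard deviation of 1 (population SD assumed)."""
    m, s = mean(raw), pstdev(raw)
    return [(x - m) / s for x in raw]

def t_scores(raw, new_mean=50, new_sd=10):
    """Rescale z scores to a chosen mean and SD (50 and 10 by convention)."""
    return [new_mean + new_sd * z for z in z_scores(raw)]

def percentile_ranks(raw):
    """Percentage of people scoring below each value."""
    n = len(raw)
    return [100 * sum(other < x for other in raw) / n for x in raw]

def normalized_z(raw):
    """Normalized standard score: map each score's percentile through the inverse normal curve."""
    n = len(raw)
    result = []
    for x in raw:
        below = sum(other < x for other in raw)
        ties = sum(other == x for other in raw)
        proportion = (below + 0.5 * ties) / n  # mid-rank keeps the value strictly inside (0, 1)
        result.append(NormalDist().inv_cdf(proportion))
    return result

# Hypothetical raw scale totals for five respondents.
raw = [10, 12, 14, 16, 18]
z = z_scores(raw)
t = t_scores(raw)
pct = percentile_ranks(raw)
nz = normalized_z(raw)
```

For these evenly spaced scores the middle respondent has z = 0 and T = 50, and the percentile ranks run from 0 to 80 in steps of 20.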
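A minimal sketch of how the ROC curve and the area under it can be computed from scale scores follows. It assumes that higher scores indicate a case and uses the trapezoidal rule for the area; the scores, case labels and function names are hypothetical.

```python
def roc_points(scores, is_case):
    """Return (1 - specificity, sensitivity) pairs, one per candidate cut point.

    A subject is called positive when the score is at or above the cut point.
    """
    n_cases = sum(1 for c in is_case if c)
    n_non_cases = len(is_case) - n_cases
    points = [(0.0, 0.0)]  # a cut point above every score calls nobody positive
    for cut in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, c in zip(scores, is_case) if s >= cut and c)
        false_pos = sum(1 for s, c in zip(scores, is_case) if s >= cut and not c)
        points.append((false_pos / n_non_cases, true_pos / n_cases))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical data: eight subjects, four true cases and four non-cases.
scores = [9, 8, 7, 6, 5, 4, 3, 2]
is_case = [True, True, False, True, True, False, False, False]
points = roc_points(scores, is_case)
auc = area_under_curve(points)
```

An area of 0.5 corresponds to the diagonal (no discriminating ability) and 1.0 to a curve that reaches the upper left corner; a cut point is then chosen from `points` according to the balance of sensitivity and specificity that is wanted.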
  63. 63. From items to scales: ROC curve (figure)
  64. 64. THANK YOU