# Reliability: What is it, and how is it measured?


by Anne Bruton, Joy H Conway and Stephen T Holgate

Bruton, A, Conway, J H and Holgate, S T (2000). 'Reliability: What is it and how is it measured?' Physiotherapy, 86, 2, 94-99.

Physiotherapy February 2000/vol 86/no 2

Key Words: Reliability, measurement, quantitative measures, statistical method.

### Summary

Therapists regularly perform various measurements. How reliable these measurements are in themselves, and how reliable therapists are in using them, is clearly essential knowledge to help clinicians decide whether or not a particular measurement is of any value. The aim of this paper is to explain the nature of reliability, and to describe some of the commonly used estimates that attempt to quantify it. An understanding of reliability, and how it is estimated, will help therapists to make sense of their own clinical findings, and to interpret published studies.

Although reliability is generally perceived as desirable, there is no firm definition as to the level of reliability required to reach clinical acceptability. As with hypothesis testing, statistically significant levels of reliability may not translate into clinically acceptable levels, so some authors' claims about reliability may need to be interpreted with caution. Reliability is generally population specific, so caution is also advised when making comparisons between studies.

The current consensus is that no single estimate is sufficient to provide the full picture about reliability, and that different types of estimate should be used together.

### Introduction

Therapists regularly perform various measurements of varying reliability. The term 'reliability' here refers to the consistency or repeatability of such measurements. Irrespective of the area in which they work, therapists take measurements for any or all of the reasons outlined in table 1. How reliable these measurements are in themselves, and how reliable therapists are in performing them, is clearly essential knowledge to help clinicians decide whether or not a particular measurement is of any value.

**Table 1: Common reasons why therapists perform measurements**

- As part of patient assessment.
- As baseline or outcome measures.
- As aids to deciding upon treatment plans.
- As feedback for patients and other interested parties.
- As aids to making predictive judgements, eg about outcome.

This article focuses on the reliability of measures that generate quantitative data, in particular 'interval' and 'ratio' data. Interval data have equal intervals between numbers, but these are not related to a true zero, so they do not represent absolute quantity. Examples of interval data are IQ and degrees Centigrade or Fahrenheit. On the temperature scale, the difference between 10° and 20° is the same as that between 70° and 80°, but this is based on the numerical values of the scale, not on the true nature of the variable itself. Therefore the actual difference in heat and molecular motion generated is not the same, and it is not appropriate to say that someone is twice as hot as someone else.

With ratio data, numbers represent units with equal intervals, measured from a true zero, eg distance, age, time, weight, strength, blood pressure, range of motion, height. Numbers therefore reflect actual amounts of the variable being measured, and it is appropriate to say that one person is twice as heavy, tall, etc, as another. The kinds of quantitative measures that therapists often carry out are outlined in table 2.

**Table 2: Examples of quantitative measures performed by physiotherapists**

- Strength measures (eg in newtons of force, kilos lifted).
- Angle or range of motion measures (eg in degrees, centimetres).
- Velocity or speed measures (eg in litres per minute for peak expiratory flow rate).
- Length or circumference measures (eg in metres, centimetres).

The aim of this paper is to explain the nature of reliability, and to describe, in general terms, some of the commonly used methods for quantifying it. It is not intended to be a detailed account of the statistical minutiae associated with reliability measures, for which readers are referred to standard books on medical statistics.
### Measurement Error

It is very rare to find any clinical measurement that is perfectly reliable, as all instruments and observers or measurers (raters) are fallible to some extent and all humans respond with some inconsistency. Thus any observed score (X) can be thought of as a function of two components, ie a true score (T) and an error component (E):

$$X = T \pm E$$

The difference between the true value and the observed value is measurement error. In statistical terms, 'error' refers to all sources of variability that cannot be explained by the independent (also known as the predictor, or explanatory) variable. Since the error components are generally unknown, it is only possible to estimate the amount of any measurement that is attributable to error and the amount that represents an accurate reading. This estimate is our measure of reliability.

Measurement errors may be systematic or random. Systematic errors are predictable errors, occurring in one direction only, constant and biased. For example, when using a measurement that is susceptible to a learning effect (eg strength testing), a retest may be consistently higher than a prior test (perhaps due to improved motor unit co-ordination). Such a systematic error would not therefore affect reliability, but would affect validity, as test values are not true representations of the quantity being measured. Random errors are due to chance and are unpredictable; they are therefore the basic concern of reliability.

### Types of Reliability

Baumgartner (1989) has identified two types of reliability, ie relative reliability and absolute reliability.

Relative reliability is the degree to which individuals maintain their position in a sample over repeated measurements. Tables 3 and 4 give some maximum inspiratory pressure (MIP) measures taken on two occasions, 48 hours apart. In table 3, although the differences between the two measures vary from –16 to +22 centimetres of water, the ranking remains unchanged. That is, on both day 1 and day 2 subject 4 had the highest MIP, subject 1 the second highest, subject 5 the third highest, and so on.

**Table 3: Repeated maximum inspiratory pressure (MIP) measures demonstrating good relative reliability**

| Subject | MIP day 1 (cm H₂O) | MIP day 2 (cm H₂O) | Difference, day 2 − day 1 (cm H₂O) | Rank day 1 | Rank day 2 |
|---|---|---|---|---|---|
| 1 | 110 | 120 | +10 | 2 | 2 |
| 2 | 94 | 105 | +11 | 4 | 4 |
| 3 | 86 | 70 | –16 | 5 | 5 |
| 4 | 120 | 142 | +22 | 1 | 1 |
| 5 | 107 | 107 | 0 | 3 | 3 |

**Table 4: Repeated maximum inspiratory pressure (MIP) measures demonstrating poor relative reliability**

| Subject | MIP day 1 (cm H₂O) | MIP day 2 (cm H₂O) | Difference, day 2 − day 1 (cm H₂O) | Rank day 1 | Rank day 2 |
|---|---|---|---|---|---|
| 1 | 110 | 95 | –15 | 2 | 5 |
| 2 | 94 | 107 | +13 | 4 | 3 |
| 3 | 86 | 97 | +11 | 5 | 4 |
| 4 | 120 | 120 | 0 | 1 | 2 |
| 5 | 107 | 129 | +22 | 3 | 1 |

This form of reliability is often assessed by some type of correlation coefficient, eg Pearson's correlation coefficient, usually written as r. For table 3 the data give a Pearson's correlation coefficient of r = 0.94, generally accepted to indicate a high degree of correlation. In table 4, however, although the differences between the two measures look similar to those in table 3 (ie –15 to +22 cm of water), on this occasion the ranking has changed. Subject 4 had the highest MIP on day 1 but was second highest on day 2, subject 1 had the second highest MIP on day 1 but the lowest on day 2, and so on. For table 4 data r = 0.51, which would be interpreted as a low degree of correlation. Correlation coefficients thus give information about the association between two variables, and not necessarily about their proximity.

Absolute reliability is the degree to which repeated measurements vary for individuals, ie the less they vary, the higher the reliability. This type of reliability is expressed either in the actual units of measurement, or as a proportion of the measured values. The standard error of measurement (SEM), coefficient of variation (CV) and Bland and Altman's (1986) 95% limits of agreement are all examples of measures of absolute reliability. These will be described later.
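The measurement error model X = T ± E, and the claim that a constant systematic error leaves correlation-based reliability unchanged, can be illustrated with a small simulation. This is a sketch under assumed values: the number of subjects, the spread of true scores, the random error SD and the 10-unit learning effect are all hypothetical, and the error is modelled as normally distributed with mean zero, so the ± of the model becomes a signed E.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical settings (not taken from the article).
N_SUBJECTS = 30
TRUE_MEAN, TRUE_SD = 100.0, 20.0  # spread of true scores T between subjects
ERROR_SD = 8.0                    # SD of random error E on each occasion
LEARNING_EFFECT = 10.0            # constant systematic rise on retest

true_scores = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N_SUBJECTS)]

# Observed score X = T + E, where E is a fresh signed random error each time.
test = [t + rng.gauss(0.0, ERROR_SD) for t in true_scores]
retest = [t + rng.gauss(0.0, ERROR_SD) for t in true_scores]

# Add a systematic error: every retest value is shifted up by the same amount.
retest_biased = [x + LEARNING_EFFECT for x in retest]

r_random_only = pearson_r(test, retest)
r_with_bias = pearson_r(test, retest_biased)
mean_diff_random_only = statistics.fmean(retest) - statistics.fmean(test)
mean_diff_with_bias = statistics.fmean(retest_biased) - statistics.fmean(test)

# Random error pulls r below 1; the constant shift leaves r unchanged
# but moves the mean difference by LEARNING_EFFECT.
print(f"r (random error only):    {r_random_only:.3f}")
print(f"r (plus systematic bias): {r_with_bias:.3f}")
print(f"mean difference: {mean_diff_random_only:.1f} -> {mean_diff_with_bias:.1f}")
```

The two correlation coefficients are identical, because Pearson's r is unaffected by adding a constant to one set of scores; only a comparison of means (or of the differences themselves) reveals the systematic shift.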
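The relative reliability comparison between tables 3 and 4 can be reproduced directly from the tabulated MIP values. The sketch below is plain Python; `ranks_descending` and `pearson_r` are helper names introduced here for illustration. It checks whether each subject keeps the same rank on both days and computes Pearson's r.

```python
import statistics

# MIP (cm H2O) for subjects 1-5, copied from tables 3 and 4.
DAY1 = [110, 94, 86, 120, 107]  # day 1 values are the same in both tables
TABLE3_DAY2 = [120, 105, 70, 142, 107]
TABLE4_DAY2 = [95, 107, 97, 120, 129]


def ranks_descending(values):
    """Rank 1 = highest value (these data contain no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


for name, day2 in [("Table 3", TABLE3_DAY2), ("Table 4", TABLE4_DAY2)]:
    same_ranking = ranks_descending(DAY1) == ranks_descending(day2)
    print(f"{name}: ranks day 2 = {ranks_descending(day2)}, "
          f"ranking unchanged = {same_ranking}, r = {pearson_r(DAY1, day2):.2f}")
# Table 3: ranks day 2 = [2, 4, 5, 1, 3], ranking unchanged = True, r = 0.94
# Table 4: ranks day 2 = [5, 3, 4, 2, 1], ranking unchanged = False, r = 0.49
```

Run on the tabulated values, the sketch reproduces r = 0.94 for table 3 and gives r ≈ 0.49 for table 4, slightly below the r = 0.51 quoted in the text; either value indicates a much weaker association than in table 3.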
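Absolute reliability can be illustrated on the same data using the commonly used form of Bland and Altman's 95% limits of agreement: the mean of the day 2 − day 1 differences plus or minus 1.96 times their sample standard deviation. The text defers a full description of this method, so treat the sketch below as an illustration under that assumed formula, not as the authors' procedure; with only five subjects the limits are very imprecise. The helper `limits_of_agreement` is a hypothetical name introduced here.

```python
import statistics

# MIP (cm H2O) for subjects 1-5, copied from tables 3 and 4.
DAY1 = [110, 94, 86, 120, 107]
TABLE3_DAY2 = [120, 105, 70, 142, 107]
TABLE4_DAY2 = [95, 107, 97, 120, 129]


def limits_of_agreement(first, second, z=1.96):
    """Return (mean difference, lower limit, upper limit) for second - first.

    Assumes the common Bland-Altman form: mean difference +/- z * SD of the
    differences, using the sample SD (n - 1 denominator).
    """
    diffs = [b - a for a, b in zip(first, second)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - z * sd, bias + z * sd


for name, day2 in [("Table 3", TABLE3_DAY2), ("Table 4", TABLE4_DAY2)]:
    bias, lower, upper = limits_of_agreement(DAY1, day2)
    print(f"{name}: mean difference {bias:.1f}, "
          f"95% limits of agreement {lower:.1f} to {upper:.1f} cm H2O")
# Table 3: mean difference 5.4, 95% limits of agreement -22.6 to 33.4 cm H2O
# Table 4: mean difference 6.2, 95% limits of agreement -21.6 to 34.0 cm H2O
```

Both tables give almost the same limits, roughly ±28 cm H₂O around a mean difference of 5 to 6 cm H₂O, even though their correlation coefficients differ sharply. This is the distinction drawn above: correlation describes association, while limits of agreement describe, in the units of measurement, how close repeated measurements are.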