LECTURE SEVEN MEASUREMENT OF VARIABLES: OPERATIONAL DEFINITION AND SCALES
THE RESEARCH DESIGN: DETAILS OF STUDY AND MEASUREMENT
Purpose of the study: Exploration; Description; Hypothesis testing
Types of investigation: Establishing causal relationships, correlations, or group differences
Extent of researcher interference:
- Minimum: Studying events as they normally occur
- Moderate: Minimum amount of interference
- Maximum: High degree of control and artificial settings
Study setting: Contrived; Noncontrived
Measurement and measures: Operational definition; Items (measure); Scaling; Categorizing; Coding
Unit of analysis (population to be studied): Individuals; Dyads; Groups; Organizations; Machines; etc.
Sampling design: Probability/nonprobability; Sample size (n)
Time horizon: One-shot (cross-sectional); Multishot (longitudinal)
Data-collection method: Observation; Interview; Questionnaire; Physical measurement; Unobtrusive
Reducing abstract concepts so that they can be measured in a tangible way is called operationalizing the concepts. It is done by looking at the behavioral dimensions, facets, or properties denoted by the concept.
Certain things lend themselves to easy measurement through the use of appropriate measuring instruments, for example blood pressure, pulse rate, body temperature, height, and weight. The same is true for measuring office floor area. Other objective facts can be obtained simply by asking questions, for example:
How long have you been working in this organization?
How long have you been working on this particular assignment?
What is your job title?
What is your marital status?
When we get into the realm (particular area) of people's subjective feelings, attitudes, and perceptions, the measurement of these factors or variables becomes difficult. The abstract notions are therefore broken down into observable characteristic behaviors, i.e., dimensions and elements.
The concept of thirst is abstract: we cannot see thirst. However, we would expect a thirsty person to drink plenty of fluid. If several people say they are thirsty, then we may determine the thirst level of each of these individuals by measuring the quantity of fluid that they drink to quench their thirst. We will thus be able to measure their level of thirst, even though the concept of thirst itself is abstract and nebulous (unclear).
In the above example, thirst is the concept, drinking plenty of fluid is the dimension, and the measured quantity of fluid drunk to quench the thirst is the element.
EXAMPLE OF OPERATIONAL DEFINITION: DIMENSIONS (D) (INDICATORS) AND ELEMENTS (E) (VARIABLES) OF THE CONCEPT (C) LEARNING
C: Learning
D: Understanding
E: Answer questions correctly
E: Give appropriate examples
D: Retention
E: Recall material after some lapse of time
D: Application
E: Solve problems applying concepts understood and recalled
E: Integrate with other relevant material
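The concept, dimension, and element hierarchy can be written down as a small nested structure. The sketch below is only an illustration: the dictionary layout and the helper name `list_elements` are assumptions made for this example, and the grouping of elements under dimensions follows the learning example above.

```python
# Hypothetical representation of the operational definition of "learning":
# one concept, broken into dimensions, each broken into observable elements.
operational_definition = {
    "concept": "Learning",
    "dimensions": {
        "Understanding": [
            "Answer questions correctly",
            "Give appropriate examples",
        ],
        "Retention": [
            "Recall material after some lapse of time",
        ],
        "Application": [
            "Solve problems applying concepts understood and recalled",
            "Integrate with other relevant material",
        ],
    },
}


def list_elements(definition):
    """Return (dimension, element) pairs: the observable things to measure."""
    return [
        (dimension, element)
        for dimension, elements in definition["dimensions"].items()
        for element in elements
    ]


for dimension, element in list_elements(operational_definition):
    print(f"{dimension}: {element}")
```

Each element in the flattened list is a candidate for one or more items in the measuring instrument.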
A scale is a tool or mechanism by which individuals are distinguished as to how they differ from one another on the variable of interest to our study.
There are four basic types of scales: nominal, ordinal, interval, and ratio. The degree of sophistication to which the scales are fine-tuned increases progressively as we move from the nominal to the ratio scale.
A nominal scale is one that allows the researcher to assign subjects to certain categories or groups. The categories are mutually exclusive and collectively exhaustive (complete). Thus the nominal scale gives us some basic, categorical, gross information.
For example, with respect to the variable of gender, respondents can be grouped into two categories: male and female. These two groups can be assigned code numbers 1 and 2. There is no third category into which respondents would normally fall.
If we had interviewed 200 people, and assigned code number 1 to all male respondents and number 2 to all female respondents, then computer analysis of the data at the end of the survey might show that 98 of the respondents are men and 102 are women. This frequency distribution tells us that 49% of the survey's respondents are men and 51% are women. Other than this marginal information, such scaling tells us nothing more about the two groups.
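The frequency distribution in this example can be reproduced with a few lines of Python. This is a minimal sketch: the list `responses` is reconstructed from the totals given above (98 men, 102 women), not from real survey records.

```python
from collections import Counter

# Codes from the example above: 1 = male, 2 = female.
labels = {1: "male", 2: "female"}

# Reconstructed data matching the totals in the text (98 men, 102 women).
responses = [1] * 98 + [2] * 102

counts = Counter(responses)
total = len(responses)
percentages = {labels[code]: 100 * n / total for code, n in counts.items()}

for code, n in sorted(counts.items()):
    print(f"{labels[code]}: {n} respondents ({percentages[labels[code]]:.0f}%)")

# For nominal data, the mode is the only meaningful measure of central tendency.
mode_code = counts.most_common(1)[0][0]
print("Most frequent category:", labels[mode_code])
```

Note that the code numbers are only labels: averaging them would produce a number with no meaning.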
An ordinal scale not only categorizes the variables in such a way as to denote differences among the various categories, it also rank-orders the categories in some meaningful way. With any variable for which the categories are to be ordered according to some preference, the ordinal scale would be used. The preferences would be ranked (for example, best to worst, first to last) and numbered 1, 2, and so on.
Rank the following five characteristics in a job in terms of how important they are for you. You should rank the most important item as 1, the next in importance as 2, and so on, until you have ranked each of them 1, 2, 3, 4, or 5.
JOB CHARACTERISTICS AND RANKING OF IMPORTANCE
The opportunity provided by the job to:
1. Interact with others. Ranking: 4
2. Use a number of different skills. Ranking: 2
3. Complete a whole task from beginning to end. Ranking: 1
4. Serve others. Ranking: 5
5. Work independently. Ranking: 3
An interval scale measures the distance between any two points on the scale. This helps us to compute the means and the standard deviations of the responses on the variables.
In other words, the interval scale not only groups individuals into categories and orders those groups, it also measures the magnitude of the differences in the preferences among the individuals.
It is a more powerful scale than the nominal and ordinal scales, and has the arithmetic mean as its measure of central tendency. Its measures of dispersion are the range, the standard deviation, and the variance.
Indicate the extent to which you agree with the following statements as they relate to your job, by circling the appropriate number against each, using the scale given below.
Strongly Disagree 1, Disagree 2, Neither Agree Nor Disagree 3, Agree 4, Strongly Agree 5
The following opportunities offered by the job are very important to me:
a. Interacting with others. 1 2 3 4 5
b. Using a number of different skills. 1 2 3 4 5
c. Completing a task from beginning to end. 1 2 3 4 5
d. Serving others. 1 2 3 4 5
e. Working independently. 1 2 3 4 5
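The statistics named for the interval scale can be computed with Python's standard `statistics` module. The sketch below uses made-up responses from six employees to item (a); the data are purely hypothetical.

```python
import statistics

# Hypothetical responses of six employees to item (a) "Interacting with others",
# coded 1 (Strongly Disagree) to 5 (Strongly Agree).
responses = [4, 5, 3, 4, 2, 5]

mean_score = statistics.mean(responses)        # arithmetic mean
std_dev = statistics.stdev(responses)          # sample standard deviation
variance = statistics.variance(responses)      # sample variance
value_range = max(responses) - min(responses)  # range

print(f"mean = {mean_score:.2f}")
print(f"standard deviation = {std_dev:.2f}")
print(f"variance = {variance:.2f}")
print(f"range = {value_range}")
```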
The ratio scale overcomes the disadvantage of the arbitrary origin point of the interval scale, in that it has an absolute zero point, which is a meaningful measurement point. Thus the ratio scale not only measures the magnitude of the differences between points on the scale but also taps the proportions in the differences. It is the most powerful of the four scales because it has a unique zero origin (not an arbitrary origin) and subsumes all the properties of the other three scales.
The measure of central tendency of the ratio scale could be either the arithmetic or the geometric mean, and the measure of dispersion could be the standard deviation, the variance, or the coefficient of variation.
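As a rough illustration of these ratio-scale statistics, the sketch below computes the arithmetic mean, the geometric mean, and the coefficient of variation (standard deviation divided by the mean) for three invented answers to a question such as "How many retail outlets do you operate?". The figures are assumptions chosen for easy arithmetic.

```python
import statistics

# Hypothetical answers to "How many retail outlets do you operate?"
outlets = [2, 4, 8]

arithmetic_mean = statistics.mean(outlets)

# The geometric mean is only defined when every value is positive,
# so it cannot be used if any respondent answers 0.
geometric_mean = statistics.geometric_mean(outlets)

# Coefficient of variation: dispersion relative to the mean (unit-free).
coefficient_of_variation = statistics.stdev(outlets) / arithmetic_mean

# Because zero is a true origin, ratios are meaningful:
# 8 outlets is four times as many as 2 outlets.
ratio = outlets[2] / outlets[0]

print(f"arithmetic mean = {arithmetic_mean:.2f}")
print(f"geometric mean = {geometric_mean:.2f}")
print(f"coefficient of variation = {coefficient_of_variation:.2f}")
print(f"ratio of largest to smallest = {ratio:.0f}")
```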
EXAMPLE OF RATIO SCALE
How many other organizations did you work for before joining this system? ____
Indicate the number of children you have in each of the following categories:
____ below three years of age
____ between three and six years
____ over six years but under twelve years
____ twelve years and over
How many retail outlets do you operate? ____
The responses to these questions could range from 0 to any reasonable figure.
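PROPERTIES OF THE FOUR SCALES
Nominal scale
Highlights: Difference/Category: Yes; Order/Rank: No; Distance/Magnitude: No; Unique Origin: No
Measure of central tendency: Mode
Measure of dispersion: none
Some tests of significance: Chi-square test (χ²)
Ordinal scale
Highlights: Difference/Category: Yes; Order/Rank: Yes; Distance/Magnitude: No; Unique Origin: No
Measure of central tendency: Median
Measure of dispersion: Semi-interquartile range
Some tests of significance: Rank-order correlation
Interval scale
Highlights: Difference/Category: Yes; Order/Rank: Yes; Distance/Magnitude: Yes; Unique Origin: No
Measure of central tendency: Arithmetic mean
Measure of dispersion: Standard deviation, variance, coefficient of variation
Some tests of significance: t, F
Ratio scale
Highlights: Difference/Category: Yes; Order/Rank: Yes; Distance/Magnitude: Yes; Unique Origin: Yes
Measure of central tendency: Arithmetic or geometric mean
Measure of dispersion: Standard deviation, variance, or coefficient of variation
Some tests of significance: t, F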
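The rank-order correlation listed for the ordinal scale can be illustrated with Spearman's formula, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between the two ranks given to the same item. The sketch below assumes rankings without ties. The first ranking is the one from the job-characteristics example (items 1 to 5); the second respondent is hypothetical.

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank-order correlation for two untied rankings of the same items."""
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("rankings must have the same length (at least 2)")
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Ranks given to job characteristics 1..5 in the ordinal-scale example.
respondent_1 = [4, 2, 1, 5, 3]
# Hypothetical second respondent ranking the same five characteristics.
respondent_2 = [5, 1, 2, 4, 3]

rho = spearman_rho(respondent_1, respondent_2)
print(f"rank-order correlation = {rho:.2f}")
```

A value near +1 means the two respondents order the characteristics in much the same way; a value near -1 means their orderings are nearly reversed.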
The Likert scale is designed to examine how strongly subjects agree or disagree with statements on a 5-point scale with anchors. This is an interval scale.
EXAMPLE
Using the Likert scale, state the extent to which you agree with each of the following. Circle your answer.
Strongly Disagree 1, Disagree 2, Neither Agree Nor Disagree 3, Agree 4, Strongly Agree 5
My work is very interesting. 1 2 3 4 5
I am not engaged in my work all day. 1 2 3 4 5
Life without my work will be dull. 1 2 3 4 5
Several bipolar attributes are identified at the extremes of the scale, and respondents are asked to indicate their attitudes, on what may be called a semantic space, toward a particular individual, object, or event on each of the attributes. The bipolar adjectives used might be, for example, Good-Bad, Strong-Weak, or Hot-Cold.
4. SEMANTIC DIFFERENTIAL SCALE EXAMPLE
Responsive ---------- Unresponsive
Beautiful ---------- Ugly
Bad ---------- Good
Hot ---------- Cold
In an itemized rating scale, a 5-point or 7-point scale (or 9-point, or whatever is needed) with anchors as required (e.g., Very Unimportant to Very Important, Extremely Low to Extremely High) is provided for each item, and the respondent writes the appropriate number beside each item or circles the relevant number against each item. The responses to the items are then summated. This is an interval scale.
EXAMPLE
Respond to each item using the scale below, and indicate your response number on the line by each item.
1 Very Unlikely, 2 Unlikely, 3 Neither Unlikely Nor Likely, 4 Likely, 5 Very Likely
1. I will be changing my job within the next 12 months. ____
2. I will take on new assignments in the near future. ____
3. It is possible that I will be out of this organization within the next 12 months. ____
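Summating the responses simply means adding up each respondent's item scores. The following is a minimal sketch with three hypothetical employees answering the three items above; the function name `summated_score` and the range check are assumptions made for illustration.

```python
# Hypothetical responses of three employees to the three items above,
# each coded 1 (Very Unlikely) to 5 (Very Likely).
responses = {
    "employee_A": [5, 4, 5],
    "employee_B": [2, 3, 1],
    "employee_C": [3, 3, 3],
}


def summated_score(item_scores, low=1, high=5):
    """Add up one respondent's item scores after checking they are on the scale."""
    for score in item_scores:
        if not low <= score <= high:
            raise ValueError(f"response {score} is outside the {low}-{high} scale")
    return sum(item_scores)


scores = {name: summated_score(items) for name, items in responses.items()}

for name, total in scores.items():
    print(f"{name}: summated score = {total}")
```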
In a fixed or constant sum scale, the respondents are asked to distribute a given number of points across various items. This is more in the nature of an ordinal scale.
EXAMPLE
In choosing a toilet soap, indicate the importance you attach to each of the following five aspects by allotting points for each to total 100 in all.
Fragrance ----------
Color ----------
Shape ----------
Size ----------
Texture of lather ----------
Total points: 100
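A constant sum response is only usable if the points add up to the required total, and what it yields is essentially an ordering of the aspects. The sketch below checks one hypothetical respondent's allocation and derives the ordering; the function name and the point values are assumptions for illustration.

```python
def validate_constant_sum(allocation, total=100):
    """Check that the points sum to `total`; return aspects from most to least important."""
    if any(points < 0 for points in allocation.values()):
        raise ValueError("points cannot be negative")
    allotted = sum(allocation.values())
    if allotted != total:
        raise ValueError(f"points add up to {allotted}, expected {total}")
    return sorted(allocation, key=allocation.get, reverse=True)


# Hypothetical allocation by one respondent.
allocation = {
    "Fragrance": 30,
    "Color": 10,
    "Shape": 5,
    "Size": 15,
    "Texture of lather": 40,
}

ranking = validate_constant_sum(allocation)
for position, aspect in enumerate(ranking, start=1):
    print(f"{position}. {aspect} ({allocation[aspect]} points)")
```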
The Stapel scale simultaneously measures both the direction and the intensity of the attitude toward the item under study. The characteristic of interest to the study is placed at the center, with a numerical scale ranging, say, from +3 to -3 on either side of the item. This gives an idea of how close or distant the individual's response is to the stimulus. Since this scale does not have an absolute zero point, it is an interval scale.
EXAMPLE
State how you would rate your supervisor's abilities with respect to each of the characteristics mentioned below, by circling the appropriate number.
+3 +2 +1 Adopting modern technology -1 -2 -3
+3 +2 +1 Product innovation -1 -2 -3
+3 +2 +1 Interpersonal skills -1 -2 -3
A graphical representation of the scale helps respondents to indicate their answers to a particular question by placing a mark at the appropriate point on the line. This is an ordinal scale. The faces scale, which shows faces ranging from smiling to sad, is also a graphic rating scale, used to obtain responses regarding people's feelings.
9. GRAPHIC RATING SCALE EXAMPLE
On a scale of 1 to 10, how would you rate your supervisor?
1 (Good) ---------- 5 (Better) ---------- 10 (Best)
Scales are also developed by consensus, where a panel of judges selects certain items which, in its view, measure the relevant concept. The items are chosen particularly on the basis of their pertinence or relevance to the concept.
RANKING SCALES USED IN ORGANIZATIONS
1. PAIRED COMPARISON
Paired comparison is used when, among a small number of objects, respondents are asked to choose between two objects at a time. As the number of objects to be compared increases, so does the number of paired comparisons. The number of paired choices for n objects is n(n-1)/2. The greater the number of objects or stimuli, the greater the number of paired comparisons presented to the respondents, and the greater the respondent fatigue. Hence paired comparison is a good method only if the number of stimuli presented is small.
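The growth in the number of pairs is easy to see by generating them. The sketch below uses `itertools.combinations` from the Python standard library; the list of brands is only an illustrative set of five objects.

```python
from itertools import combinations


def paired_comparisons(objects):
    """Return every unordered pair a respondent would have to choose between."""
    return list(combinations(objects, 2))


# Illustrative set of five objects to be compared.
brands = ["Dell", "IBM", "Hewlett-Packard", "Compaq", "Toshiba"]

pairs = paired_comparisons(brands)
n = len(brands)

# The count agrees with the formula n(n-1)/2.
assert len(pairs) == n * (n - 1) // 2

for first, second in pairs:
    print(f"{first} or {second}?")

# The number of comparisons grows quickly as objects are added.
for size in (5, 10, 20):
    print(f"{size} objects -> {size * (size - 1) // 2} paired comparisons")
```

Five objects need 10 comparisons, but 20 objects would need 190, which is why respondent fatigue becomes a concern.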
The forced choice enables respondents to rank objects relative to one another, among the alternatives provided. This is easier for the respondents, particularly if the number of choices to be ranked is limited.
2. FORCED CHOICE EXAMPLE
Rank the following computer companies in order of preference, assigning 1 to the most preferred choice and 5 to the least preferred.
Dell 1
Hewlett-Packard (HP) 3
IBM 2
Toshiba 5
Compaq 4
The comparative scale provides a benchmark or a point of reference to assess attitudes toward the current object, event, or situation under study.
3. COMPARATIVE SCALE EXAMPLE
In a volatile financial environment, compared to stocks, how wise or useful is it to invest in Treasury bonds? Circle the appropriate response.
More Useful 1 2 About the Same 3 4 Less Useful 5
We need assurance that the instrument we develop to measure a particular concept is indeed accurately measuring the variable and that, in fact, we are actually measuring the concept we set out to measure. This ensures that in operationally defining perceptual and attitudinal variables, we have not overlooked some important dimensions and elements or included some irrelevant ones.
The reliability of a measure indicates the extent to which it is without bias (error free) and hence ensures consistent measurement across time and across the various items in the instrument. In other words, the reliability of a measure is an indication of the stability and consistency with which the instrument measures the concept, and it helps to assess the "goodness" of a measure.
Stability is the ability of a measure to remain the same over time, despite uncontrollable testing conditions.
Two tests of stability are of most importance: test-retest reliability and parallel-form reliability.
The reliability coefficient obtained with a repetition of the same measure on a second occasion is called test-retest reliability. That is, when a questionnaire containing some items that are supposed to measure a concept is administered to a set of respondents now, and again to the same respondents, say several weeks to 6 months later, the correlation between the scores obtained at the two different times from one and the same set of respondents is called the test-retest reliability.
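The test-retest coefficient is an ordinary correlation between the two sets of scores. The sketch below computes a Pearson correlation by hand for six hypothetical respondents measured on two occasions; the scores and the helper name `pearson_r` are assumptions made for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("both lists need the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    if variation_x == 0 or variation_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical summated scores of the same six respondents on two occasions.
time_1 = [12, 15, 9, 14, 11, 8]
time_2 = [13, 14, 10, 15, 10, 9]  # same questionnaire, some weeks later

test_retest = pearson_r(time_1, time_2)
print(f"test-retest reliability = {test_retest:.2f}")
```

The higher the coefficient, the more stable the measure is over time.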
When responses on two comparable sets of measures tapping the same construct are highly correlated, we have parallel-form reliability. Both forms have similar items and the same response format, the only changes being the wording and the order or sequence of the questions.
The internal consistency of measures is indicative of the homogeneity of the items in the measure that tap the construct. In other words, the items should “hang together as a set” and be capable of independently measuring the same concept so that the respondents attach the same overall meaning to each of the items.
INTER-ITEM CONSISTENCY RELIABILITY
This is a test of the consistency of respondents' answers to all the items in a measure. To the degree that items are independent measures of the same concept, they will be correlated with one another.
SPLIT-HALF RELIABILITY
Split-half reliability reflects the correlation between two halves of an instrument. The estimates will vary depending on how the items in the measure are split into two halves.
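The dependence of the split-half estimate on the way the items are divided can be seen in a small sketch. The data below are hypothetical scores of five respondents on a six-item measure; the same scores are split once into odd and even items and once into first and second halves. The helper names are assumptions made for this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(variation_x * variation_y)


def split_half_r(rows, first_half, second_half):
    """Correlate respondents' totals on two halves given as lists of item indexes."""
    totals_a = [sum(row[i] for i in first_half) for row in rows]
    totals_b = [sum(row[i] for i in second_half) for row in rows]
    return pearson_r(totals_a, totals_b)


# Hypothetical data: rows are respondents, columns are six items scored 1 to 5.
item_scores = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 3, 2, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 2, 1],
]

odd_even = split_half_r(item_scores, [0, 2, 4], [1, 3, 5])
first_last = split_half_r(item_scores, [0, 1, 2], [3, 4, 5])

print(f"odd/even split:     r = {odd_even:.3f}")
print(f"first/second split: r = {first_last:.3f}")
```

The two coefficients are close but not identical, which illustrates why a split-half estimate should be reported together with the way the split was made.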
We have already met the terms internal validity and external validity. There we were concerned with the authenticity of the cause-and-effect relationships (internal validity) and their generalizability to the external environment (external validity). We now examine the validity of the measuring instrument itself. That is, when we ask a set of questions (i.e., develop a measuring instrument) in the hope that we are tapping the concept, how can we be reasonably certain that we are indeed measuring the concept we set out to measure and not something else?
We may group validity tests under three headings: content validity, criterion-related validity, and construct validity.
CONTENT VALIDITY
Content validity ensures that the measure includes an adequate and representative set of items that tap the concept. The more the scale items represent the domain or universe of the concept being measured, the greater the content validity.
FACE VALIDITY
Face validity is considered by some to be a basic and very minimal index of content validity. It indicates that the items that are intended to measure a concept do, on the face of it, look like they measure the concept.
CRITERION-RELATED VALIDITY
Criterion-related validity is established when the measure differentiates individuals on a criterion it is expected to predict. This can be done by establishing concurrent validity or predictive validity.
CONCURRENT VALIDITY
Concurrent validity is established when the scale discriminates individuals who are known to be different; that is, they should score differently on the instrument.
PREDICTIVE VALIDITY
Predictive validity indicates the ability of the measuring instrument to differentiate among individuals with reference to a future criterion.
CONSTRUCT VALIDITY
Construct validity testifies to how well the results obtained from the use of the measure fit the theories around which the test is designed. This is assessed through convergent and discriminant validity.
CONVERGENT VALIDITY
Convergent validity is established when the scores obtained with two different instruments measuring the same concept are highly correlated.
DISCRIMINANT VALIDITY
Discriminant validity is established when, based on theory, two variables are predicted to be uncorrelated, and the scores obtained by measuring them are indeed empirically found to be so.
TESTING GOODNESS OF MEASURES
Goodness of data
  Reliability (accuracy in measurement)
    Stability
      Test-retest reliability
      Parallel-form reliability
    Consistency
      Interitem consistency reliability
      Split-half reliability
  Validity (we are measuring the right thing)
    Logical validity (content)
      Face validity
    Criterion-related validity
      Predictive
      Concurrent
    Congruent validity (construct)
      Convergent
      Discriminant
TYPES OF VALIDITY
Validity: Description
Content validity: Does the measure adequately measure the concept?
Face validity: Do "experts" validate that the instrument measures what its name suggests it measures?
Criterion-related validity: Does the measure differentiate in a manner that helps to predict a criterion variable?
Concurrent validity: Does the measure differentiate in a manner that helps to predict a criterion variable currently?
Predictive validity: Does the measure differentiate individuals in a manner that helps to predict a future criterion?
Construct validity: Does the instrument tap the concept as theorized?
Convergent validity: Do two instruments measuring the concept correlate highly?
Discriminant validity: Does the measure have a low correlation with a variable that is supposed to be unrelated to this variable?