Quantitative measurement


Published in: Technology, Business

  1. Quantitative Measurement. EDUU600: notes from McMillan and Schumacher.
  2. Test Validity
     - The extent to which inferences made on the basis of numerical scores are appropriate, meaningful, and useful.
     - A situation-specific concept: it depends on the purpose, the population, and the environmental characteristics in which the test takes place.
     - A test can be valid in one situation and not in another.
  3. Educational Inferences
     - Assessing achievement: how well the content of a test represents a larger domain of content or tasks.
     - Construct validity: attempts to measure traits or characteristics such as intelligence, creativity, ability, attitudes, reasoning, and self-concept.
     - Two types of rival hypotheses:
       - Construct underrepresentation: the test fails to capture important aspects of the construct.
       - Construct-irrelevant variance: the extent to which a measure includes material extraneous to the intended construct (e.g., measuring math reasoning with story problems, which also requires reading).
  4. Types of Evidence of Test Validity
     - Evidence based on test content: how well does the test measure what it says it measures?
     - Are the response strategies consistent with what is being measured? Example: testing thinking processes with multiple-choice questions.
     - Evidence based on the internal structure of the test: how are the items related to each other, and how well are items measuring the same trait correlated?
     - How well does one instrument correlate with another similar instrument?
     - Test-criterion relationships: how well does the test score or measure predict performance on a criterion measure?
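A test-criterion relationship is usually summarized as a correlation between test scores and scores on the criterion measure. The sketch below computes a Pearson correlation coefficient for that purpose. The function name `pearson_r` and the sample scores are illustrative assumptions, not material from the slides.

```python
from math import sqrt


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(test_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: admission test scores and later course grades.
admission = [52, 61, 70, 74, 88]
grades = [2.1, 2.8, 3.0, 3.2, 3.9]
r = pearson_r(admission, grades)
```

A coefficient near 1 would suggest the test predicts the criterion well for this group. As the slides note, such evidence is situation specific.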
  5. Reliability
     - Refers to the consistency of the measurement: the extent to which measures are free from error.
     - Common sources of error:
       - Changes in time limits or directions
       - Different scoring procedures
       - Interrupted testing session
       - Race of test administrator
       - Time the test is taken
       - Sampling of items
       - Ambiguity in wording
       - Effect of heat, light, ventilation
       - Differences in observers
       - Fatigue, health, motivation, luck, attention, anxiety
  6. Types of Reliability
     - Stability: stable characteristics over time (same test given to the same individuals over time).
     - Equivalence: comparability of two measures of the same trait given at about the same time (different forms of the test given to the same individuals at about the same time).
     - Equivalence and stability: compare two measures of the same trait given to the same individuals over time.
     - Internal consistency: administer one test and correlate the items with each other (Cronbach's alpha).
     - Agreement: consistency of ratings or observations.
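For internal consistency, Cronbach's alpha is commonly computed as alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The following is a minimal sketch of that formula. The function name `cronbach_alpha` and the response data are assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose so that each tuple holds all respondents' scores on one item.
    items = list(zip(*responses))
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: five respondents answering three 5-point items.
data = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(data)
```

The sketch uses sample variances throughout. It does not handle missing responses or a total-score variance of zero.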
  7. Interpretation of Reliability Coefficients
     - The acceptable range of reliability coefficients for most instruments is between .70 and .90.
     - Reliability tends to be higher with:
       - A group that is heterogeneous on the trait measured
       - More items on the instrument
       - A greater range of scores
       - Achievement tests of medium difficulty
     - When reliability is based on a norming group, it holds only for similar groups.
     - The more that items discriminate between high and low achievers, the higher the reliability.
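The point that more items tend to raise reliability can be illustrated with the Spearman-Brown prophecy formula, which the slides do not name. It is offered here only as a sketch, under the assumption that the added items are parallel to the existing ones. The function name `spearman_brown` is hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the added items are parallel to the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Doubling a test whose current reliability is .60.
predicted = spearman_brown(0.60, 2)
```

Under that assumption, doubling the test moves the predicted coefficient from .60 to .75, which falls inside the acceptable range cited above.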
  8. Norm- and Criterion-Referenced
     - Norm-referenced interpretation shows how individual scores compare with the scores of a well-defined reference or norm group of individuals.
     - It depends entirely on how the subjects compare with one another.
     - The best distribution of scores shows a high variance.
     - Items must discriminate between individuals.
     - Test items must be fairly difficult.
     - If everyone gets a high score, there is NO differentiation between individuals.
     - Important content or skills may not be measured.
  9. Criterion-Referenced or Standards-Based Interpretation
     - Normed tests:
       - Attend carefully to the characteristics of the norm or reference group.
       - Ceiling effect: testing a gifted group with little variability.
     - Criterion-referenced tests:
       - The score is interpreted by comparing it with professionally judged standards.
       - Standards of proficiency: what subjects are able to do.
       - Result in a highly skewed distribution.
       - Lessens variability.
       - Judge mastery of the domain tested.
  10. Standardized Tests
     - Group intelligence or ability: Cognitive Abilities Test
     - Individual intelligence: Stanford-Binet, Wechsler
     - Multifactor: Differential Aptitude Test
     - Special: Torrance Test of Creative Thinking
     - Diagnostic: Woodcock Reading Mastery
     - Criterion-referenced: Writing Skills Test
     - Specific subjects: Modern Math Understanding Test
     - Batteries: Stanford Achievement Test Series
  11. Personality, Attitude, Value, and Interest Inventories
     - Personality: Rorschach Inkblot Test, Coopersmith Self-Esteem Inventory
     - Attitude: Minnesota School Affect Assessment
     - Value: Work Values Inventory
     - Interest: Kuder Occupational Interest Inventory
  12. Designing a Questionnaire
     - Justification: generally use a proven instrument.
     - Defining objectives: list the specific information you hope to get.
     - Writing questions and statements:
       - Make items clear.
       - Avoid double-barrelled questions (avoid "and").
       - Respondents must be competent to answer (and provide reliable information).
       - Questions should be relevant.
       - Short, simple items are best.
       - Avoid negative items.
       - Avoid biased items or terms.
  13. Types of Items
     - Closed form: structured response where the subject chooses between predetermined responses.
     - Open form: the subject writes in any response.
     - Scaled items (gradations, levels, or values):
       - Likert scale
       - Semantic differential scale
     - Ranked items
     - Checklist items
  14. Pros and Cons of Data Collection Techniques
     - Paper/pencil: economical, standard; norms inappropriate; must be able to read.
     - Alternative assessment: holistic, authentic; subjective rating; costly, time-consuming.
     - Questionnaire: economical, easy to score; response rate, inability to probe and clarify; biased or ambiguous items.
     - Interview: flexible, can include nonverbal responses; costly, time-consuming; can be anonymous; effect of interviewer and interviewer bias.
     - Observation: captures natural behavior; costly, time-consuming; observer bias; not anonymous.
  15. Design a Questionnaire
     - Use a Likert scale (or a combination of Likert, ranking, and semantic differential items).
     - Read the types of descriptors on page 262.
     - Example response scales:
       - Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree
       - Always, Most of the Time, Sometimes, Rarely, Never
       - Very happy, Somewhat happy, Neither sad nor happy, Somewhat sad, Very sad
       - Like / Dislike
       - Important / Unimportant
  16. Smileys for Kids