1. Personality Test Development
   Introduction to Clinical Psychology
   Discussion Sections #8 and #9
2. Personality Test Construction
   Goals:
   - Gain an increased understanding of the concepts of reliability and validity as they pertain to tests
   - Gain an increased understanding of test development methods
3. Test Construction Procedure
   1. Identify a need for a new test
   2. Assemble an item pool (decide on scale and item formats)
   3. Pilot the item pool
   4. Select “good” items
   5. Examine the test’s psychometric properties (reliability and validity)
4. Step 1: Identify the Need for a New Test
   - What is the objective of the new test, and is there really a need for it?
   - How will the test be administered?
   - What is the ideal item format for this test?
   - Should more than one form be developed?
   - What special training will be required of test users in terms of administering or scoring the test?
5. Step 2: Assemble an Item Pool
   Two decisions:
   - Content
   - Format
6. Content
   - Develop a pool of items that fully measures the construct
   - Example: depression. What items should be included in the pool?
7. Format
   - Dichotomous (true/false)
   - Polychotomous (multiple choice)
   - Likert scales (degree of agreement)
   - ...and many others
8. Step 3: Pilot the Item Pool
   - Try the pool of items out on people for whom the test is being developed
   - The pilot should be administered under conditions similar to those under which the developed test will be administered (e.g., same instructions, time frame, and time limits)
9. Step 4: Select “Good” Items
   - Selecting “good” items involves complex statistical analysis of the pilot results (called item analysis), which varies according to the purpose of the test.
   - In tests of attitudes or personality characteristics, one consideration is whether individuals endorse the full range of the scale provided (see the sketch below).
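To make that concrete, here is a minimal item-analysis sketch in Python. All of the data, the 5-point Likert coding, and the retention thresholds are illustrative assumptions, not part of the original exercise.

```python
# Minimal item-analysis sketch: flag items with a restricted response
# range or a weak corrected item-total correlation.
# Hypothetical data: 6 respondents x 4 Likert items (1-5 agreement).
import numpy as np

responses = np.array([
    [1, 3, 5, 5],
    [2, 3, 4, 4],
    [5, 3, 1, 2],
    [4, 3, 2, 1],
    [3, 3, 5, 4],
    [1, 3, 4, 5],
])

total = responses.sum(axis=1)
for i in range(responses.shape[1]):
    item = responses[:, i]
    rest = total - item                       # total score excluding this item
    spread = item.max() - item.min()          # range of the scale actually used
    if item.std() == 0:
        r_it = 0.0                            # constant item: no discrimination
    else:
        r_it = np.corrcoef(item, rest)[0, 1]  # corrected item-total correlation
    keep = spread >= 2 and r_it >= 0.30       # illustrative retention rule only
    print(f"item {i + 1}: range used={spread}, r_item-total={r_it:.2f}, keep={keep}")
```

Note how item 2 (everyone answers 3) gets flagged: respondents never use the range of the scale, so the item cannot discriminate.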
10. Step 5: Examine the Test’s Psychometric Properties
   - Does the test yield consistent results (reliability)?
   - Do the test items measure the intended construct (validity)?
11. Test Construction Exercise, Part 1
   - Develop a test that distinguishes first-born from later-born children
12. Test Construction Exercise: Procedure
   Divide into groups of 4 to 5 students.
   In class:
   - As a group, develop an item to distinguish first-born from later-born children. Note: use a personality construct, not a physical characteristic (e.g., “I have no older siblings”).
   - Develop two responses for the item.
   - Once your item is ready, tell Sara or Eunyoe so they can write it on the board (so others won’t give the same item).
13. Administer the Test
   Tally the class results for each item:

   Item | % First-Born Agree | % Later-Born Agree
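As a sketch of that tally (the respondents, responses, and variable names below are hypothetical), each cell of the table is just a within-group proportion:

```python
# Tally percent agreement by birth order for one item.
# Hypothetical data: one (is_first_born, agreed) pair per respondent.
responses = [(True, True), (True, True), (True, False),
             (False, False), (False, True), (False, False)]

def pct_agree(group):
    return 100 * sum(agreed for _, agreed in group) / len(group)

first_born = [r for r in responses if r[0]]
later_born = [r for r in responses if not r[0]]
print(f"% first-born agree: {pct_agree(first_born):.0f}")
print(f"% later-born agree: {pct_agree(later_born):.0f}")
```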
14. Administer the Final Test and Score!
15. Psychometric Properties of Tests: Reliability and Validity
16. Reliability
   - Consistency of the observations or measurements
   - Reliability is inversely related to the degree of error in the instrument
     - High measurement error translates to low reliability
     - Low measurement error translates to high reliability
17. What!? What does this mean!?
   - High measurement error translates to low reliability; low measurement error translates to high reliability.
   - Easy example: a broken scale. There will be high measurement error on a broken scale, correct? How consistent are the weights likely to be? Is a broken or a working scale going to have more error? Which scale is going to be more reliable?
18. Types of Measurement Error
   - Random: factors unpredictably influence measurements (examples: mood, environmental distractions, hunger, or motivation interfering with responses).
   - Systematic: a persistent bias in the test or in the interpretations made by the examiner.
   - Because systematic errors are made consistently, they will not affect reliability, but they will affect validity (see the simulation below).
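That last point is easy to demonstrate with a quick simulation. Everything below (the population of 1,000 true scores, the noise level, the +10 bias) is an illustrative assumption, not data from the slides.

```python
# Simulate random vs. systematic error on a repeated measurement.
import numpy as np

rng = np.random.default_rng(0)
true = rng.normal(100, 15, 1000)  # hypothetical true scores

# Random error: fresh noise at each administration -> low test-retest r.
noisy_t1 = true + rng.normal(0, 20, 1000)
noisy_t2 = true + rng.normal(0, 20, 1000)
print("random error, test-retest r:",
      round(np.corrcoef(noisy_t1, noisy_t2)[0, 1], 2))    # well below 1

# Systematic error: the same +10 bias every time -> r stays perfect,
# but every score is 10 points off the true value (hurts validity).
biased_t1 = true + 10
biased_t2 = true + 10
print("systematic error, test-retest r:",
      round(np.corrcoef(biased_t1, biased_t2)[0, 1], 2))  # 1.0
print("systematic error, mean bias:",
      round((biased_t1 - true).mean(), 1))                # +10 points
```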
19. Types of Reliability
   - Inter-rater reliability (relevant to observational systems and psychological assessments requiring ratings or judgment)
   - Test-retest reliability
   - Split-half reliability
   Note: not every form of reliability is equally important for every assessment method.
20. Inter-rater Reliability
   - Degree of correspondence between two raters
   - Inter-rater reliability of diagnoses based on DSM criteria improved with DSM-III and the development of operational criteria for most of the mental disorders.
   Note: we will learn how to calculate this next week! (A preview follows.)
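As a preview of that calculation, here is a minimal sketch of two common indices, raw percent agreement and Cohen's kappa, for two raters making a yes/no diagnosis. The ratings are made up, and kappa is one standard choice rather than necessarily the statistic the course will use.

```python
# Percent agreement and Cohen's kappa for two raters' yes/no diagnoses.
# Hypothetical ratings for 10 cases (1 = diagnosis present, 0 = absent).
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

n = len(rater_a)
agree = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement: probability both say "yes" plus both say "no".
p_a_yes = sum(rater_a) / n
p_b_yes = sum(rater_b) / n
chance = p_a_yes * p_b_yes + (1 - p_a_yes) * (1 - p_b_yes)

kappa = (agree - chance) / (1 - chance)  # agreement corrected for chance
print(f"percent agreement: {agree:.0%}, Cohen's kappa: {kappa:.2f}")
```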
21. Test-Retest Reliability
   - The consistency of results over periods of time
   - The consistency of the results for a test given at two different time points
   - Quantified as the correlation of the scores from the two administrations
22. Quantifying Test-Retest Reliability
   - Reliability is expressed as a correlation coefficient.
   - Values range from 0 (not at all consistent or reliable) to 1 (perfectly consistent and reliable).
   - The value for adequate reliability is about .80 or greater.
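In code this is a single Pearson correlation between the two administrations; the scores below are hypothetical.

```python
# Test-retest reliability as a Pearson correlation across two administrations.
import numpy as np

time1 = [22, 15, 30, 18, 25, 27, 12, 20]  # hypothetical scores, time 1
time2 = [24, 14, 28, 19, 26, 25, 13, 21]  # same people, a few weeks later

r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")  # roughly .80 or greater is considered adequate
```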
23. Factors Affecting Test-Retest Reliability Estimates
   - Length of the intervening interval
   - Stability of the measured trait
   For example: for characteristics that are stable, like intelligence, the interval of time between the two tests should not affect the stability of the results. In contrast, for characteristics that are not stable, like depressed mood, the longer the interval between tests, the less reliable or consistent the scores (which is not necessarily bad).
24. Split-Half Reliability
   - The consistency of scores on two halves of the test (commonly the odd-numbered versus the even-numbered items; see the sketch below)
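A minimal sketch under common conventions (the item responses are made up): correlate odd-item and even-item half scores, then apply the Spearman-Brown correction, because each half is only half the test's length.

```python
# Split-half reliability: correlate odd- vs. even-item half scores,
# then apply the Spearman-Brown correction for the shortened halves.
import numpy as np

# Hypothetical data: 6 respondents x 8 dichotomous items (1 = endorsed).
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0, 1, 0],
])

odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)      # Spearman-Brown corrected estimate
print(f"half-test r = {r_half:.2f}, corrected split-half reliability = {r_full:.2f}")
```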
25. Validity
   A test can be reliable (consistently give the same results) but not valid. Why? If the test does not measure the correct construct, then it is not useful even if the results are consistent.
26. Validity
   - The degree to which a test measures what it is designed or intended to measure
27. Types of Validity
   - Face validity
   - Content validity
   - Criterion validity (predictive and concurrent)
   - Discriminant validity
   - Construct validity
28. Face Validity
   - A judgment about the relevance of test items
   - A type of validity that reflects the perspective of the test taker rather than the test user
   Example: personality tests. An introversion-extroversion test will be perceived as a highly (face) valid measure of personality functioning; the inkblot test may not be perceived as a (face) valid method of measuring personality functioning.
29. Content Validity
   - The degree to which the measure covers the full range of the (personality) construct, and
   - The degree to which the measure excludes factors that are not representative of the construct
30. Criterion Validity
   - The degree to which the test results (from your measure) are correlated with another related construct
   For example: the degree to which scores on an intelligence test are correlated with school performance or achievement.
31. Types of Criterion Validity
   - Concurrent: the two constructs are assessed at the same time
   - Predictive: one construct is measured at a later date
   For example: concurrent validity is the correlation of SAT score with G.P.A. at the time of taking the SAT in high school; predictive validity is the correlation of SAT score taken in high school with final G.P.A. upon graduating from college. (Both are sketched below.)
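A small sketch of that SAT example; every number below is made up for illustration.

```python
# Concurrent vs. predictive criterion validity, using the SAT example.
import numpy as np

sat = [1100, 1350, 980, 1420, 1210, 1050]     # hypothetical SAT scores
hs_gpa = [3.1, 3.8, 2.7, 3.9, 3.4, 3.0]       # G.P.A. at time of SAT (concurrent)
college_gpa = [2.9, 3.6, 2.8, 3.7, 3.2, 3.1]  # final college G.P.A. (predictive)

print("concurrent validity r =", round(np.corrcoef(sat, hs_gpa)[0, 1], 2))
print("predictive validity r =", round(np.corrcoef(sat, college_gpa)[0, 1], 2))
```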
32. Discriminant Validity
   - The degree to which the score on a measure of a personality trait does not correlate with scores on measures of traits that are unrelated to the trait under investigation
   For example (from the text): the trait being measured is phobia; the unrelated trait is intelligence. You would not expect the score on your phobia scale to be correlated with the score on an intelligence test.
33. Construct Validity
   - The degree to which the measure reflects the structure and features of the hypothetical construct that is being measured
   - Assessed by combining all of these aspects of validity
34. Exercise: Reliability and Validity Applied to the Edinburgh Postnatal Depression Scale (EPDS)
   - Let’s consider reliability and validity in the context of a real measure: the EPDS
35. What is the Edinburgh Postnatal Depression Scale (EPDS)?
   - Developed by John Cox, Jenifer Holden & Ruth Sagovsky
   - 10-item depression screening tool (reliable and valid)
   - Simple to complete
   - Acceptable to mothers and health workers
36. What is the Edinburgh Postnatal Depression Scale (EPDS)? Psychometric Characteristics
   - 10-item scale
   - Assesses mood aspects of depression without confounding somatic symptoms
   - Acceptable to women
   - Validated
   - Translated into many languages
37. Stems of All 10 EPDS Items
   - I have been able to laugh and see the funny side of things.
   - I have looked forward with enjoyment to things.
   - I have blamed myself unnecessarily when things went wrong.
   - I have been anxious or worried for no good reason.
   - Things have been getting on top of me.
38. Stems of All 10 EPDS Items (cont.)
   - I have felt scared or panicky for no very good reason.
   - I have been so unhappy that I have had difficulty sleeping.
   - I have felt sad or miserable.
   - I have been so unhappy that I have been crying.
   - The thought of harming myself has occurred to me.
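The slides show only the item stems, not the response options or scoring. In the published EPDS each item is rated 0 to 3, giving a total of 0 to 30; the sketch below assumes responses have already been coded so that higher means more depressed, and the screening cutoff shown is illustrative only, not clinical guidance.

```python
# Minimal EPDS-style scoring sketch. Assumptions (not from the slides):
# each of the 10 items is coded 0-3 with higher = more depressed,
# giving a total of 0-30; the cutoff below is illustrative only.
item_scores = [0, 1, 2, 1, 0, 2, 1, 0, 1, 0]  # hypothetical responses

total = sum(item_scores)  # possible range: 0-30
CUTOFF = 13               # illustrative screening threshold
print(f"EPDS total = {total}; above screening cutoff: {total >= CUTOFF}")
```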
39. Psychometric Evaluation of the EPDS: An Exercise
   - Is the EPDS a good measure of depression?
   - Psychometrically, what does it mean to ask if the EPDS is a “good” measure of depression?
   Note: follow the questions on the handout.
40. Handout Questions
41. Test Construction Exercise, Part 2: Evaluating Developed Tests
   1. Regroup into your “test groups”
   2. Evaluate items in terms of content validity and adequacy of scales
   3. Select final items for the test
   4. Propose methods for evaluating the reliability and validity of the new measure