Characteristics of a good test

Published in: Business, Technology

  1. CHARACTERISTICS OF A GOOD TEST
     a. Valid: validity refers to the extent to which a test measures what it purports to measure.
     b. If a test item is congruent with the behavior to be tested, the item is valid.
     Types of Evidence
     CONTENT VALIDITY -- refers to the ADEQUACY and REPRESENTATIVENESS of the test content. It is described by determining the learning outcomes to be measured and can be assured with the use of a T.O.S.
     CRITERION-RELATED VALIDITY
       1. PREDICTIVE VALIDITY -- involves the use of a criterion and a predictor. Example: correlating the results of a college entrance test (CET) with student GWA at some future time (predictor = CET; criterion = GWA).
       2. CONCURRENT VALIDITY -- the criterion is already available; the CET is correlated with some available criterion (predictor = GWA; criterion = 4th year high school grade).
     CONSTRUCT-RELATED VALIDITY -- refers to how well performance on a particular set of tasks can be explained by some PSYCHOLOGICAL CONSTRUCT or TRAIT.
       THEORETICAL CONSTRUCT -- the components of such a psychological task.
       CRITICAL CONSTRUCT -- predictors, conclusions, assumptions, inferences, interpretations and relevance of evidence.
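The predictive validity example above (CET as predictor, later GWA as criterion) comes down to a correlation between two sets of scores. A minimal sketch, assuming the Pearson product-moment coefficient is the statistic used; the function name and the five pairs of scores are hypothetical and only illustrate the arithmetic:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: entrance-test scores (predictor) and later GWA (criterion)
cet_scores = [70, 75, 80, 85, 90]
later_gwa = [78, 80, 85, 84, 92]

validity_coefficient = pearson_r(cet_scores, later_gwa)
print(round(validity_coefficient, 2))
```

The closer the coefficient is to +1.00, the better the entrance test predicts the later criterion in this made-up sample.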
  2. CHARACTERISTICS OF A GOOD TEST
     RELIABILITY -- refers to the "CONSISTENCY" of the test scores.
     ERRORS of measurement are factors or conditions that can lower the reliability of a test. If the test has low reliability, we can be sure that errors of measurement have affected the test scores to the point that the test is UNRELIABLE.
     SOME ERRORS OF MEASUREMENT
     Test Takers (INTRA TEST ERROR) -- what is happening within the individual? (fatigue, hunger, headache, emotional upset, anxiety, growth and learning acquired before the test). These tend to reduce the consistency of the SCORE OVER TIME.
     Test Itself -- test content (poorly constructed items, items with clues, very easy or very difficult items, very high vocabulary or reading level). These tend to lead to guessing, particularly when the test is long.
     Test Administration -- lighting of the room, room temperature (too hot or too cold), noise, seating arrangement, instructions, time allotment, attitude of the test examinee. These MAKE THE TEST UNRELIABLE and LOWER THE TEST SCORE.
     Test Scoring -- MISKEY (providing a wrong answer in the key), mistakes in correcting a wrong answer, mistakes in the use of the required pencil, and subjective scoring.
  3. TO DETERMINE THE CONSTRUCT VALIDITY OF CRITICAL THINKING
     1. Each subtest is correlated with the whole test.
     2. The correlation shows how much each subtest, which measures a particular component, contributes to the measurement of the psychological trait, which is critical thinking.
     Defined by three columns: X (subtest), Y (correlation with the total score), and the proportion of common variance.
     DEGREES OF RELATIONSHIP BETWEEN TWO SETS OF SCORES
     +1.00 -- PERFECT POSITIVE RELATIONSHIP (the better): more from the upper group got the test correctly.
      0.00 -- NO RELATIONSHIP
     -1.00 -- PERFECT NEGATIVE RELATIONSHIP: more from the lower group got the test correctly.
     DISCRIMINANT VALIDITY -- DIFFERENT TRAITS OR CONSTRUCTS: SCORES ON THE CRITICAL THINKING TEST ARE CORRELATED WITH THOSE ON ATTITUDES TOWARDS MOVIES.
  4. METHODS OF ESTIMATING TEST RELIABILITY
     (obtained score = true score + error of measurement)
     TEST-RETEST METHOD -- determines how consistent scores are over a given period of time. The same test is administered twice to the same group with an interval of 2 to 15 days (a sufficient time interval). With only 2-3 days, students can recall their answers; a longer time interval lowers the reliability.
     PARALLEL/ALTERNATE FORMS METHOD -- uses two different versions of the same test, administered to the same group close together in time. Form A and Form B can be given on the same day or the next day. The two forms differ only in how the items are worded or written; they should measure the same skills, and errors are significantly controlled.
     TEST-RETEST WITH ALTERNATE FORMS METHOD -- administers the two versions of the same test on two different occasions. The time interval may be short (2 weeks) or longer (6 months). It takes into account all possible sources of error and is the most useful method because it indicates the variation of a test score over a period of time.
     INTERNAL CONSISTENCY METHOD -- requires only one administration of the test to one group of individuals.
     DIFFERENT METHODS
     1. SPLIT-HALF / ODD-EVEN METHOD -- the odd items and the even items are scored separately, and the two sets of scores are correlated using the PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT (PEARSON r).
     2. SPEARMAN-BROWN PROPHECY FORMULA -- used to estimate the reliability of the WHOLE TEST from the correlation between the two halves.
     3. KUDER-RICHARDSON FORMULA 20 -- estimates internal consistency from the item statistics of a single administration, without splitting the test into halves.
  5. If the reliability coefficient is high, the test is said to be homogeneous: the consistency of the test scores is determined over different parts of the entire test.
     RELIABILITY ESTIMATE                 WHAT TO MEASURE
     Test-retest                          test administration, test takers
     Alternate forms                      test administration, test itself
     Test-retest with alternate forms     test administration, test itself, test takers
     Internal consistency                 test administration, test itself
     NOTE: a reliability coefficient of +.86 means that 86/100 of the variance in the obtained scores is true-score variance and 14/100 can be attributed to errors of measurement.
  6. IMPROVING THE TEST ITEMS: Item Analysis
     Index of difficulty -- who answers the item correctly.
     Index of discrimination -- the extent to which a test item differentiates good performers from poor performers.
  7. METHOD TO EMPLOY IN ITEM ANALYSIS: THE UPPER AND LOWER INDEX METHOD (27/100)
     1. After scoring the test, arrange the papers from lowest to highest.
     2. Segregate the top and bottom 27/100 of the papers.
     3. Tally the correct answers to each item by the students in the upper 27/100 group.
     4. Repeat step 3 for the lower 27/100 group.
     5. Get the percentage of the upper group that answered the item correctly (U/NU).
     6. Repeat step 5 for the lower group (L/NL).
     7. Get the average of the upper and lower percentages (the index of difficulty).
     8. Get the difference between the upper and lower percentages (the index of discrimination).
     U, L = number of pupils in the upper and lower group who got the item correct
     NU, NL = number of pupils in the upper and lower group
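The eight steps above can be sketched as one small function. The function name, the rounding of the 27/100 group size, and the ten sample papers are assumptions made for illustration; how ties at the group boundary are broken is left to the sort order.

```python
def item_analysis(responses, fraction=0.27):
    """responses: one list of 0/1 item scores per student.
    Returns a (difficulty, discrimination) pair for every item."""
    # Step 1: order the papers by total score (highest first here)
    ranked = sorted(responses, key=sum, reverse=True)
    # Step 2: size of the upper and lower groups (at least one paper each)
    group_size = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]

    results = []
    for item in range(len(ranked[0])):
        # Steps 3-6: proportion answering the item correctly in each group
        p_upper = sum(paper[item] for paper in upper) / group_size
        p_lower = sum(paper[item] for paper in lower) / group_size
        difficulty = (p_upper + p_lower) / 2     # step 7
        discrimination = p_upper - p_lower       # step 8
        results.append((difficulty, discrimination))
    return results

# Hypothetical papers: ten students, three items
papers = [
    [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0],
    [1, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 0],
]
for number, (p, d) in enumerate(item_analysis(papers), start=1):
    print(f"item {number}: difficulty = {p:.2f}, discrimination = {d:.2f}")
```

With ten papers the upper and lower groups hold three papers each, so every index is a multiple of one third or one sixth.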
  8. TABLE INTERPRETING DIFFICULTY INDEX
     Range          Description
     0.00 - 0.20    Very difficult
     0.21 - 0.40    Difficult
     0.41 - 0.60    Moderately difficult
     0.61 - 0.80    Easy
     0.81 - 1.00    Very easy
     The higher the difficulty index, the easier the item is.
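The difficulty table above can be read as a simple lookup. A minimal sketch, assuming the index is rounded to two decimal places before it is compared with the band limits; the function name is hypothetical:

```python
def describe_difficulty(index):
    """Map a difficulty index (proportion correct, 0.00 to 1.00) to the table's label."""
    index = round(index, 2)  # the table is stated to two decimal places
    if not 0.0 <= index <= 1.0:
        raise ValueError("difficulty index must lie between 0 and 1")
    if index <= 0.20:
        return "Very difficult"
    if index <= 0.40:
        return "Difficult"
    if index <= 0.60:
        return "Moderately difficult"
    if index <= 0.80:
        return "Easy"
    return "Very easy"

print(describe_difficulty(0.50))
```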
  9. TABLE INTERPRETING INDEX OF DISCRIMINATION
     Range             Description
     -1.00 - -0.61     Questionable item
     -0.60 - -0.20     Not discriminating
     -0.19 -  0.20     Moderately discriminating
      0.21 -  0.60     Discriminating
      0.61 -  1.00     Very discriminating
     A good test item separates the bright performers from the poor ones.
     The higher the index of discrimination, the more discriminating the item.
     Formula: Ds = (U/NU) - (L/NL)
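The formula and the table above combine naturally into two small helpers. This is a sketch under assumptions: the function names are hypothetical, the index is rounded to two decimals before the lookup, and a value of exactly -0.60 is treated as "Not discriminating".

```python
def discrimination_index(u, nu, l, nl):
    """Ds = U/NU - L/NL: proportion correct in the upper group minus the lower group."""
    return u / nu - l / nl

def describe_discrimination(ds):
    """Map a discrimination index (-1.00 to +1.00) to the table's label."""
    ds = round(ds, 2)  # the table is stated to two decimal places
    if not -1.0 <= ds <= 1.0:
        raise ValueError("discrimination index must lie between -1 and +1")
    if ds <= -0.61:
        return "Questionable item"
    if ds <= -0.20:
        return "Not discriminating"
    if ds <= 0.20:
        return "Moderately discriminating"
    if ds <= 0.60:
        return "Discriminating"
    return "Very discriminating"

# Hypothetical item: 9 of 10 upper-group pupils and 2 of 10 lower-group pupils correct
ds = discrimination_index(9, 10, 2, 10)
print(round(ds, 2), describe_discrimination(ds))
```

Rounding first avoids a borderline value such as 0.60 being pushed into the next band by floating-point arithmetic.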
  10. WHEN WOULD YOU SAY AN ITEM IS "GOOD OR RETAINED"?
      GOOD OR RETAINED -- the item must have an ACCEPTABLE INDEX OF DIFFICULTY AND DISCRIMINATION.
        - The ACCEPTABLE INDEX OF DIFFICULTY RANGES FROM 0.41 TO 0.60.
        - The ACCEPTABLE INDEX OF DISCRIMINATION RANGES FROM +0.20 TO +1.00.
      FAIR OR REVISED -- EITHER THE DIFFICULTY OR THE DISCRIMINATION INDEX IS UNACCEPTABLE.
      POOR OR DISCARDED -- BOTH THE DIFFICULTY AND THE DISCRIMINATION INDEX ARE UNACCEPTABLE. THE ITEM THEN NEEDS TO BE DISCARDED RIGHT AWAY.
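The retain, revise or discard rule above is a two-condition check. A minimal sketch with a hypothetical function name, using the acceptable ranges exactly as stated on the slide:

```python
def item_decision(difficulty, discrimination):
    """Apply the retain / revise / discard rule to one item."""
    difficulty_ok = 0.41 <= difficulty <= 0.60
    discrimination_ok = 0.20 <= discrimination <= 1.00
    if difficulty_ok and discrimination_ok:
        return "retain"     # both indices acceptable
    if difficulty_ok or discrimination_ok:
        return "revise"     # exactly one index acceptable
    return "discard"        # neither index acceptable

print(item_decision(0.50, 0.45))
```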
  11. TABLE OF ACTION TO BE TAKEN
      DIFFICULTY LEVEL       DISCRIMINATING LEVEL                        ACTION
      VERY DIFFICULT         QUESTIONABLE ITEM / VERY DISCRIMINATING     DISCARD
      DIFFICULT              NOT DISCRIMINATING                          DISCARD
                             MODERATELY DISCRIMINATING                   REVISE
                             DISCRIMINATING                              RETAIN
      MODERATE DIFFICULT     NOT DISCRIMINATING                          REVISE
                             MODERATELY DISCRIMINATING                   MAY NEED REVISION
                             DISCRIMINATING                              ACCEPT
      EASY                   NOT DISCRIMINATING                          DISCARD
                             MODERATELY DISCRIMINATING                   N.R.
                             DISCRIMINATING                              N.R.
      VERY EASY              QUESTIONABLE (SEE EXAMPLE)                  DISCARD
  12. TRADITIONAL ASSESSMENT
      Discrete Point (Single Attribute Assessment) -- example: language assessment in the form of multiple choice, matching type, true or false, or short answer.
      Charles Spearman (1904), Two-Factor Theory -- postulates a general factor (g-factor) and specific factors (s-factors). Examples of tests of the g-factor are Raven's Progressive Matrices and Cattell's Culture Fair Intelligence Test.
      Integrative or Global Assessment (Multiple Trait Assessment) -- measures more than one point or objective at a time, and is often pragmatic. An example is writing a composition.
      Cloze Test -- an innovative testing method in which words are deleted from a passage. The most common practice is to delete every 5th word. The acceptable range for readability of certain reading materials is between 30 and 50 percent.
      C-Test -- the second half of every second word is deleted, leaving the first and last sentences intact; the passage commonly contains 100 words.
      Dictation Test -- primarily a test of listening and spelling. It is used to measure the ability to use capital letters and punctuation marks, spell words correctly, and write legibly and neatly.
      ADMINISTERING A DICTATION TEST -- Read each word once or twice as the students listen, then ask the students to write the word. Read the word again for confirmation. Read each sentence slowly once or twice, then once at normal speed, before the students are asked to write. Do not read while the students are writing.
      Oral Interview -- a kind of integrative assessment. It collects information through a face-to-face exchange between the interviewee and the interviewer. The interviewer is not at liberty to modify a question or ask a follow-up question. The questions should be prepared beforehand, and the objectives should be taken into consideration.
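The cloze procedure described above (delete every 5th word) is easy to sketch. The function name and blank marker are hypothetical, and splitting on whitespace is a simplifying assumption: punctuation stays attached to the word it follows.

```python
def make_cloze(passage, n=5, blank="_____"):
    """Replace every n-th word of a passage with a blank.
    Returns the gapped passage and the list of deleted words (the answer key)."""
    words = passage.split()
    answers = []
    for index in range(n - 1, len(words), n):   # 5th, 10th, 15th, ... word
        answers.append(words[index])
        words[index] = blank
    return " ".join(words), answers

text, key = make_cloze("one two three four five six seven eight nine ten eleven")
print(text)
print(key)
```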
  13. MEASURES OF CENTRAL TENDENCY
      Raw scores -- the scores obtained.
      Tabulating raw scores: the steps in constructing a grouped frequency distribution are as follows.
      1. Determine the range of scores. The range is equal to the highest score minus the lowest score.
      2. Determine the appropriate number of class intervals (ideally 10-15). Be sure that the lowest limit is divisible by the interval. The number of class intervals is defined by k = 1 + 3.3 log n, where n is the sample size and n = N / (1 + Ne^2), with N the population size and e the margin of error.
      3. Compute the class size: i = range / k.
      4. Determine the lowest limit (LL) of the first interval: divide the lowest score (LS) by the interval width i to get the whole-number quotient Q, then LL = Q x i.
      5. Construct the frequency column (f) by tallying the number of scores opposite each interval.
      Ranking -- another way to organize test scores. It is the process of arranging a group of scores from highest to lowest. The highest score is designated as the first rank, and so on.
      Steps in ranking the scores:
      1. Arrange the scores from highest to lowest. A particular score is written as many times as it occurs.
      2. Put a serial number opposite each score: 1, 2, 3, 4, ...
      3. Average the ranks of each score that appears more than once. Example: 45, 45, 45 appears three times and occupies ranks 7, 8 and 9; add them (24) and divide by 3, so all three are ranked 8.
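Both procedures above can be sketched briefly. The function names and sample scores are hypothetical, and two rounding choices are assumptions the slide does not specify: k is rounded to the nearest whole number and the class size i is rounded up. The logarithm is taken to base 10.

```python
from math import ceil, log10

def class_layout(scores):
    """Return (range, number of classes k, class size i, lowest limit LL)."""
    score_range = max(scores) - min(scores)        # step 1
    k = round(1 + 3.3 * log10(len(scores)))        # step 2: k = 1 + 3.3 log n
    i = ceil(score_range / k)                      # step 3: class size, rounded up
    lowest_limit = (min(scores) // i) * i          # step 4: LL = Q x i
    return score_range, k, i, lowest_limit

def average_ranks(scores):
    """Rank scores from highest to lowest, averaging the ranks of tied scores."""
    ordered = sorted(scores, reverse=True)
    positions = {}
    for serial, score in enumerate(ordered, start=1):
        positions.setdefault(score, []).append(serial)
    return {score: sum(p) / len(p) for score, p in positions.items()}

# Hypothetical scores: forty students scoring 23 through 62
print(class_layout(list(range(23, 63))))
# Hypothetical scores in which 45 occupies ranks 7, 8 and 9
print(average_ranks([60, 58, 55, 52, 50, 48, 45, 45, 45, 40])[45])
```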
  14. GRAPHING OF DATA: 1. Histogram 2. Polygon 3. Bar
  15. MEASURES OF CENTRAL TENDENCY: MEAN, MEDIAN, MODE
      The MEAN is simply the average of the group and is the most widely accepted measure of central tendency.
      For ungrouped data: mean = Σx / N
        Σx = summation of x (the scores)
        N = total number of scores in the distribution
      For grouped data (using deviations from an assumed mean): mean = AM + Σfd / N
        AM = assumed mean
        d = deviation of each class midpoint from the assumed mean
        Σfd = summation of frequency times deviation
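A short sketch of the grouped-data mean, assuming the assumed-mean form mean = AM + Σfd / N with d measured from each class midpoint. The function name and the frequency table are hypothetical.

```python
def grouped_mean(classes, assumed_mean):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    # d = class midpoint minus the assumed mean; sum_fd = summation of f times d
    sum_fd = sum(f * ((lo + hi) / 2 - assumed_mean) for lo, hi, f in classes)
    return assumed_mean + sum_fd / n

# Hypothetical grouped frequency distribution of 25 scores
classes = [(21, 27, 3), (28, 34, 5), (35, 41, 8), (42, 48, 6), (49, 55, 2), (56, 62, 1)]
print(round(grouped_mean(classes, assumed_mean=38), 2))
```

Any class midpoint can serve as the assumed mean; the result is the same, and choosing one near the center only keeps the arithmetic small.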
  16. The MEDIAN is the middlemost score in the distribution. It divides the distribution in half: 50% of the scores are found above the median, and the other 50% lie below it.
      For ungrouped data:
      1. Arrange the scores from highest to lowest or vice versa.
      2. If the number of scores is odd, the median is the middlemost number in the distribution.
      3. If the number of scores is even, average the two middle scores.
      For grouped data: median = LL + ((N/2 - cf) / f) x i
        LL = lowest limit of the class containing N/2
        N = number of cases
        cf = cumulative frequency below that class
        f = frequency of the class where the measure lies
        i = interval
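A minimal sketch of the grouped-data median, assuming the form median = LL + ((N/2 - cf) / f) x i. One assumption goes beyond the slide: for integer scores the lowest limit is taken as the exact lower boundary of the class (the stated limit minus 0.5). The function name and table are hypothetical.

```python
def grouped_median(classes):
    """classes: list of (lower_limit, upper_limit, frequency), lowest class first."""
    n = sum(f for _, _, f in classes)
    cumulative = 0                        # cf: frequencies below the current class
    for lower, upper, f in classes:
        if cumulative + f >= n / 2:       # this class contains the N/2-th score
            width = upper - lower + 1     # i, for integer class limits
            lower_boundary = lower - 0.5  # assumed exact lower limit
            return lower_boundary + (n / 2 - cumulative) / f * width
        cumulative += f

# Hypothetical grouped frequency distribution of 25 scores
classes = [(21, 27, 3), (28, 34, 5), (35, 41, 8), (42, 48, 6), (49, 55, 2), (56, 62, 1)]
print(grouped_median(classes))
```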
  17. The MODE is the most frequent, or most repeated, number. It is not affected by extremes: it does not change if one number is replaced by a much smaller or much larger one.
      For ungrouped data: the mode is the number that occurs most often.
      For grouped data: Mode = 3(median) - 2(mean)
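Both forms of the mode can be sketched in a few lines. The function names and sample values are hypothetical; the grouped-data version simply applies the relation Mode = 3(median) - 2(mean) given above, which is an approximation rather than an exact count.

```python
from collections import Counter

def modes(scores):
    """Return every score that occurs most often (there may be more than one)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, count in counts.items() if count == top)

def empirical_mode(mean, median):
    """Estimate the mode from the mean and median: 3(median) - 2(mean)."""
    return 3 * median - 2 * mean

print(modes([3, 5, 5, 7, 9]))     # the most repeated score
print(modes([3, 5, 5, 7, 999]))   # an extreme score does not change the mode
print(empirical_mode(mean=38.56, median=38.4375))
```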
  18. The measures of central tendency in different distributions
      1. NORMAL DISTRIBUTION
      2. POSITIVELY SKEWED DISTRIBUTION
      3. NEGATIVELY SKEWED DISTRIBUTION
  19. Normal distribution: a symmetrical distribution in which MEAN = MEDIAN = MODE
  20. Positively skewed distribution
      1. THERE ARE MORE LOW SCORES THAN HIGH SCORES.
      2. IT SHOWS THAT THE TEST IS VERY DIFFICULT.
      IT FORMS AN ASYMMETRICAL DISTRIBUTION: MEAN > MEDIAN > MODE
      The graph shows that the number of students who got good grades is relatively lower than the number who got lower grades.
  21. Negatively skewed distribution
      1. THERE ARE MORE HIGH SCORES THAN LOW SCORES.
      2. IT SHOWS THAT THE TEST IS VERY EASY, SO EVEN THE LOW-PERFORMING STUDENTS GOT GOOD GRADES.
      3. IT IS THE INVERSE OF THE POSITIVELY SKEWED DISTRIBUTION.
      IT FORMS AN ASYMMETRICAL DISTRIBUTION: MODE > MEDIAN > MEAN
      The graph shows that the number of students who got high grades is relatively greater than the number who got lower grades.
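The orderings of mean, median and mode given for the three distributions can be checked on a set of scores. A rough sketch with a hypothetical function name and invented scores; it only compares the three measures and does not test the shape of the distribution in any stricter sense.

```python
from statistics import mean, median, mode

def skew_direction(scores):
    """Classify a score distribution by the order of its mean, median and mode."""
    m, md, mo = mean(scores), median(scores), mode(scores)
    if m > md > mo:
        return "positively skewed"   # many low scores, a few high ones
    if m < md < mo:
        return "negatively skewed"   # many high scores, a few low ones
    if m == md == mo:
        return "symmetrical"         # the pattern of a normal distribution
    return "unclear"

print(skew_direction([1, 2, 2, 2, 3, 3, 4, 6, 9]))   # hypothetical difficult test
print(skew_direction([1, 4, 6, 7, 7, 8, 8, 8, 9]))   # hypothetical easy test
```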
  22. Forms of Assessment
      1. TRADITIONAL ASSESSMENT -- EXAMPLES: MULTIPLE CHOICE, MATCHING TYPE, TRUE OR FALSE, COMPLETION TEST
      2. PERFORMANCE ASSESSMENT -- ENGAGES STUDENTS IN A COMPLEX TASK OR THE CREATION OF A PRODUCT. EXAMPLES: DANCE STEP, DEMONSTRATION
      3. PORTFOLIO ASSESSMENT -- ONGOING EVALUATION; INVOLVES GATHERING OR COLLECTING MANY DIFFERENT INDICATORS OF STUDENT PROGRESS
      4. AUTHENTIC ASSESSMENT -- REAL-LIFE CRITERIA, USE OF JUDGMENTS
  23. THANK YOU!
