Ca Baseline and Post test assessment report 2007 12 oct07
 


A quasi experimental evaluation design study comparing the impact of using the Continuous Assessment strategy in intervention and control schools in Zambia


Ministry of Education

DRAFT

Technical Report of the Pre- and Post-Pilot Testing for the Continuous Assessment Programme in Lusaka, Southern and Western Provinces

Coordinated by the Examinations Council of Zambia, Research and Test Development Department

Under the Direction of the Continuous Assessment Steering and Technical Committees

Ministry of Education, Lusaka, Zambia
October 2007
Table of Contents

ACKNOWLEDGMENTS
CHAPTER ONE: BACKGROUND
  1.1 Introduction to Continuous Assessment
  1.2 Definition of Continuous Assessment
  1.3 Challenges in the Implementation of Continuous Assessment
  1.4 Guidelines for Implementation of Continuous Assessment
  1.5 Plan for Implementation of Continuous Assessment
CHAPTER TWO: EVALUATION METHODOLOGY
  2.1 Objectives
  2.2 Design
  2.3 Sample
  2.4 Instruments
  2.5 Administration
  2.6 Data Capture and Scoring
  2.7 Data Analysis
CHAPTER THREE: ASSESSMENT RESULTS
  3.1 Psychometric Characteristics
  3.2 Classical Test Theory
  3.3 Item Response Theory
  3.4 Scaled Scores
  3.5 Vertical Scaled Scores
  3.6 Comparison between Pilot and Comparison Groups
  3.7 Comparison across Regions
  3.8 Performance Categories
CHAPTER FOUR: SUMMARY AND CONCLUSIONS
APPENDIX 1: ITEM STATISTICS BY SUBJECT
APPENDIX 2: SCORES AND FREQUENCIES - GRADE 5 PRE-TESTS
APPENDIX 3: SCORES AND FREQUENCIES - GRADE 5 POST-TESTS
APPENDIX 4: HISTOGRAMS BY SUBJECT AND GROUP
ACKNOWLEDGMENTS

The Continuous Assessment Joint Steering and Technical Committees and the Examinations Council of Zambia wish to express profound gratitude for the professional and material support provided by the Provincial Education Offices, District Education Boards, Educational Zone staff in the different districts, school administrators, teachers and pupils. Without this support, the baseline and post-pilot assessment exercises would not have succeeded.

We also thank the management of the Directorate for Curriculum and Assessment in the Ministry of Education for providing professional support to the Continuous Assessment programme in general and the assessment exercises in particular. We wish to specifically thank the Director for Standards and Curriculum, the Director for the Examinations Council of Zambia, and the Chief Curriculum Specialist for allowing their personnel to take part in the assessment exercise.

Finally, we wish to express our appreciation to USAID and the EQUIP2 Project for providing the finances and technical support for the Continuous Assessment programme in Zambia.

All of the participants and stakeholders listed above have not only played a crucial role in developing and implementing the Continuous Assessment programme, but have also supported the quantitative evaluation of the programme presented in this technical paper. It is because of their interest in improving student learning outcomes that the Continuous Assessment programme has had the necessary financial, administrative and technical support. Our hope is that the programme will prove to be valuable for all of the pupils and teachers in Zambian schools.
Chapter One: Background

1.1 Introduction to Continuous Assessment

Over the years in Zambia, the education system has not been able to provide enough spaces for all learners to proceed from Grade 7 to Grade 8, from Grade 9 to Grade 10, and from Grade 12 to higher learning institutions. The system has used examinations to select those who proceed to the next level and to certify candidates; however, this has been done without formal consideration of school-based assessment as a component of the final examinations, with the exception of some practical subjects.

The 1977 Educational Reforms explicitly provided for the use of Continuous Assessment (CA). Later, national policy documents, particularly Educating Our Future (1996) and the Ministry of Education’s Strategic Plan 2003-2007, stated the need for integrating school-based continuous assessment into the education system, including the development of strategies to combine CA results with the final examination results for purposes of pupil certification and selection.

Furthermore, the national education policy, as stated in Educating Our Future, stipulated that the Ministry of Education will develop procedures that will enable teachers to standardise their assessment methods and tasks for use as an integral part of school-based CA. The education policy document also stated that the Directorate of Standards, in cooperation with the Examinations Council of Zambia (ECZ), will determine how school-based CA can be better conducted so that it can contribute to the final examination results for pupil certification and promotion to the subsequent levels. The policy also stated that the Directorate of Standards, with input from the ECZ, will determine when school-based CA can be introduced.

In order to set in motion the implementation of school-based CA, the ECZ convened a preparatory workshop from 16th to 22nd November 2003 in Kafue. Ninety (90) participants from various stakeholder institutions took part. The objectives of the preparatory workshop were to:
• Recommend a plan for developing and implementing CA;
• Recommend a training plan for preparing teachers in implementing CA;
• Explore ways of ensuring transparency, reliability, validity and comparability in using CA results;
• Agree on common assessment tasks and learning outcomes to be identified in the syllabuses for CA;
• Discuss the development of a teacher’s manual on CA; and
• Discuss the nature of summary forms for recording marks that should be provided to schools.
1.2 Definition of Continuous Assessment

Continuous assessment is defined as an on-going, diagnostic, classroom-based process that uses a variety of assessment tools to measure learner performance. CA is a formative evaluation tool conducted during the teaching and learning process with the aim of influencing and informing the overall instructional process. It is the assessment of the whole learner on an ongoing basis over a period of time, where cumulative judgments of the learner’s abilities in specific areas are made in order to facilitate further positive learning (Le Grange & Reddy, 1998). 1

The data generated from CA should be useful in assisting teachers to plan for the learning of individual pupils. It should also assist teachers in identifying the unique understanding of each learner in a classroom by informing the pupil of the level of instructional attainment, helping to target opportunities that promote learning, and reducing anxiety and other problems associated with examinations. CA has been shown to have positive impacts on student learning outcomes in hundreds of educational settings (Black & Wiliam, 1998). 2

CA is made up of a variety of assessment methods that can be formal or informal. It takes place during the learning process when it is most necessary, making use of criterion referencing rather than norm referencing and providing feedback on how learners are changing.

1.3 Challenges in the Implementation of Continuous Assessment

There are several areas in which the implementation of CA in the classroom will present challenges. Some of these are listed below.
• Large class sizes in most primary schools are a major problem. It is common to find classes of 60 and above in Zambian classrooms. Teachers are expected to mark and keep records of the progress of all of these learners.
• CA can take a lot of teachers’ time. As a result, teachers become concerned that the time spent on remediation and enrichment is excessive, and many teachers do not believe that they would finish the syllabus with CA.
• CA will not be successfully implemented if there are inadequate teaching resources or equipment in schools. Teachers need materials and equipment such as stationery, computers and photocopiers (and electricity).
• There may be cases of resistance from school administrators and teachers if they feel left out of the process of developing the CA programme.
• CA requires the cooperation of communities and parents. If they do not understand what is expected of them, they may resist and hence affect the success of the programme.

1 Le Grange, L.L. & Reddy, C. (1998). Continuous Assessment: An Introduction and Guidelines to Implementation. Cape Town, South Africa: Juta.
2 Black, P. & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5(1), 7-74.
1.4 Guidelines for Implementation of Continuous Assessment

A teachers’ guide on the implementation of continuous assessment at the basic school level was developed with the involvement of curriculum specialists, Standards officers, Examinations specialists, Provincial Education Officials, District Education Officials, Zonal in-service training providers, school administrators and teachers.

The Teachers’ Guide on CA comprises the following:
• Sample record forms;
• Description of the CA schemes;
• Instructions for preparing and administering assessment materials;
• Marking and moderation of the CA marks;
• Recording and reporting assessment results; and
• Monitoring of the implementation of the CA.

The Teachers’ Guide also specifies the roles of stakeholders as follows:

Teachers
• Plan assessment tasks, projects and mark schedules;
• Teach, guide and supervise pupils in implementing given tasks;
• Conduct the assessment in line with given guidelines;
• Mark and record the results;
• Provide correction and remedial work to the pupils;
• Inform the head teacher and parents on the performance of the child;
• Advise and counsel the pupils on their performance in class tasks;
• Take part in internal moderation of pupils’ results.

School Administrators
• Provide an enabling environment, such as the procurement of teaching and learning materials;
• Act as links between the school and other stakeholders like ECZ, traditional leaders, politicians and parents;
• Ensure validity, reliability and comparability through moderation of CA;
• Compile CA results and hand them to ECZ.

Parents
• Provide professional, moral, financial and material support to pupils;
• Continuously monitor their children’s attendance and performance;
• Take part in making and enforcing school rules;
• Attend open days and witness the giving of prizes (rewards) to outstanding pupils in terms of performance.

Standards Officers
• Interpret Government of Zambia policy on education;
• Monitor education policy implementation at various levels of the education system;
• Advise and evaluate the extent to which the education objectives have been achieved;
• Ensure that acceptable assessment practices are conducted;
• Monitor the overall standards of education.

Guidance Teachers/School Counsellors
• Prepare and store record cards for CA;
• Counsel pupils, teachers and parents/guardians on CA and feedback;
• Take care of the pupils’ psycho-social needs;
• Make referrals for pupils to access other specialized assistance/support.

Heads of Department/Senior Teachers/Section Heads
• Monitor and advise teachers in the planning, setting, conducting, marking and recording of CA results;
• Ensure validity, reliability and dependability of CA by conducting internal moderation of results;
• Hold departmental meetings to analyze the assessment;
• Provide or make available the teaching and learning materials;
• Compile a final record of CA results and hand them over to Guidance Teachers for onward submission to the ECZ.

District Resource Centre Coordinators
• Ensure adequate in-service training for teachers in planning, conducting, marking, moderating and recording results at school level in the district;
• Monitor the conduct of CA in the schools and district;
• Professionally guide teachers to ensure provision of quality education at school level.

Provincial Resource Centre Coordinators
• Ensure adequate in-service training for teachers for them to be effective in planning, conducting, marking, moderating and recording CA results;
• Monitor the conduct of CA in the province;
• Professionally guide teachers to ensure provision of quality education at provincial level.

Examinations Specialist
• Analyse and moderate CA results;
• Integrate CA results with terminal examination results;
• Determine grade boundaries;
• Certify the candidates;
• Disseminate the results of candidates.

Monitors
As monitors of the CA programme, various officials and stakeholders will look out for the following documents and information:
• Progress chart;
• Record of CA results and analysis;
• Marked evidence of pupils’ CA work on remedial activities;
• Evaluation of performance by gender;
• Pupil’s Record Cards;
• CA plans or schedules and schemes;
• Evidence of pupils’ work;
• CA administration;
• Evidence of remedial work;
• Availability of planned remedial work in the classroom;
• Availability of the teacher’s guide;
• Sample CA tasks;
• Evidence of a variety of CA tasks;
• Teacher’s record of pupils’ performance.

1.5 Plan for Implementation of Continuous Assessment

CA in Zambia is planned to roll out over a period of several years. This will allow for proper stakeholder support and evaluation. The following list provides a brief timeline of important CA activities through 2008:
• Creation of CA Steering and Technical Committees (2005);
• Development of assessment schemes, teacher’s guides, model assessment task booklets and recordkeeping forms (2005);
• Design of quantitative evaluation methodology with focus on student learning outcomes (2005);
• Implementation of CA pilot in Phase 1 schools: Lusaka, Southern and Western regions (2006);
• Baseline report on student learning outcomes (2006);
• Implementation of CA pilot in Phase 2 schools: Central, Copperbelt and Eastern Regions (2007);
• Expansion of modified CA pilot to community schools (2007);
• Post-test report on student learning outcomes (2007);
• Implementation of CA pilot in Phase 3 schools: Luapula, Northern and Northwestern Regions (2008);
• Discussion of scaling up of CA pilot and systems-level planning for combining Grade 7 end-of-cycle summative test scores with CA scores for selection and certification purposes (2008).
Chapter Two: Evaluation Methodology

2.1 Objectives

The main objective of the quantitative evaluation is to determine whether the CA programme has had positive effects on student learning outcomes. The evaluation allows for a determination of whether pupils’ academic performance has changed as a result of the CA intervention, as well as the extent of the change in performance.

2.2 Design

The evaluation design is quasi-experimental, with a pre-test and post-tests administered to intervention (pilot) and control (comparison) groups. It features a pre-test at the beginning of Grade 5 and post-tests at the end of Grades 5, 6, and 7. The pilot and comparison groups will be compared at each time point in 6 subject areas to see if there are differences in test scores from the baseline to the post-tests by group (see Figures 1 and 2 below). 3

Figure 1: Pre-Test and Post-Test, Pilot and Control Group Design

Grade 5 Pre-test   Grade 5 Post-test   Grade 6 Post-test   Grade 7 Post-test
Pilot Group        Pilot Group         Pilot Group         Pilot Group
Control Group      Control Group       Control Group       Control Group

Figure 2: Expected Results from the Evaluation
[Line chart: scaled score (vertical axis, 200 to 650) by assessment (G5 Pre-test, G5 Post-test, G6 Post-test, G7 Post-test), with one line each for the Pilot and Control groups.]

3 For more information, refer to the Summary of the Continuous Assessment Program August 2007 by the Examinations Council of Zambia and the EQUIP2-Zambia project.
With the matched pairs random assignment design, it was expected that the two groups, pilot and control, would have similar mean scores on the pre-test. However, with a successful intervention, it was expected that the pilot group would score higher than the control group on the subsequent post-tests.

2.3 Sample

The sample included all the 2006 (pre-test) and 2007 (post-test) Grade 5 basic school pupils in Lusaka, Southern and Western Provinces in the 24 pilot (intervention) and 24 comparison (control) schools. The schools were chosen using matched pairs, with geographic location, school size, and grade levels as matching variables, followed by random assignment to pilot and comparison status. CA activities were implemented in pilot schools but not in the comparison schools.

2.4 Instruments

Student achievement for the Grade 5 baseline and post-pilot administrations was measured using multiple-choice tests with 30 items (30 points per test). The test development process included the following steps:
• Review of the curriculum for each subject area;
• Development of test specifications;
• Development of items;
• Piloting of items;
• Data reviews of item statistics;
• Forms pulling (selecting items for final test papers).

The test instruments were developed by teams of Curriculum Specialists, Standards Officers, Examination Specialists and Teachers. The baseline tests (pre-tests) were developed based on the Grade 4 syllabus and the post-pilot tests (post-tests) were developed based on the Grade 5 syllabus.

2.5 Administration

The ECZ organized the administration of both pre-test and post-test papers. Teams comprising an Examination Specialist, a Standards Officer and a Curriculum Specialist were sent to each region to supervise the administration. District Education officials, School Administrators and Teachers were involved in the actual administration of the tests.

All of the Grade 5 pupils in the pilot and comparison schools sat for six tests, one in each of the six subject areas (English, Mathematics, Social and Development Studies, Integrated Science, Creative and Technology Studies and Community Studies). The baseline tests (Grade 4 syllabus) were administered to the students at the beginning of Grade 5, in February 2006. The post-pilot tests (Grade 5 syllabus) were administered in February 2007.

Note that there will be two more administrations of post-tests for the cohort of students in the three provinces. These will take place in February 2008 (Grade 6 syllabus) and November 2008 (Grade 7 syllabus). This process will be repeated in Phase 2 and Phase 3 schools (see Table 1 below).

Table 1: Implementation Plan for CA Pilot

Phase                                       2006     2007     2008     2009     2010
Phase 1 (Lusaka, Southern, Western)         Grade 5  Grade 6  Grade 7  -        -
Phase 2 (Central, Copperbelt, Eastern)      -        Grade 5  Grade 6  Grade 7  -
Phase 3 (Luapula, Northern, Northwestern)   -        -        Grade 5  Grade 6  Grade 7

2.6 Data Capture and Scoring

Data were captured using Optical Mark Readers (OMR) and scored using the Faim software at the ECZ. Through this process, item scores for all students were converted into electronic format and data files were produced for analysis.

2.7 Data Analysis

Data were analysed using the Statistical Package for Social Sciences (SPSS). Scores and frequencies by subject were generated. Analysed data were presented in tabular, chart and graphical forms. Additional analyses were conducted using WINSTEPS (item response theory Rasch modelling) software. SPSS was used for scaling the pupils’ scores.
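The matched-pairs random assignment described in Section 2.3 can be illustrated with a minimal Python sketch. It assumes the schools have already been paired on the matching variables; the function name, the school labels and the seed are hypothetical and are not taken from the evaluation itself.

```python
import random


def assign_matched_pairs(pairs, seed=None):
    """Randomly assign one school in each matched pair to 'pilot'
    and the other to 'comparison'.

    pairs: list of (school_a, school_b) tuples that were already
    matched on location, size and grade levels (assumed input).
    """
    rng = random.Random(seed)
    assignment = {}
    for school_a, school_b in pairs:
        # Shuffle the two schools; the first becomes the pilot school.
        pilot, comparison = rng.sample([school_a, school_b], 2)
        assignment[pilot] = "pilot"
        assignment[comparison] = "comparison"
    return assignment


# Hypothetical school labels, for illustration only.
example_pairs = [
    ("School A1", "School A2"),
    ("School B1", "School B2"),
    ("School C1", "School C2"),
]
example_assignment = assign_matched_pairs(example_pairs, seed=2006)
for school, status in sorted(example_assignment.items()):
    print(school, status)
```

Because assignment happens within each pair, the pilot and comparison groups always have the same number of schools and are balanced on the matching variables by construction.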
Chapter Three: Assessment Results

3.1 Psychometric Characteristics

An initial step in determining the results from the assessments was to conduct analyses to determine the psychometric characteristics of the assessments. Both the Standards for Educational and Psychological Testing (1999) 4 and the Code of Fair Testing Practices in Education (2004) 5 include standards for identifying quality items. Items should assess only knowledge or skills that are identified as part of the domain being tested and should avoid assessing irrelevant factors (e.g., ambiguity and grammatical errors, sensitive content or language, etc.).

Both quantitative and qualitative analyses were conducted to ensure that items on both Grade 5 baseline and post-pilot tests met satisfactory psychometric guidelines. The statistical evaluations of the items are presented in two parts, using classical test theory (CTT) and item response theory (IRT), which is sometimes called modern test theory. 6 The two measurement models generally provide similar results, but IRT is particularly useful for test scaling and equating.

CTT analyses included (1) difficulty index (p-value), (2) discrimination index (item-test correlations), and (3) test reliability (Cronbach’s alpha as an estimate of internal consistency reliability). IRT analyses included (1) calibration of items, and (2) examination of the item difficulty index (i.e., b-parameter).

3.2 Classical Test Theory

Difficulty Indices (p)

All multiple-choice items were evaluated in terms of item difficulty according to standard classical test theory practices. Difficulty was defined as the average proportion of points achieved on an item by the students. It was calculated by obtaining the average score on an item and dividing by the maximum possible score for the item. Multiple-choice items were scored dichotomously (1 point vs. no points, or correct vs. incorrect), so the difficulty index was simply the proportion of students who correctly answered the item. All items on Grade 5 pre-tests and post-tests had four response options. Table 2 shows the average p-values for each test. Note that the average p-value for a test may also be calculated by taking the average raw score of all students divided by the maximum points (30) per test.

4 American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
5 Joint Committee on Testing Practices (2004). Code of Fair Testing Practices in Education. Washington, DC: American Psychological Association.
6 For more information, see Crocker, L. and Algina, J. (1986). Introduction to Classical and Modern Test Theory. New York: Harcourt Brace.
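The difficulty index defined above can be sketched in a few lines of Python. This is an illustration only, not the SPSS procedure used for the report; the function names and the small response matrix (four students, four items) are made up for the example.

```python
def item_difficulty(responses):
    """Return the p-value (proportion correct) of each item.

    responses: list of per-student lists of dichotomous item
    scores (1 = correct, 0 = incorrect).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


def mean_test_difficulty(responses):
    """Average raw score divided by the maximum possible score."""
    n_items = len(responses[0])
    mean_raw = sum(sum(student) for student in responses) / len(responses)
    return mean_raw / n_items


# Hypothetical data: 4 students x 4 items.
responses = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
]
p_values = item_difficulty(responses)
print(p_values)                         # [0.75, 0.75, 0.25, 0.25]
print(sum(p_values) / len(p_values))    # 0.5
print(mean_test_difficulty(responses))  # 0.5
```

The last two lines show the equivalence noted in the text: the mean of the item p-values equals the mean raw score divided by the maximum score.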
Table 2: Overall Test Difficulty Estimates by Subject Area

                                    Grade 5 Pre-test        Grade 5 Post-test
Subject Area                        # Items  Mean p-value   # Items  Mean p-value
English                             30       0.40           30       0.37
Social and Developmental Studies    30       0.34           30       0.42
Mathematics                         30       0.41           30       0.40
Integrated Science                  30       0.33           30       0.36
Creative and Technology Studies     30       0.35           30       0.36
Community Studies                   30       0.32           30       0.37

Items that are answered correctly by almost all students provide little information about differences in student ability, but they do indicate knowledge or skills that have been mastered by most students. Similarly, items that are correctly answered by very few students may indicate knowledge or skills that have not yet been mastered by most students, but such items provide little information about differences in student ability. In general, to provide best measurement, difficulty indices should range from near-chance performance of about 0.20 (for four-option, multiple-choice items) to 0.90. The item difficulty indices for both Grade 5 pre-tests and post-tests were within generally acceptable and expected ranges (see Appendix 1 for a complete list of p-values for all items on each test).

Item Discrimination (Item-Test or Point-Biserial Correlations)

One desirable feature of an item is that the higher performing students do better on the item than lower performing students. The correlation between student performance on a single item and total test score is a commonly used measure of this characteristic of an item. Within classical test theory, the item-test (or point-biserial) correlation is referred to as the item’s discrimination because it indicates the extent to which successful performance on an item discriminates between high and low scores on the test. The theoretical range of these statistics is -1 to +1, with a typical range from 0.2 to 0.6. Discrimination indices can be thought of as measures of how closely an item assesses the same knowledge and skills assessed by other items contributing to the total score. Discrimination indices for Grade 5 are presented in Table 3.

Table 3: Overall Test Discrimination Estimates by Subject Area

                                    Grade 5 Pre-test        Grade 5 Post-test
Subject Area                        # Items  Mean Pt-bis    # Items  Mean Pt-bis
English                             30       0.46           30       0.48
Social and Developmental Studies    30       0.38           30       0.45
Mathematics                         30       0.37           30       0.41
Integrated Science                  30       0.35           30       0.43
Creative and Technology Studies     30       0.38           30       0.44
Community Studies                   30       0.29           30       0.43
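As an illustration of the item-test correlation described above, the sketch below computes the Pearson correlation between each dichotomous item score and the total test score, which is the point-biserial correlation. The function names and the sample data are hypothetical, and the total score here includes the item itself, which is an assumption about how the report's statistics were computed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Undefined when every student has the same score.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def item_test_correlations(responses):
    """Point-biserial correlation of each item with the total score."""
    totals = [sum(student) for student in responses]
    n_items = len(responses[0])
    return [
        pearson([student[i] for student in responses], totals)
        for i in range(n_items)
    ]


# Hypothetical data: 5 students x 4 items.
sample = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
for item_number, r in enumerate(item_test_correlations(sample), start=1):
    print(f"Item {item_number}: r = {r:.2f}")
```

An item with a near-zero or negative value would be flagged for review, since it does not separate higher-scoring from lower-scoring students.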
On average, the discrimination indices were within acceptable and expected ranges (i.e., 0.20 to 0.60). The positive discrimination indices indicate that students who performed well on individual items tended to perform well overall on the test. There were no items on the instruments that had near-zero discrimination indices (see Appendix 1 for a complete list of the point-biserial correlations for all items on each pre-test and post-test per subject area).

Test Reliabilities

Although an individual item’s statistical properties are an important focus, a complete evaluation of an assessment must also address the way items function together and complement one another. There are a number of ways to estimate an assessment’s reliability. One possible approach is to give the same test to the same students at two different points in time. If students receive the same scores on each test, then the extraneous factors affecting performance are small and the test is reliable. (This is referred to as test-retest reliability.) A potential problem with this approach is that students may remember items from the first administration or may have gained (or lost) knowledge or skills in the interim between the two administrations.

A solution to the ‘remembering items’ problem is to give a different, but parallel test at the second administration. If the student scores on each test correlate highly, the test is considered reliable. (This is known as alternate forms reliability, because an alternate form of the test is used in each administration.) This approach, however, does not address the problem that students may have gained (or lost) knowledge or skills in the interim between the two administrations. In addition, the practical challenges of developing and administering parallel forms generally preclude the use of parallel forms reliability indices.

One way to address these problems is to split the test in half and then correlate students’ scores on the two half-tests; this in effect treats each half-test as a complete test. By doing this, the problems associated with an intervening time interval, and of creating and administering two parallel forms of the test, are alleviated. This is known as a split-half estimate of reliability. If the two half-test scores correlate highly, items on the two half-tests must be measuring very similar knowledge or skills. This is evidence that the items complement one another and function well as a group. This also suggests that measurement error will be minimal.

The split-half method requires a judgment regarding the selection of which items contribute to which half-test score. This decision may have an impact on the resulting correlation; different splits will give different estimates of reliability. Cronbach (1951) 7 provided a statistic, α (alpha), that avoids this concern about the split-half method. Cronbach’s α gives an estimate of the average of all possible splits for a given test. Cronbach’s α is often referred to as a measure of internal consistency because it provides a measure of how well all the items in the test measure one single underlying ability. Cronbach’s α is computed using the following formula:

7 Cronbach, L. J. (1951). Coefficient Alpha and the Internal Structure of Tests. Psychometrika, 16, 297-334.
$$\alpha = \frac{n}{n-1}\left[1 - \frac{\sum_{i=1}^{n} \sigma^2(Y_i)}{\sigma_x^2}\right]$$

where
i : item,
n : total number of items,
σ²(Y_i) : individual item variance, and
σ_x² : total test variance.

For standardized tests, reliability estimates should be approximately 0.80 or higher. According to Table 4, the reliabilities for the pre-tests ranged from 0.63 (Community Studies) to 0.87 (English). The reliability estimate for Community Studies was low due to the absence of a national curriculum for use in test construction. In contrast, the reliability estimates for the post-tests ranged from 0.83 (Mathematics) to 0.89 (English). It is likely that the post-tests had higher reliability estimates because the test developers had more experience than they had when they developed the baseline tests.

Table 4: Test Reliability Estimates by Subject Area

                                    Grade 5 Pre-test              Grade 5 Post-test
Subject Area                        # Items  Coefficient Alpha    # Items  Coefficient Alpha
English                             30       0.87                 30       0.89
Social and Developmental Studies    30       0.80                 30       0.87
Mathematics                         30       0.79                 30       0.83
Integrated Science                  30       0.76                 30       0.85
Creative and Technology Studies     30       0.80                 30       0.86
Community Studies                   30       0.63                 30       0.85

3.3 Item Response Theory

Item Response Theory (IRT) uses mathematical models to define a relationship between an unobserved measure of student ability, usually referred to as theta (θ), and the probability (p) of getting a dichotomous item correct. In IRT, it is assumed that all items are independent measures of the same construct or ability (i.e., the same θ). The process of determining the specific mathematical relationship between θ and p is referred to as item calibration. Once items are calibrated, they are defined by a set of parameters which specify a non-linear relationship between θ and p.⁸

⁸ For more information about item calibration, see the following references: Lord, F. M. and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Boston, MA: Addison-Wesley; Hambleton, R. K. and Swaminathan, H. (1984). Item Response Theory: Principles and Applications. New York: Springer.
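Returning briefly to the internal-consistency estimate of Section 3.2, the coefficient alpha formula can be sketched in a few lines of Python. This is a minimal illustration and not the software used for the report: the function name and the tiny response matrix are hypothetical, and population variances are assumed for both the item and total-score terms.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Coefficient alpha for a list of per-student item-score lists.

    Assumption: population variances are used for both the item
    variances and the total-score variance.
    """
    n_items = len(responses[0])
    # Variance of each item across students: sigma^2(Y_i)
    item_variances = [
        pvariance([student[i] for student in responses])
        for i in range(n_items)
    ]
    # Variance of the total test score across students: sigma_x^2
    total_variance = pvariance([sum(student) for student in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses (4 students x 3 dichotomous items), for illustration only.
HYPOTHETICAL_RESPONSES = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(HYPOTHETICAL_RESPONSES)
```

With real data, each row would hold one student's 30 item scores for a given subject test.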
For the CA programme, a 1-parameter or Rasch model was implemented. The Rasch model defines the probability of a correct response to item i by a student with ability level θ as:

$$P_i(\theta) = \frac{\exp[D(\theta - b_i)]}{1 + \exp[D(\theta - b_i)]}$$

where i = item, b_i = difficulty of item i, and D = a normalizing constant equal to 1.701.

In IRT, item difficulty (b_i) and student ability (θ) are measured on a scale of −∞ to +∞. A scale of −3.0 to +3.0 is used operationally in educational assessment programmes, with −3.0 being low student ability or an easy item and +3.0 being high student ability or a difficult item. The b_i parameter for an item is the position on the ability scale where the probability of a correct response is 0.50.

The WINSTEPS program was used for the IRT analyses. The item parameter files resulting from the analyses are provided in Appendices 2 and 3. This presentation is direct output from WINSTEPS.⁹ Raw scores were then scaled using the item response theory model, with a range of 100-500 (see Appendices 2 and 3 for the raw score to scale score conversion tables for each subject area).

3.4 Scaled Scores

The Grade 5 pre-test and post-test scores in each subject area are reported on a scale that ranges from 100 to 500. Students' raw scores, or total number of points, on the pre-tests and post-tests are translated to scaled scores using a data analysis process called scaling. Scaling simply converts raw points from one scale to another. In the same way that distance can be expressed in miles or kilometres, or monetary value can be expressed in U.S. dollars or Zambian Kwacha, student scores on both the pre-tests and post-tests can be expressed as raw scores (i.e., number of points) or scaled scores.

Cut points were established on the raw score scale for both the pre-tests and post-tests (see Section 3.8, "Performance Categories", for an explanation of how these cut points were determined). Once the raw score cut points were determined via standard setting, the next step was to compute theta cuts using the test characteristic curve (TCC) mapping procedure and then calculate the transformation coefficients used to place students' raw scores onto the theta scale and then onto the scaled score used for reporting. As previously stated, student scores on the assessments are reported in integer values from 100 to 500, with two scores representing cut scores on each assessment. The two cut points (Unsatisfactory/Satisfactory and Satisfactory/Advanced) were pre-set at 250 and 350, respectively.

⁹ See the WINSTEPS user's manual for additional details regarding this output (at http://www.winsteps.com).
Figure 3: Scaled Score Conversion Procedure

1. Raw score cut scores (from standard setting).
2. Conversion of raw score cuts into theta cuts θ1 and θ2 using TCC mapping.
3. Calculation of scaled score constants (b and m) using the theta cuts (θ1, θ2) and the scaled score cuts (250 and 350).
4. Calculation of the scaled score using m(θ) + b.

The scaled scores are obtained by a simple linear transformation of the theta score, using the values of 250 and 350 on the scaled score metric and the associated theta cut points to define the transformation. The scaling coefficients were calculated using the following formulae:

$$m = \frac{350 - 250}{\theta_2 - \theta_1}$$

$$b = 250 - m\,\theta_1 = 350 - m\,\theta_2$$

where m is the slope of the line relating the theta and scaled scores, b is the intercept, θ1 is the cut score on the theta metric for the Unsatisfactory/Satisfactory cut (i.e., corresponding to the raw score cut for Unsatisfactory/Satisfactory), and θ2 is the cut score on the theta metric for the Satisfactory/Advanced cut (i.e., corresponding to the raw score cut for Satisfactory/Advanced). Scaled scores were then calculated using the following linear transformation (see Figure 3):

$$\text{Scaled Score} = m\,\theta + b$$

where θ represents a student's theta (or ability) score. The values obtained using this formula were rounded to the nearest integer and then truncated such that no student received a score below 100 or above 500. Table 5 presents the mean raw score for each subject area on the pre-tests and post-tests.

It is important to note that converting from raw scores to scaled scores does not change the students' performance-level classifications. For the Zambia CA programme, a score of 250 is the cut score between Unsatisfactory and Satisfactory and a score of 350 is the cut score between Satisfactory and Advanced.
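The linear transformation described above can be illustrated with a short Python sketch. The theta cuts used here (−0.29 and 1.12) are read from the English pre-test conversion table in Appendix 2 (Table A7), where raw scores of 13 and 22 correspond to scaled scores of 250 and 350. The function names are illustrative, and round-half-up is an assumption, since the report states only that values were rounded to the nearest integer.

```python
import math

SCALE_MIN, SCALE_MAX = 100, 500   # reporting range
CUT_LOW, CUT_HIGH = 250, 350      # pre-set scaled score cuts

def scaling_constants(theta1, theta2):
    """Slope m and intercept b from the two theta cut points."""
    m = (CUT_HIGH - CUT_LOW) / (theta2 - theta1)
    b = CUT_LOW - m * theta1
    return m, b

def scaled_score(theta, m, b):
    """Apply m*theta + b, round to the nearest integer, truncate to 100-500."""
    value = m * theta + b
    rounded = math.floor(value + 0.5)  # assumption: halves round up
    return max(SCALE_MIN, min(SCALE_MAX, rounded))

# Theta cuts for the English pre-test, taken from Table A7.
m, b = scaling_constants(-0.29, 1.12)
```

Applying `scaled_score` to other theta values in Table A7 gives a quick consistency check on the published conversion table; for example, a theta of 0.01 maps to 271 and a theta of −3.59 is truncated to 100.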
This is true regardless of which subject area, grade, or year one may be concerned with.

Scaled scores supplement the pre-test and post-test results by providing information about the position of a student's results within a performance level. For instance, if the range for a performance level is 200 to 250, a student with a scaled score of 245 is near the top of the performance level, and close to the next higher performance level.

School level scaled scores are calculated by computing the average of student-level scaled scores. Table 5 provides the raw score averages for each of the subject areas, while Table 6 provides the same information in scaled scores.

Table 5: Grade 5 Mean Raw Scores by Subject Area

                                             Grade 5 Pre-test           Grade 5 Post-test
Subject Area                      # Items    N      Mean   Std. Dev.    N      Mean   Std. Dev.
English                           30         3798   12.2   6.5          4025   11.7   7.1
Social and Developmental Studies  30         3962   10.1   5.3          4104   13.2   6.6
Mathematics                       30         3883   12.3   5.3          4127   12.4   5.8
Integrated Science                30         4039   9.9    4.9          4135   11.1   6.3
Creative and Technology Studies   30         4032   10.5   5.3          4097   11.7   6.2
Community Studies                 30         4037   9.5    4.0          4141   11.2   6.4

According to Table 5, overall mean raw scores (with both pilot and comparison groups taken together) across the subject areas on the pre-test ranged from 9.5 (Community Studies) to 12.3 (Mathematics) out of a possible 30 points. In contrast, the overall mean raw scores for the post-tests ranged from 11.1 (Integrated Science and Creative and Technology Studies) to 13.2 (Social and Developmental Studies). From Table 6, the scaled score averages for the Grade 5 pre-tests ranged from 214 (Community Studies) to 239 (English) on the 100-500 scale. In contrast, the scaled score averages for the post-tests ranged from 233 (English) to 262 (Mathematics).

Table 6: Grade 5 Mean Scaled Scores by Subject Area

                                             Grade 5 Pre-test           Grade 5 Post-test
Subject Area                      # Items    N      Mean    Std. Dev.   N      Mean    Std. Dev.
English                           30         3798   238.8   83.7        4025   233.4   88.1
Social and Developmental Studies  30         3962   230.5   86.2        4104   241.2   83.9
Mathematics                       30         3883   222.4   89.2        4127   261.9   72.6
Integrated Science                30         4039   226.5   80.2        4135   245.7   73.7
Creative and Technology Studies   30         4032   224.1   85.3        4097   244.3   83.0
Community Studies                 30         4037   214.0   83.7        4141   236.9   72.3

As stated earlier, the scaled score is a simple linear transformation of the theta score, using the values of 250 and 350 on the scaled score metric. A student's relative position on the raw score metric does not change as a result of this scale transformation.

Note that the primary interest of this evaluation is not whether the raw scores and/or scaled scores increase or decrease from pre-test to post-test. These differences will occur mainly through variations in test difficulty. The main analysis will compare the relative changes in the two groups, i.e., pilot and comparison, across the two time points, i.e., pre-test to post-test. At a later point, post-tests will also be conducted when the cohort of students is in Grade 6 and Grade 7, followed by extended analyses for the two additional time points.

3.5 Vertical Scaled Scores

In vertical scaling, tests that vary in difficulty level, but that are intended to measure similar constructs, are placed on the same scale. Placing different tests on the same scale can be implemented in a number of ways, such as linking items across the tests or social moderation. For the CA programme, a social moderation (Linn, 1993)¹⁰ procedure was employed for vertical scaling. In social moderation, assessments are developed in reference to a common content framework. Performance of individual students, and schools, is measured against a single set of common standards. For Zambia, an analysis of the Grade 4 and Grade 5 curricula showed that the content was vertically aligned, i.e., students were expected to progress in their learning along the same constructs from one grade level to the next. This allowed the test developers to link the pre-tests and post-tests through common performance standards. The vertical scaling scheme for the CA programme is shown below.

Figure 4: Vertical Scaling Scheme (cut scores on the vertical scale)

                      Unsatisfactory/Satisfactory   Satisfactory/Advanced
Grade 5 Pre-test      250                           350
Grade 5 Post-test     350                           450
Grade 6 Post-test     450                           550
Grade 7 Post-test     550                           650

In other words, students who were classified as Advanced on the Grade 5 pre-test (i.e., end of Grade 4 syllabus) would also be considered Satisfactory on the Grade 5 post-test (i.e., end of Grade 5 syllabus), students who were classified as Advanced on the Grade 5 post-test would be considered Satisfactory on the Grade 6 post-test (end of Grade 6), and so on through Grade 7. On the vertical scaled score metric, students who earned a grade level scaled score of 250 on the Grade 5 post-test would also earn a vertical scaled score of 350 (because 350 is the equivalent grade level scaled score on the Grade 5 pre-test). Therefore, grade level scaled scores and vertical scaled scores differ by a constant value of 100 points. The mean vertical scaled scores for each subject are shown in Table 7.

¹⁰ Linn, R. L. (1993). Linking results of distinct assessments. Applied Measurement in Education, 6(1), 83-102.

Table 7: Grade 5 Mean Vertical Scaled Scores by Subject Area

                                             Grade 5 Pre-test           Grade 5 Post-test
Subject Area                      # Items    N      Mean    Std. Dev.   N      Mean    Std. Dev.
English                           30         3798   238.8   83.7        4025   333.4   88.1
Social and Developmental Studies  30         3962   230.5   86.2        4104   341.2   83.9
Mathematics                       30         3883   222.4   89.2        4127   361.9   72.6
Integrated Science                30         4039   226.5   80.2        4135   345.6   73.7
Creative and Technology Studies   30         4032   224.1   85.3        4097   344.4   83.0
Community Studies                 30         4037   214.0   83.7        4141   336.9   72.3

Figure 5 shows the mean vertical scaled scores on the pre-tests and post-tests across the subject areas. Vertical scaled scores for the pre-test are the same as the grade level scaled scores. As expected, vertical scaled scores for the Grade 5 post-test are higher than the Grade 5 pre-test scaled scores.

Figure 5: Vertical Scaled Mean Scores by Subject Area [bar chart: vertical scaled score (0 to 400) for the pre-test and post-test, by subject: Eng., SDS, Math., ISC, CTS, CS]

3.6 Comparison between Pilot and Comparison Groups

The comparisons between pilot and comparison groups were made in raw scores and vertical scaled scores. Raw scores on the pre-tests and post-tests are not on the same scale, because the tests vary in difficulty; the raw score comparison was nevertheless made for simplicity. The comparison is more relevant, valid, and beneficial when made on the vertical scaled scores, since vertical scaled scores for the pre-tests and post-tests are on the same scale.
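The relationship between grade level scaled scores and vertical scaled scores in Section 3.5 amounts to adding a constant offset per test administration. A minimal Python sketch follows; the dictionary keys and function name are illustrative, and the Grade 6 and Grade 7 offsets are inferred from the cut scores shown in Figure 4 rather than stated directly in the text.

```python
# Offsets implied by Figure 4: each later administration shifts the
# grade level cuts (250 and 350) up by a further 100 points.
VERTICAL_OFFSETS = {
    "grade5_pre": 0,
    "grade5_post": 100,
    "grade6_post": 200,  # inferred from Figure 4 (cuts 450 and 550)
    "grade7_post": 300,  # inferred from Figure 4 (cuts 550 and 650)
}

def vertical_scaled_score(grade_level_score, administration):
    """Place a grade level scaled score on the common vertical scale."""
    return grade_level_score + VERTICAL_OFFSETS[administration]

# A grade level score of 250 on the Grade 5 post-test sits at 350 on the vertical scale.
example = vertical_scaled_score(250, "grade5_post")
```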
    • Raw ScoresTable 8 shows that the raw score mean differences between the pilot andcomparison schools on the Grade 5 pre-tests were small for each subjectarea. The mean differences, analyzed using t-tests, were statisticallysignificant only in English and Mathematics, with the pupils in comparisongroup performing better than those in the pilot group (p<.05). In the other foursubjects, the t-tests showed no significant differences between the two groupson the baseline. In raw scores, differences in English and Mathematics wereabout a half-point, while the differences for the other subjects had a maximumdifference of two-tenths of a point. These results reflected the expectation ofvery small differences on the pre-tests, since the schools were randomlyassigned to one of the two groups based on a matched pairs design.Table 8: Mean Raw Scores by Subject Area and Group Grade 5 Pre-test Grade 5 Post-test Subject Area Group † N Mean Std. Dev. N Mean Std. Dev. Pilot 1785 11.9 6.4 1773 13.3* 1.6English Comparison 2013 12.4* 6.6 1967 12.2 1.6 Total 3798 12.2 6.5 3740 12.8 1.6Social and Pilot 1907 10.0 5.2 1895 14.9* 1.3Developmental Comparison 2055 10.2 5.5 2008 13.7 1.3Studies Total 3962 10.1 5.3 3903 14.3 1.3 Pilot 1861 12.0 5.3 1849 13.8* 1.4Mathematics Comparison 2022 12.6* 5.3 1975 13.2 1.4 Total 3883 12.3 5.3 3824 13.5 1.4 Pilot 1961 9.8 4.9 1949 13.2* 1.9Integrated Science Comparison 2078 9.9 4.9 2031 11.2 1.8 Total 4039 9.9 4.9 3980 12.2 1.9 Pilot 1967 10.5 5.2 1955 12.9* 1.5Creative and Comparison 2065 10.6 5.4 2018 11.7 1.5Technology Studies Total 4032 10.5 5.3 3973 12.3 1.5 Pilot 1979 9.5 4.0 1967 13.4* 1.6Community Studies Comparison 2058 9.5 3.9 2011 12.5 1.6 Total 4037 9.5 4.0 3978 13.0 1.6* Significant at p<0.05; † represents adjusted weighted sample size.The differences between the two groups for all subject areas in Grade 5 post-test (also in Table 8),were evaluated using an Analysis of Covariance(ANCOVA), with the pre-test scores as the covariates. 
In other words, the pre-test scores were made statistically equivalent so that the groups could be evaluated on an equal basis on the post-tests. Using the raw scores, the results were statistically significant in each of the subject areas, with the pilot group outperforming the comparison group (p<.05).

Note that all statistical comparisons were made at the school level, not at the student level. This was due to changes in student population at each school from pre-test to post-test. The design was based on cohorts (student groups over time) and not on panels (the same students over time). A panel design would have been statistically possible, but it would also have led to skewed results due to student attrition.

Vertical Scaled Scores

As stated, vertical scaled scores on the pre-tests and post-tests were computed independently for both the pilot and comparison groups and were measured on the same scale (i.e., the vertical scale). This makes the comparison more relevant and valid for assessing the impact of CA in the pilot schools relative to the comparison schools.

Table 9: Mean Vertical Scaled Scores by Subject Area and Group

                                               Grade 5 Pre-tests           Grade 5 Post-tests
Subject Area                      Group        † N    Mean     Std. Dev.   N      Mean     Std. Dev.
English                           Pilot        1785   236.1    82.4        1773   352.3*   20.3
                                  Comparison   2013   241.2*   84.8        1967   339.9    20.3
                                  Total        3798   238.8    83.7        3740   346.1    20.3
Social and Developmental Studies  Pilot        1907   229.1    84.3        1895   362.4*   17.7
                                  Comparison   2055   231.8    87.9        2008   346.2    17.7
                                  Total        3962   230.5    86.2        3903   354.3    17.7
Mathematics                       Pilot        1861   217.8    89.3        1849   380.5*   17.1
                                  Comparison   2022   226.7*   88.9        1975   373.1    17.1
                                  Total        3883   222.4    89.2        3824   376.8    17.1
Integrated Science                Pilot        1961   225.5    80.1        1949   369.5*   20.4
                                  Comparison   2078   227.4    80.4        2031   348.0    20.4
                                  Total        4039   226.5    80.2        3980   358.8    20.4
Creative and Technology Studies   Pilot        1967   223.0    84.0        1955   357.1*   16.0
                                  Comparison   2065   225.1    86.5        2018   343.5    16.0
                                  Total        4032   224.1    85.3        3973   350.3    16.0
Community Studies                 Pilot        1979   213.7    84.3        1967   365.8*   22.1
                                  Comparison   2058   214.2    83.1        2011   352.8    22.1
                                  Total        4037   214.0    83.7        3978   359.3    22.1

* Significant at p<0.05

Table 9 shows that the vertical scaled score mean differences between the pilot and comparison schools on the Grade 5 pre-tests were small for each subject area. The mean differences in all six subject areas, analyzed using t-tests, were not statistically significant (p>.05).
In contrast, when the differences between the two groups for all subject areas on the Grade 5 post-test (also in Table 9) were evaluated using an ANCOVA (with the pre-test scores as the covariates), the results were statistically significant in all subject areas, with the pilot group outperforming the comparison group (p<.05).

Figures 6 through 11 show the differences in vertical scaled scores from the Grade 5 pre-test to the Grade 5 post-test for each of the subject areas. The graphs show clearly the greater score increases by the pilot group in all subject areas except Mathematics, where the increase was not as evident as in the other subjects, though the pilot group started off lower.
Figure 6: English Mean Vertical Scores by Group
Figure 7: Social & Dev. Studies Mean Vertical Scores by Group
Figure 8: Mathematics Mean Vertical Scores by Group
Figure 9: Integrated Science Mean Vertical Scores by Group
Figure 10: Creative & Tech. Studies Mean Vertical Scores by Group
Figure 11: Community Studies Mean Vertical Scores by Group

[Each figure is a line chart of the mean vertical scaled score (axis from 200 to 400) for the pilot and comparison groups at the Grade 5 pre-test and the Grade 5 post-test.]
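The covariance adjustment behind the post-test comparisons in Section 3.6 can be illustrated with a small Python sketch. It computes only the classical one-covariate adjusted group means using the pooled within-group slope; it does not carry out the significance test, and it is not the procedure or software used for the report. The school-level values are hypothetical and chosen purely for illustration.

```python
from statistics import mean

def ancova_adjusted_means(groups):
    """Adjusted post-test means for a one-covariate ANCOVA.

    groups maps a group name to a list of (pre, post) school means.
    Each group's post-test mean is adjusted to the grand pre-test mean
    using the pooled within-group regression slope.
    """
    grand_pre = mean(pre for pairs in groups.values() for pre, _ in pairs)
    sxy = 0.0  # pooled within-group cross-products
    sxx = 0.0  # pooled within-group sum of squares of the covariate
    group_means = {}
    for name, pairs in groups.items():
        pre_bar = mean(pre for pre, _ in pairs)
        post_bar = mean(post for _, post in pairs)
        sxy += sum((pre - pre_bar) * (post - post_bar) for pre, post in pairs)
        sxx += sum((pre - pre_bar) ** 2 for pre, _ in pairs)
        group_means[name] = (pre_bar, post_bar)
    slope = sxy / sxx
    return {
        name: post_bar - slope * (pre_bar - grand_pre)
        for name, (pre_bar, post_bar) in group_means.items()
    }

# Hypothetical school means (pre, post); NOT data from the report.
HYPOTHETICAL_SCHOOLS = {
    "pilot": [(210.0, 330.0), (220.0, 340.0), (230.0, 350.0)],
    "comparison": [(215.0, 325.0), (225.0, 335.0), (235.0, 345.0)],
}
adjusted = ancova_adjusted_means(HYPOTHETICAL_SCHOOLS)
adjusted_difference = adjusted["pilot"] - adjusted["comparison"]
```

In this made-up example the unadjusted post-test difference is 5 points, but the pilot group started 5 points lower, so the adjusted difference is 10 points.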
    • 3.7 Comparison across Regions While not the focus of the evaluation, the next two sections have useful information on student performance. Tables 10 and 11 contain a brief analysis of the scores by region, providing information on the scores on a disaggregated basis. As with the overall analyses, the comparisons across the three regions were made in raw scores and vertical scaled scores. Lusaka Region consistently had the highest mean scores (both raw scores and vertical scaled scores) in all subjects on the Grade 5 pre-tests, followed by Western and Southern. The same pattern of results was also observed for Grade 5 post-tests. Table 10: Subject Area Mean Raw Scores by Region Grade 5 Pre-test Grade 5 Post-test Subject Area Region N Mean Std. Dev. N Mean Std Dev. Southern 1010 11.0 6.2 1157 10.4 6.6 Western 994 11.7 5.9 1103 11.9 6.7 English Lusaka 1794 13.1 6.9 1765 12.4 7.5 Total 3798 12.2 6.5 4025 11.7 7.1 Southern 1014 9.4 4.8 1214 11.7 6.0 Social and Western 1112 9.9 4.9 1125 13.2 6.1 Developmental Studies Lusaka 1836 10.7 5.8 1765 14.1 7.0 Total 3962 10.1 5.3 4104 13.2 6.6 Southern 1002 11.5 5.4 1226 11.1 5.2 Western 1086 12.2 5.2 1120 12.7 5.3 Mathematics Lusaka 1795 12.9 5.2 1781 13.0 6.3 Total 3883 12.3 5.3 4127 12.4 5.8 Southern 1025 9.2 4.4 1212 9.6 5.4 Integrated Western 1151 9.4 4.6 1154 11.7 6.4 Science Lusaka 1863 10.6 5.3 1769 11.8 6.7 Total 4039 9.9 4.9 4135 11.1 6.3 Southern 1016 9.6 4.8 1205 9.9 5.6 Creative and Western 1140 10.2 5.0 1146 11.3 6.0 Technology Studies Lusaka 1876 11.2 5.7 1790 11.9 6.9 Total 4032 10.5 5.3 4141 11.2 6.4 Southern 1015 9.0 3.5 1191 10.5 5.3 Community Western 1146 9.4 4.3 1122 11.5 6.0 Studies Lusaka 1876 9.8 4.0 1784 12.7 6.8 Total 4037 9.5 4.0 4097 11.7 6.2 24
Table 11: Subject Area Mean Vertical Scaled Scores by Region

                                             Grade 5 Pre-test            Grade 5 Post-test
Subject Area                      Region     N      Mean    Std. Dev.    N      Mean    Std. Dev.
English                           Southern   1010   224.1   80.3         1157   317.3   82.8
                                  Western    994    232.3   72.9         1103   335.0   81.0
                                  Lusaka     1794   250.7   89.3         1765   343.0   94.1
                                  Total      3798   238.8   83.7         4025   333.4   88.1
Social and Developmental Studies  Southern   1014   218.5   77.4         1214   321.7   76.7
                                  Western    1112   226.4   79.1         1125   341.1   78.1
                                  Lusaka     1836   239.6   93.6         1765   354.7   89.5
                                  Total      3962   230.5   86.2         4104   341.2   84.0
Mathematics                       Southern   1002   209.2   91.0         1226   346.6   66.1
                                  Western    1086   219.9   86.2         1120   366.6   65.5
                                  Lusaka     1795   231.3   89.0         1781   369.5   79.3
                                  Total      3883   222.4   89.2         4127   361.9   72.6
Integrated Science                Southern   1025   215.7   72.1         1212   328.9   63.5
                                  Western    1151   218.1   76.1         1154   353.0   74.2
                                  Lusaka     1863   237.5   85.5         1769   352.4   78.0
                                  Total      4039   226.5   80.2         4135   345.7   73.7
Creative and Technology Studies   Southern   1016   209.8   77.9         1191   327.6   70.7
                                  Western    1140   218.9   79.7         1122   340.7   79.5
                                  Lusaka     1876   234.9   90.8         1784   357.7   90.3
                                  Total      4032   224.1   85.3         4097   344.3   83.0
Community Studies                 Southern   1015   204.2   74.8         1205   323.4   64.3
                                  Western    1146   213.1   88.6         1146   338.7   66.8
                                  Lusaka     1876   219.8   84.6         1790   344.9   79.1
                                  Total      4037   214.0   83.7         4141   336.9   72.3

3.8 Performance Categories

Depending on test difficulty and score distributions, performance categories were established for each of the tests using a procedure called standard setting. An Angoff (1971)¹¹ standard setting method was implemented to set the cut scores between Unsatisfactory and Satisfactory and between Satisfactory and Advanced for both the pre-tests and post-tests. The resultant cut scores are presented in Tables 12 and 13. In English, for example, students who got a score of 1-12 would be classified as Unsatisfactory, students who got a score of 13-21 would be classified as Satisfactory, and students who earned a score of 22-30 would be classified as Advanced on the pre-test.
For Mathematics, the corresponding ranges are 1-13 Unsatisfactory, 14-19 Satisfactory, and 20-30 Advanced for the pre-test. The post-test ranges for each subject area are different from those on the pre-tests; the reason is that the pre-tests and post-tests covered different content and had different levels of difficulty.

¹¹ Angoff, W. H. (1971). Scales, Norms, and Equivalent Scores. In R. L. Thorndike (Ed.), Educational Measurement (2nd ed., pp. 508-560). Washington, DC: American Council on Education.
Table 12: Performance Categories for Pre-tests by Subject (Grade 5 Pre-test)

                                  1 Unsatisfactory   2 Satisfactory   3 Advanced
Subject Area                      (Fail)             (Pass)           (Pass)
English                           1-12               13-21            22-30
Social and Developmental Studies  1-10               11-17            18-30
Mathematics                       1-13               14-19            20-30
Integrated Science                1-10               11-17            18-30
Creative and Technology Studies   1-11               12-18            19-30
Community Studies                 1-10               11-15            16-30

Table 13: Performance Categories for Post-tests by Subject (Grade 5 Post-test)

                                  1 Unsatisfactory   2 Satisfactory   3 Advanced
Subject Area                      (Fail)             (Pass)           (Pass)
English                           1-12               13-21            22-30
Social and Developmental Studies  1-13               14-21            22-30
Mathematics                       1-10               11-19            20-30
Integrated Science                1-10               11-20            21-30
Creative and Technology Studies   1-11               12-21            22-30
Community Studies                 1-11               12-19            20-30

Tables 14 and 15 provide the percentages of students classified in the three performance categories by subject. On the pre-test, the percentages in each category by group were similar for most of the subjects. For instance, in Integrated Science, similar percentages of students were Satisfactory or above (i.e., passing) in the pilot (34%) and comparison (33%) groups. However, on the post-test, there were some differences between the groups, mostly in favour of the pilot group. In Integrated Science, 53% of students in the pilot group were Satisfactory or above vs. 43% of students in the comparison group. The percentages passing favoured the pilot group on the post-test in every subject except Mathematics, where the rounded percentage passing was the same in the pilot (65%) and comparison (65%) groups.
    • Table 14: Percentages of Students in Performance Categories for Pre-tests Grade 5 Pre-test Subject Area Group 1 2 3 Unsatisfactory Satisfactory Advanced (Fail) (Pass) (Pass) Pilot 63.0 27.2 9.8 English Comparison 59.7 28.2 12.1 Social and Pilot 62.8 26.9 10.3 Developmental Studies Comparison 64.4 24.0 11.6 Pilot 64.3 26.2 9.5 Mathematics Comparison 60.1 29.4 10.5 Integrated Pilot 65.9 25.6 8.5 Science Comparison 67.3 22.9 9.8 Creative and Pilot 67.5 22.9 9.6 Technology Studies Comparison 68.4 20.1 11.5 Community Pilot 66.8 25.4 7.8 Studies Comparison 66.8 24.8 8.4Table 15: Percentages of Students in Performance Categories for Post-tests Grade 5 Post-test Subject Area Group 1 2 3 Unsatisfactory Satisfactory Advanced (Fail) (Pass) (Pass) Pilot 60.0 26.5 13.5 English Comparison 64.0 24.0 11.9 Social and Pilot 51.4 33.4 15.3 Developmental Studies Comparison 59.3 30.6 10.2 Pilot 35.2 53.9 10.9 Mathematics Comparison 34.8 56.3 8.9 Integrated Pilot 46.7 40.2 13.1 Science Comparison 57.3 36.0 6.7 Creative and Pilot 54.5 35.1 10.4 Technology Studies Comparison 62.3 31.0 6.7 Community Pilot 50.4 33.9 15.6 Studies Comparison 54.4 36.2 9.5 27
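The classification rule of Section 3.8 and the "percentage passing" figures quoted in the text can be sketched in Python as follows. The cut scores are the pre-test values for English and Mathematics from Table 12, and the percentages are the Integrated Science post-test values from Table 15; the function and variable names are illustrative.

```python
# Minimum raw scores for Satisfactory and Advanced (pre-test, Table 12).
PRE_TEST_CUTS = {
    "English": (13, 22),
    "Mathematics": (14, 20),
}

def classify(raw_score, satisfactory_min, advanced_min):
    """Assign a performance category from a raw score and two cut scores."""
    if raw_score >= advanced_min:
        return "Advanced"
    if raw_score >= satisfactory_min:
        return "Satisfactory"
    return "Unsatisfactory"

def percent_passing(pct_satisfactory, pct_advanced):
    """Passing means Satisfactory or Advanced."""
    return pct_satisfactory + pct_advanced

# Integrated Science post-test percentages from Table 15.
pilot_passing = percent_passing(40.2, 13.1)
comparison_passing = percent_passing(36.0, 6.7)
english_category = classify(13, *PRE_TEST_CUTS["English"])
```

Rounded to whole numbers, the two passing percentages reproduce the 53% and 43% quoted for Integrated Science.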
Chapter Four: Summary and Conclusions

The main objective of the evaluation was to determine whether the CA programme is having positive effects on student learning outcomes in the first year of implementation. This was accomplished by measuring and comparing the levels of learning achievement of pupils in pilot (intervention) and comparison (control) schools. A baseline (pre-test) assessment occurred before implementation of the proposed interventions at the beginning of Grade 5 in randomly selected pilot schools. This created a basis upon which the impact of CA was measured at the end of the Grade 5 pilot year.

A sample of 48 schools was selected from Lusaka, Southern and Western Provinces using a matched pairs design and random assignment, resulting in 24 pilot schools and 24 comparison schools. Student achievement for the Grade 5 baseline and post-test administrations was measured using multiple choice tests in 6 subject areas with 30 items each (30 points per test). The Grade 5 baseline tests were based on the Grade 4 curriculum, while the Grade 5 post-tests were based on the Grade 5 curriculum. Overall, the psychometric characteristics of the tests were very satisfactory on both the pre-tests and post-tests. Items were within acceptable difficulty (p-value) ranges and discrimination (point-biserial correlation) levels. Overall, the tests were found to be reliable, using Cronbach's alpha as an estimate of internal consistency reliability.

Performance of the schools on the baseline and post-tests was compared using mean raw scores and mean vertical scaled scores. The vertical scaled score comparison was found to be more relevant, valid, and beneficial, since the school mean scores on both the baseline and post-tests were evaluated on the same measurement scale (i.e., the vertical scale).
In addition, statisticians generally prefer using scaled scores for longitudinal comparisons since the scale is equal interval, thus making comparisons more accurate.

Overall, the pupils' scores on the baseline pre-test were very similar in the pilot and comparison schools. The comparison schools scored slightly higher on the English and Mathematics tests, but the score differences for the two groups on the other four tests were minimal. On the post-test, which was administered after one year of the CA programme, the scores of the pilot schools on all six tests were significantly higher than those of the comparison schools. This provides strong initial evidence that the CA programme had a significantly positive effect on pupil learning outcomes.

When the performance of the schools on the baseline and post-tests was compared by region, Lusaka Region consistently had the highest mean scores in all subjects on the Grade 5 pre-tests and post-tests, followed by Western and Southern. The number of schools by region was too small to make statistically valid region-by-region comparisons of pre-test to post-test scores for the pilot and comparison groups.

Students were also classified into three performance level categories (Unsatisfactory, Satisfactory, and Advanced) in each subject area based on their performance on the baseline and post-tests. On the pre-tests, the percentages in each category by group were similar for most of the subjects. However, on the post-test, there were differences in favour of the pilot group in virtually all subjects. For instance, in Integrated Science, 53% of students in the pilot group were Satisfactory or above vs. 43% of students in the comparison group. This provided strong evidence that a greater percentage of students in the pilot group than in the comparison group were achieving a passing score on the post-test.

The next round of post-tests in the Phase 1 schools will be administered when the same cohort of pupils completes Grade 6. This will be followed by a final test administration (a third post-test) when the cohort of pupils completes Grade 7. At that point, with four time points (a baseline and three post-tests), more substantial conclusions will be drawn on the effectiveness of the CA programme.

Note also that the evaluation process is being repeated in the Phase 2 and Phase 3 schools, which will provide a complete national quantitative evaluation of the programme at the end of Year 5 of implementation (2010). Based on guidance from the CA Steering Committee, results from the evaluation will be used at a selected point in the implementation period as a criterion for scaling up the CA programme to other primary schools in Zambia.
    • Appendix 1: Item Statistics by Subject
    • Table A1: English Item Statistics P-value Pt-Biserial P-value Pt-BiserialSeq. Seq. Pre-test Pre-test Post-test Post-test1 .65 .47 1 .65 .552 .63 .53 2 .51 .583 .63 .52 3 .48 .444 .48 .56 4 .41 .545 .52 .55 5 .40 .486 .40 .53 6 .29 .367 .56 .58 7 .50 .458 .54 .55 8 .46 .469 .46 .56 9 .52 .6110 .46 .41 10 .35 .6111 .61 .52 11 .26 .4612 .40 .52 12 .21 .3513 .38 .47 13 .33 .5814 .39 .50 14 .36 .5615 .27 .46 15 .35 .5516 .29 .42 16 .33 .4017 .28 .40 17 .22 .2418 .47 .55 18 .36 .5919 .33 .40 19 .42 .5420 .36 .46 20 .40 .5121 .24 .46 21 .34 .5322 .34 .30 22 .38 .4723 .33 .36 23 .21 .3524 .37 .47 24 .38 .5625 .39 .46 25 .41 .4926 .35 .42 26 .35 .4627 .31 .38 27 .34 .5028 .25 .28 28 .30 .4029 .27 .32 29 .38 .5230 .20 .29 30 .27 .40
    • Table A2: Social and Developmental Studies Item Statistics P-value Pt-Biserial P-value Pt-BiserialSeq. Seq. Pre-test Pre-test Post-test Post-test1 .49 .52 1 .66 .572 .47 .39 2 .53 .603 .39 .49 3 .66 .604 .37 .32 4 .58 .505 .35 .47 5 .51 .576 .36 .35 6 .48 .617 .43 .51 7 .52 .618 .41 .41 8 .42 .319 .36 .21 9 .44 .5610 .37 .43 10 .49 .5011 .38 .49 11 .34 .4212 .37 .48 12 .39 .4313 .35 .42 13 .51 .4914 .33 .34 14 .43 .5415 .30 .46 15 .36 .5816 .33 .41 16 .36 .4417 .28 .30 17 .39 .4018 .31 .26 18 .42 .4219 .30 .46 19 .37 .5520 .40 .45 20 .34 .5121 .25 .44 21 .32 .3822 .26 .43 22 .35 .3623 .25 .41 23 .32 .4424 .26 .29 24 .38 .2625 .36 .31 25 .38 .2526 .26 .32 26 .34 .3927 .26 .19 27 .36 .3128 .27 .37 28 .32 .2429 .29 .19 29 .27 .2230 .30 .25 30 .30 .39
    • Table A3: Mathematics Item Statistics P-value Pt-Biserial P-value Pt-BiserialSeq. Seq. Pre-test Pre-test Post-test Post-test1 .81 .43 1 .70 .562 .59 .51 2 .65 .553 .46 .34 3 .71 .574 .49 .48 4 .56 .555 .54 .55 5 .60 .546 .57 .51 6 .64 .527 .44 .42 7 .46 .488 .46 .25 8 .50 .509 .43 .29 9 .47 .3210 .50 .51 10 .55 .3411 .43 .51 11 .38 .4412 .34 .26 12 .39 .4413 .39 .42 13 .39 .4514 .46 .42 14 .40 .4515 .48 .45 15 .42 .2816 .30 .25 16 .34 .3217 .36 .30 17 .34 .4618 .32 .23 18 .38 .4819 .33 .36 19 .29 .3420 .27 .28 20 .30 .3521 .52 .40 21 .25 .3722 .57 .48 22 .27 .4023 .32 .33 23 .23 .3424 .40 .46 24 .24 .3325 .31 .43 25 .18 .2326 .27 .32 26 .27 .3327 .30 .26 27 .24 .2828 .21 .17 28 .36 .4829 .19 .15 29 .16 .1830 .25 .32 30 .23 .30
    • Table A4: Integrated Science Item Statistics P-value Pt-Biserial P-value Pt-BiserialSeq. Seq. Pre-test Pre-test Post-test Post-test1 .49 .42 1 .53 .562 .33 .17 2 .53 .563 .45 .41 3 .39 .574 .41 .44 4 .51 .495 .31 .20 5 .44 .526 .40 .39 6 .57 .487 .28 .43 7 .45 .498 .31 .26 8 .47 .539 .34 .45 9 .44 .4810 .29 .26 10 .33 .5111 .43 .29 11 .38 .3412 .31 .40 12 .42 .4913 .52 .28 13 .31 .4414 .37 .45 14 .36 .5115 .36 .42 15 .36 .4016 .41 .43 16 .36 .4917 .34 .29 17 .38 .5518 .30 .50 18 .21 .2119 .37 .50 19 .28 .4220 .26 .25 20 .38 .4821 .29 .37 21 .29 .4722 .26 .38 22 .34 .4923 .28 .34 23 .25 .2924 .24 .39 24 .22 .1625 .20 .35 25 .31 .3826 .25 .25 26 .25 .2927 .27 .33 27 .25 .3628 .29 .21 28 .27 .4029 .23 .45 29 .23 .2730 .30 .27 30 .21 .33
    • Table A5: Creative & Technology Studies Item Statistics P-value Pt-Biserial P-value Pt-BiserialSeq. Seq. Pre-test Pre-test Post-test Post-test1 .25 .55 1 .29 .342 .41 .50 2 .41 .503 .33 .34 3 .43 .554 .56 .45 4 .49 .645 .38 .16 5 .46 .546 .40 .34 6 .40 .557 .35 .46 7 .47 .458 .36 .34 8 .48 .529 .39 .54 9 .43 .3710 .47 .48 10 .44 .5311 .43 .48 11 .29 .4612 .41 .31 12 .40 .5213 .30 .40 13 .36 .5514 .28 .41 14 .39 .5615 .26 .39 15 .32 .4616 .37 .52 16 .28 .3717 .29 .27 17 .36 .3718 .36 .35 18 .40 .5219 .41 .40 19 .33 .5120 .30 .41 20 .22 .2521 .29 .54 21 .36 .3522 .25 .25 22 .36 .2823 .50 .40 23 .29 .2524 .31 .34 24 .30 .3625 .28 .387 25 .27 .4226 .22 .14 26 .28 .4427 .47 .37 27 .27 .3228 .34 .32 28 .33 .2429 .39 .35 29 .23 .5230 .17 .08 30 .32 .44
    • Table A6: Community Studies Item Statistics
              Pre-test                Post-test
      Seq.    P-value   Pt-Biserial   P-value   Pt-Biserial
      1       .62       .41           .53       .52
      2       .52       .35           .44       .60
      3       .46       .42           .53       .61
      4       .43       .48           .52       .57
      5       .41       .33           .44       .49
      6       .36       .32           .44       .40
      7       .31       .21           .47       .51
      8       .36       .33           .42       .57
      9       .27       .20           .38       .56
      10      .37       .21           .44       .50
      11      .30       .35           .30       .41
      12      .40       .38           .42       .52
      13      .30       .19           .39       .51
      14      .30       .45           .36       .43
      15      .20       .18           .44       .41
      16      .30       .36           .33       .49
      17      .30       .25           .43       .50
      18      .28       .38           .36       .42
      19      .26       .21           .37       .29
      20      .25       .19           .32       .31
      21      .31       .34           .34       .44
      22      .26       .21           .32       .39
      23      .25       .26           .32       .29
      24      .25       .24           .26       .31
      25      .30       .31           .29       .37
      26      .22       .28           .30       .28
      27      .26       .28           .28       .41
      28      .23       .21           .27       .24
      29      .19       .16           .24       .21
      30      .21       .16           .24       .23
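The P-value and Pt-Biserial columns in the item statistics tables are classical item statistics: the proportion of examinees who answered an item correctly, and the correlation between the item score and the total test score. The following is a minimal Python sketch of how such statistics are commonly computed, not the procedure used for this report. It assumes dichotomously scored (0/1) responses and an uncorrected total score that includes the item itself; the tables do not state which variant was used. The function name `item_statistics` and the sample responses are hypothetical.

```python
from math import sqrt


def item_statistics(responses):
    """Return a list of (p_value, point_biserial) tuples, one per item.

    responses: list of examinee records, each a list of 0/1 item scores.
    Assumption: the total score includes the item being analysed
    (uncorrected point-biserial).
    """
    n = len(responses)
    n_items = len(responses[0])
    totals = [sum(record) for record in responses]
    mean_total = sum(totals) / n
    # Population standard deviation of the total scores.
    sd_total = sqrt(sum((t - mean_total) ** 2 for t in totals) / n)

    stats = []
    for j in range(n_items):
        item = [record[j] for record in responses]
        n_correct = sum(item)
        p = n_correct / n  # item difficulty (proportion correct)
        if n_correct in (0, n) or sd_total == 0:
            # The correlation is undefined when there is no variance.
            stats.append((p, float("nan")))
            continue
        mean_correct = sum(t for t, x in zip(totals, item) if x == 1) / n_correct
        r_pb = (mean_correct - mean_total) / sd_total * sqrt(p / (1 - p))
        stats.append((p, r_pb))
    return stats


if __name__ == "__main__":
    # Hypothetical data: four examinees, three items.
    sample = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    for seq, (p, r_pb) in enumerate(item_statistics(sample), start=1):
        print(f"{seq}  {p:.2f}  {r_pb:.2f}")
```

Low point-biserial values, such as the pre-test values below .20 in the tables, indicate items whose scores are only weakly related to the total score.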
    • Appendix 2: Scores and Frequencies – Grade 5 Pre-Tests
    • Table A7: English Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      1       -3.59   100         24    1.3      1.3       30    1.5      1.5
      2       -2.84   100         28    1.6      2.9       31    1.5      3.0
      3       -2.38   102         43    2.4      5.3       61    3.0      6.1
      4       -2.04   126         54    3.0      8.3       45    2.2      8.3
      5       -1.76   146         66    3.7     12.0       76    3.8     12.1
      6       -1.52   163        112    6.3     18.3      112    5.6     17.6
      7       -1.31   178        138    7.7     26.1      152    7.6     25.2
      8       -1.11   192        145    8.1     34.2      137    6.8     32.0
      9       -0.93   205        151    8.5     42.6      146    7.3     39.2
      10      -0.76   217        140    7.8     50.5      142    7.1     46.3
      11      -0.60   228        118    6.6     57.1      158    7.8     54.1
      12      -0.44   239        105    5.9     63.0      111    5.5     59.7
      13      -0.29   250         68    3.8     66.8      109    5.4     65.1
      14      -0.14   261         83    4.6     71.4       85    4.2     69.3
      15       0.01   271         67    3.8     75.2       68    3.4     72.7
      16       0.16   282         55    3.1     78.3       68    3.4     76.1
      17       0.30   292         50    2.8     81.1       41    2.0     78.1
      18       0.46   303         41    2.3     83.4       45    2.2     80.3
      19       0.61   314         43    2.4     85.8       52    2.6     82.9
      20       0.77   325         44    2.5     88.2       50    2.5     85.4
      21       0.94   337         35    2.0     90.2       50    2.5     87.9
      22       1.12   350         24    1.3     91.5       27    1.3     89.2
      23       1.31   363         25    1.4     92.9       36    1.8     91.0
      24       1.52   378         19    1.1     94.0       37    1.8     92.8
      25       1.75   395         19    1.1     95.1       46    2.3     95.1
      26       2.03   415         26    1.5     96.5       28    1.4     96.5
      27       2.37   439         14     .8     97.3       18     .9     97.4
      28       2.82   471         19    1.1     98.4       28    1.4     98.8
      29       3.56   500         23    1.3     99.7       20    1.0     99.8
      30       4.80   500          6     .3    100.0        4     .2    100.0
      Total                     1785  100.0              2013  100.0
    • Table A8: Social and Developmental Studies Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      1       -3.42   100         28    1.5      1.5       28    1.4      1.4
      2       -2.69   100         30    1.6      3.0       35    1.7      3.1
      3       -2.24   100         49    2.6      5.6       46    2.2      5.3
      4       -1.91   112         78    4.1      9.7       66    3.2      8.5
      5       -1.65   139        129    6.8     16.5      138    6.7     15.2
      6       -1.42   162        164    8.6     25.1      188    9.1     24.4
      7       -1.22   183        179    9.4     34.5      209   10.2     34.5
      8       -1.04   201        210   11.0     45.5      253   12.3     46.9
      9       -0.87   218        175    9.2     54.6      191    9.3     56.2
      10      -0.71   235        155    8.1     62.8      169    8.2     64.4
      11      -0.56   250        143    7.5     70.3      118    5.7     70.1
      12      -0.42   264        111    5.8     76.1       97    4.7     74.8
      13      -0.27   280         79    4.1     80.2       78    3.8     78.6
      14      -0.14   293         60    3.1     83.4       65    3.2     81.8
      15       0.00   307         39    2.0     85.4       46    2.2     84.0
      16       0.14   321         36    1.9     87.3       50    2.4     86.5
      17       0.28   336         45    2.4     89.7       39    1.9     88.4
      18       0.42   350         32    1.7     91.3       36    1.8     90.1
      19       0.56   364         28    1.5     92.8       30    1.5     91.6
      20       0.71   380         29    1.5     94.3       32    1.6     93.1
      21       0.87   396         27    1.4     95.8       24    1.2     94.3
      22       1.04   413         14     .7     96.5       28    1.4     95.7
      23       1.22   432         22    1.2     97.6       17     .8     96.5
      24       1.42   452         16     .8     98.5       19     .9     97.4
      25       1.65   476          6     .3     98.8       17     .8     98.2
      26       1.91   500         12     .6     99.4       14     .7     98.9
      27       2.24   500          7     .4     99.8       13     .6     99.6
      28       2.69   500          3     .2     99.9        7     .3     99.9
      29       3.42   500          1     .1    100.0        1     .0    100.0
      30       4.65   500          0     .0    100.0        1     .0    100.0
      Total                     1907  100.0              2055  100.0
    • Table A9: Mathematics Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      1       -3.62   100         23    1.2      1.2       20    1.0      1.0
      2       -2.86   100         21    1.1      2.4       20    1.0      2.0
      3       -2.39   100         40    2.1      4.5       31    1.5      3.5
      4       -2.04   100         39    2.1      6.6       40    2.0      5.5
      5       -1.76   100         64    3.4     10.0       41    2.0      7.5
      6       -1.51   100         81    4.4     14.4       75    3.7     11.2
      7       -1.30   120        100    5.4     19.8      111    5.5     16.7
      8       -1.10   142        125    6.7     26.5      119    5.9     22.6
      9       -0.92   162        139    7.5     34.0      132    6.5     29.1
      10      -0.75   181        138    7.4     41.4      158    7.8     36.9
      11      -0.59   199        150    8.1     49.4      154    7.6     44.6
      12      -0.43   217        147    7.9     57.3      149    7.4     51.9
      13      -0.28   233        129    6.9     64.3      165    8.2     60.1
      14      -0.13   250        103    5.5     69.8      133    6.6     66.7
      15       0.01   266        106    5.7     75.5      114    5.6     72.3
      16       0.16   282         91    4.9     80.4      110    5.4     77.7
      17       0.31   299         73    3.9     84.3       83    4.1     81.8
      18       0.46   316         59    3.2     87.5       76    3.8     85.6
      19       0.61   332         57    3.1     90.5       78    3.9     89.5
      20       0.77   350         39    2.1     92.6       59    2.9     92.4
      21       0.94   369         27    1.5     94.1       33    1.6     94.0
      22       1.12   389         34    1.8     95.9       39    1.9     95.9
      23       1.31   410         23    1.2     97.2       29    1.4     97.4
      24       1.52   433         24    1.3     98.4       15     .7     98.1
      25       1.75   459         15     .8     99.2       13     .6     98.8
      26       2.03   490          6     .3     99.6       12     .6     99.4
      27       2.37   500          5     .3     99.8        7     .3     99.7
      28       2.82   500          2     .1     99.9        4     .2     99.9
      29       3.56   500          1     .1    100.0        1     .0    100.0
      30       4.80   500          0     .0    100.0        1     .0    100.0
      Total                     1861  100.0              2022  100.0
    • Table A10: Integrated Science Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      1       -3.44   100         16     .8       .8       18     .9       .9
      2       -2.71   100         21    1.1      1.9       24    1.2      2.0
      3       -2.26   100         53    2.7      4.6       44    2.1      4.1
      4       -1.93   110         83    4.2      8.8       72    3.5      7.6
      5       -1.66   138        113    5.8     14.6      115    5.5     13.1
      6       -1.43   161        183    9.3     23.9      176    8.5     21.6
      7       -1.23   182        195    9.9     33.9      239   11.5     33.1
      8       -1.05   200        230   11.7     45.6      268   12.9     46.0
      9       -0.88   217        225   11.5     57.1      236   11.4     57.4
      10      -0.72   234        173    8.8     65.9      206    9.9     67.3
      11      -0.56   250        135    6.9     72.8      141    6.8     74.1
      12      -0.42   264        107    5.5     78.2      103    5.0     79.0
      13      -0.28   279         96    4.9     83.1       84    4.0     83.1
      14      -0.14   293         56    2.9     86.0       51    2.5     85.5
      15       0.00   307         48    2.4     88.4       36    1.7     87.2
      16       0.14   321         35    1.8     90.2       36    1.7     89.0
      17       0.28   336         26    1.3     91.5       25    1.2     90.2
      18       0.42   350         22    1.1     92.7       24    1.2     91.3
      19       0.57   365         23    1.2     93.8       22    1.1     92.4
      20       0.72   381         22    1.1     95.0       38    1.8     94.2
      21       0.88   397         18     .9     95.9       24    1.2     95.4
      22       1.05   414         20    1.0     96.9       30    1.4     96.8
      23       1.23   433         17     .9     97.8       22    1.1     97.9
      24       1.43   453         15     .8     98.5       15     .7     98.6
      25       1.66   477          9     .5     99.0       11     .5     99.1
      26       1.93   500         12     .6     99.6       14     .7     99.8
      27       2.26   500          6     .3     99.9        3     .1    100.0
      28        2.7   500          2     .1    100.0        1     .0    100.0
      29       3.43   500          0     .0    100.0        0     .0    100.0
      30       4.67   500          0     .0    100.0        0     .0    100.0
      Total                     1961  100.0              2078  100.0
    • Table A11: Creative and Technology Studies Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      1       -3.46   100         15     .8       .8       17     .8       .8
      2       -2.73   100         21    1.1      1.8       28    1.4      2.2
      3       -2.28   100         42    2.1      4.0       38    1.8      4.0
      4       -1.95   100         66    3.4      7.3       59    2.9      6.9
      5       -1.68   125        104    5.3     12.6      119    5.8     12.6
      6       -1.45   148        162    8.2     20.8      172    8.3     21.0
      7       -1.24   169        198   10.1     30.9      193    9.3     30.3
      8       -1.06   187        206   10.5     41.4      234   11.3     41.6
      9       -0.89   204        211   10.7     52.1      218   10.6     52.2
      10      -0.73   220        186    9.5     61.6      167    8.1     60.3
      11      -0.57   236        116    5.9     67.5      168    8.1     68.4
      12      -0.43   250        123    6.3     73.7      126    6.1     74.5
      13      -0.28   265         89    4.5     78.2       77    3.7     78.3
      14      -0.14   279         64    3.3     81.5       47    2.3     80.5
      15       0.00   293         59    3.0     84.5       41    2.0     82.5
      16       0.14   307         55    2.8     87.3       55    2.7     85.2
      17       0.28   321         33    1.7     89.0       38    1.8     87.0
      18       0.43   336         28    1.4     90.4       31    1.5     88.5
      19       0.57   350         29    1.5     91.9       37    1.8     90.3
      20       0.73   366         25    1.3     93.1       28    1.4     91.7
      21       0.89   382         27    1.4     94.5       31    1.5     93.2
      22       1.06   399         19    1.0     95.5       32    1.5     94.7
      23       1.24   417         23    1.2     96.6       33    1.6     96.3
      24       1.45   438         20    1.0     97.7       30    1.5     97.8
      25       1.68   461         23    1.2     98.8       28    1.4     99.1
      26       1.95   488         12     .6     99.4        8     .4     99.5
      27       2.28   500          8     .4     99.8        8     .4     99.9
      28       2.73   500          2     .1     99.9        1     .0    100.0
      29       3.46   500          0     .0     99.9        1     .0    100.0
      30       4.70   500          1     .1    100.0        0     .0    100.0
      Total                     1967  100.0              2065  100.0
    • Table A12: Community Studies Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      1       -3.48   100         15     .8       .8       14     .7       .7
      2       -2.74   100         31    1.6      2.3       19     .9      1.6
      3       -2.29   100         46    2.3      4.6       42    2.0      3.6
      4       -1.96   100         53    2.7      7.3       66    3.2      6.9
      5       -1.68   100        110    5.6     12.9      120    5.8     12.7
      6       -1.45   128        184    9.3     22.2      166    8.1     20.7
      7       -1.24   157        211   10.7     32.8      237   11.5     32.3
      8       -1.06   182        239   12.1     44.9      266   12.9     45.2
      9       -0.88   207        216   10.9     55.8      240   11.7     56.9
      10      -0.72   229        216   10.9     66.8      205   10.0     66.8
      11      -0.57   250        166    8.4     75.1      156    7.6     74.4
      12      -0.42   271        114    5.8     80.9      125    6.1     80.5
      13      -0.27   292         98    5.0     85.9       98    4.8     85.2
      14      -0.13   311         75    3.8     89.6       78    3.8     89.0
      15       0.01   331         51    2.6     92.2       53    2.6     91.6
      16       0.15   350         39    2.0     94.2       45    2.2     93.8
      17       0.29   369         29    1.5     95.7       33    1.6     95.4
      18       0.43   389         22    1.1     96.8       32    1.6     96.9
      19       0.58   410         17     .9     97.6       28    1.4     98.3
      20       0.73   431         13     .7     98.3       14     .7     99.0
      21       0.89   453          6     .3     98.6        7     .3     99.3
      22       1.06   476         14     .7     99.3        5     .2     99.6
      23       1.25   500          5     .3     99.5        7     .3     99.9
      24       1.45   500          1     .1     99.6        1     .0    100.0
      25       1.68   500          4     .2     99.8        1     .0    100.0
      26       1.95   500          4     .2    100.0        0     .0    100.0
      27       2.28   500          0     .0    100.0        0     .0    100.0
      28       2.72   500          0     .0    100.0        0     .0    100.0
      29       3.46   500          0     .0    100.0        0     .0    100.0
      30       4.69   500          0     .0    100.0        0     .0    100.0
      Total                     1979  100.0              2058  100.0
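The % and Cum. % columns in the score tables can be derived from the frequency counts alone. The following is a minimal Python sketch under two assumptions that the report does not state explicitly: percentages are taken over the group total, and cumulative percentages are computed from cumulative counts before rounding (which is why a Cum. % value can differ by a rounding unit from the running sum of the rounded % column). The function name `frequency_table` and the toy counts are hypothetical.

```python
def frequency_table(freqs, decimals=1):
    """Build (score, freq, percent, cumulative percent) rows.

    freqs: dict mapping raw score -> number of examinees with that score.
    Assumption: percentages use the group total as the denominator.
    """
    total = sum(freqs.values())
    rows = []
    cumulative = 0
    for score in sorted(freqs):
        count = freqs[score]
        cumulative += count
        percent = round(100 * count / total, decimals)
        cum_percent = round(100 * cumulative / total, decimals)
        rows.append((score, count, percent, cum_percent))
    return rows


if __name__ == "__main__":
    # Hypothetical counts for three raw scores.
    for row in frequency_table({1: 2, 2: 3, 3: 5}):
        print(row)
```

Passing `decimals=2` gives percentages with two decimal places, the precision used in the post-test tables.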
    • Appendix 3: Scores and Frequencies – Grade 5 Post-Tests
    • Table A13: English Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      0       -4.94   100        161   8.47     8.47      109   5.13     5.13
      1       -3.52   100          6   0.32     8.79       12   0.56     5.69
      2       -2.78   100         11   0.58     9.37       20   0.94     6.64
      3       -2.32   101         33   1.74    11.11       22   1.04     7.67
      4       -1.99   126         49   2.58    13.68       79   3.72    11.39
      5       -1.71   146         64   3.37    17.05      118   5.55    16.94
      6       -1.48   163        103   5.42    22.47      142   6.68    23.62
      7       -1.27   178        123   6.47    28.95      174   8.19    31.81
      8       -1.08   192        127   6.68    35.63      193   9.08    40.89
      9       -0.90   205        128   6.74    42.37      148   6.96    47.86
      10      -0.74   217        124   6.53    48.89      132   6.21    54.07
      11      -0.58   228        111   5.84    54.74      116   5.46    59.53
      12      -0.43   239        100   5.26    60.00       96   4.52    64.05
      13      -0.28   250         81   4.26    64.26       78   3.67    67.72
      14      -0.14   261         68   3.58    67.84       87   4.09    71.81
      15       0.00   271         68   3.58    71.42       62   2.92    74.73
      16       0.15   282         65   3.42    74.84       52   2.45    77.18
      17       0.29   292         55   2.89    77.74       41   1.93    79.11
      18       0.44   303         49   2.58    80.32       46   2.16    81.27
      19       0.59   314         39   2.05    82.37       52   2.45    83.72
      20       0.74   325         35   1.84    84.21       50   2.35    86.07
      21       0.91   337         44   2.32    86.53       43   2.02    88.09
      22       1.08   350         44   2.32    88.84       53   2.49    90.59
      23       1.27   364         29   1.53    90.37       34   1.60    92.19
      24       1.48   379         44   2.32    92.68       42   1.98    94.16
      25       1.71   396         37   1.95    94.63       32   1.51    95.67
      26       1.98   416         37   1.95    96.58       33   1.55    97.22
      27       2.32   440         28   1.47    98.05       30   1.41    98.64
      28       2.77   473         16   0.84    98.89       16   0.75    99.39
      29       3.51   500         16   0.84    99.74       10   0.47    99.86
      30       4.93   500          5   0.26   100.00        3   0.14   100.00
      Total                     1900    100              2125    100
    • Table A14: Social and Developmental Studies Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      0       -4.94   100        167   8.80     8.80       77   3.49     3.49
      1       -3.51   100          1   0.05     8.85        6   0.27     3.76
      2       -2.77   100          4   0.21     9.06        9   0.41     4.17
      3       -2.32   100          8   0.42     9.48       13   0.59     4.76
      4       -1.98   100         17   0.90    10.38       25   1.13     5.89
      5       -1.70   120         25   1.32    11.70       43   1.95     7.84
      6       -1.47   140         53   2.79    14.49       84   3.81    11.65
      7       -1.26   157         74   3.90    18.39      113   5.12    16.77
      8       -1.07   173         95   5.01    23.39      120   5.44    22.21
      9       -0.90   187        109   5.74    29.14      163   7.39    29.60
      10      -0.73   201        108   5.69    34.83      193   8.75    38.35
      11      -0.57   214        118   6.22    41.04      164   7.43    45.78
      12      -0.42   226         95   5.01    46.05      162   7.34    53.13
      13      -0.28   238        101   5.32    51.37      136   6.17    59.29
      14      -0.13   250         98   5.16    56.53      119   5.39    64.69
      15       0.01   262         77   4.06    60.59      100   4.53    69.22
      16       0.15   274         73   3.85    64.44       97   4.40    73.62
      17       0.30   285         78   4.11    68.55       83   3.76    77.38
      18       0.44   297         86   4.53    73.08       76   3.45    80.83
      19       0.59   310         70   3.69    76.77       75   3.40    84.22
      20       0.74   322         77   4.06    80.82       69   3.13    87.35
      21       0.91   336         74   3.90    84.72       55   2.49    89.85
      22       1.08   350         65   3.42    88.15       51   2.31    92.16
      23       1.26   365         57   3.00    91.15       47   2.13    94.29
      24       1.47   382         55   2.90    94.05       40   1.81    96.10
      25       1.70   401         46   2.42    96.47       30   1.36    97.46
      26       1.97   423         26   1.37    97.84       17   0.77    98.23
      27       2.30   451         29   1.53    99.37       23   1.04    99.27
      28       2.75   488         11   0.58    99.95       14   0.63    99.91
      29       3.48   500          1   0.05   100.00        2   0.09   100.00
      30       4.75   500
      Total                     1898    100              2206    100
    • Table A15: Mathematics Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      0       -5.14   100        192   9.88     9.88       91   4.17     4.17
      1       -3.70   100          5   0.26    10.14        8   0.37     4.53
      2       -2.95   100          6   0.31    10.45        9   0.41     4.95
      3       -2.48   120         15   0.77    11.22       14   0.64     5.59
      4       -2.12   145         13   0.67    11.89       22   1.01     6.59
      5       -1.83   165         26   1.34    13.23       33   1.51     8.10
      6       -1.58   183         34   1.75    14.98       55   2.52    10.62
      7       -1.36   198         69   3.55    18.53       86   3.94    14.56
      8       -1.16   212         83   4.27    22.80      132   6.04    20.60
      9       -0.97   226        103   5.30    28.10      134   6.14    26.74
      10      -0.79   238        138   7.10    35.20      175   8.01    34.75
      11      -0.62   250        127   6.54    41.74      186   8.52    43.27
      12      -0.45   261        141   7.26    49.00      175   8.01    51.28
      13      -0.29   273        168   8.65    57.64      164   7.51    58.79
      14      -0.14   284        125   6.43    64.08      167   7.65    66.44
      15       0.02   294        131   6.74    70.82      152   6.96    73.40
      16       0.17   305        111   5.71    76.53      117   5.36    78.75
      17       0.33   316        101   5.20    81.73      114   5.22    83.97
      18       0.48   327         87   4.48    86.21       85   3.89    87.87
      19       0.65   338         57   2.93    89.14       70   3.21    91.07
      20       0.81   350         39   2.01    91.15       54   2.47    93.54
      21       0.98   362         44   2.26    93.41       32   1.47    95.01
      22       1.17   375         32   1.65    95.06       25   1.14    96.15
      23       1.36   389         37   1.90    96.96       23   1.05    97.21
      24       1.58   404         23   1.18    98.15       13   0.60    97.80
      25       1.82   421         10   0.51    98.66       21   0.96    98.76
      26       2.10   440         14   0.72    99.38       16   0.73    99.50
      27       2.44   464          9   0.46    99.85        3   0.14    99.63
      28       2.90   496          2   0.10    99.95        4   0.18    99.82
      29       3.65   500                                   3   0.14    99.95
      30       5.07   500          1   0.05   100.00        1   0.05   100.00
      Total                     1943    100              2184    100
    • Table A16: Integrated Science Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      0       -4.93   100        203  10.39    10.39       70   3.21     3.21
      1       -3.51   100          2   0.10    10.49        9   0.41     3.62
      2       -2.77   104          7   0.36    10.85       15   0.69     4.31
      3       -2.32   134         18   0.92    11.77       35   1.60     5.91
      4       -1.98   156         39   2.00    13.77       65   2.98     8.90
      5       -1.71   175         64   3.28    17.04      127   5.82    14.72
      6       -1.48   190         91   4.66    21.70      168   7.70    22.42
      7       -1.27   204        133   6.81    28.51      208   9.54    31.96
      8       -1.08   217        123   6.29    34.80      215   9.86    41.82
      9       -0.91   228        113   5.78    40.58      197   9.03    50.85
      10      -0.74   239        120   6.14    46.72      140   6.42    57.27
      11      -0.59   250        104   5.32    52.05      157   7.20    64.47
      12      -0.43   260         90   4.61    56.65      126   5.78    70.24
      13      -0.29   270        102   5.22    61.87       93   4.26    74.51
      14      -0.14   280         73   3.74    65.61       88   4.03    78.54
      15       0.00   289         67   3.43    69.04       64   2.93    81.48
      16       0.15   299         78   3.99    73.03       70   3.21    84.69
      17       0.29   309         79   4.04    77.07       53   2.43    87.12
      18       0.44   318         65   3.33    80.40       43   1.97    89.09
      19       0.59   329         62   3.17    83.57       43   1.97    91.06
      20       0.75   339         65   3.33    86.90       48   2.20    93.26
      21       0.91   350         41   2.10    89.00       38   1.74    95.00
      22       1.08   362         49   2.51    91.50       27   1.24    96.24
      23       1.27   374         46   2.35    93.86       32   1.47    97.71
      24       1.48   388         38   1.94    95.80       18   0.83    98.53
      25       1.71   404         35   1.79    97.59       21   0.96    99.50
      26       1.98   422         26   1.33    98.93        5   0.23    99.72
      27       2.32   444         11   0.56    99.49        4   0.18    99.91
      28       2.77   474          5   0.26    99.74        2   0.09   100.00
      29       3.51   500          4   0.20    99.95
      30       4.93   500          1   0.05   100.00
      Total                     1954    100              2181    100
    • Table A17: Creative and Technology Studies Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      0       -4.86   100        225  11.49    11.49       86   3.94     3.94
      1       -3.45   100          5   0.26    11.74        7   0.32     4.26
      2       -2.71   100          8   0.41    12.15       14   0.64     4.90
      3       -2.27   125         17   0.87    13.02       22   1.01     5.91
      4       -1.93   147         25   1.28    14.29       66   3.02     8.94
      5       -1.67   166         54   2.76    17.05      104   4.77    13.70
      6       -1.44   181         95   4.85    21.90      159   7.29    20.99
      7       -1.24   195        127   6.48    28.38      178   8.16    29.15
      8       -1.05   207        118   6.02    34.41      190   8.71    37.86
      9       -0.88   219        144   7.35    41.76      196   8.98    46.84
      10      -0.72   230        122   6.23    47.98      180   8.25    55.09
      11      -0.57   240        128   6.53    54.52      157   7.20    62.28
      12      -0.42   250         98   5.00    59.52      122   5.59    67.87
      13      -0.28   260         88   4.49    64.01      106   4.86    72.73
      14      -0.14   269         87   4.44    68.45       93   4.26    76.99
      15       0.00   279         76   3.88    72.33       67   3.07    80.06
      16       0.14   288         64   3.27    75.60       62   2.84    82.91
      17       0.28   298         52   2.65    78.25       66   3.02    85.93
      18       0.42   307         76   3.88    82.13       57   2.61    88.54
      19       0.57   317         48   2.45    84.58       34   1.56    90.10
      20       0.72   328         49   2.50    87.09       41   1.88    91.98
      21       0.88   338         50   2.55    89.64       28   1.28    93.26
      22       1.05   350         49   2.50    92.14       30   1.37    94.64
      23       1.23   362         51   2.60    94.74       35   1.60    96.24
      24       1.44   376         31   1.58    96.32       32   1.47    97.71
      25       1.67   392         31   1.58    97.91       20   0.92    98.63
      26       1.93   410         23   1.17    99.08       20   0.92    99.54
      27       2.27   432         13   0.66    99.74        6   0.27    99.82
      28       2.71   463          3   0.15    99.90        2   0.09    99.91
      29       3.45   500          1   0.05    99.95        2   0.09   100.00
      30       4.86   500          1   0.05   100.00
      Total                     1959    100              2182    100
    • Table A18: Community Studies Scores and Frequencies
      Raw             Scale    Pilot Group              Comparison Group
      Score   Theta   Score    Freq.      %   Cum. %    Freq.      %   Cum. %
      0       -4.87   100        219  11.24    11.24      104   4.84     4.84
      1       -3.46   100          4   0.21    11.44        7   0.33     5.17
      2       -2.72   100          8   0.41    11.85        9   0.42     5.59
      3       -2.27   100         20   1.03    12.88       17   0.79     6.38
      4       -1.94   118         38   1.95    14.83       45   2.09     8.47
      5       -1.67   141         67   3.44    18.27       65   3.03    11.50
      6       -1.44   161        109   5.59    23.86       93   4.33    15.83
      7       -1.24   179        128   6.57    30.43      129   6.01    21.83
      8       -1.06   195        137   7.03    37.46      183   8.52    30.35
      9       -0.88   210        110   5.64    43.10      176   8.19    38.55
      10      -0.72   224        143   7.34    50.44      174   8.10    46.65
      11      -0.57   237        118   6.05    56.49      166   7.73    54.38
      12      -0.42   250        117   6.00    62.49      126   5.87    60.24
      13      -0.28   262         82   4.21    66.70      132   6.15    66.39
      14      -0.14   275         79   4.05    70.75      107   4.98    71.37
      15       0.00   287         66   3.39    74.14       97   4.52    75.88
      16       0.14   299         56   2.87    77.01       85   3.96    79.84
      17       0.28   311         78   4.00    81.02       65   3.03    82.87
      18       0.43   324         65   3.34    84.35       89   4.14    87.01
      19       0.57   337         60   3.08    87.43       76   3.54    90.55
      20       0.73   350         47   2.41    89.84       52   2.42    92.97
      21       0.89   364         58   2.98    92.82       45   2.09    95.07
      22       1.06   379         41   2.10    94.92       30   1.40    96.46
      23       1.24   395         47   2.41    97.33       25   1.16    97.63
      24       1.44   412         19   0.97    98.31       25   1.16    98.79
      25       1.67   432         17   0.87    99.18       11   0.51    99.30
      26       1.94   456         10   0.51    99.69       11   0.51    99.81
      27       2.27   485          5   0.26    99.95        4   0.19   100.00
      28       2.72   500          1   0.05   100.00
      29       3.45   500
      30       4.69   500
      Total                     1949    100              2148    100
    • Appendix 4: Histograms by Subject and Group
    • English: histograms of raw scores (x-axis: Raw Score, 0 to 30; y-axis: Frequency) in four panels: Pilot Group, Grade 5 Pre-test; Comparison Group, Grade 5 Pre-test; Pilot Group, Grade 5 Post-test; Comparison Group, Grade 5 Post-test.
    • Social and Developmental Studies: histograms of raw scores (x-axis: Raw Score, 0 to 30; y-axis: Frequency) in four panels: Pilot Group, Grade 5 Pre-test; Comparison Group, Grade 5 Pre-test; Pilot Group, Grade 5 Post-test; Comparison Group, Grade 5 Post-test.
    • Mathematics: histograms of raw scores (x-axis: Raw Score, 0 to 30; y-axis: Frequency) in four panels: Pilot Group, Grade 5 Pre-test; Comparison Group, Grade 5 Pre-test; Pilot Group, Grade 5 Post-test; Comparison Group, Grade 5 Post-test.
    • Integrated Science: histograms of raw scores (x-axis: Raw Score, 0 to 30; y-axis: Frequency) in four panels: Pilot Group, Grade 5 Pre-test; Comparison Group, Grade 5 Pre-test; Pilot Group, Grade 5 Post-test; Comparison Group, Grade 5 Post-test.
    • Creative and Technology Studies: histograms of raw scores (x-axis: Raw Score, 0 to 30; y-axis: Frequency) in four panels: Pilot Group, Grade 5 Pre-test; Comparison Group, Grade 5 Pre-test; Pilot Group, Grade 5 Post-test; Comparison Group, Grade 5 Post-test.
    • Community Studies: histograms of raw scores (x-axis: Raw Score, 0 to 30; y-axis: Frequency) in four panels: Pilot Group, Grade 5 Pre-test; Comparison Group, Grade 5 Pre-test; Pilot Group, Grade 5 Post-test; Comparison Group, Grade 5 Post-test.