Assessment in education at Secondary level
1. Functions of grading
- This book is about designing classroom grading systems that are both precise and efficient.
One of the first steps to this end is to clarify the basic purpose of grades. How a school or
district defines the purpose of grades dictates much of the form and function of grades.
- Measurement experts such as Peter Airasian (1994) explain that educators use grades
primarily (1) for administrative purposes, (2) to give students feedback about their progress
and achievement, (3) to provide guidance to students about future course work, (4) to
provide guidance to teachers for instructional planning, and (5) to motivate students.
Administrative Purposes
For at least several decades, grades have served a variety of administrative functions (Wrinkle,
1947), most dealing with district-level decisions about students, including
• Student matriculation and retention.
• Placement when students transfer from one school to another.
• Student entrance into college.
Research indicates that some districts explicitly make note of the administrative function of
grades. For example, in a study of school board manuals, district guidelines, and handbooks for
teaching, researchers Susan Austin and Richard McCann (1992) found the explicit mention of
administration as a basic purpose for grades in 7 percent of school board documents, 10 percent
of district guidelines, and 4 percent of handbooks for teachers. Finally, in a survey conducted by
The College Board (1998), over 81 percent of the schools reported using grades for
administrative purposes.
Feedback About Student Achievement
One of the more obvious purposes for grades is to provide feedback about student achievement.
Studies have consistently shown support for this purpose. For example, in 1976, Simon and
Bellanca reported that both educators and noneducators perceived providing information about
student achievement as the primary purpose of grading. In a 1989 study of high school teachers,
Stiggins, Frisbie, and Griswold reported that this grading function—which they refer to as the
information function—was highly valued by teachers. Finally, the study by Austin and McCann
(1992) found that 25 percent of school board documents, 45 percent of district documents, and
65 percent of teacher documents mentioned reporting student achievement as a basic purpose of
grades.
Guidance
When used for guidance purposes, grades help counselors provide direction for students
(Wrinkle, 1947; Terwilliger, 1971). Specifically, counselors use grades to recommend to
individual students courses they should or should not take and schools and occupations they
might consider (Airasian, 1994). Austin and McCann (1992) found that 82 percent of school
board documents, 40 percent of district documents, and 38 percent of teacher documents
identified guidance as an important purpose of grades.
Instructional Planning
Teachers also use grades to make initial decisions about student strengths and weaknesses in
order to group them for instruction. Grading as a tool for instructional planning is not commonly
mentioned by measurement experts. However, the Austin and McCann (1992) study reported
that 44 percent of school board documents, 20 percent of district documents, and 10 percent of
teacher documents emphasized this purpose.
Motivation
Those who advocate using grades to motivate students assume that they encourage students to try
harder both from negative and positive perspectives. On the negative side, receiving a low grade
is believed to motivate students to try harder. On the positive side, it is assumed that receiving a
high grade will motivate students to continue or renew their efforts.
As discussed later in this chapter, some educators object strongly to using grades as motivators.
Rightly or wrongly, however, this purpose is manifested in some U.S. schools. For example,
Austin and McCann (1992) found that 7 percent of school board documents, 15 percent of
district-level documents, and 10 percent of teacher documents emphasized motivation as a
purpose for grades.
Types of Grading: Definition and Historical Background
1.Percentage grading
Using a percentage scale (percent of 100), usually based on percent correct on exams and/or
percent of points earned on assignments
• Most common method in use in high schools and colleges c. 1890–1910.
• Used today as a grading method or as a way of arriving at letter grades (see the sketch after this list).
2.Letter grading and variations
Using a series of letters (often A, B, C, D, F) or letters with plusses and minuses as an ordered
category scale - can be done in a norm-referenced or criterion-referenced (standards-based) manner
• Yale used a four-category system in 1813.
• In the 1920s letter grading was seen as the solution to the problem of the reliability of
percentage grading (fewer categories) and was increasingly adopted.
3.Norm-referenced grading
Comparing students to each other; using class standing as the basis for assigning grades (usually
letter grades)
• Was advocated in the early 1900s as scientific measurement.
• Educational disadvantages were known by the 1930s.
4.Mastery grading
Grading students as “masters” or “passers” when their attainment reaches a prespecified level,
usually allowing different amounts of time for different students to reach mastery
• Originating in the 1920s (e.g., Morrison, 1926) as a grading strategy, it became associated with
the educational strategy of mastery learning (Bloom, Hastings, & Madaus, 1971).
5.Pass/Fail
Using a scale with two levels (pass and fail), sometimes in connection with mastery grading
• In 1851, the University of Michigan experimented with pass/fail grading for classes.
6. Standards (or absolute-standards) grading
Originally, comparing student performance to a preestablished standard (level) of performance;
currently, standards grading sometimes means grading with reference to a list of state or district
content standards according to preestablished performance levels
• Grading according to standards of performance has been championed since the 1930s as more
educationally sound than norm-referenced grading.
• Current advocates of standards grading use the same principle, but the term "standard" is now
used for the criterion itself, not the level of performance.
• Since 2002, the scales on some standards-based report cards use the state accountability
(proficiency) reporting categories instead of letters.
7.Narrative grading
Writing comments about students’ achievement, either in addition to or instead of using numbers
or letters
• Using a normal instructional practice (describing students’ work) in an assessment context.
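To make the link between percentage grading and letter grading concrete, the short Python sketch below converts percentage scores into letter grades. The cut-offs of 90/80/70/60 are assumptions chosen purely for illustration; actual boundaries vary by school and district.

```python
# Minimal sketch: deriving letter grades from percentage scores.
# The cut-offs below (90/80/70/60) are illustrative assumptions only;
# actual boundaries vary by school and district.

CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

def percent_to_letter(percent: float) -> str:
    """Map a percentage (0-100) to a letter grade."""
    for floor, letter in CUTOFFS:
        if percent >= floor:
            return letter
    return "F"

if __name__ == "__main__":
    scores = {"exam": 42 / 50 * 100, "assignments": 88.0}  # percent correct / percent of points
    overall = sum(scores.values()) / len(scores)           # simple unweighted average
    print(f"overall = {overall:.1f}% -> grade {percent_to_letter(overall)}")
```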
Progress Report
A written document that explains how much progress is being made on something you have
previously planned.
These reports will be staggered throughout the year and we have done our best to ensure that
key points in a student’s development, such as making option choices, are supported with an
appropriate report.
We do not provide an end of year report for all students as it would not be possible for teachers
to write reports on every student they teach at one time. Furthermore, we do not believe that a
summative* end of year report is as valuable for a student's development as providing a
formative** report that can give them advice on how to improve and crucially, time to work on
those developments.
All reports will provide information on the following:
• Attendance
• A record of the number of positive events
• A record of the number of negative events
• An end of year attainment estimate
• A teacher assessment of current attainment
• A teacher assessment of Learner Characteristics
Further to these points, once a year teachers will write a brief statement about strengths and areas
for development.
The learner characteristics we will grade for each report are:
• Attitude to Learning
• Communication Skills
• Homework Quality
• Personal Organisation
• Presentation of Work
Each of these is assessed on a scale from one to five, with one being ‘unacceptable’ and five
being ‘exceptional’.
*A summative report is given at the end of a period of study. It would state how well the student
has done but would not give advice on how to improve. If a comment on how to improve were
given, the student would not have the opportunity to work on this development.
** A formative report is given during a period of study. It would state how well a student is
doing and would give advice on how to make further progress. The student then has further time
to work on the advice given in the report.
Perspectives on assessment
Assessment is at the centre of the student's experience. It provides a means of evaluating student
progress and achievement; it drives the activity of the student and therefore their learning. This
collection of short presentations is intended to provoke debate about assessment. Over the last
few years, those involved in developing assessment have generated some new perspectives
which, as yet, have not been fully incorporated within mainstream practice. There has been a gap
between the emerging understandings of 'reflective practitioners' and educational developers and
those who are setting assessment policy and defining practice. We would like to close that gap.
In order to do so we have set out some 'challenging perspectives' in short talks. They are
intended to be contentious but well grounded. Each Web talk is an introduction to an idea that we
hope you will pursue using the references provided. The talks may be used by individuals or
serve as a catalyst for a group discussion, for example in a workshop. Please feel free to
comment. We haven't covered all the ground - far from it - and hope that others might add to
this collection.
Purposes of assessment
Teaching and learning
The primary purpose of assessment is to improve students’ learning and teachers’ teaching as
both respond to the information it provides. Assessment for learning is an ongoing process that
arises out of the interaction between teaching and learning.
What makes assessment for learning effective is how well the information is used.
System improvement
Assessment can do more than simply diagnose and identify students’ learning needs; it can be
used to assist improvements across the education system in a cycle of continuous improvement:
• Students and teachers can use the information gained from assessment to determine their next
teaching and learning steps.
• Parents, families and whānau can be kept informed of next plans for teaching and learning and
the progress being made, so they can play an active role in their children’s learning.
• School leaders can use the information for school-wide planning, to support their teachers and
determine professional development needs.
• Communities and Boards of Trustees can use assessment information to assist their
governance role and their decisions about staffing and resourcing.
• The Education Review Office can use assessment information to inform their advice for
school improvement.
• The Ministry of Education can use assessment information to undertake policy review and
development at a national level, so that government funding and policy intervention is
targeted appropriately to support improved student outcomes.
1. Assessment for Learning (Formative)
The purpose of Formative Assessment is to provide students with feedback on how they are
going. The aim is to help students improve their performance and make their next piece of
assessed work better. It is developmental or formative in nature; hence the term "Formative
Assessment".
The feedback students receive is the key component of formative assessment. Feedback is
intended to help them identify weaknesses and build on strengths to improve the quality of their
next piece of assessment. The focus is on comments for improvement, not marks, and the
awarding of marks in formative assessment can actually be counterproductive.
2. Assessment for Certification (Summative)
Another key purpose of assessment is to gather evidence to make a judgment about a student's
level of performance against the specified learning objectives.
Students are usually assessed at the end of an element of learning, such as the end of a module,
mid semester or end of semester. They are awarded results typically as marks or grades to
represent a particular level of achievement (high, medium, low). This judgmental "summative"
process formally provides the evidence to verify or "certify" which students may progress to the
next level of their studies.
3. Protect Academic Standards
Grades from cumulative assessments are used to certify that a person has the necessary
knowledge and skills (and can apply them appropriately) to be awarded a qualification.
Consequently, the quality and integrity of assessment is essential to guarantee the credibility of
qualifications and the academic reputation of the issuing Institution. There is considerable local,
national and international concern to ensure that the ways we protect academic standards stand
up to scrutiny.
4. Feedback for Teaching
The results from both formative and summative assessments can help you track how your
students are going throughout your courses. Closely looking at the results can help you identify
any patterns of difficulties or misunderstandings students might have. This in turn allows you to
alter your approach to teaching and adjust your curriculum accordingly. For example, you may
identify that you need to offer more detailed explanations or provide additional resources in a
particular area.
Continuous and comprehensive evaluation
Concept and Importance
Continuous and comprehensive evaluation was a process of assessment, mandated by the Right to
Education Act of India. This approach to assessment was introduced by state governments in
India, as well as by the Central Board of Secondary Education in India, for students of the sixth to
tenth grades and, in some schools, the twelfth grade. The Karnataka government introduced CCE for
grades 1 through 9; later it was also introduced for 12th grade students. The main aim of CCE is
to evaluate every aspect of the child during their presence at the school. This is believed to help
reduce the pressure on the child during/before examinations, as the student will sit for
multiple tests throughout the year, and no test or the syllabus covered will be repeated at the
end of the year. The CCE method is claimed to bring enormous changes from the
traditional chalk and talk method of teaching, provided it is implemented accurately. In 2017, the
CCE system was cancelled for students appearing in the Class 10 Board Exam for 2017-18,
bringing back compulsory Annual Board Exam and removing the Formative and Summative
Assessments under the Remodeled Assessment Pattern.[1]
As a part of this new system, students' marks will be replaced by grades which will be evaluated
through a series of curricular and extra-curricular evaluations along with academics. The aim is
to decrease the workload on the student by means of continuous evaluation, taking a number of
small tests throughout the year in place of a single test at the end of the academic program. Only
grades are awarded to students, based on work experience skills, dexterity, innovation,
steadiness, teamwork, public speaking, behavior, etc., to evaluate and present an overall measure
of the student's ability. This helps students who are not strong in academics to show their
talent in other fields such as arts, humanities, sports, music, and athletics, and also helps to motivate
students who have a thirst for knowledge.
Unlike CBSE's old pattern of only one test at the end of the academic year, the CCE conducts
several. There are two different types of tests, namely the formative and the summative.
Formative tests will comprise the student's work in class and at home, the student's performance in
oral tests and quizzes and the quality of the projects or assignments submitted by the child.
Formative tests will be conducted four times in an academic session, and they will carry a 40%
weightage for the aggregate. In some schools, an additional written test is conducted instead of
multiple oral tests. However, at least one oral test is conducted.
The summative assessment is a three-hour long written test conducted twice a year. The first
summative or Summative Assessment 1 (SA-1) will be conducted after the first two formatives
are completed. The second (SA-2) will be conducted after the next two formatives. Each
summative will carry a 30% weightage and both together will carry a 60% weightage for the
aggregate. The summative assessment will be conducted by the schools themselves. However, the
question papers will be partially prepared by the CBSE and evaluation of the answer sheets is
also strictly monitored by the CBSE. Once completed, the syllabus of one summative will not be
repeated in the next. A student will have to concentrate on totally new topics for the next
summative.
At the end of the year, the CBSE processes the result by adding the formative score to the
summative score, i.e. 40% + 60% = 100%. Depending upon the percentage obtained, the board
will deduce the CGPA (Cumulative Grade Point Average) and thereby determine the grade
obtained. In addition to the summative assessment, the board will offer an optional online
aptitude test that may also be used as a tool along with the grades obtained in the CCE to help
students decide the choice of subjects in further studies. The board has also instructed the
schools to prepare the report card, which will be duly signed by the principal and the student.
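The weighting arithmetic described above can be illustrated with a short sketch. The 40%/60% split follows the text; the grade bands and the particular scores used here are assumptions for illustration and should not be read as the official CBSE tables.

```python
# Hedged sketch of the CCE aggregation described above.
# Four formative assessments together carry 40% and two summative
# assessments carry 30% each (60% together). The grade bands below
# are assumptions for illustration, not the official CBSE scheme.

def cce_aggregate(formative_pcts, summative_pcts):
    """formative_pcts: four percentages (FA1-FA4); summative_pcts: two percentages (SA1, SA2)."""
    assert len(formative_pcts) == 4 and len(summative_pcts) == 2
    formative_part = sum(formative_pcts) / 4 * 0.40   # formative block -> 40%
    summative_part = sum(summative_pcts) / 2 * 0.60   # summative block -> 60%
    return formative_part + summative_part            # aggregate out of 100

def to_grade(aggregate):
    bands = [(91, "A1"), (81, "A2"), (71, "B1"), (61, "B2"), (51, "C1")]
    for floor, grade in bands:                        # illustrative bands only
        if aggregate >= floor:
            return grade
    return "C2 or below"

if __name__ == "__main__":
    agg = cce_aggregate([80, 75, 90, 85], [70, 78])
    print(f"aggregate = {agg:.1f}%, grade = {to_grade(agg)}")  # 77.4%, B1 under these assumptions
```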
• Deductive Method - What does the student know, and how can he use it to explain a situation?
• Correlation with a real-life situation - Whether the situation given matches any real-life
situation, like tsunamis, floods, tropical cyclones, etc.
• Usage of Information Technology - Can the problem be solved with the use of IT? If yes,
how?
In addition to that, various assignments can be given such as projects, models and charts, group
work, worksheet, survey, seminar, etc. The teacher will also play a major role. For example, they
give remedial help, maintain a term-wise record and checklists, etc.
Assessment for learning
Assessment for Learning is the process of seeking and interpreting evidence for use by learners
and their teachers to decide where the learners are in their learning, where they need to go and
how best to get there.
Assessment for learning is best described as a process by which assessment information is used
by teachers to adjust their teaching strategies, and by students to adjust their learning strategies.
Assessment, teaching, and learning are inextricably linked, as each informs the others.
Assessment is a powerful process that can either optimise or inhibit learning, depending on how
it’s applied.
For teachers
Assessment for learning helps teachers gather information to:
• plan and modify teaching and learning programmes for individual students, groups of
students, and the class as a whole
• pinpoint students’ strengths so that both teachers and students can build on them
• identify students’ learning needs in a clear and constructive way so they can be addressed
• involve parents, families, and whānau in their children's learning.
For students
Assessment for learning provides students with information and guidance so they can plan and
manage the next steps in their learning.
Assessment for learning uses information to lead from what has been learned to what needs to be
learned next.
Describing assessment for learning
Assessment for learning should use a range of approaches. These may include:
• day-to-day activities, such as learning conversations
• a simple mental note taken by the teacher during observation
• student self and peer assessments
• a detailed analysis of a student’s work
• assessment tools, which may be written items, structured interview questions, or items
teachers make up themselves.
What matters most is not so much the form of the assessment, but how the information gathered
is used to improve teaching and learning.
Testing, Assessment, and Measurement: Definitions
The definitions for each are:
Test: A method to determine a student's ability to complete certain tasks or demonstrate
mastery of a skill or knowledge of content. Some types would be multiple choice tests, or a
weekly spelling test. While it is commonly used interchangeably with assessment, or even
evaluation, it can be distinguished by the fact that a test is one form of an assessment.
Assessment: The process of gathering information to monitor progress and make educational
decisions if necessary. As noted in the definition of test, an assessment may include a test, but
also includes methods such as observations, interviews, behavior monitoring, etc.
Measurement: Beyond its general definition, measurement refers to the set of procedures and the
principles for how to use the procedures in educational tests and assessments. Some of the basic
principles of measurement in educational evaluations would be raw scores, percentile ranks,
derived scores, standard scores, etc.
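As an illustration of those basic principles, the sketch below takes an invented set of raw class scores and derives a percentile rank and a standard (z) score for one student.

```python
# Illustrative sketch: turning a raw score into a percentile rank and a
# standard (z) score. The class scores are invented example data.
from statistics import mean, pstdev

class_scores = [34, 41, 45, 45, 50, 52, 55, 58, 60, 67]  # raw scores for the class
student_raw = 55

# Percentile rank: percentage of class scores falling below the student's score.
below = sum(1 for s in class_scores if s < student_raw)
percentile_rank = 100 * below / len(class_scores)

# Standard score: how many standard deviations the student sits above the class mean.
z = (student_raw - mean(class_scores)) / pstdev(class_scores)

print(f"raw = {student_raw}, percentile rank = {percentile_rank:.0f}, z = {z:.2f}")
```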
Assessment
In education, the term assessment refers to the wide variety of methods or tools that educators
use to evaluate, measure, and document the academic readiness, learning progress, skill
acquisition, or educational needs of students.
• Assessment involves the use of empirical data on student learning to refine programs and
improve student learning.
• Assessment is the process of gathering and discussing information from multiple and
diverse sources in order to develop a deep understanding of what students know,
understand, and can do with their knowledge as a result of their educational experiences;
the process culminates when assessment results are used to improve subsequent
learning. Assessment is the systematic basis for making inferences about the learning and
development of students. It is the process of defining, selecting, designing, collecting,
analyzing, interpreting, and using information to increase students’ learning and
development. Assessment is the systematic collection, review, and use of information about
educational programs undertaken for the purpose of improving student learning and
development.
Characteristics
• Learner-Centered
• The primary attention of teachers is focused on observing and improving learning.
• Teacher-Directed
• Individual teachers decide what to assess, how to assess, and how to respond to
the information gained through the assessment
• Teachers do not need to share results with anyone outside of the class.
• Mutually Beneficial
• Students are active participants.
• Students are motivated by the increased interest of faculty in their success as
learners.
• Teachers improve their teaching skills and gain new insights.
• Formative
• Assessments are almost never "graded".
• Assessments are almost always anonymous in the classroom and often
anonymous online.
• Assessments do not provide evidence for evaluating or grading students.
• Context-Specific
• Assessments respond to the particular needs and characteristics of the teachers,
students, and disciplines to which they are applied.
• Customize to meet the needs of your students and course.
• Ongoing
• Classroom assessment is a continuous process.
• Part of the process is creating and maintaining a classroom "feedback loop"
• Each classroom assessment event is of short duration.
• Rooted in Good Teaching Practice
• Classroom assessment builds on good practices by making feedback on students'
learning more systematic, more flexible, and more effective.
Test
• A test or examination (informally, exam or evaluation) is an assessment intended to
measure a test-taker's knowledge, skill, aptitude, physical fitness, or classification in
many other topics (e.g., beliefs).[1]
A test may be administered verbally, on paper, on
a computer, or in a confined area that requires a test taker to physically perform a set of
skills. Tests vary in style, rigor and requirements. For example, in a closed book test, a
test taker is often required to rely upon memory to respond to specific items whereas in
an open book test, a test taker may use one or more supplementary tools such as a
reference book or calculator when responding to an item. A test may be administered
formally or informally. An example of an informal test would be a reading test
administered by a parent to a child. An example of a formal test would be a final
examination administered by a teacher in a classroom or an I.Q. test administered by a
psychologist in a clinic. Formal testing often results in a grade or a test score.[2]
A test
score may be interpreted with regards to a norm or criterion, or occasionally both. The
norm may be established independently, or by statistical analysis of a large number of
participants. An exam is meant to test a child's knowledge of a subject or their willingness to
give time to studying that subject.
• A standardized test is any test that is administered and scored in a consistent manner to
ensure legal defensibility.[3]
Standardized tests are often used in education, professional
certification, psychology (e.g., MMPI), the military, and many other fields.
• A non-standardized test is usually flexible in scope and format, variable in difficulty and
significance. Since these tests are usually developed by individual instructors, the format
and difficulty of these tests may not be widely adopted or used by other instructors or
institutions. A non-standardized test may be used to determine the proficiency level of
students, to motivate students to study, and to provide feedback to students. In some
instances, a teacher may develop non-standardized tests that resemble standardized tests
in scope, format, and difficulty for the purpose of preparing their students for an
upcoming standardized test.[4]
Finally, the frequency and setting in which non-standardized tests are administered are highly
variable and are usually constrained by the duration of the class period. A class instructor may,
for example, administer a test on a
weekly basis or just twice a semester. Depending on the policy of the instructor or
institution, the duration of each test itself may last for only five minutes to an entire class
period.
• In contrast to non-standardized tests, standardized tests are widely used, fixed in terms
of scope, difficulty and format, and are usually significant in consequences. Standardized
tests are usually held on fixed dates as determined by the test developer, educational
institution, or governing body, which may or may not be administered by the instructor,
held within the classroom, or constrained by the classroom period. Although there is little
variability between different copies of the same type of standardized test
(e.g., SAT or GRE), there is variability between different types of standardized tests.
• Any test with important consequences for the individual test taker is referred to as a high-
stakes test.
• A test may be developed and administered by an instructor, a clinician, a governing body,
or a test provider. In some instances, the developer of the test may not be directly
responsible for its administration. For example, Educational Testing Service (ETS), a
nonprofit educational testing and assessment organization, develops standardized tests
such as the SAT but may not directly be involved in the administration or proctoring of
these tests. As with the development and administration of educational tests, the format
and level of difficulty of the tests themselves are highly variable and there is no general
consensus or invariable standard for test formats and difficulty. Often, the format and
difficulty of the test is dependent upon the educational philosophy of the instructor,
subject matter, class size, policy of the educational institution, and requirements of
accreditation or governing bodies. In general, tests developed and administered by
individual instructors are non-standardized whereas tests developed by testing
organizations are standardized.
Characteristics of Test
Reliable
Reliability refers to the accuracy of the obtained test score or to how close the obtained scores
for individuals are to what would be their “true” score, if we could ever know their true score.
Thus, reliability is the lack of measurement error, the less measurement error the better. The
reliability coefficient, similar to a correlation coefficient, is used as the indicator of the reliability
of a test. The reliability coefficient can range from 0 to 1, and the closer to 1 the better.
Generally, experts tend to look for a reliability coefficient in excess of .70. However, many tests
used in public safety screening are what is referred to as multi-dimensional. Interpreting the
meaning of a reliability coefficient for a knowledge test based on a variety of sources requires a
great deal of experience and even experts are often fooled or offer incorrect interpretations.
There are a number of types of reliability, but the type usually reported is internal consistency or
coefficient alpha. All things being equal, one should look for an assessment with strong evidence
of reliability, where information is offered on the degree of confidence you can have in the
reported test score.
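Because the passage singles out internal consistency (coefficient alpha) as the most commonly reported type of reliability, the following sketch computes Cronbach's alpha for a small invented item-score matrix and applies the .70 rule of thumb mentioned above. The data, and therefore the resulting coefficient, are illustrative only.

```python
# Sketch: Cronbach's alpha (coefficient alpha) for an item-score matrix.
# Rows are test takers, columns are items; the data are invented.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = scores.shape[1]                        # number of items
    item_vars = scores.var(axis=0, ddof=1)     # variance of each item across test takers
    total_var = scores.sum(axis=1).var(ddof=1) # variance of total scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

if __name__ == "__main__":
    data = np.array([                          # 6 examinees x 4 items (invented)
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ], dtype=float)
    alpha = cronbach_alpha(data)
    verdict = "acceptable" if alpha >= 0.70 else "below the .70 rule of thumb"
    print(f"alpha = {alpha:.2f} ({verdict})")
```

With this toy data the coefficient falls short of .70, which is exactly the kind of result the text warns needs careful interpretation, particularly for multi-dimensional tests.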
Valid
Validity will be the topic of our third primer in the series. In the selection context, the term
“validity” refers to whether there is an expectation that scores on the test have a demonstrable
relationship to job performance, or other important job-related criteria. Validity may also be used
interchangeably with related terms such as “job related” or “business necessity.” For now, we
will state that there are a number of ways of evaluating validity including:
▪ Content
▪ Criterion-related
▪ Construct
▪ Transfer or transportability
▪ Validity generalization
A good test will offer extensive documentation of the validity of the test.
Practical
A good test should be practical. What defines or constitutes a practical test? Well, this would be
a balancing of a number of factors including:
▪ Length – a shorter test is generally preferred
▪ Time – a test that takes less time is generally preferred
▪ Low cost – speaks for itself
▪ Easy to administer
▪ Easy to score
▪ Differentiates between candidates – a test is of little value if all the applicants obtain the
same score
▪ Adequate test manual – provides a test manual offering adequate information and
documentation
▪ Professionalism – is produced by test developers possessing high levels of expertise
The issue of the practicality of a test is a subjective judgment, which will be impacted by the
constraints facing the public-sector jurisdiction. A test that may be practical for a large city with
10,000 applicants and a large budget, may not be practical for a small town with 10 applicants
and a miniscule testing budget.
Socially Sensitive
A consideration of the social implications and effects of the use of a test is critical in public
sector, especially for high stakes jobs such as public safety occupations. The public safety
assessment professional must be considerate of and responsive to multiple groups of stakeholders.
In addition, in evaluating a test, it is critical that attention be given to:
▪ Avoiding adverse Impact – Recent events have highlighted the importance of balance in
the demographics of safety force personnel. Adverse impact refers to differences in the
passing rates on exams between males and females, or minorities and majority group
members. Tests should be designed with an eye toward the minimization of adverse
impact.
▪ Universal Testing – The concept behind universal testing is that your exams should be able
to be taken by the most diverse set of applicants possible, including those with disabilities
and by those who speak other languages. Having a truly universal test is a difficult, if not
impossible, standard to meet. However, organizations should strive to ensure that testing
locations and environments are compatible with the needs of as wide a variety of
individuals as possible. In addition, organizations should have in place committees and
procedures for dealing with requests for accommodations.
Candidate Friendly
One of the biggest changes in testing over the past twenty years has been the increased attention
paid to the candidate experience. Thus, your tests should be designed to look professional and be
easy to administer. Furthermore, the candidate should see a clear connection between the exams
and the job. As the candidate completes the selection battery, you want the reaction to be “That
was a fair test, I had an opportunity to prove why I deserve the job, and this is the type of
organization where I would like to work.”
Measurement
Measurement is the assignment of a number to a characteristic of an object or event, which can
be compared with other objects or events.
The scope and application of a measurement is dependent on the context and discipline. In
the natural sciences and engineering, measurements do not apply to nominal properties of objects
or events, which is consistent with the guidelines of the International vocabulary of
metrology published by the International Bureau of Weights and Measures.
However, in other fields such as statistics as well as the social and behavioral sciences,
measurements can have multiple levels, which would include nominal, ordinal, interval, and ratio
scales.
Measurement is a cornerstone of trade, science, technology, and quantitative research in many
disciplines. Historically, many measurement systems existed for the varied fields of human
existence to facilitate comparisons in these fields. Often these were achieved by local agreements
between trading partners or collaborators. Since the 18th century, developments progressed
towards unifying, widely accepted standards that resulted in the modern International System of
Units (SI). This system reduces all physical measurements to a mathematical combination of
seven base units. The science of measurement is pursued in the field of metrology.
Characteristic # 1. In educational measurement there is no absolute zero point:
In educational measurement there is no absolute zero point. It is relative to some arbitrary
standard. For example, a student has secured '0' in a test of mathematics. It does not mean that
he has zero knowledge of mathematics, because he may secure 30 in another test which is
easier than the first one. As the zero point is not fixed, we cannot say that a student with a
score of '60' has double the knowledge of a student with a score of '30'.
Characteristic # 2. The units are not definite in educational measurement:
In educational measurement the units are not definite, so we may not obtain the same value for
every person, because tests vary in their content and difficulty level. Therefore, one individual
may perform differently on different tests, and different individuals may perform differently on
one test.
Characteristic # 3. It conveys a sense of infinity:
It means we cannot measure the whole of an attribute of an individual. Generally, the scores
obtained from a measurement are observed scores, which contain measurement error, so the
true score remains unknown.
Characteristic # 4. It is a process of assigning symbols:
Measurement is a process of assigning symbols to observations in some meaningful and
consistent manner. In measurement we generally compare with a certain standard unit or criterion
which has universal acceptability.
Characteristic # 5. It cannot be measured directly:
In the case of educational measurement we cannot measure an attribute directly. It is observed
through behaviour. For example, the reading ability of an individual can only be measured when
he is asked to read a written material.
Characteristic # 6. It is a means to an end but not an end itself:
The objective of educational measurement is not just to measure a particular attribute. Rather it is
done to evaluate to what extent different objectives have been achieved.
Principles of assessment
Reliability
If a particular assessment were totally reliable, assessors acting independently using the same
criteria and mark scheme would come to exactly the same judgment about a given piece of work.
In the interests of quality assurance, standards and fairness, whilst recognising that complete
objectivity is impossible to achieve, when it comes to summative assessment it is a goal worth
aiming for. To this end, what has been described as the 'connoisseur' approach to assessment
(like a wine-taster or tea-blender of many years' experience, not able to describe exactly what
they are looking for but 'knowing it when they find it') is no longer acceptable. Explicitness in
terms of learning outcomes and assessment criteria is vitally important in attempting to achieve
reliability. They should be explicit to the students when the task is set, and where there are
multiple markers they should be discussed, and preferably used on some sample cases prior to
being used 'for real'.
Validity
Just as important as reliability is the question of validity. Does the assessed task actually assess
what you want it to? Just because an exam question includes the instruction 'analyse and
evaluate' does not actually mean that the skills of analysis and evaluation are going to be
assessed. They may be, if the student is presented with a case study scenario and data they have
never seen before. But if they can answer perfectly adequately by regurgitating the notes they
took from the lecture you gave on the subject then little more may be being assessed than the
ability to memorise. There is an argument that all too often in British higher education we assess
the things which are easy to assess, which tend to be basic factual knowledge and comprehension
rather than the higher order objectives of analysis, synthesis and evaluation.
Relevance and transferability
There is much evidence that human beings do not find it easy to transfer skills from one context
to another, and there is in fact a debate as to whether transferability is in itself a separate skill
which needs to be taught and learnt. Whatever the outcome of that, the transfer of skills is
certainly more likely to be successful when the contexts in which they are developed and used
are similar. It is also true to say that academic assessment has traditionally been based on a fairly
narrow range of tasks with arguably an emphasis on knowing rather than doing; it has therefore
tended to develop a fairly narrow range of skills. For these two reasons, when devising an
assessment task it is important that it both addresses the skills you want the student to develop
and that as much as possible it puts them into a recognisable context with a sense of 'real
purpose' behind why the task would be undertaken and a sense of a 'real audience', beyond the
tutor, for whom the task would be done.
Criterion v Norm referenced assessment
In criterion-referenced assessment particular abilities, skills or behaviours are each specified as a
criterion which must be reached. The driving test is the classic example of a criterion-referenced
test. The examiner has a list of criteria each of which must be satisfactorily demonstrated in
order to pass - completing a three-point turn without hitting either kerb for example. The
important thing is that failure in one criterion cannot be compensated for by above average
performance in others; neither can you fail despite meeting every criterion simply because
everybody else that day surpassed the criteria and was better than you.
Norm-referenced assessment makes judgments on how well the individual did in relation to
others who took the test. Often used in conjunction with this is the curve of 'normal distribution'
which assumes that a few will do exceptionally well and a few will do badly and the majority
will peak in the middle as average. Despite the fact that a cohort may not fit this assumption for
any number of reasons (it may have been a poor intake, or a very good intake, they have been
taught well, or badly, or in introductory courses in particular you may have half who have done it
all before and half who are just starting the subject giving a bimodal distribution) there are even
some assessment systems which require results to be manipulated to fit.
The logic of a model of course design built on learning outcomes is that the assessment should
be criterion-referenced at least to the extent that sufficiently meeting each outcome becomes a
'threshold' minimum to passing the course. If grades and marks have to be generated, a more
complex system than pass/fail can be devised by defining the criteria for each grade either
holistically grade by grade, or grade by grade for each criterion (see below).
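The contrast drawn in this section can be expressed in a brief sketch: under criterion-referencing every criterion must be met and failure in one cannot be compensated for, while under norm-referencing the judgment depends on the candidate's standing in the cohort. The criteria, thresholds and cohort scores below are invented for illustration.

```python
# Sketch contrasting criterion-referenced and norm-referenced decisions.
# The criteria, cohort scores and cut points are invented for illustration.

def criterion_referenced_pass(results: dict) -> bool:
    """Pass only if every criterion is met; failure in one cannot be
    compensated for by performance in others (the driving-test model)."""
    return all(results.values())

def norm_referenced_band(score: float, cohort: list) -> str:
    """Band depends on standing relative to the cohort, not on a fixed standard."""
    rank = sum(1 for s in cohort if s < score) / len(cohort)   # proportion of cohort beaten
    return "top third" if rank >= 2 / 3 else "middle" if rank >= 1 / 3 else "bottom third"

if __name__ == "__main__":
    driving = {"three-point turn": True, "parallel parking": True, "emergency stop": False}
    print("criterion-referenced:", "pass" if criterion_referenced_pass(driving) else "fail")

    cohort = [48.0, 55.0, 61.0, 64.0, 70.0, 77.0, 83.0, 90.0]
    print("norm-referenced:", norm_referenced_band(70.0, cohort))
```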
Writing and using assessment criteria
Assessment criteria describe how well a student has to be able to achieve the learning outcome,
either in order to pass (in a simple pass/fail system) or in order to be awarded a particular grade;
essentially they describe standards. Most importantly they need to be more than a set of
headings. Use of theory, for example, is not on its own a criterion. Criteria about theory must
describe what aspects of the use of theory are being looked for. You may value any one of the
following: the students' ability to make an appropriate choice of theory to address a particular
problem, or to give an accurate summary of that theory as it applies to the problem, or to apply it
correctly, or imaginatively, or with originality, or to critique the theory, or to compare and
contrast it with other theories. And remember, as soon as you have more than one assessment
criterion you will also have to make decisions about their relative importance (or weighting).
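The weighting decision mentioned at the end of the paragraph above can be made explicit, as in the sketch below, which combines marks on several criteria using relative weights. The criteria and weights are arbitrary examples rather than a recommended scheme.

```python
# Sketch: combining several assessment criteria using relative weights.
# The criteria and weights are arbitrary examples, not a recommended scheme.

weights = {            # relative importance of each criterion (sums to 1.0)
    "choice of theory": 0.2,
    "accuracy of summary": 0.3,
    "application to the problem": 0.3,
    "critique / originality": 0.2,
}

def weighted_mark(marks: dict) -> float:
    """marks: each criterion scored out of 100; returns the weighted total."""
    return sum(weights[c] * marks[c] for c in weights)

if __name__ == "__main__":
    student = {
        "choice of theory": 70,
        "accuracy of summary": 65,
        "application to the problem": 80,
        "critique / originality": 55,
    }
    print(f"weighted mark = {weighted_mark(student):.1f} / 100")  # 68.5 with these figures
```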
Graded criteria are criteria related to a particular band of marks or honours classification or grade
framework such as Pass, Merit, Distinction. If you write these, be very careful about the
statement at the 'pass' level. Preferably start writing at this level and work upwards. The danger
in starting from, e.g., first class honours, is that as you move downwards, the criteria become more
and more negative. When drafted, ask yourself whether you would be happy for someone
meeting the standard expressed for pass, or third class, to receive an award from your institution.
Where possible, discuss draft assessment activities, and particularly criteria, with colleagues
before issuing them.
Once decided, the criteria and weightings should be given to the students at the time the task is
set, and preferably some time should be spent discussing and clarifying what they mean. Apart
from the argument of fairness, this hopefully then gives the student a clear idea of the standard
they should aim for and increases the chances they will produce a better piece of work (and
hence have learnt what you wanted them to). And feedback to the student on the work produced
should be explicitly in terms of the extent to which each criterion has been met.
Instructional Assessment Process
Instructional Assessment Process involves collection and analysis of data from six sources that,
when combined, present a comprehensive view of the current state of the school as it compares
to the underlying beliefs and principles that make up the Pedagogy of Confidence and lead to
school transformation. The six components are:
• School Background Pre-Interview Questionnaire
• Principal Interview
• Achievement Data
• Teacher Survey
• Student Survey
• Classroom Visits
School Background Pre-Interview Questionnaire
NUA gathers background information using a pre-interview questionnaire submitted by the
principal, the school’s School Improvement Plan (or similar document), and interviewing the
principal in person. The pre-interview questionnaire collects basic demographic data about the
school, the students and the faculty, as well as a brief history of current initiatives, school
organization and scheduling practices, special services, community partnerships and the like.
Principal Interview
NUA meets with the principal to review the questionnaire, to obtain more information about the
school and to learn the principal’s perspectives on the instructional program, students and
staff. Care is taken to ensure that the principal speaks first about the strengths of the school,
unique situations that exist within the school, recent changes that may be affecting the school, his
or her goals for the school and what he or she believes is needed to achieve those goals.
Achievement Data
NUA gathers and analyzes existing achievement data to uncover patterns over time and to correlate
with what constituents say about the school, how achievement data compares to state and district
achievement, and any other relevant comparisons.
Teacher Survey
An NUA representative conducts the teacher survey during a schoolwide faculty meeting to
ensure consistency of administration and to explain to the faculty other data collection activities
that may be taking place at the school. The survey probes teachers’ perspectives on the school’s
climate and instructional program and seeks suggestions about how they, as a faculty, could best
serve their students, especially underachievers. Surveys are anonymous and make use of
multiple choice and open-ended questions that allow teachers leeway to express their inside
perspective on the instructional life of the school; their assessments of and attitudes toward
students, families and administration; recent and needed professional development initiatives;
and their preferred pedagogical approaches.
Student Survey
The student survey contains 20 items and is administered to all students following a prescribed
method of administration. Its purpose is to assess the school’s instructional program from the
students’ perspectives. The items invite response in five areas:
• Perspectives on myself as a learner
• School climate
• My teachers
• Classroom activities
• My preferred learning activities
Students are asked to strongly agree, agree, disagree, or strongly disagree with some statements
and to select their choices among others. NUA provides a summary of student survey responses for
ease of analysis.
Classroom Visits
A team of specially trained NUA representatives conducts classroom visitations that follow a
schedule intended to cover a broad spectrum of classes. Visitors note the activities in which
students are engaged, study the interactions between teacher and students, and attend to other
visible characteristics of the instructional program, including the physical environment of the
rooms. Approximately half the classes in a school are visited to help form a composite picture of
the current state of instruction. Teachers voluntarily participate in the visits and all data is
recorded without identifying individual teachers. Visitors concentrate on elements of effective
instruction that NUA knows to have positive effects on all students’ learning and that NUA finds
particularly important in raising the performance of underachieving students. A sample of these
elements includes:
• The learning engages students. Students comprehend and retain what they are taught most
effectively when they are engaged in classroom activities. Engagement is marked by willing
participation, expressions of interest and displays of enthusiasm, and results when students
find classroom activities and assignments highly meaningful and interesting. Instruction that
engages students has a positive effect on their achievement and increases the likelihood they
will develop into lifelong learners.
• Learning activities guide students to relate lesson content to their lives. Students benefit from
deliberately connecting what they are learning to what they know from their experience as
individuals and as members of the cultural groups with which they most closely identify.
Making such connections between the curriculum and what is personally relevant and
meaningful has a positive influence on students’ motivation to learn, on their confidence as
learners, and on their comprehension and retention of the material. Although the teacher can
suggest such connections, students benefit most by generating and expressing their own
connections.
• The learning includes students interacting with each other as learners. Working
collaboratively in pairs or small groups enables students to pool their knowledge as they
develop their understanding of curriculum material. Interacting productively with peers also
helps students stay attentive in class. In addition, collaborative work can increase students’
motivation to learn because of the support they get from their peers and the enjoyment that
results from peer interaction. Pair or small-group interactions may be used for solving
problems, discussing possible answers to a teacher’s question, generating new questions on a
topic being discussed before sharing ideas with the whole class, representing information that
has been learned in a creative way, and other such purposes.
• The learning promotes high-level thinking about lesson content. High-level thinking about
curriculum content helps students generate deeper and broader understandings while
developing their thinking capacities. Students’ learning is enhanced when they have frequent
opportunities to respond at length to thought-provoking questions, to engage in high-level
conversations with peers, and to ask their own questions about what they are learning to
clarify, refine and extend meanings. High-level thinking includes such mental processes as
hypothesizing, inferring, generalizing, analyzing, synthesizing and evaluating. Opportunities
to engage in such thinking are ideally part of daily instruction as well as integral to long-
term, complex projects.
Types of assessment procedure
• 1. Diagnostic Assessment (as Pre-Assessment)
• One way to think about it: Assesses a student’s strengths, weaknesses, knowledge, and
skills prior to instruction.
• Another way to think about it: A baseline to work from
• 2. Formative Assessment
• One way to think about it: Assesses a student’s performance during instruction, and
usually occurs regularly throughout the instruction process.
• Another way to think about it: Like a doctor’s “check-up” to provide data to revise
instruction
• 3. Summative Assessment
• One way to think about it: Measures a student’s achievement at the end of instruction.
• Another way to think about it: It’s macabre, but if formative assessment is the check-up,
you might think of summative assessment as the autopsy. What happened? Now that it’s
all over, what went right and what went wrong?
• 4. Norm-Referenced Assessment
• One way to think about it: Compares a student’s performance against other students (a
national group or other “norm”)
• Another way to think about it: Group or “Demographic” assessment
• 5. Criterion-Referenced Assessment
• One way to think about it: Measures a student’s performance against a goal, specific
objective, or standard.
• Another way to think about it: a bar to measure all students against
• 6. Interim/Benchmark Assessment
• One way to think about it: Evaluates student performance at periodic intervals, frequently
at the end of a grading period. Can predict student performance on end-of-the-year
summative assessments.
• Another way to think about it: Bar graph growth through a year
• Explanation
• Formative assessments are informal and formal tests given by teachers during the learning
process. These assessments modify the activities done in the classroom so that
there is more student achievement. They identify strengths and weaknesses and target areas
that need work.
• Summative assessment evaluates students' learning at the end of an instructional unit
such as a chapter or specified topic. Final papers, midterms and final exams allow
teachers to determine whether students have correctly understood the material.
• Norm-referenced assessment compares a student's performance against a national or other
"norm" group.
• Performance-based assessment requires students to solve real-world problems or produce
something with real-world application. These assessments allow the educator to
distinguish how well students think critically and analytically. A restricted-response
task is more narrowly defined than an extended-response task; an example would be a
multiple-choice question, as opposed to an extended response, which normally involves
writing a report.
• Authentic assessment is the measurement of accomplishments that are worthwhile
compared to multiple choice standardized tests.
• Selective-response assessment is also referred to as objective assessment, including
multiple choice, matching, and true and false questions. It is a very effective and efficient
method for measuring students' knowledge and a very common form of assessing
students in the classroom.
• Supply response: students must supply an answer to a question prompt.
• Criterion-referenced tests are designed to measure student performance against a fixed set
of predetermined criteria or learning standards.
Instructional decision
• Instructional decisions are made to identify students' instructional needs. This is a
general education initiative, and focuses on instruction by using data about students'
responses to past instruction to guide future educational decisions. Decisions are
proactive approaches of providing early assistance to students with instructional needs
and matching the amount of resources to the nature of the students' needs. This involves:
• 1. Screening all students to ensure early identification of students needing extra assistance;
2. Seamless integration of general and special education services; and 3. A focus on
research-based practices that match students' needs.
• Teachers are constantly collecting informal and formal information about what and how
their students are learning. They check student tests and assignments, listen to small-
group activities, and observe students engaged in structured and unstructured activities.
They use this information for a variety of purposes, ranging from communicating with
parents to meeting standards and benchmarks. However, when teachers systematically
collect the right kinds of information and use it effectively, they can help their students
grow as thinkers and learners.
• Such information may indicate: 1. The need for a complete review of the material; 2. Class
discussion may reveal misunderstanding that must be corrected on the spot; and 3. Interest
in a topic may suggest that more time should be spent on it than originally planned.
Selection assessment
• A selection assessment is the most frequently used type of assessment and part of a
selection procedure. The selection assessment often takes place towards the end of the
procedure, to test the candidates' suitability for the position in question.
• The goal of a selection assessment
• A selection assessment is an attempt to get a better understanding of how the candidate
would perform in the position applied for. The assessment is used based on the idea that
suitability does not really show when using questionnaires, letters and interviews. This is
because candidates often will say what they think the employer wants to hear, so only
practical simulations can clearly demonstrate how a person responds in certain situations.
• Components
• The components of a selection assessment depend on the position being applied for. For
an executive position, the focus will be on testing the candidates' leadership qualities, for
other positions the emphasis can be, for example, on communication skills.
• Frequently used components of an assessment include the mailbox exercise, fact
finding and role-playing. Intelligence tests and interviews are often part of a selection
assessment as well. To prepare for an assessment, you can practice different tests.
• Assessment report
• Following the assessment, a report will be drafted describing the conclusions on each
candidate. As a candidate, you will always be the first to see this assessment report and
you have the right not to agree to the report being sent to the employer. However, if you
do not agree to this, your chances of getting the job will be practically nil.
• Assessment companies
• Selection assessments are often performed by independent companies that conduct
assessments on behalf of different companies. In that case, the assessment will take place
in the offices of the assessment company. Some companies, especially larger ones,
organise their own assessments and in that case the assessment will take place in the
company itself.
• In the case of internal reorganisations, career assessments are often used.
Placement and classification decisions
Selection is a personnel decision whereby an organization decides whether to hire
individuals using each person’s score on a single assessment, such as a test or interview, or a
single predicted performance score based on a composite of multiple assessments. Using this
single score to assign each individual to one of multiple jobs or assignments is referred to as
placement. An example of placement is when colleges assign new students to a particular level
of math class based on a math test score. Classification refers to the situation in which each of a
number of individuals is assigned to one of multiple jobs based on their scores on multiple
assessments. Classification refers to a complex set of personnel decisions and requires more
explanation.
A Conceptual Example
The idea of classification can be illustrated by an example. An organization has 50 openings in
four entry-level jobs: Word processor has 10 openings, administrative assistant has 12 openings,
accounting clerk has 8 openings, and receptionist has 20 openings. Sixty people apply for a job at
this organization and each completes three employment tests: word processing, basic accounting,
and interpersonal skills.
Generally, the goal of classification is to use each applicant’s predicted performance score for
each job to fill all the openings and maximize the overall predicted performance across all four
jobs. Linear computer programming approaches have been developed that make such
assignments within the constraints of a given classification situation such as the number of jobs,
openings or quotas for each job, and applicants. Note that in the example, 50 applicants would be assigned to one of the four jobs and the remaining 10 applicants would not be hired.
Using past scores on the three tests and measures of performance, formulas can be developed to
estimate predicted performance for each applicant in each job. The tests differ in how well they
predict performance in each job. For example, the basic accounting test is fairly predictive of
performance in the accounting clerk job, but is less predictive of performance in the receptionist
job. Additionally, the word processing test is very predictive of performance in the word
processor job but is less predictive of performance in the receptionist job. This means that the
equations for calculating predicted performance for each job give different weights to each test.
For example, the equation for accounting clerk gives its largest weight to basic accounting test
scores, whereas the receptionist equation gives its largest weight to interpersonal skill test scores
and little weight to accounting test scores. Additionally, scores vary across applicants within
each test and across tests within each individual. This means that each individual will have a
different predicted performance score for each job.
One way to assign applicants to these jobs would be to calculate a single predicted performance
score for each applicant, select all applicants who have scores above some cutoff, and randomly
assign applicants to jobs within the constraints of the quotas. However, random assignment
would not take advantage of the possibility that each selected applicant will not perform equally
well on all available jobs. Classification takes advantage of this possibility. Classification
efficiency can be viewed as the difference in overall predicted performance between this
univariate (one score per applicant) strategy and the multivariate (one score per applicant per
job) classification approach that uses a different equation to predict performance for each job.
A number of parameters influence the degree of classification efficiency. An important one is the
extent to which predicted scores for each job are related to each other. The smaller the
relationships among predicted scores across jobs, the greater the potential classification
efficiency. That is, classification efficiency increases to the extent that multiple assessments
capture differences in the individual characteristics that determine performance in each job.
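The example above gives no actual scores or weights, so the following is only a sketch built on invented numbers: it forms a predicted-performance matrix from hypothetical regression weights and randomly generated test scores, then uses SciPy's linear_sum_assignment routine (one standard optimal-assignment method, in the spirit of the linear programming approaches mentioned above) to fill the quotas.

```python
# A sketch of the four-job classification example, with invented weights and
# randomly generated applicant test scores (the example above gives no real data).
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

jobs = ["word processor", "administrative assistant", "accounting clerk", "receptionist"]
openings = [10, 12, 8, 20]            # quotas per job (50 openings in total)
n_applicants = 60

# Scores of each applicant on the three tests:
# word processing, basic accounting, interpersonal skills.
scores = rng.normal(50, 10, size=(n_applicants, 3))

# Hypothetical regression weights (rows = jobs, columns = tests); e.g. the
# accounting clerk equation weights the accounting test most heavily.
weights = np.array([
    [0.70, 0.10, 0.20],   # word processor
    [0.40, 0.30, 0.30],   # administrative assistant
    [0.15, 0.65, 0.20],   # accounting clerk
    [0.10, 0.15, 0.75],   # receptionist
])

# Predicted performance of every applicant in every job (60 x 4 matrix).
predicted = scores @ weights.T

# Expand each job into one column per opening so the quotas are respected,
# then choose the assignment that maximizes total predicted performance.
slot_job = np.repeat(np.arange(len(jobs)), openings)   # 50 slots -> job index
slot_scores = predicted[:, slot_job]                   # 60 applicants x 50 slots
applicants, slots = linear_sum_assignment(slot_scores, maximize=True)

for j, job in enumerate(jobs):
    hired = applicants[slot_job[slots] == j]
    print(f"{job}: {len(hired)} hired, mean predicted performance "
          f"{predicted[hired, j].mean():.1f}")
print(f"not hired: {n_applicants - len(applicants)} applicants")
```

Classification efficiency, as described above, could then be estimated by re-running the assignment with a single composite score per applicant and comparing the total predicted performance of the two strategies.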
Policy decisions
Policy decisions are defined in management theory as those decisions that define the basic
principles of the organization and determine how it will develop and function in the future.
Policies set the limits within which operational decisions are made. Examples include:
• Vision, Mission, Aim
• Budget and Finance Practices
• Allocation of Resources
• Organizational Structure
Policy decisions limit the actions an organization and its members can take without changing the
policy.
In sociocracy, policy decisions are made by consent. Operational decisions are made within the
limits set by policy decisions and may be made autocratically by the person in charge or by other
means determined by the people whom the decisions affect.
Examples of Policy Statements
We set policies in our everyday lives without realizing it or writing them down. Examples
include:
• Deciding not to drink coffee or consume animal products
• Pledging to complete tax forms before their due date
• Sending your children to public schools by choice
• Deciding not to have children to devote time to political causes
In non-profit organizations the policies might include:
• Following the IRS regulations that set requirements for 501c3 status to receive tax-deductible
contributions
• Limiting membership to professionals with a demonstrated expertise
• Serving meals to the homeless
• Using contributions only for administrative costs and not staff salaries
In business they might include:
• Annual and departmental budgets
• Employee compensation schedules
• Union agreements
• Future donations of money and employee time to charitable causes
• Production of certain products and not others
• Limiting sales and marketing to retail or wholesale customers
These are all decisions that define the scope of day-to-day decisions about how we will conduct
our personal or work lives, our operations.
Counseling and guidance decisions
Decision making has always been a fundamental human activity.
At some stage within the career guidance planning process, decisions are made. The decision in
some cases might be to make far reaching changes, or perhaps the decision might be not to
change anything. In some cases, little change might ensue, but a decision has still been made,
even if the result, having considered the consequences, is not to change.
As a guide it is important to take into account that individual participants vary a great deal in
terms of how they make decisions, what factors are important to them, how ready they are to
make them and how far participants are prepared to live with uncertain outcomes.
The traditional way within guidance to handle decision making is to see it as a rational, almost
linear process. This is illustrated by the Janis and Mann model exemplified in the practical
exercise example mentioned below involving balance sheets. The aim is to encourage a rational
approach to planning for the future. Typically this involves an evaluation of available options
with a look at the pros and cons of each, taking account of the participant’s personal
circumstances.
In practice of course the process of making a decision is influenced by all sorts of things. In
everyday terms the decision making may in fact be driven by the irrational, the “quick fix”
solution and in some cases, prejudicial ideas, perhaps based upon ingrained or outdated ideas.
Gerard Egan describes this as the “shadow side” of decision making. De Bono’s thinking hats
exercise (see below) attempts to factor in some of the emotional and other factors linked to
decision making.
As individuals we can vary in the style of decision making we use. For some decisions we might
take a “logical” approach based upon the linear thinking mentioned above. For some decisions
we might make a “no thought” decision, either because the matter is so routine it doesn’t require
any thought, or on some occasions just to make a quick fix so we don't have to think about it any
more. Sometimes participants in guidance interviews may talk about their realisation that they
should have looked into a decision further before rushing into one course of action. Some
individuals employ a hesitant style of decision making, where decisions are delayed as long as
possible, whereas others may make a choice based upon an emotional response, what feels right
subjectively. Finally some participants might make decisions that can be classified as compliant;
that is, based upon the perceived expectations of what other people want. A key role in guidance is to identify how a participant has made previous professional development decisions, and
whether the approach seems to have worked for them. Might there be other ways of deciding that
lead to better decisions?
Using Decision making exercises in a Guidance Setting
There is a broad range of tools to aid the decision making process within a professional
development discussion. Here are two introductory examples. Further examples are available via
the references and web sites below.
Balance sheet
In its simplest form this consists of two columns representing two choices. The advantages and
disadvantages of each choice can simply be listed. Sometimes the very act of writing down pros
and cons can bring clarity.
Sometimes subdividing the headings into Advantages for me, Advantages for others, Disadvantages for me and Disadvantages for others can yield a richer analysis. Janis and Mann suggest this process.
A slightly more sophisticated use of balance sheets might involve the participant completing the
sheet as above initially, then the adviser producing a list of other suggested factors that the
individual may not have considered at first. These can either be included, or ignored by the
participant.
An example of a simple balance sheet
Six Thinking Hats
This tool was created by Edward de Bono in his book "Six Thinking Hats".
How to Use the Tool:
To use Six Thinking Hats to improve the quality of the participant's decision-making, look at the decision "wearing" each of the thinking hats in turn.
Each "Thinking Hat" is a different style of thinking. These are explained below:
White Hat:
With this thinking hat, the participant is encouraged to focus on the data available. They look at the information they have about themselves and see what they can learn from it, looking for gaps in their knowledge and either trying to fill them or taking account of them. This is where the participant is encouraged to analyse past experience, work roles and so on, and to try to learn from them.
Red Hat:
Wearing the red hat, the participant looks at the decision using intuition, gut reaction, and emotion. The idea is also to encourage the participant to think about how other people will react emotionally to the decision being made, and to try to understand the intuitive responses of people who may not fully know their reasoning.
Black Hat:
When using black hat thinking, the participant looks at things pessimistically, cautiously and defensively, trying to see why ideas and approaches might not work. This is important because it highlights the weak points in a plan or course of action. It allows the participant to eliminate them, alter their approach, or prepare contingency plans to counter problems that might arise. Black Hat thinking can be one of the real benefits of using this technique within professional development planning, as participants can get so used to thinking positively that they cannot see problems in advance, leaving them under-prepared for difficulties.
Yellow Hat:
The yellow hat helps you to think positively. It is the optimistic viewpoint that helps you to see
all the benefits of the decision and the value in it, and spot the opportunities that arise from it.
Yellow Hat thinking helps you to keep going when everything looks gloomy and difficult.
Green Hat:
The Green Hat stands for creativity. This is where you can develop creative solutions to a
problem. It is a freewheeling way of thinking, in which there is little criticism of ideas.
Blue Hat:
The Blue Hat stands for process control. This is the hat worn by people chairing meetings. When
running into difficulties because ideas are running dry, they may direct activity into Green Hat
thinking. When contingency plans are needed, they will ask for Black Hat thinking, and so on.
You can use Six Thinking Hats in guidance discussions. It is a way of encouraging participants to look at decision making from different perspectives. This can be done either metaphorically, as in "imagine you are wearing the white hat...", or by having cards, each with the name of a hat and a brief description of the "way of looking at things" that the hat brings with it. The cards can be shuffled and dealt to the participant in turn. By doing this the guide is encouraging the participant to consider a decision from a range of perspectives.
Assembling, Administering and Appraising Classroom Tests and Assessments
Assembling the Test
1. Record items on index cards
2. Double-check all individual test items
3. Double-check the items as a set
4. Arrange items appropriately
5. Prepare directions
6. Reproduce the test
Administering the Test
The guiding principle
• Provide conditions that give all students a fair chance to show what they know
Physical conditions
• Light, ventilation, quiet, etc.
Psychological conditions
• Avoid inducing test anxiety
• Try to reduce test anxiety
• Don’t give test when other events will distract
Suggestions
• Don’t talk unnecessarily before the test
• Minimize interruptions
• Don’t give hints to individuals who ask about items
• Discourage cheating
• Give students plenty of time to take the test.
Appraising the Test
• The step in which the institution's management finds out how effective it has been at conducting and evaluating student assessment.
The process
• Define organizational goals
• Define objectives and continuously monitor performance and progress
• Evaluate and review performance
• Provide feedback
• Conduct performance appraisal (reward / punishment)
Purpose of Classroom tests and assessment
Classroom assessment is one of the most important tools teachers can use to
understand the needs of their students. When executed properly and on an ongoing basis,
classroom assessment should shape student learning and give teachers valuable insights.
Identify Student Strengths and Weaknesses
Assessments help teachers identify student strengths as well as areas where students may
be struggling. This is extremely important during the beginning of the year when students
are entering new grades. Classroom assessments, such as diagnostic tests, help teachers
gauge the students' level of mastery of concepts from the prior grade.
Monitor Student Progress
Throughout the course of a lesson or unit, teachers use classroom assessment to monitor
students' understanding of the concepts being taught. This informs teachers in their lesson
planning, helping them pinpoint areas that need further review. Assessment can be done
in the form of weekly tests, daily homework assignments and special projects.
Assess Student Prior Knowledge
Before beginning a new unit, assessment can inform teachers of their students' prior
experience and understanding of a particular concept or subject matter. These types of
assessments can be done orally through classroom discussion or through written
assignments such as journals, surveys or graphic organizers.
Purposes of assessment
Teaching and learning
The primary purpose of assessment is to improve students’ learning and teachers’ teaching as
both respond to the information it provides. Assessment for learning is an ongoing process that
arises out of the interaction between teaching and learning.
What makes assessment for learning effective is how well the information is used.
System improvement
Assessment can do more than simply diagnose and identify students’ learning needs; it can be
used to assist improvements across the education system in a cycle of continuous improvement:
• Students and teachers can use the information gained from assessment to determine their next
teaching and learning steps.
• Parents, families and whānau can be kept informed of next plans for teaching and learning and
the progress being made, so they can play an active role in their children’s learning.
• School leaders can use the information for school-wide planning, to support their teachers and
determine professional development needs.
• Communities and Boards of Trustees can use assessment information to assist their
governance role and their decisions about staffing and resourcing.
• The Education Review Office can use assessment information to inform their advice for
school improvement.
• The Ministry of Education can use assessment information to undertake policy review and
development at a national level, so that government funding and policy intervention is
targeted appropriately to support improved student outcomes.
Developing specifications for tests and assessment
Definitions
I’ve seen the terms “Test Plan” and “Test Specification” mean slightly different things over the
years. In a formal sense (at this given point in time for me), we can define the terms as follows:
• Test Specification – a detailed summary of what scenarios will be tested, how they will
be tested, how often they will be tested, and so on and so forth, for a given feature.
Examples of a given feature include, “Intellisense, Code Snippets, Tool Window
Docking, IDE Navigator.” Trying to include all Editor Features or all Window
Management Features into one Test Specification would make it too large to effectively
read.
• Test Plan – a collection of all test specifications for a given area. The Test Plan contains
a high-level overview of what is tested (and what is tested by others) for the given feature
area. For example, I might want to see how Tool Window Docking is being tested. I can
glance at the Window Management Test Plan for an overview of how Tool Window
Docking is tested, and if I want more info, I can view that particular test specification.
If you ask a tester on another team what’s the difference between the two, you might receive
different answers. In addition, I use the terms interchangeably all the time at work, so if you see
me using the term “Test Plan”, think “Test Specification.”
Parts of a Test Specification
A Test Specification should consist of the following parts:
• History / Revision – Who created the test spec? Who were the developers and Program
Managers (Usability Engineers, Documentation Writers, etc) at the time when the test
spec was created? When was it created? When was the last time it was updated? What
were the major changes at the time of the last update?
• Feature Description – a brief description of what area is being tested.
• What is tested? – a quick overview of what scenarios are tested, so people looking
through this specification know that they are at the correct place.
• What is not tested? – are there any areas being covered by different people or different
test specs? If so, include a pointer to these test specs.
• Nightly Test Cases – a list of the test cases and high-level description of what is tested
each night (or whenever a new build becomes available). This bullet merits its own blog
entry. I’ll link to it here once it is written.
• Breakout of Major Test Areas – This section is the most interesting part of the test spec
where testers arrange test cases according to what they are testing. Note: in no way do I
claim this to be a complete list of all possible Major Test Areas. These areas are
examples to get you going.
o Specific Functionality Tests – Tests to verify the feature is working according to
the design specification. This area also includes verifying error conditions.
o Security tests – any tests that are related to security. An excellent source for
populating this area comes from the Writing Secure Code book.
o Accessibility Tests – This section shouldn't be a surprise to any of my blog
readers. <grins> See The Fundamentals of Accessibility for more info.
o Stress Tests – This section talks about what tests you would apply to stress the
feature.
o Performance Tests – this section includes verifying any perf requirements for
your feature.
o Edge cases – This is something I do specifically for my feature areas. I like
walking through books like How to Break Software, looking for ideas to better test
my features. I jot those ideas down under this section.
o Localization / Globalization – tests to ensure you’re meeting your product’s
International requirements.
Setting Test Case Priority
A Test Specification may have a couple of hundred test cases, depending on how the test cases
were defined, how large the feature area is, and so forth. It is important to be able to query for
the most important test cases (nightly), the next most important test cases (weekly), the next
most important test cases (full test pass), and so forth. A sample prioritization for test cases may
look like:
• Highest priority (Nightly) – Must run whenever a new build is available
• Second highest priority (Weekly) – Other major functionality tests run once every three
or four builds
• Lower priority – Run once every major coding milestone
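As a sketch only (not the author's actual setup), the priority levels above could be encoded as markers in a pytest-based suite so that each subset can be selected by priority; the marker names and the stubbed feature function are invented for illustration.

```python
# A sketch of encoding the three priority levels as pytest markers so each
# subset can be selected with "-m". Marker names and the stubbed feature
# function are illustrative only.
import pytest

def dock_window(edge: str) -> str:
    """Stand-in for the Tool Window Docking feature under test."""
    return "docked"

@pytest.mark.nightly        # highest priority: run whenever a new build is available
def test_window_docks_to_left_edge():
    assert dock_window("left") == "docked"

@pytest.mark.weekly         # second priority: run once every three or four builds
def test_window_docks_to_right_edge():
    assert dock_window("right") == "docked"

@pytest.mark.full_pass      # lower priority: run once every major coding milestone
def test_window_docks_on_second_monitor():
    assert dock_window("second monitor") == "docked"

# Example selections (markers would be registered in pytest.ini to avoid warnings):
#   pytest -m nightly
#   pytest -m "nightly or weekly"
```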
(OR)
Major Points
1. Your goal is valid, reliable, useful assessment
2. Which requires:
a. Determining what is to be measured
b. Defining it precisely
c. Minimizing measurement of irrelevancies
3. And is promoted by following good procedures
Four Steps in Planning an Assessment
1. Deciding its purpose
2. Developing test specifications
3. Selecting best item types
4. Preparing items
Step 1: Decide the Purpose
What location in instruction?
1. pre-testing
o readiness
i. limited in scope
ii. low difficulty level
iii. serve as basis of remedial work, adapting instruction
o pretest (placement)
i. items similar to outcome measure
ii. but not the same (like an alternative form)
2. during instruction
o formative
i. monitor learning progress
ii. detect learning errors
iii. feedback for teacher and students
iv. limited sample of learning outcomes
v. must assure that mix and difficulty of items sufficient
vi. try to use to make correction prescriptions (e.g., review for whole group,
practice exercises for a few)
o diagnostic
i. enough items needed in each specific area
ii. items in one area should have slight variations
3. end of instruction
o mostly summative –broad coverage of objectives
o can be formative too
Step 2: Develop Test Specifications
• Why? Need good sample!
• How? Table of specifications (2-way chart, "blueprint")
1. Prepare list of learning objectives
2. outline instructional content
3. prepare 2-way chart
4. or, use alternative to 2-way chart when more appropriate
5. doublecheck sampling
6. Sample of a Content Domain (For this course)
1. trends/controversies in assessment
2. interdependence of teaching, learning, and assessment
3. purposes and forms of classroom assessment
4. planning a classroom assessment (item types, table of specs)
5. item types (advantages and limitations)
6. strategies for writing good items
7. compiling and administering classroom assessments
8. evaluating and improving classroom assessments
9. grading and reporting systems
10. uses of standardized tests
11. interpreting standardized test scores
Sample Table of Specifications (For chapters 6 and 7 of this course)
Sample SLOs (you would typically have more), each mapped to the Bloom level at which it would be assessed:
• Identifies definition of key terms (e.g., validity) – Remember
• Identifies examples of threats to test reliability and validity – Understand
• Selects best item type for given objectives – Apply
• Compares the pros and cons of different kinds of tests for given purposes – Analyze
• Evaluates particular educational reforms (e.g., whether they will hurt or help instruction) – Evaluate
• Create a unit test – Create
A complete table of specifications would also record the total number of items planned for each row and column.
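The arithmetic behind such a blueprint can be sketched in a few lines; the content areas, weights and 40-item test length below are invented for illustration and are not taken from the course outline above.

```python
# A sketch of the arithmetic behind a two-way table of specifications; the
# content areas, weights and 40-item test length are invented for illustration.
total_items = 40

content_weights = {            # relative emphasis of each content area (rows)
    "item types": 0.30,
    "reliability and validity": 0.40,
    "grading and reporting": 0.30,
}
bloom_weights = {              # relative emphasis of each Bloom level (columns)
    "Remember": 0.25,
    "Understand": 0.25,
    "Apply": 0.30,
    "Analyze": 0.20,
}

for area, area_w in content_weights.items():
    cells = {level: round(total_items * area_w * level_w)
             for level, level_w in bloom_weights.items()}
    print(f"{area:26s} {cells}  (row total: {sum(cells.values())})")
```

Double-checking the sampling then amounts to confirming that the row and column totals match the intended emphasis and the planned test length.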
Spot the Poor Specific Learning Outcomes (For use with previous table of specifications)
Which entries are better or worse than others? Why? Improve the poor ones.
1. Knowledge
a. Knows correct definitions
b. Able to list major limitations of different types of items
2. Comprehension
a. Selects correct item type for learning outcome
b. Understands limitations of true-false items
c. Distinguishes poor true-false items from good ones
3. Application
a. Applies construction guidelines to a new content area
b. Creates a table of specifications
4. Analysis
a. Identifies flaws in poor items
b. Lists general and specific learning outcomes
5. Synthesis
a. Lists general and specific content areas
b. Provides weights for areas in table of specifications
6. Evaluation
a. Judges quality of procedure/product
b. Justifies product
c. Improves a product
Why are These Better Specific Learning Outcomes?
1. Knowledge
a. Selects correct definitions
b. Lists major limitations of different item types
2. Comprehension
a. Selects proper procedures for assessment purpose
b. Distinguishes poor procedures from good ones
c. Distinguishes poor decisions/products from good ones
3. Application
a. Applies construction guidelines to a new content area
4. Analysis
a. Identifies flaws in procedure/product
b. Lists major and specific content areas
c. Lists general and specific learning outcomes
5. Synthesis
a. Creates a component of the test
b. Provides weights for cells in table of specifications
6. Evaluation
a. Judges quality of procedure/product
b. Justifies product
c. Improves a product
Step 3: Select the Best Types of Items/Tasks
What types to choose from? Many!
1. objective--supply-type
a. short answer
b. completion
2. objective--selection-type
a. true-false
b. matching
c. multiple choice
3. essays
a. extended response
b. restricted response
4. performance-based
a. extended response
b. restricted response
Which type to use? The one that fits best!
1. most directly measures learning outcome
2. where not clear, use selection-type (more objective)
a. multiple choice best (less guessing, fewer clues)
b. matching only if items homogeneous
c. true-false only if only two possibilities
Strengths and Limitations of Objective vs. Essay/Performance
Objective Items
• Strengths
o Can have many items
o Highly structured
o Scoring quick, easy, accurate
• Limitations
o Cannot assess higher level skills (problem formulation, organization, creativity)
Essay/Performance Tasks
• Strengths
o Can assess higher level skills
o More realistic
• Limitations
o Inefficient for measuring knowledge
o Few items (poorer sampling)
o Time consuming
o Scoring difficult, unreliable
Step 4: Prepare Items/Tasks
Strategies to Measure the Domain Well—Reliably and Validly
1. specifying more precise learning outcomes leads to better-fitting items
2. use 2-way table to assure good sampling of complex skills
3. use enough items for reliable measurement of each objective
o number depends on purpose, task type, age
o if performance-based tasks, use fewer but test more often
4. keep in mind how good assessment can improve (not just measure) learning
o signals learning priorities to students
o clarifies teaching goals for teacher
o if perceived as fair and useful
Strategies to Avoid Contamination
1. eliminate barriers that lead good students to get the item wrong
2. don’t provide clues that help poor students get the item correct
General Suggestions for Item Writing
1. use table of specifications as guide
2. write more items than needed
3. write well in advance of testing date
4. task to be performed is clear, unambiguous, unbiased, and calls forth the intended
outcome
5. use appropriate reading level (don’t be testing for ancillary skills)
6. write so that items provide no clues (minimize value of "test-taking skills")
a. a/an
b. avoid specific determiners (always, never, etc.)
c. don’t use more detailed, longer, or textbook language for correct answers
d. don’t have answers in an identifiable pattern
7. write so that item provides no clues to other items
8. seeming clues should lead away from the correct answer
9. experts would agree on the answer
10. if item revised, recheck its relevance
Selecting and constructing appropriate types of items
and assessment tasks
• Different types of tests: limited-choice questions (multiple choice, true/false, matching); open-ended questions (short answer, essay); performance testing (OSCE, OSPE); and action-oriented testing.
• Process of test administration: statement of goals, content outline, table of specification, item selection, item construction, composition of the answer sheet, development of instructions, construction of the answer key, test administration, and test revision.
• Characteristics of a good test: reliability (the test is consistent, uniform and free from extra sources of error), validity (how well the test measures what it is supposed to measure), and utility (the test is cost and time effective).
• Test construction should set out to answer: What kind of test is to be made? What is its precise purpose? Which abilities are to be tested? How detailed and how accurate must the results be? What constraints are set by limited availability of expertise, facilities, and time for construction, administration and scoring? Who will take the test? What is the scope of the test?
• Principles of test construction: 1. Measure all instructional objectives – objectives that are communicated and imparted to the students, designed as an operational control to guide the learning sequences and experiences, and in harmony with the teacher's instructional objectives. 2. Cover all learning tasks – measure a representative part of the learning tasks. 3. Use appropriate testing strategies or items – items that appraise the specific learning outcomes, with measurements or tests based on the domains of learning.
• 4. Make the test valid and reliable – a test is reliable when it produces dependable, consistent and accurate scores, and valid when it measures what it purports to measure. Tests that are written clearly and unambiguously are more reliable; tests with more items are more reliable than tests with fewer items; and tests that are well planned, cover wide objectives and are well executed are more valid.
• 5. Use the test to improve learning – a test is not only an assessment but also a learning experience: going over the test items can help teachers reteach missed items, discussion and clarification of the right choices gives further learning, and revision of the test enables further guidance and modification of teaching. 6. Norm-referenced and criterion-referenced tests – norm-referenced tests target higher and more abstract levels of the cognitive domain, whereas criterion-referenced tests target lower and more concrete levels of learning.
• Planning for a test: 1. Outline the learning objectives or major concepts to be covered – the test should be representative of the objectives and material covered (a major student complaint is that tests do not fairly cover the material that was supposed to be examined). 2. Create a test blueprint. 3. Create questions based on the blueprint. 4. For each question, check against the blueprint (3–4 alternate questions on the same idea/objective should be made). 5. Organise questions by item type. 6. Eliminate similar questions. 7. Re-read the questions and check them from the student's standpoint. 8. Organise the questions logically. 9. Estimate the completion time by having the teacher take the test and multiplying that time by about 4, depending on the level of the students. 10. Analyse the results (item analysis).
• Process of Test Construction
• Preliminary considerations: a) specify the test purposes and describe the domain of content and/or behavior of interest; b) specify the group of examinees (age, gender, socio-economic background, etc.); c) determine the time and financial resources available for constructing and validating the test; d) identify and select qualified staff members; e) specify the initial estimated length of the test (time for developing, validating and completing it by the students).
• Review of content domain/behaviors: a) review the descriptions of the content standards or objectives to determine their acceptability for inclusion in the test; b) select the final group of objectives (i.e. finalize the content standards); c) prepare the item specifications for each objective and review them for completeness, clarity, accuracy and practicability.
• Item/task writing and preparation of scoring rubrics: a) draft a sufficient number of items and/or tasks for field testing; b) carry out item/task editing and review the scoring rubrics.
• Assessment of content validity: a) identify a pool of judges and measurement specialists; b) review the test items and tasks to determine their match to the objectives, their representativeness, and their freedom from stereotyping and potential bias; c) review the test items and/or tasks to determine their technical adequacy.
• Revision of test items/tasks: a) based on the data from steps 4b and 4c, revise the test items/tasks or delete them; b) write additional test items/tasks and repeat step 4.
• Field test administration: a) organize the test items/tasks into forms for field testing; b) administer the test forms to appropriately chosen groups of examinees; c) conduct item analysis and item bias studies ("studies to identify differentially functioning test items"); d) carry out statistical linking or equating of forms if needed.
• Revision of test items/tasks: revise or delete items using the results from step 6c, and check the scoring rubrics for the performance tasks that were field tested.
• Test assembly: determine the test length, the number of forms needed, and the number of items/tasks per objective; select items from the available pool of valid test material; prepare test directions, practice questions, the test booklet layout, scoring keys, answer sheets and so on; and specify any modifications to instructions, medium of presentation, examinee response mode, or time requirements for finishing the items.
• Selection of performance standards: a) performance standards are needed to accomplish the test purpose; b) determine the performance standards; c) initiate and document the performance standards; d) identify alternative test score interpretations for examinees requiring alternative administration or other modifications.
• Pilot test (if possible): a) design the test administration to collect score reliability and validity information; b) administer the test form(s) to appropriately chosen groups of examinees; c) identify and evaluate alternative administrations or other modifications needed to meet individual specific needs that may affect the validity and reliability of the test or its forms; d) evaluate the test administration procedures, test items, and score reliability and validity; e) make final revisions to the forms of the test based on the available data.
• Preparation of manuals: a) prepare the test administrator's manual.
• Additional technical data collection: a) conduct reliability and validity investigations on a continuing basis.
• Item analysis: shortening or lengthening an existing test is done through item analysis, since the validity and reliability of any test depend on the characteristics of its items. There are two types: 1. qualitative analysis and 2. quantitative analysis.
• Qualitative item analysis looks at content validity (the content and form of the items, judged by expert opinion) and at effective item formulation. Quantitative item analysis looks at item difficulty and item discrimination (see the sketch below).
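As an illustration only, both quantitative statistics can be computed from a small made-up score matrix; the data below are invented, and the upper-lower halves method shown is one common way (among several) of estimating discrimination.

```python
# A sketch of the two quantitative item-analysis statistics named above,
# computed from a small made-up score matrix (rows = students, columns = items;
# 1 = correct, 0 = incorrect).
import numpy as np

scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
])

total = scores.sum(axis=1)                 # each student's total score

# Item difficulty: proportion of students answering the item correctly.
difficulty = scores.mean(axis=0)

# Item discrimination (upper-lower method): difference in the proportion
# correct between the top and bottom halves of the class, ranked by total score.
order = np.argsort(total)
half = len(scores) // 2
lower, upper = scores[order[:half]], scores[order[-half:]]
discrimination = upper.mean(axis=0) - lower.mean(axis=0)

for i, (p, d) in enumerate(zip(difficulty, discrimination), start=1):
    print(f"item {i}: difficulty={p:.2f}, discrimination={d:+.2f}")
```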
Characteristics of Standardised Tests and Teacher-Made Tests
Standardised Tests
Some characteristics of these tests are:
1. They consist of items of high quality. The items are pretested and selected on the basis of
difficulty value, discrimination power, and relationship to clearly defined objectives in
behavioural terms.
2. As the directions for administering, exact time limit, and scoring are precisely stated, any
person can administer and score the test.
3. Norms, based on representative groups of individuals, are provided as an aid for interpreting
the test scores. These norms are frequently based on age, grade, sex, etc.
4. Information needed for judging the value of the test is provided. Before the test becomes
available, the reliability and validity are established.
5. A manual is supplied that explains the purposes and uses of the test, describes briefly how it
was constructed, provides specific directions for administering, scoring, and interpreting results,
contains tables of norms and summarizes available research data on the test.
No two standardized tests are exactly alike. Each test measures certain specific aspects of behaviour and serves a slightly different purpose. Some tests with similar titles measure aspects of behaviour that differ markedly, whereas other tests with dissimilar titles measure aspects of behaviour that are almost identical. Thus, one has to be careful in selecting a standardised test.
5. They provide information for curriculum planning and for remedial coaching of educationally backward children.
6. They also help the teacher to assess the effectiveness of his or her teaching and of school instructional programmes.
7. They provide data for tracing an individual's growth pattern over a period of years.
8. They help in organising better guidance programmes.
9. They evaluate the influence of courses of study, teachers' activities, teaching methods and other factors considered to be significant for educational practices.
Features of Teacher-Made Tests:
1. The items of the tests are arranged in order of difficulty
2. These are prepared by the teachers which can be used for prognosis and diagnosis purposes.
3. The test covers the whole content area and includes a large number of items.
4. The preparation of the items conforms to the blueprint.
5. Test construction is not a single person's business; rather, it is a co-operative endeavour.
6. A teacher-made test does not cover all the steps of a standardised test.
7. Teacher-made tests may also be employed as a tool for formative evaluation.
8. Preparation and administration of these tests are economical.
9. The test is developed by the teacher to ascertain the student’s achievement and proficiency in
a given subject.
10. Teacher-made tests are least used for research purposes.
11. They do not have norms whereas providing norms is quite essential for standardised tests.
Steps/Principles of Construction of Teacher-made Test:
A teacher-made test does not require the same elaborate preparation as a standardised test. Even then, to make it a more efficient and effective tool of evaluation, careful consideration needs to be given while constructing such tests.
The following steps may be followed for the preparation of teacher-made test:
1. Planning:
Planning of a teacher-made test includes:
a. Determining the purpose and objectives of the test, ‘as what to measure and why to measure’.
b. Deciding the length of the test and portion of the syllabus to be covered.
c. Specifying the objectives in behavioural terms. If needed, a table can even be prepared for
specifications and weightage given to the objectives to be measured.
d. Deciding the number and forms of items (questions) according to blueprint.
e. Having a clear knowledge and understanding of the principles of constructing essay type, short
answer type and objective type questions.
f. Deciding date of testing much in advance in order to give time to teachers for test preparation
and administration.
g. Seeking the co-operation and suggestion of co-teachers, experienced teachers of other schools
and test experts.
2. Preparation of the Test:
Planning is the philosophical aspect of test construction and preparation is the practical aspect. All the practical aspects have to be taken into consideration while constructing the test. It is an art, a technique; one either has it or has to acquire it. It requires much thinking, rethinking and reading before constructing test items.
Different types of objective test items, viz. multiple choice, short-answer type and matching type, can be constructed. After construction, test items should be given to others for review and for their opinions.
Suggestions may be sought on the language, the modalities of the items, the statements given, the correct answers supplied and other possible errors anticipated. The suggestions and views thus gathered will help the test constructor modify and verify the items afresh to make them more acceptable and usable.
After construction of the test, items should be arranged in a simple to complex order. For
arranging the items, a teacher can adopt so many methods viz., group-wise, unit-wise, topic wise
etc. Scoring key should also be prepared forthwith to avoid further delay in scoring.
Direction is an important part of test construction. Without proper directions or instructions, there is a probability of losing the authenticity and reliability of the test, and the students may also be confused.
Thus, the direction should be simple and adequate to enable the students to know:
(i) The time for completion of test,
(ii) The marks allotted to each item,
(iii) Required number of items to be attempted,
(iv) How and where to record the answers; and
(v) The materials, like graph papers or logarithmic table to be used.
Observation Techniques
In carrying out action research to improve teaching and learning, an important role of the
researcher/instructor is to collect data and evidence about the teaching process and student
learning. What follows is an introduction to some of the techniques which can be used for the
said purpose.
Student Assessment
Tests, examinations and continuous assessment can provide valuable data for action research.
For your teaching course, you have to set up a method of student assessment and your students
have to be assessed, so you might as well make use of it in your project.
You should, however, be clear about the nature of the information you can obtain from
examination results or assessment grades. Comparison of one set of results with another often
has limited validity as assignments, examinations, markers and marking schemes are rarely held
constant. In addition most assessment is norm referenced rather than criterion referenced.
You also need to be very clear as to what is being assessed. Examination grades may bear little
relationship to specific qualities you could be investigating. For example, if the theme of an
action research project is encouraging meaningful learning, then the examination results would
only be of value if they truly reflect meaningful learning. They would be of little value if they
consisted of problems which could be solved by substituting numbers into a remembered
formula, or essays which required the reproduction of sections from lecture notes. So think
carefully about the qualities which you wish to test and whether the assessment is a true test of
those qualities.
One way in which answers to assessment questions can be analysed for project purposes is by
dividing them into qualitative categories. A systematic procedure for establishing categories is
the SOLO taxonomy (Biggs and Collis, 1982). The SOLO taxonomy divides answers to written
assessment questions into five categories, judged according to the level of learning: prestructural,
unistructural, multistructural, relational and extended abstract. The five levels correspond to
answers ranging from the incorrect or irrelevant, through use of appropriate data, to integration
of data in an appropriate way, and ending in innovative extensions.
Closed Ended Questionnaires
Closed questionnaires are ones which constrain the responses to a limited number chosen by the
researcher; essentially it is a multiple choice format. Usually respondents are asked the extent to
which they agree or disagree with a given statement. Responses are recorded on a Likert scale,
such as the one in the example below, which ranges from 'definitely agree' to 'definitely
disagree'.
Questions should be carefully constructed so the meaning is clear and unambiguous. It is a good
idea to trial the questionnaire on a limited number of students before giving it to a whole group.
Closed questionnaires are easy to process and evaluate and can give clear answers to specific
questions. However, the questions are defined by the researcher, so could completely miss the
concerns of the respondents. You might therefore draw up the questions after a few exploratory
interviews, or include some open-ended questions to give respondents a chance to raise other
issues of concern.
A section of a typical closed questionnaire used for course evaluation is shown below.
Most institutions now have some form of standard teaching evaluation questionnaire available.
These may be of some help in evaluating a project but in most cases the questions will not be
sufficiently specific to the particular type of innovation which has been introduced. What might
be more helpful are the data banks of optional or additional questions which are available. These
can be used to pick or suggest questions which might be included in a more tailor-made
questionnaire.
Traditionally, questionnaire and survey data are collected using paper questionnaires and answer sheets. With the availability of web technology, there is now also the option of collecting survey data online.
To collect data using paper questionnaires, special answer sheets called OMR forms are often used. Respondents are asked to mark their answers to the questionnaire on the OMR forms, and an optical mark scanner is then used to read the marks. The process produces an electronic data file containing the responses to the questionnaire, which can then be analysed using software such as MS Excel or SPSS. At HKUST, both the optical mark scanner and OMR forms are available from ITSC.
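As a minimal sketch of that last analysis step, assuming the scan has produced a CSV file of Likert responses coded 1 to 5 (the file name and column names below are hypothetical), the item means and response frequencies could be computed with pandas:

```python
# A sketch of analysing scanned questionnaire data, assuming the scanner has
# produced a CSV file "responses.csv" with one row per respondent and one
# column per Likert item coded 1-5 (file name and columns are hypothetical).
import pandas as pd

responses = pd.read_csv("responses.csv")          # e.g. columns Q1, Q2, ..., Q10

# Mean rating per item (with 1 = definitely disagree, 5 = definitely agree).
print(responses.mean(numeric_only=True).round(2))

# Frequency distribution of responses for a single item.
print(responses["Q1"].value_counts().sort_index())
```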
Diary / Journal
Everyone involved in an action learning project should keep a diary or journal in which they
record:
• their initial reflections on the topic of concern
• the plans that were made
• a record of actions which were taken
• observation of the effects of the actions
• impressions and personal opinions about the action taken and reactions to them
• results obtained from other observation techniques
• references for, and notes on, any relevant literature or supporting documents which are
discovered.
Research reports are often very impersonal documents but this should not be the case for an
action learning journal - quite the contrary! It should contain a record of both what you did and
what you thought. In it you should regularly and systematically reflect critically on the effects of
your project and how it is progressing.
Journals act as the starting points for critical reflection at the regular meetings of the project
team. By sharing observations and reflections it is possible to fine-tune the innovation.
Sympathetic but critical discussion can also heighten awareness and contribute to changing
perspectives.
Supporting Documents
Keep copies of any documents which are relevant to the course(s) you are examining. These can
include:
• documents for the course development and accreditation process
• minutes of course committees
• the course syllabus
• memos between course team leaders and members
• handouts to students
• copies of tests and examinations
• lists of test results and student grades.
Interaction Schedules
Interaction schedules are methods for analysing and recording what takes place during a class. A
common approach is to note down at regular intervals (say every minute) who is talking, and to
categorise what they were saying or doing. An alternative to time sampling is event sampling in
which behaviour is noted every time a particular event occurs. Examples of categories could be:
tutor asking question, tutor giving explanation, tutor giving instruction, student answering
question or student asking question. The analysis can be made by an observer at the class or can
be made subsequently from a tape or video recording.
Below are profiles which compare the interactions during two tutorials. An observer noted, at
one minute intervals, who was talking and the type of communication. The plots can be used to
compare the extent to which the tutor dominated the session and the students contributed. The
example is adapted from Williams and Gillard (1986).
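A minimal sketch of how such a profile can be tallied, assuming one category code has been recorded per minute of the session; the codes and category names below are invented for illustration.

```python
# A sketch of summarising a time-sampled interaction schedule: one category
# code is recorded per minute of the tutorial (the codes below are invented),
# then tallied to build a profile like the one described above.
from collections import Counter

minute_codes = [
    "tutor_explains", "tutor_explains", "tutor_asks", "student_answers",
    "tutor_explains", "student_asks", "tutor_explains", "tutor_asks",
    "student_answers", "student_answers", "tutor_explains", "silence",
]

profile = Counter(minute_codes)
total = len(minute_codes)
for category, minutes in profile.most_common():
    print(f"{category:16s} {minutes:2d} min  ({minutes / total:.0%} of the session)")
```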
There are other approaches to recording and analysing happenings in a classroom situation.
McKernan (1991) discusses an extensive range of techniques, gives examples of each and
considers how the data gathered should be analysed.
Interviews
Interviews can provide even more opportunity for respondents to raise their own issues and
concerns, but are correspondingly more time-consuming and can raise difficulties in the collation
and interpretation of information. The format can be on a spectrum from completely open
discussion to tightly structured questions. Semi-structured interviews have a small schedule of
questions to point the interviewee towards an area of interest to the researcher, but then allow
interviewees to raise any items they like within the general topic area. Since interviews give an
opportunity for students to raise their own agenda they are useful when issues are open, or at an
exploratory stage. A small number of interviews can be useful to define issues for subsequent
more tightly structured questionnaires.
Interviews are normally tape recorded. If analysis, rather than just impressions, is required, then
transcripts have to be produced. The transcripts are normally analysed by searching for responses
or themes which commonly occur. Quotations from the transcripts can be used to illuminate or
illustrate findings reported in reports and papers.
There are computer programmes available to assist with the analysis of qualitative data. One
example is the programme NUDIST which has facilities for indexing, text-searching, using
Boolean operations on defined index nodes, and combining data from several initially
independent studies.
Student Learning Inventories
Student learning inventories are examples of empirically derived measuring instruments. There
are many inventories which purport to measure a wide range of characteristics. Student
learning inventories have been highlighted because they examine the quality of learning. In
particular they look at the categories of deep and surface learning. The inventories can be used to
compare groups of students, examine approaches before and after changes to teaching methods,
and to examine correlations with other variables.
The Study Process Questionnaire (SPQ) developed by John Biggs (1987) assesses students'
approaches to learning. Scores are obtained for each student on deep, surface and achieving
approach scales. The SPQ has been widely used in Hong Kong and its cultural applicability
widely researched. A detailed account of usage of the SPQ, together with tables of norms for
Hong Kong students for comparison purposes, is in Biggs (1992). The SPQ is available in
English, Chinese or bilingual versions.
For action learning projects, a suitable way to use the SPQ is to apply it at the start and end of the
innovation. Changes in SPQ scores can then be interpreted as a reflection of the teaching and
learning context. The results will indicate whether the innovation has encouraged meaningful
approaches to learning.
Biggs, J.B. (1987). The Study Process Questionnaire (SPQ): Manual. Hawthorn, Vic.: Australian
Council for Educational Research.
Biggs, J.B. (1992). Why and how do Hong Kong students learn? Using the Learning and Study
Process Questionnaire. Hong Kong: University of Hong Kong.
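A minimal sketch of the start-and-end comparison described above, assuming the SPQ scale scores (deep, surface and achieving) have already been computed for each student and stored in two hypothetical CSV files:

```python
# A sketch of comparing SPQ scale scores before and after an innovation; the
# file names and column names are hypothetical, and the scale scores are
# assumed to have been computed already following Biggs' manual.
import pandas as pd

pre = pd.read_csv("spq_pre.csv")     # columns: student_id, deep, surface, achieving
post = pd.read_csv("spq_post.csv")

merged = pre.merge(post, on="student_id", suffixes=("_pre", "_post"))
for scale in ["deep", "surface", "achieving"]:
    change = merged[f"{scale}_post"] - merged[f"{scale}_pre"]
    print(f"{scale:9s} mean change = {change.mean():+.2f}")
```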
Open Ended Questionnaires
Open questionnaires have a series of specific questions but leave space for respondents to answer
as they see fit. You are therefore more likely to find out the views of students but replies are
more difficult to analyse and collate. The usual procedure is to search for categories of common
responses.
An example of an open questionnaire is shown below.
It is not necessary to have separate questionnaires for open and closed items. The most
successful questionnaires often have both open and closed items.
Diagnosis of Student Conceptions
A good basis for improving your teaching is to diagnose your students' understanding of key
concepts in a course. It is often surprising how students can pass university examinations but still
have fundamental misunderstandings of key concepts. The usual method of diagnosing student
conceptions is to ask a question which applies the concept to an every-day situation: one which
cannot be answered by reproduction or by substitution into formulae. Answers are drawn from
the students in interviews or in written form.
The students' answers can usually be classified into a small number (usually two to five) of conceptions or misconceptions about the phenomenon. As with the analysis of interview data, care needs to be taken when deriving classifications. These do not automatically emerge from the transcript but are shaped by the experiences and knowledge of the researcher.
An example of the type of question, and categories of student conceptions which it uncovered is
given below (Dahlgren, 1984).
Tape Recording
Making tape recordings is a way of collecting a complete, accurate and detailed record of
discussions in class, conversations in interviews or arguments and decisions at meetings. It is
easy to obtain the recording; you simply take along cassettes and a portable recorder, and switch
it on. However, the presence of a tape recorder can inhibit discussion or influence people's
behaviour.
There are a number of ethical issues which need to be addressed over the use of tape recordings.
The group being taped should establish the purpose of making the recording and the way in
which the tapes will be used. If any quotations are made in subsequent reports it is customary to
maintain the anonymity of the source.
If you need to do a detailed analysis of the conversations then it will be necessary to produce a
transcript. This is a time-consuming and painstaking process, so limit the use of tape recordings
to situations where it is really necessary.
Triangulation
Triangulation is not a specific observation technique, but the process of comparing and cross-checking data from one source against another. If you do just a handful of interviews your
conclusions may be viewed with skepticism. But if the interview results concur with findings
from a questionnaire, trends in examination results and evidence from your journal, then the
conclusions are much more convincing. The message is simple; use more than one observation
technique in order to see whether your results are consistent.
Peer Appraisal and self-report techniques
Peer Appraisal definition
Employee assessments conducted by colleagues in the immediate working environment, i.e. the people the employee interacts with regularly. Peer appraisal processes exclude superiors and subordinates.
Peer appraisals are a form of performance appraisal which are designed to monitor and
improve job performance.
Peer appraisals can be broken down into specific measures. Peer ranking involves
workers ranking each member of the group from best to worst, either overall or on
various areas of performance or responsibility. In peer ratings, workers rate colleagues on
performance metrics while peer nomination is a simple nomination of the ‘best’ worker
either overall or on performance metrics.
Commonly-cited advantages of the peer appraisal process include insight and knowledge
– workers have their ‘ear to the ground’ and are often in the best position to appraise a
colleague’s performance. Peer appraisal also encourages a more inclusive team dynamic
as colleagues gain a deeper insight into the challenges their colleagues face, and
encourages development of a shared goal as workers realise they must impress their
colleagues and respond to their ideas, concerns and needs.
Self-report techniques describe methods of gathering data where participants provide information about themselves without interference from the experimenter. Such techniques include questionnaires, interviews, and even diaries, and typically involve responses to pre-set questions.
Evaluation of self-report methods
Strengths:
- Participants can be asked about their feelings and cognitions (i.e. thoughts), which can be more
useful than simply observing behaviour alone.
- Scenarios can be asked about hypothetically without having to physically set them up and
observe participants’ behaviour.
Weaknesses:
- Gathering information about thoughts or feelings is only useful if participants are willing to
disclose them to the experimenter.
- Participants may try to give the 'correct' responses they think researchers are looking for (or deliberately do the opposite), or try to come across in the most socially acceptable way (i.e. social desirability bias), which can lead to untruthful responses.
A self-report study is a type of survey, questionnaire, or poll in which respondents read the
question and select a response by themselves without researcher interference. A self-report is any
method which involves asking a participant about their feelings, attitudes, beliefs and so on.
Examples of self-reports are questionnaires and interviews; self-reports are often used as a way
of gaining participants' responses in observational studies and experiments.
Self-report studies have validity problems. Patients may exaggerate symptoms in order to make
their situation seem worse, or they may under-report the severity or frequency of symptoms in
order to minimize their problems. Patients might also simply be mistaken or misremember the
material covered by the survey.
Questionnaires and interviews
Questionnaires are a type of self-report method which consist of a set of questions usually in a
highly structured written form. Questionnaires can contain both open questions and closed
questions and participants record their own answers. Interviews are a type of spoken
questionnaire where the interviewer records the responses. Interviews can be structured whereby
there is a predetermined set of questions or unstructured whereby no questions are decided in
advance. The main strength of self-report methods is that they allow participants to describe their own experiences rather than having these inferred from observation. Questionnaires and interviews can often study large samples of people fairly easily and quickly. They can examine a large number of variables and ask people to reveal behaviour and feelings that have been experienced in real situations. However, participants
may not respond truthfully, either because they cannot remember or because they wish to present
themselves in a socially acceptable manner. Social desirability bias can be a big problem with
self-report measures as participants often answer in a way to portray themselves in a good light.
Questions are not always clear, and if the respondent has not really understood a question we would not be collecting valid data. If questionnaires are sent out, say via email or through tutor groups, the response rate can be very low. Questions can also be leading; that is, they may unwittingly push the respondent toward a particular reply.
Unstructured interviews can be very time consuming and difficult to carry out whereas structured
interviews can restrict the respondents' replies. Therefore psychologists often carry out semi-structured interviews, which consist of some pre-determined questions followed up with further questions that allow the respondent to develop their answers.
Open and closed questions
Questionnaires and interviews can use open or closed questions, or both.
Closed questions are questions which provide a limited choice (for example, a participant's age
or their favourite type of football team), especially if the answer must be taken from a
predetermined list. Such questions provide quantitative data, which is easy to analyse. However, these questions do not allow the participant to give in-depth insights.
Open questions are those questions which invite the respondent to provide answers in their own
words and provide qualitative data. Although these types of questions are more difficult to
analyse, they can produce more in-depth responses and tell the researcher what the participant
actually thinks, rather than being restricted by categories.
Rating scales
One of the most common rating scales is the Likert scale. A statement is used and the participant decides how strongly they agree or disagree with it. For example, the participant decides whether "Mozzarella cheese is great" with the options of "strongly agree", "agree", "undecided", "disagree", and "strongly disagree". One strength of Likert scales is that they can give an idea of how strongly a participant feels about something, and therefore give more detail than a simple yes/no answer. Another strength is that the data are quantitative, which are
easy to analyse statistically. However, there is a tendency with Likert scales for people to
respond towards the middle of the scale, perhaps to make them look less extreme. As with any
questionnaire, participants may provide the answers that they feel they should. Moreover,
because the data is quantitative, it does not provide in-depth replies.
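A minimal sketch, using the mozzarella statement above and invented responses, of how Likert responses can be coded numerically and summarised:

```python
# Invented responses to the statement "Mozzarella cheese is great".
scale = {"strongly agree": 5, "agree": 4, "undecided": 3, "disagree": 2, "strongly disagree": 1}
responses = ["agree", "undecided", "strongly agree", "undecided", "agree", "undecided"]

scores = [scale[r] for r in responses]
mean_score = sum(scores) / len(scores)
midpoint_share = scores.count(3) / len(scores)  # how many respondents sat in the middle of the scale
print(f"mean = {mean_score:.2f}, proportion choosing the midpoint = {midpoint_share:.0%}")
```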
Fixed-choice questions
Fixed-choice questions are phrased so that the respondent has to make a fixed-choice answer,
usually 'yes' or 'no'.
This type of questionnaire is easy to measure and quantify. However, it also prevents a participant from choosing an option that is not in the list, so respondents may not feel that their desired response is
available. For example, a person who dislikes all alcoholic beverages may feel that it is
inaccurate to choose a favorite alcoholic beverage from a list that includes beer, wine, and liquor,
but does not include none of the above as an option. Answers to fixed-choice questions are not
in-depth.
Reliability
Reliability refers to how consistent a measuring device is. A measurement is said to be reliable
or consistent if the measurement can produce similar results if used again in similar
circumstances. For example, if a speedometer gave the same readings at the same speed it would
be reliable; if it didn't, it would be pretty useless and unreliable. Importantly, the reliability of self-report measures, such as psychometric tests and questionnaires, can be assessed using the split-half method. This involves splitting a test into two halves and having the same participant complete both. If the two halves of the test produce similar results, this would suggest that the
test has internal reliability. There are a number of ways to improve the reliability of self-report
techniques. For example ambiguous questions could be clarified or in the case of interviews the
interviewers could be given training.
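One common way to apply the split-half method is to correlate participants' scores on the two halves of the test (for example, odd- versus even-numbered items) and then apply the Spearman-Brown correction. The sketch below uses invented item scores and the standard-library statistics module (Python 3.10 or later):

```python
from statistics import correlation  # available in the standard library from Python 3.10

# Invented item scores: five participants, eight questionnaire items each.
items = [
    [4, 5, 3, 4, 5, 4, 3, 4],
    [2, 3, 2, 2, 3, 2, 2, 3],
    [5, 5, 4, 5, 4, 5, 5, 4],
    [3, 3, 3, 4, 3, 3, 4, 3],
    [1, 2, 2, 1, 2, 1, 2, 2],
]

odd_half = [sum(row[0::2]) for row in items]    # totals on items 1, 3, 5, 7
even_half = [sum(row[1::2]) for row in items]   # totals on items 2, 4, 6, 8

r = correlation(odd_half, even_half)            # agreement between the two halves
reliability = 2 * r / (1 + r)                   # Spearman-Brown correction for full test length
print(f"split-half reliability = {reliability:.2f}")
```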
Validity
Validity refers to whether a study measures or examines what it claims to measure or examine.
Questionnaires are said to often lack validity for a number of reasons: participants may lie, give the answers they think are desired, and so on. One way of assessing the validity of a self-report measure is to compare its results with those of another self-report on the same topic (this is called concurrent validity). For example, if an interview is used to investigate sixth-grade students' attitudes toward smoking, the scores could be compared with those from a questionnaire on the same students' attitudes toward smoking. There are a number of ways to improve the validity of self-report techniques. For example, leading questions could be avoided, open questions could be
added to allow respondents to expand upon their replies and confidentiality could be reinforced
to allow respondents to give more truthful responses.
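Concurrent validity is often summarised as the correlation between the two measures. A minimal sketch, with invented scores standing in for the smoking-attitude example above:

```python
from statistics import correlation  # Python 3.10+

# Invented attitude scores for the same six students (higher = more negative attitude to smoking).
interview_scores = [12, 18, 9, 15, 20, 11]
questionnaire_scores = [14, 17, 10, 13, 19, 12]

r = correlation(interview_scores, questionnaire_scores)
print(f"concurrent validity coefficient r = {r:.2f}")
```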
Disadvantages
Self-report studies have many advantages, but they also suffer from specific disadvantages due to
the way that subjects generally behave. Self-reported answers may be exaggerated; respondents
may be too embarrassed to reveal private details; various biases may affect the results, like social
desirability bias. Subjects may also forget pertinent details. Self-report studies are inherently
biased by the person's feelings at the time they filled out the questionnaire. If a person feels bad
at the time they fill out the questionnaire, for example, their answers will be more negative. If the
person feels good at the time, then the answers will be more positive.
As with all studies relying on voluntary participation, results can be biased by a lack of
respondents, if there are systematic differences between people who respond and people who do
not. Care must be taken to avoid biases due to interviewers and their demand characteristics.
Types of performance based assessment
Performance based learning is when students participate in performing tasks or activities that are
meaningful and engaging. The purpose of this kind of learning is to help students acquire and
apply knowledge, practice skills, and develop independent and collaborative work habits. The
culminating activity or product for performance-based learning is one that lets a
student demonstrate evidence of understanding through a transfer of skills.
This form of learning is measured through a performance-based assessment, which is open-
ended and without a single, correct answer. The performance-based assessment should
be something that shows authentic learning such as the creation of a newspaper or
class debate. The benefit of these types of performance-based assessments is that when the
students are more actively involved in the learning process, they will absorb and understand the
material at a much deeper level. Other characteristics of performance-based assessments are that
they are complex and time-bound.
In addition, there are learning standards in each discipline that set academic expectations and
define what is proficient in meeting that standard. Performance based activities can integrate two
or more subjects and should also meet 21st Century expectations whenever possible:
• Creativity and Innovation
• Critical Thinking and Problem Solving
• Communication and Collaboration
There are also Information Literacy standards and Media Literacy standards that can be incorporated
into performance based learning.
Performance-based activities can be quite challenging for students to complete. They need to
understand from the beginning exactly what is being asked of them and how they will be
assessed.
Exemplars and models may help, but it is more important to provide detailed criteria that will be used to assess the performance. Those criteria should be incorporated into a scoring rubric.
Observations are an important part of evaluating performance-based assessments. Observations
can be used to provide students with feedback to improve performance. Teachers and students
can both use observations. There may be peer to peer student feedback. There could be a
checklist or a tally in order to record performance.
Students can take their experiences in performance-based learning to use at later points in their
educational, personal, or professional lives. The goal of performance-based learning should be to
enhance what the students have learned, not just have them recall facts.
Following are six different types of activities that can be developed as assessments for
performance-based learning.
Presentations
One easy way to have students complete a performance-based activity is to have them do a
presentation or report of some kind. This could be done by students individually, which takes more time, or in collaborative groups.
The basis for the presentation may be one of the following:
• Providing Information
• Teaching a Skill
• Reporting Progress
• Persuading Others
Students may choose to add visual aids, such as a PowerPoint presentation or Google Slides, to help illustrate elements of their speech. Presentations work well across the curriculum as long as there
is a clear set of expectations for students to work with from the beginning.
Portfolios
Student portfolios can include items that students have created and/or collected over a specific
period of time. Art portfolios are often used for students who want to apply to art programs in
college.
Another example is when students create a portfolio of their written work that shows how they
have progressed from the beginning to the end of class. This writing in a portfolio can be from
any discipline or from a combination of disciplines.
Some teachers have students select those items they feel represent their best work to be included
in a portfolio. The benefit of an activity like this is that it is something that grows over time and
is therefore not just completed and forgotten. A portfolio can provide students with a lasting
selection of artifacts that they can use later in their academic career.
Reflections may be included in student portfolios, in which students make note of their growth based on the materials collected. Portfolios may also include taped presentations, dramatic readings, or digital files.
Performances
Dramatic performances are one kind of collaborative activity that can be used as a performance-based assessment. Students can create, perform, and/or provide a critical response. Examples include dance, recitals, dramatic enactments, and prose or poetry interpretation.
This form of performance based assessment can take time, so there must be a clear pacing guide.
Students must be provided time to address the demands of the activity; resources must be readily
available and meet all safety standards. Students should have opportunities to draft stage work
and practice.
Developing the criteria and the rubric, and sharing these with students beforehand, is critical when assessing a dramatic performance.
Projects
Projects are quite commonly used by teachers as performance-based activities. They can include
everything from research papers to artistic representations of information learned. Projects
may require students to apply their knowledge and skills while completing the assigned task,
using creativity, critical thinking, analysis, and synthesis.
Students might be asked to complete reports, diagrams, and maps. Teachers can also choose to
have students work individually or in groups.
Journals may be part of a performance-based assessment. Journals can be used to record student
reflections. Teachers may require students to complete journal entries. Some teachers may use
journals as a way to record participation.
Exhibits And Fairs
Teachers can expand the idea of performance-based activities by creating exhibits or fairs for
students to display their work. Examples range from history fairs to art exhibitions.
Students work on a product or item that will be publicly exhibited.
Exhibitions show in-depth learning and may include feedback from viewers.
In some cases, students might be required to explain or 'defend' their work to those attending the
exhibition.
Some fairs like science fairs could include the possibility of prizes and awards.
Debates
A debate in the classroom is one form of performance-based learning that teaches students about
varied viewpoints and opinions. Skills associated with debate include research, media and argument literacy, reading comprehension, evidence evaluation, public speaking, and civic skills.
There are many different debate formats. One is the fishbowl debate, in which a handful of students sit in a half-circle facing the other students and debate a topic. The rest of the classmates may pose questions to the panel.
Another form is a mock trial where teams representing the prosecution and defense take on the
roles of attorneys and witnesses. A judge, or judging panel, oversees the courtroom presentation.
Middle schools and high schools can use debates in the classroom, with increasing levels of
sophistication by grade level.
Student Logs
Documenting student participation in physical activity (NASPE Standard 3) is often difficult.
Teachers can assess participation in an activity or skill practice trials completed outside of class
using logs. Practice trials during class that demonstrate student effort can also be documented
with logs. A log records behaviors over a period of time (see figure 14.1). Often the information
recorded shows changes in behavior, trends in performance, results of participation, progress, or
the regularity of physical activity. A student log is an excellent artifact for use in a portfolio.
Because logs are usually self-recorded documents, they are not used for summative assessments unless included as an artifact in a portfolio or for a project. If teachers want to increase the importance placed on a log, a method of verification by an adult or someone in authority should be added.
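A minimal sketch of how a self-recorded activity log might be stored and summarised (the entries, dates, and the verification flag are invented for illustration):

```python
from datetime import date

# Invented log entries: (date, minutes of activity, verified-by-adult flag).
log = [
    (date(2024, 3, 4), 30, True),
    (date(2024, 3, 6), 45, True),
    (date(2024, 3, 9), 20, False),
]

total_minutes = sum(minutes for _, minutes, _ in log)
verified_share = sum(1 for _, _, verified in log if verified) / len(log)
print(f"{len(log)} sessions, {total_minutes} minutes in total, {verified_share:.0%} verified by an adult")
```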
Journals
Journals can be used to record student feelings, thoughts, perceptions, or reflections about actual
events or results. The entries in journals often report social or psychological perspectives, both
positive and negative, and may be used to document the personal meaning associated with one’s
participation (NASPE Standard 6). Journal entries would not be an appropriate summative
assessment by themselves, but might be included as an artifact in a portfolio. Journal entries are
excellent ways for teachers to “take the pulse” of a class and determine whether students are
valuing the content of the class. Teachers must be careful not to assess affective domain journal
entries for the actual content, because doing so may cause students to write what teachers want to
hear (or give credit for) instead of true and genuine feelings. Teachers could hold students
accountable for completing journal entries. Some teachers use journals as a way to log
participation over time.
Using Observation in the Assessment Process
Human performance provides many opportunities for students to exhibit behaviors that may be
directly observed by others, a unique advantage of working in the psychomotor domain. Wiggins
(1998) uses physical activity when providing examples to illustrate complex assessment
concepts, as they are easier to visualize than would be the case with a cognitive example. The
nature of performing a motor skill makes assessment through observational analysis a logical
choice for many physical education teachers. In fact, investigations of measurement practices of
physical educators have consistently shown a reliance on observation and related assessment
methods (Hensley and East 1989; Matanin and Tannehill 1994; Mintah 2003).
Observation is a skill used with several performance-based assessments. It is often used to
provide students with feedback to improve performance. However, without some way to record
results, observation alone is not an assessment. Going back to the definition of assessment
provided earlier in the chapter, assessment is the gathering of information, analyzing the data,
and then using the information to make an evaluation. Therefore, some type of written product
must be produced if the task is to be considered an assessment.
Teachers and peers can assess others using observation. They might use a checklist or some type
of event recording scheme to tally the number of times a behavior occurred. Keeping game play
statistics is an example of recording data using event recording techniques. Students can self-
analyze their own performance and record their performances using criteria provided on a
checklist or a game play rubric. Table 14.1 is an example of a recording form that could be used
for peer assessment.
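A minimal sketch of event recording, tallying invented game-play behaviours with a simple counter:

```python
from collections import Counter

# Invented event-recording data from observing one student's game play.
observed_events = [
    "successful pass", "successful pass", "missed pass",
    "successful pass", "interception", "missed pass",
]

tally = Counter(observed_events)
for behaviour, count in tally.items():
    print(f"{behaviour}: {count}")
```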
When using peer assessment, it is best to have the assessor do only the assessment. When the
person recording assessment results is also expected to take part in the assessment (e.g., tossing
the ball to the person being assessed), he or she cannot both toss and do an accurate observation.
In the case of large classes, teachers might even use groups of four, in which one person is being
evaluated, a second person is feeding the ball, the third person is doing the observation, and a
fourth person is recording the results.
Guidelines for developing effective performance assessment
The performance development system at Wellesley College is designed to provide alignment
between the College’s mission, constituent needs and performance expectations. The program
fosters ongoing two-way communication between employees and managers; supports the
development of clear, consistent, and measurable goals linked directly to Wellesley’s core values
and competencies; helps to articulate and support training needs and career development; and
establishes the criteria for making reward and recognition decisions.
Effective performance development at Wellesley College begins with respect for one another and
ends with excellence in performance. It is the responsibility of every supervisor to communicate
on an ongoing basis with their employees. These conversations should provide clear and honest
role expectations and feedback and should help identify improvement, development, and career
issues. Each employee has a responsibility to participate fully in these conversations, be sure
they understand their role responsibilities and expectations, and communicate any obstacles or
training needed in order to perform their role at an optimum level.
The Performance Development Annual Summary Meeting
Performance development should be happening all year long. When a manager compliments an
employee for a job well done or coaches an employee through a difficult situation, that is part of
performance development. Wellesley’s performance development process includes a summary
review assessment that should bring closure to the performance period and provide a basis for
performance development for the next period. The following suggestions help set the stage for a
productive discussion.
1. Establish the proper climate.
Create a sincere, open, and constructive atmosphere.
Schedule the meeting in advance and stick to it.
Allow enough time to discuss the review.
Locate a private space and guard against interruptions.
2. Make it clear that this is a joint discussion.
Listen and ask for the employee’s opinion.
Avoid words or body language that criticize the employee’s view.
Understand your employee’s point of view. Working together is better than being at odds.
Be willing to modify the Performance Development Document to reflect what is discussed and
agreed upon at the meeting.
3. Discuss the role document and performance requirements.
Explore the competencies required for successful performance.
Update the role document if needed.
4. Discuss goals for the performance review period.
Review whether the goals were met.
Discuss obstacles and roadblocks that affected goal achievement.
5. Discuss opportunities for growth and development in the current role or a different role.
Discuss the employee’s developmental and career goals.
Remember there is also the opportunity for growth and development within the current role.
There are new things to be done and more effective and efficient ways to accomplish work.
Either at this meeting or a separate meeting, develop goals for the coming year. Refer to
Guidelines for Setting Goals and Objectives for additional information on setting goals.
Remember, performance development is about ongoing two-way communication between the
employee and their supervisor. The annual performance appraisal should be a summary of
various meetings throughout the year (interim goal reviews/updates). There should be no
surprises at this summary meeting.
Preparing for Annual Performance Development Discussions
Tips for the Employee: Employees have a responsibility in the performance development process
and should be prepared to give feedback to their manager.
Review your current role document. Does it reflect your current role in the department? If not,
discuss with your supervisor about revising your role document.
Review your goals for the year. Have they been met? Review your achievements. Think about
obstacles/roadblocks you encountered and how you dealt with them.
Is there anyone else your supervisor should speak with before preparing your evaluation? Let your
supervisor know this before the review meeting.
Review the competencies required for administrative staff positions at Wellesley. Identify specific
areas of expertise or skills that you would like to develop or improve. Identify your strengths. In
what areas have you improved? Can you identify any developmental goals for the coming year?
What ideas do you have for changes that would help you perform your role better and/or improve the
operation of the department? Think about obstacles/roadblocks that you face in performing your
responsibilities and what help is needed from your supervisor to overcome them.
If you manage others, what have you done to develop/strengthen your staff’s performance and
skills?
Tips for the Supervisor: The supervisor is responsible for ongoing communication about
performance throughout the year. Performance problems should be addressed as they occur. There
should be no surprises in the end-of-the-year summary. The supervisor is responsible for preparing
the summary documentation.
Review the employee’s role document. Does it reflect their current role in the department?
Review the primary position responsibilities. Has the employee effectively performed these? What is
your overall assessment of how these responsibilities were performed?
Review the employee’s goals from last year. Were goals modified or changed during the review
period? Have the goals been met? Have you been able to provide the employee with the tools and
support to get the job done?
Review last year’s appraisal. How does this year compare to last year? Have there been
improvements?
Consider whether you need to speak with anyone else in order to have a more complete and accurate
picture of your employee’s performance.
Review the competencies required for administrative staff roles at Wellesley. Assess the employee’s
strengths, weaknesses and areas of greatest improvement. Is there a specific area where you would
like to establish a developmental goal?
What suggestions do you have for the employee that will help improve their performance in their
role or the overall operations of the department?
If the employee supervises others, discuss what he or she has done to strengthen their own staff. Ask
about regular communication of information, job expectations, and feedback.
Contact the Human Resources Office for assistance if substantial performance issues exist.
Finalizing the Performance Development Document
The supervisor is responsible for completing the final draft of the Performance Development
Document and forwarding the completed document to Human Resources to become part of the
employee’s personnel file. Send a hard copy so that signatures are included.
The supervisor should provide a copy of the final Performance Development Document to the
employee.
The employee should sign the Performance Development Document. Signing the Performance
Development Document indicates that the employee has met with their supervisor to provide input to
the document, that they have reviewed the document, and that they have met with the supervisor to
discuss it. The employee has the right to respond to the evaluation in writing.
Tips on Ongoing, Effective Feedback
Feedback involves treating each other with respect.
Constructive feedback tries to reinforce the positive and change the negative by:
Identifying what was done well or poorly.
Describing what action or behavior is desired.
Explaining the effects of the observed and desired acts of behavior.
Good feedback is timely. Give the feedback as quickly as possible after the event. Feedback long
delayed is rarely effective.
Feedback involves both parties listening carefully. Check for clarity to ensure that the receiver fully
understands what is being said.
Good feedback should be specific. Generalized feedback does not explain what behavior to repeat or
avoid. Describe exactly what was done well and/or what could be improved. For example, “This
report is well organized and the summary clearly states your conclusions and proposed actions”
rather than “Good report.”
Keep feedback objective. Use factual records and information whenever possible. Include details
that focus on specific actions and results rather than characteristics of the employee. For example,
say “this happened” rather than “you are.” “You hung up the phone without saying good-bye.” rather
than “you are rude.”
Feedback about performance issues is best delivered in person. The employee will have a chance to
respond to any issues raised. Especially avoid delivering negative feedback via e-mail messages.
Performance criteria
The National Center for Research, Evaluation, Standards, and Student Testing (1996) defines
criteria as "guidelines, rules, characteristics, or dimensions that are used to judge the quality of
student performance. Criteria indicate what we value in student responses, products, or
performances."
With performance assessments such as a lab, group project, portfolio, task, or presentation, students
need to clearly know and understand what performance criteria will be used to judge their
performance. Although student interpretations are important, educators need to recognize that on the
basis of cultural and environmental norms, explanations that seem diametrically opposed may be
equally defensible or right.
Because this quality of complexity allows performance assessments to mirror real life, educators
need to explicitly include the exact parameters of the responses they want to elicit in each
assessment task or problem. (For example, educators should make sure students know if the writing
process--rather than punctuation and grammar--is the criterion on which performance will be judged,
or if a paragraph--as opposed to a few words--is the criterion response.)
The problem of interpretation differences that result when performance criteria (requirements) are
ambiguous is compounded when students have diverse experiences based on their ethnicity, primary
language, or gender. In an effort to assess higher-order cognitive skills and complex problem
solving, educators need to develop appropriate learning assessments that have no single right answer
and in which students' interpretation of information or evidence is key in defending their solution.
Scoring Rubrics
Scoring rubrics are descriptive scoring schemes developed to assess any student performance
whether it's written or oral, online or face-to-face. Scoring rubrics are especially well suited for
evaluating complex tasks or assignments such as: written work (e.g., assignments, essay tests,
papers, portfolios); presentations (e.g., debates, role plays); group work; or other types of work
products or performances (e.g., artistic works, portfolios). Scoring rubrics are assignment-
specific; criteria are different for each assignment or test. It is a way to make your criteria and
standards clear to both you and your students.
Good scoring rubrics:
• Consist of a checklist of items, each with an even number of points. For example, two-
point rubrics would indicate that the student either did or did not perform the specified
task. Four or more points in a rubric are common and indicate the degree to which a
student performed a given task.
• Are criterion based. That is, the rubric contains descriptive criteria for acceptable
performance that are meaningful, clear, concise, unambiguous, and credible--thus
ensuring inter-rater reliability.
• Are used to assess only those behaviors that are directly observable.
• Require a single score based on the overall quality of the work or presentation.
• Provide a better assessment and understanding of expected or actual performance.
(A sample rubric for quizzes and homework is not reproduced here.)
Why Develop Scoring Rubrics?
Here are some reasons why taking the time to construct a grading rubric will be worth your time:
• Make grading more consistent and fair.
• Save you time in the grading process.
• Help identify students' strengths and weaknesses so you can teach more effectively.
• Help students understand what and how they need to improve.
Guidelines for Developing a Scoring Rubric
Step 1: Select a project/assignment for assessment.
Example: Work in small groups to write and present a collaborative research paper.
Step 2: What performance skill(s) or competency(ies) are students demonstrating through their
work on this project?
Example: Ability to work as part of a team.
Step 3: List the traits you'll assess when evaluating the project--in other words, ask: "What
counts in my assessment of this work?" Use nouns or noun phrases to name traits, and avoid
evaluative language. Limit the number of traits to no more than seven. Each trait should
represent a key teachable attribute of the overall skill you're assessing.
Example:
Content
Coherence and Organization
Creativity
Graphics and visuals
Delivery
Step 4: Decide on the number of gradations of mastery you'll establish for each trait and the
language you'll use to describe those levels.
Five points of gradation:
5=Proficient 4=Clearly Competent 3=Acceptable 2=Limited 1=Attempted
Four points of gradation:
Exceptional/Excellent Admirable/Good Acceptable/Fair Amateur/Poor
Step 5: For each trait write statements that describe work at each level of mastery. If, for
example, you have seven traits and five gradations, you'll have 35 descriptive statements in your
rubric. Attempt to strike a balance between over-generalizations and task-specificity. For the trait
"coherence and organization" in a four-point rubric:
Exceptional:
Thesis is clearly stated and developed; specific examples are
appropriate and clearly develop thesis; conclusion is clear; ideas flow
together well; good transitions; succinct but not choppy; well-
organized.
Admirable:
Most information presented in logical sequence; generally very
organized, but better transitions between ideas are needed.
Acceptable:
Concept and ideas are loosely connected; lacks clear transitions; flow
and organization are choppy.
Amateur:
Presentation of ideas is choppy and disjointed; doesn't flow;
development of thesis is vague; no apparent logical order to writing.
Step 6: Design a format for presenting the rubric to students and for scoring student work.
Step 7: Test the rubric and fine tune it based on feedback from colleagues and students.
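As an illustration only (the level wording below is condensed from the "coherence and organization" example above, and the "delivery" descriptors are invented), a rubric of this kind can be represented and used to total a score as follows:

```python
# Illustrative rubric: one trait condensed from the example above, one invented.
rubric = {
    "coherence and organization": {
        4: "Thesis clearly stated and developed; ideas flow well; well organized.",
        3: "Mostly logical sequence; generally organized but transitions need work.",
        2: "Ideas loosely connected; lacks clear transitions; choppy flow.",
        1: "Choppy and disjointed; vague thesis; no apparent logical order.",
    },
    "delivery": {
        4: "Clear, confident, well paced.",
        3: "Generally clear; occasional lapses in pacing.",
        2: "Frequently unclear or rushed.",
        1: "Difficult to follow throughout.",
    },
}

# A scorer assigns one level per trait; the overall score is the total across traits.
scores = {"coherence and organization": 3, "delivery": 4}
total = sum(scores.values())
maximum = sum(max(levels) for levels in rubric.values())
print(f"total: {total}/{maximum}")
for trait, level in scores.items():
    print(f"- {trait} ({level}): {rubric[trait][level]}")
```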
Check list and rating scale
What is a checklist?
A checklist is just what it sounds like: a list that educators check off. Using this method is a little
bit like going bird watching. Start with a list of items you want to observe and then check off
each item when appropriate.
One popular choice for educators is to use developmental checklists to record what they have
observed about individual children; these developmental checklists consist of lists of skills from
the different developmental domains for a specific age range.
Why use checklists?
Checklists are quick and easy to use, so they are popular with educators. They can be used to
record observations in virtually any situation, and do not require the educator to spend much time
recording data; in general, a few moments is all it takes. One other advantage is that there are
many different pre-made checklists available for use from a variety of sources. For example,
certain websites connected with early childhood education (ECE) offer developmental checklists that educators can
download and print out. Educators can also create a checklist that exactly meets their needs,
depending on what they want to observe and record.
How do I use a checklist?
As it is such a popular choice for educators, the example we will present here shows how to use a
developmental checklist. These developmental checklists are generally used to record
observations of one child at a time. The list of skills is targeted for a specific age group (e.g. 12
to 24 months). They may be divided into the different developmental domains or focus only on
one aspect of a child’s development.
Once you have chosen or created a checklist, you then observe the child in a variety of natural
contexts and check off all the relevant skills or behaviours. Usually, there is a space to indicate
the relevant date(s) on the checklist, as this might be an important piece of data.
As the checklist method does not allow for the recording of a lot of qualitative data, you might
choose to have a column for comments.
Sample checklist for language development: Two-year-olds
A blank checklist could look something like this:
Child’s Name: Alan
Behaviour/Skill                            Date             Comments
Communicates with gestures and pointing
Shakes head for no
Uses one-word sentences
Uses two-word sentences
Names familiar objects
Follows simple instructions
Enjoys songs and rhymes
Refers to self as "me" or "I"
Once you begin filling in the checklist, it will start to look something like this:
Child’s Name: Alan
Behaviour/Skill                            Date             Comments
Communicates with gestures and pointing    March 9, 2012
Shakes head for no                         March 9, 2012
Uses one-word sentences                    March 10, 2012
Uses two-word sentences                    March 29, 2012   "My book"
Names familiar objects
Follows simple instructions                April 15, 2012
Enjoys songs and rhymes                    March 5, 2012    Loves Hokey Pokey
Refers to self as "me" or "I"              March 20, 2012   Taps self on chest, says "Ayan"
Note that, in general, behaviours and/or skills that you have not yet observed, or that the child
has not yet mastered, are left blank, so that you can update the checklist as needed.
In some cases, you may want to add a comment like the one in the last box in the sample above.
In this example, Alan’s strategies for referring to himself are significant, even if he is not yet
demonstrating the specific behaviour from the checklist.
Using a rating scale
Sometimes educators feel limited by a checklist because this method only allows the observer to
record if a child uses a specific skill or not. In this case, they might choose to add a rating scale
to their observations. By adding a rating scale, an educator can rate the quality, frequency or ease
with which a child uses a certain skill.
If you were to add a rating scale to your checklist, it might look like this:
Child’s Name: Alan
Date: March/April 2012
Behaviour/Skill                            Usually   Frequently   Rarely   Never   Comments
Communicates with gestures and pointing
Shakes head for no
Uses one-word sentences
Uses two-word sentences
Names familiar objects
Follows simple instructions
Enjoys songs and rhymes
Refers to self as "me" or "I"
Once you begin filling it in, it could look something like this:
Child’s Name: Alan
Date: March/April 2012
Behaviour/Skill                            Usually   Frequently   Rarely   Never   Comments
Communicates with gestures and pointing
Shakes head for no
Uses one-word sentences
Uses two-word sentences                                                            "My book"
Names familiar objects
Follows simple instructions
Enjoys songs and rhymes
Refers to self as "me" or "I"                                                      Taps self on chest, says "Ayan"
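A checklist or rating scale like the ones above could also be kept digitally. The sketch below is illustrative only; the ratings shown are invented, since the sample above records only dates and comments:

```python
# Illustrative digital record of a developmental checklist with a rating scale.
# The ratings below are invented; only the dates and comments come from the sample above.
checklist = {
    "child": "Alan",
    "period": "March/April 2012",
    "items": {
        "Uses two-word sentences": {"rating": "Frequently", "date": "March 29, 2012", "comment": '"My book"'},
        'Refers to self as "me" or "I"': {"rating": "Rarely", "date": "March 20, 2012",
                                          "comment": 'Taps self on chest, says "Ayan"'},
        "Names familiar objects": {"rating": None, "date": None, "comment": None},  # not yet observed
    },
}

# Skills left blank (not yet observed or mastered) can be listed for follow-up observation.
to_observe = [skill for skill, record in checklist["items"].items() if record["rating"] is None]
print("Still to observe:", ", ".join(to_observe))
```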
Purpose of portfolio
A student portfolio is a compilation of academic work and other forms of educational evidence
assembled for the purpose of (1) evaluating coursework quality, learning progress, and academic
achievement; (2) determining whether students have met learning standards or other academic
requirements for courses, grade-level promotion, and graduation; (3) helping students reflect on
their academic goals and progress as learners; and (4) creating a lasting archive of academic
work products, accomplishments, and other documentation. Advocates of student portfolios
argue that compiling, reviewing, and evaluating student work over time can provide a richer,
deeper, and more accurate picture of what students have learned and are able to do than more
traditional measures—such as standardized tests, quizzes, or final exams—that only measure
what students know at a specific point in time.
Portfolios come in many forms, from notebooks filled with documents, notes, and graphics to
online digital archives and student-created websites, and they may be used at the elementary,
middle, and high school levels. Portfolios can be a physical collection of student work that
includes materials such as written assignments, journal entries, completed tests, artwork, lab
reports, physical projects (such as dioramas or models), and other material evidence of learning
progress and academic accomplishment, including awards, honors, certifications,
recommendations, written evaluations by teachers or peers, and self-reflections written by
students. Portfolios may also be digital archives, presentations, blogs, or websites that feature the
same materials as physical portfolios, but that may also include content such as student-created
videos, multimedia presentations, spreadsheets, websites, photographs, or other digital artifacts
of learning.
Online portfolios are often called digital portfolios or e-portfolios, among other terms. In some
cases, blogs or online journals may be maintained by students and include ongoing reflections
about learning activities, progress, and accomplishments. Portfolios may also be presented—
publicly or privately—to parents, teachers, and community members as part of a demonstration
of learning, exhibition, or capstone project.
It’s important to note that there are many different types of portfolios in education, and each
form has its own purpose. For example, “capstone” portfolios would feature student work
completed as part of long-term projects or final assessments typically undertaken at the
culmination of a middle school or high school, or at the end of a long-term, possibly multiyear
project. Some portfolios are only intended to evaluate learning progress and achievement in a
specific course, while others are maintained for the entire time a student is enrolled in a school.
And some portfolios are used to assess learning in a specific subject area, while others evaluate
the acquisition of skills that students can apply in all subject areas.
The following arguments are often made by educators who advocate for the use of portfolios in
the classroom:
• Student portfolios are most effective when they are used to evaluate student learning
progress and achievement. When portfolios are used to document and evaluate the
knowledge, skills, and work habits students acquire in school, teachers can use them to
adapt instructional strategies when evidence shows that students either are or are not
learning what they were taught. Advocates typically contend that portfolios should be
integrated into and inform the instructional process, and students should incrementally build
out portfolios on an ongoing basis—i.e., portfolios should not merely be an idle archive of
work products that’s only reviewed at the end of a course or school year.
• Portfolios can help teachers monitor and evaluate learning progress over time. Tests
and quizzes give teachers information about what students know at a particular point in
time, but portfolios can document how students have grown, matured, and improved as
learners over the course of a project, school year, or multiple years. For this reason, some
educators argue that portfolios should not just be compilations of a student’s best work, but
rather they should include evidence and work products that demonstrate how students
improved over time. For example, multiple versions of an essay can show how students
revised and improved their work based on feedback from the teachers or their peers.
• Portfolios help teachers determine whether students can apply what they have learned
to new problems and different subject areas. A test can help teachers determine, for
example, whether students have learned a specific mathematical skill. But can those students
also apply that skill to a complex problem in economics, geography, civics, or history? Can
they use it to conduct a statistical analysis of a large data set in a spreadsheet? Or can they
use it to develop a better plan for a hypothetical business? (Educators may call this ability to
apply skills and knowledge to novel problems and different domains “transfer of
learning”). Similarly, portfolios can also be used to evaluate student work and learning in
non-school contexts. For example, if a student participated in an internship or completed a
project under the guidance of an expert mentor from the community, students could create
portfolios over the course of these learning activities and submit them to their teachers or
school as evidence they have met certain learning expectations or graduation requirements.
• Portfolios can encourage students to take more ownership and responsibility over the
learning process. In some schools, portfolios are a way for students to critique and evaluate
their own work and academic progress, often during the process of deciding what will be
included in their portfolios. Because portfolios document learning growth over time, they
can help students reflect on where they started a course, how they developed, and where
they ended up at the conclusion of the school year. When reviewing a portfolio, teachers
may also ask students to articulate the connection between particular work products and the
academic expectations and goals for a course. For these reasons, advocates of portfolios
often recommend that students be involved in determining what goes into a portfolio, and
that teachers should not unilaterally make the decisions without involving students. For
related discussions, see student engagement and student voice.
• Portfolios can improve communication between teachers and parents. Portfolios can
also help parents become more informed about the education and learning progress of their
children, what is being taught in a particular course, and what students are doing and
learning in the classroom. Advocates may also contend that when parents are more informed
about and engaged in their child’s education, they can play a more active role in supporting
their children at home, which could have a beneficial effect on academic achievement and
long-term student outcomes.
Debate
While portfolios are not generally controversial in concept, it’s possible that skepticism,
criticism, and debate may arise if portfolios are viewed as burdensome, add-on requirements
rather than as a vital instructional strategy and assessment option. Portfolios may also be viewed
negatively if they are poorly designed and executed, if they tend to be filed away and forgotten,
if they are not actively maintained by students, if they are not meaningfully integrated into the
school’s academic program, if educators do not use them to inform and adjust their instructional
techniques, or if sufficient time is not provided during the school day for teachers and students to
review and discuss them. In short, how portfolios are actually used or not used in schools, and
whether they produce the desired educational results, will likely determine how they are
perceived.
Creating, maintaining, and assessing student portfolios can also be a time-consuming endeavor.
For this reason and others, some critics may contend that portfolios are not a practical or feasible
option for use in large-scale evaluations of school and student performance. (Just imagine, for
example, what it would require in terms of funding, time, and human resources to evaluate
dozens or hundreds of pages of academic documentation produced by each of the tens of
thousands of eleventh-grade students scattered across a state in any given year.)
Standardized tests, in contrast, are relatively efficient and inexpensive to score, and test results
are considered more reliable or comparable across students, schools, or states, given that there is
less chance that error, bias, or inconsistency may occur during the scoring process (in large part
because most standardized tests today are scored in full or in part by automated machines,
computers, or online programs). Student portfolios are a comparatively time-consuming—and therefore far more expensive—assessment strategy because they require human scorers, and it is also far more challenging to maintain consistent and reliable evaluations of student achievement
across different scorers. Many advocates would argue, however, that portfolios are not intended
for use in large-scale evaluations of school and student performance, and that they provide the
greatest educational value at the classroom level where teachers have personal relationships and
conversations with students, and where in-depth feedback from teachers can help students grow,
improve, and mature as learners.
Evaluation criteria and using portfolios in instruction and
communication
WHAT IS PORTFOLIO ASSESSMENT?
In program evaluation as in other areas, a picture can be worth a thousand words. As an
evaluation tool for community-based programs, we can think of a portfolio as a kind of
scrapbook or photo album that records the progress and activities of the program and its
participants, and showcases them to interested parties both within and outside of the program.
While portfolio assessment has been predominantly used in educational settings to document the
progress and achievements of individual children and adolescents, it has the potential to be a
valuable tool for program assessment as well.
Many programs do keep such albums, or scrapbooks, and use them informally as a means of
conveying their pride in the program, but most do not consider using them in a systematic way as
part of their formal program evaluation. However, the concepts and philosophy behind portfolios
can apply to community evaluation, where portfolios can provide windows into community
practices, procedures, and outcomes, perhaps better than more traditional measures.
Portfolio assessment has become widely used in educational settings as a way to examine and
measure progress, by documenting the process of learning or change as it occurs. Portfolios
extend beyond test scores to include substantive descriptions or examples of what the student is
doing and experiencing. Fundamental to "authentic assessment" or "performance assessment" in
educational theory is the principle that children and adolescents should demonstrate, rather than
tell about, what they know and can do (Cole, Ryan, & Kick, 1995). Documenting progress
toward higher order goals such as application of skills and synthesis of experience requires
obtaining information beyond what can be provided by standardized or norm-based tests. In
"authentic assessment", information or data is collected from various sources, through multiple
methods, and over multiple points in time (Shaklee, Barbour, Ambrose, & Hansford, 1997).
Contents of portfolios (sometimes called "artifacts" or "evidence") can include drawings, photos,
video or audio tapes, writing or other work samples, computer disks, and copies of standardized
or program-specific tests. Data sources can include parents, staff, and other community members
who know the participants or program, as well as the self-reflections of participants themselves.
Portfolio assessment provides a practical strategy for systematically collecting and organizing
such data.
PORTFOLIO ASSESSMENT IS MOST USEFUL FOR:
*Evaluating programs that have flexible or individualized goals or outcomes. For example,
within a program with the general purpose of enhancing children's social skills, some individual
children may need to become less aggressive while other, shy children may need to become more assertive. Each child's portfolio assessment would be geared to his or her individual needs and goals.
*Allowing individuals and programs in the community (those being evaluated) to be involved in
their own change and decisions to change.
*Providing information that gives meaningful insight into behavior and related change. Because
portfolio assessment emphasizes the process of change or growth, at multiple points in time, it
may be easier to see patterns.
*Providing a tool that can ensure communication and accountability to a range of audiences.
Participants, their families, funders, and members of the community at large who may not have
much sophistication in interpreting statistical data can often appreciate more visual or
experiential "evidence" of success.
*Allowing for the possibility of assessing some of the more complex and important aspects of
many constructs (rather than just the ones that are easiest to measure).
PORTFOLIO ASSESSMENT IS NOT AS USEFUL FOR:
*Evaluating programs that have very concrete, uniform goals or purposes. For example, it would
be unnecessary to compile a portfolio of individualized "evidence" in a program whose sole
purpose is full immunization of all children in a community by the age of five years. The
required immunizations are the same, and the evidence is generally clear and straightforward.
*Allowing you to rank participants or programs in a quantitative or standardized way (although
evaluators or program staff may be able to make subjective judgements of relative merit).
*Comparing participants or programs to standardized norms. While portfolios can (and often do)
include some standardized test scores along with other kinds of "evidence", this is not the main
purpose of the portfolio.
USING PORTFOLIO ASSESSMENT WITH THE STATE STRENGTHENING
EVALUATION GUIDE
Tier 1 - Program Definition
Using portfolios can help you to document the needs and assets of the community of interest.
Portfolios can also help you to clarify the identity of your program and allow you to document
the "thinking" behind the development of and throughout the program. Ideally, the process of
deciding on criteria for the portfolio will flow directly from the program objectives that have
been established in designing the program. However, in a new or existing program where the
original objectives are not as clearly defined as they need to be, program developers and staff
may be able to clarify their own thinking by visualizing what successful outcomes would look
like, and what they would accept as "evidence". Thus, thinking about portfolio criteria may
contribute to clearer thinking and better definition of program objectives.
Tier 2 - Accountability
Critical to any form of assessment is accountability. In the educational arena for example,
teachers are accountable to themselves, their students, families, schools, and society.
The portfolio is an assessment practice that can inform all of these constituents. The process of
selecting "evidence" for inclusion in portfolios involves ongoing dialogue and feedback between
participants and service providers.
Tier 3 - Understanding and Refining
Portfolio assessment of the program or participants provides a means of conducting assessments
throughout the life of the program, as the program addresses the evolving needs and assets of
participants and of the community involved. This helps to maintain focus on the outcomes of the
program and the steps necessary to meet them, while ensuring that the implementation is in line
with the vision established in Tier 1.
Tier 4 - Progress Toward Outcomes
Items are selected for inclusion in the portfolio because they provide "evidence" of progress
toward selected outcomes. Whether the outcomes selected are specific to individual participants
or apply to entire communities, the portfolio documents steps toward achievement. Usually it is
most helpful for this selection to take place at regular intervals, in the context of conferences or
discussions among participants and staff.
Tier 5 - Program Impact
One of the greatest strengths of portfolio assessment in program evaluation may be its power as a
tool to communicate program impact to those outside of the program. While this kind of data
may not take the place of statistics about numbers served, costs, or test scores, many policy
makers, funders, and community members find visual or descriptive evidence of successes of
individuals or programs to be very persuasive.
ADVANTAGES OF USING PORTFOLIO ASSESSMENT
*Allows the evaluators to see the student, group, or community as an individual, each unique with
its own characteristics, needs, and strengths.
*Serves as a cross-section lens, providing a basis for future analysis and planning. By viewing
the total pattern of the community or of individual participants, one can identify areas of
strengths and weaknesses, and barriers to success.
*Serves as a concrete vehicle for communication, providing ongoing communication or
exchanges of information among those involved.
*Promotes a shift in ownership; communities and participants can take an active role in
examining where they have been and where they want to go.
*Portfolio assessment offers the possibility of addressing shortcomings of traditional assessment.
It offers the possibility of assessing the more complex and important aspects of an area or topic.
*Covers a broad scope of knowledge and information, from many different people who know the
program or person in different contexts (e.g., participants, parents, teachers or staff, peers, or
community leaders).
DISADVANTAGES OF USING PORTFOLIO ASSESSMENT
*May be seen as less reliable or fair than more quantitative evaluations such as test scores.
*Can be very time consuming for teachers or program staff to organize and evaluate the contents,
especially if portfolios have to be done in addition to traditional testing and grading.
*Having to develop your own individualized criteria can be difficult or unfamiliar at first.
*If goals and criteria are not clear, the portfolio can be just a miscellaneous collection of artifacts
that don't show patterns of growth or achievement.
*Like any other form of qualitative data, data from portfolio assessments can be difficult to
analyze or aggregate to show change.
HOW TO USE PORTFOLIO ASSESSMENT
Design and Development
Three main factors guide the design and development of a portfolio: 1) purpose, 2) assessment
criteria, and 3) evidence (Barton & Collins, 1997).
1) Purpose
The primary concern in getting started is knowing the purpose that the portfolio will serve.
This decision defines the operational guidelines for collecting materials. For example, is the goal
to use the portfolio as data to inform program development? To report progress? To identify
special needs? For program accountability? For all of these?
2) Assessment Criteria
Once the purpose or goal of the portfolio is clear, decisions are made about what will be
considered success (criteria or standards) and what strategies are necessary to meet the goals. Items are then selected for inclusion in the portfolio because they provide evidence of meeting
criteria, or making progress toward goals.
3) Evidence
In collecting data, many things need to be considered. What sources of evidence should be used?
How much evidence do we need to make good decisions and determinations? How often should
we collect evidence? How congruent should the sources of evidence be? How can we make
sense of the evidence that is collected? How should evidence be used to modify the program and the evaluation? According to Barton and Collins (1997), evidence can include artifacts (items
produced in the normal course of classroom or program
activities), reproductions (documentation of interviews or projects done outside of the
classroom or program), attestations (statements and observations by staff or others about the
participant), and productions (items prepared especially for the portfolio, such as participant
reflections on their learning or choices). Each item is selected because it adds some new
information related to attainment of the goals.
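The four categories of evidence are easier to manage when items are catalogued consistently. The sketch below is a minimal, hypothetical illustration of such a catalogue; the class, field names, and sample entries are invented for this example and are not taken from Barton and Collins (1997).

```python
# Illustrative sketch only: a small catalogue of portfolio items tagged by
# Barton and Collins' four evidence categories and by the goal each item documents.
# Field names and sample entries are hypothetical.
from collections import defaultdict
from dataclasses import dataclass

EVIDENCE_TYPES = {"artifact", "reproduction", "attestation", "production"}

@dataclass
class PortfolioItem:
    description: str
    evidence_type: str  # one of EVIDENCE_TYPES
    goal: str           # the outcome or criterion this item supports
    date_added: str

    def __post_init__(self):
        if self.evidence_type not in EVIDENCE_TYPES:
            raise ValueError(f"unknown evidence type: {self.evidence_type}")

items = [
    PortfolioItem("Essay written in class", "artifact", "writing fluency", "2024-02-10"),
    PortfolioItem("Photos of a community project", "reproduction", "civic engagement", "2024-03-02"),
    PortfolioItem("Mentor's observation notes", "attestation", "teamwork", "2024-03-15"),
    PortfolioItem("Participant reflection on progress", "production", "self-evaluation", "2024-04-01"),
]

# Group evidence by goal to spot outcomes that are thinly documented.
by_goal = defaultdict(list)
for item in items:
    by_goal[item.goal].append(item.evidence_type)
for goal, types in sorted(by_goal.items()):
    print(f"{goal}: {types}")
```

Grouping items by goal in this way reflects the selection rule above: an item earns its place only if it adds new information about progress toward a stated goal.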
Steps of Portfolio Assessment
Although many variations of portfolio assessment are in use, most fall into two basic types:
process portfolios and product portfolios (Cole, Ryan, & Kick, 1995). These are not the only
kinds of portfolios in use, nor are they pure types clearly distinct from each other. It may be more
helpful to think of these as two steps in the portfolio assessment process, as the participant(s) and
staff reflectively select items from their process portfolios for inclusion in the product portfolio.
Step 1: The first step is to develop a process portfolio, which documents growth over time
toward a goal. Documentation includes statements of the end goals, criteria, and plans for the
future. This should include baseline information, or items describing the participant's
performance or mastery level at the beginning of the program. Other items are "works in
progress", selected at many interim points to demonstrate steps toward mastery. At this stage, the
portfolio is a formative evaluation tool, probably most useful for the internal information of the
participant(s) and staff as they plan for the future.
Step 2: The next step is to develop a product portfolio (also known as a "best pieces portfolio"),
which includes examples of the best efforts of a participant, community, or program. These also
include "final evidence", or items which demonstrate attainment of the end goals. Product or
"best pieces" portfolios encourage reflection about change or learning. The program participants,
either individually or in groups, are involved in selecting the content, the criteria for selection, the criteria for judging merit, and the "evidence" that the criteria have been met (Winograd & Jones, 1992). For individuals and communities alike, this provides opportunities for a sense of ownership and strength. It helps to showcase or communicate the accomplishments of the
person or program. At this stage, the portfolio is an example of summative evaluation, and may
be particularly useful as a public relations tool.
Distinguishing Characteristics
Certain characteristics are essential to the development of any type of portfolio used for
assessment. According to Barton and Collins (1997), portfolios should be:
1) Multisourced (allowing for the opportunity to evaluate a variety of specific evidence)
Multiple data sources include both people (statements and observations of participants, teachers
or program staff, parents, and community members), and artifacts (anything from test scores to
photos, drawings, journals, and audio or videotapes of performances).
2) Authentic (context and evidence are directly linked)
The items selected or produced for evidence should be related to program activities, as well as
the goals and criteria. If the portfolio is assessing the effect of a program on participants or
communities, then the "evidence" should reflect the activities of the program rather than skills
that were gained elsewhere. For example, if a child's musical performance skills were gained
through private piano lessons, not through 4-H activities, an audio tape would be irrelevant in his
4-H portfolio. If a 4-H activity involved the same child in teaching other children to play, a tape
might be relevant.
3) Dynamic (capturing growth and change)
An important feature of portfolio assessment is that data or evidence is added at many points in
time, not just as "before and after" measures. Rather than including only the best work, the
portfolio should include examples of different stages of mastery. At least some of the items are
self-selected. This allows a much richer understanding of the process of change.
4) Explicit (purpose and goals are clearly defined)
The students or program participants should know in advance what is expected of them, so that
they can take responsibility for developing their evidence.
5) Integrated (evidence should establish a correspondence between program activities and life
experiences)
Participants should be asked to demonstrate how they can apply their skills or knowledge to real-
life situations.
6) Based on ownership (the participant helps determine evidence to include and goals to be met)
The portfolio assessment process should require that the participants engage in some reflection
and self-evaluation as they select the evidence to include and set or modify their goals. They are
not simply being evaluated or graded by others.
7) Multipurposed (allowing assessment of the effectiveness of the program while assessing
performance of the participant).
A well-designed portfolio assessment process evaluates the effectiveness of your intervention at
the same time that it evaluates the growth of individuals or communities. It also serves as a
communication tool when shared with family, other staff, or community members. In school
settings, it can be passed on to other teachers or staff as a child moves from one grade level to
another.
Analyzing and Reporting Data
As with any qualitative assessment method, analysis of portfolio data can pose challenges.
Methods of analysis will vary depending on the purpose of the portfolio, and the types of data
collected (Patton, 1990). However, if goals and criteria have been clearly defined, the "evidence"
in the portfolio makes it relatively easy to demonstrate that the individual or population has
moved from a baseline level of performance to achievement of particular goals.
It should also be possible to report some aggregated or comparative results, even if participants
have individualized goals within a program. For example, in a teen peer tutoring program, you
might report that "X% of participants met or exceeded two or more of their personal goals within
this time frame", even if one teen's primary goal was to gain public speaking skills and another's
main goal was to raise his grade point average by mastering study skills. Comparing across
programs, you might be able to say that participants in Town X on average mastered 4 new skills in the course of six months, while those in Town Y mastered only 2, and speculate that
lower attendance rates in Town Y could account for the difference.
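To make the arithmetic behind such statements concrete, the following minimal sketch aggregates individualized goals across a hypothetical participant list; the record layout and all numbers are invented for illustration.

```python
# Minimal sketch: turning individualized portfolio goals into program-level
# summary statements. Names, fields, and numbers are hypothetical.
from statistics import mean

participants = [
    {"name": "Teen A", "goals_set": 3, "goals_met": 2, "new_skills": 4},
    {"name": "Teen B", "goals_set": 2, "goals_met": 2, "new_skills": 5},
    {"name": "Teen C", "goals_set": 4, "goals_met": 1, "new_skills": 2},
    {"name": "Teen D", "goals_set": 3, "goals_met": 3, "new_skills": 3},
]

# "X% of participants met or exceeded two or more of their personal goals"
met_two_plus = sum(1 for p in participants if p["goals_met"] >= 2)
print(f"{100 * met_two_plus / len(participants):.0f}% met two or more personal goals")

# Cross-program comparison: average number of new skills mastered in six months
print(f"Average new skills mastered: {mean(p['new_skills'] for p in participants):.1f}")
```

The same tallying can be repeated per site (Town X, Town Y) to support the kind of comparison described above, keeping in mind that differences in attendance or intake may explain part of any gap.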
Subjectivity of judgements is often cited as a concern in this type of assessment (Bateson, 1994).
However, in educational settings, teachers or staff using portfolio assessment often choose to
periodically compare notes by independently rating the same portfolio to see if they are in
agreement on scoring (Barton & Collins, 1997). This provides a simple check on reliability and can be reported very simply. For example, a local program coordinator could say, "To ensure some consistency in assessment standards, every 5th portfolio (or 20%) was assessed by more than one staff member. Agreement between raters, or inter-rater reliability, was 88%".
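The agreement figure in a statement like the one above can be computed directly from the paired ratings. The sketch below is a hypothetical example using exact-match percent agreement on a 1-4 rubric; where rating categories are few, a chance-corrected index such as Cohen's kappa is sometimes reported instead.

```python
# Minimal sketch: percent agreement between two raters who independently scored
# the same sample of portfolios on a 1-4 rubric. All scores are hypothetical.
rater_a = [3, 4, 2, 4, 3, 1, 2, 4]
rater_b = [3, 4, 2, 3, 3, 1, 2, 4]

matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
percent_agreement = 100 * matches / len(rater_a)
print(f"Inter-rater agreement: {percent_agreement:.1f}%")  # 7 of 8 identical -> 87.5%
```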
There are many books and articles that address the problems of analyzing and reporting on
qualitative data in more depth than can be covered here. The basic issues of reliability, validity
and generalizability are relevant even when using qualitative methods, and various strategies
have been developed to address them. Those who are considering using portfolio assessment in
evaluation are encouraged to refer to some of the sources listed below for more in-depth
information.

  • 3.
    • Was advocatedin early grading 1900s as scientific measurement. • Educational disadvantages were known by the 1930s. 4.Mastery grading Grading students as “masters” or “passers” when their attainment reaches a prespecified level, usually allowing different amounts of time for different students to reach mastery • Originating in the 1920s (e.g., Morrison, 1926) as a grading strategy, it became associated with the educational strategy of mastery learning (Bloom, Hastings, & Madaus, 1971). 5.Pass/Fail Using a scale with two levels (pass and fail), sometimes in connection with mastery grading • In 1851, the University of Michigan experimented with pass/fail grading for classes. 6.Standards (or Absolute-Standards) gradingOriginally, comparing student performance to a preestablished standard (level) of performance; currently, standards grading sometimes means grading with reference to a list of state or district content standards according to preestablished performance levels • Grading according to standards of performance has been championed since the grading 1930s as more educationally sound than norm-referenced grading. • Current advocates of standards grading use the same principle but the term "standard” is now used for the criterion itself, not the level of performance. • Since 2002, the scales on some standards-based report cards use the state accountability (proficiency) reporting categories instead of letters. 7.Narrative grading Writing comments about students’ achievement, either in addition to or instead of using numbers or letters • Using a normal instructional practice (describing students’ work) in an assessment context. Progress Report a written document that explains how much progress is being made on something you have previously planned: . These reports will be staggered throughout the year and we have done our best to ensure that key points in a student’s development, such as making option choices, are supported with an appropriate report. We do not provide an end of year report for all students as it would not be possible for teachers to write reports on every student they teach at one time. Furthermore, we do not believe that a
  • 4.
    summative* end ofyear report is as valuable for a students’ development as providing a formative** report that can give them advice on how to improve and crucially, time to work on those developments. All reports will provide information on the following: • Attendance • A record of the number of positive events • A record of the number of negative events • An end of year attainment estimate • A teacher assessment of current attainment • A teacher assessment of Learner Characteristics Further to these points once a year teachers will write a brief statement about strengths and areas for development. The learner characteristics we will grade for each report are: • Attitude to Learning • Communication Skills • Homework Quality • Personal Organisation • Presentation of Work Each of these is assessed on a scale from one to five, with one being ‘unacceptable’ and five being ‘exceptional’. Further information and detail is contained within the document below. *A summative report is given at the end of a period of study. It would state how well the student has done but would not give advice on how to improve. If a comment was given on how to improve the student would not have the opportunity to work on this development. ** A formative report is given during a period of study. It would state how well a student is doing and would give advice on how to make further progress. The student then has further time to work on the advice given in the report.
  • 5.
    Perspectives on assessment Assessmentis at the centre of the student's experience. It provides a means of evaluating student progress and achievement; it drives the activity of the student and therefore their learning. This collection of short presentations is intended to provoke debate about assessment. Over the last few years, those involved in developing assessment have generated some new perspectives which, as yet, have not been fully incorporated within mainstream practice. There has been a gap between the emerging understandings of 'reflective practitioners' and educational developers and those who are setting assessment policy and defining practice. We would like to close that gap. In order to do so we have set out some 'challenging perspectives' in short talks. They are intended to be contentious but well grounded. Each Web talk is an introduction to an idea that we hope you will pursue using the references provided. The talks may be used by individuals or serve as a catalyst for a group discussion, for example in a workshop. Please feel free to comment. We haven?t covered all the ground - far from it - and hope that others might add to this collection. Purposes of assessment Teaching and learning The primary purpose of assessment is to improve students’ learning and teachers’ teaching as both respond to the information it provides. Assessment for learning is an ongoing process that arises out of the interaction between teaching and learning. What makes assessment for learning effective is how well the information is used. System improvement Assessment can do more than simply diagnose and identify students’ learning needs; it can be used to assist improvements across the education system in a cycle of continuous improvement: • Students and teachers can use the information gained from assessment to determine their next teaching and learning steps. • Parents, families and whānau can be kept informed of next plans for teaching and learning and the progress being made, so they can play an active role in their children’s learning. • School leaders can use the information for school-wide planning, to support their teachers and determine professional development needs.
  • 6.
    • Communities andBoards of Trustees can use assessment information to assist their governance role and their decisions about staffing and resourcing. • The Education Review Office can use assessment information to inform their advice for school improvement. • The Ministry of Education can use assessment information to undertake policy review and development at a national level, so that government funding and policy intervention is targeted appropriately to support improved student outcomes. 1. Assessment for Learning (Formative) The purpose of Formative Assessment is to provide students with feedback on how they are going. The aim is to help students improve their performance and make their next piece of assessed work better. It is developmental or formative in nature; hence the term "Formative Assessment". The feedback students receive is the key component of formative assessment. Feedback is intended to help them identify weaknesses and build on strengths to improve the quality of their next piece of assessment. The focus is on comments for improvement, not marks, and the awarding of marks in formative assessment can actually be counterproductive. 2. Assessment for Certification (Summative) Another key purpose of assessment is to gather evidence to make a judgment about a student's level of performance; against the specified learning objectives. Students are usually assessed at the end of an element of learning, such as the end of a module, mid semester or end of semester. They are awarded results typically as marks or grades to represent a particular level of achievement (high, medium, low). This judgmental "summative" process formally provides the evidence, to verify or "certify" which students may progress to the next level of their studies. 3. Protect Academic Standards Grades from cumulative assessments are used to certify that a person has the necessary knowledge and skills (and can apply them appropriately) to be awarded a qualification. Consequently, the quality and integrity of assessment is essential to guarantee the credibility of qualifications and the academic reputation of the issuing Institution. There is considerable local, national and international concern to ensure that the ways we protect academic standards stand up to scrutiny. 4. Feedback for Teaching The results from both formative and summative assessments can help you track how your students are going throughout your courses. Closely looking at the results can help you identify any patterns of difficulties or misunderstandings students might have. This in turn allows you alter your approach to teaching and adjust your curriculum accordingly. For example, you may identify that you need to offer more detailed explanations or provide additional resources in a particular area. Continuous and comprehensive evaluation Concept and Importance
  • 7.
    Continuous and comprehensiveevaluationwas a process of assessment, mandated by the Right to Education Act, of India. This approach to assessment has been introduced by state governments in India, as well as by the Central Board of Secondary Education in India, for students of sixth to tenth grades and twelfth in some schools. The Karnataka government has introduced CCE for grades 1 through 9 later it was also introduced for 12th grades students. The main aim of CCE is to evaluate every aspect of the child during their presence at the school. This is believed to help reduce the pressure on the child during/before examinations as the student will have to sit for multiple tests throughout the year, of which no test or the syllabus covered will be repeated at the end of the year, whatsoever. The CCE method is claimed to bring enormous changes from the traditional chalk and talk method of teaching, provided it is implemented accurately. In 2017, the CCE system was cancelled for students appearing in the Class 10 Board Exam for 2017-18, bringing back compulsory Annual Board Exam and removing the Formative and Summative Assessments under the Remodeled Assessment Pattern.[1] As a part of this new system, student's marks will be replaced by grades which will be evaluated through a series of curricular and extra-curricular evaluations along with academics. The aim is to decrease the workload on the student by means of continuous evaluation by taking number of small tests throughout the year in place of single test at the end of the academic program. Only Grades are awarded to students based on work experience skills, dexterity, innovation, steadiness, teamwork, public speaking, behavior, etc. to evaluate and present an overall measure of the student's ability. This helps the students who are not good in academics to show their talent in other fields such as arts, humanities, sports, music, athletics, and also helps to motivate the students who have a thirst of knowledge Unlike CBSE's old pattern of only one test at the end of the academic year, the CCE conducts several. There are two different types of tests. Namely, the formative and the summative. Formative tests will comprise the student's work at class and home, the student's performance in oral tests and quizzes and the quality of the projects or assignments submitted by the child. Formative tests will be conducted four times in an academic session, and they will carry a 40% weightage for the aggregate. In some schools, an additional written test is conducted instead of multiple oral tests. However, at least one oral test is conducted. The summative assessment is a three-hour long written test conducted twice a year. The first summative or Summative Assessment 1 (SA-1) will be conducted after the first two formatives are completed. The second (SA-2) will be conducted after the next two formatives. Each summative will carry a 30% weightage and both together will carry a 60% weightage for the aggregate. The summative assessment will be conducted by the schools itself. However, the question papers will be partially prepared by the CBSE and evaluation of the answer sheets is also strictly monitored by the CBSE. Once completed, the syllabus of one summative will not be repeated in the next. A student will have to concentrate on totally new topics for the next summative. At the end of the year, the CBSE processes the result by adding the formative score to the summative score, i.e. 40% + 60% = 100%. 
Depending upon the percentage obtained, the board will deduce the CGPA (Cumulative Grade Point Average) and thereby deduce the grade obtained. In addition to the summative assessment, the board will offer an optional online aptitude test that may also be used as a tool along with the grades obtained in the CCE to help students to decide the choice of subjects in further studies. The board has also instructed the schools to prepare the report card and it will be duly signed by the principal, the student.
  • 8.
    • Deductive Method- What does the student know and how can he use it to explain a situation. • Co-relation with a real-life situation - Whether the situation given matches any real-life situation, like tsunamis, floods, tropical cyclones, etc. • Usage of Information Technology - Can the problem be solved with the use of IT? If yes, how? In addition to that, various assignments can be given such as projects, models and charts, group work, worksheet, survey, seminar, etc. The teacher will also play a major role. For example, they give remedial help, maintain a term-wise record and checklists, etc. Assessment for learning Assessment for Learning is the process of seeking and interpreting evidence for use by learners and their teachers to decide where the learners are in their learning, where they need to go and how best to get there. Assessment for learning is best described as a process by which assessment information is used by teachers to adjust their teaching strategies, and by students to adjust their learning strategies. Assessment, teaching, and learning are inextricably linked, as each informs the others. Assessment is a powerful process that can either optimise or inhibit learning, depending on how it’s applied. For teachers Assessment for learning helps teachers gather information to: • plan and modify teaching and learning programmes for individual students, groups of students, and the class as a whole • pinpoint students’ strengths so that both teachers and students can build on them • identify students’ learning needs in a clear and constructive way so they can be addressed • involve parents, families, and whānau in their children's learning. For students Assessment for learning provides students with information and guidance so they can plan and manage the next steps in their learning. Assessment for learning uses information to lead from what has been learned to what needs to be learned next. Describing assessment for learning Assessment for learning should use a range of approaches. These may include:
  • 9.
    • day-to-day activities,such as learning conversations • a simple mental note taken by the teacher during observation • student self and peer assessments • a detailed analysis of a student’s work • assessment tools, which may be written items, structured interview questions, or items teachers make up themselves. What matters most is not so much the form of the assessment, but how the information gathered is used to improve teaching and learning Testing, Assessment, Measurement and Definition The definitions for each are: Test: A method to determine a students ability to complete certain tasks or demonstrate masteryof a skill or knowledge of content. Some types would be multiple choice tests, or a weeklyspelling test. While it is commonly used interchangeably with assessment, or even evaluation, itcan be distinguished by the fact that a test is one form of an assessment Assessment: The process of gathering information to monitor progress and make educational decisions if necessary. As noted in my definition of test, an assessment may include a test, butalso includes methods such as observations, interviews, behavior monitoring, etc. Measurement:beyond its general definition, refers to the set of procedures and the principles forhow to use the procedures in educational tests and assessments. Some of the basic principlesof measurement in educational evaluations would be raw scores, percentile ranks, derivedscores, standard scores, etc Assessment In education, the term assessment refers to the wide variety of methods or tools that educators use to evaluate, measure, and document the academic readiness, learning progress, skill acquisition, or educational needs of students. • Assessment involves the use of empirical data on student learning to refine programs and improve student learning. • Assessment is the process of gathering and discussing information from multiple and diverse sources in order to develop a deep understanding of what students know, understand, and can do with their knowledge as a result of their educational experiences; the process culminates when assessment results are used to improve subsequent learning. Assessment is the systematic basis for making inferences about the learning and development of students. It is the process of defining, selecting, designing, collecting, analyzing, interpreting, and using information to increase students’ learning and developmentAssessment is the systematic collection, review, and use of information about educational programs undertaken for the purpose of improving student learning and development.
  • 10.
    Characteristics • Learner-Centered • Theprimary attention of teachers is focused on observing and improving learning. • Teacher-Directed • Individual teachers decide what to assess, how to assess, and how to respond to the information gained through the assessment • Teachers do not need to share results with anyone outside of the class. • Mutually Beneficial • Students are active participants. • Students are motivated by the increased interest of faculty in their success as learners. • Teachers improve their teaching skills and gain new insights. • Formative • Assessments are almost never "graded". • Assessments are almost always anonymous in the classroom and often anonymous online. • Assessments do not provide evidence for evaluating or grading students. • Context-Specific • Assessments respond to the particular needs and characteristics of the teachers, students, and disciplines to which they are applied. • Customize to meet the needs of your students and course. • Ongoing • Classroom assessment is a continuous process. • Part of the process is creating and maintaining a classroom "feedback loop" • Each classroom assessment event is of short duration. • Rooted in Good Teaching Practice • Classroom assessment builds on good practices by making feedback on students' learning more systematic, more flexible, and more effective. Test • A test or examination (informally, exam or evaluation) is an assessment intended to measure a test-taker's knowledge, skill, aptitude, physical fitness, or classification in many other topics (e.g., beliefs).[1] A test may be administered verbally, on paper, on a computer, or in a confined area that requires a test taker to physically perform a set of skills. Tests vary in style, rigor and requirements. For example, in a closed book test, a test taker is often required to rely upon memory to respond to specific items whereas in an open book test, a test taker may use one or more supplementary tools such as a reference book or calculator when responding to an item. A test may be administered formally or informally. An example of an informal test would be a reading test administered by a parent to a child. An example of a formal test would be a final examination administered by a teacher in a classroom or an I.Q. test administered by a psychologist in a clinic. Formal testing often results in a grade or a test score.[2] A test score may be interpreted with regards to a norm or criterion, or occasionally both. The
  • 11.
    norm may beestablished independently, or by statistical analysis of a large number of participants. An exam is meant to test a child's knowledge or willingness to give time to manipulate that subject. • A standardized test is any test that is administered and scored in a consistent manner to ensure legal defensibility.[3] Standardized tests are often used in education, professional certification, psychology (e.g., MMPI), the military, and many other fields. • A non-standardized test is usually flexible in scope and format, variable in difficulty and significance. Since these tests are usually developed by individual instructors, the format and difficulty of these tests may not be widely adopted or used by other instructors or institutions. A non-standardized test may be used to determine the proficiency level of students, to motivate students to study, and to provide feedback to students. In some instances, a teacher may develop non-standardized tests that resemble standardized tests in scope, format, and difficulty for the purpose of preparing their students for an upcoming standardized test.[4] Finally, the frequency and setting by which a non- standardized tests are administered are highly variable and are usually constrained by the duration of the class period. A class instructor may for example, administer a test on a weekly basis or just twice a semester. Depending on the policy of the instructor or institution, the duration of each test itself may last for only five minutes to an entire class period. • In contrasts to non-standardized tests, standardized tests are widely used, fixed in terms of scope, difficulty and format, and are usually significant in consequences. Standardized tests are usually held on fixed dates as determined by the test developer, educational institution, or governing body, which may or may not be administered by the instructor, held within the classroom, or constrained by the classroom period. Although there is little variability between different copies of the same type of standardized test (e.g., SAT or GRE), there is variability between different types of standardized tests. • Any test with important consequences for the individual test taker is referred to as a high- stakes test. • A test may be developed and administered by an instructor, a clinician, a governing body, or a test provider. In some instances, the developer of the test may not be directly responsible for its administration. For example, Educational Testing Service (ETS), a nonprofit educational testing and assessment organization, develops standardized tests such as the SAT but may not directly be involved in the administration or proctoring of these tests. As with the development and administration of educational tests, the format and level of difficulty of the tests themselves are highly variable and there is no general consensus or invariable standard for test formats and difficulty. Often, the format and difficulty of the test is dependent upon the educational philosophy of the instructor, subject matter, class size, policy of the educational institution, and requirements of accreditation or governing bodies. In general, tests developed and administered by individual instructors are non-standardized whereas tests developed by testing organizations are standardized.
  • 12.
    Characteristics of Test Reliable Reliabilityrefers to the accuracy of the obtained test score or to how close the obtained scores for individuals are to what would be their “true” score, if we could ever know their true score. Thus, reliability is the lack of measurement error, the less measurement error the better. The reliability coefficient, similar to a correlation coefficient, is used as the indicator of the reliability of a test. The reliability coefficient can range from 0 to 1, and the closer to 1 the better. Generally, experts tend to look for a reliability coefficient in excess of .70. However, many tests used in public safety screening are what is referred to as multi-dimensional. Interpreting the meaning of a reliability coefficient for a knowledge test based on a variety of sources requires a great deal of experience and even experts are often fooled or offer incorrect interpretations. There are a number of types of reliability, but the type usually reported is internal consistency or coefficient alpha. All things being equal, one should look for an assessment with strong evidence of reliability, where information is offered on the degree of confidence you can have in the reported test score. Valid Validity will be the topic of our third primer in the series. In the selection context, the term “validity” refers to whether there is an expectation that scores on the test have a demonstrable relationship to job performance, or other important job-related criteria. Validity may also be used interchangeably with related terms such as “job related” or “business necessity.” For now, we will state that there are a number of ways of evaluating validity including: ▪ Content ▪ Criterion-related ▪ Construct ▪ Transfer or transportability ▪ Validity generalization A good test will offer extensive documentation of the validity of the test. Practical A good test should be practical. What defines or constitutes a practical test? Well, this would be a balancing of a number of factors including: ▪ Length – a shorter test is generally preferred ▪ Time – a test that takes less time is generally preferred ▪ Low cost – speaks for itself
  • 13.
    ▪ Easy toadminister ▪ Easy to score ▪ Differentiates between candidates – a test is of little value if all the applicants obtain the same score ▪ Adequate test manual – provides a test manual offering adequate information and documentation ▪ Professionalism – is produced by test developers possessing high levels of expertise The issue of the practicality of a test is a subjective judgment, which will be impacted by the constraints facing the public-sector jurisdiction. A test that may be practical for a large city with 10,000 applicants and a large budget, may not be practical for a small town with 10 applicants and a miniscule testing budget. Socially Sensitive A consideration of the social implications and effects of the use of a test is critical in public sector, especially for high stakes jobs such as public safety occupations. The public safety assessment professional must be considerate of and responsive to multiple group of stakeholders. In addition, in evaluating a test, it is critical that attention be given to: ▪ Avoiding adverse Impact – Recent events have highlighted the importance of balance in the demographics of safety force personnel. Adverse impact refers to differences in the passing rates on exams between males and females, or minorities and majority group members. Tests should be designed with an eye toward the minimization of adverse impact.. ▪ Universal Testing – The concept behind universal testing is that your exams should be able to be taken by the most diverse set of applicants possible, including those with disabilities and by those who speak other languages. Having a truly universal test is a difficult, if not impossible, standard to meet. However, organizations should strive to ensure that testing locations and environments are compatible with the needs of as wide a variety of individuals as possible. In addition, organizations should have in place committees and procedures for dealing with requests for accommodations. Candidate Friendly One of the biggest changes in testing over the past twenty years has been the increased attention paid to the candidate experience. Thus, your tests should be designed to look professional and be easy to administer. Furthermore, the candidate should see a clear connection between the exams and the job. As the candidate completed the selection battery, you want the reaction to be “That
  • 14.
    was a fairtest, I had an opportunity to prove why I deserve the job, and this is the type of organization where I would like to work.” Measurement Measurement is the assignment of a number to a characteristic of an object or event, which can be compared with other objects or events. The scope and application of a measurement is dependent on the context and discipline. In the natural sciences and engineering, measurements do not apply to nominal properties of objects or events, which is consistent with the guidelines of the International vocabulary of metrology published by the International Bureau of Weights and Measures. However, in other fields such as statisticsas well as the social and behavioral sciences, measurements can have multiple levels, which would include nominal, ordinal, interval, and ratio scales. Measurement is a cornerstone of trade, science, technology, and quantitative research in many disciplines. Historically, many measurement systems existed for the varied fields of human existence to facilitate comparisons in these fields. Often these were achieved by local agreements between trading partners or collaborators. Since the 18th century, developments progressed towards unifying, widely accepted standards that resulted in the modern International System of Units (SI). This system reduces all physical measurements to a mathematical combination of seven base units. The science of measurement is pursued in the field of metrology. Characteristic # 1. In educational measurement there is no absolute zero point: In educational measurement there is no absolute zero point. It is relative to some arbitrary standard. For example a student has secured ‘O’ in a test of mathematics. It does not mean that he has ‘O’ knowledge in mathematics. Because he may secured 30 in another test, which is easier than the first one. As the zero point is not fixed so we cannot say that a student with a score of ’60’ has doubled the knowledge of a student with a score of ’30’. Characteristic # 2. The units are not definite in educational measurement: In educational measurement the units are not definite, so we may not obtain the same value for every person. Because the test vary in their content and difficulty level. Therefore one individual may perform differently on different tests and different individuals may perform differently on one test. Characteristic # 3. It conveys a sense of infinity: It means we cannot measure the whole of an attribute of an individual. Generally the scores obtained from a measurement are observed scores which contains measurement errors. So that true score is infinite and unknown.
  • 15.
    Characteristic # 4.It is a process of assigning symbols: Measurement is a process of assigning symbols to observations in some meaningful and consistent manner. In measurement generally we compare with certain standard unit or criteria which have an universal acceptability. Characteristic # 5. It cannot be measured directly: In case of educational measurement we cannot measure for attribute directly. It is observed through behaviour. For example (he reading ability of an individual can only be measured when he is asked to read a written material. Characteristic # 6. It is a means to an end but not an end itself: The objective of educational measurement is not just to measure a particular attribute. Rather it is done to evaluate to what extent different objectives have been achieved. Principles of assessment Reliability If a particular assessment were totally reliable, assessors acting independently using the same criteria and mark scheme would come to exactly the same judgment about a given piece of work. In the interests of quality assurance, standards and fairness, whilst recognising that complete objectivity is impossible to achieve, when it comes to summative assessment it is a goal worth aiming for. To this end, what has been described as the 'connoisseur' approach to assessment (like a wine-taster or tea-blender of many years experience, not able to describe exactly what they are looking for but 'knowing it when they find it') is no longer acceptable. Explicitness in terms of learning outcomes and assessment criteria is vitally important in attempting to achieve reliability. They should be explicit to the students when the task is set, and where there are multiple markers they should be discussed, and preferably used on some sample cases prior to be using used 'for real'. Validity Just as important as reliability is the question of validity. Does the assessed task actually assess what you want it to? Just because an exam question includes the instruction 'analyse and evaluate' does not actually mean that the skills of analysis and evaluation are going to be assessed. They may be, if the student is presented with a case study scenario and data they have never seen before. But if they can answer perfectly adequately by regurgitating the notes they took from the lecture you gave on the subject then little more may be being assessed than the ability to memorise. There is an argument that all too often in British higher education we assess
  • 16.
    the things whichare easy to assess, which tend to be basic factual knowledge and comprehension rather than the higher order objectives of analysis, synthesis and evaluation. Relevance and transferability There is much evidence that human beings do not find it easy to transfer skills from one context to another, and there is in fact a debate as to whether transferability is in itself a separate skill which needs to be taught and learnt. Whatever the outcome of that, the transfer of skills is certainly more likely to be successful when the contexts in which they are developed and used are similar. It is also true to say that academic assessment has traditionally been based on a fairly narrow range of tasks with arguably an emphasis on knowing rather than doing; it has therefore tended to develop a fairly narrow range of skills. For these two reasons, when devising an assessment task it is important that it both addresses the skills you want the student to develop and that as much as possible it puts them into a recognisable context with a sense of 'real purpose' behind why the task would be undertaken and a sense of a 'real audience', beyond the tutor, for whom the task would be done. Criterion v Norm referenced assessment In criterion-referenced assessment particular abilities, skills or behaviours are each specified as a criterion which must be reached. The driving test is the classic example of a criterion-referenced test. The examiner has a list of criteria each of which must be satisfactorily demonstrated in order to pass - completing a three-point turn without hitting either kerb for example. The important thing is that failure in one criterion cannot be compensated for by above average performance in others; neither can you fail despite meeting every criterion simply because everybody else that day surpassed the criteria and was better than you. Norm-referenced assessment makes judgments on how well the individual did in relation to others who took the test. Often used in conjunction with this is the curve of 'normal distribution' which assumes that a few will do exceptionally well and a few will do badly and the majority will peak in the middle as average. Despite the fact that a cohort may not fit this assumption for any number of reasons (it may have been a poor intake, or a very good intake, they have been taught well, or badly, or in introductory courses in particular you may have half who have done it all before and half who are just starting the subject giving a bimodal distribution) there are even some assessment systems which require results to be manipulated to fit. The logic of a model of course design built on learning outcomes is that the assessment should be criterion-referenced at least to the extent that sufficiently meeting each outcome becomes a 'threshold' minimum to passing the course. If grades and marks have to be generated, a more
  • 17.
    complex system thanpass/fail can be devised by defining the criteria for each grade either holistically grade by grade, or grade by grade for each criterion (see below). Writing and using assessment criteria Assessment criteria describe how well a student has to be able to achieve the learning outcome, either in order to pass (in a simple pass/fail system) or in order to be awarded a particular grade; essentially they describe standards. Most importantly they need to be more than a set of headings. Use of theory, for example, is not on its own a criterion. Criteria about theory must describe what aspects of the use of theory are being looked for. You may value any one of the following: the students' ability to make an appropriate choice of theory to address a particular problem, or to give an accurate summary of that theory as it applies to the problem, or to apply it correctly, or imaginatively, or with originality, or to critique the theory, or to compare and contrast it with other theories. And remember, as soon as you have more than one assessment criterion you will also have to make decisions about their relative importance (or weighting). Graded criteria are criteria related to a particular band of marks or honours classification or grade framework such as Pass, Merit, Distinction. If you write these, be very careful about the statement at the 'pass' level. Preferably start writing at this level and work upwards. The danger in starting from, eg first class honours, is that as you move downwards, the criteria become more and more negative. When drafted, ask yourself whether you would be happy for someone meeting the standard expressed for pass, or third class, to receive an award from your institution. Where possible, discuss draft assessment activities, and particularly criteria, with colleagues before issuing them. Once decided, the criteria and weightings should be given to the students at the time the task is set, and preferably some time should be spent discussing and clarifying what they mean. Apart from the argument of fairness, this hopefully then gives the student a clear idea of the standard they should aim for and increases the chances they will produce a better piece of work (and hence have learnt what you wanted them to). And feedback to the student on the work produced should be explicitly in terms of the extent to which each criterion has been met. Instructional Assessment Process Instructional Assessment Process involves collection and analysis of data from six sources that, when combined, present a comprehensive view of the current state of the school as it compares to the underlying beliefs and principals that make up the Pedagogy of Confidence and lead to school transformation. The six components are: • School Background Pre-Interview Questionnaire
  • 18.
    • Principal Interview •Achievement Data • Teacher Survey • Student Survey • Classroom Visits School Background Pre-Interview Questionnaire It gathers background information using a pre-interview questionnaire submitted by the principal, the school’s School Improvement Plan (or similar document), and interviewing the principal in person. The pre-interview questionnaire collects basic demographic data about the school, the students and the faculty, as well as a brief history of current initiatives, school organization and scheduling practices, special services, community partnerships and the like. Principal Interview It meets with the principal to review the questionnaire, to obtain more information about the school and to learn the principal’s perspectives on the instructional program, students and staff. Care is taken to ensure that the principal speaks first about the strengths of the school, unique situations that exist within the school, recent changes that may be affecting the school, his or her goals for the school and what he or she believes is needed to achieve those goals. Achievement Data It gathers and analyzes existing achievement data to uncover patterns over time and to correlate with what constituents say about the school, how achievement data compares to state and district achievement, and any other relevant comparisons. Teacher Survey It representative conducts the teacher survey during a schoolwide faculty meeting to ensure consistency of administration and to explain to the faculty other data collection activities that may be taking place at the school. The survey probes teachers’ perspectives on the school’s climate and instructional program and seeks suggestions about how they, as a faculty, could best serve their students, especially underachievers. Surveys are anonymous and make use of multiple choice and open-ended questions that allow teachers leeway to express their inside perspective on the instructional life of the school; their assessments of and attitudes toward students, families and administration; recent and needed professional development initiatives; and their preferred pedagogical approaches. Student Survey The student survey contains 20 items and is administered to all students following a prescribed method of administration. Its purpose is to assess the school’s instructional program from the students’ perspectives. The items invite response in five areas: • Perspectives on myself as a learner
  • 19.
    • School climate •My teachers • Classroom activities • My preferred learning activities Students are asked to strongly agree, agree, disagree, or strongly disagree with some statements and to select their choices among others. It provides a summary of student survey responses for ease of analysis. Classroom Visits A team of specially trained It representatives conducts classroom visitations that follow a schedule intended to cover a broad spectrum of classes. Visitors note the activities in which students are engaged, study the interactions between teacher and students, and attend to other visible characteristics of the instructional program, including the physical environment of the rooms. Approximately half the classes in a school are visited to help form a composite picture of the current state of instruction. Teachers voluntarily participate in the visits and all data is recorded without identifying individual teachers. Visitors concentrate on elements of effective instruction that NUA knows to have positive effects on all students’ learning and that NUA finds particularly important in raising the performance of underachieving students. A sample of these elements includes: • The learning engages students. Students comprehend and retain what they are taught most effectively when they are engaged in classroom activities. Engagement is marked by willing participation, expressions of interest and displays of enthusiasm, and results when students find classroom activities and assignments highly meaningful and interesting. Instruction that engages students has a positive effect on their achievement and increases the likelihood they will develop into lifelong learners. • Learning activities guide students to relate lesson content to their lives. Students benefit from deliberately connecting what they are learning to what they know from their experience as individuals and as members of the cultural groups with which they most closely identify. Making such connections between the curriculum and what is personally relevant and meaningful has a positive influence on students’ motivation to learn, on their confidence as learners, and on their comprehension and retention of the material. Although the teacher can suggest such connections, students benefit most by generating and expressing their own connections. • The learning includes students interacting with each other as learners.Working collaboratively in pairs or small groups enables students to pool their knowledge as they develop their understanding of curriculum material. Interacting productively with peers also helps students stay attentive in class. In addition, collaborative work can increase students’ motivation to learn because of the support they get from their peers and the enjoyment that results from peer interaction. Pair or small-group interactions may be used for solving problems, discussing possible answers to a teacher’s question, generating new questions on a topic being discussed before sharing ideas with the whole class, representing information that has been learned in a creative way, and other such purposes.
• The learning promotes high-level thinking about lesson content. High-level thinking about curriculum content helps students generate deeper and broader understandings while developing their thinking capacities. Students' learning is enhanced when they have frequent opportunities to respond at length to thought-provoking questions, to engage in high-level conversations with peers, and to ask their own questions about what they are learning in order to clarify, refine and extend meanings. High-level thinking includes such mental processes as hypothesizing, inferring, generalizing, analyzing, synthesizing and evaluating. Opportunities to engage in such thinking are ideally part of daily instruction as well as integral to long-term, complex projects.

Types of assessment procedure
1. Diagnostic Assessment (as Pre-Assessment)
• One way to think about it: assesses a student's strengths, weaknesses, knowledge and skills prior to instruction.
• Another way to think about it: a baseline to work from.
2. Formative Assessment
• One way to think about it: assesses a student's performance during instruction, and usually occurs regularly throughout the instruction process.
• Another way to think about it: like a doctor's "check-up", providing data used to revise instruction.
3. Summative Assessment
• One way to think about it: measures a student's achievement at the end of instruction.
• Another way to think about it: it's macabre, but if formative assessment is the check-up, you might think of summative assessment as the autopsy. What happened? Now that it's all over, what went right and what went wrong?
4. Norm-Referenced Assessment
• One way to think about it: compares a student's performance against that of other students (a national group or other "norm").
• Another way to think about it: group or "demographic" assessment.
5. Criterion-Referenced Assessment
• One way to think about it: measures a student's performance against a goal, specific objective, or standard.
• Another way to think about it: a bar against which all students are measured.
6. Interim/Benchmark Assessment
• One way to think about it: evaluates student performance at periodic intervals, frequently at the end of a grading period, and can predict student performance on end-of-year summative assessments.
• Another way to think about it: bar-graph growth through a year.

Explanation
• Formative assessments are informal and formal tests given by teachers during the learning process. They are used to modify classroom activities so that student achievement improves, and they identify strengths, weaknesses and areas that need further work.
• Summative assessment evaluates student learning at the end of an instructional unit such as a chapter or specified topic. Final papers, midterms and final exams allow teachers to determine whether students have understood the material.
• Norm-referenced assessment compares a student's performance against a national or other "norm" group.
• Performance-based assessment requires students to solve real-world problems or produce something with real-world application, allowing the educator to judge how well students think critically and analytically. A restricted-response task is more narrowly defined than an extended-response task: multiple-choice questions are restricted responses, whereas writing a report is an extended response.
• Authentic assessment measures accomplishments that are worthwhile in themselves, in contrast to multiple-choice standardized tests.
• Selected-response assessment, also referred to as objective assessment, includes multiple-choice, matching, and true-false questions. It is a very effective and efficient method for measuring students' knowledge and a very common form of classroom assessment.
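To make the norm-referenced versus criterion-referenced contrast concrete, here is a minimal Python sketch that interprets the same score both ways. The scores, the cut score of 70 and the function names are invented for illustration only.

    # Illustrative only: invented class scores and an invented cut score of 70.

    def percentile_rank(score, group_scores):
        """Norm-referenced view: percent of the norm group scoring below this score."""
        below = sum(1 for s in group_scores if s < score)
        return 100 * below / len(group_scores)

    def meets_criterion(score, cut_score=70):
        """Criterion-referenced view: did the student reach the fixed standard?"""
        return score >= cut_score

    norm_group = [48, 55, 61, 64, 66, 70, 72, 75, 81, 88]   # hypothetical class scores
    student_score = 72

    print(f"Percentile rank: {percentile_rank(student_score, norm_group):.0f}")  # relative standing
    print(f"Meets criterion (cut = 70): {meets_criterion(student_score)}")       # absolute standard

The same score of 72 thus yields a relative interpretation (standing within the group) and an absolute one (whether the standard was reached).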
• Supply-response assessment requires students to supply an answer to a question prompt.
• Criterion-referenced tests are designed to measure student performance against a fixed set of predetermined criteria or learning standards.

Instructional decisions
• Instructional decisions are made to identify students' instructional needs. This is a general education initiative that focuses on instruction by using data about students' responses to past instruction to guide future educational decisions. Such decisions are proactive approaches to providing early assistance to students with instructional needs and to matching the amount of resources to the nature of each student's needs.
• Key features include: 1. screening all students to ensure early identification of those needing extra assistance; 2. seamless integration of general and special education services; and 3. a focus on research-based practices that match students' needs.
• Teachers are constantly collecting informal and formal information about what and how their students are learning. They check student tests and assignments, listen to small-group activities, and observe students engaged in structured and unstructured activities. They use this information for a variety of purposes, ranging from communicating with parents to meeting standards and benchmarks. However, when teachers systematically collect the right kinds of information and use it effectively, they can help their students grow as thinkers and learners.
• Assessment information can, for example, signal: 1. the need for a complete review of the material; 2. misunderstandings revealed in class discussion that must be corrected on the spot; and 3. interest in a topic that suggests more time should be spent on it than originally planned.

Selection assessment
• A selection assessment is the most frequently used type of assessment within a selection procedure. The selection assessment often takes place towards the end of the procedure, to test the candidates' suitability for the position in question.
The goal of a selection assessment
• A selection assessment is an attempt to get a better understanding of how the candidate would perform in the position applied for. The assessment is used on the premise that suitability does not really show through questionnaires, letters and interviews alone, because candidates often say what they think the employer wants to hear; only practical simulations can clearly demonstrate how a person responds in certain situations.
Components
• The components of a selection assessment depend on the position being applied for. For an executive position, the focus will be on testing the candidates' leadership qualities; for other positions the emphasis may be, for example, on communication skills.
• Frequently used components of an assessment include the mailbox exercise, fact finding and role-playing. Intelligence tests and interviews are often part of a selection assessment as well. To prepare for an assessment, candidates can practise different tests, for example a free IQ test.
Assessment report
• Following the assessment, a report is drafted describing the conclusions on each candidate. As a candidate, you will always be the first to see this assessment report, and you have the right to refuse to have it sent to the employer. However, if you refuse, your chances of getting the job will be practically nil.
Assessment companies
• Selection assessments are often performed by independent companies that conduct assessments on behalf of different organisations; in that case, the assessment takes place in the offices of the assessment company. Some companies, especially larger ones, organise their own assessments, and then the assessment takes place in the company itself.
• In the case of internal reorganisations, career assessments are often used instead.

Placement and classification decisions
Selection is a personnel decision whereby an organization decides whether to hire individuals using each person's score on a single assessment, such as a test or interview, or a single predicted performance score based on a composite of multiple assessments. Using this single score to assign each individual to one of multiple jobs or assignments is referred to as placement. An example of placement is when colleges assign new students to a particular level of math class based on a math test score.
Classification refers to the situation in which each of a number of individuals is assigned to one of multiple jobs based on their scores on multiple assessments. Classification is a more complex set of personnel decisions and requires more explanation.

A Conceptual Example
The idea of classification can be illustrated by an example. An organization has 50 openings across four entry-level jobs: word processor has 10 openings, administrative assistant has 12, accounting clerk has 8, and receptionist has 20. Sixty people apply, and each completes three employment tests: word processing, basic accounting, and interpersonal skills. The goal of classification is to use each applicant's predicted performance score for each job to fill all the openings while maximizing overall predicted performance across all four jobs. Linear programming approaches have been developed that make such assignments within the constraints of a given classification situation, such as the number of jobs, the openings or quotas for each job, and the number of applicants. Note that in this example, 50 applicants would be assigned to one of the four jobs and 10 applicants would not be hired.

Using past scores on the three tests and measures of performance, formulas can be developed to estimate predicted performance for each applicant in each job. The tests differ in how well they predict performance in each job. For example, the basic accounting test is fairly predictive of performance in the accounting clerk job but less predictive of performance in the receptionist job, and the word processing test is very predictive of performance in the word processor job but less predictive of performance in the receptionist job. This means that the equations for calculating predicted performance give different weights to each test for each job: the accounting clerk equation gives its largest weight to basic accounting scores, whereas the receptionist equation gives its largest weight to interpersonal skill scores and little weight to accounting scores. In addition, scores vary across applicants within each test and across tests within each individual, so each individual will have a different predicted performance score for each job.

One way to assign applicants to these jobs would be to calculate a single predicted performance score for each applicant, select all applicants whose scores exceed some cutoff, and randomly assign the selected applicants to jobs within the quota constraints. However, random assignment would not take advantage of the fact that each selected applicant will not perform equally well in all available jobs. Classification takes advantage of this. Classification efficiency can be viewed as the difference in overall predicted performance between this univariate strategy (one score per applicant) and the multivariate classification approach (one score per applicant per job) that uses a different equation to predict performance for each job. A number of parameters influence the degree of classification efficiency. An important one is the extent to which predicted scores for the different jobs are related to each other: the smaller the relationships among predicted scores across jobs, the greater the potential classification efficiency. That is, classification efficiency increases to the extent that the multiple assessments capture differences in the individual characteristics that determine performance in each job.
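The assignment step described above can be sketched in a few lines of Python. This is only an illustration under invented numbers: the predicted performance scores are randomly generated rather than estimated from real test and criterion data, and scipy's linear_sum_assignment is used as one standard solver for this kind of assignment problem.

    # A minimal sketch of the classification example above, assuming predicted
    # performance scores are already available for every applicant x job pair.
    # Scores here are random; in practice they would come from regression
    # equations built on past test and performance data.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    n_applicants = 60
    jobs = {"word processor": 10, "administrative assistant": 12,
            "accounting clerk": 8, "receptionist": 20}      # 50 openings in total

    # Hypothetical predicted performance, one score per applicant per job.
    predicted = rng.normal(50, 10, size=(n_applicants, len(jobs)))

    # Expand each job into one column per opening, plus zero-valued "not hired"
    # columns so the matrix is square (60 applicants x 60 seats).
    seat_scores = np.hstack(
        [np.repeat(predicted[:, [j]], openings, axis=1)
         for j, openings in enumerate(jobs.values())]
        + [np.zeros((n_applicants, n_applicants - sum(jobs.values())))]
    )

    rows, cols = linear_sum_assignment(seat_scores, maximize=True)
    print("Total predicted performance:", seat_scores[rows, cols].sum())

Applicants matched to one of the 50 real seat columns are classified into the corresponding job; those matched to the ten zero-valued columns correspond to the "not hired" outcome.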
Policy decisions
Policy decisions are defined in management theory as those decisions that define the basic principles of the organization and determine how it will develop and function in the future. Policies set the limits within which operational decisions are made. Examples include:
• Vision, mission, aims
• Budget and finance practices
• Allocation of resources
• Organizational structure
Policy decisions limit the actions an organization and its members can take without changing the policy. In sociocracy, policy decisions are made by consent. Operational decisions are made within the limits set by policy decisions and may be made autocratically by the person in charge or by other means determined by the people whom the decisions affect.
Examples of Policy Statements
We set policies in our everyday lives without realizing it or writing them down. Examples include:
• Deciding not to drink coffee or consume animal products
• Pledging to complete tax forms before their due date
• Sending your children to public schools by choice
• Deciding not to have children in order to devote time to political causes
In non-profit organizations, policies might include:
• Following the IRS regulations that set requirements for 501(c)(3) status to receive tax-deductible contributions
• Limiting membership to professionals with demonstrated expertise
• Serving meals to the homeless
• Using contributions only for administrative costs and not staff salaries
In business, they might include:
• Annual and departmental budgets
• Employee compensation schedules
• Union agreements
• Future donations of money and employee time to charitable causes
• Production of certain products and not others
• Limiting sales and marketing to retail or wholesale customers
These are all decisions that define the scope of the day-to-day operational decisions about how we will conduct our personal or work lives.
    Counseling and guidancedecisions Decision making has always been a fundamental human activity. At some stage within the career guidance planning process, decisions are made. The decision in some cases might be to make far reaching changes, or perhaps the decision might be not to change anything. In some cases, little change might ensue, but a decision has still been made, even if the result, having considered the consequences, is not to change. As a guide it is important to take into account that individual participants vary a great deal in terms of how they make decisions, what factors are important to them, how ready they are to make them and how far participants are prepared to live with uncertain outcomes. The traditional way within guidance to handle decision making is to see it as a rational, almost linear process. This is illustrated by the Janis and Mann model exemplified in the practical exercise example mentioned below involving balance sheets. The aim is to encourage a rational approach to planning for the future. Typically this involves an evaluation of available options with a look at the pros and cons of each, taking account of the participant’s personal circumstances. In practice of course the process of making a decision is influenced by all sorts of things. In everyday terms the decision making may in fact be driven by the irrational, the “quick fix” solution and in some cases, prejudicial ideas, perhaps based upon ingrained or outdated ideas. Gerard Egan describes this as the “shadow side” of decision making. De Bono’s thinking hats exercise (see below) attempts to factor in some of the emotional and other factors linked to decision making. As individuals we can vary in the style of decision making we use. For some decisions we might take a “logical” approach based upon the linear thinking mentioned above. For some decisions we might make a “no thought” decision, either because the matter is so routine it doesn’t require any thought, or in some occasions just to make a quick fix so we don’t have to think about it any more. Sometimes participants in guidance interviews may talk about their realisation that they should have looked into a decision further before rushing into one course of action. Some individuals employ a hesitant style of decision making, where decisions are delayed as long as possible, whereas others may make a choice based upon an emotional response, what feels right subjectively. Finally some participants might make decisions that can be classified as compliant; that is based upon the perceived expectations of what other people want. A key role in guidance is to identify how a participant has made previous professional development decisions- and whether the approach seems to have worked for them. Might there be other ways of deciding that lead to better decisions? Using Decision making exercises in a Guidance Setting There is a broad range of tools to aid the decision making process within a professional development discussion. Here are two introductory examples. Further examples are available via the references and web sites below.
    Balance sheet In itssimplest form this consists of two columns representing two choices. The advantages and disadvantages of each choice can simply be listed. Sometimes the very act of writing down pros and cons can bring clarity. Sometimes subdividing the headings into Advantages for me, Advantages for others, disadvantages for me, disadvantages for others can yield a richer analysis. Janis and Mann suggest this process. A slightly more sophisticated use of balance sheets might involve the participant completing the sheet as above initially, then the adviser producing a list of other suggested factors that the individual may not have considered at first. These can either be included, or ignored by the participant. An example of a simple balance sheet Six thinking Hats This tool was created by Edward de Bono in his book "6 Thinking Hats". How to Use the Tool: To use Six Thinking Hats to improve the quality of the participant’s decision-making; look at the decision "wearing" each of the thinking hats in turn. Each "Thinking Hat" is a different style of thinking. These are explained below: White Hat: With this thinking hat, the participant is encouraged to focus on the data available. Look at the information they have about themselves and see what they can learn from it. Look for gaps in your knowledge, and either try to fill them or take account of them. This is where the participant is encouraged to analyze past experience, work roles etc. and try to learn from this Red Hat: Wearing the red hat, the participant looks at the decision using intuition, gut reaction, and emotion. The idea is also to encourage the participant to try to think how other people will react emotionally to the decision being made, and try to understand the intuitive responses of people who may not fully know your reasoning. Black Hat: When using black hat thinking, look at things pessimistically, cautiously and defensively. Try to see why ideas and approaches might not work. This is important because it highlights the weak points in a plan or course of action. It allows the participant to eliminate them, alter your approach, or prepare contingency plans to counter problems that might arise. Black Hat thinking can be one of the real benefits of using this technique within professional development planning,
    as sometimes participantscan get so used to thinking positively that often they cannot see problems in advance, leaving them under-prepared for difficulties. Yellow Hat: The yellow hat helps you to think positively. It is the optimistic viewpoint that helps you to see all the benefits of the decision and the value in it, and spot the opportunities that arise from it. Yellow Hat thinking helps you to keep going when everything looks gloomy and difficult. Green Hat: The Green Hat stands for creativity. This is where you can develop creative solutions to a problem. It is a freewheeling way of thinking, in which there is little criticism of ideas. Blue Hat: The Blue Hat stands for process control. This is the hat worn by people chairing meetings. When running into difficulties because ideas are running dry, they may direct activity into Green Hat thinking. When contingency plans are needed, they will ask for Black Hat thinking, and so on. You can use Six Thinking Hats in guidane discussions. It is a way of encouraging participants to look at decision making from different perspectives. This can be done either metaphorically -as in “imagine you are wearing the white hat...” - or by having cards each with the name of the hat and a brief description of the “way of looking at things” that the hat brings with it. The cards can be shuffled and dealt to the participant in turn. By doing this the guide is encouraging the participant to consider a decision from a range of perspectives Assembling, Administering and Appraising Classroom Test and Assessment Assembling the Test 1. Record items on index cards 2. Double-check all individual test items 3. Double-check the items as a set 4. Arrange items appropriately 5. Prepare directions 6. Reproduce the test Administering the Test The guiding principle • Provide conditions that give all students a fair chance to show what they know Physical conditions
• Light, ventilation, quiet, etc.
Psychological conditions
• Avoid inducing test anxiety
• Try to reduce test anxiety
• Don't give a test when other events will distract
Suggestions
• Don't talk unnecessarily before the test
• Minimize interruptions
• Don't give hints to individuals who ask about items
• Discourage cheating
• Give students plenty of time to take the test
Appraising the Test
• The step in which the institution's management finds out how effective it has been in conducting and evaluating student assessment.
The process
• Define organizational goals
• Define objectives and continuously monitor performance and progress
• Performance evaluation / reviews
• Provide feedback
• Performance appraisal (reward / punishment)
Purpose of Classroom Tests and Assessment
Classroom assessment is one of the most important tools teachers can use to understand the needs of their students. When executed properly and on an ongoing basis, classroom assessment should shape student learning and give teachers valuable insights.
Identify Student Strengths and Weaknesses
Assessments help teachers identify student strengths as well as areas where students may be struggling. This is extremely important at the beginning of the year, when students are entering new grades. Classroom assessments, such as diagnostic tests, help teachers gauge students' level of mastery of concepts from the prior grade.
Monitor Student Progress
Throughout the course of a lesson or unit, teachers use classroom assessment to monitor students' understanding of the concepts being taught. This informs teachers' lesson planning, helping them pinpoint areas that need further review. Assessment can take the form of weekly tests, daily homework assignments and special projects.
Assess Student Prior Knowledge
Before beginning a new unit, assessment can inform teachers of their students' prior experience and understanding of a particular concept or subject matter. These types of assessments can be done orally through classroom discussion or through written assignments such as journals, surveys or graphic organizers.
Purposes of assessment
Teaching and learning
The primary purpose of assessment is to improve students' learning and teachers' teaching as both respond to the information it provides. Assessment for learning is an ongoing process that arises out of the interaction between teaching and learning. What makes assessment for learning effective is how well the information is used.
System improvement
Assessment can do more than simply diagnose and identify students' learning needs; it can be used to assist improvements across the education system in a cycle of continuous improvement:
• Students and teachers can use the information gained from assessment to determine their next teaching and learning steps.
• Parents, families and whānau can be kept informed of plans for teaching and learning and the progress being made, so they can play an active role in their children's learning.
• School leaders can use the information for school-wide planning, to support their teachers and to determine professional development needs.
• Communities and Boards of Trustees can use assessment information to assist their governance role and their decisions about staffing and resourcing.
• The Education Review Office can use assessment information to inform its advice for school improvement.
• The Ministry of Education can use assessment information to undertake policy review and development at a national level, so that government funding and policy intervention are targeted appropriately to support improved student outcomes.
    Developing specifications fortests and assessment Definitions I’ve seen the terms “Test Plan” and “Test Specification” mean slightly different things over the years. In a formal sense (at this given point in time for me), we can define the terms as follows: • Test Specification – a detailed summary of what scenarios will be tested, how they will be tested, how often they will be tested, and so on and so forth, for a given feature. Examples of a given feature include, “Intellisense, Code Snippets, Tool Window Docking, IDE Navigator.” Trying to include all Editor Features or all Window Management Features into one Test Specification would make it too large to effectively read. • Test Plan – a collection of all test specifications for a given area. The Test Plan contains a high-level overview of what is tested (and what is tested by others) for the given feature area. For example, I might want to see how Tool Window Docking is being tested. I can glance at the Window Management Test Plan for an overview of how Tool Window Docking is tested, and if I want more info, I can view that particular test specification. If you ask a tester on another team what’s the difference between the two, you might receive different answers. In addition, I use the terms interchangeably all the time at work, so if you see me using the term “Test Plan”, think “Test Specification.” Parts of a Test Specification A Test Specification should consist of the following parts: • History / Revision – Who created the test spec? Who were the developers and Program Managers (Usability Engineers, Documentation Writers, etc) at the time when the test spec was created? When was it created? When was the last time it was updated? What were the major changes at the time of the last update? • Feature Description – a brief description of what area is being tested. • What is tested? – a quick overview of what scenarios are tested, so people looking through this specification know that they are at the correct place. • What is not tested? – are there any areas being covered by different people or different test specs? If so, include a pointer to these test specs. • Nightly Test Cases – a list of the test cases and high-level description of what is tested each night (or whenever a new build becomes available). This bullet merits its own blog entry. I’ll link to it here once it is written. • Breakout of Major Test Areas – This section is the most interesting part of the test spec where testers arrange test cases according to what they are testing. Note: in no way do I
claim this to be a complete list of all possible major test areas; these areas are examples to get you going.
o Specific Functionality Tests – tests to verify the feature is working according to the design specification. This area also includes verifying error conditions.
o Security Tests – any tests related to security. An excellent source for populating this area is the book Writing Secure Code.
o Accessibility Tests – this section shouldn't be a surprise to any of my blog readers. <grins> See The Fundamentals of Accessibility for more info.
o Stress Tests – the tests you would apply to stress the feature.
o Performance Tests – verifying any performance requirements for your feature.
o Edge Cases – something I do specifically for my feature areas. I like walking through books like How to Break Software, looking for ideas to better test my features, and I jot those ideas down under this section.
o Localization / Globalization – tests to ensure you're meeting your product's international requirements.
Setting Test Case Priority
A test specification may have a couple of hundred test cases, depending on how the test cases were defined, how large the feature area is, and so forth. It is important to be able to query for the most important test cases (nightly), the next most important test cases (weekly), the next most important test cases (full test pass), and so forth. A sample prioritization for test cases may look like:
• Highest priority (Nightly) – must run whenever a new build is available
• Second highest priority (Weekly) – other major functionality tests, run once every three or four builds
• Lower priority – run once every major coding milestone
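One lightweight way to make such a priority scheme queryable in an automated suite is to tag test cases with custom markers. The sketch below uses pytest; the marker names (nightly, weekly, full_pass) and the trivial feature under test are invented for illustration and are not part of pytest itself.

    # test_priorities.py -- a minimal sketch of priority-tagged test cases.
    import pytest

    def add(a, b):            # stand-in for the feature under test
        return a + b

    @pytest.mark.nightly      # must run whenever a new build is available
    def test_add_basic():
        assert add(2, 3) == 5

    @pytest.mark.weekly       # run once every three or four builds
    def test_add_negative_numbers():
        assert add(-2, -3) == -5

    @pytest.mark.full_pass    # run once per major coding milestone
    def test_add_large_inputs():
        assert add(10**9, 10**9) == 2 * 10**9

The markers would then be registered under the markers option in pytest.ini, and a nightly run would select only the highest-priority cases with "pytest -m nightly".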
Major Points
1. Your goal is valid, reliable, useful assessment
2. Which requires:
   a. Determining what is to be measured
   b. Defining it precisely
   c. Minimizing measurement of irrelevancies
3. And is promoted by following good procedures
Four Steps in Planning an Assessment
1. Deciding its purpose
2. Developing test specifications
3. Selecting the best item types
4. Preparing items
Step 1: Decide the Purpose
What location in instruction?
1. Pre-testing
   o Readiness
     i. Limited in scope
     ii. Low difficulty level
     iii. Serves as a basis for remedial work and adapting instruction
   o Pretest (placement)
     i. Items similar to the outcome measure
     ii. But not the same (like an alternative form)
2. During instruction
   o Formative
     i. Monitor learning progress
     ii. Detect learning errors
     iii. Feedback for teacher and students
     iv. Limited sample of learning outcomes
     v. Must ensure that the mix and difficulty of items is sufficient
     vi. Try to use results to make corrective prescriptions (e.g., review for the whole group, practice exercises for a few)
   o Diagnostic
     i. Enough items needed in each specific area
     ii. Items in one area should have slight variations
3. End of instruction
   o Mostly summative – broad coverage of objectives
   o Can be formative too
Step 2: Develop Test Specifications
• Why? Need a good sample!
• How? Table of specifications (two-way chart, "blueprint")
1. Prepare a list of learning objectives
2. Outline the instructional content
3. Prepare the two-way chart
4. Or use an alternative to the two-way chart when more appropriate
5. Double-check the sampling
Sample of a Content Domain (for this course)
1. Trends/controversies in assessment
2. Interdependence of teaching, learning, and assessment
3. Purposes and forms of classroom assessment
4. Planning a classroom assessment (item types, table of specs)
5. Item types (advantages and limitations)
6. Strategies for writing good items
7. Compiling and administering classroom assessments
8. Evaluating and improving classroom assessments
9. Grading and reporting systems
10. Uses of standardized tests
11. Interpreting standardized test scores
Sample Table of Specifications (for chapters 6 and 7 of this course)
Each sample SLO (you would typically have more) is crossed with the Bloom level it targets; an X in the original two-way chart marks the targeted level, and a final row totals the number of items per cell.
• Identifies definitions of key terms (e.g., validity) – Remember
• Identifies examples of threats to test reliability and validity – Understand
• Selects the best item type for given objectives – Apply
• Compares the pros and cons of different kinds of tests for given purposes – Analyze
• Evaluates particular educational reforms (e.g., whether they will hurt or help instruction) – Evaluate
• Creates a unit test – Create
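A table of specifications can also be kept as a simple data structure so that sampling can be double-checked automatically. The sketch below is only illustrative: the SLO-to-level mapping follows the reconstruction above, and the planned item counts are invented.

    # A minimal sketch of a table of specifications as a data structure.
    # Keys are (SLO, Bloom level) cells; values are invented planned item counts.
    blueprint = {
        ("Identifies definitions of key terms", "Remember"): 4,
        ("Identifies threats to reliability and validity", "Understand"): 5,
        ("Selects best item type for given objectives", "Apply"): 5,
        ("Compares pros and cons of test types", "Analyze"): 4,
        ("Evaluates particular educational reforms", "Evaluate"): 3,
        ("Creates a unit test", "Create"): 1,
    }

    total_items = sum(blueprint.values())
    items_per_level = {}
    for (_slo, level), n in blueprint.items():
        items_per_level[level] = items_per_level.get(level, 0) + n

    print("Total number of items:", total_items)
    print("Items per Bloom level:", items_per_level)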
Spot the Poor Specific Learning Outcomes (for use with the previous table of specifications)
Which entries are better or worse than others? Why? Improve the poor ones.
1. Knowledge
   a. Knows correct definitions
   b. Able to list major limitations of different types of items
2. Comprehension
   a. Selects correct item type for learning outcome
   b. Understands limitations of true-false items
   c. Distinguishes poor true-false items from good ones
3. Application
   a. Applies construction guidelines to a new content area
   b. Creates a table of specifications
4. Analysis
   a. Identifies flaws in poor items
   b. Lists general and specific learning outcomes
5. Synthesis
   a. Lists general and specific content areas
   b. Provides weights for areas in table of specifications
6. Evaluation
   a. Judges quality of procedure/product
   b. Justifies product
   c. Improves a product
Why Are These Better Specific Learning Outcomes?
1. Knowledge
   a. Selects correct definitions
   b. Lists major limitations of different item types
2. Comprehension
   a. Selects proper procedures for the assessment purpose
   b. Distinguishes poor procedures from good ones
   c. Distinguishes poor decisions/products from good ones
3. Application
   a. Applies construction guidelines to a new content area
4. Analysis
   a. Identifies flaws in procedure/product
   b. Lists major and specific content areas
   c. Lists general and specific learning outcomes
5. Synthesis
   a. Creates a component of the test
   b. Provides weights for cells in table of specifications
6. Evaluation
   a. Judges quality of procedure/product
   b. Justifies product
   c. Improves a product
Step 3: Select the Best Types of Items/Tasks
What types to choose from? Many!
1. Objective – supply type
   a. Short answer
   b. Completion
2. Objective – selection type
   a. True-false
   b. Matching
   c. Multiple choice
3. Essays
   a. Extended response
   b. Restricted response
4. Performance-based
   a. Extended response
   b. Restricted response
Which type to use? The one that fits best!
1. Most directly measures the learning outcome
2. Where that is not clear, use a selection type (more objective)
   a. Multiple choice is best (less guessing, fewer clues)
   b. Matching only if items are homogeneous
   c. True-false only if there are only two possibilities
Strengths and Limitations of Objective vs. Essay/Performance
Objective Items
• Strengths
   o Can have many items
   o Highly structured
   o Scoring is quick, easy, accurate
• Limitations
   o Cannot assess higher-level skills (problem formulation, organization, creativity)
Essay/Performance Tasks
• Strengths
   o Can assess higher-level skills
   o More realistic
• Limitations
   o Inefficient for measuring knowledge
   o Few items (poorer sampling)
   o Time consuming
   o Scoring is difficult and unreliable
Step 4: Prepare Items/Tasks
Strategies to Measure the Domain Well – Reliably and Validly
1. Specifying more precise learning outcomes leads to better-fitting items
2. Use the two-way table to ensure good sampling of complex skills
3. Use enough items for reliable measurement of each objective
   o The number depends on purpose, task type, and age
   o If using performance-based tasks, use fewer items but test more often
4. Keep in mind how good assessment can improve (not just measure) learning
   o It signals learning priorities to students
   o It clarifies teaching goals for the teacher
   o Provided it is perceived as fair and useful
Strategies to Avoid Contamination
1. Eliminate barriers that lead good students to get the item wrong
2. Don't provide clues that help poor students get the item correct
General Suggestions for Item Writing
1. Use the table of specifications as a guide
2. Write more items than needed
3. Write well in advance of the testing date
4. Make sure the task to be performed is clear, unambiguous, unbiased, and calls forth the intended outcome
5. Use an appropriate reading level (don't test for ancillary skills)
6. Write so that items provide no clues (minimize the value of "test-taking skills")
   a. a/an
   b. Avoid specific determiners (always, never, etc.)
   c. Don't use more detailed, longer, or textbook language for correct answers
   d. Don't put answers in an identifiable pattern
7. Write so that one item provides no clues to other items
8. Seeming clues should lead away from the correct answer
9. Experts should agree on the answer
10. If an item is revised, recheck its relevance
Selecting and constructing appropriate types of items and assessment tasks
• Different types of tests: limited-choice questions (multiple choice, true/false, matching), open-ended questions (short answer, essay), performance testing (OSCE, OSPE), and action-oriented testing.
• Process of test administration: statement of goals, content outline, table of specification, item selection, item construction, composition of the answer sheet, development of instructions, construction of the answer key, test administration, and test revision.
• Characteristics of a good test:
   – Reliability: consistent, uniform, and free from sources of error.
   – Validity: how well a test measures what it is supposed to measure.
   – Utility: cost- and time-effective.
• Test construction should answer: What kind of test is to be made? What is its precise purpose? What abilities are to be tested? How detailed and how accurate must the results be? What constraints are set by the availability of expertise, facilities, and time for construction, administration and scoring? Who will take the test? What is the scope of the test?
• Principles of test construction:
   1. Measure all instructional objectives – objectives that are communicated and imparted to the students, designed as an operational control to guide learning sequences and experiences, and harmonious with the teacher's instructional objectives.
   2. Cover all learning tasks – measure a representative part of the learning tasks.
   3. Use appropriate testing strategies or items – items that appraise the specific learning outcome, with measurements or tests based on the domains of learning.
   4. Make the test valid and reliable – a test is reliable when it produces dependable, consistent and accurate scores, and valid when it measures what it purports to measure. Tests that are written clearly and unambiguously are more reliable; tests with more items are more reliable than tests with fewer items; and tests that are well planned, cover wide objectives and are well executed are more valid.
   5. Use the test to improve learning – a test is not only an assessment but also a learning experience. Going over the test items may help teachers reteach missed items, discussion and clarification of the right choice gives further learning, and revision of the test enables further guidance and modification of teaching.
   6. Norm-referenced and criterion-referenced tests – norm-referenced tests address higher, more abstract levels of the cognitive domain; criterion-referenced tests address lower, more concrete levels of learning.
• Planning for a test:
   1. Outline the learning objectives or major concepts to be covered by the test – the test should be representative of the objectives and materials covered (a major student complaint is that tests don't fairly cover the material that was supposed to be covered).
   2. Create a test blueprint.
   3. Create questions based on the blueprint.
   4. For each question, check it against the blueprint (3-4 alternate questions on the same idea/objective should be made).
   5. Organize questions by item type.
   6. Eliminate similar questions.
   7. Re-read the questions and check them from the student's standpoint.
   8. Organize the questions logically.
   9. Check the completion time by taking the test yourself and multiplying that time by about four, depending on the level of the students.
   10. Analyze the results / conduct item analysis.
• Process of Test Construction
1. Preliminary considerations
   a) Specify the test purposes and describe the domain of content and/or behavior of interest
   b) Specify the group of examinees (age, gender, socio-economic background, etc.)
   c) Determine the time and financial resources available for constructing and validating the test
   d) Identify and select qualified staff members
   e) Specify an initial estimate of the length of the test (time for developing, validating and completion by the students)
2. Review of content domain/behaviors
   a) Review the descriptions of the content standards or objectives to determine their acceptability for inclusion in the test
   b) Select the final group of objectives (i.e., finalize the content standards)
   c) Prepare the item specification for each objective and review it for completeness, clarity, accuracy and practicability
3. Item/task writing and preparation of scoring rubrics
   a) Draft a sufficient number of items and/or tasks for field testing
   b) Carry out item/task editing and review the scoring rubrics
4. Assessment of content validity
   a) Identify a pool of judges and measurement specialists
   b) Review the test items and tasks to determine their match to the objectives, their representativeness, and their freedom from stereotyping and potential biases
   c) Review the test items and/or tasks to determine their technical adequacy
5. Revision of test items/tasks
   a) Based on the data from steps 4b and 4c, revise the test items/tasks or delete them
   b) Write additional test items/tasks and repeat step 4
6. Field test administration
   a) Organize the test items/tasks into forms for field testing
   b) Administer the test forms to appropriately chosen groups of examinees
   c) Conduct item analysis and item bias studies ("studies to identify differentially functioning test items")
   d) Carry out statistical linking or equating of forms if needed
7. Revision of test items/tasks
   a) Revise or delete items using the results from step 6c
   b) Check the scoring rubrics for the performance tasks being field tested
8. Test assembly
   a) Determine the test length, the number of forms needed, and the number of items/tasks per objective
   b) Select the items from the available pool of valid test material
   c) Prepare test directions, practice questions, test booklet layout, scoring keys, answer sheets and so on
   d) Specify modifications to instructions, medium of presentation, or examinee response, and the time requirement for finishing the items
9. Selection of performance standards
   a) Performance standards are needed to accomplish the test purpose
   b) Determine the performance standards
   c) Initiate and document the performance standards
   d) Identify alternative test score interpretations for examinees requiring alternative administration or other modalities
10. Pilot test (if possible)
   a) Design the test administration to collect score reliability and validity information
   b) Administer the test form(s) to appropriately chosen groups of examinees
   c) Identify and evaluate alternative administrations or other modifications to meet individual specific needs that may affect the validity and reliability of the test or its forms
   d) Evaluate the test administration procedures, test items, and score reliability and validity
   e) Make final revisions to the forms of the test based on the available data
11. Preparation of manuals
   a) Prepare the test administrator's manual
12. Additional technical data collection
   a) Conduct reliability and validity investigations on a continuing basis
Item analysis
• Shortening or lengthening an existing test is done through item analysis.
• The validity and reliability of any test depend on the characteristics of its items.
• There are two types: 1. qualitative analysis and 2. quantitative analysis.
• Qualitative item analysis considers content validity (the content and form of the items, judged by expert opinion) and effective item formulation.
• Quantitative item analysis examines item difficulty and item discrimination.
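As a concrete illustration of the two quantitative indices just listed, the Python sketch below computes item difficulty (the proportion answering correctly) and a simple discrimination index (difficulty in the upper-scoring half minus difficulty in the lower-scoring half). The response matrix is invented; real analyses often use 27 percent upper and lower groups or point-biserial correlations instead.

    # Illustrative item analysis on an invented 0/1 response matrix
    # (rows = students, columns = items; 1 = correct answer).
    responses = [
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
    ]

    n_students = len(responses)
    totals = [sum(row) for row in responses]

    # Split students into upper and lower halves by total score.
    order = sorted(range(n_students), key=lambda i: totals[i], reverse=True)
    half = n_students // 2
    upper, lower = order[:half], order[half:]

    for item in range(len(responses[0])):
        difficulty = sum(row[item] for row in responses) / n_students
        p_upper = sum(responses[i][item] for i in upper) / len(upper)
        p_lower = sum(responses[i][item] for i in lower) / len(lower)
        discrimination = p_upper - p_lower
        print(f"Item {item + 1}: difficulty = {difficulty:.2f}, "
              f"discrimination = {discrimination:.2f}")

Items with very high or very low difficulty, or with low or negative discrimination, would be the candidates for revision or deletion.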
Characteristics of Standardised Tests and Teacher-Made Tests
Standardised Tests
Some characteristics of these tests are:
1. They consist of items of high quality. The items are pretested and selected on the basis of difficulty value, discrimination power, and relationship to clearly defined objectives in behavioural terms.
2. As the directions for administering, exact time limits, and scoring are precisely stated, any person can administer and score the test.
3. Norms, based on representative groups of individuals, are provided as an aid for interpreting the test scores. These norms are frequently based on age, grade, sex, etc.
4. Information needed for judging the value of the test is provided. Before the test becomes available, its reliability and validity are established.
5. A manual is supplied that explains the purposes and uses of the test, describes briefly how it was constructed, provides specific directions for administering, scoring, and interpreting results, contains tables of norms, and summarizes available research data on the test.
No two standardized tests are exactly alike. Each test measures certain specific aspects of behaviour and serves a slightly different purpose. Some tests with similar titles measure aspects of behaviour that differ markedly, whereas other tests with dissimilar titles measure aspects of behaviour that are almost identical. Thus, one has to be careful in selecting a standardised test.
Standardised tests also serve several further uses:
• They provide information for curriculum planning and for remedial coaching for educationally backward children.
• They help the teacher to assess the effectiveness of his or her teaching and the school's instructional programmes.
• They provide data for tracing an individual's growth pattern over a period of years.
• They help in organising better guidance programmes.
• They evaluate the influence of courses of study, teachers' activities, teaching methods and other factors considered significant for educational practice.
Features of Teacher-Made Tests:
1. The items of the test are arranged in order of difficulty.
2. They are prepared by teachers and can be used for prognosis and diagnosis purposes.
3. The test covers the whole content area and includes a large number of items.
4. The preparation of the items conforms to the blueprint.
5. Test construction is not a single person's business; rather, it is a co-operative endeavour.
6. A teacher-made test does not go through all the steps of a standardised test.
7. Teacher-made tests may also be employed as a tool for formative evaluation.
8. Preparation and administration of these tests are economical.
9. The test is developed by the teacher to ascertain students' achievement and proficiency in a given subject.
10. Teacher-made tests are rarely used for research purposes.
11. They do not have norms, whereas providing norms is essential for standardised tests.
Steps/Principles of Construction of a Teacher-Made Test:
A teacher-made test does not require the elaborate preparation of a standardised test. Even so, to make it a more efficient and effective tool of evaluation, careful consideration is needed while constructing such tests. The following steps may be followed in the preparation of a teacher-made test:
1. Planning:
Planning of a teacher-made test includes:
a. Determining the purpose and objectives of the test: what to measure and why to measure it.
b. Deciding the length of the test and the portion of the syllabus to be covered.
c. Specifying the objectives in behavioural terms. If needed, a table of specifications can be prepared and weightage given to the objectives to be measured.
d. Deciding the number and forms of items (questions) according to the blueprint.
e. Having a clear knowledge and understanding of the principles of constructing essay-type, short-answer-type and objective-type questions.
f. Deciding the date of testing well in advance, to give teachers time for test preparation and administration.
g. Seeking the co-operation and suggestions of co-teachers, experienced teachers from other schools, and test experts.
2. Preparation of the Test:
Planning is the philosophical aspect of test construction, and preparation is the practical aspect. All practical considerations have to be taken into account while constructing the test. It is an art and a technique that one must have or acquire, and it requires much thinking, rethinking and reading before constructing test items.
Different types of objective test items, viz. multiple-choice, short-answer and matching type, can be constructed. After construction, test items should be given to others for review and for their opinions. Suggestions may be sought on language, the modalities of the items, the statements given, the correct answers supplied, and other possible errors. The suggestions and views thus gathered help the test constructor modify and verify the items afresh to make them more acceptable and usable.
After construction of the test, items should be arranged from simple to complex. For arranging the items, a teacher can adopt many methods, viz. group-wise, unit-wise, topic-wise, etc. The scoring key should also be prepared forthwith to avoid delay in scoring.
Directions are an important part of test construction. Without proper directions or instructions, the authenticity and reliability of the test may be lost, and students may be confused. Thus, the directions should be simple and adequate to enable the students to know:
(i) the time allowed for completion of the test,
(ii) the marks allotted to each item,
(iii) the required number of items to be attempted,
(iv) how and where to record the answers, and
(v) the materials to be used, such as graph paper or logarithmic tables.
Observation Techniques
In carrying out action research to improve teaching and learning, an important role of the researcher/instructor is to collect data and evidence about the teaching process and student learning. What follows is an introduction to some of the techniques which can be used for this purpose.
Student Assessment
Tests, examinations and continuous assessment can provide valuable data for action research. For your teaching course, you have to set up a method of student assessment and your students have to be assessed, so you might as well make use of it in your project. You should, however, be clear about the nature of the information you can obtain from examination results or assessment grades. Comparison of one set of results with another often has limited validity, as assignments, examinations, markers and marking schemes are rarely held constant. In addition, most assessment is norm referenced rather than criterion referenced.
You also need to be very clear as to what is being assessed. Examination grades may bear little relationship to the specific qualities you are investigating. For example, if the theme of an action research project is encouraging meaningful learning, then the examination results would only be of value if they truly reflect meaningful learning. They would be of little value if they consisted of problems which could be solved by substituting numbers into a remembered formula, or essays which required the reproduction of sections from lecture notes. So think carefully about the qualities which you wish to test and whether the assessment is a true test of those qualities.
One way in which answers to assessment questions can be analysed for project purposes is by dividing them into qualitative categories. A systematic procedure for establishing categories is the SOLO taxonomy (Biggs and Collis, 1982). The SOLO taxonomy divides answers to written assessment questions into five categories, judged according to the level of learning: prestructural, unistructural, multistructural, relational and extended abstract.
The five levels correspond to answers ranging from the incorrect or irrelevant, through the use of appropriate data, to the integration of data in an appropriate way, and ending in innovative extensions.
Closed-Ended Questionnaires
Closed questionnaires constrain the responses to a limited number chosen by the researcher; essentially this is a multiple-choice format. Usually respondents are asked the extent to which they agree or disagree with a given statement. Responses are recorded on a Likert scale which ranges, for example, from 'definitely agree' to 'definitely disagree'. Questions should be carefully constructed so the meaning is clear and unambiguous. It is a good idea to trial the questionnaire on a limited number of students before giving it to a whole group.
Closed questionnaires are easy to process and evaluate and can give clear answers to specific questions. However, the questions are defined by the researcher, so they could completely miss the concerns of the respondents. You might therefore draw up the questions after a few exploratory interviews, or include some open-ended questions to give respondents a chance to raise other issues of concern.
Most institutions now have some form of standard teaching evaluation questionnaire available. These may be of some help in evaluating a project, but in most cases the questions will not be sufficiently specific to the particular type of innovation which has been introduced. What might be more helpful are the data banks of optional or additional questions which are available. These can be used to pick or suggest questions which might be included in a more tailor-made questionnaire.
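Because closed questionnaires are easy to process, their analysis can be automated in a few lines. The sketch below tallies Likert-scale responses; the items, the four-point coding and the response data are invented for illustration.

    # A minimal sketch of processing closed-questionnaire (Likert) responses.
    # The item wording, the 4-point coding and the data are invented.
    scale = {"definitely agree": 4, "agree": 3,
             "disagree": 2, "definitely disagree": 1}

    responses = {
        "The tutorials helped me understand the lectures":
            ["definitely agree", "agree", "agree", "disagree"],
        "The assessment rewarded memorisation rather than understanding":
            ["agree", "definitely agree", "disagree", "definitely disagree"],
    }

    for item, answers in responses.items():
        coded = [scale[a] for a in answers]
        mean = sum(coded) / len(coded)
        agree_pct = 100 * sum(1 for c in coded if c >= 3) / len(coded)
        print(f"{item}\n  mean = {mean:.2f}, % agreeing = {agree_pct:.0f}%")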
Traditionally, questionnaire and survey data are collected using copies of a paper questionnaire and answer sheets. With the availability of web technology, there is now the option of collecting survey data online. To collect data using a paper questionnaire, special answer sheets called OMR forms are often used. Respondents are asked to mark their answers to the questionnaire on the OMR forms, and an optical mark scanner is then used to read the marks. The process produces an electronic data file containing the responses to the questionnaire, which can then be analysed using software such as MS Excel or SPSS. At HKUST, both the optical mark scanner and OMR forms are available from ITSC.
Diary / Journal
Everyone involved in an action learning project should keep a diary or journal in which they record:
• their initial reflections on the topic of concern
• the plans that were made
• a record of actions which were taken
• observations of the effects of the actions
• impressions and personal opinions about the actions taken and reactions to them
• results obtained from other observation techniques
• references for, and notes on, any relevant literature or supporting documents which are discovered.
Research reports are often very impersonal documents, but this should not be the case for an action learning journal, quite the contrary! It should contain a record of both what you did and what you thought. In it you should regularly and systematically reflect critically on the effects of your project and how it is progressing. Journals act as starting points for critical reflection at the regular meetings of the project team. By sharing observations and reflections it is possible to fine-tune the innovation. Sympathetic but critical discussion can also heighten awareness and contribute to changing perspectives.
Supporting Documents
Keep copies of any documents which are relevant to the course(s) you are examining. These can include:
• documents for the course development and accreditation process
• minutes of course committees
• the course syllabus
• memos between course team leaders and members
• handouts to students
• copies of tests and examinations
• lists of test results and student grades.
Interaction Schedules
Interaction schedules are methods for analysing and recording what takes place during a class. A common approach is to note down at regular intervals (say every minute) who is talking, and to categorise what they are saying or doing. An alternative to time sampling is event sampling, in which behaviour is noted every time a particular event occurs. Examples of categories could be: tutor asking a question, tutor giving an explanation, tutor giving an instruction, student answering a question, or student asking a question. The analysis can be made by an observer in the class or subsequently from a tape or video recording. Profiles built from such observations can compare the interactions during two tutorials: an observer notes, at one-minute intervals, who is talking and the type of communication, and the resulting plots show the extent to which the tutor dominated the session and the students contributed. The example is adapted from Williams and Gillard (1986).
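To show how a time-sampled interaction schedule can be summarised, the sketch below tallies a one-minute-interval observation log. The category codes and the log itself are invented for illustration.

    # Summarising a time-sampled interaction schedule.
    # One code per one-minute interval; the codes and the log are invented.
    from collections import Counter

    codes = {
        "TQ": "tutor asking question",
        "TE": "tutor giving explanation",
        "TI": "tutor giving instruction",
        "SA": "student answering question",
        "SQ": "student asking question",
    }

    # Hypothetical 15-minute observation log for one tutorial.
    log = ["TE", "TE", "TQ", "SA", "TE", "TI", "TE", "TQ", "SA", "SQ",
           "TE", "TE", "TQ", "SA", "TE"]

    counts = Counter(log)
    tutor_minutes = sum(n for code, n in counts.items() if code.startswith("T"))

    for code, n in counts.most_common():
        print(f"{codes[code]:<28} {n:>2} min ({100 * n / len(log):.0f}%)")
    print(f"Tutor talk overall: {100 * tutor_minutes / len(log):.0f}% of intervals")

Comparing such summaries for two tutorials gives the same kind of contrast as the plotted profiles described above.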
There are other approaches to recording and analysing what happens in a classroom situation. McKernan (1991) discusses an extensive range of techniques, gives examples of each and considers how the data gathered should be analysed.
Interviews
Interviews can provide even more opportunity for respondents to raise their own issues and concerns, but they are correspondingly more time-consuming and can raise difficulties in the collation and interpretation of information. The format can sit on a spectrum from completely open discussion to tightly structured questions. Semi-structured interviews have a small schedule of questions to point the interviewee towards an area of interest to the researcher, but then allow interviewees to raise any items they like within the general topic area. Since interviews give students an opportunity to raise their own agenda, they are useful when issues are open, or at an exploratory stage. A small number of interviews can be useful to define issues for subsequent, more tightly structured questionnaires.
Interviews are normally tape recorded. If analysis, rather than just impression, is required, then transcripts have to be produced. The transcripts are normally analysed by searching for responses or themes which commonly occur. Quotations from the transcripts can be used to illuminate or illustrate findings reported in reports and papers. There are computer programmes available to assist with the analysis of qualitative data. One example is the programme NUDIST, which has facilities for indexing, text-searching, using Boolean operations on defined index nodes, and combining data from several initially independent studies.
Student Learning Inventories
Student learning inventories are examples of empirically derived measuring instruments. There are many inventories which purport to measure a wide range of characteristics. Student learning inventories have been highlighted here because they examine the quality of learning; in particular they look at the categories of deep and surface learning. The inventories can be used to compare groups of students, to examine approaches before and after changes to teaching methods, and to examine correlations with other variables.
The Study Process Questionnaire (SPQ) developed by John Biggs (1987) assesses students' approaches to learning. Scores are obtained for each student on deep, surface and achieving approach scales. The SPQ has been widely used in Hong Kong and its cultural applicability widely researched. A detailed account of the usage of the SPQ, together with tables of norms for Hong Kong students for comparison purposes, is in Biggs (1992). The SPQ is available in English, Chinese or bilingual versions.
For action learning projects, a suitable way to use the SPQ is to apply it at the start and end of the innovation. Changes in SPQ scores can then be interpreted as a reflection of the teaching and learning context. The results will indicate whether the innovation has encouraged meaningful approaches to learning.
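A pre/post comparison of this kind can be summarised very simply. The sketch below computes mean deep and surface approach scores before and after an innovation; the numbers are invented, and the real SPQ items, scoring key and norms are those given in Biggs (1987, 1992).

    # A minimal sketch of a pre/post comparison of approach scores, assuming
    # deep and surface scale scores have already been computed per student.
    pre  = {"deep": [28, 31, 25, 30, 27], "surface": [34, 30, 33, 35, 32]}
    post = {"deep": [33, 34, 29, 35, 31], "surface": [30, 27, 31, 30, 29]}

    def mean(xs):
        return sum(xs) / len(xs)

    for scale in ("deep", "surface"):
        change = mean(post[scale]) - mean(pre[scale])
        print(f"{scale:>7}: pre = {mean(pre[scale]):.1f}, "
              f"post = {mean(post[scale]):.1f}, change = {change:+.1f}")

A rise on the deep scale together with a fall on the surface scale would be the hoped-for pattern, though with small groups any change should be interpreted cautiously.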
Open-Ended Questionnaires
Open questionnaires have a series of specific questions but leave space for respondents to answer as they see fit. You are therefore more likely to find out the views of students, but replies are more difficult to analyse and collate. The usual procedure is to search for categories of common responses. An example of an open questionnaire is shown below. It is not necessary to have separate questionnaires for open and closed items; the most successful questionnaires often have both open and closed items.
Diagnosis of Student Conceptions
A good basis for improving your teaching is to diagnose your students' understanding of key concepts in a course. It is often surprising how students can pass university examinations but still have fundamental misunderstandings of key concepts. The usual method of diagnosing student conceptions is to ask a question which applies the concept to an everyday situation: one which cannot be answered by reproduction or by substitution into formulae. Answers are drawn from the students in interviews or in written form. The students' answers can usually be classified into a small number (usually two to five) of conceptions or misconceptions about the phenomenon. As with the analysis of interview data, care needs to be taken when deriving classifications. These do not automatically emerge from the transcript but are subject to the experiences and knowledge of the researcher.
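Whether the responses come from an open questionnaire item or from a diagnostic question about a concept, the collation step usually amounts to a frequency count of the categories the researcher has assigned. A minimal Python sketch, with invented category codes and data:

    from collections import Counter

    # Invented example: each response has been read and assigned a category
    # code by the researcher (the codes themselves are illustrative).
    coded_responses = [
        "wants more worked examples", "lectures too fast", "wants more worked examples",
        "assessment criteria unclear", "lectures too fast", "wants more worked examples",
        "positive about tutorials",
    ]

    frequencies = Counter(coded_responses)
    total = len(coded_responses)
    for category, n in frequencies.most_common():
        print(f"{category}: {n} of {total} ({n / total:.0%})")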
An example of the type of question, and the categories of student conceptions which it uncovered, is given below (Dahlgren, 1984).
Tape Recording
Making tape recordings is a way of collecting a complete, accurate and detailed record of discussions in class, conversations in interviews, or arguments and decisions at meetings. It is easy to obtain the recording; you simply take along cassettes and a portable recorder, and switch it on. However, the presence of a tape recorder can inhibit discussion or influence people's behaviour.
There are a number of ethical issues which need to be addressed over the use of tape recordings. The group being taped should establish the purpose of making the recording and the way in which the tapes will be used. If any quotations are made in subsequent reports, it is customary to maintain the anonymity of the source.
If you need to do a detailed analysis of the conversations then it will be necessary to produce a transcript. This is a time-consuming and painstaking process, so limit the use of tape recordings to situations where it is really necessary.
Triangulation
Triangulation is not a specific observation technique, but the process of comparing and corroborating data from one source against another. If you do just a handful of interviews, your conclusions may be viewed with skepticism. But if the interview results concur with findings from a questionnaire, trends in examination results and evidence from your journal, then the conclusions are much more convincing. The message is simple: use more than one observation technique in order to see whether your results are consistent.
Peer Appraisal and Self-Report Techniques
Peer appraisal: definition
Peer appraisals are employee assessments conducted by colleagues in the immediate working environment, i.e. the people the employee interacts with regularly; the process excludes superiors and subordinates. Peer appraisals are a form of performance appraisal designed to monitor and improve job performance.
Peer appraisals can be broken down into specific measures. Peer ranking involves workers ranking each member of the group from best to worst, either overall or on various areas of performance or responsibility. In peer ratings, workers rate colleagues on performance metrics, while peer nomination is a simple nomination of the 'best' worker, either overall or on particular performance metrics.
Commonly cited advantages of the peer appraisal process include insight and knowledge: workers have their 'ear to the ground' and are often in the best position to appraise a colleague's performance. Peer appraisal also encourages a more inclusive team dynamic, as colleagues gain a deeper insight into the challenges their colleagues face, and encourages the development of a shared goal as workers realise they must impress their colleagues and respond to their ideas, concerns and needs.
Self-report techniques are methods of gathering data where participants provide information about themselves without interference from the experimenter. Such techniques can include questionnaires, interviews, or even diaries, and ultimately require giving responses to pre-set questions.
Evaluation of self-report methods
Strengths:
- Participants can be asked about their feelings and cognitions (i.e. thoughts), which can be more useful than simply observing behaviour alone.
- Scenarios can be asked about hypothetically, without having to physically set them up and observe participants' behaviour.
Weaknesses:
- Gathering information about thoughts or feelings is only useful if participants are willing to disclose them to the experimenter.
- Participants may try to give the 'correct' responses they think researchers are looking for (or deliberately do the opposite), or try to come across in the most socially acceptable way (i.e. social desirability bias), which can lead to untruthful responses.
A self-report study is a type of survey, questionnaire, or poll in which respondents read the questions and select responses by themselves, without researcher interference. A self-report is any method which involves asking a participant about their feelings, attitudes, beliefs and so on. Examples of self-reports are questionnaires and interviews; self-reports are often used as a way of gaining participants' responses in observational studies and experiments.
Self-report studies have validity problems. Patients may exaggerate symptoms in order to make their situation seem worse, or they may under-report the severity or frequency of symptoms in order to minimize their problems. Patients might also simply be mistaken or misremember the material covered by the survey.
Questionnaires and interviews
Questionnaires are a type of self-report method which consist of a set of questions, usually in a highly structured written form. Questionnaires can contain both open questions and closed questions, and participants record their own answers. Interviews are a type of spoken questionnaire where the interviewer records the responses. Interviews can be structured, whereby there is a predetermined set of questions, or unstructured, whereby no questions are decided in advance.
The main strength of self-report methods is that they allow participants to describe their own experiences rather than having these inferred from observation. Questionnaires and interviews are often able to study large samples of people fairly easily and quickly. They can examine a large number of variables and can ask people to reveal behaviour and feelings which have been experienced in real situations.
However, participants may not respond truthfully, either because they cannot remember or because they wish to present themselves in a socially acceptable manner. Social desirability bias can be a big problem with self-report measures, as participants often answer in a way that portrays themselves in a good light. Questions are not always clear, and if the respondent has not really understood the question then the data collected will not be valid. If questionnaires are sent out, say via email or through tutor groups, the response rate can be very low. Questions can also be leading; that is, they may unwittingly push the respondent towards a particular reply. Unstructured interviews can be very time-consuming and difficult to carry out, whereas structured interviews can restrict the respondents' replies. Therefore psychologists often carry out semi-structured interviews, which consist of some pre-determined questions followed up with further questions that allow the respondent to develop their answers.
Open and closed questions
Questionnaires and interviews can use open or closed questions, or both. Closed questions are questions which provide a limited choice (for example, a participant's age or their favourite football team), especially if the answer must be taken from a predetermined list. Such questions provide quantitative data, which are easy to analyse. However, these questions do not allow the participant to give in-depth insights.
Open questions are those which invite the respondent to provide answers in their own words, and they produce qualitative data. Although these types of questions are more difficult to analyse, they can produce more in-depth responses and tell the researcher what the participant actually thinks, rather than being restricted by categories.
Rating scales
One of the most common rating scales is the Likert scale. A statement is presented and the participant decides how strongly they agree or disagree with it. For example, the participant decides whether "Mozzarella cheese is great" with the options "strongly agree", "agree", "undecided", "disagree", and "strongly disagree". One strength of Likert scales is that they can give an idea of how strongly a participant feels about something, and therefore give more detail than a simple yes/no answer. Another strength is that the data are quantitative and easy to analyse statistically. However, there is a tendency with Likert scales for people to respond towards the middle of the scale, perhaps to make themselves look less extreme. As with any questionnaire, participants may provide the answers that they feel they should. Moreover, because the data are quantitative, they do not provide in-depth replies.
Fixed-choice questions
Fixed-choice questions are phrased so that the respondent has to make a fixed-choice answer, usually 'yes' or 'no'. This type of questionnaire is easy to measure and quantify. It also prevents a participant from choosing an option that is not in the list. However, respondents may not feel that their desired response is available. For example, a person who dislikes all alcoholic beverages may feel that it is inaccurate to choose a favourite alcoholic beverage from a list that includes beer, wine, and liquor, but does not include 'none of the above' as an option. Answers to fixed-choice questions are also not in-depth.
Reliability
Reliability refers to how consistent a measuring device is. A measurement is said to be reliable or consistent if it produces similar results when used again in similar circumstances. For example, if a speedometer gave the same readings at the same speed it would be reliable; if it didn't, it would be pretty useless and unreliable. Importantly, the reliability of self-report measures, such as psychometric tests and questionnaires, can be assessed using the split-half method. This involves splitting a test into two and having the same participants complete both halves. If the two halves of the test produce similar results, this suggests that the test has internal reliability. There are a number of ways to improve the reliability of self-report techniques: for example, ambiguous questions could be clarified or, in the case of interviews, the interviewers could be given training.
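The split-half method lends itself to a short worked example. The Python sketch below splits a ten-item questionnaire into odd- and even-numbered items and correlates the two half-scores; the data are invented, and the Spearman-Brown adjustment shown at the end is a common (though not universal) extra step for estimating the reliability of the full-length test.

    from statistics import correlation  # available from Python 3.10

    # Invented data: each row is one participant's item scores (e.g. Likert
    # ratings coded 1-5) on a ten-item questionnaire.
    scores = [
        [4, 5, 4, 4, 5, 3, 4, 4, 5, 4],
        [2, 1, 2, 3, 2, 2, 1, 2, 2, 3],
        [3, 3, 4, 3, 3, 4, 3, 3, 4, 3],
        [5, 4, 5, 5, 4, 5, 5, 4, 5, 5],
        [1, 2, 1, 2, 2, 1, 2, 1, 1, 2],
        [4, 4, 3, 4, 4, 3, 4, 4, 3, 4],
    ]

    # Split each test into two halves (odd- and even-numbered items) and total each half.
    odd_totals = [sum(row[0::2]) for row in scores]
    even_totals = [sum(row[1::2]) for row in scores]

    r_half = correlation(odd_totals, even_totals)
    # Spearman-Brown correction: estimates the reliability of the full-length test.
    r_full = (2 * r_half) / (1 + r_half)

    print(f"Half-test correlation: {r_half:.2f}")
    print(f"Estimated split-half reliability: {r_full:.2f}")

The same correlation idea underlies concurrent validity, discussed next: there the scores being correlated come from two different self-report measures of the same topic rather than from two halves of one test.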
Validity
Validity refers to whether a study measures or examines what it claims to measure or examine. Questionnaires are often said to lack validity for a number of reasons: participants may lie, give the answers they think are desired, and so on. A way of assessing the validity of self-report measures is to compare the results of the self-report with another self-report on the same topic (this is called concurrent validity). For example, if an interview is used to investigate sixth-grade students' attitudes toward smoking, the scores could be compared with a questionnaire of former sixth graders' attitudes toward smoking. There are a number of ways to improve the validity of self-report techniques. For example, leading questions could be avoided, open questions could be added to allow respondents to expand upon their replies, and confidentiality could be reinforced to allow respondents to give more truthful responses.
Disadvantages
Self-report studies have many advantages, but they also suffer from specific disadvantages due to the way that subjects generally behave. Self-reported answers may be exaggerated; respondents may be too embarrassed to reveal private details; and various biases, such as social desirability bias, may affect the results. Subjects may also forget pertinent details. Self-report studies are inherently biased by the person's feelings at the time they filled out the questionnaire. If a person feels bad at the time, their answers will be more negative; if the person feels good, the answers will be more positive. As with all studies relying on voluntary participation, results can be biased by a lack of respondents if there are systematic differences between people who respond and people who do not. Care must also be taken to avoid biases due to interviewers and their demand characteristics.
Types of performance-based assessment
Performance-based learning is when students participate in performing tasks or activities that are meaningful and engaging. The purpose of this kind of learning is to help students acquire and apply knowledge, practice skills, and develop independent and collaborative work habits. The culminating activity or product for performance-based learning is one that lets a student demonstrate evidence of understanding through a transfer of skills.
This form of learning is measured through a performance-based assessment, which is open-ended and without a single, correct answer. The performance-based assessment should be something that shows authentic learning, such as the creation of a newspaper or a class debate. The benefit of these types of performance-based assessments is that when students are more actively involved in the learning process, they absorb and understand the material at a much deeper level. Other characteristics of performance-based assessments are that they are complex and time-bound. In addition, there are learning standards in each discipline that set academic expectations and define what is proficient in meeting that standard.
Performance-based activities can integrate two or more subjects and should also meet 21st-century expectations whenever possible:
• Creativity and Innovation
• Critical Thinking and Problem Solving
• Communication and Collaboration
There are also Information Literacy standards and Media Literacy standards that can be incorporated into performance-based learning.
Performance-based activities can be quite challenging for students to complete. They need to understand from the beginning exactly what is being asked of them and how they will be assessed. Exemplars and models may help, but it is more important to provide detailed criteria that will be used to assess the performance-based assessment. Those criteria should be incorporated into a scoring rubric.
Observations are an important part of evaluating performance-based assessments. Observations can be used to provide students with feedback to improve performance, and both teachers and students can use them. There may be peer-to-peer student feedback, and a checklist or a tally can be used to record performance.
Students can take their experiences in performance-based learning and use them at later points in their educational, personal, or professional lives. The goal of performance-based learning should be to enhance what students have learned, not just have them recall facts. Following are six different types of activities that can be developed as assessments for performance-based learning.
Presentations
One easy way to have students complete a performance-based activity is to have them do a presentation or report of some kind. This could be done by individual students, which takes time, or in collaborative groups. The basis for the presentation may be one of the following:
• Providing Information
• Teaching a Skill
• Reporting Progress
• Persuading Others
Students may choose to add visual aids, a PowerPoint presentation or Google Slides to help illustrate elements in their speech. Presentations work well across the curriculum as long as there is a clear set of expectations for students to work with from the beginning.
Portfolios
Student portfolios can include items that students have created and/or collected over a specific period of time. Art portfolios are often used for students who want to apply to art programs in college. Another example is when students create a portfolio of their written work that shows how they have progressed from the beginning to the end of a class. The writing in a portfolio can be from any discipline or from a combination of disciplines.
Some teachers have students select the items they feel represent their best work to be included in a portfolio. The benefit of an activity like this is that it grows over time and is therefore not just completed and forgotten. A portfolio can provide students with a lasting selection of artifacts that they can use later in their academic career. Reflections may be included in student portfolios, in which students make note of their growth based on the materials in the portfolio. Portfolios may also include taped presentations, dramatic readings, or digital files.
Performances
Dramatic performances are one kind of collaborative activity that can be used as a performance-based assessment. Students can create, perform, and/or provide a critical response. Examples include dance, recitals and dramatic enactments; there may also be prose or poetry interpretation. This form of performance-based assessment can take time, so there must be a clear pacing guide. Students must be given time to address the demands of the activity, resources must be readily available and meet all safety standards, and students should have opportunities to draft stage work and practice. Developing the criteria and the rubric, and sharing these with students beforehand, is critical before assessing a dramatic performance.
Projects
Projects are quite commonly used by teachers as performance-based activities. They can include everything from research papers to artistic representations of information learned. Projects may require students to apply their knowledge and skills while completing the assigned task, using creativity, critical thinking, analysis, and synthesis. Students might be asked to complete reports, diagrams, and maps. Teachers can also choose to have students work individually or in groups.
Journals may be part of a performance-based assessment. Journals can be used to record student reflections, teachers may require students to complete journal entries, and some teachers may use journals as a way to record participation.
Exhibits and Fairs
Teachers can expand the idea of performance-based activities by creating exhibits or fairs for students to display their work. Examples range from history fairs to art exhibitions. Students work on a product or item that will be publicly exhibited.
Exhibitions show in-depth learning and may include feedback from viewers. In some cases, students might be required to explain or 'defend' their work to those attending the exhibition. Some fairs, like science fairs, could include the possibility of prizes and awards.
Debates
A debate in the classroom is one form of performance-based learning that teaches students about varied viewpoints and opinions. Skills associated with debate include research, media and argument literacy, reading comprehension, evidence evaluation, public speaking, and civic skills.
There are many different formats for debate. One is the fishbowl debate, in which a handful of students sit in a half-circle facing the other students and debate a topic; the rest of the class may pose questions to the panel. Another form is a mock trial, where teams representing the prosecution and defense take on the roles of attorneys and witnesses. A judge, or judging panel, oversees the courtroom presentation. Middle schools and high schools can use debates in the classroom, with increased levels of sophistication by grade level.
Student Logs
Documenting student participation in physical activity (NASPE Standard 3) is often difficult. Teachers can use logs to assess participation in an activity or skill practice trials completed outside of class. Practice trials during class that demonstrate student effort can also be documented with logs. A log records behaviors over a period of time (see figure 14.1). Often the information recorded shows changes in behavior, trends in performance, results of participation, progress, or the regularity of physical activity. A student log is an excellent artifact for use in a portfolio. Because logs are usually self-recorded documents, they are not used for summative assessments except as an artifact in a portfolio or for a project. If teachers wanted to increase the importance placed on a log, a method of verification by an adult or someone in authority should be added.
Journals
Journals can be used to record student feelings, thoughts, perceptions, or reflections about actual events or results. The entries in journals often report social or psychological perspectives, both positive and negative, and may be used to document the personal meaning associated with one's participation (NASPE Standard 6). Journal entries would not be an appropriate summative assessment by themselves, but might be included as an artifact in a portfolio.
Journal entries are excellent ways for teachers to "take the pulse" of a class and determine whether students are valuing the content of the class. Teachers must be careful not to assess affective-domain journal entries for their actual content, because doing so may cause students to write what teachers want to hear (or will give credit for) instead of true and genuine feelings. Teachers could hold students accountable for completing journal entries, and some teachers use journals as a way to log participation over time.
Using Observation in the Assessment Process
Human performance provides many opportunities for students to exhibit behaviors that may be directly observed by others, a unique advantage of working in the psychomotor domain. Wiggins (1998) uses physical activity when providing examples to illustrate complex assessment concepts, as they are easier to visualize than would be the case with a cognitive example. The nature of performing a motor skill makes assessment through observational analysis a logical choice for many physical education teachers. In fact, investigations of the measurement practices of physical educators have consistently shown a reliance on observation and related assessment methods (Hensley and East 1989; Matanin and Tannehill 1994; Mintah 2003).
Observation is a skill used with several performance-based assessments. It is often used to provide students with feedback to improve performance. However, without some way to record results, observation alone is not an assessment. Going back to the definition of assessment provided earlier in the chapter, assessment is the gathering of information, analyzing the data, and then using the information to make an evaluation. Therefore, some type of written product must be produced if the task is to be considered an assessment.
Teachers and peers can assess others using observation. They might use a checklist or some type of event-recording scheme to tally the number of times a behavior occurred. Keeping game-play statistics is an example of recording data using event-recording techniques. Students can self-analyze their own performance and record their performances using criteria provided on a checklist or a game-play rubric. Table 14.1 is an example of a recording form that could be used for peer assessment.
When using peer assessment, it is best to have the assessor do only the assessment. When the person recording assessment results is also expected to take part in the assessment (e.g., tossing the ball to the person being assessed), he or she cannot both toss and do an accurate observation. In the case of large classes, teachers might even use groups of four, in which one person is being evaluated, a second person is feeding the ball, a third person is doing the observation, and a fourth person is recording the results.
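A minimal Python sketch of the event-recording idea: an observer (teacher or peer) tallies each practice trial against a few checklist criteria. The criteria and trial data below are invented for illustration; they are not the recording form shown in table 14.1.

    from collections import defaultdict

    # Invented example of event recording during a peer observation: each
    # practice trial is checked against simple criteria from a checklist.
    criteria = ["moves to the ball", "uses correct form", "follows through to target"]

    tallies = defaultdict(lambda: {"met": 0, "not met": 0})

    observed_trials = [
        {"moves to the ball": True,  "uses correct form": True,  "follows through to target": False},
        {"moves to the ball": True,  "uses correct form": False, "follows through to target": False},
        {"moves to the ball": False, "uses correct form": True,  "follows through to target": True},
        {"moves to the ball": True,  "uses correct form": True,  "follows through to target": True},
    ]

    for trial in observed_trials:
        for criterion in criteria:
            key = "met" if trial[criterion] else "not met"
            tallies[criterion][key] += 1

    for criterion in criteria:
        met, missed = tallies[criterion]["met"], tallies[criterion]["not met"]
        print(f"{criterion}: {met} of {met + missed} trials")

Because the tally produces a written record, the observation meets the definition of assessment given above rather than remaining informal feedback.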
Guidelines for developing effective performance assessment
The performance development system at Wellesley College is designed to provide alignment between the College's mission, constituent needs and performance expectations. The program fosters ongoing two-way communication between employees and managers; supports the development of clear, consistent, and measurable goals linked directly to Wellesley's core values and competencies; helps to articulate and support training needs and career development; and establishes the criteria for making reward and recognition decisions.
Effective performance development at Wellesley College begins with respect for one another and ends with excellence in performance. It is the responsibility of every supervisor to communicate on an ongoing basis with their employees. These conversations should provide clear and honest role expectations and feedback and should help identify improvement, development, and career issues. Each employee has a responsibility to participate fully in these conversations, be sure they understand their role responsibilities and expectations, and communicate any obstacles or training needed in order to perform their role at an optimum level.
The Performance Development Annual Summary Meeting
Performance development should be happening all year long. When a manager compliments an employee for a job well done or coaches an employee through a difficult situation, that is part of performance development. Wellesley's performance development process includes a summary review assessment that should bring closure to the performance period and provide a basis for performance development in the next period. The following suggestions help set the stage for a productive discussion.
1. Establish the proper climate. Create a sincere, open, and constructive atmosphere. Schedule the meeting in advance and stick to it. Allow enough time to discuss the review. Locate a private space and guard against interruptions.
2. Make it clear that this is a joint discussion. Listen and ask for the employee's opinion. Avoid words or body language that criticize the employee's view. Understand your employee's point of view; working together is better than being at odds. Be willing to modify the Performance Development Document to reflect what is discussed and agreed upon at the meeting.
3. Discuss the role document and performance requirements. Explore the competencies required for successful performance. Update the role document if needed.
4. Discuss goals for the performance review period. Review whether the goals were met. Discuss obstacles and roadblocks that affected goal achievement.
5. Discuss opportunities for growth and development in the current role or a different role. Discuss the employee's developmental and career goals. Remember there is also opportunity for growth and development within the current role: there are new things to be done and more effective and efficient ways to accomplish work.
Either at this meeting or at a separate meeting, develop goals for the coming year. Refer to Guidelines for Setting Goals and Objectives for additional information on setting goals. Remember, performance development is about ongoing two-way communication between the employee and their supervisor. The annual performance appraisal should be a summary of various meetings throughout the year (interim goal reviews and updates). There should be no surprises at this summary meeting.
Preparing for Annual Performance Development Discussions
Tips for the Employee:
Employees have a responsibility in the performance development process and should be prepared to give feedback to their manager. Review your current role document. Does it reflect your current role in the department? If not, talk with your supervisor about revising it. Review your goals for the year. Have they been met? Review your achievements. Think about obstacles and roadblocks you encountered and how you dealt with them. Is there anyone else your supervisor should speak with before preparing your evaluation? Let your supervisor know this before the review meeting. Review the competencies required for administrative staff positions at Wellesley. Identify specific areas of expertise or skills that you would like to develop or improve. Identify your strengths. In what areas have you improved? Can you identify any developmental goals for the coming year? What ideas do you have for changes that would help you perform your role better and/or improve the operation of the department? Think about obstacles and roadblocks that you face in performing your responsibilities and what help is needed from your supervisor to overcome them. If you manage others, what have you done to develop and strengthen your staff's performance and skills?
Tips for the Supervisor:
The supervisor is responsible for ongoing communication about performance throughout the year. Performance problems should be addressed as they occur; there should be no surprises in the end-of-year summary. The supervisor is responsible for preparing the summary documentation. Review the employee's role document. Does it reflect their current role in the department? Review the primary position responsibilities. Has the employee effectively performed these? What is your overall assessment of how these responsibilities were performed? Review the employee's goals from last year. Were goals modified or changed during the review period? Have the goals been met? Have you been able to provide the employee with the tools and support to get the job done? Review last year's appraisal. How does this year compare to last year? Have there been improvements? Consider whether you need to speak with anyone else in order to have a more complete and accurate picture of your employee's performance.
Review the competencies required for administrative staff roles at Wellesley. Assess the employee's strengths, weaknesses and areas of greatest improvement. Is there a specific area where you would like to establish a developmental goal? What suggestions do you have for the employee that will help improve their performance in their role or the overall operations of the department? If the employee supervises others, discuss what he or she has done to strengthen their own staff. Ask about regular communication of information, job expectations, and feedback. Contact the Human Resources Office for assistance if substantial performance issues exist.
Finalizing the Performance Development Document
The supervisor is responsible for completing the final draft of the Performance Development Document and forwarding the completed document to Human Resources to become part of the employee's personnel file. Send a hard copy so that signatures are included. The supervisor should provide a copy of the final Performance Development Document to the employee, and the employee should sign it. Signing the Performance Development Document indicates that the employee has met with their supervisor to provide input to the document, that they have reviewed the document, and that they have met with the supervisor to discuss it. The employee has the right to respond to the evaluation in writing.
Tips on Ongoing, Effective Feedback
Feedback involves treating each other with respect. Constructive feedback tries to reinforce the positive and change the negative by identifying what was done well or poorly, describing what action or behavior is desired, and explaining the effects of the observed and desired behavior.
Good feedback is timely. Give the feedback as quickly as possible after the event; feedback long delayed is rarely effective. Feedback involves both parties listening carefully. Check for clarity to ensure that the receiver fully understands what is being said.
Good feedback should be specific. Generalized feedback does not explain what behavior to repeat or avoid. Describe exactly what was done well and/or what could be improved. For example, "This report is well organized and the summary clearly states your conclusions and proposed actions" rather than "Good report."
Keep feedback objective. Use factual records and information whenever possible. Include details that focus on specific actions and results rather than characteristics of the employee. For example, say "this happened" rather than "you are": "You hung up the phone without saying good-bye" rather than "you are rude."
Feedback about performance issues is best delivered in person, so that the employee has a chance to respond to any issues raised. Especially avoid delivering negative feedback via e-mail messages.
Performance criteria
The National Center for Research, Evaluation, Standards, and Student Testing (1996) defines criteria as "guidelines, rules, characteristics, or dimensions that are used to judge the quality of student performance. Criteria indicate what we value in student responses, products, or performances." With performance assessments such as a lab, group project, portfolio, task, or presentation, students need to clearly know and understand what performance criteria will be used to judge their performance.
Although student interpretations are important, educators need to recognize that, on the basis of cultural and environmental norms, explanations that seem diametrically opposed may be equally defensible or right. Because this quality of complexity allows performance assessments to mirror real life, educators need to explicitly include the exact parameters of the responses they want to elicit in each assessment task or problem. (For example, educators should make sure students know whether the writing process, rather than punctuation and grammar, is the criterion on which performance will be judged, or whether a paragraph, as opposed to a few words, is the expected response.) The problem of interpretation differences that results when performance criteria (requirements) are ambiguous is compounded when students have diverse experiences based on their ethnicity, primary language, or gender. In an effort to assess higher-order cognitive skills and complex problem solving, educators need to develop appropriate learning assessments that have no single right answer and in which students' interpretation of information or evidence is key in defending their solution.
Scoring Rubrics
Scoring rubrics are descriptive scoring schemes developed to assess any student performance, whether it is written or oral, online or face-to-face. Scoring rubrics are especially well suited to evaluating complex tasks or assignments such as written work (e.g., assignments, essay tests, papers, portfolios); presentations (e.g., debates, role plays); group work; or other types of work products or performances (e.g., artistic works, portfolios). Scoring rubrics are assignment-specific; criteria are different for each assignment or test. A rubric is a way to make your criteria and standards clear to both you and your students.
Good scoring rubrics:
• Consist of a checklist of items, each with an even number of points. For example, a two-point rubric would indicate that the student either did or did not perform the specified task. Four or more points in a rubric are common and indicate the degree to which a student performed a given task.
• Are criterion-based. That is, the rubric contains descriptive criteria for acceptable performance that are meaningful, clear, concise, unambiguous, and credible, thus ensuring inter-rater reliability.
• Are used to assess only those behaviors that are directly observable.
• Require a single score based on the overall quality of the work or presentation.
• Provide a better assessment and understanding of expected or actual performance.
Sample Rubric for Quizzes and Homework
Why Develop Scoring Rubrics?
Here are some reasons why taking the time to construct a grading rubric will be worth your time:
• Make grading more consistent and fair.
• Save you time in the grading process.
• Help identify students' strengths and weaknesses so you can teach more effectively.
• Help students understand what and how they need to improve.
Guidelines for Developing a Scoring Rubric
Step 1: Select a project or assignment for assessment. Example: Work in small groups to write and present a collaborative research paper.
Step 2: Ask what performance skill(s) or competency(ies) students are demonstrating through their work on this project. Example: Ability to work as part of a team.
Step 3: List the traits you'll assess when evaluating the project--in other words, ask: "What counts in my assessment of this work?" Use nouns or noun phrases to name traits, and avoid evaluative language. Limit the number of traits to no more than seven. Each trait should represent a key teachable attribute of the overall skill you're assessing. Example: Content; Coherence and Organization; Creativity; Graphics and Visuals; Delivery.
Step 4: Decide on the number of gradations of mastery you'll establish for each trait and the language you'll use to describe those levels.
Five points of gradation:
5 = Proficient, 4 = Clearly Competent, 3 = Acceptable, 2 = Limited, 1 = Attempted.
Four points of gradation: Exceptional/Excellent, Admirable/Good, Acceptable/Fair, Amateur/Poor.
Step 5: For each trait, write statements that describe work at each level of mastery. If, for example, you have seven traits and five gradations, you'll have 35 descriptive statements in your rubric. Attempt to strike a balance between over-generalization and task-specificity. For the trait "coherence and organization" in a four-point rubric:
Exceptional: Thesis is clearly stated and developed; specific examples are appropriate and clearly develop the thesis; conclusion is clear; ideas flow together well; good transitions; succinct but not choppy; well organized.
Admirable: Most information presented in logical sequence; generally very organized, but better transitions between ideas are needed.
Acceptable: Concepts and ideas are loosely connected; lacks clear transitions; flow and organization are choppy.
Amateur: Presentation of ideas is choppy and disjointed; doesn't flow; development of thesis is vague; no apparent logical order to the writing.
Step 6: Design a format for presenting the rubric to students and for scoring student work.
Step 7: Test the rubric and fine-tune it based on feedback from colleagues and students.
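One way to hold the pieces from Steps 3 to 5 together is as a simple data structure: traits, level labels, and a recorded level for each trait. The Python sketch below uses the example traits and the four-point labels from above; the student ratings and the summing of trait scores into a total are illustrative choices, not part of the guidelines themselves.

    # Sketch of a four-point rubric built from the traits listed in Step 3.
    # Level labels follow the four-point example; the ratings are invented.
    LEVELS = {4: "Exceptional", 3: "Admirable", 2: "Acceptable", 1: "Amateur"}

    TRAITS = [
        "Content",
        "Coherence and Organization",
        "Creativity",
        "Graphics and Visuals",
        "Delivery",
    ]

    def score_report(ratings):
        """ratings: dict mapping trait -> level number (1-4)."""
        total = sum(ratings[t] for t in TRAITS)
        maximum = max(LEVELS) * len(TRAITS)
        lines = [f"{t}: {ratings[t]} ({LEVELS[ratings[t]]})" for t in TRAITS]
        lines.append(f"Total: {total}/{maximum}")
        return "\n".join(lines)

    student_ratings = {
        "Content": 4,
        "Coherence and Organization": 3,
        "Creativity": 3,
        "Graphics and Visuals": 2,
        "Delivery": 3,
    }
    print(score_report(student_ratings))

In practice each trait-and-level cell would also carry the descriptive statement written in Step 5, so that both teacher and student can see why a particular level was awarded.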
Checklists and Rating Scales
What is a checklist?
A checklist is just what it sounds like: a list that educators check off. Using this method is a little bit like going bird watching: start with a list of items you want to observe and then check off each item when appropriate. One popular choice for educators is to use developmental checklists to record what they have observed about individual children; these developmental checklists consist of lists of skills from the different developmental domains for a specific age range.
Why use checklists?
Checklists are quick and easy to use, so they are popular with educators. They can be used to record observations in virtually any situation, and do not require the educator to spend much time recording data; in general, a few moments is all it takes. Another advantage is that many different pre-made checklists are available from a variety of sources. For example, certain websites connected with early childhood education (ECE) offer developmental checklists that educators can download and print out. Educators can also create a checklist that exactly meets their needs, depending on what they want to observe and record.
How do I use a checklist?
As it is such a popular choice for educators, the example presented here shows how to use a developmental checklist. Developmental checklists are generally used to record observations of one child at a time. The list of skills is targeted at a specific age group (e.g. 12 to 24 months). They may be divided into the different developmental domains or focus on only one aspect of a child's development. Once you have chosen or created a checklist, you then observe the child in a variety of natural contexts and check off all the relevant skills or behaviours. Usually there is a space to indicate the relevant date(s) on the checklist, as this might be an important piece of data. As the checklist method does not allow for the recording of much qualitative data, you might choose to have a column for comments.
Sample checklist for language development: two-year-olds
A blank checklist could look something like this:

Child's Name: Alan
Behaviour/Skill                            Date              Comments
Communicates with gestures and pointing
Shakes head for no
Uses one-word sentences
Uses two-word sentences
Names familiar objects
Follows simple instructions
Enjoys songs and rhymes
Refers to self as "me" or "I"
Once you begin filling in the checklist, it will start to look something like this:

Child's Name: Alan
Behaviour/Skill                            Date              Comments
Communicates with gestures and pointing    March 9, 2012
Shakes head for no                         March 9, 2012
Uses one-word sentences                    March 10, 2012
Uses two-word sentences                    March 29, 2012    "My book"
Names familiar objects
Follows simple instructions                April 15, 2012
Enjoys songs and rhymes                    March 5, 2012     Loves Hokey Pokey
Refers to self as "me" or "I"              March 20, 2012    Taps self on chest, says "Ayan"

Note that, in general, behaviours and/or skills that you have not yet observed, or that the child has not yet mastered, are left blank, so that you can update the checklist as needed. In some cases, you may want to add a comment like the one in the last row of the sample above. In this example, Alan's strategy for referring to himself is significant, even if he is not yet demonstrating the specific behaviour from the checklist.
Using a rating scale
Sometimes educators feel limited by a checklist because this method only allows the observer to record whether a child uses a specific skill or not. In this case, they might choose to add a rating scale to their observations. By adding a rating scale, an educator can rate the quality, frequency or ease with which a child uses a certain skill.
If you were to add a rating scale to your checklist, it might look like this:

Child's Name: Alan    Date: March/April 2012
Behaviour/Skill                            Usually  Frequently  Rarely  Never  Comments
Communicates with gestures and pointing
Shakes head for no
Uses one-word sentences
Uses two-word sentences
Names familiar objects
Follows simple instructions
Enjoys songs and rhymes
Refers to self as "me" or "I"

Once you begin filling it in, it could look something like this:

Child's Name: Alan    Date: March/April 2012
Behaviour/Skill                            Usually  Frequently  Rarely  Never  Comments
Communicates with gestures and pointing
Shakes head for no
Uses one-word sentences
Uses two-word sentences                                                        "My book"
Names familiar objects
Follows simple instructions
Enjoys songs and rhymes
Refers to self as "me" or "I"                                                  Taps self on chest, says "Ayan"
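For educators who keep such records electronically, a developmental checklist with an optional rating column can be represented very simply. The Python sketch below reuses the behaviours, dates and comments from the sample checklist for Alan; the single rating shown is invented for illustration, since the sample above does not indicate which rating column was ticked.

    from dataclasses import dataclass
    from typing import Optional

    # A checklist entry: date and comments stay blank until the behaviour is observed;
    # the rating field is only used if a rating scale has been added.
    @dataclass
    class ChecklistItem:
        behaviour: str
        date: Optional[str] = None
        rating: Optional[str] = None   # e.g. "usually", "frequently", "rarely", "never"
        comments: str = ""

    checklist = [
        ChecklistItem("Communicates with gestures and pointing", "March 9, 2012"),
        ChecklistItem("Shakes head for no", "March 9, 2012", rating="usually"),  # rating invented
        ChecklistItem("Uses one-word sentences", "March 10, 2012"),
        ChecklistItem("Uses two-word sentences", "March 29, 2012", comments='"My book"'),
        ChecklistItem("Names familiar objects"),  # not yet observed, left blank
        ChecklistItem("Follows simple instructions", "April 15, 2012"),
        ChecklistItem("Enjoys songs and rhymes", "March 5, 2012", comments="Loves Hokey Pokey"),
        ChecklistItem("Refers to self as 'me' or 'I'", "March 20, 2012",
                      comments="Taps self on chest, says 'Ayan'"),
    ]

    for item in checklist:
        parts = [item.date or "not yet observed"]
        if item.rating:
            parts.append(f"rating: {item.rating}")
        if item.comments:
            parts.append(item.comments)
        print(f"{item.behaviour}: " + "; ".join(parts))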
Purpose of portfolios
A student portfolio is a compilation of academic work and other forms of educational evidence assembled for the purpose of (1) evaluating coursework quality, learning progress, and academic achievement; (2) determining whether students have met learning standards or other academic requirements for courses, grade-level promotion, and graduation; (3) helping students reflect on their academic goals and progress as learners; and (4) creating a lasting archive of academic work products, accomplishments, and other documentation. Advocates of student portfolios argue that compiling, reviewing, and evaluating student work over time can provide a richer, deeper, and more accurate picture of what students have learned and are able to do than more traditional measures, such as standardized tests, quizzes, or final exams, which only measure what students know at a specific point in time.
Portfolios come in many forms, from notebooks filled with documents, notes, and graphics to online digital archives and student-created websites, and they may be used at the elementary, middle, and high school levels. Portfolios can be a physical collection of student work that includes materials such as written assignments, journal entries, completed tests, artwork, lab reports, and physical projects (such as dioramas or models), along with other material evidence of learning progress and academic accomplishment, including awards, honors, certifications, recommendations, written evaluations by teachers or peers, and self-reflections written by students. Portfolios may also be digital archives, presentations, blogs, or websites that feature the same materials as physical portfolios, but that may also include content such as student-created videos, multimedia presentations, spreadsheets, websites, photographs, or other digital artifacts of learning. Online portfolios are often called digital portfolios or e-portfolios, among other terms. In some cases, blogs or online journals may be maintained by students and include ongoing reflections about learning activities, progress, and accomplishments. Portfolios may also be presented, publicly or privately, to parents, teachers, and community members as part of a demonstration of learning, exhibition, or capstone project.
It's important to note that there are many different types of portfolios in education, and each form has its own purpose. For example, "capstone" portfolios feature student work completed as part of long-term projects or final assessments, typically undertaken at the culmination of middle school or high school, or at the end of a long-term, possibly multiyear project. Some portfolios are only intended to evaluate learning progress and achievement in a specific course, while others are maintained for the entire time a student is enrolled in a school. And some portfolios are used to assess learning in a specific subject area, while others evaluate the acquisition of skills that students can apply in all subject areas.
The following arguments are often made by educators who advocate for the use of portfolios in the classroom:
• Student portfolios are most effective when they are used to evaluate student learning progress and achievement. When portfolios are used to document and evaluate the knowledge, skills, and work habits students acquire in school, teachers can use them to adapt instructional strategies when evidence shows that students either are or are not learning what they were taught. Advocates typically contend that portfolios should be integrated into and inform the instructional process, and that students should build out their portfolios incrementally on an ongoing basis; that is, portfolios should not merely be an idle archive of work products that is only reviewed at the end of a course or school year.
• Portfolios can help teachers monitor and evaluate learning progress over time. Tests and quizzes give teachers information about what students know at a particular point in time, but portfolios can document how students have grown, matured, and improved as learners over the course of a project, school year, or multiple years. For this reason, some educators argue that portfolios should not just be compilations of a student's best work, but should also include evidence and work products that demonstrate how students improved over time. For example, multiple versions of an essay can show how students revised and improved their work based on feedback from teachers or peers.
• Portfolios help teachers determine whether students can apply what they have learned to new problems and different subject areas. A test can help teachers determine, for example, whether students have learned a specific mathematical skill. But can those students also apply that skill to a complex problem in economics, geography, civics, or history? Can they use it to conduct a statistical analysis of a large data set in a spreadsheet? Or can they use it to develop a better plan for a hypothetical business? (Educators may call this ability to apply skills and knowledge to novel problems and different domains "transfer of learning".) Similarly, portfolios can be used to evaluate student work and learning in non-school contexts. For example, if a student participated in an internship or completed a project under the guidance of an expert mentor from the community, the student could create a portfolio over the course of these learning activities and submit it to their teachers or school as evidence that they have met certain learning expectations or graduation requirements.
• Portfolios can encourage students to take more ownership of and responsibility for the learning process. In some schools, portfolios are a way for students to critique and evaluate their own work and academic progress, often during the process of deciding what will be included in their portfolios. Because portfolios document learning growth over time, they can help students reflect on where they started a course, how they developed, and where they ended up at the conclusion of the school year. When reviewing a portfolio, teachers may also ask students to articulate the connection between particular work products and the academic expectations and goals for a course. For these reasons, advocates of portfolios often recommend that students be involved in determining what goes into a portfolio, and that teachers should not make these decisions unilaterally without involving students. For related discussions, see student engagement and student voice.
• Portfolios can improve communication between teachers and parents.
Portfolios can also help parents become more informed about the education and learning progress of their children, what is being taught in a particular course, and what students are doing and learning in the classroom. Advocates may also contend that when parents are more informed about and engaged in their child's education, they can play a more active role in supporting their children at home, which could have a beneficial effect on academic achievement and long-term student outcomes.
Debate
While portfolios are not generally controversial in concept, skepticism, criticism, and debate may arise if portfolios are viewed as burdensome, add-on requirements rather than as a vital instructional strategy and assessment option. Portfolios may also be viewed negatively if they are poorly designed and executed, if they tend to be filed away and forgotten, if they are not actively maintained by students, if they are not meaningfully integrated into the school's academic program, if educators do not use them to inform and adjust their instructional techniques, or if sufficient time is not provided during the school day for teachers and students to review and discuss them. In short, how portfolios are actually used or not used in schools, and whether they produce the desired educational results, will likely determine how they are perceived.
Creating, maintaining, and assessing student portfolios can also be a time-consuming endeavor. For this reason and others, some critics contend that portfolios are not a practical or feasible option for use in large-scale evaluations of school and student performance. (Just imagine, for example, what it would require in terms of funding, time, and human resources to evaluate dozens or hundreds of pages of academic documentation produced by each of the tens of thousands of eleventh-grade students scattered across a state in any given year.) Standardized tests, in contrast, are relatively efficient and inexpensive to score, and test results are considered more reliable or comparable across students, schools, or states, given that there is less chance of error, bias, or inconsistency during the scoring process (in large part because most standardized tests today are scored in full or in part by automated machines, computers, or online programs). Student portfolios are a comparatively time-consuming, and therefore far more expensive, assessment strategy because they require human scorers, and it is also far more challenging to maintain consistent and reliable evaluations of student achievement across different scorers. Many advocates would argue, however, that portfolios are not intended for use in large-scale evaluations of school and student performance, and that they provide the greatest educational value at the classroom level, where teachers have personal relationships and conversations with students, and where in-depth feedback from teachers can help students grow, improve, and mature as learners.
Evaluation criteria and using portfolios in instruction and communication
WHAT IS PORTFOLIO ASSESSMENT?
In program evaluation, as in other areas, a picture can be worth a thousand words. As an evaluation tool for community-based programs, we can think of a portfolio as a kind of scrapbook or photo album that records the progress and activities of the program and its participants, and showcases them to interested parties both within and outside of the program.
While portfolio assessment has been predominantly used in educational settings to document the progress and achievements of individual children and adolescents, it has the potential to be a valuable tool for program assessment as well. Many programs do keep such albums or scrapbooks and use them informally as a means of conveying their pride in the program, but most do not consider using them in a systematic way as part of their formal program evaluation. However, the concepts and philosophy behind portfolios can apply to community evaluation, where portfolios can provide windows into community practices, procedures, and outcomes, perhaps better than more traditional measures.
Portfolio assessment has become widely used in educational settings as a way to examine and measure progress, by documenting the process of learning or change as it occurs. Portfolios extend beyond test scores to include substantive descriptions or examples of what the student is doing and experiencing. Fundamental to "authentic assessment" or "performance assessment" in educational theory is the principle that children and adolescents should demonstrate, rather than tell about, what they know and can do (Cole, Ryan, & Kick, 1995). Documenting progress toward higher-order goals such as application of skills and synthesis of experience requires obtaining information beyond what can be provided by standardized or norm-based tests. In "authentic assessment", information or data is collected from various sources, through multiple methods, and over multiple points in time (Shaklee, Barbour, Ambrose, & Hansford, 1997).
Contents of portfolios (sometimes called "artifacts" or "evidence") can include drawings, photos, video or audio tapes, writing or other work samples, computer disks, and copies of standardized or program-specific tests. Data sources can include parents, staff, and other community members who know the participants or program, as well as the self-reflections of participants themselves. Portfolio assessment provides a practical strategy for systematically collecting and organizing such data.
PORTFOLIO ASSESSMENT IS MOST USEFUL FOR:
*Evaluating programs that have flexible or individualized goals or outcomes. For example, within a program with the general purpose of enhancing children's social skills, some individual children may need to become less aggressive while other, shy children may need to become more assertive. Each child's portfolio assessment would be geared to his or her individual needs and goals.
*Allowing individuals and programs in the community (those being evaluated) to be involved in their own change and decisions to change.
*Providing information that gives meaningful insight into behavior and related change. Because portfolio assessment emphasizes the process of change or growth, at multiple points in time, it may be easier to see patterns.
• Providing a tool that can ensure communication and accountability to a range of audiences. Participants, their families, funders, and members of the community at large who may not have much sophistication in interpreting statistical data can often appreciate more visual or experiential "evidence" of success.
• Allowing for the possibility of assessing some of the more complex and important aspects of many constructs (rather than just the ones that are easiest to measure).

PORTFOLIO ASSESSMENT IS NOT AS USEFUL FOR:
• Evaluating programs that have very concrete, uniform goals or purposes. For example, it would be unnecessary to compile a portfolio of individualized "evidence" in a program whose sole purpose is full immunization of all children in a community by the age of five years. The required immunizations are the same, and the evidence is generally clear and straightforward.
• Allowing you to rank participants or programs in a quantitative or standardized way (although evaluators or program staff may be able to make subjective judgments of relative merit).
• Comparing participants or programs to standardized norms. While portfolios can (and often do) include some standardized test scores along with other kinds of "evidence", this is not the main purpose of the portfolio.

USING PORTFOLIO ASSESSMENT WITH THE STATE STRENGTHENING EVALUATION GUIDE

Tier 1 - Program Definition
Using portfolios can help you to document the needs and assets of the community of interest. Portfolios can also help you to clarify the identity of your program and allow you to document the "thinking" behind the program's development and throughout its life. Ideally, the process of deciding on criteria for the portfolio will flow directly from the program objectives established in designing the program. However, in a new or existing program where the original objectives are not as clearly defined as they need to be, program developers and staff may be able to clarify their own thinking by visualizing what successful outcomes would look like and what they would accept as "evidence". Thus, thinking about portfolio criteria may contribute to clearer thinking and better definition of program objectives.

Tier 2 - Accountability
Accountability is critical to any form of assessment. In the educational arena, for example, teachers are accountable to themselves, their students, students' families, their schools, and society. The portfolio is an assessment practice that can inform all of these constituents. The process of selecting "evidence" for inclusion in portfolios involves ongoing dialogue and feedback between participants and service providers.
Tier 3 - Understanding and Refining
Portfolio assessment of the program or participants provides a means of conducting assessments throughout the life of the program, as the program addresses the evolving needs and assets of participants and of the community involved. This helps to maintain focus on the outcomes of the program and the steps necessary to meet them, while ensuring that the implementation is in line with the vision established in Tier 1.

Tier 4 - Progress Toward Outcomes
Items are selected for inclusion in the portfolio because they provide "evidence" of progress toward selected outcomes. Whether the outcomes selected are specific to individual participants or apply to entire communities, the portfolio documents steps toward achievement. Usually it is most helpful for this selection to take place at regular intervals, in the context of conferences or discussions among participants and staff.

Tier 5 - Program Impact
One of the greatest strengths of portfolio assessment in program evaluation may be its power as a tool to communicate program impact to those outside of the program. While this kind of data may not take the place of statistics about numbers served, costs, or test scores, many policy makers, funders, and community members find visual or descriptive evidence of the successes of individuals or programs to be very persuasive.

ADVANTAGES OF USING PORTFOLIO ASSESSMENT
• Allows the evaluators to see each student, group, or community as an individual, unique in its characteristics, needs, and strengths.
• Serves as a cross-sectional lens, providing a basis for future analysis and planning. By viewing the total pattern of the community or of individual participants, one can identify areas of strength and weakness, and barriers to success.
• Serves as a concrete vehicle for communication, providing ongoing exchanges of information among those involved.
• Promotes a shift in ownership; communities and participants can take an active role in examining where they have been and where they want to go.
• Offers the possibility of addressing shortcomings of traditional assessment by assessing the more complex and important aspects of an area or topic.
• Covers a broad scope of knowledge and information, from many different people who know the program or person in different contexts (e.g., participants, parents, teachers or staff, peers, or community leaders).

DISADVANTAGES OF USING PORTFOLIO ASSESSMENT
• May be seen as less reliable or fair than more quantitative evaluations such as test scores.
• Can be very time consuming for teachers or program staff to organize and evaluate the contents, especially if portfolios have to be done in addition to traditional testing and grading.
• Having to develop your own individualized criteria can be difficult or unfamiliar at first.
• If goals and criteria are not clear, the portfolio can become just a miscellaneous collection of artifacts that don't show patterns of growth or achievement.
• Like any other form of qualitative data, data from portfolio assessments can be difficult to analyze or aggregate to show change.

HOW TO USE PORTFOLIO ASSESSMENT

Design and Development
Three main factors guide the design and development of a portfolio: 1) purpose, 2) assessment criteria, and 3) evidence (Barton & Collins, 1997).

1) Purpose
The primary concern in getting started is knowing the purpose that the portfolio will serve, because this decision defines the operational guidelines for collecting materials. For example, is the goal to use the portfolio as data to inform program development? To report progress? To identify special needs? For program accountability? For all of these?

2) Assessment Criteria
Once the purpose or goal of the portfolio is clear, decisions are made about what will be considered success (criteria or standards) and what strategies are necessary to meet the goals. Items are then selected for inclusion in the portfolio because they provide evidence of meeting criteria or making progress toward goals.

3) Evidence
In collecting data, many things need to be considered. What sources of evidence should be used? How much evidence do we need to make good decisions and determinations? How often should we collect evidence? How congruent should the sources of evidence be? How can we make sense of the evidence that is collected? How should evidence be used to modify the program and the evaluation? According to Barton and Collins (1997), evidence can include artifacts (items produced in the normal course of classroom or program activities), reproductions (documentation of interviews or projects done outside of the classroom or program), attestations (statements and observations by staff or others about the participant), and productions (items prepared especially for the portfolio, such as participant reflections on their learning or choices). Each item is selected because it adds some new information related to attainment of the goals.

Steps of Portfolio Assessment
Although many variations of portfolio assessment are in use, most fall into two basic types: process portfolios and product portfolios (Cole, Ryan, & Kick, 1995). These are not the only kinds of portfolios in use, nor are they pure types clearly distinct from each other. It may be more helpful to think of them as two steps in the portfolio assessment process, as the participant(s) and staff reflectively select items from their process portfolios for inclusion in the product portfolio.

Step 1: The first step is to develop a process portfolio, which documents growth over time toward a goal. Documentation includes statements of the end goals, criteria, and plans for the future. It should include baseline information, or items describing the participant's performance or mastery level at the beginning of the program. Other items are "works in progress", selected at many interim points to demonstrate steps toward mastery. At this stage, the portfolio is a formative evaluation tool, probably most useful for the internal information of the participant(s) and staff as they plan for the future.

Step 2: The next step is to develop a product portfolio (also known as a "best pieces portfolio"), which includes examples of the best efforts of a participant, community, or program, along with "final evidence", or items that demonstrate attainment of the end goals. Product or "best pieces" portfolios encourage reflection about change or learning. The program participants, either individually or in groups, are involved in selecting the content, the criteria for selection, the criteria for judging merit, and the "evidence" that the criteria have been met (Winograd & Jones, 1992). For individuals and communities alike, this provides opportunities for a sense of ownership and strength, and it helps to showcase or communicate the accomplishments of the person or program. At this stage, the portfolio is an example of summative evaluation, and it may be particularly useful as a public relations tool.
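To make the design factors and the two-step process above concrete, the sketch below models a portfolio as a stated purpose, a set of criteria, and a collection of typed evidence items, and shows one way a product portfolio might be drawn from a process portfolio. This is only an illustrative outline under assumptions, not part of any cited framework: the class names (EvidenceItem, Portfolio), fields, and the simple selection rule are invented for the example.

# Illustrative sketch only: a minimal data model for portfolio assessment.
# All names and the selection rule are assumptions made for this example.
from dataclasses import dataclass, field
from datetime import date
from typing import List

# The four evidence categories described by Barton and Collins (1997).
EVIDENCE_TYPES = {"artifact", "reproduction", "attestation", "production"}

@dataclass
class EvidenceItem:
    description: str               # e.g., "audio tape of peer-tutoring session"
    evidence_type: str             # one of EVIDENCE_TYPES
    goal: str                      # the criterion or outcome this item documents
    collected_on: date             # evidence is added at many points in time
    meets_criterion: bool = False  # judged against the agreed standard

    def __post_init__(self) -> None:
        if self.evidence_type not in EVIDENCE_TYPES:
            raise ValueError(f"unknown evidence type: {self.evidence_type}")

@dataclass
class Portfolio:
    owner: str                 # participant, group, or program name
    purpose: str               # design factor 1: why the portfolio is kept
    criteria: List[str]        # design factor 2: what counts as success
    items: List[EvidenceItem] = field(default_factory=list)  # design factor 3

    def add(self, item: EvidenceItem) -> None:
        """Add evidence at any interim point (process portfolio, Step 1)."""
        self.items.append(item)

def build_product_portfolio(process: Portfolio) -> Portfolio:
    """Select 'best pieces' that demonstrate attainment of goals (Step 2).

    Here the rule is simply 'item meets its criterion'; in practice,
    participants and staff choose items together in conferences.
    """
    best = [item for item in process.items if item.meets_criterion]
    return Portfolio(owner=process.owner,
                     purpose="showcase attainment of end goals",
                     criteria=process.criteria,
                     items=best)

A real program would, of course, layer conferences, reflection, and self-selection on top of a mechanical rule like this; the point is only that purpose, criteria, and typed evidence can be written down explicitly before collection begins.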
Distinguishing Characteristics
Certain characteristics are essential to the development of any type of portfolio used for assessment. According to Barton and Collins (1997), portfolios should be:

1) Multisourced (allowing for the opportunity to evaluate a variety of specific evidence)
Multiple data sources include both people (statements and observations of participants, teachers or program staff, parents, and community members) and artifacts (anything from test scores to photos, drawings, journals, and audio or video tapes of performances).

2) Authentic (context and evidence are directly linked)
The items selected or produced as evidence should be related to program activities, as well as to the goals and criteria. If the portfolio is assessing the effect of a program on participants or communities, then the "evidence" should reflect the activities of the program rather than skills that were gained elsewhere. For example, if a child's musical performance skills were gained through private piano lessons, not through 4-H activities, an audio tape would be irrelevant in his 4-H portfolio. If a 4-H activity involved the same child in teaching other children to play, a tape might be relevant.

3) Dynamic (capturing growth and change)
An important feature of portfolio assessment is that data or evidence is added at many points in time, not just as "before and after" measures. Rather than including only the best work, the portfolio should include examples of different stages of mastery. At least some of the items are self-selected. This allows a much richer understanding of the process of change.

4) Explicit (purpose and goals are clearly defined)
The students or program participants should know in advance what is expected of them, so that they can take responsibility for developing their evidence.

5) Integrated (evidence establishes a correspondence between program activities and life experiences)
Participants should be asked to demonstrate how they can apply their skills or knowledge to real-life situations.

6) Based on ownership (the participant helps determine the evidence to include and the goals to be met)
The portfolio assessment process should require that participants engage in some reflection and self-evaluation as they select the evidence to include and set or modify their goals. They are not simply being evaluated or graded by others.

7) Multipurposed (allowing assessment of the effectiveness of the program while assessing the performance of the participant)
A well-designed portfolio assessment process evaluates the effectiveness of your intervention at the same time that it evaluates the growth of individuals or communities. It also serves as a communication tool when shared with family, other staff, or community members. In school settings, it can be passed on to other teachers or staff as a child moves from one grade level to another.
Analyzing and Reporting Data
As with any qualitative assessment method, analysis of portfolio data can pose challenges. Methods of analysis will vary depending on the purpose of the portfolio and the types of data collected (Patton, 1990). However, if goals and criteria have been clearly defined, the "evidence" in the portfolio makes it relatively easy to demonstrate that the individual or population has moved from a baseline level of performance to achievement of particular goals. It should also be possible to report some aggregated or comparative results, even if participants have individualized goals within a program. For example, in a teen peer tutoring program, you might report that "X% of participants met or exceeded two or more of their personal goals within this time frame", even if one teen's primary goal was to gain public speaking skills and another's main goal was to raise his grade point average by mastering study skills. Comparing across programs, you might be able to say that participants in Town X mastered an average of four new skills in the course of six months while those in Town Y mastered only two, and speculate that lower attendance rates in Town Y could account for the difference.

Subjectivity of judgments is often cited as a concern in this type of assessment (Bateson, 1994). However, in educational settings, teachers or staff using portfolio assessment often choose to periodically compare notes by independently rating the same portfolio to see if they agree on the scoring (Barton & Collins, 1997). This provides a simple check on reliability and can be very simply reported. For example, a local program could report: "To ensure some consistency in assessment standards, every fifth portfolio (or 20 percent) was assessed by more than one staff member. Agreement between raters, or inter-rater reliability, was 88 percent."

There are many books and articles that address the problems of analyzing and reporting on qualitative data in more depth than can be covered here. The basic issues of reliability, validity, and generalizability are relevant even when using qualitative methods, and various strategies have been developed to address them. Those who are considering using portfolio assessment in evaluation are encouraged to refer to some of the sources listed below for more in-depth information.
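As a rough illustration of the kind of aggregate reporting and reliability check described above, the sketch below computes the share of participants who met at least two of their personal goals and a simple percent-agreement figure for double-scored portfolios. The data, names, and the use of plain percent agreement (rather than a chance-corrected statistic such as Cohen's kappa) are assumptions made for the example, not a prescribed procedure.

# Illustrative sketch only: aggregate reporting for individualized goals and a
# simple percent-agreement reliability check. All data and names are invented.
from typing import Dict, List, Tuple

def share_meeting_goals(goals_met: Dict[str, int], threshold: int = 2) -> float:
    """Fraction of participants who met at least `threshold` personal goals."""
    if not goals_met:
        return 0.0
    met = sum(1 for count in goals_met.values() if count >= threshold)
    return met / len(goals_met)

def percent_agreement(double_scored: List[Tuple[str, str]]) -> float:
    """Simple percent agreement between two raters on the same portfolios.

    Each tuple holds the two raters' ratings (e.g., "meets", "does not meet").
    Percent agreement ignores chance agreement; a chance-corrected statistic
    such as Cohen's kappa is a common refinement.
    """
    if not double_scored:
        return 0.0
    agreed = sum(1 for a, b in double_scored if a == b)
    return agreed / len(double_scored)

# Example: a hypothetical teen peer-tutoring program with individualized goals.
goals_met = {"teen_A": 3, "teen_B": 1, "teen_C": 2, "teen_D": 2, "teen_E": 0}
print(f"{share_meeting_goals(goals_met):.0%} of participants met or exceeded "
      "two or more of their personal goals.")

# Every 5th portfolio was rated independently by a second staff member.
double_scored = [("meets", "meets"), ("meets", "does not meet"),
                 ("does not meet", "does not meet"), ("meets", "meets")]
print(f"Inter-rater agreement: {percent_agreement(double_scored):.0%}")

Even a small check like this, reported alongside descriptive evidence from the portfolios, can address some of the concerns about subjectivity without turning the portfolio into a standardized test.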