Treasury Board of Canada Secretariat. (2012). Policy on evaluation. Retrieved from http://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=15024
U.S. Government Accountability Office. (2011). GPRA Modernization Act implementation provides important opportunities to address government challenges (GAO-11-617T). Retrieved from http://www.gao.gov/assets/130/126150.pdf
Von Bertalanffy, L. (1968). General system theory: Foundations, development, applications (Rev. ed.). New York: G. Braziller.
Wandersman, A., & Fetterman, D. (2007). Empowerment evaluation: Yesterday, today, and tomorrow. American Journal of Evaluation, 28(2), 179–198.
Wiebe, R. H. (1962). Businessmen and reform: A study of the progressive movement. Cambridge, MA: Harvard University Press.
Wholey, J. S. (2001). Managing for results: Roles for evaluators in a new management era. American Journal of Evaluation, 22(3), 343–347.
Wildavsky, A. B. (1979). Speaking truth to power: The art and craft of policy analysis. Boston, MA: Little, Brown.
Williams, D. W. (2003). Measuring government in the early twentieth century. Public Administration Review, 63(6), 643–659.
Wilson, W. (1887). The study of administration. Political Science Quarterly, 2(2), 197–222.
WorkSafeBC. (2011). Reports: See 2010 annual report and 2011–2013 service plan. Retrieved from http://www.worksafebc.com/publications/reports/default.asp
CHAPTER 9
DESIGN AND IMPLEMENTATION OF
PERFORMANCE MEASUREMENT SYSTEMS
Introduction
Key Steps in Designing and Implementing a Performance Measurement System
Identify the Organizational Champions of This Change
Understand What Performance Measurement Systems Can and Cannot Do
Establish Multichannel Ways of Communicating That Facilitate Top-Down, Bottom-Up, and Horizontal Sharing of Information, Problem Identification, and Problem Solving
Clarify the Expectations for the Intended Uses of the Performance Information That Is Created
Identify the Resources Available for Designing, Implementing, and Maintaining the Performance Measurement System
Take the Time to Understand the Organizational History Around Similar Initiatives
Develop Logic Models for the Programs for Which Performance Measures Are Being Developed, and Identify the Key Constructs to Be Measured
Identify Any Constructs That Apply Beyond Single Programs
Involve Prospective Users in Reviewing Logic Models and Constructs in the Proposed Performance Measurement System
Measure the Constructs That Have Been Identified as Parts of the Performance Measurement System
Record, Analyze, Interpret, and Report the Performance Data
Regularly Review Feedback From the Users and, If Needed, Make Changes to the Performance Measurement System
Performance Measurement for Public Accountability
Summary
Discussion Questions
Appendix A: Organizational Logic Models
References
INTRODUCTION
In this chapter, we begin by introducing two complementary
perspectives on public sector
organizations: (1) a technical/rational view that emphasizes
systems and structures and (2) a
political/cultural view that emphasizes the dynamics that
develop when we take into account
people interacting to get things done. Then, we introduce and
elaborate 12 steps that are
important in designing and implementing performance
measurement systems. These steps
reflect both the technical/rational and the political/cultural
perspectives on organizations. As
we describe each step, we offer advice and also point to
possible pitfalls and limitations while
working within complex organizations. The chapter ends with a
section that serves as a
transition to Chapter 10, which discusses the uses of
performance results.
The process of designing and implementing performance
measurement systems uses core
knowledge and skills that are also a part of designing,
conducting, and reporting program
evaluations. In Chapter 8, we pointed out that program
evaluation and performance
measurement share core knowledge and skills including logic
modeling and measurement. In
addition, understanding research designs and the four kinds of
validity we described in Chapter
3 is valuable for understanding and working with the strengths
and limitations of performance
measurement systems.
In Chapter 1, we outlined the steps that make up a typical
program evaluation. In this
chapter, we will do the same for performance measurement
systems, understanding that for
each situation, there will be unique circumstances that can
result in differences between the
checklist below and the process that is appropriate for that
context. Each of the 12 steps of
designing and implementing a performance measurement system
is elaborated to clarify issues
and possible problems. We distinguish designing and
implementing performance measurement
systems from the uses of such systems. Usage is a critical topic
on its own, and we will
elaborate on it in Chapter 10.
Designing and implementing performance measurement systems
can be a significant
organizational change, particularly in public sector
organizations that have focused on
processes instead of results. Depending on the origins of such
an initiative (external to the
organization, internal, top-down, or manager driven), different
actors and factors will be more
or less important. When we design and implement performance
measurement systems that are
intended to be sustainable, we must go beyond normative
frameworks that focus on technical
and rational steps, and consider the “psychological, cultural,
and political implications of
organizational change” (de Lancer Julnes, 1999, p. 49). de
Lancer Julnes and Holzer (2001)
have distinguished a rational/technical framework and a
political/cultural framework as key to
understanding the successful adoption, implementation, and use
of performance measures.
The technical/rational perspective is grounded in a view of
organizations as complex
rational means–ends systems that are designed to achieve
purposive ends. This view
emphasizes the importance of systems and formal structures as
keys to understanding how
complex organizations work and how to change them. With
respect to performance
measurement systems, as they are designed and implemented
there are rational and technical
factors to keep in mind. These factors include having sufficient
resources, training people
appropriately, aligning management systems, developing
appropriate information systems, and
developing valid and reliable performance measures. It is
important to have an overall plan that
organizes the process, including who should be involved at different stages, how the stages are sequenced in time, what is expected—and from whom—as each stage is implemented, and how
the overall system is expected to function once it has been
implemented.
The political/cultural perspective on organizations emphasizes
the people dynamics in
organizations rather than the systems and structures in which
they are embedded.
Organizations as political systems is one of the metaphors that
Gareth Morgan (2006) includes
in his seminal book Images of Organization. This view of
organizations involves
understanding how people interact with and in complex
organizations. Performance
management systems and structures play a role, but individuals
and coalitions can influence
and even negate the results intended from them. Organizational
politics is an inevitable and
important feature of organizational dynamics. Politics does not
have to be about political
parties or formal political allegiances. Instead, it is essentially
about the processes (both formal
and informal) that are used to allocate scarce resources among
competing values. Even though
there will be organizational and program objectives, with
resources being devoted to their
achievement (the rational purposes of organizations), there will
also be interests and incentives,
and coalitions of stakeholders who can either facilitate
implementing and using performance
measurement systems, or impede them. Organizations are more
than systems and structures.
They are fundamentally about people interacting in patterns that
reflect both the intended
outcomes of the organizations as well as their own personal or
group objectives (which may or
may not support the stated organizational objectives).
Overlaid on these two views of organizations is the wide range
of environments in which
organizations can be embedded. In Chapter 8, we introduced the
idea of complex systems to
show how complexity can serve as a useful lens to understand
evolving organizations in
evolving environments. What we will see in Chapter 10 is that
some environments for
organizations are more conducive to sustaining performance
measurement systems than others.
Where performance measurement is focused on public reporting
in high-stakes, accountability-
oriented environments, it can be challenging to construct and
maintain performance
measurement systems. One “solution” that we will explore in
Chapter 10 is to decouple the
performance measurement system that is used for (internal)
performance management from the
performance measures that are used for external reporting
(McDavid & Huse, 2012).
The 12 steps discussed in this chapter outline a process that is
intended to increase the
chances that a performance measurement system will be
successfully implemented and
sustained. A key part of sustaining performance measurement as
an evaluative function in
organizations is to use the performance information (Moynihan,
Pandey, & Wright, 2012). In
other words, there must be a demand for performance
information, to sustain the supply.
Supplying performance information (e.g., preparing
performance reports) where there is
limited or no demand tends to undermine the credibility of the
system—lack of use is an
indication that the system is not aligned with actual substantive
organizational priorities. In
many situations, the conditions under which actual
organizations undertake the development of
performance measures are less than ideal. In the summary to
this chapter, we identify the six
steps among the 12 that are most critical if organizations want
performance measurement
systems that can contribute to managerial and organizational
efforts to improve efficiency,
effectiveness, and accountability.
KEY STEPS IN DESIGNING AND IMPLEMENTING A
PERFORMANCE MEASUREMENT SYSTEM
Table 9.1 summarizes the key steps in designing and
implementing a performance
measurement system. Each of these steps can be viewed as a
guideline—no single performance
measurement development and implementation process will
conform to all of them. In some
cases, the process may diverge from the sequence of steps.
Again, this could be due to local
factors. Each of the steps in Table 9.1 is discussed more fully in
the following sections. Our
discussion of the steps is intended to do two things: (1)
elaborate on what is involved and (2)
point out limitations and pitfalls along the way. As you review
the steps, you will see that most
of them acknowledge the importance of both a rational/technical
and a political/cultural view
of organizations. Beyond the technical issues, it will be
important to consider the interactions
among the people, incentives, history, and who wins and who
loses.
One way that we can look at these 12 steps is to divide them
between a technical/rational
and a cultural/political perspective on organizations. Among the
steps, the majority are more
closely aligned with the political/cultural view of organizations:
identifying the champions of
this change, understanding what performance measurement
systems can actually do (and not
do), establishing and using communication channels, clarifying
intended uses (for all the
stakeholders involved), understanding the organizational history
and its impacts on this change
process, involving users in developing models and performance
measures, and regularly
reviewing and acting on user feedback. The others—identifying
resources, developing logic
models, identifying constructs that span programs or the whole
organization, measuring
constructs, and analyzing and reporting performance results—
are more closely aligned with a
technical/rational view of organizations. Our approach
emphasizes the importance of both
perspectives and their complementarity in building and
implementing sustainable performance
measurement systems.
Table 9.1 Key Steps in Designing and Implementing a Performance Measurement System
1. Identify the organizational champions of this change.
2. Understand what a performance measurement system can and cannot do and why it is needed.
3. Establish multichannel ways of communicating that facilitate top-down, bottom-up, and horizontal sharing of information, problem identification, and problem solving.
4. Clarify the expectations for the uses of the performance information that will be created.
5. Identify the resources available for developing, implementing, maintaining, and renewing the performance measurement system.
6. Take the time to understand the organizational history around similar initiatives.
7. Develop logic models for the programs or lines of business for which performance measures are being developed.
8. Identify additional constructs that are intended to represent performance for aggregations of programs or the whole organization.
9. Involve prospective users in reviewing the logic models and constructs in the proposed performance measurement system.
10. Measure the key constructs in the performance measurement system.
11. Record, analyze, interpret, and report the performance data.
12. Regularly review feedback from users and, if needed, make changes to the performance measurement system.
Identify the Organizational Champions of This Change
The introduction of performance measurement, particularly
measuring outcomes, is an
important change in both an organization’s way of doing
business and its culture (de Lancer
Julnes & Holzer, 2001). Unlike program evaluations,
performance measurement systems are
ongoing, and it is therefore important that there be
organizational leaders who are champions
of this change, to provide continuing support for the process
from its inception onward. In
many cases, an emphasis on measuring outcomes is a significant
departure from existing
practices of tracking program inputs (money, human resources),
program activities, and
program outputs (work done). Most managers have experience
measuring/recording inputs,
processes, and outputs, so the challenge in outcome-focused
performance measurement is in
specifying the expected outcomes (stating clear objectives for
programs, lines of business, or
organizations) and facilitating organizational commitment to the
process of measuring and
working with outcome-related results.
By including outcomes, performance measurement commits
organizations to comparing
their actual results with the stated objectives. In many
jurisdictions, objectives are parsed into
annual targets, and actual outcomes are compared with the
targets for that year. Thus, the
performance measurement information commonly is intended to
serve multiple purposes,
including enhancing managerial decision making, encouraging
organizational alignment, and
promoting transparency and accountability.
New Public Management emphasizes the (normative)
importance of freeing managers from
“red tape,” that is, process-related restrictions, so that they can
more efficiently and effectively
use the resources that are available (Moynihan, 2008; Norman &
Gregory, 2003). Managerial
flexibility, coupled with measures of intended outcomes, is expected to give managers incentives to improve their operations. In Chapter 10, we will look at the
actual uses of performance
information in governments and public sector organizations and
explore in some depth the
incentives for managers to become involved in developing and
using performance measures.
Because performance measurement systems are ongoing, it is
important that the champions
of this change support the process from its inception onward.
Moynihan et al. (2012) suggest
that leadership commitment is critical to the process and also
affects performance information
uses. The nature of performance measures is that they create
new information—a potential
resource in public and nonprofit organizations. Information can
reduce uncertainty with respect
to the questions it is intended to answer, but the process of
building performance measurement
into the organization’s business can significantly increase
uncertainty for managers. The
changes implied by measuring results (outcomes), reporting
results, and being held
accountable for results can loom large as the system is being
designed and implemented. If a
performance measurement system is implemented as a top-down
initiative, managers may see
this as a threat to their existing practices. Typically, some will
resist this change, and if
leadership commitment is not sustained, the transition to
performance measurement as a part of
managing programs will wane with time (de Waal, 2003).
A results-oriented approach to managing has implications for
public sector accountability.
In many jurisdictions, public organizations are still expected to
operate in ways that conform to
process-focused notions of accountability. In Canada, for
example, the Westminster
parliamentary system makes the minister who heads each
government department nominally
accountable for all that happens in his or her domain. The
adversarial nature of politics,
combined with the tendency of the media and interest groups to
emphasize mistakes that
become public, can bias managerial behavior toward a
procedurally focused process, wherein
only “safe” decisions are made (Propper & Wilson, 2003).
Navigating such environments
while working to implement performance measurement systems
requires leadership that is
willing to embrace some risks, not only in developing the
system but in encouraging a culture
wherein performance results are used to inform decision
making. We explore these issues in
much greater detail in Chapters 10 and 11.
In most governmental settings, leadership at two levels is
required. Senior executives in a
ministry or department must actively support the process of
constructing and implementing a
performance measurement system. But it is equally important
that the political leadership be
supportive of the development, implementation, and use of a
performance measurement
system. The key intended users of performance information that
is publicly reported are the
elected officials (of all the political parties) (McDavid & Huse,
2012).
In British Columbia, Canada, for example, the Budget
Transparency and Accountability
Act (Government of British Columbia, 2001) specifies that
annual performance reports are to
be tabled in the legislative assembly. The goal is to have
committees of the legislature review
these reports and use them as they scrutinize ministry
operations and future budgets. Each year,
the public reports are tabled in June and are based on the actual
results for the fiscal year
ending March 31. Strategically, the reports should figure in the
budgetary process for the
following year, which begins in the fall. If producing and
publishing these performance reports
is not coupled with scrutiny of the reports by legislators, then a
key reason for committing
resources to this form of public accountability is undermined. In
Chapter 10, we will look at
the ways in which elected officials actually use performance
reports.
In summary, an initial organizational commitment to
performance measurement, which
typically includes designing the system, can produce “results”
that are visible (e.g., a website
with the performance measurement framework), but
implementing and working with the
system over three to five years is a much better indicator of its
sustainability, and for this to
happen, it is critical to have organizational champions of the
process.
Understand What Performance Measurement Systems Can and
Cannot Do
There are limitations to what performance measurement systems
can do, yet in some
jurisdictions, performance measurement has been treated as a
cost-effective substitute for
program evaluation (Martin & Kettner, 1996). Public sector
downsizing has diminished the
resources committed to program evaluations, and managers have
been expected to initiate
performance measurement instead (McDavid, 2001b). The
emphasis on performance reporting
for public accountability, together with the assumption that such reporting can drive performance improvements, is the principal reason for making performance measurement the
central evaluative approach in
many organizations. We will look at this assumption in Chapter
10 when we discuss the uses
of performance information when public reporting is mandated.
Performance measurement can be a powerful tool in managing
programs or organizations.
If the measures are valid and the information is timely,
emerging trends can identify possible
problems (a negative-feedback mechanism) as well as possible
successes (positive feedback).
But performance measurement results only describe what is
going on; they do not explain why
it is happening (McDavid & Huse, 2006; Newcomer, 1997).
Recall the distinction between intended outcomes and actual
outcomes (introduced in
Chapter 1). Programs are designed to produce specified
outcomes, and one way to judge the
success of a program is to see whether the intended outcomes
have actually occurred. If the
actual outcomes match the intended outcomes, we might be
prepared to conclude that the
program was effective.
However, we cannot conclude that the outcomes are due to the
program unless we have
additional information that supports the assumption that other
factors in the environment could
not have caused the observed outcomes. Getting that
information is at the core of what
program evaluation is about, and it is essential that those using
performance measurement
information understand this distinction. As Martin and Kettner
(1996) commented when
discussing the cause-and-effect relationship that many people
mistakenly understand to be
implied in performance measurement information, “Educating
stakeholders about what
outcome performance measures really are, and what they are
not, is an important—and little
discussed—problem associated with their use by human service
programs” (p. 56).
Establishing the causal link between observed outcomes and the
program that was intended
to produce them is the attribution problem. Some analysts have
explicitly addressed this
problem for performance measurement. Mayne (2001) offers six
strategies intended to reduce
the uncertainty about whether the observed performance
measurement outcomes can be
attributed to the program. Briefly, his suggestions are as
follows: (1) develop an intended-
results chain; (2) assess the existing research/evidence that
supports the results chain; (3)
assess the alternative explanations for the observed results; (4)
assemble the performance story;
(5) seek out additional evidence, if necessary; and (6) revise
and strengthen the performance
story. Several of his suggestions are common to both program
evaluation and performance
measurement, as we have outlined them in this book. He further suggests conducting a program evaluation if the performance story is not sufficient to address the attribution question. This suggestion supports a key theme of this book—
that performance measurement
and program evaluation are complementary, and each offers
ways to reduce uncertainty for
managers and other stakeholders in public and nonprofit
organizations.
There are nuances to the strengths and limitations of
performance measurement systems.
Some programs or organizations are easier to work with in
developing and implementing
performance measurement systems that can credibly connect
programs to actual outcomes. In
Chapter 2, we introduced the concept of program technologies
to help explain why some
program logics “work” better than others. Recall that for
programs that are constructed around
high-probability program technologies (highway maintenance programs would be an
example), it is relatively straightforward to assume a linkage
between program outputs and
outcomes. In other words, if you know how many lane miles of
highway (as a proportion of all
the roads in a given jurisdiction) are being kept free of snow
and ice in the wintertime (an
output), you have a pretty good idea of the safety of the roads
(an outcome). But if you know
how many families were served by a program intended to
improve parenting skills so that
parents can keep their children instead of having to give them
up to foster care, you probably
do not know (at least not to the same degree as the
transportation example above) whether the
program actually succeeded in improving the likelihood that
children are not taken out of their
homes to be placed in foster homes. The attribution question is not as easily answered for programs like this one, which are built around low-probability program technologies.
Establish Multichannel Ways of Communicating That Facilitate Top-Down, Bottom-Up, and Horizontal Sharing of Information, Problem Identification, and Problem Solving
It is quite common for public sector or nonprofit organizations
to begin developing a
performance measurement system informally. Managers who are
keen to obtain information
that they can use formatively will take the lead in developing
their own measures and
procedures for gathering and using the data. This bottom-up
process is one that encourages a
sense of ownership of the system. In the British Columbia
provincial government, this more
manager-driven process spanned the period roughly from 1995
to 2000 (McDavid, 2001a).
Some departments made more progress than others, in part
because some department heads
were more supportive of this process than others. Because they
were driven by internal
performance management needs, the systems that developed
were adapted to local needs.
To support this evolutionary bottom-up process in the British
Columbia government, the
Treasury Board Staff (a central agency responsible for budget
analysis and program approval)
hosted an informal network of government practitioners who
had an interest in performance
measurement and performance improvement. The Performance
Measurement Resource Team
held monthly meetings that included speakers from ministries
and outside agencies who
provided information on their problems and solutions.
Attendance and contributions were
voluntary. Information sharing was the principal purpose of the
sessions.
When the Budget Transparency and Accountability Act
(Government of British Columbia,
2000) was passed, mandating performance measurement and
public reporting government-
wide, the stakes changed dramatically. Performance
measurement systems that had been
intended for formative uses were now exposed to the
requirement that a selection of the
performance results would be made public in an annual report.
This top-down directive to
report performance for summative purposes needed to be
meshed with the bottom-up
(formative) cultures that had been developed in some ministries.
Some departments that had existing performance measurement
systems confronted the
challenge of melding the existing formative and new summative
thrusts of the required system
by communicating up and down and across the organization. For
example, one department
responsible for the publicly funded college and university
system in the province conducted a
series of formal and informal workshops and meetings with
executives and senior and middle
managers in attendance. Over a period of a year, using an
iterative process, the department was
able to develop a general understanding of how the new,
externally focused performance
measurement system would look, what the new system would
do, and how it would connect
with the internal performance management system, which the
department managers were keen
to sustain.
Generally, public organizations that undertake the design and
implementation of
performance measurement systems that are intended to be used
internally must include the
intended users (Kravchuk & Schack, 1996), the organizational
leaders of this initiative, and the
methodologists (Thor, 2000). Top-down communications can
serve to clarify direction, offer a
framework and timelines for the process, clarify what resources
will be available, and affirm
the importance of this initiative. Bottom-up communications can
question or seek clarification
of definitions, timelines, resources, and direction. Horizontal
communications can provide
examples, share problem solutions, and offer informal support.
The communications process outlined here exemplifies a culture
that needs to emerge in
the organization if performance management is to take hold and
be sustainable. Key to
developing a performance management culture is treating
information as a resource, being
willing to “speak truth to power” (Wildavsky, 1979), and not
treating performance information
as a political weapon. Kravchuk and Schack (1996) suggest that
the most appropriate metaphor
to build a performance culture is the learning organization. This
construct was introduced by
Senge (1990) and continues to be a goal for public
organizations that have committed to
performance measurement as part of a broader performance
management framework (Mayne,
2008; Mayne & Rist, 2006).
Clarify the Expectations for the Intended Uses of the Performance Information That Is Created
Developing performance measures is intended, in part, to
improve performance by
providing managers and other stakeholders with information
they can use to monitor and make
adjustments to program processes. Having “real-time”
information on how programs are
tracking is often viewed by managers as an asset and is an
incentive to get involved in
constructing and implementing a performance measurement
system. Managerial involvement
in performance measurement is a widespread expectation and is
reflected in policies in some
jurisdictions.
To attract the buy-in that is essential for successful design and
implementation of
performance measurement systems, we believe that performance
measurement needs to be
used first and foremost for internal performance improvement.
Public reporting can be a part of
the process of using performance measurement data, but it
should not be the primary reason for
developing a performance measurement system (Hildebrand &
McDavid, 2011). A robust
performance measurement system should support using
information to inform improvements
to programs and/or the organization. It should help identify
areas where activities are most
effective in producing intended outcomes and areas where
improvement could be made (de
Waal, 2003).
Designing and implementing a performance measurement
system primarily for public
accountability usually entails public reporting of performance
results, and in jurisdictions
where performance results can be used to criticize elected
officials or bureaucrats, there are
incentives to limit reporting of anything that would reflect
negatively on the government of the
day. Richard Prebble, a long-time political leader in New
Zealand, outlines Andrew Ladley’s
“Iron Rule of the Political Contest”:
• The opposition is intent on replacing the government.
• The government is intent on remaining in power.
• MPs want to get re-elected.
• Party leadership is dependent on retaining the confidence of
colleagues (which is shaped
by the first three principles). (Prebble, 2010, p. 3)
In terms of performance measures to be reported publicly, this
highlights that
organizational performance information will not only be used to
review performance but will
likely be mined for details that can be used to embarrass the
government.
In Chapter 10, we will look at the issues involved in using
performance measurement
systems to contribute to public accountability. Understanding
and balancing the incentives for
participants in this process is one of the significant challenges
for the leaders of an
organization. As we mentioned earlier, developing and then
using a performance measurement
system can create uncertainty for those whose programs are
being assessed. They will want to
know how the information that is produced will affect them,
both positively and negatively. It
is essential that the leaders of this process be forthcoming about
the intended uses of the
measurement system.
If a system is designed for formative program improvement
purposes, using it for
summative purposes will change the incentives for those
involved. Sustaining the internal uses
of performance information will mean involving those who have
contributed to the (earlier)
formative process. Changing the purposes of a performance
measurement system affects the
likelihood that gaming will occur as data are collected and
reported (Pollitt, 2007; Pollitt, Bal,
Jerak-Zuiderent, Dowswell, & Harrison, 2010; Propper &
Wilson, 2003). In Chapter 10, we
will discuss gaming as an unintended response to incentives in
performance measurement
systems.
Some organizations begin the design and implementation
process by making explicit the
intention that the measurement results will be used only formatively for, say, a 3- to 5-year period. That can generate the kind of buy-in that is
required to develop meaningful
measures and convince participants that the process is actually
useful to them. Then, as the
uses of the information are broadened to include external
reporting, it may be more likely that
managers will see the value of a system that has both formative
and summative purposes.
Pollitt et al. (2010) offer us a cautionary example, from the
British health services, of the
transformation of the intended uses of performance information. Their example suggests that
performance measurement systems that begin with formative
intentions tend, over time, to
migrate to summative uses.
In the early 1980s in Britain, there were broad government
concerns with hospital
efficiency that prompted the then Conservative government to
initiate a system-wide
performance measurement process. Right from the start, the
messages that managers and
executives were given were ambiguous. Pollitt et al. (2010) note
that
despite the ostensible connection to government aims to
increase central control over the
NHS, the Minister who announced the new package described
PIs [performance
indicators] in formative terms. Local managers were to be
equipped to make comparisons,
and the stress was on using them to trigger inquiry rather than
as answers in themselves, a
message that was subsequently repeated throughout the 1980s.
(p. 17)
However, by the early 1990s, the “formative” performance
results were being reported
publicly, and comparisons among health districts (health trusts)
were a central part of this
transition. “League tables,” wherein districts were compared
across a set of performance
measures, marked the transition from formative to summative
uses of the performance
information. By the late 1990s, league tables had evolved into a
“star rating system,” wherein
districts could earn up to three stars for their performance. The
Healthcare Commission, a
government oversight and audit agency, conducted and
published the ratings and rankings.
Pollitt et al. (2010) summarize the transition from a formative
to a summative performance
measurement system thus:
In more general terms, the move from formative to summative
may be thought of as the
result of PIs [performance indicators] constituting a standing
temptation to executive
politicians and top managers. Even if the PIs were originally
installed on an explicitly
formative basis (as in the UK), they constitute a body of
information which, when things
(inevitably) go wrong, can be seized upon as a new means of
control and direction. (p. 21)
This change brought with it different incentives for those
involved and ushered in an
ongoing dynamic wherein managerial responses to performance-
related requirements included
gaming the measures, that is, manipulating activities and/or the
information to enhance
performance ratings and reduce poor performance results in
ways that were not intended by the
designers of the system. This issue will be explored in greater
detail in the next chapter.
Identify the Resources Available for Designing, Implementing, and Maintaining the Performance Measurement System
Organizations planning performance measurement systems often
face substantial resource
constraints. One of the reasons for embracing performance
measurement is to do a better job of
managing the (scarce) available resources. If a performance
measurement system is mandated
by external stakeholders (e.g., a central agency, an audit office,
or a board of directors), there
may be considerable pressure to plunge in without fully
planning the design and
implementation phases.
Often, organizations that are implementing performance
measurement systems are
expecting to achieve efficiency gains, as well as improved
effectiveness. Downsizing may have
already occurred, and performance measurement is expected to
occur within existing budgets.
Those involved may have the expectation that this work can be
added onto the existing
workload of managers—they are clearly important stakeholders
and logically should be in the
best position to suggest or validate the proposed measures.
Under such conditions, the
development work may be assigned to an ad hoc committee of
managers, analysts, co-op or
intern students, other temporary employees, or consultants.
Identifying possible performance measures is usually iterative,
time-consuming work, but
it is only a part of the process. The work of implementing the
measures (identifying data that
correspond to the performance constructs and collecting data for
the measures), preparing
reports and briefings, and maintaining and renewing the system
is the key difference between a
process that offers the appearance of having a performance
measurement system in place (a
website, progress reports, testimonials by participants in the
process) and a process that
actually results in using performance data on a continuing basis
to improve the programs in the
organization. Although a “one-shot” infusion of resources can
be very useful as a way to get
the process started, it is not sufficient to sustain the system.
Measuring and reporting
performance takes ongoing commitments of resources, including
the time of persons in the
organization.
Training for staff who will be involved in the design and
implementation of the
performance measures is important. On the face of it, a
minimalist approach to measuring
performance is straightforward. “Important” measures are
selected, perhaps by an ad hoc
committee; data are marshaled for those measures; and the
required reports are produced. But a
commitment to designing and implementing a performance
measurement system that is
sustainable requires an understanding of the process of
connecting performance measurement
to managing with performance data (Kates, Marconi, & Mannle,
2001).
In some jurisdictions, the creation of legislative mandates for
public performance reporting
has resulted in organizational responses that meet the legislative
requirements but do not build
the capacity to sustain performance measurement. However,
performance measurement is
intended to be a means rather than an end in itself. Unless the
organization is committed to
using the information to manage performance, it is unlikely that
performance measurement
will be well integrated into the operations of the organization.
In situations where there are financial barriers to validly
measuring outcomes, it is common
for performance measures to focus on outputs. In many organizations, outputs are easier to measure, and the data are more readily available. Also,
managers are usually more
willing to have output data reported publicly because outputs
are typically much easier to
attribute to a program or even program activity. Some
performance measurement systems have
focused on outputs from their inception. The best example of
that approach has been in New
Zealand, where public departments and agencies negotiate
output-focused contracts with the
New Zealand Treasury (Gill, 2011). However, although outputs
are important as a way to
report work done, they cannot be entirely substituted for
outcomes; the assumption that if
outputs are produced, outcomes must have been produced is
usually not defensible (see the
discussion of measurement validity vs. the validity of causes
and effects in Chapter 4).
Take the Time to Understand the Organizational History Around
Similar Initiatives
Performance measurement is not new. In Chapter 8, we learned
that in the United States,
local governments began measuring the performance of services
in the first years of the 20th
century (Williams, 2003). Since then, there have been several
waves of government reform that
have included measuring results. New Public Management
emerged in the early 1990s (Hood,
1991), in part from efforts by Western democratic governments
to eliminate fiscal deficits in
the 1970s and 1980s.
In most public organizations, current efforts to develop
performance measures come on top
of other, previous attempts to improve the efficiency and
effectiveness of their operations.
Managers who have been a part of previous change efforts,
particularly unsuccessful ones,
have experience that will affect their willingness to support
current efforts to establish a system
to measure performance. It is important to understand the
organizational memory of past
efforts to make changes, and to gain some understanding of why those efforts have or have not succeeded. The organizational lore
around these changes is as
important as a dispassionate view, in that participants’ beliefs
are the reality that the current
change will first need to address.
A significant issue for some public sector organizations can be
the retirement of employees
who exercise their option to leave early, facilitating downsizing
goals that governments have
put into place (Levine, Rubin, & Wolohojian, 1981). Long-term
employees will often have an
in-depth understanding of the organization and its history. In
organizations that have a history
of successful change initiatives, losing the people who were
involved can be a liability when
designing and implementing a performance measurement
system. Their participation in the
past may have been important in successfully implementing
change initiatives. On the other
hand, if an organization has a history of questionable success in
implementing change
initiatives, organizational turnover may actually be an asset.
Develop Logic Models for the Programs for Which Performance Measures Are Being Developed, and Identify the Key Constructs to Be Measured
In Chapter 2, we discussed logic models as a way to make
explicit the intended cause-and-
effect linkages in a program or even an organization. We
discussed several different styles of
logic models and pointed out that selecting a logic modeling
approach depends in part on how
explicit one wants to be about intended cause-and-effect
linkages. A key requirement of logic
modeling that explicates causes and effects is the presentation
of which outputs are connected
to which outcomes.
Key to constructing and validating logic models with
stakeholders is identifying and stating
clear objectives for programs (Kravchuk & Schack, 1996).
Although this requirement might
seem straightforward, it is one of the more challenging aspects
of the logic modeling process.
Often, program or organizational objectives are put together to
satisfy the expectations of
stakeholders, who may not agree among themselves about what
a program is expected to
accomplish. One way these differences are sometimes resolved
is to construct objectives that are general enough to appear to meet competing expectations. Although this solution is
expedient from an organizational-political standpoint, it
complicates the process of measuring
performance.
Criteria for sound program objectives were discussed in Chapter
1. Briefly, objectives
should state an expected change or improvement if the program
works (e.g., reducing the
number of drug-related crimes), an expected magnitude of
change (e.g., reducing the number
of drug-related crimes by 20%), a target audience/population
(e.g., reducing the number of
drug-related crimes by 20% in Harrisburg, Pennsylvania), and a
time frame for achieving the
intended result (e.g., reducing the number of drug-related
crimes by 20% in Harrisburg,
Pennsylvania, in 2 years).
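To make these four elements concrete, the following minimal Python sketch (the record structure, function, and baseline numbers are hypothetical illustrations, not part of the original text) shows how an objective built from the drug-crime example above could be recorded so that actual results can later be checked against the stated magnitude:

```python
from dataclasses import dataclass

@dataclass
class ProgramObjective:
    """One way to capture the four elements of a sound program objective."""
    expected_change: str      # what should change if the program works
    magnitude: float          # expected size of the change (0.20 = 20%)
    target_population: str    # who or where the change should occur
    time_frame_years: float   # period for achieving the intended result

# The worked example from the text, expressed as a structured record
objective = ProgramObjective(
    expected_change="reduce the number of drug-related crimes",
    magnitude=0.20,
    target_population="Harrisburg, Pennsylvania",
    time_frame_years=2,
)

def target_met(baseline: float, actual: float, obj: ProgramObjective) -> bool:
    """Check whether the observed reduction meets the stated magnitude."""
    observed_reduction = (baseline - actual) / baseline
    return observed_reduction >= obj.magnitude

# e.g., 1,000 drug-related crimes at baseline, 780 two years later (22% reduction)
print(target_met(1000, 780, objective))  # True
```

Writing objectives down this explicitly also makes it obvious when one of the four elements (most often the magnitude or the time frame) is missing.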
Although logic models do constrain us in the sense that they
assume that programs are
open systems that are stable enough to be depicted as a static
model, they are useful as a means
of identifying constructs that are candidates for performance
measurement. Martin and Kettner
(1996) have identified three major foci for performance
measures: (1) program efficiency
(comparing inputs with outputs), (2) program quality (whether
the outputs meet some specified
quality standard), and (3) program effectiveness (whether the
intended outcomes have been
achieved). They suggest that a good performance measurement
system needs to track all of
these various program attributes, since each will be important to
at least some program
stakeholders.
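As an illustration only (the program, figures, and variable names below are hypothetical, echoing the job training example used later in this chapter), each of the three kinds of measures can be computed from a handful of data points:

```python
# A minimal sketch of the three measure types for a hypothetical job training
# program: efficiency (inputs vs. outputs), quality, and effectiveness.

inputs_cost = 500_000.0          # annual program expenditure ($)
outputs_completions = 250        # clients completing training (output)
outputs_meeting_standard = 230   # completions meeting a defined quality standard
outcomes_employed = 150          # clients employed full-time a year later (outcome)

efficiency = inputs_cost / outputs_completions             # cost per output
quality = outputs_meeting_standard / outputs_completions   # share meeting the standard
effectiveness = outcomes_employed / outputs_completions    # share achieving the outcome

print(f"Efficiency:    ${efficiency:,.0f} per completion")
print(f"Quality:       {quality:.0%} of completions met the standard")
print(f"Effectiveness: {effectiveness:.0%} employed full-time after one year")
```

The point of the sketch is simply that the three foci draw on different combinations of input, output, and outcome data, which is why a system tracking only one of them will leave some stakeholders unserved.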
The open-systems metaphor also invites us to identify
environmental factors that could
affect the program, including those that affect our outcome
constructs. Although some
performance measurement systems do not measure factors that
are external to the program or
organization, it is worthwhile including such constructs as
candidates for measurement.
Measuring these environmental factors (or at least accounting
for their influences qualitatively)
allows us to begin addressing attribution questions.
In an ideal performance measurement system, both costs and
results data are available and
can be compared. An important driver behind the movement to
develop planning,
programming, and budgeting systems (PPBS) in the 1960s was,
in fact, the expectation that
cost-effectiveness ratios could be constructed. However, the
lack of both budgetary flexibility
and information management capacities in most public sector
organizations resulted in a
significant barrier to being able to fully implement PPBS at that
time.
Most public sector organizations now have accounting systems
that permit managers to
cost out programs. Information systems are more flexible than
in the past, and the budgetary
and expenditure data are more complete. Some organizations
have also developed the capacity
to cost out individual activities within each program (Brimson,
1991).
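Where cost and outcome data can be lined up, the cost-effectiveness comparison that PPBS anticipated becomes a simple calculation. The sketch below is hypothetical (the years, costs, and outcome counts are invented for illustration):

```python
# A minimal sketch: comparing cost-effectiveness ratios across two fiscal years
# once program costs and outcome data are both available.

yearly = {
    "2010/11": {"cost": 480_000.0, "outcomes": 140},
    "2011/12": {"cost": 500_000.0, "outcomes": 160},
}

for year, data in yearly.items():
    ratio = data["cost"] / data["outcomes"]   # cost per unit of outcome achieved
    print(f"{year}: ${ratio:,.0f} per outcome")
# 2010/11: $3,429 per outcome; 2011/12: $3,125 per outcome
```

The arithmetic is trivial; the historical barrier, as noted above, was assembling cost and results data that referred to the same programs over the same periods.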
James Q. Wilson (1989) has suggested that the environment of
public sector organizations
also influences the likelihood that robust measures of outputs
and outcomes can be developed.
Table 9.2 adapts his approach to produce a typology describing
the challenges and
opportunities for measuring outputs and outcomes in different
types of organizations. Coping
organizations (in which work tasks change a lot, and results are
not visible—e.g., central
government policy units), where both program technologies and
environments combine to limit
performance measurement, are the least likely to be successful
in measuring outputs and
outcomes. Production organizations (with simple, repetitive
tasks, the results of which are
visible and countable) are the most likely to be able to build
performance measurement
systems that include outputs and outcomes. Craft organizations
rely on applying mixes of
professional knowledge and skills to unique tasks to produce
visible outcomes—a public audit
office would be an example. Procedural organizations rely on
processes to produce outputs
that are visible and countable but produce outcomes that are
less visible—military
organizations are an example. Thus, craft and procedural
organizations differ in their capacities
to develop output measures (procedural organizations can do
this more readily) and outcome
measures (craft organizations can do this more readily).
Table 9.2 Measuring Outputs and Outcomes: Influences of Core Technologies and Organizational Environments
Outputs visible, outcomes visible: production organizations
Outputs visible, outcomes not visible: procedural organizations
Outputs not visible, outcomes visible: craft organizations
Outputs not visible, outcomes not visible: coping organizations
Source: Adapted from Wilson (1989).
Identify Any Constructs That Apply Beyond Single Programs
Organizational logic models can be seen as an extension of
program logic models, but
because they typically focus on a higher-level view of programs
or business lines, the
constructs will be more general. The balanced scorecard
(Kaplan & Norton, 1996) is one type
of organizational performance measurement system that
includes a general (normative) model
of key organizational-level constructs that are intended to be
linked causally. Typically,
balanced scorecards include clusters of performance measures
for four different dimensions:
(1) organizational learning and growth, (2) internal business
processes, (3) customers, and (4)
the financial perspective. Performance measures are constructed
for each of these dimensions.
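As a simple illustration (the individual measures listed are hypothetical examples, not drawn from Kaplan and Norton), a scorecard can be thought of as clusters of measures keyed to the four perspectives:

```python
# A minimal sketch of balanced scorecard measures grouped under the four
# perspectives described above; the measures themselves are hypothetical.

balanced_scorecard = {
    "learning_and_growth": ["staff training hours per FTE", "employee engagement score"],
    "internal_business_processes": ["average application processing time (days)"],
    "customers": ["client satisfaction rating", "complaints per 1,000 clients served"],
    "financial": ["cost per client served", "variance from budget (%)"],
}

for perspective, measures in balanced_scorecard.items():
    print(perspective)
    for measure in measures:
        print(f"  - {measure}")
```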
In Appendix A, Table 9A.1 illustrates an earlier organizational
logic model for the British
Columbia Ministry of Human Resources (now the Ministry of
Social Development). The
ministry was primarily focused on providing income assistance
and moving income assistance
recipients into job training programs as a transition to
employment. Table 9A.1 is complicated,
but if one wants to see how the operations of this entire
organization fit together, an
organizational logic model is a parsimonious way to show this
visually and to identify
constructs that might be candidates for constructing
performance measures.
Some jurisdictions require organizational logic models that
depict the high-level intended
links between strategic outcomes and programs. Figure 9A.1 in
Appendix A is a high-level
logic model of Human Resources and Skills Development
Canada (HRSDC), one of the largest
federal departments in the Canadian government. All federal
departments and agencies in
Canada are required to develop and periodically update a
Program Alignment Architecture that
summarizes departmental objectives/outcomes and how those
are intended to be achieved
through the program structure (Treasury Board of Canada
Secretariat, 2012b). The HRSDC
(2010) logic model shows how strategic outcomes are connected
with clusters of programs.
Each program has its own cluster of outcomes and is expected
to be evaluated on a 5-year
cyclical basis (Treasury Board of Canada Secretariat, 2012a).
Performance measurement systems are sometimes expected to
offer measures of
performance that transcend single government departments, and
measure sectoral or whole
government performance. The Government of Alberta, for
example, publishes an annual report
called Measuring Up: Progress Report on the Government of
Alberta Business Plan, which
describes and graphs performance trends over the previous five
years (Government of Alberta,
2011). Included in the most recent report are summaries of 59
performance measures related to
10 province-wide strategic goals.
Publishing this report is required under the Government Accountability Act (Government of Alberta, 1995), and the report must include “a comparison of the actual
performance results to the
targets included in the government business plan under section
7(3), and an explanation of any
significant variances” (p. 6). The provincial auditor assesses a
sample of the measures in each
annual report—13 of the 59 measures were audited for
“completeness, reliability,
comparability and understandability” (Government of Alberta,
2011, p. 1). Although some of
the measures include comparisons with other jurisdictions (e.g.,
labor productivity is compared
with that of other Canadian provinces), most are displayed as a
time series for Alberta alone.
As part of the annual reporting process, the Alberta government
surveys a random sample of
residents of the province and asks them to rate social, health,
educational, and criminal justice
–related services. The survey results are featured among the
performance measures in the
annual report. The performance measures that are included in
the annual report are selected
from among the ones Alberta government departments have
included in their performance
reports, so the province-wide report is in part a roll-up of
departmental performance results.
Many social problems cannot easily be assigned to one
administrative department. An
example is homelessness. A social services department might
have a mandate to provide funds
to nonprofit organizations or even developers to build housing
for the homeless in a
jurisdiction. Housing is costly, and states or provinces may be
reluctant to undertake such
initiatives on their own. The nature of homelessness, with its
high incidence of mental health
challenges and drug dependencies, will mean that housing the
homeless, even if funding and
land to construct housing can be marshaled, is just part of a
more comprehensive suite of
programs needed to address the complex cases that homeless
persons typically present.
Homelessness transcends government departments, and even
levels of government, involving
local, state/provincial, and federal governments. Effectively
addressing this kind of problem
requires collaboration among agencies and governments that
crosses existing organizational
and functional boundaries.
Horizontal initiatives like ones to address homelessness present
challenges for measuring
performance, particularly where there is an expectation that
reporting results will be part of
being accountable (Bakvis & Juillet, 2004). Developing
performance measures for this kind of
program would involve a sharing of responsibility and
accountability for the overall program
objectives. If permitted to focus simply on the objectives of
each government department or
level of government during the design of the system, each
contributor would have a tendency
to select objectives that are conservative, that is, not commit the
department to be responsible
for the overall outcome. In particular, if legislation has been
passed that emphasizes
departments being individually accountable, then broader
sectoral objectives may well be
overlooked.
A similar problem arises for many nonprofit organizations. In
Canada and the United
States, many funding organizations (e.g., governments, private
foundations, the United Way)
are opting for a performance-based approach to their
relationship with organizations that
deliver programs and services. Increasingly, funders expect
results-focused performance
information as a condition for grant funding and renewals. Governments that have opted for
Governments that have opted for
contractual relationships with nonprofit service providers are
developing performance
contracting requirements that specify deliverables and often tie
funding to the provision of
evidence that these results have been achieved (Bish &
McDavid, 1988).
Nonprofit organizations are often quite small and are dedicated
to the amelioration of a
community problem or issue that has attracted the commitment
of members and volunteers.
Being required to bid for contracts and account for the
performance results of the money they
have received is added onto existing administrative
requirements, and many of these
organizations have limited capacity to do these additional tasks.
Campbell (2002) has pointed
out that in settings where targeted outcomes span several
nonprofit providers, it is beneficial to
have some collaboration among funders and for providers to
agree on ways of directly
addressing the desired outcomes. If providers compete and
funders continue to address parts of
a problem, the same sectoral disregard that was suggested for
government departments will
happen in the nonprofit sector.
One issue that can easily be overlooked as performance
measures are being developed is
the “levels of analysis” problem (McDavid, 2001a). Suppose a
government department
develops a set of performance measures that is intended to
indicate how the organization as a
whole is doing. If the actual performance results suggest that
the organization is meeting its
overall objectives, it might be tempting to conclude that the
programs that contribute to the
objectives are also effective. That would be a mistake because
success at one level does not
warrant a conclusion that performance at other levels is also
comparable. It is possible to have
programs that are not meeting their objectives while, overall,
the organization is meeting its
objectives. Likewise, we cannot use program success alone to
indicate organizational success,
nor can we use individual employee performance measures to
tell us whether programs or the
organization are meeting their objectives.
Ideally, individual and group objectives should connect with
program objectives, which
should in turn connect with organizational objectives. It is
necessary to measure performance
at all of these levels to be able to effectively manage
organizational performance.
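To make the levels-of-analysis point concrete, here is a minimal sketch (in Python, with entirely hypothetical program names and figures) showing how an organization-level total can meet its target even while one of the contributing programs misses its own target:

```python
# Hypothetical clients-served targets and actuals for three programs
# in one department; all figures are illustrative.
programs = {
    "job training":     {"target": 500, "actual": 620},
    "youth employment": {"target": 300, "actual": 180},
    "wage subsidies":   {"target": 400, "actual": 450},
}

org_target = sum(p["target"] for p in programs.values())   # 1200
org_actual = sum(p["actual"] for p in programs.values())   # 1250

print(f"Organization: {org_actual}/{org_target} "
      f"({'met' if org_actual >= org_target else 'missed'})")
for name, p in programs.items():
    status = "met" if p["actual"] >= p["target"] else "missed"
    print(f"  {name}: {p['actual']}/{p['target']} ({status})")

# The organization meets its overall target even though "youth employment"
# misses its own: success at one level does not warrant conclusions
# about performance at another level.
```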
One additional issue with respect to organization-level and
sectoral measures of
performance is who should take responsibility for gathering the
data and reporting
interpretations of it. Since reporting responsibilities can be
linked to expectations of
accountability in some organizations, ownership of these
measures becomes an important
organizational-political issue. We will discuss the political
dimensions of performance
measurement in Chapter 10.
Involve Prospective Users in Reviewing Logic Models and
Constructs in the Proposed
Performance Measurement System
Developing logic models of programs and/or the organization as
a whole is an iterative
process. Although the end product is meant to represent the
programmatic and intended causal
reasoning that transforms resources into results, it is essential
that logic models be reviewed
and validated with organizational participants and other
stakeholders. Involvement at this stage
of the development process will validate key constructs for
prospective users and set the
agenda for developing performance measures. Program
managers in particular will have an
important stake in the system. Their participation in validating
the logic models increases the
likelihood that performance measurement results will be useful
for program improvements.
Typically, logic models identify outputs and outcomes that are
linked in intended causal
relationships. Depending on the purposes of the performance
measurement process, some
constructs will be more important than others. For example, if a
logic model for a job training
and placement program operated by a community nonprofit
organization has identified the
number of persons who complete the training as an output and
the number who are employed
full-time one year after the program as an outcome, the program
managers would likely
emphasize the output as a valid measure of program
performance—in part because they have
more control over that construct. But the funders might want to
focus on the permanent
employment results because that is really what the program is
intended to do.
By specifying the intended causal linkages, it is possible to
review the relative placement
of constructs in the model and clarify which ones will be a
priority for measurement. In our
example, managers might be more interested in training
program completions since they are
necessary for any other intended results to occur. Depending on
the clients, getting persons to
actually complete the program can be a major challenge in
itself. If the performance
measurement system is intended to be summative as well, then
measuring the permanent
employment status of program participants would be
important—although there would be a
question of whether the program produced the observed
employment results.
Figure 2.6 in Chapter 2 described a logic model for a family
preservation and strengthening
program. The program was intended to offer parents of families
in crisis the opportunity to
acquire and practice the skills needed to be more effective,
enhancing the likelihood that they
would be able to avoid having to give up their children to foster
care. A key construct in that
program logic is parents acquiring skills related to managing
family issues—that construct is
the “hub” of the program logic. Program success is critically
dependent on that happening, and
“developing parental skills” would be central to developing a
suite of performance measures.
One indicator of the importance of constructs in logic models,
then, is the number of causal
links connecting to and coming from each construct.
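As a rough illustration of that heuristic, the sketch below (hypothetical Python, with made-up construct names loosely based on the family preservation example) counts the causal links into and out of each construct to flag likely "hub" constructs:

```python
from collections import Counter

# Hypothetical (cause, effect) links from a simplified program logic model.
links = [
    ("outreach to families in crisis", "parents enroll in program"),
    ("parents enroll in program", "parents acquire parenting skills"),
    ("home visits by counselors", "parents acquire parenting skills"),
    ("parents acquire parenting skills", "improved family functioning"),
    ("parents acquire parenting skills", "children remain with their families"),
]

in_degree = Counter(effect for _, effect in links)
out_degree = Counter(cause for cause, _ in links)
constructs = {c for link in links for c in link}

# Rank constructs by the total number of causal links touching them;
# high-degree constructs are candidates for priority measurement.
for c in sorted(constructs, key=lambda x: in_degree[x] + out_degree[x],
                reverse=True):
    print(f"{c}: in={in_degree[c]}, out={out_degree[c]}")
```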
If a performance measurement system is going to be designed
and implemented as a public
accountability initiative that is high stakes, that is, has
resource-related consequences for those
organizational units being measured, reported, and compared,
then the performance measures
chosen should be ones that would be difficult to “game” by
those who are being held
accountable. Furthermore, it may be necessary to periodically
audit the performance
information to assess its reliability and validity (Bevan &
Hamblin, 2009). Some
jurisdictions—New Zealand, for example—regularly audit the
public performance reports that
are produced by all departments and agencies (Gill, 2011).
Measure the Constructs That Have Been Identified as Parts of
the Performance
Measurement System
We learned in Chapter 4 that the process of translating
constructs into observables involves
measurement. For performance measurement, secondary data
sources are the principal means
of measuring constructs. Because these data sources already
exist, their use is generally seen to
be cost-effective. There are, however, several issues that must
be kept in mind when using
secondary data sources:
• Can the existing data (usually kept by the organization) be
adapted to fit constructs in the
performance measurement system? In many performance
measurement situations, the
challenge is to adapt what exists, particularly data readily
available via information
systems, to what is needed to translate performance constructs
into reliable and valid
measures. Often, existing data have been collected for purposes
that are not related to
measuring and reporting on performance. Using these data
raises validity questions. Do
they really measure what the performance measurement
designers say that they
measure? Or do they distort or bias the performance construct
so that the data are not
credible? For example, measuring changes in employee job
satisfaction by counting the
number of sick days taken by workers over time could be
misleading. Changes in the
number of sick days could be due to a wide range of factors,
making it an invalid
measure of job satisfaction.
• Do existing data sources sufficiently cover the constructs that
need to be measured? The
issue here is whether our intended performance measures are
matched by what we can
get our hands on in terms of existing data sources. In the
language we introduced in
Chapter 4, this is a content validity issue.
• A separate, but related, issue is whether existing data sources
permit us to triangulate our
measurements of key constructs. In other words, can we
measure a given construct in
two or more independent ways, ideally with different
methodologies? Generally,
triangulation increases confidence that the measures are valid.
• Can existing data sources be manipulated by stakeholders if
they are included in a
performance measurement system? Managers and other
organizational members
generally respond to incentives. If a performance measure
becomes the focus of
summative program or service assessments, and if the data for
that measure are collected
by organizational participants, it is possible that the data will be
manipulated to indicate
“improved” performance (Otley, 2003).
An example of this type of situation from policing was an
experiment in Orange County,
California, to link salary increases in the police department to
reduced reporting rates for
certain kinds of crimes (Staudohar, 1975). The agreement between the
police union and management specified thresholds that linked percentage
reductions in four types of crimes to the size of the salary increases.
The experiment "succeeded": rates for the four targeted crimes decreased
just enough to maximize the wage increases. At the same time, crime
rates increased for several related
types of crimes. A concern in this case is whether the crime
classification system may have
been manipulated by participants in the experiment, given the
incentive to “reduce” crimes in
order to maximize salary increases.
If primary data sources (those designed specifically for the
performance measurement
system) are being used, several issues should also be kept in
mind:
• Are there ongoing resources to enable collecting, coding, and
reporting of data? If not,
then situations can develop where the initial infusion of
resources to get the system
started may include funding to collect outcomes data (e.g., to
conduct a client survey),
but beyond this point, there will be gaps in the performance
measurement system where
these data are no longer collected.
• Are there issues of sampling procedures, instrument design,
and implementation that
need to be reviewed or even done externally? In other words,
are there methodological
requirements that need to be established to ensure the
credibility of the data?
• Who will actually collect and report the data? If managers are
involved, is there any
concern that their involvement could be seen to be in conflict
with the incentives they
perceive?
• When managers review the proposed performance measures, a draft that
does not feature any measures pertaining to their programs may lead them
to conclude that they are being excluded and are therefore vulnerable in
future budget allocations. It is essential to have a rationale for each
measure and an overall rationale for featuring some measures but not
others. Organization executives may need to be involved in settling any
managerial disagreements.
In Chapter 4, we introduced measurement validity and
reliability criteria to indicate the
methodological requirements for sound measurement processes.
Those criteria are rooted in the
social sciences (Goodwin, 1997), and satisfying them is
generally premised on having the
resources to properly establish validity and reliability. In many
performance measurement
situations, there are few resources, and limited time, to
determine whether each measure is
defensible in methodological terms. Performance measurement
is fundamentally about finding
indicators that plausibly connect constructs with data. In terms
of the kinds of validity
discussed in Chapter 4, persons or teams that are developing
and implementing performance
measures usually pay attention to face validity (On the face of
it, does the measure do an
adequate job of representing the construct?), content validity
(How well does the measure or
measures represent the range of content implied by the
construct?), and response process
validity (Have the participants in the measurement process
taken it seriously?).
We are reminded of a quote that has been attributed to Sir
Josiah Stamp, a tax collector for
the government in England during the 19th century:
The government is extremely fond of amassing great quantities
of statistics. These are
raised to the nth degree, the cube roots are extracted and the
results are arranged into
elaborate and impressive displays. What must be kept ever in
mind, however, is that in
every case, the figures are first put down by a village watchman
and he puts down
anything he damn well pleases. (Sir Josiah Stamp, Her Majesty's
Collector of Inland Revenues, more than a century ago; cited in Thomas,
2004, p. xiii)
Assessing other kinds of measurement validity (internal
structure, concurrent, predictive,
convergent, and discriminant; see Chapter 4) is generally
beyond the methodological resources
in performance measurement situations. The reliability of
performance measures is often
assessed with a judgmental estimate of whether the measure and
the data are accurate, that is,
are collected and recorded so that there are no important errors
in the ways the data represent
the events or processes in question. In some jurisdictions,
performance measures are audited
for reliability (see, e.g., Texas State Auditor’s Office, 2012).
An example of judgmentally assessing the reliability and
validity of measures of program
results might be a social service agency that has included the
number of client visits as a
performance measure for the funders of its counseling program.
Suppose that initially, the
agency and the funders agree that the one measure is sufficient
since payments to the agency
are linked to the volume of work done and client visits are
deemed to be a reasonably accurate
measure for that purpose. To assess the validity and reliability
of that measure, one would want
to know how the data are recorded (e.g., by the social worker or
by the receptionist) and how
the files are transferred to the agency database (manually or
electronically as part of the intake
process for each visit). Are there under- or overcounting biases
in the way the data are
recorded? Do telephone consultations count as client visits?
What if the same client visits the
agency repeatedly, perhaps even to a point where other
prospective client appointments are less
available? Should a second measure of performance be added
that tracks the number of clients
served (improving content validity)? Will that create a more
balanced picture and create
incentives to move clients through the treatment process? What
if clients change their
names—does that get taken into account in recording the
number of clients served? Each
performance measure or combination of measures for each
construct will have these types of
practical problems that must be addressed if the data in the
performance measurement system
are to be credible and usable.
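A minimal sketch of the kind of check described above, using an entirely hypothetical visit log, shows how "client visits" and "clients served" can diverge and why recording rules (such as whether telephone consultations count) need to be settled up front:

```python
from collections import Counter

# Hypothetical visit log as it might be exported from an agency database:
# (client_id, date, contact_type).
visits = [
    ("C001", "2011-03-01", "in-person"),
    ("C001", "2011-03-08", "in-person"),
    ("C001", "2011-03-15", "telephone"),
    ("C002", "2011-03-02", "in-person"),
    ("C003", "2011-03-09", "in-person"),
    ("C001", "2011-03-22", "in-person"),
]

# Recording rule under review: exclude telephone consultations.
counted = [v for v in visits if v[2] == "in-person"]

print("Client visits (in-person only):", len(counted))               # volume measure
print("Clients served:", len({client for client, _, _ in counted}))  # added for content validity
print("Visits per client:", Counter(client for client, _, _ in counted))
```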
In jurisdictions where public performance reporting is mandated, there
is often an expectation that requiring fewer performance measures for a
department will simplify reporting and make the performance report more
concise and readable.
Internationally, guidelines exist that suggest a rule of
parsimony when it comes to selecting the
number of performance measures for public reporting. For
example, the Canadian
Comprehensive Auditing Foundation (CCAF-FCVI, 2002) has
outlined nine principles for
public performance reporting, one of which is to “focus on the
few critical aspects of
performance” (p. 4). This same principle is reflected in
guidelines developed for performance
reporting by the Queensland State Government in Australia
(Thomas, 2006).
The international public accounting community has taken an
interest in public performance
reporting generally and, in particular, in the role that public
auditors can play in assessing the
quality of public performance reports (Klay, McCall, & Baybes,
2004). The assumption is that
if the quality of the supply of public performance reports is
improved, that is, performance
reports are independently audited for their credibility, they are
more likely to be used, and the
demand for them will increase.
Typically, the number of performance measures in public
reports is somewhere between 10
and 20, meaning that in large organizations, some programs will
not be represented in the
image of the department that is conveyed publicly. A useful way
to address managers wanting
their programs to be represented publicly is to commit to
constructing separate internal
performance reports. Internal reports are consistent with the
balancing of formative and
summative uses of performance measurement systems. It is our
belief that unless a
performance measurement system is used primarily for internal
performance management, it is
unlikely to be sustainable. Internal performance measures can
more fully reflect each program
and are generally seen to better represent the accomplishments
of programs.
One additional measurement issue is whether measures and the
data that correspond to the
measures should be quantitative. In Chapter 5, we discussed the
important contributions that
qualitative evaluation methods can make to program
evaluations. We included an example of
how qualitative methods can be used to build a performance
measurement and reporting
system (Davies & Dart, 2005; Sigsgaard, 2002). There is a
meaningful distinction between the
information that is conveyed by words and that which is
conveyed by numbers. Words can
provide us with texture, emotions, and a more vivid
understanding of situations. Words can
qualify numbers, interpret numbers, and balance presentations.
Most important, words can
describe experiences—how a program was experienced by
particular clients as opposed to the
number of clients served, for example.
In performance measurement systems, it is desirable to have
both quantitative and
qualitative measures/data. Stakeholders who take the time to
read a mixed presentation can
learn more about program performance. But in many situations,
particularly where annual
targets are set and external reporting is mandated, there is a bias
toward numerical information,
since targets are nearly always stated quantitatively. If the
number of persons on social
assistance is expected to be reduced by 10% in the next fiscal
year, for example, the most
relevant data will be numerical. Whether the program meets its target
or not, however, the percentage reduction in the number of persons on
social assistance provides no information about the process by which
that result occurred or about other contextual factors.
Performance measurement systems that focus primarily on
providing information for
formative uses should include deeper and richer measures than
those used for public reporting.
Qualitative information can provide managers with feedback
that is very helpful in adjusting
program processes to improve results. Also, qualitative
information can reveal to managers the
client experiences that accompany the process of measuring
quantitative results.
Qualitative information presented as cases or examples that
illustrate a pattern that is
reported in the quantitative data can be a powerful way to
convey the meaning of the numerical
information. Although single cases can only illustrate, they
communicate very effectively. For
political decision makers, case-based narratives can be essential
to conveying the meaning of
performance results.
Record, Analyze, Interpret, and Report the Performance Data
One problem with any performance measurement system is the potential
for ambiguity in observed patterns of results. In an Oregon
benchmarking report (Oregon Progress
Board, 2003), affordability of housing was offered as an
indicator of the well-being of the state
(presumably of the broad social and economic systems in the
state). If housing prices are
trending downward, does that mean that things are getting worse
or better? From an economic
perspective, declining housing prices could mean that (a)
demand is decreasing in the face of a
steady supply; (b) demand is decreasing, while supply is
increasing; (c) demand and supply are
both increasing, but supply is increasing more quickly; or (d)
demand and supply are both
decreasing, but demand is decreasing more quickly. Each of
these scenarios suggests
something different about the well-being of the economy. To
complicate matters, each of these
scenarios would have different interpretations if we were to take
a social rather than an
economic perspective. The point is that prospective users of
performance information should
be challenged to offer their interpretations of simulated patterns
of such information (Davies &
Warman, 1998). In other words, prospective users should be
offered scenarios in which
different trends and levels of measures are posed. If these trends
or levels have ambiguous
interpretations—“it depends”—then it is quite likely that when
the performance measurement
system is implemented, the same ambiguities will arise as
reports are produced and used.
Fundamentally, ambiguous measures invite conflicting
interpretations of results and will tend
to weaken the credibility of the system.
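One way to run such a review is to generate a handful of simulated patterns and ask prospective users to interpret each one before the system goes live. The sketch below is a minimal, hypothetical example (the indicator, baseline, and trend values are all invented):

```python
import random

random.seed(42)  # so the simulated scenarios are reproducible across sessions

def simulate_trend(start, annual_change, years=5, noise=2.0):
    """Generate a simple noisy time series for one scenario."""
    value, series = start, []
    for _ in range(years):
        value += annual_change + random.uniform(-noise, noise)
        series.append(round(value, 1))
    return series

# Hypothetical housing-affordability indicator (index, baseline year = 100).
scenarios = {
    "prices trending down": simulate_trend(100, -3),
    "prices roughly flat":  simulate_trend(100, 0),
    "prices trending up":   simulate_trend(100, +3),
}

for label, series in scenarios.items():
    print(f"{label}: {series}  -> How would you interpret this pattern, and why?")
```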
In addition to simulating different patterns of information for
prospective users, it is
important to ascertain what kinds of comparisons are envisioned
with performance data. A
common comparison is to look for trends over time, and make
judgments based on
interpretations of those trends. An example of a publicly
reported performance measure that
tracks trends over time is the WorkSafeBC measure of injured
workers’ overall satisfaction
with their experience with the organization.
Each year, WorkSafeBC arranges for an independent survey of
about 400 injured workers
who are randomly selected from among those who made claims
for workplace injuries
(WorkSafeBC, 2011). Workers can rate their overall satisfaction
on a 5-point scale from very
poor to very good. This performance measure is one of 11 that
are included in the annual report
and has been used since 2003. Figure 9.1 has been excerpted
from the 2010 Annual Report
(WorkSafeBC, 2011) and displays the percentages of surveyed
workers who rated their overall
satisfaction as good or very good, over time. Also displayed are
the targets for this
performance measure for the next three years. This format for a
performance measure makes it
possible to see what the overall trend is and how that trend is
expected to change in the future.
We can see that approximately three quarters of injured workers
have tended to be satisfied
over time. But in 2009 and in 2010, the percentage drops. The
organization is forecasting a
return to the historical percentage with modest improvements in
the next 3 years. As program
evaluators, we might want to know more about why injured
worker satisfaction levels dropped
in 2009 and 2010. Given the challenging economic environment
in British Columbia and in
many other jurisdictions, it is possible that worker satisfaction
reflects those pressures.
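Before reading too much into year-over-year movements in a survey-based measure like this, it is worth checking how much of the change could be sampling error. The sketch below uses illustrative figures (not the actual WorkSafeBC results) and the usual normal approximation for a proportion with roughly 400 respondents:

```python
import math

def proportion_ci(p, n, z=1.96):
    """Approximate 95% confidence interval for a sample proportion."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

n = 400  # approximate annual sample of injured workers
# Illustrative shares rating their overall experience good or very good.
results = {2008: 0.77, 2009: 0.73, 2010: 0.71}

for year, p in results.items():
    low, high = proportion_ci(p, n)
    print(f"{year}: {p:.0%} (95% CI {low:.0%} to {high:.0%})")

# With n of about 400 the margin of error is roughly +/- 4 points, so a
# 2-3 point drop may be within sampling error, which is one reason to
# follow up with evaluative work before concluding satisfaction declined.
```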
Another comparison that can be made using performance
information is across similar
administrative units. For example, all provincial governments in
Canada have ministries or
departments that manage payments to injured workers, and
assess and collect insurance
premiums from employers to offset these payment costs. Figure
9.2 compares injury frequency
among all jurisdictions in Canada.
There is considerable variation among Canadian provinces in
terms of injury frequency,
and a potential evaluation question would be how to explain this
variation. Is it due to random
factors, or are there differences in policies and programs that
are linked to this important
outcome measure?
Figure 9.1 Performance Measurement Results Over Time:
Injured Workers’ Overall
Satisfaction With WorkSafeBC
Source: WorkSafeBC (2011, p. 34). Reprinted, with permission, from the
WorkSafeBC 2010 Annual Report and 2011–2013 Service Plan.
Copyright © WorkSafeBC. Used with permission.
Figure 9.2 Injury Frequency per 100 Workers for All Canadian
Provinces
Source: WorkSafeBC (2011, p. 96). Injury Frequency (per 100
workers of assessable employers) is
reprinted, with permission, from the WorkSafeBC 2010 Annual
Report and 2011–2013 Service Plan.
Copyright © WorkSafeBC. Used with permission.
A third type of comparison is with benchmarks, standards, or
targets. For example, in some
program or service areas, such as hospital services, it is
common to use standards to assess
waiting times for services. When physicians refer patients for
testing or for medical
procedures, waiting time can become a critical factor, especially
where initial diagnoses
indicate a progressive disease. Performance reporting that is
intended for public accountability
purposes will typically include comparisons between
performance targets and actual results.
We saw an example of this with Figure 9.1, which incorporated
annual targets and actual
results for overall worker satisfaction with their interactions
with WorkSafeBC. As another
example of comparisons with targets, a municipal government
graffiti management program
might have an objective of reducing the number of public
buildings defaced by graffiti. If the
target was a maximum of 5% of buildings with graffiti
(measured by a year-end physical
survey of all public buildings), the actual survey results could
be compared with the target. If
the survey revealed that 10% of public buildings had graffiti on
them, the program manager
(and other stakeholders) might decide to investigate the gap
between the target and the actual
result. Following up on this performance result would entail
asking why the observed result
occurred—a question typically in the domain of program
evaluation.
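A trivial but concrete version of that comparison, with invented survey numbers, might look like this:

```python
# Hypothetical year-end physical survey of public buildings.
target_pct = 5.0                # objective: at most 5% of buildings with graffiti
buildings_surveyed = 640
buildings_with_graffiti = 64

actual_pct = 100 * buildings_with_graffiti / buildings_surveyed
gap = actual_pct - target_pct

print(f"Target: <= {target_pct:.1f}%  Actual: {actual_pct:.1f}%  Gap: {gap:+.1f} points")
if gap > 0:
    # The gap itself says nothing about why the result occurred; that
    # "why" question belongs to program evaluation.
    print("Target missed: flag for follow-up evaluation.")
```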
Setting targets can become a contentious process. If the salaries
of senior managers are
linked to achieving targets, there will be pressure to make sure
the targets are achievable. If
reporting targets and achievements is part of an adversarial
political culture, there will again be
pressure to make targets conservative (Davies & Warman,
1998). Norman (2001) has
suggested that performance measurement systems can result in
underperformance for these
reasons. Hood (2006) points to the ratchet effect (a tendency for
performance targets to be
lowered over time as agencies fail to meet them) as a problem
for public sector performance
measurement in Britain.
Buy-in is an incremental process. Managers want to see what
actually happens with the
performance results and the reports that are produced, before
they are willing to fully accept
this change. Acceptance can also be eroded. If there is turnover
in the organization’s leadership
and the new executive unilaterally shifts the balance from
formative to summative uses of the
performance results, it is quite likely that resistance to the
system will develop.
There is an issue of access to the performance data. In some
organizations, the performance
measurement function has been separated from line management
entirely. Managers do not
have access to data; instead, they receive periodic reports.
Excluding managers and other
organizational members from having access to performance data
tends to reinforce a cultural
norm that such information is a source of power and control.
Related to access is the question
of whether users can prepare their own reports, in addition to
reports that are mandated. Are
they given the opportunity to analyze the data included in
existing reports, in order to
corroborate or disconfirm interpretations of the data? In New
Zealand, for example, many
managers have developed their own information sources and
ways of working with
performance data in their organizations (Gill, 2011).
Finally, how are reports prepared? Is there a regular cycle of
reporting? Is there a process
whereby reports are reviewed and critiqued internally before
they are released to users? Often,
agencies have internal vetting processes wherein the authors of
reports are expected to be able
to defend the report in front of their peers before the report is
released. This challenge function
is valuable as a way of assessing the defensibility of the report
and anticipating the reactions of
stakeholders.
Regularly Review Feedback From the Users and, If Needed,
Make Changes to the
Performance Measurement System
Uses of and organizational needs for performance data will
change over time.
Implementing a system with a fixed structure (logic models and
measures) at one point in time
will not ensure the relevance or continued use of the system in
the future. There is a balance
between the need to maintain continuity of performance
measures, on the one hand, and the
need to reflect changing organizational objectives, structures,
and prospective uses of the
system, on the other (Kravchuk & Schack, 1996). In many
performance measurement systems,
there are measures that are replaced periodically and measures
that are works in progress. A
certain amount of continuity in the measures increases the
capacity of measures to be
compared over time. Data displayed as a time series can, for
example, show trends in
environmental factors, as well as changes in outputs and
outcomes; by comparing
environmental variable trends with outcome trends, it may be
possible to take into account the
influences of plausible rival hypotheses on particular outcome
measures. Although this process
depends on the length of the time series and is often
judgmental, it does permit analysts to use
some of the same tools that would be used by program
evaluators. Recall from Chapter 3 that in the York crime prevention
program evaluation, the unemployment rate in the community was
an external variable that was included in the evaluation to assist
the evaluators in determining
whether the neighborhood watch program was the likely cause
of the observed changes in the
reported burglary rate.
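A minimal sketch of that kind of judgmental comparison, with hypothetical annual data for one environmental factor and one outcome measure, might look like this:

```python
# Hypothetical annual series for one community (all figures invented):
# an environmental factor and an outcome tracked by the measurement system.
years        = [2005, 2006, 2007, 2008, 2009, 2010]
unemployment = [6.1, 6.4, 7.0, 8.2, 8.8, 8.5]          # percent
burglaries   = [14.2, 13.8, 13.1, 13.6, 14.0, 13.4]    # per 1,000 households

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print("Unemployment vs. burglary rate r =", round(pearson(unemployment, burglaries), 2))
# A strong association would suggest the environmental factor is a plausible
# rival explanation for changes in the outcome; a weak one lends some
# (judgmental) support to the program explanation.
```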
But continuity can also make a system less relevant over time.
Suppose, for example, that a
performance measurement system was designed to pull data
from several different databases,
and the original information system programming to make this
work was expensive. Even if
the data needs change, there may well be a desire not to go back
and repeat this work, simply
because of the resources involved. Likewise, if a performance
measurement system is based on
a logic model that becomes outdated, then the measures will no
longer fully reflect what the
program(s) or the organization is trying to accomplish. But
going back to redo the logic model
(which can be a time-consuming, iterative process) may not be
feasible in the short term, given
the resources available. The price of such a decision might be a
gradual reduction in the
relevance of the system, which may not be readily detected.
With all the activity to design and implement performance
measurement and reporting
systems, there has been surprisingly little effort to date to
evaluate their effectiveness
(McDavid & Huse, 2012). In Chapter 10, we will discuss what
is known now about the ways in
which performance information is used, but it is appropriate
here to suggest some practical
steps to generate feedback that can be used to modify and better
sustain performance
measurement systems:
• Develop channels for user feedback. This step is intended to create a
process that allows users to suggest ways to review, revise, and update
the performance measures. It is also intended to help identify when
corrections are required and how to address errors and
misinterpretations of the data.
• Create an expert review panel of persons who are knowledgeable about
performance measurement but do not have a stake in the system that is
being reviewed.
Performance measurement should be conducted on an ongoing
basis, and this expert
panel review can provide feedback and address issues and
problems over a long-term
time frame. A review panel can also provide an independent
assessment of buy-in and
use of performance information by managers and staff, and
track the (intended and
unintended) effects of the system on the organization.
The credibility of performance information is an enduring
concern. Davies and Warman
(1998) point to the importance of auditing in the context of the
performance reports of the
(British) National Meteorological Office:
An independent audit, then, is not a luxury, it is a necessity.
The credibility of the whole
system of agencies is put at risk if the data from one is found to
be unverified and open to
dispute. Where performance-related bonuses are linked with
outcomes, it is unreasonable
to expect staff concerned to be responsible for the measurement
and reporting of results in
an objective manner when the very same results will determine
their own pay. (p. 47)
Legislative auditors, in addition to recommending principles to
guide public performance
reporting, have been active in promoting audits of performance
reports (CCAF-FCVI,
CHAPTER 10
USING PERFORMANCE MEASUREMENT FOR
ACCOUNTABILITY AND PERFORMANCE
IMPROVEMENT
Introduction
Using Performance Results
Legislator Expected Versus Actual Uses of Performance Reports
in British Columbia,
Canada
High-Stakes Uses of Performance Measures
The British Experience With Performance Management
Assessing the “Naming and Shaming” Approach to Performance
Management in Britain
A Case Study of Gaming: Distorting the Output of a Coal Mine
Reassessing the Performance Management Cycle: The Roles of
Incentives and
Organizational Politics
Use of Performance Measures in a Non-Adversarial Political
Environment
Joining Internal and External Uses of Performance Information:
The Lethbridge Local
Government Study
Using Performance Information for Management: Encouraging
Internal Uses of
Performance Results
Increasing Uses of Performance Information by Elected
Officials: Supply and Demand
Improvements
Improving the Supply and Demand of Performance Information:
Examining the Audit
Strategy
Improving the Demand for Performance Information: Examining
Legislation and Training
Assessing the Realities of Performance Measurement for Public
Accountability,
Performance Improvement, and Program Evaluation
Three Additional Considerations in Implementing and
Sustaining Performance
Measurement Systems
The Centralizing Influence of Performance Measurement in
Public Organizations
Attributing Outcomes to Programs
The Levels of Analysis Problem: Conflating Organizational,
Program, and Individual
Performance
Summary
Discussion Questions
References
INTRODUCTION
In Chapter 10, we look at both the intended and actual uses of
performance information by
elected officials and organizational managers. Performance
measurement systems are intended
to improve public accountability and the management of
organizational performance. Elected
officials are key stakeholders in efforts to improve
accountability, so we summarize key
findings from one of the few studies that looks at ways that
elected decision makers actually
use public performance reports provided to them each year.
Then, we look at uses of
performance information in a system where poor performance
results were given media
coverage and, in some cases, resulted in executives being fired
in poorly performing
organizations. The “naming and shaming” approach used in
Britain between 2000 and 2005 is
unique in implementing a high-stakes performance measurement
system for public
accountability and performance improvement.
The British experience also raises the issue of the importance of
incentives and
organizational political factors in understanding how
performance information is actually used
and whether it is seen to be credible. So we come back to the
(idealistic) performance
management cycle that we introduced in Chapter 1 and re-
examine it from a perspective that
now takes into account how people in organizations actually
behave, instead of how they
“should” behave. Given the challenges of implementing public
performance reporting systems
that produce information that is not distorted by “gaming”
responses from those whose
performance is being judged, we consider the following: Are
there any circumstances where
public performance reports are taken at face value, and the
managers who produce them are not
concerned about the risks of reporting less-than-positive
results? There is one empirical study
of a local government in Western Canada that directly addresses
this issue, so we report some of the key findings from that study.
In many settings, public performance reporting is risky; often,
political cultures are risk
averse, so it is difficult to report anything but positive or at
least noncontroversial performance
results. We look at possible strategies to increase internal uses
of performance information in
such environments and some strategies that managers tend to
adopt to manage performance but
not expose the program to political risks.
Finally, we turn to several problems with performance
measurement that can more
generally affect both the validity and the usefulness of
performance information. We come
back to a key theme in our textbook: performance measurement
is part of a broader suite of
approaches to doing evaluations. Performance measurement
does not replace program
evaluation but instead focuses and informs it.
In Chapter 9, we discussed the design and implementation of
performance measurement
systems and outlined 12 steps for guiding the process.
Implementation implies that
performance measurement data are being collected regularly,
are being analyzed, and are being
reported (either internally or externally). If the system is
intended primarily for formative uses,
reporting may be informal and open-ended; that is, analyses and
reports are prepared as
needed. If the reports are prepared to meet external
accountability requirements, they usually
have a summative intent. In many jurisdictions where
performance measurement is mandated,
public reports are required on a periodic (often annual) basis
and are intended to be used for
decision making, including budgetary decision making (Hatry,
2006; Melkers, 2006).
Generating performance information does not by itself ensure that it is
actually used. This chapter will examine ways
that performance information is
intended to be used, and is actually used. We will look at the
factors that affect how
performance information is used and suggest that the political
cultures in which organizations
are embedded are important in assessing the prospects for using
performance measurement
information.
Historically, accountability focused on the processes by which
decisions were made in
public organizations. The emphasis was on keeping good
records, following established
procedures, and knowing how decisions were made and who was
involved in making the
decisions. Public organizations have typically been structured
hierarchically, and authority to
make the final decisions and accountability for decisions have
ultimately resided in the person
or persons who occupied the higher offices of the structure.
These bureaucratic structures and
their limitations have been critically analyzed by public choice
theorists (e.g., Downs, 1965;
Niskanen, 1971) who have argued that to understand how public
bureaucracies work, it is
necessary to acknowledge that public servants are not unlike
private sector employees and
have an underlying self-interest that affects their work efforts.
Furthermore, they argue that
understanding public sector motivations is central to designing
government organizations and
structures that perform well.
Le Grand (2010) has suggested that historically, the assumption
has been that public
servants are chiefly motivated by their desire to serve the public
interest: that public servants
were “knights,” motivated to do the right thing. Public choice
theorists and proponents of New
Public Management (NPM; Osborne & Gaebler, 1992) challenged that
assumption and instead
offered a model of motivation that emphasized the importance
of incentives, rewards, and
sanctions to induce good performance (see Le Grand, 2010). For
public choice theorists,
process-focused accountability missed what was important, and
even got in the way of good
performance. They argued that, instead, performance should be
focused on results and on
aligning the incentives for public servants, so that following
their self-interest would lead to
efficient and effective outcomes for public organizations. Key
to that approach was identifying
desired results, measuring the extent to which they have been
achieved, and holding public
servants accountable for delivering those results.
NPM, as a broad movement that has influenced both public and
nonprofit management and
governance in Western countries, has been strongly connected
to the drive to measure and
publicly report performance (Hood, 1991; Osborne & Gaebler,
1992). Among the intended
uses of performance information, improving accountability is a
key one.
As we suggested in Figure 9.3 in Chapter 9, performance
measurement and public
reporting were expected to contribute to improved public
accountability (for results) and also
improve performance. Requiring public reporting of
performance results, particularly in
relation to targets for a suite of performance measures, is now
built into many public sector
performance measurement systems (Bevan & Hood, 2006;
Pollitt, Bal, Jerak-Zuiderent,
Dowswell, & Harrison, 2010).
Making public performance reporting work as a means of
inducing improved performance
assumes that performance results have real consequences for the
organizations reporting them.
Recall Figure 1.1 in Chapter 1 in which we introduced the
performance management cycle.
The final stage of the cycle, once performance results (both
performance measures and
program evaluations) are reported, is to use that information to
inform decisions, set priorities,
make budgets, and position organizations and governments for
another performance
management cycle based on evidence from the past cycle. These
are the real consequences that
are expected to flow from reporting performance results. If the
performance management cycle
is an annual process, then a part of it would be annual
performance reports.
USING PERFORMANCE RESULTS
Elected leaders are expected to be principal users of
performance results (McDavid & Huse,
2012; Thomas, 2006). The efficacy of the performance
management cycle is linked to elected
decision makers paying attention to performance results and
using that information in their
deliberations and decisions. In spite of the importance of this
link, there has been limited
research that has focused squarely on whether, and to what
extent, elected officials make use of
the performance information that is regularly supplied to them
through public reports. In the
next section, we will summarize a recent empirical study that
has examined, over time, how
elected public officials in one jurisdiction used performance
reports. The findings from this
study help us understand how the link between public reporting
and “real consequences”
actually operates.
Legislator Expected Versus Actual Uses of Performance Reports
in British Columbia,
Canada
In 2000, the British Columbia (B.C.) Legislature passed the
Budget Transparency and
Accountability Act, a law mandating annual performance plans
and annual performance
reports for all departments and agencies. The act was amended
in 2001 (Government of British
Columbia, 2000, 2001), and the first performance reports were
completed in June 2003.
McDavid and Huse (2012) surveyed all elected members of the
legislature anonymously on
three occasions: The first survey was completed before the first
public performance reports
were received in 2003, and then legislators were surveyed again
in 2005 and 2007.
Table 10.1 summarizes the overall response rates to the three
surveys.
In each of the three surveys, the same measures were used—the
only difference between
the 2003 survey and the other two was that in 2003 the
statements were worded in terms of
expected uses of the performance reports. Fifteen separate
Likert statements were included,
asking politicians to rate the extent to which they used (or, in 2003,
expected to use) the performance reports.
Evolution of Seoul City in South KoreaHow the City changed s.docxturveycharlyn
 
evise your own definition of homegrown terrorism. Then using t.docx
evise your own definition of homegrown terrorism. Then using t.docxevise your own definition of homegrown terrorism. Then using t.docx
evise your own definition of homegrown terrorism. Then using t.docxturveycharlyn
 
eview the Paraphrasing tutorial here (Links to an external sit.docx
eview the Paraphrasing tutorial here (Links to an external sit.docxeview the Paraphrasing tutorial here (Links to an external sit.docx
eview the Paraphrasing tutorial here (Links to an external sit.docxturveycharlyn
 
Evidenced-Based Practice- Sample Selection and Application .docx
Evidenced-Based Practice- Sample Selection and Application  .docxEvidenced-Based Practice- Sample Selection and Application  .docx
Evidenced-Based Practice- Sample Selection and Application .docxturveycharlyn
 
Evidenced-Based Practice- Evaluating a Quantitative Research S.docx
Evidenced-Based Practice- Evaluating a Quantitative Research S.docxEvidenced-Based Practice- Evaluating a Quantitative Research S.docx
Evidenced-Based Practice- Evaluating a Quantitative Research S.docxturveycharlyn
 
eview the Captain Edith Strong case study in Ch. 6 of Organi.docx
eview the Captain Edith Strong case study in Ch. 6 of Organi.docxeview the Captain Edith Strong case study in Ch. 6 of Organi.docx
eview the Captain Edith Strong case study in Ch. 6 of Organi.docxturveycharlyn
 
Evidenced based practice In this writing, locate an article pert.docx
Evidenced based practice In this writing, locate an article pert.docxEvidenced based practice In this writing, locate an article pert.docx
Evidenced based practice In this writing, locate an article pert.docxturveycharlyn
 

More from turveycharlyn (20)

Exam #3 ReviewChapter 10· Balance of payment statements · .docx
Exam #3 ReviewChapter 10· Balance of payment statements · .docxExam #3 ReviewChapter 10· Balance of payment statements · .docx
Exam #3 ReviewChapter 10· Balance of payment statements · .docx
 
Evolving Role of the Nursing Informatics Specialist Ly.docx
Evolving Role of the Nursing Informatics Specialist Ly.docxEvolving Role of the Nursing Informatics Specialist Ly.docx
Evolving Role of the Nursing Informatics Specialist Ly.docx
 
eworkMarket45135.0 (441)adminNew bid from Madam Cathy.docx
eworkMarket45135.0 (441)adminNew bid from Madam Cathy.docxeworkMarket45135.0 (441)adminNew bid from Madam Cathy.docx
eworkMarket45135.0 (441)adminNew bid from Madam Cathy.docx
 
Evolving Technology Please respond to the following Analyze t.docx
Evolving Technology Please respond to the following Analyze t.docxEvolving Technology Please respond to the following Analyze t.docx
Evolving Technology Please respond to the following Analyze t.docx
 
Evolving Health Care Environment and Political ActivismRead and .docx
Evolving Health Care Environment and Political ActivismRead and .docxEvolving Health Care Environment and Political ActivismRead and .docx
Evolving Health Care Environment and Political ActivismRead and .docx
 
Evolving Families PresentationPrepare a PowerPoint presentatio.docx
Evolving Families PresentationPrepare a PowerPoint presentatio.docxEvolving Families PresentationPrepare a PowerPoint presentatio.docx
Evolving Families PresentationPrepare a PowerPoint presentatio.docx
 
EvolutionLets keep this discussion scientific! I do not want .docx
EvolutionLets keep this discussion scientific! I do not want .docxEvolutionLets keep this discussion scientific! I do not want .docx
EvolutionLets keep this discussion scientific! I do not want .docx
 
Evolutionary Theory ApproachDiscuss your understanding of .docx
Evolutionary Theory ApproachDiscuss your understanding of .docxEvolutionary Theory ApproachDiscuss your understanding of .docx
Evolutionary Theory ApproachDiscuss your understanding of .docx
 
Evolution or change over time occurs through the processes of natura.docx
Evolution or change over time occurs through the processes of natura.docxEvolution or change over time occurs through the processes of natura.docx
Evolution or change over time occurs through the processes of natura.docx
 
Evolution, Religion, and Intelligent DesignMany people mistakenl.docx
Evolution, Religion, and Intelligent DesignMany people mistakenl.docxEvolution, Religion, and Intelligent DesignMany people mistakenl.docx
Evolution, Religion, and Intelligent DesignMany people mistakenl.docx
 
Evolution of Millon’sPersonality PrototypesJames P. Choc.docx
Evolution of Millon’sPersonality PrototypesJames P. Choc.docxEvolution of Millon’sPersonality PrototypesJames P. Choc.docx
Evolution of Millon’sPersonality PrototypesJames P. Choc.docx
 
Evolution and Its ProcessesFigure 1 Diversity of Life on Eart.docx
Evolution and Its ProcessesFigure 1 Diversity of Life on Eart.docxEvolution and Its ProcessesFigure 1 Diversity of Life on Eart.docx
Evolution and Its ProcessesFigure 1 Diversity of Life on Eart.docx
 
Evolution in Animals and Population of HumansHumans belong t.docx
Evolution in Animals and Population of HumansHumans belong t.docxEvolution in Animals and Population of HumansHumans belong t.docx
Evolution in Animals and Population of HumansHumans belong t.docx
 
Evolution of Seoul City in South KoreaHow the City changed s.docx
Evolution of Seoul City in South KoreaHow the City changed s.docxEvolution of Seoul City in South KoreaHow the City changed s.docx
Evolution of Seoul City in South KoreaHow the City changed s.docx
 
evise your own definition of homegrown terrorism. Then using t.docx
evise your own definition of homegrown terrorism. Then using t.docxevise your own definition of homegrown terrorism. Then using t.docx
evise your own definition of homegrown terrorism. Then using t.docx
 
eview the Paraphrasing tutorial here (Links to an external sit.docx
eview the Paraphrasing tutorial here (Links to an external sit.docxeview the Paraphrasing tutorial here (Links to an external sit.docx
eview the Paraphrasing tutorial here (Links to an external sit.docx
 
Evidenced-Based Practice- Sample Selection and Application .docx
Evidenced-Based Practice- Sample Selection and Application  .docxEvidenced-Based Practice- Sample Selection and Application  .docx
Evidenced-Based Practice- Sample Selection and Application .docx
 
Evidenced-Based Practice- Evaluating a Quantitative Research S.docx
Evidenced-Based Practice- Evaluating a Quantitative Research S.docxEvidenced-Based Practice- Evaluating a Quantitative Research S.docx
Evidenced-Based Practice- Evaluating a Quantitative Research S.docx
 
eview the Captain Edith Strong case study in Ch. 6 of Organi.docx
eview the Captain Edith Strong case study in Ch. 6 of Organi.docxeview the Captain Edith Strong case study in Ch. 6 of Organi.docx
eview the Captain Edith Strong case study in Ch. 6 of Organi.docx
 
Evidenced based practice In this writing, locate an article pert.docx
Evidenced based practice In this writing, locate an article pert.docxEvidenced based practice In this writing, locate an article pert.docx
Evidenced based practice In this writing, locate an article pert.docx
 

Recently uploaded

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 

Recently uploaded (20)

“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 

Treasury Board of Canada Secretariat. (2012). Policy on evalua.docx

section that serves as a transition to Chapter 10, which discusses the uses of performance results.

The process of designing and implementing performance measurement systems uses core knowledge and skills that are also a part of designing, conducting, and reporting program evaluations. In Chapter 8, we pointed out that program evaluation and performance measurement share core knowledge and skills, including logic modeling and measurement. In addition, understanding research designs and the four kinds of validity we described in Chapter 3 is valuable for understanding and working with the strengths and limitations of performance measurement systems. In Chapter 1, we outlined the steps that make up a typical program evaluation. In this chapter, we will do the same for performance measurement systems, understanding that for each situation, there will be unique circumstances that can result in differences between the checklist below and the process that is appropriate for that context. Each of the 12 steps of designing and implementing a performance measurement system is elaborated to clarify issues and possible problems.

We distinguish designing and implementing performance measurement systems from the uses of such systems. Usage is a critical topic on its own, and we will elaborate on it in Chapter 10.

Designing and implementing performance measurement systems can be a significant organizational change, particularly in public sector organizations that have focused on processes instead of results. Depending on the origins of such an initiative (external to the organization, internal, top-down, or manager driven), different actors and factors will be more or less important. When we design and implement performance measurement systems that are intended to be sustainable, we must go beyond normative frameworks that focus on technical and rational steps, and consider the “psychological, cultural, and political implications of organizational change” (de Lancer Julnes, 1999, p. 49). de Lancer Julnes and Holzer (2001) have distinguished a rational/technical framework and a political/cultural framework as key to understanding the successful adoption, implementation, and use of performance measures.

The technical/rational perspective is grounded in a view of organizations as complex rational means–ends systems that are designed to achieve purposive ends. This view emphasizes the importance of systems and formal structures as keys to understanding how complex organizations work and how to change them. With respect to performance measurement systems, as they are designed and implemented there are rational and technical factors to keep in mind. These factors include having sufficient resources, training people appropriately, aligning management systems, developing appropriate information systems, and developing valid and reliable performance measures. It is important to have an overall plan that organizes the process, including who should be involved at different stages, how the stages link timing-wise, what is expected—and from whom—as each stage is implemented, and how the overall system is expected to function once it has been implemented.

The political/cultural perspective on organizations emphasizes the people dynamics in organizations rather than the systems and structures in which they are embedded. Organizations as political systems is one of the metaphors that Gareth Morgan (2006) includes in his seminal book Images of Organization. This view of organizations involves understanding how people interact with and in complex organizations. Performance management systems and structures play a role, but individuals and coalitions can influence and even negate the results intended from them. Organizational politics is an inevitable and important feature of organizational dynamics. Politics does not have to be about political parties or formal political allegiances. Instead, it is essentially about the processes (both formal and informal) that are used to allocate scarce resources among competing values. Even though there will be organizational and program objectives, with resources being devoted to their achievement (the rational purposes of organizations), there will also be interests and incentives, and coalitions of stakeholders who can either facilitate implementing and using performance measurement systems, or impede them. Organizations are more than systems and structures. They are fundamentally about people interacting in patterns that reflect both the intended outcomes of the organizations as well as their own personal or group objectives (which may or may not support the stated organizational objectives).

Overlaid on these two views of organizations is the wide range of environments in which organizations can be embedded. In Chapter 8, we introduced the idea of complex systems to show how complexity can serve as a useful lens to understand evolving organizations in evolving environments. What we will see in Chapter 10 is that some environments for organizations are more conducive to sustaining performance measurement systems than others. Where performance measurement is focused on public reporting in high-stakes, accountability-oriented environments, it can be challenging to construct and maintain performance measurement systems. One “solution” that we will explore in Chapter 10 is to decouple the performance measurement system that is used for (internal) performance management from the performance measures that are used for external reporting (McDavid & Huse, 2012).

The 12 steps discussed in this chapter outline a process that is intended to increase the chances that a performance measurement system will be successfully implemented and sustained. A key part of sustaining performance measurement as an evaluative function in organizations is to use the performance information (Moynihan, Pandey, & Wright, 2012). In other words, there must be a demand for performance information to sustain the supply. Supplying performance information (e.g., preparing performance reports) where there is limited or no demand tends to undermine the credibility of the system—lack of use is an indication that the system is not aligned with actual substantive organizational priorities. In many situations, the conditions under which actual organizations undertake the development of performance measures are less than ideal. In the summary to this chapter, we identify the six steps among the 12 that are most critical if organizations want performance measurement systems that can contribute to managerial and organizational efforts to improve efficiency, effectiveness, and accountability.

KEY STEPS IN DESIGNING AND IMPLEMENTING A PERFORMANCE MEASUREMENT SYSTEM

Table 9.1 summarizes the key steps in designing and implementing a performance measurement system. Each of these steps can be viewed as a guideline—no single performance measurement development and implementation process will conform to all of them. In some cases, the process may diverge from the sequence of steps. Again, this could be due to local factors. Each of the steps in Table 9.1 is discussed more fully in the following sections. Our discussion of the steps is intended to do two things: (1) elaborate on what is involved and (2) point out limitations and pitfalls along the way. As you review the steps, you will see that most of them acknowledge the importance of both a rational/technical and a political/cultural view of organizations. Beyond the technical issues, it will be important to consider the interactions among the people, incentives, history, and who wins and who loses.

One way that we can look at these 12 steps is to divide them between a technical/rational and a cultural/political perspective on organizations. Among the steps, the majority are more closely aligned with the political/cultural view of organizations: identifying the champions of this change, understanding what performance measurement systems can actually do (and not do), establishing and using communication channels, clarifying intended uses (for all the stakeholders involved), understanding the organizational history and its impacts on this change process, involving users in developing models and performance measures, and regularly reviewing and acting on user feedback. The others—identifying resources, developing logic models, identifying constructs that span programs or the whole organization, measuring constructs, and analyzing and reporting performance results—are more closely aligned with a technical/rational view of organizations. Our approach emphasizes the importance of both perspectives and their complementarity in building and implementing sustainable performance measurement systems.

Table 9.1 Key Steps in Designing and Implementing a Performance Measurement System

1. Identify the organizational champions of this change.
2. Understand what a performance measurement system can and cannot do and why it is needed.
3. Establish multichannel ways of communicating that facilitate top-down, bottom-up, and horizontal sharing of information, problem identification, and problem solving.
4. Clarify the expectations for the uses of the performance information that will be created.
5. Identify the resources available for developing, implementing, maintaining, and renewing the performance measurement system.
6. Take the time to understand the organizational history around similar initiatives.
7. Develop logic models for the programs or lines of business for which performance measures are being developed.
8. Identify additional constructs that are intended to represent performance for aggregations of programs or the whole organization.
9. Involve prospective users in reviewing the logic models and constructs in the proposed performance measurement system.
10. Measure the key constructs in the performance measurement system.
11. Record, analyze, interpret, and report the performance data.
12. Regularly review feedback from users and, if needed, make changes to the performance measurement system.
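None of the steps in Table 9.1 prescribes a particular format for what a performance measurement system records. The following is a minimal sketch, with the class, all field names, and the example values assumed purely for illustration, of the kind of record that steps 7 through 11 ultimately produce: one construct from a program logic model paired with a measure, a data source, a reporting frequency, a target, and an intended use.

```python
# A hypothetical record for one performance measure; nothing here is prescribed by
# the chapter. The field names and example values are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class PerformanceMeasure:
    program: str          # program or line of business (step 7)
    construct: str        # construct drawn from the logic model (steps 7 and 8)
    level: str            # "output" or "outcome"
    measure: str          # how the construct is operationalized (step 10)
    data_source: str      # where the data will come from
    frequency: str        # how often the data are recorded and reported (step 11)
    annual_target: float  # target against which actual results are compared
    intended_use: str     # e.g., "internal management" or "public reporting" (step 4)


example = PerformanceMeasure(
    program="Winter road maintenance",
    construct="Road safety maintained in winter conditions",
    level="output",
    measure="Percentage of lane kilometres cleared within 24 hours of snowfall",
    data_source="Maintenance contractor activity logs",
    frequency="monthly",
    annual_target=95.0,
    intended_use="internal management",
)
print(example.measure)
```

Whatever form such records take, steps 9 and 12 imply that prospective users review and revise them over time rather than having them imposed from above.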
Identify the Organizational Champions of This Change

The introduction of performance measurement, particularly measuring outcomes, is an important change in both an organization’s way of doing business and its culture (de Lancer Julnes & Holzer, 2001). Unlike program evaluations, performance measurement systems are ongoing, and it is therefore important that there be organizational leaders who are champions of this change, to provide continuing support for the process from its inception onward. In many cases, an emphasis on measuring outcomes is a significant departure from existing practices of tracking program inputs (money, human resources), program activities, and program outputs (work done). Most managers have experience measuring/recording inputs, processes, and outputs, so the challenge in outcome-focused performance measurement is in specifying the expected outcomes (stating clear objectives for programs, lines of business, or organizations) and facilitating organizational commitment to the process of measuring and working with outcome-related results.

By including outcomes, performance measurement commits organizations to comparing their actual results with the stated objectives. In many jurisdictions, objectives are parsed into annual targets, and actual outcomes are compared with the targets for that year. Thus, the performance measurement information commonly is intended to serve multiple purposes, including enhancing managerial decision making, encouraging organizational alignment, and promoting transparency and accountability.
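To make the comparison of actual results with annual targets concrete, here is a minimal sketch that reports the gap between target and actual for a handful of hypothetical measures. The measure names, targets, and actuals are invented; a real system would draw them from the organization’s own performance data.

```python
# Compare hypothetical annual results against targets and report the variance.
annual_results = [
    # (measure, target, actual) -- all values are invented for illustration
    ("Clients completing the program (%)", 80.0, 74.5),
    ("Average days to process an application", 30.0, 28.0),
    ("Participants employed 6 months after exit (%)", 55.0, 58.2),
]

for measure, target, actual in annual_results:
    variance = actual - target
    # Whether a positive variance is "good" depends on the measure (e.g., processing
    # days should fall, not rise); the report describes the gap, not its cause.
    print(f"{measure}: target {target}, actual {actual}, variance {variance:+.1f}")
```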
New Public Management emphasizes the (normative) importance of freeing managers from “red tape,” that is, process-related restrictions, so that they can more efficiently and effectively use the resources that are available (Moynihan, 2008; Norman & Gregory, 2003). Managerial flexibility, coupled with measures for intended outcomes, is expected to offer managers incentives to improve their operations. In Chapter 10, we will look at the actual uses of performance information in governments and public sector organizations and explore in some depth the incentives for managers to become involved in developing and using performance measures.

Because performance measurement systems are ongoing, it is important that the champions of this change support the process from its inception onward. Moynihan et al. (2012) suggest that leadership commitment is critical to the process and also affects performance information uses. The nature of performance measures is that they create new information—a potential resource in public and nonprofit organizations. Information can reduce uncertainty with respect to the questions it is intended to answer, but the process of building performance measurement into the organization’s business can significantly increase uncertainty for managers. The changes implied by measuring results (outcomes), reporting results, and being held accountable for results can loom large as the system is being designed and implemented. If a performance measurement system is implemented as a top-down initiative, managers may see this as a threat to their existing practices. Typically, some will resist this change, and if leadership commitment is not sustained, the transition to performance measurement as a part of managing programs will wane with time (de Waal, 2003).

A results-oriented approach to managing has implications for public sector accountability. In many jurisdictions, public organizations are still expected to operate in ways that conform to process-focused notions of accountability. In Canada, for example, the Westminster parliamentary system makes the minister who heads each government department nominally accountable for all that happens in his or her domain. The adversarial nature of politics, combined with the tendency of the media and interest groups to emphasize mistakes that become public, can bias managerial behavior toward a procedurally focused process, wherein only “safe” decisions are made (Propper & Wilson, 2003). Navigating such environments while working to implement performance measurement systems requires leadership that is willing to embrace some risks, not only in developing the system but in encouraging a culture wherein performance results are used to inform decision making. We explore these issues in much greater detail in Chapters 10 and 11.

In most governmental settings, leadership at two levels is required. Senior executives in a ministry or department must actively support the process of constructing and implementing a performance measurement system. But it is equally important that the political leadership be supportive of the development, implementation, and use of a performance measurement system. The key intended users of performance information that is publicly reported are the elected officials (of all the political parties) (McDavid & Huse, 2012). In British Columbia, Canada, for example, the Budget Transparency and Accountability Act (Government of British Columbia, 2001) specifies that annual performance reports are to be tabled in the legislative assembly. The goal is to have committees of the legislature review these reports and use them as they scrutinize ministry operations and future budgets. Each year, the public reports are tabled in June and are based on the actual results for the fiscal year ending March 31. Strategically, the reports should figure in the budgetary process for the following year, which begins in the fall. If producing and publishing these performance reports is not coupled with scrutiny of the reports by legislators, then a key reason for committing resources to this form of public accountability is undermined. In Chapter 10, we will look at the ways in which elected officials actually use performance reports.

In summary, an initial organizational commitment to performance measurement, which typically includes designing the system, can produce “results” that are visible (e.g., a website with the performance measurement framework), but implementing and working with the system over three to five years is a much better indicator of its sustainability, and for this to happen, it is critical to have organizational champions of the process.

Understand What Performance Measurement Systems Can and Cannot Do

There are limitations to what performance measurement systems can do, yet in some jurisdictions, performance measurement has been treated as a cost-effective substitute for program evaluation (Martin & Kettner, 1996). Public sector downsizing has diminished the resources committed to program evaluations, and managers have been expected to initiate performance measurement instead (McDavid, 2001b). The emphasis on performance reporting for public accountability, and the assumption that such reporting can drive performance improvements, is the principal reason for making performance measurement the central evaluative approach in many organizations. We will look at this assumption in Chapter 10 when we discuss the uses of performance information when public reporting is mandated.

Performance measurement can be a powerful tool in managing programs or organizations. If the measures are valid and the information is timely, emerging trends can identify possible problems (a negative-feedback mechanism) as well as possible successes (positive feedback). But performance measurement results only describe what is going on; they do not explain why it is happening (McDavid & Huse, 2006; Newcomer, 1997).
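A minimal sketch of this descriptive monitoring role, using invented quarterly values for a single hypothetical outcome measure: a simple period-over-period check can flag a consistently declining or improving trend, but nothing in the calculation says why the change occurred.

```python
# Hypothetical quarterly results for one outcome measure; all values are invented.
# The check below only describes the direction of the trend (negative or positive
# feedback); it cannot attribute the change to the program or to anything else.
quarterly_results = {"Q1": 62.0, "Q2": 61.5, "Q3": 58.9, "Q4": 55.4}

values = list(quarterly_results.values())
changes = [later - earlier for earlier, later in zip(values, values[1:])]

if all(change < 0 for change in changes):
    print("Flag: measure has declined in every quarter (possible problem).")
elif all(change > 0 for change in changes):
    print("Flag: measure has improved in every quarter (possible success).")
else:
    print("No consistent trend across quarters.")
```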
Recall the distinction between intended outcomes and actual outcomes (introduced in Chapter 1). Programs are designed to produce specified outcomes, and one way to judge the success of a program is to see whether the intended outcomes have actually occurred. If the actual outcomes match the intended outcomes, we might be prepared to conclude that the program was effective. However, we cannot conclude that the outcomes are due to the program unless we have additional information that supports the assumption that other factors in the environment could not have caused the observed outcomes. Getting that information is at the core of what program evaluation is about, and it is essential that those using performance measurement information understand this distinction. As Martin and Kettner (1996) commented when discussing the cause-and-effect relationship that many people mistakenly understand to be implied in performance measurement information, “Educating stakeholders about what outcome performance measures really are, and what they are not, is an important—and little discussed—problem associated with their use by human service programs” (p. 56).

Establishing the causal link between observed outcomes and the program that was intended to produce them is the attribution problem. Some analysts have explicitly addressed this problem for performance measurement. Mayne (2001) offers six strategies intended to reduce the uncertainty about whether the observed performance measurement outcomes can be attributed to the program. Briefly, his suggestions are as follows: (1) develop an intended-results chain; (2) assess the existing research/evidence that supports the results chain; (3) assess the alternative explanations for the observed results; (4) assemble the performance story; (5) seek out additional evidence, if necessary; and (6) revise and strengthen the performance story. Several of his suggestions are common to both program evaluation and performance measurement, as we have outlined them in this book. His final (seventh) suggestion is to do a program evaluation if the performance story is not sufficient to address the attribution question. This suggestion supports a key theme of this book—that performance measurement and program evaluation are complementary, and each offers ways to reduce uncertainty for managers and other stakeholders in public and nonprofit organizations.

There are nuances to the strengths and limitations of performance measurement systems. Some programs or organizations are easier to work with in developing and implementing performance measurement systems that can credibly connect programs to actual outcomes. In Chapter 2, we introduced the concept of program technologies to help explain why some program logics “work” better than others. Recall that for programs that are constructed around high-probability program technologies (highways maintenance programs would be an example), it is relatively straightforward to assume a linkage between program outputs and outcomes. In other words, if you know how many lane miles of highway (as a proportion of all the roads in a given jurisdiction) are being kept free of snow and ice in the wintertime (an output), you have a pretty good idea of the safety of the roads (an outcome). But if you know how many families were served by a program intended to improve parenting skills so that parents can keep their children instead of having to give them up to foster care, you probably do not know (at least not to the same degree as the transportation example above) whether the program actually succeeded in improving the likelihood that children are not taken out of their homes to be placed in foster homes. The attribution question is not as easily answered for cases such as this one, which are constructed with low-probability program technologies.

Establish Multichannel Ways of Communicating That Facilitate Top-Down, Bottom-Up, and Horizontal Sharing of Information, Problem Identification, and Problem Solving

It is quite common for public sector or nonprofit organizations to begin developing a performance measurement system informally. Managers who are keen to obtain information that they can use formatively will take the lead in developing their own measures and procedures for gathering and using the data. This bottom-up process is one that encourages a sense of ownership of the system. In the British Columbia provincial government, this more manager-driven process spanned the period roughly from 1995 to 2000 (McDavid, 2001a). Some departments made more progress than others, in part because some department heads were more supportive of this process than others. Because they were driven by internal performance management needs, the systems that developed were adapted to local needs.
To support this evolutionary bottom-up process in the British Columbia government, the Treasury Board Staff (a central agency responsible for budget analysis and program approval) hosted an informal network of government practitioners who had an interest in performance measurement and performance improvement. The Performance Measurement Resource Team held monthly meetings that included speakers from ministries and outside agencies who provided information on their problems and solutions. Attendance and contributions were voluntary. Information sharing was the principal purpose of the sessions.

When the Budget Transparency and Accountability Act (Government of British Columbia, 2000) was passed, mandating performance measurement and public reporting government-wide, the stakes changed dramatically. Performance measurement systems that had been intended for formative uses were now exposed to the requirement that a selection of the performance results would be made public in an annual report. This top-down directive to report performance for summative purposes needed to be meshed with the bottom-up (formative) cultures that had been developed in some ministries.

Some departments that had existing performance measurement systems confronted the challenge of melding the existing formative and new summative thrusts of the required system by communicating up and down and across the organization. For example, one department responsible for the publicly funded college and university system in the province conducted a series of formal and informal workshops and meetings with executives and senior and middle managers in attendance. Over a period of a year, using an iterative process, the department was able to develop a general understanding of how the new, externally focused performance measurement system would look, what the new system would do, and how it would connect with the internal performance management system, which the department managers were keen to sustain.

Generally, public organizations that undertake the design and implementation of performance measurement systems that are intended to be used internally must include the intended users (Kravchuk & Schack, 1996), the organizational leaders of this initiative, and the methodologists (Thor, 2000). Top-down communications can serve to clarify direction, offer a framework and timelines for the process, clarify what resources will be available, and affirm the importance of this initiative. Bottom-up communications can question or seek clarification of definitions, timelines, resources, and direction. Horizontal communications can provide examples, share problem solutions, and offer informal support.

The communications process outlined here exemplifies a culture that needs to emerge in the organization if performance management is to take hold and be sustainable. Key to developing a performance management culture is treating information as a resource, being willing to “speak truth to power” (Wildavsky, 1979), and not treating performance information as a political weapon. Kravchuk and Schack (1996) suggest that the most appropriate metaphor to build a performance culture is the learning organization. This construct was introduced by Senge (1990) and continues to be a goal for public organizations that have committed to performance measurement as part of a broader performance management framework (Mayne, 2008; Mayne & Rist, 2006).

Clarify the Expectations for the Intended Uses of the Performance Information That Is Created

Developing performance measures is intended, in part, to improve performance by providing managers and other stakeholders with information they can use to monitor and make adjustments to program processes. Having “real-time” information on how programs are tracking is often viewed by managers as an asset and is an incentive to get involved in constructing and implementing a performance measurement system. Managerial involvement in performance measurement is a widespread expectation and is reflected in policies in some jurisdictions.

To attract the buy-in that is essential for successful design and implementation of performance measurement systems, we believe that performance measurement needs to be used first and foremost for internal performance improvement. Public reporting can be a part of
the process of using performance measurement data, but it should not be the primary reason for developing a performance measurement system (Hildebrand & McDavid, 2011). A robust performance measurement system should support using information to inform improvements to programs and/or the organization. It should help identify areas where activities are most effective in producing intended outcomes and areas where improvement could be made (de Waal, 2003).

Designing and implementing a performance measurement system primarily for public accountability usually entails public reporting of performance results, and in jurisdictions where performance results can be used to criticize elected officials or bureaucrats, there are incentives to limit reporting of anything that would reflect negatively on the government of the day. Richard Prebble, a long-time political leader in New Zealand, outlines Andrew Ladley’s “Iron Rule of the Political Contest”:

• The opposition is intent on replacing the government.
• The government is intent on remaining in power.
• MPs want to get re-elected.
• Party leadership is dependent on retaining the confidence of colleagues (which is shaped by the first three principles). (Prebble, 2010, p. 3)

In terms of performance measures to be reported publicly, this highlights that organizational performance information will not only be used to review performance but will likely be mined for details that can be used to embarrass the government. In Chapter 10, we will look at the issues involved in using performance measurement systems to contribute to public accountability.

Understanding and balancing the incentives for participants in this process is one of the significant challenges for the leaders of an organization. As we mentioned earlier, developing and then using a performance measurement system can create uncertainty for those whose programs are being assessed. They will want to know how the information that is produced will affect them, both positively and negatively. It is essential that the leaders of this process be forthcoming about the intended uses of the measurement system. If a system is designed for formative program improvement purposes, using it for summative purposes will change the incentives for those involved. Sustaining the internal uses of performance information will mean involving those who have contributed to the (earlier) formative process. Changing the purposes of a performance measurement system affects the likelihood that gaming will occur as data are collected and reported (Pollitt, 2007; Pollitt, Bal, Jerak-Zuiderent, Dowswell, & Harrison, 2010; Propper & Wilson, 2003). In Chapter 10, we will discuss gaming as an unintended response to incentives in performance measurement systems.

Some organizations begin the design and implementation process by making explicit the intention that the measurement results will only be used formatively for a 3- to 5-year period of time, for example. That can generate the kind of buy-in that is required to develop meaningful measures and convince participants that the process is actually useful to them. Then, as the uses of the information are broadened to include external reporting, it may be more likely that managers will see the value of a system that has both formative and summative purposes.

Pollitt et al. (2010) offer us a cautionary example, from the British health services, of the transformation of the intended uses of performance information. Their example suggests that performance measurement systems that begin with formative intentions tend, over time, to migrate to summative uses. In the early 1980s in Britain, there were broad government concerns with hospital efficiency that prompted the then Conservative government to initiate a system-wide performance measurement process. Right from the start, the messages that managers and executives were given were ambiguous. Pollitt et al. (2010) note that

despite the ostensible connection to government aims to increase central control over the NHS, the Minister who announced the new package described PIs [performance indicators] in formative terms. Local managers were to be equipped to make comparisons, and the stress was on using them to trigger inquiry rather than as answers in themselves, a message that was subsequently repeated throughout the 1980s. (p. 17)

However, by the early 1990s, the “formative” performance results were being reported publicly, and comparisons among health districts (health trusts) were a central part of this transition. “League tables,” wherein districts were compared across a set of performance measures, marked the transition from formative to summative uses of the performance information. By the late 1990s, league tables had evolved into a “star rating system,” wherein districts could earn up to three stars for their performance. The Healthcare Commission, a government oversight and audit agency, conducted and published the ratings and rankings. Pollitt et al. (2010) summarize the transition from a formative to a summative performance measurement system thus:

In more general terms, the move from formative to summative may be thought of as the result of PIs [performance indicators] constituting a standing temptation to executive politicians and top managers. Even if the PIs were originally installed on an explicitly formative basis (as in the UK), they constitute a body of information which, when things (inevitably) go wrong, can be seized upon as a new means of control and direction. (p. 21)
  • 27. This change brought with it different incentives for those involved and ushered in an ongoing dynamic wherein managerial responses to performance- related requirements included gaming the measures, that is, manipulating activities and/or the information to enhance performance ratings and reduce poor performance results in ways that were not intended by the designers of the system. This issue will be explored in greater detail in the next chapter. Identify the Resources Available for Designing, Implementing, and Maintaining the Performance Measurement System Organizations planning performance measurement systems often face substantial resource constraints. One of the reasons for embracing performance measurement is to do a better job of managing the (scarce) available resources. If a performance measurement system is mandated by external stakeholders (e.g., a central agency, an audit office, or a board of directors), there may be considerable pressure to plunge in without fully planning the design and implementation phases. Often, organizations that are implementing performance measurement systems are expecting to achieve efficiency gains, as well as improved effectiveness. Downsizing may have already occurred, and performance measurement is expected to occur within existing budgets.
  • 28. Those involved may have the expectation that this work can be added onto the existing workload of managers—they are clearly important stakeholders and logically should be in the best position to suggest or validate the proposed measures. Under such conditions, the development work may be assigned to an ad hoc committee of managers, analysts, co-op or intern students, other temporary employees, or consultants. Identifying possible performance measures is usually iterative, time-consuming work, but it is only a part of the process. The work of implementing the measures (identifying data that correspond to the performance constructs and collecting data for the measures), preparing reports and briefings, and maintaining and renewing the system is the key difference between a process that offers the appearance of having a performance measurement system in place (a website, progress reports, testimonials by participants in the process) and a process that actually results in using performance data on a continuing basis to improve the programs in the organization. Although a “one-shot” infusion of resources can be very useful as a way to get the process started, it is not sufficient to sustain the system. Measuring and reporting performance takes ongoing commitments of resources, including the time of persons in the organization. Training for staff who will be involved in the design and implementation of the performance measures is important. On the face of it, a minimalist approach to measuring
  • 29. performance is straightforward. “Important” measures are selected, perhaps by an ad hoc committee; data are marshaled for those measures; and the required reports are produced. But a commitment to designing and implementing a performance measurement system that is sustainable requires an understanding of the process of connecting performance measurement to managing with performance data (Kates, Marconi, & Mannle, 2001). In some jurisdictions, the creation of legislative mandates for public performance reporting has resulted in organizational responses that meet the legislative requirements but do not build the capacity to sustain performance measurement. However, performance measurement is intended to be a means rather than an end in itself. Unless the organization is committed to using the information to manage performance, it is unlikely that performance measurement will be well integrated into the operations of the organization. In situations where there are financial barriers to validly measuring outcomes, it is common for performance measures to focus on outputs. In many organizations, outputs are commonly easier to measure, and the data are more readily available. Also, managers are usually more willing to have output data reported publicly because outputs are typically much easier to attribute to a program or even program activity. Some performance measurement systems have focused on outputs from their inception. The best example of that approach has been in New Zealand, where public departments and agencies negotiate
output-focused contracts with the New Zealand Treasury (Gill, 2011). However, although outputs are important as a way to report work done, they cannot be entirely substituted for outcomes; the assumption that if outputs are produced, outcomes must have been produced is usually not defensible (see the discussion of measurement validity vs. the validity of causes and effects in Chapter 4).

Take the Time to Understand the Organizational History Around Similar Initiatives

Performance measurement is not new. In Chapter 8, we learned that in the United States, local governments began measuring the performance of services in the first years of the 20th century (Williams, 2003). Since then, there have been several waves of government reform that have included measuring results. New Public Management emerged in the early 1990s (Hood, 1991), in part from efforts by Western democratic governments to eliminate fiscal deficits in the 1970s and 1980s. In most public organizations, current efforts to develop performance measures come on top of other, previous attempts to improve the efficiency and effectiveness of their operations. Managers who have been a part of previous change efforts, particularly unsuccessful ones, have experience that will affect their willingness to support current efforts to establish a system to measure performance. It is important to understand the organizational memory of past efforts to make changes, and to gain some understanding of why previous efforts to make changes have or have not succeeded. The organizational lore around these changes is as important as a dispassionate view, in that participants’ beliefs are the reality that the current change will first need to address.

A significant issue for some public sector organizations can be the retirement of employees who exercise their option to leave early, facilitating downsizing goals that governments have put into place (Levine, Rubin, & Wolohojian, 1981). Long-term employees will often have an in-depth understanding of the organization and its history. In organizations that have a history of successful change initiatives, losing the people who were involved can be a liability when designing and implementing a performance measurement system. Their participation in the past may have been important in successfully implementing change initiatives. On the other hand, if an organization has a history of questionable success in implementing change initiatives, organizational turnover may actually be an asset.

Develop Logic Models for the Programs for Which Performance Measures Are Being Developed, and Identify the Key Constructs to Be Measured

In Chapter 2, we discussed logic models as a way to make explicit the intended cause-and-effect linkages in a program or even an organization. We discussed several different styles of logic models and pointed out that selecting a logic modeling
  • 32. approach depends in part on how explicit one wants to be about intended cause-and-effect linkages. A key requirement of logic modeling that explicates causes and effects is the presentation of which outputs are connected to which outcomes. Key to constructing and validating logic models with stakeholders is identifying and stating clear objectives for programs (Kravchuk & Schack, 1996). Although this requirement might seem straightforward, it is one of the more challenging aspects of the logic modeling process. Often, program or organizational objectives are put together to satisfy the expectations of stakeholders, who may not agree among themselves about what a program is expected to accomplish. One way these differences are sometimes resolved is to construct objectives that are general enough so as to appear to meet competing expectations. Although this solution is expedient from an organizational-political standpoint, it complicates the process of measuring performance. Criteria for sound program objectives were discussed in Chapter 1. Briefly, objectives should state an expected change or improvement if the program works (e.g., reducing the number of drug-related crimes), an expected magnitude of change (e.g., reducing the number of drug-related crimes by 20%), a target audience/population (e.g., reducing the number of
  • 33. drug-related crimes by 20% in Harrisburg, Pennsylvania), and a time frame for achieving the intended result (e.g., reducing the number of drug-related crimes by 20% in Harrisburg, Pennsylvania, in 2 years). Although logic models do constrain us in the sense that they assume that programs are open systems that are stable enough to be depicted as a static model, they are useful as a means of identifying constructs that are candidates for performance measurement. Martin and Kettner (1996) have identified three major foci for performance measures: (1) program efficiency (comparing inputs with outputs), (2) program quality (whether the outputs meet some specified quality standard), and (3) program effectiveness (whether the intended outcomes have been achieved). They suggest that a good performance measurement system needs to track all of these various program attributes, since each will be important to at least some program stakeholders. The open-systems metaphor also invites us to identify environmental factors that could affect the program, including those that affect our outcome constructs. Although some performance measurement systems do not measure factors that are external to the program or organization, it is worthwhile including such constructs as candidates for measurement. Measuring these environmental factors (or at least accounting for their influences qualitatively) allows us to begin addressing attribution questions.
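To make Martin and Kettner’s three foci concrete, the following is a minimal sketch (in Python) that computes an efficiency ratio, a quality rate, and an effectiveness rate for a hypothetical job training program. All figures and variable names are illustrative assumptions, not data from any program discussed in this chapter.

# A minimal sketch of the three performance foci described above
# (efficiency, quality, effectiveness), using hypothetical figures
# for a job training program.

program_cost = 250_000               # inputs: annual budget in dollars (assumed)
completions = 125                    # outputs: participants completing training (assumed)
completions_meeting_standard = 110   # outputs that meet a specified quality standard (assumed)
employed_after_one_year = 70         # outcomes: participants employed full-time a year later (assumed)

# Efficiency: compares inputs with outputs (here, cost per completion).
cost_per_completion = program_cost / completions

# Quality: the share of outputs that meet the specified quality standard.
quality_rate = completions_meeting_standard / completions

# Effectiveness: the extent to which the intended outcome has been achieved.
effectiveness_rate = employed_after_one_year / completions

print(f"Efficiency:    ${cost_per_completion:,.2f} per completion")
print(f"Quality:       {quality_rate:.0%} of completions met the standard")
print(f"Effectiveness: {effectiveness_rate:.0%} employed full-time one year later")

Note that the effectiveness rate says nothing about attribution; whether the program actually produced the observed employment results is a separate question, taken up later in this chapter and in the discussion of program evaluation.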
  • 34. In an ideal performance measurement system, both costs and results data are available and can be compared. An important driver behind the movement to develop planning, programming, and budgeting systems (PPBS) in the 1960s was, in fact, the expectation that cost-effectiveness ratios could be constructed. However, the lack of both budgetary flexibility and information management capacities in most public sector organizations resulted in a significant barrier to being able to fully implement PPBS at that time. Most public sector organizations now have accounting systems that permit managers to cost out programs. Information systems are more flexible than in the past, and the budgetary and expenditure data are more complete. Some organizations have also developed the capacity to cost out individual activities within each program (Brimson, 1991). James Q. Wilson (1989) has suggested that the environment of public sector organizations also influences the likelihood that robust measures of outputs and outcomes can be developed. Table 9.2 adapts his approach to produce a typology describing the challenges and opportunities for measuring outputs and outcomes in different types of organizations. Coping organizations (in which work tasks change a lot, and results are not visible—e.g., central government policy units), where both program technologies and environments combine to limit performance measurement, are the least likely to be successful in measuring outputs and
outcomes. Production organizations (with simple, repetitive tasks, the results of which are visible and countable) are the most likely to be able to build performance measurement systems that include outputs and outcomes. Craft organizations rely on applying mixes of professional knowledge and skills to unique tasks to produce visible outcomes—a public audit office would be an example. Procedural organizations rely on processes to produce outputs that are visible and countable but produce outcomes that are less visible—military organizations are an example. Thus, craft and procedural organizations differ in their capacities to develop output measures (procedural organizations can do this more readily) and outcome measures (craft organizations can do this more readily).

Table 9.2 Measuring Outputs and Outcomes: Influences of Core Technologies and Organizational Environments
Source: Adapted from Wilson (1989).

Identify Any Constructs That Apply Beyond Single Programs

Organizational logic models can be seen as an extension of program logic models, but because they typically focus on a higher-level view of programs or business lines, the constructs will be more general. The balanced scorecard (Kaplan & Norton, 1996) is one type of organizational performance measurement system that includes a general (normative) model
  • 36. of key organizational-level constructs that are intended to be linked causally. Typically, balanced scorecards include clusters of performance measures for four different dimensions: (1) organizational learning and growth, (2) internal business processes, (3) customers, and (4) the financial perspective. Performance measures are constructed for each of these dimensions. In Appendix A, Table 9A.1 illustrates an earlier organizational logic model for the British Columbia Ministry of Human Resources (now the Ministry of Social Development). The ministry was primarily focused on providing income assistance and moving income assistance recipients into job training programs as a transition to employment. Table 9A.1 is complicated, but if one wants to see how the operations of this entire organization fit together, an organizational logic model is a parsimonious way to show this visually and to identify constructs that might be candidates for constructing performance measures. Some jurisdictions require organizational logic models that depict the high-level intended links between strategic outcomes and programs. Figure 9A.1 in Appendix A is a high-level logic model of Human Resources and Skills Development Canada (HRSDC), one of the largest federal departments in the Canadian government. All federal departments and agencies in Canada are required to develop and periodically update a Program Alignment Architecture that summarizes departmental objectives/outcomes and how those are intended to be achieved
  • 37. through the program structure (Treasury Board of Canada Secretariat, 2012b). The HRSDC (2010) logic model shows how strategic outcomes are connected with clusters of programs. Each program has its own cluster of outcomes and is expected to be evaluated on a 5-year cyclical basis (Treasury Board of Canada Secretariat, 2012a). Performance measurement systems are sometimes expected to offer measures of performance that transcend single government departments, and measure sectoral or whole government performance. The Government of Alberta, for example, publishes an annual report called Measuring Up: Progress Report on the Government of Alberta Business Plan, which describes and graphs performance trends over the previous five years (Government of Alberta, 2011). Included in the most recent report are summaries of 59 performance measures related to 10 province-wide strategic goals. Publishing this report is required under the Government Accountability Act (Government of Alberta, 1995) and must include “a comparison of the actual performance results to the targets included in the government business plan under section 7(3), and an explanation of any significant variances” (p. 6). The provincial auditor assesses a sample of the measures in each annual report—13 of the 59 measures were audited for “completeness, reliability, comparability and understandability” (Government of Alberta,
2011, p. 1). Although some of the measures include comparisons with other jurisdictions (e.g., labor productivity is compared with that of other Canadian provinces), most are displayed as a time series for Alberta alone. As part of the annual reporting process, the Alberta government surveys a random sample of residents of the province and asks them to rate social, health, educational, and criminal justice–related services. The survey results are featured among the performance measures in the annual report. The performance measures that are included in the annual report are selected from among the ones Alberta government departments have included in their performance reports, so the province-wide report is in part a roll-up of departmental performance results.

Many social problems cannot easily be assigned to one administrative department. An example is homelessness. A social services department might have a mandate to provide funds to nonprofit organizations or even developers to build housing for the homeless in a jurisdiction. Housing is costly, and states or provinces may be reluctant to undertake such initiatives on their own. The nature of homelessness, with its high incidence of mental health challenges and drug dependencies, will mean that housing the homeless, even if funding and land to construct housing can be marshaled, is just part of a more comprehensive suite of programs needed to address the complex cases that homeless persons typically present. Homelessness transcends government departments, and even levels of government, involving
  • 39. local, state/provincial, and federal governments. Effectively addressing this kind of problem requires collaboration among agencies and governments that crosses existing organizational and functional boundaries. Horizontal initiatives like ones to address homelessness present challenges for measuring performance, particularly where there is an expectation that reporting results will be part of being accountable (Bakvis & Juillet, 2004). Developing performance measures for this kind of program would involve a sharing of responsibility and accountability for the overall program objectives. If permitted to focus simply on the objectives of each government department or level of government during the design of the system, each contributor would have a tendency to select objectives that are conservative, that is, not commit the department to be responsible for the overall outcome. In particular, if legislation has been passed that emphasizes departments being individually accountable, then broader sectoral objectives may well be overlooked. A similar problem arises for many nonprofit organizations. In Canada and the United States, many funding organizations (e.g., governments, private foundations, the United Way) are opting for a performance-based approach to their relationship with organizations that deliver programs and services. Increasingly, funders expect results-focused performance information as a condition for grants funding and renewals. Governments that have opted for
  • 40. contractual relationships with nonprofit service providers are developing performance contracting requirements that specify deliverables and often tie funding to the provision of evidence that these results have been achieved (Bish & McDavid, 1988). Nonprofit organizations are often quite small and are dedicated to the amelioration of a community problem or issue that has attracted the commitment of members and volunteers. Being required to bid for contracts and account for the performance results of the money they have received is added onto existing administrative requirements, and many of these organizations have limited capacity to do these additional tasks. Campbell (2002) has pointed out that in settings where targeted outcomes span several nonprofit providers, it is beneficial to have some collaboration among funders and for providers to agree on ways of directly addressing the desired outcomes. If providers compete and funders continue to address parts of a problem, the same sectoral disregard that was suggested for government departments will happen in the nonprofit sector. One issue that can easily be overlooked as performance measures are being developed is the “levels of analysis” problem (McDavid, 2001a). Suppose a government department develops a set of performance measures that is intended to indicate how the organization as a
whole is doing. If the actual performance results suggest that the organization is meeting its overall objectives, it might be tempting to conclude that the programs that contribute to the objectives are also effective. That would be a mistake because success at one level does not warrant a conclusion that performance at other levels is also comparable. It is possible to have programs that are not meeting their objectives while, overall, the organization is meeting its objectives. Likewise, we cannot use program success alone to indicate organizational success, nor can we use individual employee performance measures to tell us whether programs or the organization are meeting their objectives. Ideally, individual and group objectives should connect with program objectives, which should in turn connect with organizational objectives. It is necessary to measure performance at all of these levels to be able to effectively manage organizational performance.

One additional issue with respect to organization-level and sectoral measures of performance is who should take responsibility for gathering the data and reporting interpretations of it. Since reporting responsibilities can be linked to expectations of accountability in some organizations, ownership of these measures becomes an important organizational-political issue. We will discuss the political dimensions of performance measurement in Chapter 10.

Involve Prospective Users in Reviewing Logic Models and Constructs in the Proposed Performance Measurement System

Developing logic models of programs and/or the organization as a whole is an iterative process. Although the end product is meant to represent the programmatic and intended causal reasoning that transforms resources into results, it is essential that logic models be reviewed and validated with organizational participants and other stakeholders. Involvement at this stage of the development process will validate key constructs for prospective users and set the agenda for developing performance measures. Program managers in particular will have an important stake in the system. Their participation in validating the logic models increases the likelihood that performance measurement results will be useful for program improvements.

Typically, logic models identify outputs and outcomes that are linked in intended causal relationships. Depending on the purposes of the performance measurement process, some constructs will be more important than others. For example, if a logic model for a job training and placement program operated by a community nonprofit organization has identified the number of persons who complete the training as an output and the number who are employed full-time one year after the program as an outcome, the program managers would likely emphasize the output as a valid measure of program performance—in part because they have more control over that construct. But the funders might want to focus on the permanent
  • 43. employment results because that is really what the program is intended to do. By specifying the intended causal linkages, it is possible to review the relative placement of constructs in the model and clarify which ones will be a priority for measurement. In our example, managers might be more interested in training program completions since they are necessary for any other intended results to occur. Depending on the clients, getting persons to actually complete the program can be a major challenge in itself. If the performance measurement system is intended to be summative as well, then measuring the permanent employment status of program participants would be important—although there would be a question of whether the program produced the observed employment results. Figure 2.6 in Chapter 2 described a logic model for a family preservation and strengthening program. The program was intended to offer parents of families in crisis the opportunity to acquire and practice the skills needed to be more effective, enhancing the likelihood that they would be able to avoid having to give up their children to foster care. A key construct in that program logic is parents acquiring skills related to managing family issues—that construct is the “hub” of the program logic. Program success is uniquely dependent on that happening, and “developing parental skills” would be central to developing a
suite of performance measures. One indicator of the importance of constructs in logic models, then, is the number of causal links connecting to and coming from each construct. If a performance measurement system is going to be designed and implemented as a public accountability initiative that is high stakes, that is, has resource-related consequences for those organizational units being measured, reported, and compared, then the performance measures chosen should be ones that would be difficult to “game” by those who are being held accountable. Furthermore, it may be necessary to periodically audit the performance information to assess its reliability and validity (Bevan & Hamblin, 2009). Some jurisdictions—New Zealand, for example—regularly audit the public performance reports that are produced by all departments and agencies (Gill, 2011).

Measure the Constructs That Have Been Identified as Parts of the Performance Measurement System

We learned in Chapter 4 that the process of translating constructs into observables involves measurement. For performance measurement, secondary data sources are the principal means of measuring constructs. Because these data sources already exist, their use is generally seen to be cost-effective. There are, however, several issues that must be kept in mind when using secondary data sources:

• Can the existing data (usually kept by the organization) be
  • 45. adapted to fit constructs in the performance measurement system? In many performance measurement situations, the challenge is to adapt what exists, particularly data readily available via information systems, to what is needed to translate performance constructs into reliable and valid measures. Often, existing data have been collected for purposes that are not related to measuring and reporting on performance. Using these data raises validity questions. Do they really measure what the performance measurement designers say that they measure? Or do they distort or bias the performance construct so that the data are not credible? For example, measuring changes in employee job satisfaction by counting the number of sick days taken by workers over time could be misleading. Changes in the number of sick days could be due to a wide range of factors, making it an invalid measure of job satisfaction. • Do existing data sources sufficiently cover the constructs that need to be measured? The issue here is whether our intended performance measures are matched by what we can get our hands on in terms of existing data sources. In the language we introduced in Chapter 4, this is a content validity issue. • A separate, but related, issue is whether existing data sources permit us to triangulate our measurements of key constructs. In other words, can we measure a given construct in
  • 46. two or more independent ways, ideally with different methodologies? Generally, triangulation increases confidence that the measures are valid. • Can existing data sources be manipulated by stakeholders if they are included in a performance measurement system? Managers and other organizational members generally respond to incentives. If a performance measure becomes the focus of summative program or service assessments, and if the data for that measure are collected by organizational participants, it is possible that the data will be manipulated to indicate “improved” performance (Otley, 2003). An example of this type of situation from policing was an experiment in Orange County, California, to link salary increases in the police department to reduced reporting rates for certain kinds of crimes (Staudohar, 1975). The agreement between the police union and management specified clear thresholds between percentage reductions in four types of crimes and the magnitude of salary increases. The experiment “succeeded.” Crime rates in the four targeted crimes decreased just enough to maximize the wage increases. Correspondingly, crime rates increased for several related types of crimes. A concern in this case is whether the crime classification system may have been manipulated by participants in the experiment, given the incentive to “reduce” crimes in
  • 47. order to maximize salary increases. If primary data sources (those designed specifically for the performance measurement system) are being used, several issues should also be kept in mind: • Are there ongoing resources to enable collecting, coding, and reporting of data? If not, then situations can develop where the initial infusion of resources to get the system started may include funding to collect outcomes data (e.g., to conduct a client survey), but beyond this point, there will be gaps in the performance measurement system where these data are no longer collected. • Are there issues of sampling procedures, instrument design, and implementation that need to be reviewed or even done externally? In other words, are there methodological requirements that need to be established to ensure the credibility of the data? • Who will actually collect and report the data? If managers are involved, is there any concern that their involvement could be seen to be in conflict with the incentives they perceive? • When managers review the performance measures that are being proposed, if a draft of the proposed performance measures does not feature any that pertain to their programs, they may conclude that they are being excluded and are therefore vulnerable in future
  • 48. budget allocations. It is essential to have a rationale for each measure and some overall rationale for featuring some measures but not others. Organization executives may need to be involved in settling any managerial disagreements. In Chapter 4, we introduced measurement validity and reliability criteria to indicate the methodological requirements for sound measurement processes. Those criteria are rooted in the social sciences (Goodwin, 1997), and satisfying them is generally premised on having the resources to properly establish validity and reliability. In many performance measurement situations, there are few resources, and limited time, to determine whether each measure is defensible in methodological terms. Performance measurement is fundamentally about finding indicators that plausibly connect constructs with data. In terms of the kinds of validity discussed in Chapter 4, persons or teams that are developing and implementing performance measures usually pay attention to face validity (On the face of it, does the measure do an adequate job of representing the construct?), content validity (How well does the measure or measures represent the range of content implied by the construct?), and response process validity (Have the participants in the measurement process taken it seriously?). We are reminded of a quote that has been attributed to Sir Josiah Stamp, a tax collector for
  • 49. the government in England during the 19th century: The government is extremely fond of amassing great quantities of statistics. These are raised to the nth degree, the cube roots are extracted and the results are arranged into elaborate and impressive displays. What must be kept ever in mind, however, is that in every case, the figures are first put down by a village watchman and he puts down anything he damn well pleases. (Source, Sir Josiah Stamp, Her Majesty’s Collector of Inland Revenues, more than a century ago) (cited in Thomas, 2004, p. xiii) Assessing other kinds of measurement validity (internal structure, concurrent, predictive, convergent, and discriminant; see Chapter 4) is generally beyond the methodological resources in performance measurement situations. The reliability of performance measures is often assessed with a judgmental estimate of whether the measure and the data are accurate, that is, are collected and recorded so that there are no important errors in the ways the data represent the events or processes in question. In some jurisdictions, performance measures are audited for reliability (see, e.g., Texas State Auditor’s Office, 2012). An example of judgmentally assessing the reliability and validity of measures of program results might be a social service agency that has included the number of client visits as a performance measure for the funders of its counseling program. Suppose that initially, the agency and the funders agree that the one measure is sufficient
  • 50. since payments to the agency are linked to the volume of work done and client visits are deemed to be a reasonably accurate measure for that purpose. To assess the validity and reliability of that measure, one would want to know how the data are recorded (e.g., by the social worker or by the receptionist) and how the files are transferred to the agency database (manually or electronically as part of the intake process for each visit). Are there under- or overcounting biases in the way the data are recorded? Do telephone consultations count as client visits? What if the same client visits the agency repeatedly, perhaps even to a point where other prospective client appointments are less available? Should a second measure of performance be added that tracks the number of clients served (improving content validity)? Will that create a more balanced picture and create incentives to move clients through the treatment process? What if clients change their names—does that get taken into account in recording the number of clients served? Each performance measure or combination of measures for each construct will have these types of practical problems that must be addressed if the data in the performance measurement system are to be credible and usable. In jurisdictions where public performance reporting is mandated, a significant issue is an expectation that requiring fewer performance measures for a department will simplify performance reporting and make the performance report more concise and more readable. Internationally, guidelines exist that suggest a rule of
  • 51. parsimony when it comes to selecting the number of performance measures for public reporting. For example, the Canadian Comprehensive Auditing Foundation (CCAF-FCVI, 2002) has outlined nine principles for public performance reporting, one of which is to “focus on the few critical aspects of performance” (p. 4). This same principle is reflected in guidelines developed for performance reporting by the Queensland State Government in Australia (Thomas, 2006). The international public accounting community has taken an interest in public performance reporting generally and, in particular, the role that public auditors can play in assessing the quality of public performance reports (Klay, McCall, & Baybes, 2004). The assumption is that if the quality of the supply of public performance reports is improved, that is, performance reports are independently audited for their credibility, they are more likely to be used, and the demand for them will increase. Typically, the number of performance measures in public reports is somewhere between 10 and 20, meaning that in large organizations, some programs will not be represented in the image of the department that is conveyed publicly. A useful way to address managers wanting their programs to be represented publicly is to commit to constructing separate internal performance reports. Internal reports are consistent with the balancing of formative and
  • 52. summative uses of performance measurement systems. It is our belief that unless a performance measurement system is used primarily for internal performance management, it is unlikely to be sustainable. Internal performance measures can more fully reflect each program and are generally seen to better represent the accomplishments of programs. One additional measurement issue is whether measures and the data that correspond to the measures should be quantitative. In Chapter 5, we discussed the important contributions that qualitative evaluation methods can make to program evaluations. We included an example of how qualitative methods can be used to build a performance measurement and reporting system (Davies & Dart, 2005; Sigsgaard, 2002). There is a meaningful distinction between the information that is conveyed by words and that which is conveyed by numbers. Words can provide us with texture, emotions, and a more vivid understanding of situations. Words can qualify numbers, interpret numbers, and balance presentations. Most important, words can describe experiences—how a program was experienced by particular clients as opposed to the number of clients served, for example. In performance measurement systems, it is desirable to have both quantitative and qualitative measures/data. Stakeholders who take the time to read a mixed presentation can learn more about program performance. But in many situations, particularly where annual targets are set and external reporting is mandated, there is a bias
toward numerical information, since targets are nearly always stated quantitatively. If the number of persons on social assistance is expected to be reduced by 10% in the next fiscal year, for example, the most relevant data will be numerical. Whether that program meets its target or not, however, the percent reduction in the number of persons on social assistance provides no information about the process whereby that happened, or about other contextual factors. Performance measurement systems that focus primarily on providing information for formative uses should include deeper and richer measures than those used for public reporting. Qualitative information can provide managers with feedback that is very helpful in adjusting program processes to improve results. Also, qualitative information can reveal to managers the client experiences that accompany the process of measuring quantitative results. Qualitative information presented as cases or examples that illustrate a pattern that is reported in the quantitative data can be a powerful way to convey the meaning of the numerical information. Although single cases can only illustrate, they communicate very effectively. For political decision makers, case-based narratives can be essential to conveying the meaning of performance results.

Record, Analyze, Interpret, and Report the Performance Data

One potential problem with any performance measurement
  • 54. system is the potential for ambiguity in observed patterns of results. In an Oregon benchmarking report (Oregon Progress Board, 2003), affordability of housing was offered as an indicator of the well-being of the state (presumably of the broad social and economic systems in the state). If housing prices are trending downward, does that mean that things are getting worse or better? From an economic perspective, declining housing prices could mean that (a) demand is decreasing in the face of a steady supply; (b) demand is decreasing, while supply is increasing; (c) demand and supply are both increasing, but supply is increasing more quickly; or (d) demand and supply are both decreasing, but demand is decreasing more quickly. Each of these scenarios suggests something different about the well-being of the economy. To complicate matters, each of these scenarios would have different interpretations if we were to take a social rather than an economic perspective. The point is that prospective users of performance information should be challenged to offer their interpretations of simulated patterns of such information (Davies & Warman, 1998). In other words, prospective users should be offered scenarios in which different trends and levels of measures are posed. If these trends or levels have ambiguous interpretations—“it depends”—then it is quite likely that when the performance measurement system is implemented, the same ambiguities will arise as reports are produced and used.
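One practical way to run this kind of exercise is to generate a few simulated patterns for a proposed measure and ask prospective users what each pattern would mean and what they would do in response. The following is a minimal sketch of that idea in Python; the scenario labels and values are invented for the exercise and are not drawn from any real indicator.

# A minimal sketch: simulated five-year patterns for a proposed indicator,
# to be shown to prospective users so they can practice interpreting them.
# All values are invented for the exercise.

scenarios = {
    "steady decline": [310, 300, 288, 275, 262],
    "decline, then recovery": [310, 295, 280, 292, 305],
    "flat with one-year spike": [300, 301, 330, 302, 300],
}

for label, values in scenarios.items():
    net_change = (values[-1] - values[0]) / values[0]
    print(f"{label}: {values} (net change {net_change:+.0%})")
    # Discussion prompts for the design workshop:
    print("  - Is this pattern good news, bad news, or 'it depends'?")
    print("  - What additional information would you need to interpret it?")

Keeping the exercise simple, with a handful of patterns and one or two prompts, is usually enough to surface whether prospective users share an interpretation of the proposed measure.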
  • 55. Fundamentally, ambiguous measures invite conflicting interpretations of results and will tend to weaken the credibility of the system. In addition to simulating different patterns of information for prospective users, it is important to ascertain what kinds of comparisons are envisioned with performance data. A common comparison is to look for trends over time, and make judgments based on interpretations of those trends. An example of a publicly reported performance measure that tracks trends over time is the WorkSafeBC measure of injured workers’ overall satisfaction with their experience with the organization. Each year, WorkSafeBC arranges for an independent survey of about 400 injured workers who are randomly selected from among those who made claims for workplace injuries (WorkSafeBC, 2011). Workers can rate their overall satisfaction on a 5-point scale from very poor to very good. This performance measure is one of 11 that are included in the annual report and has been used since 2003. Figure 9.1 has been excerpted from the 2010 Annual Report (WorkSafeBC, 2011) and displays the percentages of surveyed workers who rated their overall satisfaction as good or very good, over time. Also displayed are the targets for this performance measure for the next three years. This format for a performance measure makes it possible to see what the overall trend is and how that trend is expected to change in the future. We can see that approximately three quarters of injured workers have tended to be satisfied
  • 56. over time. But in 2009 and in 2010, the percentage drops. The organization is forecasting a return to the historical percentage with modest improvements in the next 3 years. As program evaluators, we might want to know more about why injured worker satisfaction levels dropped in 2009 and 2010. Given the challenging economic environment in British Columbia and in many other jurisdictions, it is possible that worker satisfaction reflects those pressures. Another comparison that can be made using performance information is across similar administrative units. For example, all provincial governments in Canada have ministries or departments that manage payments to injured workers, and assess and collect insurance premiums from employers to offset these payment costs. Figure 9.2 compares injury frequency among all jurisdictions in Canada. There is considerable variation among Canadian provinces in terms of injury frequency, and a potential evaluation question would be how to explain this variation. Is it due to random factors, or are there differences in policies and programs that are linked to this important outcome measure? Figure 9.1 Performance Measurement Results Over Time: Injured Workers’ Overall Satisfaction With WorkSafeBC Source: WorkSafeBC (2011, p. 34). Injury Frequency (per 100
  • 57. workers of assessable employers) is reprinted, with permission, from the WorkSafeBC 2010 Annual Report and 2011–2013 Service Plan. Copyright © WorkSafeBC. Used with permission. Figure 9.2 Injury Frequency per 100 Workers for All Canadian Provinces Source: WorkSafeBC (2011, p. 96). Injury Frequency (per 100 workers of assessable employers) is reprinted, with permission, from the WorkSafeBC 2010 Annual Report and 2011–2013 Service Plan. Copyright © WorkSafeBC. Used with permission. A third type of comparison is with benchmarks, standards, or targets. For example, in some program or service areas, such as hospital services, it is common to use standards to assess waiting times for services. When physicians refer patients for testing or for medical procedures, waiting time can become a critical factor, especially where initial diagnoses indicate a progressive disease. Performance reporting that is intended for public accountability purposes will typically include comparisons between performance targets and actual results. We saw an example of this with Figure 9.1, which incorporated annual targets and actual results for overall worker satisfaction with their interactions with WorkSafeBC. As another example of comparisons with targets, a municipal government graffiti management program might have an objective of reducing the number of public buildings defaced by graffiti. If the
  • 58. target was a maximum of 5% of buildings with graffiti (measured by a year-end physical survey of all public buildings), the actual survey results could be compared with the target. If the survey revealed that 10% of public buildings had graffiti on them, the program manager (and other stakeholders) might decide to investigate the gap between the target and the actual result. Following up on this performance result would entail asking why the observed result occurred—a question typically in the domain of program evaluation. Setting targets can become a contentious process. If the salaries of senior managers are linked to achieving targets, there will be pressure to make sure the targets are achievable. If reporting targets and achievements is part of an adversarial political culture, there will again be pressure to make targets conservative (Davies & Warman, 1998). Norman (2001) has suggested that performance measurement systems can result in underperformance for these reasons. Hood (2006) points to the ratchet effect (a tendency for performance targets to be lowered over time as agencies fail to meet them) as a problem for public sector performance measurement in Britain. Buy-in is an incremental process. Managers want to see what actually happens with the performance results and the reports that are produced, before they are willing to fully accept this change. Acceptance can also be eroded. If there is turnover in the organization’s leadership and the new executive unilaterally shifts the balance from
  • 59. formative to summative uses of the performance results, it is quite likely that resistance to the system will develop. There is an issue of access to the performance data. In some organizations, the performance measurement function has been separated from line management entirely. Managers do not have access to data; instead, they receive periodic reports. Excluding managers and other organizational members from having access to performance data tends to reinforce a cultural norm that such information is a source of power and control. Related to access is the question of whether users can prepare their own reports, in addition to reports that are mandated. Are they given the opportunity to analyze the data included in existing reports, in order to corroborate or disconfirm interpretations of the data? In New Zealand, for example, many managers have developed their own information sources and ways of working with performance data in their organizations (Gill, 2011). Finally, how are reports prepared? Is there a regular cycle of reporting? Is there a process whereby reports are reviewed and critiqued internally before they are released to users? Often, agencies have internal vetting processes wherein the authors of reports are expected to be able to defend the report in front of their peers before the report is released. This challenge function is valuable as a way of assessing the defensibility of the report and anticipating the reactions of stakeholders.
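Before leaving this step, it may help to make the target-versus-actual comparison discussed earlier in this section concrete, using the hypothetical graffiti management example. The sketch below (in Python) computes the gap between a target and an observed result and flags gaps that might warrant follow-up; the flag threshold and figures are assumptions for illustration only.

# A minimal sketch of a target-versus-actual comparison, using the chapter's
# hypothetical graffiti management example (target: at most 5% of public
# buildings with graffiti; observed: 10%). The flag threshold is an assumption.

def compare_to_target(measure, target, actual, flag_threshold=0.02):
    """Report the gap between target and actual, flagging gaps above the threshold."""
    gap = actual - target  # a positive gap means the target was missed
    flagged = gap > flag_threshold
    status = "investigate" if flagged else "on track"
    print(f"{measure}: target {target:.0%}, actual {actual:.0%}, gap {gap:+.0%} -> {status}")
    return flagged

compare_to_target("Public buildings with graffiti", target=0.05, actual=0.10)

Flagging a gap is only the starting point; as noted above, explaining why the observed result occurred is typically a question in the domain of program evaluation.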
Regularly Review Feedback From the Users and, If Needed, Make Changes to the Performance Measurement System

Uses of and organizational needs for performance data will change over time. Implementing a system with a fixed structure (logic models and measures) at one point in time will not ensure the relevance or continued use of the system in the future. There is a balance between the need to maintain continuity of performance measures, on the one hand, and the need to reflect changing organizational objectives, structures, and prospective uses of the system, on the other (Kravchuk & Schack, 1996). In many performance measurement systems, there are measures that are replaced periodically and measures that are works in progress. A certain amount of continuity in the measures increases the capacity of measures to be compared over time. Data displayed as a time series can, for example, show trends in environmental factors, as well as changes in outputs and outcomes; by comparing environmental variable trends with outcome trends, it may be possible to take into account the influences of plausible rival hypotheses on particular outcome measures. Although this process depends on the length of the time series and is often judgmental, it does permit analysts to use some of the same tools that would be used by program evaluators. In Chapter 3, recall that in the York crime prevention program evaluation, the unemployment rate in the community was
  • 61. an external variable that was included in the evaluation to assist the evaluators in determining whether the neighborhood watch program was the likely cause of the observed changes in the reported burglary rate. But continuity can also make a system less relevant over time. Suppose, for example, that a performance measurement system was designed to pull data from several different databases, and the original information system programming to make this work was expensive. Even if the data needs change, there may well be a desire not to go back and repeat this work, simply because of the resources involved. Likewise, if a performance measurement system is based on a logic model that becomes outdated, then the measures will no longer fully reflect what the program(s) or the organization is trying to accomplish. But going back to redo the logic model (which can be a time-consuming, iterative process) may not be feasible in the short term, given the resources available. The price of such a decision might be a gradual reduction in the relevance of the system, which may not be readily detected. With all the activity to design and implement performance measurement and reporting systems, there has been surprisingly little effort to date to evaluate their effectiveness (McDavid & Huse, 2012). In Chapter 10, we will discuss what is known now about the ways in which performance information is used, but it is appropriate here to suggest some practical steps to generate feedback that can be used to modify and better sustain performance
measurement systems:

• Develop channels for user feedback. This step is intended to create a process that will allow the users to provide feedback and suggest ways to revise, review, and update the performance measures. Furthermore, this step is intended to help identify when corrections are required and how to address errors and misinterpretations of the data.

• Create an expert review panel of persons who are knowledgeable about performance measurement but do not have a stake in the system that is being reviewed. Performance measurement should be conducted on an ongoing basis, and this expert panel review can provide feedback and address issues and problems over a long-term time frame. A review panel can also provide an independent assessment of buy-in and use of performance information by managers and staff, and track the (intended and unintended) effects of the system on the organization.

The credibility of performance information is an enduring concern. Davies and Warman (1998) point to the importance of auditing in the context of the performance reports of the (British) National Meteorological Office:

An independent audit, then, is not a luxury, it is a necessity. The credibility of the whole
system of agencies is put at risk if the data from one is found to be unverified and open to dispute. Where performance-related bonuses are linked with outcomes, it is unreasonable to expect staff concerned to be responsible for the measurement and reporting of results in an objective manner when the very same results will determine their own pay. (p. 47)

Legislative auditors, in addition to recommending principles to guide public performance reporting, have been active in promoting audits of performance reports (CCAF-FCVI,

CHAPTER 10
USING PERFORMANCE MEASUREMENT FOR ACCOUNTABILITY AND PERFORMANCE IMPROVEMENT
Introduction
Using Performance Results
Legislator Expected Versus Actual Uses of Performance Reports in British Columbia, Canada
High-Stakes Uses of Performance Measures
The British Experience With Performance Management
Assessing the “Naming and Shaming” Approach to Performance Management in Britain
A Case Study of Gaming: Distorting the Output of a Coal Mine
Reassessing the Performance Management Cycle: The Roles of Incentives and Organizational Politics
Use of Performance Measures in a Non-Adversarial Political Environment
Joining Internal and External Uses of Performance Information: The Lethbridge Local Government Study
Using Performance Information for Management: Encouraging Internal Uses of Performance Results
Increasing Uses of Performance Information by Elected Officials: Supply and Demand Improvements
Improving the Supply and Demand of Performance Information: Examining the Audit Strategy
Improving the Demand for Performance Information: Examining Legislation and Training
Assessing the Realities of Performance Measurement for Public Accountability, Performance Improvement, and Program Evaluation
Three Additional Considerations in Implementing and Sustaining Performance Measurement Systems
The Centralizing Influence of Performance Measurement in Public Organizations
Attributing Outcomes to Programs
The Levels of Analysis Problem: Conflating Organizational, Program, and Individual Performance
Summary
Discussion Questions
References

INTRODUCTION

In Chapter 10, we look at both the intended and actual uses of performance information by elected officials and organizational managers. Performance measurement systems are intended to improve public accountability and the management of organizational performance. Elected officials are a key stakeholder in efforts to improve accountability, so we summarize key findings from one of the few studies that looks at ways that elected decision makers actually use public performance reports provided to them each year. Then, we look at uses of performance information in a system where poor performance results were given media coverage and, in some cases, resulted in executives being fired in poorly performing organizations. The “naming and shaming” approach used in Britain between 2000 and 2005 is unique in implementing a high-stakes performance measurement system for public accountability and performance improvement. The British experience also raises the issue of the importance of incentives and organizational political factors in understanding how
  • 66. performance information is actually used and whether it is seen to be credible. So we come back to the (idealistic) performance management cycle that we introduced in Chapter 1 and re- examine it from a perspective that now takes into account how people in organizations actually behave, instead of how they “should” behave. Given the challenges of implementing public performance reporting systems that produce information that is not distorted by “gaming” responses from those whose performance is being judged, we consider the following: Are there any circumstances where public performance reports are taken at face value, and the managers who produce them are not concerned about the risks of reporting less-than-positive results? There is one empirical study of a local government in Western Canada that directly addresses this issue, so we report some of key findings from that study. In many settings, public performance reporting is risky; often, political cultures are risk averse, so it is difficult to report anything but positive or at least noncontroversial performance results. We look at possible strategies to increase internal uses of performance information in such environments and some strategies that managers tend to adopt to manage performance but not expose the program to political risks. Finally, we turn to several problems with performance measurement that can more generally affect both the validity and the usefulness of performance information. We come back to a key theme in our textbook; performance measurement
  • 67. is part of a broader suite of approaches to doing evaluations. Performance measurement does not replace program evaluation but instead focuses and informs it. In Chapter 9, we discussed the design and implementation of performance measurement systems and outlined 12 steps for guiding the process. Implementation implies that performance measurement data are being collected regularly, are being analyzed, and are being reported (either internally or externally). If the system is intended primarily for formative uses, reporting may be informal and open-ended; that is, analyses and reports are prepared as needed. If the reports are prepared to meet external accountability requirements, they usually have a summative intent. In many jurisdictions where performance measurement is mandated, public reports are required on a periodic (often annual) basis and are intended to be used for decision making, including budgetary decision making (Hatry, 2006; Melkers, 2006). Generating performance information is not sufficient to ensure that performance information is actually used. This chapter will examine ways that performance information is intended to be used, and is actually used. We will look at the factors that affect how performance information is used and suggest that the political cultures in which organizations are embedded are important in assessing the prospects for using
Historically, accountability focused on the processes by which decisions were made in public organizations. The emphasis was on keeping good records, following established procedures, and knowing how decisions were made and who was involved in making them. Public organizations have typically been structured hierarchically, and the authority to make final decisions, along with accountability for those decisions, has ultimately resided in the person or persons occupying the higher offices of the structure. These bureaucratic structures and their limitations have been critically analyzed by public choice theorists (e.g., Downs, 1965; Niskanen, 1971), who have argued that to understand how public bureaucracies work, it is necessary to acknowledge that public servants are not unlike private sector employees: they have an underlying self-interest that affects their work efforts. Furthermore, these theorists argue that understanding public sector motivations is central to designing government organizations and structures that perform well.

Le Grand (2010) has suggested that, historically, the assumption has been that public servants are chiefly motivated by a desire to serve the public interest: that public servants were “knights,” motivated to do the right thing. Public choice theorists and proponents of New Public Management (NPM; Osborne & Gaebler, 1992) challenged that assumption and instead offered a model of motivation that emphasized the importance of incentives, rewards, and sanctions to induce good performance (see Le Grand, 2010). For public choice theorists, process-focused accountability missed what was important and even got in the way of good performance. They argued instead that the focus should be on results and on aligning the incentives of public servants so that following their self-interest would lead to efficient and effective outcomes for public organizations. Key to that approach was identifying desired results, measuring the extent to which they have been achieved, and holding public servants accountable for delivering those results.

NPM, as a broad movement that has influenced both public and nonprofit management and governance in Western countries, has been strongly connected to the drive to measure and publicly report performance (Hood, 1991; Osborne & Gaebler, 1992). Improving accountability is a key intended use of performance information. As we suggested in Figure 9.3 in Chapter 9, performance measurement and public reporting were expected to contribute to improved public accountability (for results) and also to improve performance. Requiring public reporting of performance results, particularly in relation to targets for a suite of performance measures, is now built into many public sector performance measurement systems (Bevan & Hood, 2006; Pollitt, Bal, Jerak-Zuiderent, Dowswell, & Harrison, 2010).
Making public performance reporting work as a means of inducing improved performance assumes that performance results have real consequences for the organizations reporting them. Recall Figure 1.1 in Chapter 1, in which we introduced the performance management cycle. The final stage of the cycle, once performance results (both performance measures and program evaluations) are reported, is to use that information to inform decisions, set priorities, make budgets, and position organizations and governments for another performance management cycle based on evidence from the past cycle. These are the real consequences that are expected to flow from reporting performance results. If the performance management cycle operates annually, annual performance reports are part of that cycle.

USING PERFORMANCE RESULTS

Elected leaders are expected to be principal users of performance results (McDavid & Huse, 2012; Thomas, 2006). The efficacy of the performance management cycle depends on elected decision makers paying attention to performance results and using that information in their deliberations and decisions. In spite of the importance of this link, there has been limited research focused squarely on whether, and to what extent, elected officials make use of the performance information that is regularly supplied to them through public reports. In the next section, we summarize a recent empirical study that has examined, over time, how elected public officials in one jurisdiction used performance reports. The findings from this study help us understand how the link between public reporting and “real consequences” actually operates.

Legislator Expected Versus Actual Uses of Performance Reports in British Columbia, Canada

In 2000, the British Columbia (B.C.) Legislature passed the Budget Transparency and Accountability Act, a law mandating annual performance plans and annual performance reports for all departments and agencies. The act was amended in 2001 (Government of British Columbia, 2000, 2001), and the first performance reports were completed in June 2003. McDavid and Huse (2012) surveyed all elected members of the legislature anonymously on three occasions: the first survey was completed before the first public performance reports were received in 2003, and legislators were surveyed again in 2005 and 2007. Table 10.1 summarizes the overall response rates to the three surveys. The same measures were used in all three surveys; the only difference was that in the 2003 survey the statements were worded in terms of expected uses of the performance reports. Fifteen separate Likert statements were included, asking politicians to rate the extent to which they used (or, in