Chapter 5
1. Your company needs a small front-end loader for handling
bulk materials at the Wide place plant. It can be leased from
the dealer for three years for $4050 per year including all
maintenance. It can also be purchased for $14,000. You expect
the loader to last for six years and to have a salvage value of
$3000. You predict that maintenance will cost $400 the first
year and increase by $200 per year in each year after the first.
Your MARR is 15% per year. (a) Use AW analysis to determine
whether to lease or buy the loader.
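One way to set up the comparison in Problem 1 is sketched below in Python. It is a minimal sketch rather than a model solution. It assumes end-of-year cash flows, and it assumes the three-year lease can be renewed on the same terms, so that its annual worth equals the yearly payment. The function name annual_worth is ours and does not come from the textbook.

```python
def annual_worth(cash_flows, rate):
    """Equivalent uniform annual worth of a cash-flow series.

    cash_flows[0] occurs at time 0 and cash_flows[t] at the end of year t.
    """
    n = len(cash_flows) - 1
    present_worth = sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))
    # Capital recovery factor (A/P, i, n) spreads the present worth over n years.
    capital_recovery = rate * (1 + rate) ** n / ((1 + rate) ** n - 1)
    return present_worth * capital_recovery


MARR = 0.15

# Buy: $14,000 now, maintenance of $400 rising by $200 per year,
# and a $3000 salvage value at the end of year 6.
buy_flows = [-14000] + [-(400 + 200 * (year - 1)) for year in range(1, 7)]
buy_flows[-1] += 3000
aw_buy = annual_worth(buy_flows, MARR)

# Lease: assumed end-of-year payments of $4050 over one three-year cycle.
aw_lease = annual_worth([0, -4050, -4050, -4050], MARR)

print(f"AW(buy)   = {aw_buy:,.2f} per year")
print(f"AW(lease) = {aw_lease:,.2f} per year")
```

Under these assumptions, the alternative with the larger (less negative) annual worth is the one to select.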
2. You have identified three alternatives for a small project at
your plant. Any of the alternatives would save about $30,000
per year in operating costs. (a) Use AW analysis and an MARR
of 15% per year to determine which alternative to select.
Alternative              F         G         H
Initial Cost, $          40,000    50,000    60,000
Salvage Value, $         4000      6000      9000
Annual Cost, $/year      8000      6000      4000
Life, years              3         4         5
3. ABC Drinks purchases its 355 ml cans in bulk from
China. The finish on the anodized aluminum surface is produced
by one of two mechanical finishing technologies, Brushing or Bead
Blasting. Use MARR = 8%.
Alternatives
Brush: P = -$400,000, n = 10 years, S = $50,000, AOC = -$50,000
in year 1, decreasing by $5000 annually starting in year 2
Bead Blasting: P = -$300,000, n = permanent, S = 0, AOC = -$50,000
Select between the two alternatives.
4. A contractor has been awarded the contract to construct a
six-mile-long tunnel in the mountains of western Wisconsin. During
the five-year period, the contractor will need water from the
nearby stream. He will construct a pipeline to convey the water
to the main construction yard. An analysis of the various pipe
sizes is as follows:
Pipe size                              2"        3"        4"        6"
Installed cost of pipeline and pump    $22000    $23000    $25000    $30000
Cost per hour for pumping              $1.20     $0.65     $0.50     $0.40
The pipe and the pump will have a salvage value at the end of
five years equal to the cost to remove them. The pump will
operate 2000 hours per year. The lowest rate at which the
contractor is willing to invest is 7%. Select the best pipe size
using annual worth analysis.
5. The expansion of the Wide-place Mall is delayed over the
issue of parking. There is not enough now to support the new
facility and more must be added. Let's suppose that there are 3
options: buying more land, filling wetlands at the rear of the
site, or building a multilevel garage on the present lot. Assume
a forty-year planning horizon and an interest rate of 9% per
year. Use Annual worth analysis and the data below to
determine which option should be selected.
                              Purchase Land    Fill Wetlands    Garage
Initial Cost, $               12,000,000       19,000,000       44,000,000
Annual Benefit, $ per year    0                0                4,000,000 (parking fees)
Annual Cost, $ per year       200,000          160,000          2,900,000
Textbook Problems
6. Problem 5.24
7. Problem 5.28 (use a spreadsheet only to solve this)
Chapter 6
8. Two years ago, you bought 100 shares of XYZ stock at $60
per share. The stock paid a dividend of $6 per share per quarter.
If you sell the shares now for $98 per share, what is your annual
ROR on this investment?
9. At the end of 1987, you bought a piece of land for $35,000.
In addition to the $35,000, you paid $1,700 in closing costs
(costs associated with the purchase and title registration). For
the years 1988 through 2002, you paid, on average, $950 in
property taxes at the end of each year. At the end of 2003, you
sold the land for $120,000. At sale time, you paid a 6%
commission to the realtor and $1,600 was your share of the
closing costs. What was the ROR on this investment?
10. UW-Stout is considering which of two devices to install to
reduce costs in a particular situation. Both devices (A and B)
cost $1000, have useful lives of 10 years and no salvage value.
Device A: Annual savings of $300
Device B: Annual savings of $400 in the first year but will
decline $50 annually.
If MARR is 7%, which device should Stout purchase? Use
(A-B)
11. In your uncle’s will, you are to choose one of the following two
alternatives:
Alternative 1: $2000 cash
Alternative 2: $150 now plus $100 per month for twenty months
a. At what rate of return are the two alternatives equivalent?
b. If you think the rate of return in (a) is too low, which
alternative will you select?
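Part (a) of Problem 11 has no closed-form solution, so the rate is usually found by trial and error or numerically. The Python sketch below uses bisection. It assumes the $100 payments fall at the end of each month, with the first one a month from now; the function names are illustrative.

```python
def present_worth_difference(monthly_rate, months=20):
    """PW of Alternative 2 minus PW of Alternative 1 at a monthly rate."""
    # Uniform-series present worth factor (P/A, i, n).
    annuity_factor = (1 - (1 + monthly_rate) ** -months) / monthly_rate
    return 150 + 100 * annuity_factor - 2000


def solve_rate(low=1e-6, high=1.0, tolerance=1e-10):
    """Bisection search; the PW difference falls as the rate rises."""
    while high - low > tolerance:
        mid = (low + high) / 2
        if present_worth_difference(mid) > 0:
            low = mid   # Alternative 2 still worth more: rate is too low
        else:
            high = mid  # Alternative 2 worth less: rate is too high
    return (low + high) / 2


monthly_rate = solve_rate()
print(f"Break-even rate: {monthly_rate:.4%} per month")
```

At any rate below the break-even value the deferred payments of Alternative 2 are worth more than $2000 today, and at any rate above it they are worth less, which is the basis for answering part (b).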
Chapter 7
12. Two machines are considered for purchase. Assume 10%
interest. Use benefit-cost analysis.
                          Machine X    Machine Y
Initial Cost              $200         $700
Uniform annual benefit    $95          $120
Salvage value             $50          $150
Useful life in years      6            12
a. Which machine should be bought?
b. List the decision guideline for a single project
c. List the selection rule for incremental analysis
13. Which of the following alternatives will you select using
benefit to cost ratio?
A: First cost = $560, Annual benefit = $140, Salvage value = $40
B: First cost = $340, Annual benefit = $100, Salvage value = $0
C: First cost = $120, Annual benefit = $40, Salvage value = $40
Each alternative has a useful life of 6 years. Assume MARR = 10%.
The Catch data warehouse: support for community health care decision-making
Donald J. Berndt (a), Alan R. Hevner (a,*), James Studnicki (b)
(a) Information Systems and Decision Sciences Department, College of Business
Administration, 4202 Fowler Ave., CIS1040, University of South Florida,
Tampa, FL 33620, USA
(b) College of Public Health, University of South Florida, Tampa, FL 33620, USA
Accepted 1 April 2002
Abstract
The measurement and assessment of health status in communities throughout the
world is a massive information technology challenge. Comprehensive Assessment
for Tracking Community Health (CATCH) provides systematic methods for
community-level assessment that is invaluable for resource allocation and
health care policy formulation. CATCH is based on health status indicators
from multiple data sources, using an innovative comparative framework and
weighted evaluation process to produce a rank-ordered list of critical
community health care challenges. The community-level focus is intended to
empower local decision-makers by providing a clear methodology for organizing
and interpreting relevant health care data. Extensive field experience with
the CATCH methods, in combination with expertise in data warehousing
technology, has led to an innovative application of information technology in
the health care arena. The data warehouse allows a core set of reports to be
produced at a reasonable cost for community use. In addition, online analytic
processing (OLAP) functionality can be used to gain a deeper understanding of
specific health care issues. The data warehouse in conjunction with
Web-enabled dissemination methods allows the information to be presented in a
variety of formats and to be distributed more widely in the decision-making
community. In this paper, we focus on the technical challenges of designing
and implementing an effective data warehouse for health care information.
Illustrations of actual data designs and reporting formats from the CATCH
data warehouse are used throughout the discussion. Ongoing research
directions in health care data warehousing and community health care
decision-making conclude the paper.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: Health care information systems; Data warehousing; Data staging;
Online analytic processing (OLAP); Decision support systems; Community
decision-making; Data quality
1. Introduction
The United States spends over a trillion dollars
annually on health expenditures. Both as a percentage
of national productivity and per capita, health care
spending by the United States exceeds that of any other
nation in the world. However, this tremendous expenditure
has not secured the U.S. a rank among the
‘healthiest’ nations. In fact, for many health indicators,
such as infant mortality and measles immunizations,
the U.S. ranks below some countries characterized as
underdeveloped [23,29]. Prolonged public debates on
health care policy in the United States have focused on
insurance coverage and medical care financing programs
without any serious examination of the true
health status of the nation.
* Corresponding author.
Decision Support Systems 35 (2003) 367–384
doi:10.1016/S0167-9236(02)00114-8
0167-9236/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
www.elsevier.com/locate/dsw
The need to assess the health status of U.S.
communities in a comprehensive and systematic
manner has been widely recognized within the health
professions. The Institute of Medicine (IOM) of the
National Academy of Sciences has acknowledged the
importance of a population-based perspective in two
influential reports, emphasizing the need for a regular
and systematic collection, assemblage, and analysis
of the health status of our nation’s communities
[16,18]. A community health profile is comprised
of socio-demographic characteristics, health status
and quality of life indicators, health risk factors,
and health resource measures. The intent of such a
comprehensive health profile is to assist a community
in developing, refining, and monitoring a long-term
strategic view of its overall health status. Although
there are many sources of health data, there are no
standard data definitions, formats, or reports across
the health care industry. Thus, health care data are
widely used (and misused) in an ad-hoc manner to
justify managerial objectives of health institutions
and agencies, a maze of mandated categorical funding,
and a variety of political agendas. Sound information
and accepted analytic techniques are even
more important as funding is consolidated in block
grants and local community decision-making is
emphasized.
As part of the ongoing clarification of the public
health role at the community level and the transition
from a disease to a health focus and from a treatment
to a prevention strategy, there has been recognition
that partnerships and collaboration are necessary to
support effective action [17,21]. Health organizations,
public sector agencies, medical care providers, businesses,
the religious community, educational institutions,
and other community organizations are
interdependent components of a multi-sectoral community
health environment. The overall community
must be empowered to make the necessary, and
sometimes difficult, resource allocation choices to
improve health through information, education,
behavior change, and social support [7]. Such collaborative
action at the community level must be
informed by unbiased data describing the community’s
health status, needs, and resources. The ability to
track progress over time toward the community’s
health care goals is also needed [24].
The gap between current practice in community
health care spending and the above goals of collaborative
community health care decision-making is vast.
The availability and quality of health indicators are
problematic. There is little empirical evidence on the
use, sharing, or integration of health data into decision-making
to provide guidance to community health
organizations. While most of the literature on collaborative
leadership and community engagement
emphasizes the process [4,5], little attention has been
focused on the effect of the availability of a common
set of data, such as the community health profile, on
the quality and inclusiveness of decision-making.
There is also scant information about the use of data
and information technology to support and monitor
the process.
The purpose of this paper is to present an overview
of the Comprehensive Assessment for Tracking Community
Health (CATCH) methods [25] and then focus
on the construction of a comprehensive health care
data warehouse that provides automated support for
CATCH. The combination of extensive field experience
with CATCH and the application of current data
warehousing technology make this an innovative
interdisciplinary research effort. Section 2 briefly
presents the CATCH methods and our motivation
for building a data warehouse. In Section 3, we
present a detailed discussion of the technical challenges
in designing and implementing the data warehouse.
Twin star data staging, an effective approach
for ensuring quality as data are entered into the
warehouse, is highlighted in Section 4. Section 5
discusses the use of the data warehouse for advanced
health care applications. The paper concludes with
future research directions on data warehousing technical
challenges and the use of health profiles to
support improved community health care decision-making.
2. The CATCH methods of community assessment
The University of South Florida’s Center for
Health Outcomes Research (CHOR) developed
CATCH to provide comprehensive and objective
health status data for community health planning
purposes. CATCH collects, organizes, analyzes, prioritizes,
and reports data on over 250 health and social
indicators on a local community basis. The CATCH
methods have been tested, refined, and validated in
the field over the past 10 years. Reports have been
prepared for more than 20 U.S. counties both within
and outside of Florida.
The CATCH process can be briefly described as
shown in Fig. 1. Community health indicator data are
gathered from a variety of sources. Secondary data
sources include health care data reported by hospitals,
local, state, and federal health agencies, and national
health care groups. Primary data sources would involve
data gathered from door-to-door or mail-in surveys. All
health care data are translated into common formats and
integrated with other data warehouse components to
support the production of health care report cards.
Over 250 indicators are used within CATCH and
are organized into 10 indicator categories. These
indicators and categories represent a wide spectrum
of health care issues and have evolved through both
research and field practice. Table 1 lists the 10
indicator categories and presents a few representative
indicators to lend a sense of perspective to the level of
detail provided in CATCH reports. These indicators
are collected from a variety of sources.
Fig. 1. The CATCH process.
Table 1
Ten indicator groups with representative indicators
Indicator group                           Representative indicators
Demographic Characteristics               Total Population; Racial Composition; Net Migration
Socioeconomic Characteristics             Employment; High School Dropouts; Per Capita Income
Maternal and Child Health                 Infant Mortality; Low Birthweight; Birth Defects Mortality
Social and Mental Health                  Domestic Violence; Homicide Rate; Psychiatric Admissions
Physical Environmental Health             Foodborne Outbreaks; Contaminated Wells; Lead Poisoning
Health Status: Morbidity and Mortality    Breast Cancer; Cardiovascular Disease; Stroke
Sentinel Events                           Rubella; Measles; Late Stage Cancer; Avoidable Hospitalizations
Health Resource Availability              Licensed Hospital Beds; Licensed Medical Doctors; Licensed Registered Nurses
Infectious Disease                        Syphilis; AIDS; Hepatitis
Behavioral Risk Factors                   Smoking; Obesity; Mammograms
Each indicator value is compared against the state
average, an average from a peer group of counties,
and other interesting values (e.g., a national goal for
that indicator) [26]. The results of these comparisons
are organized into a multi-dimensional matrix based
on favorable or unfavorable comparisons against each
comparison dimension. Fig. 1 shows a 2-by-2 comparison
matrix based on state averages and peer
averages. Community indicators that demonstrate
unfavorable comparisons on all dimensions are highlighted
as community health challenges. After this
simple comparison, the health care challenges are
prioritized using a set of five filters.
Number Affected: number of persons in the
community affected by the indicator.
Economic Impact: an estimate of the direct cost
per case for individuals affected by the indicator.
Availability of Efficacious Intervention: an estimate
of the relative degree to which treatment or
prevention is likely to be effective.
Magnitude of Difference: the degree to which the
community indicator is worse than the dimensional
comparisons.
Trend Analysis: for a 5-year period, is the trend
favorable or unfavorable, and what is the magnitude
of change in the trend direction?
The community stakeholders are given an opportunity
to weight the importance of each of the above
factors. The final product of the CATCH methodology
is a comprehensive, prioritized listing of community
health care challenges. A more detailed description of
the CATCH methodology with a complete listing of
health care indicators can be found in Ref. [25].
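The comparison and prioritization steps described above can be illustrated with a short Python sketch. It is not the CATCH implementation. The indicator values, the filter scores and their scale, and the stakeholder weights are all invented for illustration, and the sketch assumes that a higher indicator value is worse and that the filters are combined as a simple weighted sum; the scoring rules actually used are documented in Ref. [25].

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    county: float
    state_avg: float
    peer_avg: float
    filter_scores: dict  # filter name -> score (hypothetical 1-5 scale)


def is_challenge(ind):
    """Unfavorable on every comparison dimension (assumes higher is worse)."""
    return ind.county > ind.state_avg and ind.county > ind.peer_avg


def prioritize(indicators, weights):
    """Rank the challenges by weighted sum of filter scores, highest first."""
    def score(ind):
        return sum(weights[f] * ind.filter_scores[f] for f in weights)
    challenges = [ind for ind in indicators if is_challenge(ind)]
    return sorted(challenges, key=score, reverse=True)


# Invented example data; none of these numbers come from a CATCH report.
indicators = [
    Indicator("Infant Mortality", 9.1, 7.2, 7.8,
              {"affected": 2, "economic": 4, "intervention": 4,
               "magnitude": 3, "trend": 3}),
    Indicator("Stroke", 60.0, 55.0, 58.0,
              {"affected": 4, "economic": 5, "intervention": 3,
               "magnitude": 2, "trend": 4}),
    Indicator("Measles", 0.1, 0.3, 0.2,
              {"affected": 1, "economic": 1, "intervention": 5,
               "magnitude": 1, "trend": 1}),
]
# Stakeholder-assigned weights for the five filters (hypothetical).
weights = {"affected": 0.3, "economic": 0.2, "intervention": 0.2,
           "magnitude": 0.2, "trend": 0.1}

ranked = [ind.name for ind in prioritize(indicators, weights)]
print(ranked)
```

Measles drops out because it compares favorably on both dimensions; the remaining two indicators are ordered by their weighted scores.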
2.1. Limitations
While the value of CATCH is incontrovertible, the
ultimate deployment of CATCH throughout Florida
and the nation has been constrained by several serious
limitations:
• The handcrafted process is labor-intensive and
slow. Hundreds of individual sources of data must be
identified and contacted. Data are often provided in
hard copy formats and must be manually checked,
validated, and entered into spreadsheets. With manual
methods, it takes 3–4 months to complete a CATCH
report for a single county.
• Longitudinal trend analyses over many years are
cost prohibitive for most communities. Since each
application is expensive and time-consuming, the
capability to fund and produce annual assessments
in a single community is limited.
• Most public health funding comes from state and
federal governments. A statewide CATCH assessment
would help to prioritize funding and serve to enable
effective program evaluation based on quantifiable
outcome assessment. Since nearly all data indicators
available in Florida are available in most other states,
there is reason to be confident that CATCH will be
expanded nationally and even internationally.
• With the massive amount of health care data
involved, many interesting relationships and correlations
between health status indicators can be found
and investigated. In the manual system, such discovery
was not feasible. A comprehensive and integrated
data warehouse provides the infrastructure for such
data mining efforts.
2.2. CATCH data warehouse challenges
The application of data warehousing technologies
for the automated support of CATCH holds tremendous
promise. The remainder of this paper describes
our work to construct an effective and efficient data
warehouse solution, enabling both cost-effective
report generation and ad-hoc analyses of critical
health care issues. The construction of a data warehouse
for public health care data poses major challenges
beyond those required for the construction of a
commercial data warehouse (e.g., retail sales). Such
challenges include the following.
• Data come from a very diverse set of sources.
Health care data are published in a wide variety of
formats with differing semantics. There are currently
few standards in the health care field for such data.
The data integration task to build the data warehouse
requires significant effort.
• CATCH reports are disseminated to a diverse
and geographically distributed set of stakeholders.
• The data warehouse is required to support the
activities of public policy formulation. The socio-political
issues of health care planning impact design
features such as security, availability, data quality, and
performance.
3. The CATCH data warehouse
The goals of the CATCH data warehouse include
the support and enhancement of the CATCH methods,
the provision of cost-effective and thorough reports to
communities, and the creation of a rich environment
for more detailed research into critical health care
issues. In addition, a focus on data quality makes the
data warehouse an especially valuable asset over time
as a rich and trustworthy historical repository is built.
Lastly, the data warehouse lends itself to a variety of
dissemination strategies based on hardcopy reports,
interactive access, and Web-enabled information
delivery. The different access technologies allow a
diverse group of community planners and stakeholders
to investigate important health care issues using
comparable data. All of these characteristics make the
CATCH data warehouse a unique application of
technology in the field of public health. In fact, the
implementation of this type of data warehouse and its
use in monitoring, as well as improving health status,
will become a primary role of public health agencies
in the future.
The CATCH data warehouse includes a variety of
components arranged in three broad categories:
reporting tables for direct support of the CATCH
methods, aggregated dimensional structures, and
fine-grained or transaction-oriented dimensional structures.
In the sections that follow, examples of these
data warehouse components are presented. All of the
components draw on the dimensional model or star
schema, some components with more than a dozen
dimensions and some with a few simple dimensions.
3.1. The dimensional model
Important missions of a data warehouse include the
support of decision-making activities and the creation
of an infrastructure for ad-hoc exploration of very
large collections of data. Decision-makers should be
able to pursue many of their investigations using
browsing tools, without relying on database programmers
to construct queries. The emphasis on end-user
data access places a premium on an understandable
database design that provides an intuitive basis for
navigating through the data. The star schema or
dimensional model has been recognized as an effective
structure for organizing many data warehouse
components [12,15,19]. The star schema is characterized
by a center fact table, which usually contains
numeric information that can be used in summary
reports. Radiating from the fact table are dimension
tables that provide a rich query environment. This
structure provides a logical data cube, with dimensions
such as time and location identifying a set of
numeric measurements within the cube. Fig. 2 contains
a fragment from the hospital discharge transaction-oriented
star schema discussed in this paper.
3.1.1. Fact tables
The most appropriate facts are additive numeric data
items that can be summed, averaged, or combined in
other ways across the dimensions to form summary
statistics. The only way to compress the millions of data
points and produce a reasonably sized answer set is to
present some mathematical summarization. No human
will want thousands, let alone millions, of items in
answer to their queries. As Kimball [19] points out, “the
best and most useful facts are numeric, continuously
valued, and additive.” The CATCH data warehouse
includes facts such as counts of hundreds of different
health events, population-based rates, age-adjusted
rates, and even fine-grained financial data in the case
of the hospital discharge data depicted in Fig. 2. For
example, using the hospital discharge star it is possible
to focus on a single hospital (using the hospital dimension),
select a single disease (using the ICD DIAGNOSIS
dimension), and investigate how the length of stay has
varied over a specified time period. Using the hierarchical
nature of the dimensions, it is also possible to ‘roll-up’
to compare types of hospitals, disease categories,
or even patient age bands. While the dimensional
structure is simple and readily understandable, it supports
a large and very useful universe of queries.
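To make the fact and dimension tables concrete, the following Python sketch builds a very small star schema in an in-memory SQLite database and runs one roll-up query of the kind described above. It is a simplified illustration, not the CATCH schema: the table names, column names, and all rows are assumptions made for this example, and only two of the many dimensions are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hospital (
    hospital_key  INTEGER PRIMARY KEY,
    name          TEXT,
    hospital_type TEXT            -- attribute used for roll-up
);
CREATE TABLE diagnosis (
    diagnosis_key INTEGER PRIMARY KEY,
    icd_code      TEXT,
    chapter       TEXT
);
CREATE TABLE discharge_fact (     -- one row per discharge event
    hospital_key   INTEGER REFERENCES hospital(hospital_key),
    diagnosis_key  INTEGER REFERENCES diagnosis(diagnosis_key),
    discharge_year INTEGER,
    length_of_stay INTEGER,       -- additive numeric fact
    total_revenue  REAL           -- additive numeric fact
);
""")

# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO hospital VALUES (?, ?, ?)", [
    (1, "Hospital A", "teaching"),
    (2, "Hospital B", "community"),
    (3, "Hospital C", "community"),
])
conn.executemany("INSERT INTO diagnosis VALUES (?, ?, ?)", [
    (1, "250.00", "Endocrine"),
    (2, "410.90", "Circulatory"),
])
conn.executemany("INSERT INTO discharge_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 2000, 4, 5200.0),
    (1, 1, 2000, 6, 7100.0),
    (2, 1, 2000, 3, 3900.0),
    (3, 1, 2000, 5, 4800.0),
    (2, 1, 2001, 2, 3100.0),
    (1, 2, 2000, 7, 15000.0),
])

# Select one diagnosis, roll hospitals up to hospital type, and
# summarize length of stay by year.
rows = conn.execute("""
    SELECT h.hospital_type, f.discharge_year,
           COUNT(*), AVG(f.length_of_stay)
    FROM discharge_fact AS f
    JOIN hospital  AS h ON h.hospital_key  = f.hospital_key
    JOIN diagnosis AS d ON d.diagnosis_key = f.diagnosis_key
    WHERE d.icd_code = '250.00'
    GROUP BY h.hospital_type, f.discharge_year
    ORDER BY h.hospital_type, f.discharge_year
""").fetchall()

for row in rows:
    print(row)
```

Because the facts are additive, the same query shape works at any level of the dimensions: replacing hospital_type with name in the SELECT and GROUP BY clauses drills down to individual hospitals.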
3.1.2. Dimension tables
The dimensions define the query environment: the
richer the set of dimensions, the more ways the data
can be accessed via queries. Two of the important
characteristics of dimensions are the richness of the
attributes that describe the dimension and the hierarchical
nature of the dimension. For example, the
COUNTY dimension in the CATCH data warehouse
includes attributes that describe whether a county is
coastal, wealthy, urban, dense, large in area, or
includes a military base. Therefore, the counties can
be organized by any value in this attribute set. Some
of the attributes lend themselves to hierarchical
organization. In the case of COUNTY, there is a natural
geographic hierarchy that includes groups of counties
that form regions within the state and the state itself.
The county is also composed of finer geographic units
such as communities, ZIP codes, and census tracts. The
dimension hierarchies enable roll-up and drill-down
operations that control the level of detail in queries.
These formally defined hierarchies also provide the
framework for navigation or data browsing.
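A small Python sketch may help show how a dimension hierarchy drives roll-up and drill-down. The COUNTY rows below are a hypothetical fragment: the region names, the attribute values, and the event counts are invented for illustration and are not taken from the CATCH data warehouse.

```python
from collections import defaultdict

# Hypothetical COUNTY dimension rows: each county carries descriptive
# attributes plus its parents in the geographic hierarchy.
county_dim = {
    "Hillsborough": {"region": "West Central", "state": "FL", "coastal": True},
    "Pinellas":     {"region": "West Central", "state": "FL", "coastal": True},
    "Polk":         {"region": "West Central", "state": "FL", "coastal": False},
    "Orange":       {"region": "East Central", "state": "FL", "coastal": False},
}

# Invented county-level counts of some health event.
event_counts = {"Hillsborough": 120, "Pinellas": 95, "Polk": 60, "Orange": 110}


def roll_up(counts, level):
    """Sum county-level counts to a coarser level or attribute value."""
    totals = defaultdict(int)
    for county, count in counts.items():
        totals[county_dim[county][level]] += count
    return dict(totals)


def drill_down(counts, level, value):
    """Return the county-level detail behind one coarser-level value."""
    return {county: count for county, count in counts.items()
            if county_dim[county][level] == value}


by_region = roll_up(event_counts, "region")
by_state = roll_up(event_counts, "state")
by_coastal = roll_up(event_counts, "coastal")
west_central = drill_down(event_counts, "region", "West Central")
print(by_region, by_state, by_coastal, west_central)
```

The same two functions serve both the formal hierarchy (region, state) and a descriptive attribute such as coastal, which is why a rich attribute set widens the range of queries.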
In order to describe the dimension hierarchies succinctly
to both end-users and developers, dimension
hierarchy diagrams have been utilized in the CATCH
data warehouse design process. These diagrams show
the hierarchical nature so that end-users have an
uncluttered view of how they can navigate and designers
can easily understand the dimensional structures.
Fig. 3 illustrates an important health care dimension
based on the International Classification of Diseases
(ICD) codes. Currently, we are using versions 9 and 10
of the ICD codes. These codes are divided into chapters
and sections, which provides a natural hierarchy for the
codes. Fig. 3 shows the hierarchical structure using
separate tables, but these tables can be easily denormalized
to enhance query performance. In addition,
there are several other tables that provide alternative
hierarchies for this important dimension. This ICD
PROCEDURE dimension is combined with many other
dimensions such as patient age, gender, mortality risk,
and severity of illness to form star schemas (see Fig. 2)
with rich query environments.
3.2. Data warehouse design: the data access pyramid
The mission of the CATCH data warehouse is to
support the automated and cost-effective application
of CATCH, as well as to enable more detailed
analyses that were not possible using the coarse-grained
data that typified past CATCH reports. In
order to meet these goals, the data warehouse design
includes several levels of data granularity, from the
coarse-grained data used in generic report production
to actual event-level data, such as hospital discharges.
The data warehouse design includes major components
at all three levels of granularity as illustrated in
the data access pyramid found in Fig. 4.
Fig. 2. Hospital discharge star schema (not all dimensions are shown).
Report indicators: Reporting tables with derived
or highly aggregated data are used to support the core
CATCH reports, including comparisons between a
target county and peer counties. These tables also
provide fast response for interactive access via data
browsing tools and can provide the foundation for
simple community-wide Internet access. In addition,
the metadata play an important role at the reporting
level, providing indicator definitions, state or federal
goals, and expert domain knowledge for priority
filters (e.g., economic impact and treatment availability).
This report level of the data warehouse may not
be needed in all data warehouse applications but
provides important support for rapid generation of
community CATCH reports.
Aggregate data: There are families of star schemas
that provide true dimensional data warehouse
capabilities, such as interactive roll-up and drill-down
operations. These components have carefully designed
dimensions that can be utilized by more sophisticated
data browsing tools. The star schemas are populated
using thorough data staging and quality procedures that
usually involve processing detailed data sets extracted
by various health care agencies and organizations.
Typically, the data are aggregated and transformed for
loading into a family of related star schemas (a constellation)
that share important dimensions and support
interactive online analytic processing (OLAP)
techniques.
Transaction data: For certain types of information,
the design calls for retaining very fine-grained or
even event level data. An example is the hospital
discharge data that include each hospital discharge
event for the more than 200 hospitals that are mandated
to report such information in Florida. These data
are retained at the transaction level because of the rich
set of facts and dimensions available for analysis and
the density of potential aggregations that result in
negligible space savings.
These three levels of aggregation within the data
warehouse combine to meet a wide range of reporting
requirements and performance goals, thus providing a
flexible basis for disseminating health care information
to community decision-makers. The following
two sections (Sections 3.3 and 3.4) provide some
examples of the major data warehouse components.
At the aggregate data level, a coarse-grained component
based on the Public Health Information Data
System (PHIDS) is used to support CATCH report
production and high-level browsing. A second example
aggregate is procedure volume information
formed from the underlying hospital discharge data.
The original hospital discharge data provide an example
of transaction-oriented data that supports detailed
analyses, along with other data such as vital statistics
(e.g., births and deaths) and specific disease registries.
Fig. 3. ICD PROCEDURE dimension hierarchy.
Fig. 4. Data access pyramid.
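The relationship between the transaction level and the aggregate level of the pyramid can be sketched in a few lines of Python. The example derives a procedure volume aggregate from event-level discharge records. The field names and the records themselves are hypothetical, and a production warehouse would do this in its staging and loading procedures rather than in application code.

```python
from collections import Counter

# Invented event-level discharge records (transaction level).
discharges = [
    {"hospital": "Hospital A", "year": 2000, "procedure": "36.10", "length_of_stay": 7},
    {"hospital": "Hospital A", "year": 2000, "procedure": "36.10", "length_of_stay": 9},
    {"hospital": "Hospital A", "year": 2000, "procedure": "81.54", "length_of_stay": 4},
    {"hospital": "Hospital B", "year": 2000, "procedure": "36.10", "length_of_stay": 8},
    {"hospital": "Hospital B", "year": 2001, "procedure": "81.54", "length_of_stay": 5},
]

# Aggregate level: number of procedures per hospital, year, and procedure code.
procedure_volume = Counter(
    (d["hospital"], d["year"], d["procedure"]) for d in discharges
)

for key, volume in sorted(procedure_volume.items()):
    print(key, volume)
```

The aggregate answers volume questions quickly, while the underlying records remain available when a question (for example, about length of stay) needs event-level detail.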
3.3. Aggregated Florida Department of Health Data
An example of a highly aggregated data warehouse
component is the Public Health Information Data
System (PHIDS) star schema. The Florida Department
of Health collects, analyzes, and reports a large
number of public health indicators. These items have
always provided critical assessment measures within
CATCH. The importance of the PHIDS indicators
made them obvious candidates for inclusion in the
data warehouse and a natural resource for automation
of the traditional CATCH report.
The PHIDS indicators are clearly not the fine-grained
data that support a detailed OLAP environment.
The data are highly aggregated and provided
annually at the county level. Therefore, the data set is
suitable for generating the traditional CATCH report,
but unsuitable for more specific analyses. Essentially,
the construction of the data warehouse has been a
search for both fine and coarse data that can provide
synergies through integration. The simple star schema
used to implement the PHIDS-based data warehouse
component has only the year reported and the county
as explicit dimensions. Currently, many of the PHIDS
indicators are maintained using spreadsheets at the
Florida Department of Health. For use in the data
warehouse, the data are first extracted from the
spreadsheets, reformatted using custom staging programs,
and then loaded via a bulk loader utility. The
twin star staging process, as described in Section 4, is
used to ensure data quality. Data correctness is verified
by sampling the data and comparing the data
warehouse values with published PHIDS reports.
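The sampling check mentioned above might look like the following Python sketch. It shows only the final comparison step, not the twin star staging process of Section 4. The key layout (indicator, county, year), the tolerance, and all values are assumptions made for illustration.

```python
import random


def verify_sample(warehouse, published, sample_size, tolerance=1e-9, seed=0):
    """Compare a random sample of warehouse values with published values.

    Both arguments map (indicator, county, year) keys to numeric values.
    Returns the sampled keys that are missing from the published data or
    whose values differ by more than the tolerance.
    """
    rng = random.Random(seed)  # fixed seed so a check can be repeated
    keys = rng.sample(sorted(warehouse), min(sample_size, len(warehouse)))
    mismatches = []
    for key in keys:
        if key not in published or abs(warehouse[key] - published[key]) > tolerance:
            mismatches.append(key)
    return mismatches


# Invented values; the last warehouse entry contains a deliberate error.
published = {
    ("Infant Mortality", "Hillsborough", 1999): 8.1,
    ("Infant Mortality", "Pinellas", 1999): 7.4,
    ("Low Birthweight", "Hillsborough", 1999): 8.6,
    ("Low Birthweight", "Pinellas", 1999): 7.9,
}
warehouse = dict(published)
warehouse[("Low Birthweight", "Pinellas", 1999)] = 9.7

problems = verify_sample(warehouse, published, sample_size=4)
print(problems)
```

In practice the sample would be a small fraction of the loaded rows; sampling every row, as the usage above does, is only practical for a toy data set.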
3.4. Transaction-oriented hospital discharge data
Florida hospital discharge transactions are collected
by the Agency for Health Care Administration
(AHCA) from the more than 200 short-term acute care
hospitals in the state. These hospitals report every
discharge transaction, regardless of payer, throughout
the state. Hospital discharge data are used to derive
several CATCH indicators such as avoidable hospitalizations
due to diabetes and other chronic diseases.
Typically, the large volume of hospital discharge
transactions is scanned to form derived or aggregated
data for CATCH indicators. However, the broader
mission of the CATCH data warehouse is both to
support the CATCH methods and enable more
detailed investigations of critical local health care
issues. It is the ability to fully explore issues at
appropriate levels of detail that makes the fine-grained
components so important. While first staging and
preprocessing the hospital discharge data for use in
forming CATCH indicators, the value of the discharge
transactions themselves became very apparent. The
hospital discharge transactions provide an interesting
set of numeric data items, such as length of stay and a
breakdown of revenues, which are very well suited for
a data warehousing approach. In addition, the trans-
actions include a rich set of attributes that provide
many natural dimensions for use in formulating
queries.
Transaction-based star schemas can provide very
useful functionality within a data warehouse frame-
work, making the hospital discharge star an important
component of the CATCH data warehouse. The hos-
pital discharge data includes over 20 interesting
dimensions such as the discharging hospital character-
istics, admission criteria, diagnostic codes, procedure
codes, reimbursement categories, time, geographic
location, and many others. Furthermore, many of
these dimensions are hierarchical in nature, easily
supporting important roll-up/drill-down operations.
Fig. 2 is a partial representation of the discharge star
schema. The discharge star is equally rich in additive
numeric facts. For instance, length of patient stay is a
particularly important measurement for analysis.
There is also a measurement indicating elapsed days
until the medical procedure. Finally, there is a total
revenue item that provides important cost information.
In fact, there is also a large text field with embedded
revenue items that provides a breakdown of the
various costs from room charges to laboratory fees.
Procedures to parse this text field have been devel-
oped as part of the data staging activities and are used
to extract revenue items, providing nearly 30 interest-
ing numeric facts for each transaction. It is not
uncommon to have useful information buried in text
D.J. Berndt et al. / Decision Support Systems 35 (2003) 367–384
fields that must be preprocessed using data staging
tools or customized procedures. This can be a chal-
lenging task since the source database has no under-
standing of the structure embedded in such text fields
and therefore, simple query access is impossible. In
this case, the rich set of facts and highly dimensional
structure of the hospital discharge data make it a
powerful warehouse component for detailed investi-
gations and customized analyses.
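The kind of parsing procedure described above can be sketched as follows. This is a minimal illustration only: the actual layout of the embedded revenue text field is not given here, so the `CODE=amount;` format, the category names, and the function `parse_revenue_items` are all assumptions.

```python
import re
from decimal import Decimal

# Hypothetical layout: "ROOM=1200.00;LAB=340.50;PHARM=89.25".
# The real field layout is not described in the text.
ITEM_PATTERN = re.compile(r"([A-Z_]+)=(\d+(?:\.\d{1,2})?)")

def parse_revenue_items(text):
    """Return a dict mapping each revenue category to its amount."""
    return {code: Decimal(amount) for code, amount in ITEM_PATTERN.findall(text)}

record = "ROOM=1200.00;LAB=340.50;PHARM=89.25"
items = parse_revenue_items(record)   # one numeric fact per category
total = sum(items.values())           # can be checked against the total revenue item
```

Once the items are extracted into separate numeric columns, they become ordinary additive facts that simple queries can reach.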
The hospital discharge star has repeating groups for
diagnoses (ICD DX 1–10) and procedures (ICD
PROCEDURE 1–10). This design mirrors the underlying
data and simplifies the data staging process for the
millions of discharge records used in the project. An
alternative design without repeating groups might
simplify some queries, but this fine-grained data is
at the bottom of the data access pyramid and is
typically aggregated for most query processing. The
original positional representation also conveys infor-
mation relevant to health care coding practitioners and
is used in several ancillary algorithms. For many
purposes, the primary diagnosis or procedure is used
in calculating higher-level health care indicators, so
this structure is maintained in the transaction-oriented
data [28].
It is sometimes preferable to store the actual trans-
actions rather than lightly aggregated data that has
been derived from the underlying transactions. Kim-
ball [19] uses the term sparsity failure to describe the
size explosion that can occur when creating aggregate
data from a sparsely populated fact table. Detailed fact
data such as hospital discharge transactions will
probably not have all combinations of the dimensions
present in the actual data. In other words, not all
diseases occur in all hospitals during a particular year
and therefore the effect with regard to size is not
multiplicative. If we consider only the cardinality of
the actual dimensions, then the number of possible combinations
of dimension key values is very large for the hospital
discharge data. For example, consider the following
four dimensions with approximate cardinalities: hospitals
(250), ICD codes (15,000), severity ratings (5),
and payers (10). This could result in 187.5 million
dimension key combinations. Further, we can define
density as the actual number of records (roughly 2
million/year for discharges) divided by the potential
combinations of dimension keys, yielding a density of
2/187.5 or roughly 1.07%. This remarkably low
density makes intuitive sense since the very fine
ICD distinctions lead to sparse usage. Imagine that
we decide to construct an aggregate table by creating
150 disease categories that summarize the 15,000 ICD
codes, reducing the dimension size by a factor of 100.
In this case, all 150 categories may appear for each
hospital (a reasonable assumption) giving a density of
100% and roughly 1.9 million rows. This rather
insignificant space savings comes at the expense of
losing the richness of the original ICD codes and the
flexibility of having individual cost data for each
transaction. Therefore, in the CATCH data warehouse
and many other applications, transaction-oriented
components make good sense. In fact, to really under-
stand the implications for tasks such as data ware-
house capacity planning it is often necessary to
sample the data to discover the actual distribution of
dimension values. The design challenge is to carefully
consider the number of fine-grained items that are
summarized to form the aggregate data and look for a
factor of 10 or more as a reasonable compression ratio
[19].
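The density arithmetic above can be reproduced with a short calculation. The cardinalities and the figure of 2 million rows per year come from the text; the helper name `density` and the dictionary keys are illustrative.

```python
from math import prod

def density(rows, cardinalities):
    """Actual row count divided by the number of possible key combinations."""
    return rows / prod(cardinalities)

detail = {"hospitals": 250, "icd_codes": 15_000, "severity": 5, "payers": 10}
detail_cells = prod(detail.values())                   # 187,500,000 combinations
detail_density = density(2_000_000, detail.values())   # about 0.0107

# Summarize 15,000 ICD codes into 150 disease categories.
aggregate = dict(detail, icd_codes=150)
aggregate_cells = prod(aggregate.values())             # 1,875,000 rows at 100% density
compression = 2_000_000 / aggregate_cells              # rows saved, as a ratio
```

Here the compression ratio is only slightly above 1, far short of the factor of 10 suggested as a reasonable target, which is why the aggregate is not worth building.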
3.5. Performance issues
The large volumes of data contained in the CATCH
data warehouse coupled with demanding queries can
conspire to produce some truly awful performance. As
in any database project, good design is the most
effective tool for enhancing performance. The
CATCH data warehouse design continues to evolve
in response to new challenges. In addition to design
changes, three other techniques offer avenues for
improving performance: aggregate tables, star schema
indexing strategies, and physical table partitions.
3.5.1. Aggregates
Many data warehouse designers identify aggre-
gates as one of the most effective strategies for
improving performance. Kimball [19] notes that
‘‘aggregates can have a very significant effect on
performance, in some cases speeding queries by a
factor of 100 or even 1000.’’ If the aggregate data are
useful, having the data physically ready and waiting
will certainly improve query speeds. In addition, if
sparsity failure is avoided, then the amount of data
required may also be substantially reduced. That is,
benefits from both reduced space and previously
handled computations can accrue through the use of
aggregates. In addition, many data warehousing nav-
igation tools are aggregate-aware, making the aggre-
gate structures transparent to the end user. However,
there are a potentially large number of aggregates that
are possible given a rich set of dimensions. The choice
of which aggregate tables to build is based on the type
of queries being executed and will naturally change
over time [14].
Aggregates play an important role in the CATCH
data warehouse. Some data are extracted and loaded
in aggregate form, such as the PHIDS indicators
discussed above, and other aggregates are derived
from more detailed data warehouse components. For
instance, vital statistics such as death and birth certif-
icates are used to derive a collection of aggregated
mortality and birth-related indicators. There are two
somewhat different purposes for aggregates. Highly
aggregated data are used to directly support traditional
CATCH report production, while lightly aggregated
data are used to improve query performance. The
continual re-evaluation of aggregates is an important
task in data warehouse administration.
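As a rough sketch of a lightly aggregated table, the following uses an in-memory SQLite database as a stand-in for the warehouse DBMS. The table and column names (`discharge_fact`, `discharge_agg`, and so on) and the sample rows are assumptions made for illustration, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE discharge_fact (year INTEGER, hospital_id INTEGER, icd_code TEXT,
                             length_of_stay INTEGER, total_charges REAL);
INSERT INTO discharge_fact VALUES
  (2001, 1, '250.00', 4,  8000.0),
  (2001, 1, '250.00', 6, 12000.0),
  (2001, 2, '428.0',  3,  5000.0),
  (2002, 1, '250.00', 5,  9000.0);
-- Precompute one row per (year, hospital) so queries need not scan the facts.
CREATE TABLE discharge_agg AS
  SELECT year, hospital_id,
         COUNT(*)            AS discharges,
         AVG(length_of_stay) AS avg_los,
         SUM(total_charges)  AS charges
  FROM discharge_fact
  GROUP BY year, hospital_id;
""")
agg_rows = conn.execute(
    "SELECT year, hospital_id, discharges, avg_los, charges "
    "FROM discharge_agg ORDER BY year, hospital_id"
).fetchall()
```

Queries at the hospital-by-year level can then read `discharge_agg` directly, which is the "physically ready and waiting" benefit described above.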
3.5.2. Indexing
Many database management systems intended for
data warehousing support bitmap index structures.
Bitmap indexes are especially suited to low cardin-
ality dimensions such as admission quarter, day of the
week, gender, and others. These indexes are space
efficient and speed the star queries that characterize
access to fine-grained structures such as the hospital
discharge data. Another technique is to cache the
smaller dimension tables in memory for improved
query performance. All of these techniques have been
employed and performance tuning continues to be an
ongoing activity as the user community grows and
explores new uses for the data warehouse.
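The idea behind a bitmap index can be illustrated with plain Python integers used as bit vectors. This is a conceptual sketch, not how any particular database system implements the structure; the function names and sample columns are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def matching_rows(bitmap):
    """List the row ids whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Two low-cardinality dimension columns over five fact rows.
gender = ["F", "M", "F", "F", "M"]
quarter = [1, 1, 2, 1, 2]
gender_idx = build_bitmap_index(gender)
quarter_idx = build_bitmap_index(quarter)

# A star query restricting both dimensions becomes a single bitwise AND.
hits = matching_rows(gender_idx["F"] & quarter_idx[1])
```

With few distinct values per column, only a handful of bitmaps are needed, which is why the structure suits dimensions such as gender or admission quarter.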
3.5.3. Partitioning
The third important performance tuning technique
is the use of physical table partitioning [6]. Table
partitions are important both for query performance
and for data warehouse management. Since
the data are loaded or staged at different times, these
activities can be isolated through partitioning. This
also allows preprocessing and data quality procedures
to be run on separate partitions. In addition, parti-
tioned indexes can also be used. One of the most
important benefits of partitioned tables is the oppor-
tunity for the optimizer to exclude large portions of
the data when queries include restrictions on parti-
tioning attributes. An excellent example of partitioned
tables in the CATCH data warehouse is the hospital
discharge data. In recent years, there have been
roughly 2 million discharge transactions/year. The
goal is to keep at least 10 years of discharge data or
20 million transactions available for analysis, but
often only a few years are necessary for any given
query, thereby creating an ideal parameter for parti-
tioning. The hospital discharge data is partitioned by
year, with roughly 1.5–2 million rows per partition. If
a query specifies a single year or a small range of
years, the optimizer can create an execution plan that
only searches the required partitions, leaving the vast
majority of data untouched. Since most of the detailed
interactive analyses fit this mold, the performance
tends to be quite good. However, the entire collection
of data is still available for queries that cover a wide
range of years; such queries simply take more time.
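Partition exclusion by year can be mimicked in a few lines. The class below is a toy model of the optimizer's behavior, with hypothetical names; a real DBMS performs this pruning inside its execution plan.

```python
from collections import defaultdict

class YearPartitionedTable:
    """Toy fact table that stores rows in one list per year."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, row):
        self.partitions[row["year"]].append(row)

    def query(self, first_year, last_year, predicate=lambda row: True):
        # Only partitions inside the requested range are scanned at all.
        scanned = [y for y in sorted(self.partitions) if first_year <= y <= last_year]
        rows = [r for y in scanned for r in self.partitions[y] if predicate(r)]
        return rows, scanned

table = YearPartitionedTable()
for year in range(1995, 2005):          # ten years of (tiny) discharge data
    table.insert({"year": year, "length_of_stay": year - 1990})

rows, scanned = table.query(2003, 2004)  # touches 2 of the 10 partitions
```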
4. Data staging and quality assurance
The extraction, transformation, and loading (ETL)
functions in a data warehouse are considered the most
time-consuming and expensive portion of the devel-
opment lifecycle [22]. These processes are concerned
with the extraction of data from legacy systems,
transformation and preprocessing requirements to
produce useful, integrated data, and the transportation
of the data into the actual data warehouse structures.
The CATCH data warehouse involves somewhat
unusual challenges with regard to data staging activ-
ities. The data are drawn from multiple organizations,
which usually apply in-house transformations to data
collected by yet another layer of organizations. For
instance, the hospital discharge data are originally
collected by hospitals and reported to the Florida
Department of Health. These data are then integrated,
preprocessed, and provided to other interested organ-
izations, including the CATCH data warehouse proj-
ect. In the case of demographic data, population levels
are extracted from the Florida Governor’s Office and
the Census Bureau. Overall, the data warehouse has
continued to grow without the need for a data purging
strategy. However, as the size continues to increase
and finer geographic levels are used, a purging strat-
egy will become necessary in the near term. The
design is already multi-level, as described in Fig. 4,
and it is the base of the data pyramid that accounts for
most of the space. As space becomes an issue, the
earlier years of fine-grained data will be purged and
retained offline. These structures are maintained as
physical partitions, so the purging operations can be
conducted without disrupting data access and data can
easily be re-introduced.
Two innovative techniques, twin star data staging
and data quality filters, have been developed to
manage the ETL processing required in the CATCH
data warehouse.
4.1. Twin star staging
Fig. 5 outlines the twin star staging process and its
three component stages. The approach is designed to
utilize the power of commercial database systems,
especially referential integrity constraints and excep-
tion processing. The various stages use a combination
of scripting languages, bulk-loading tools, and data-
base procedures.
4.1.1. Stage 1
The process begins with file-based preprocessing
and cleansing activities. These procedures can be
written in any programming language, but AWK
and Perl have been especially useful in the CATCH
project with their built-in parsing and pattern match-
ing capabilities. Data transformation, quality checks,
and simple reports can all be performed on the initial
data file. Even though many checks will be repeated
throughout the data staging process, the presence of
redundant checks is an asset with regard to data
quality. Stage 1 of the twin star strategy involves
using a bulk loader to move the data into a staging
table within the database system. The staging table is
designed for maximum flexibility in storing data,
minimizing data type conflicts, and providing a work-
bench for database-resident transformation proce-
dures. Typically this includes additional attributes
that are created as part of the preprocessing and
cleansing tasks. Bulk loading utilities are used to
quickly populate the staging table and capture prob-
lematic data in a series of log files. Data type, unique-
ness, and ‘‘not null’’ checks for critical staging table
attributes can be used to control the thoroughness of
this data staging step. With care, many simple data
quality issues can be resolved at this early stage.
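The Stage 1 checks (data type, uniqueness, and "not null") might look like the sketch below. The project used AWK, Perl, and bulk-loader log files; Python is substituted here purely for illustration, and the column names and the `stage1_check` function are assumptions.

```python
import csv
import io

# Hypothetical critical staging attributes.
REQUIRED = ("discharge_id", "hospital_id", "length_of_stay")

def stage1_check(lines):
    """Split input rows into accepted rows and (row, reason) rejects."""
    accepted, rejected, seen = [], [], set()
    for row in csv.DictReader(lines):
        if any(not row.get(col) for col in REQUIRED):
            rejected.append((row, "null in required column"))      # not-null check
        elif not row["length_of_stay"].isdigit():
            rejected.append((row, "length_of_stay is not an integer"))  # type check
        elif row["discharge_id"] in seen:
            rejected.append((row, "duplicate discharge_id"))       # uniqueness check
        else:
            seen.add(row["discharge_id"])
            accepted.append(row)
    return accepted, rejected

raw = io.StringIO(
    "discharge_id,hospital_id,length_of_stay\n"
    "1,10,4\n"
    "2,,3\n"
    "3,11,abc\n"
    "1,10,5\n"
)
accepted, rejected = stage1_check(raw)
```

The reject list plays the role of the loader's log files: problematic rows are captured with a reason rather than silently dropped.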
4.1.2. Stage 2
The temporary star shares the critical data dimen-
sions with the permanent star, and is essentially a
‘twin’ of the permanent star (though there may be
different supporting dimensions for particular tasks).
The fact table attributes and important dimensions
should be exact duplicates so that any operations or
referential integrity checks will be consistent between
the stars. Stage 2 entails moving the data from the
staging table to the temporary star. Attribute data
types should be compatible and referential integrity
constraints can be used to check for valid dimension
keys. The referential integrity constraints are disabled
before the load and re-enabled one at a time afterward,
performing the checks in one sweep and thereby improving
processing time. Most database systems provide a
method of capturing invalid rows and it is important
to make use of such capabilities during both the Stage
2 and 3 transfers. Since the temporary star is the
functional equivalent of the permanent star, just much
smaller, the interface and data browsing tools developed
for the actual data warehouse can be used to
exercise the temporary star. Test reports, browsing by
power users, and sanity checks based on comparisons
with previously loaded data in the permanent star are
all useful methods of ensuring high quality data in the
temporary star.
Fig. 5. Twin star staging.
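The Stage 2 transfer, with invalid dimension keys captured in an exception table, can be approximated as below. SQLite stands in for the commercial DBMS, and an explicit anti-join replaces the constraint re-enabling with exception capture that the text describes, so this is an analogy rather than the project's mechanism. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hospital_dim (hospital_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE staging   (discharge_id INTEGER, hospital_id INTEGER, length_of_stay INTEGER);
CREATE TABLE temp_fact (discharge_id INTEGER, hospital_id INTEGER, length_of_stay INTEGER);
CREATE TABLE exceptions (discharge_id INTEGER, hospital_id INTEGER, reason TEXT);
INSERT INTO hospital_dim VALUES (10, 'General'), (11, 'Memorial');
INSERT INTO staging VALUES (1, 10, 4), (2, 11, 3), (3, 99, 7);
""")

# Capture staging rows whose dimension key has no match (one sweep).
conn.execute("""
INSERT INTO exceptions
SELECT s.discharge_id, s.hospital_id, 'unknown hospital_id'
FROM staging s LEFT JOIN hospital_dim h ON s.hospital_id = h.hospital_id
WHERE h.hospital_id IS NULL
""")

# Move only the valid rows into the temporary star's fact table.
conn.execute("""
INSERT INTO temp_fact
SELECT s.* FROM staging s JOIN hospital_dim h ON s.hospital_id = h.hospital_id
""")

loaded = conn.execute("SELECT COUNT(*) FROM temp_fact").fetchone()[0]
bad = conn.execute("SELECT discharge_id, hospital_id FROM exceptions").fetchall()
```

Repeating the same check on the Stage 3 transfer should leave the exception table empty; any row found there signals a problem that survived the earlier stages.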
4.1.3. Stage 3
The permanent star is the long-term storage area
for the data warehouse. This star must be carefully
indexed, distributed across storage devices to avoid I/
O bottlenecks, and possibly partitioned. As noted
earlier, partitioned tables can provide performance
improvements by distributing information across
physical devices and by allowing the query optimizer
to select only the relevant partitions. In addition,
partitioned tables ease data warehouse management
tasks through creation, loading, and archiving of
independent partitions. The Stage 3 transfer from the
temporary star to the permanent star should be fast
and free of data type and referential integrity viola-
tions. The simple transfer will allow large volumes to
be processed within most load windows. Redundant
referential integrity constraints can be used as a final
check (again disabling and re-enabling for efficiency).
The resulting exception tables should be empty, but
any offending rows are a clear sign that somehow
problems survived Stages 1 and 2. This provides a last
opportunity to postpone releasing or publishing the
data.
4.2. Data quality filters
The data quality issues that surface while initially
constructing a data warehouse are among the most
challenging obstacles, contributing significantly to the
time spent in data staging activities. As noted above,
the ETL processes and quality assurance procedures
can account for the majority of time and resource
commitments in a data warehouse project. This has
been the case in the CATCH data warehouse project,
where there are a large number of data sources and
many intermediate stages for errors to be introduced.
In addition, the challenge of producing a truly inte-
grated design requires translations to common defi-
nitions and shared dimensions. Rather than any
‘‘magic bullet,’’ a long-term effort to develop a
comprehensive set of preprocessing procedures will
produce the best data quality. The procedures under
development on the CATCH project include a meas-
ure of redundancy to provide added insurance against
quality problems surviving various phases of the ETL
process. As more procedures have been added to the
quality assurance arsenal, an interesting structure has
emerged, mirroring the natural structure of the data
warehouse, with procedures falling into the following
categories of quality filters.
• Fact filters are the quality procedures used to
check the fine-grained data, such as hospital discharge
transactions. For example, any discrepancies between
itemized fees and total charges should be flagged.
Quality procedures at this level compare attributes
within a fact table row, or may compare between two
rows, but the focus is on fine-grained data.
• Aggregate filters include quality checks that
become possible only when the focus is on summaries
of fact-level answer sets. As we have seen, aggregates
are important for boosting performance, but they also
present data quality assurance opportunities. At this
level, ‘roll-up’ operations over important dimensions
allow aggregate averages, maximums, or other summaries
to be compared. With regard to hospital discharge
transactions, average lengths of stay, maximum costs,
or diagnostic volumes can all
be usefully compared by hospital and by year. That is,
large hospitals can be verified against each other and
new data can be compared against previous years.
Aggregate filters can be the basis for some very
powerful data quality procedures, effectively using
the capabilities existing in the data warehouse.
• Dimension filters are the procedures used to
investigate ‘dirty’ dimensions. For instance, many
business-oriented data warehouses include a customer
dimension that can be very large and may have severe
data quality problems [19]. Duplicate customer
entries, household matching, and data obsolescence
issues are among the problems inherent in such
dimensions [3,10]. In the CATCH data warehouse,
dimensions that must be carefully monitored include
hospitals, practitioners, and geographic entities such
as counties and communities. Dimension filters can be
used to monitor many problems with regard to chang-
ing dimensions.
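The fact filter example from the list above, flagging discrepancies between itemized fees and total charges, can be sketched as follows. The record layout, the tolerance, and the name `fact_filter` are illustrative assumptions.

```python
def fact_filter(transactions, tolerance=0.01):
    """Return ids of transactions whose itemized fees do not sum to total charges."""
    flagged = []
    for t in transactions:
        itemized = sum(t["items"].values())
        if abs(itemized - t["total_charges"]) > tolerance:
            flagged.append(t["id"])
    return flagged

transactions = [
    {"id": 1, "items": {"room": 1200.0, "lab": 300.0}, "total_charges": 1500.0},
    {"id": 2, "items": {"room": 900.0, "lab": 150.0}, "total_charges": 1250.0},
]
flagged = fact_filter(transactions)   # transaction 2 is off by 200.00
```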
These three broad categories of quality filters can
be further refined based on the type of comparison
being used. For instance, the intratuple filters involve
comparisons between attributes within a single record.
Of course, the record itself may be at an aggregate
level and represent a summary of a fact-level answer
set. For example, average pharmacy costs may have a
fairly predictable relationship with total charges for a
given disease. This type of comparison could be used
as a quality check within a given aggregate hospital
discharge record, an example of an intratuple aggre-
gate filter.
Comparisons across records, or intertuple filters,
provide a rich set of quality assurance opportunities
that examine relationships between fact table rows or
aggregates of these rows. An example of this type of
filter would be comparisons between disease volumes
by year. Unlikely disease distributions, after account-
ing for population growth, might indicate a data
quality problem with new data. The distinction
between intertuple and intratuple comparisons, com-
bined with the major filter categories, leads to six
interesting filter categories that seem to naturally
describe the many types of quality procedures being
built into our data warehouse.
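An intertuple aggregate filter of the kind just described, comparing disease volumes by year after accounting for population growth, might be sketched like this. The 25% threshold, the sample figures, and the function names are assumptions chosen only to make the example concrete.

```python
def volume_shift(prev_cases, prev_pop, new_cases, new_pop):
    """Relative change in cases per capita between two years."""
    prev_rate = prev_cases / prev_pop
    new_rate = new_cases / new_pop
    return (new_rate - prev_rate) / prev_rate

def intertuple_filter(prev, new, prev_pop, new_pop, threshold=0.25):
    """Return diseases whose per-capita volume moved by more than the threshold."""
    return sorted(
        disease
        for disease in prev.keys() & new.keys()
        if abs(volume_shift(prev[disease], prev_pop, new[disease], new_pop)) > threshold
    )

prev_year = {"diabetes": 1000, "asthma": 400}
new_year = {"diabetes": 1040, "asthma": 800}
suspicious = intertuple_filter(prev_year, new_year, prev_pop=100_000, new_pop=102_000)
```

The small rise in diabetes cases is absorbed by population growth, while the doubling of asthma cases is flagged for review before the new data are released.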
An additional quality assurance strategy involves
comparing data warehouse aggregates with known
summaries published by outside sources. This process
can best be described as a quality benchmark, where
externally derived data is used to check internal data
warehouse procedures. This type of quality procedure
usually includes permanent data quality tables popu-
lated with externally produced data summaries based
on published reports or spreadsheet calculations. For
instance, state-level reports on the number of specific
disease occurrences provide a benchmark for data
warehouse aggregates based on the underlying hospi-
tal data. Automated comparison procedures report
only reasonably large departures based on user-defined
thresholds. Quality benchmarks are particularly
important as ongoing development activities
yield both larger volumes of data and new
aggregation procedures. Before new versions of procedures
or interface tools are deployed, historical
quality benchmarks can be used to evaluate their
performance. Both quality benchmarks and filters
are part of the substantial infrastructure necessary to
meet data quality goals. These tools account for a
significant portion of the CATCH data warehouse
development effort [2].
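A quality benchmark comparison with a user-defined threshold can be sketched as below. The published counts, the 5% threshold, and the name `benchmark_report` are hypothetical; in practice the published values would come from the permanent data quality tables.

```python
def benchmark_report(warehouse, published, threshold=0.05):
    """Return {key: relative departure} for aggregates that miss the published value."""
    report = {}
    for key, expected in published.items():
        actual = warehouse.get(key, 0)
        departure = abs(actual - expected) / expected
        if departure > threshold:          # report only reasonably large departures
            report[key] = round(departure, 3)
    return report

# Externally published state-level counts versus warehouse aggregates.
published = {("diabetes", 2001): 5000, ("asthma", 2001): 2000}
warehouse = {("diabetes", 2001): 5040, ("asthma", 2001): 1700}
report = benchmark_report(warehouse, published)
```

Running the same comparison against historical benchmarks before deploying a new aggregation procedure gives a simple regression check.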
5. CATCH data warehouse applications
The data warehouse is used to support a variety of
activities, from automating the original CATCH
reports to supporting current health care research
initiatives. The CATCH methods provided a solid
foundation for the initial implementation efforts, but
as components have been added the synergies have
opened new application opportunities. Clearly, the
human–computer interface is of paramount impor-
tance in the data warehouse environment and the
primary determinant of success from the end-user
perspective [1]. In order to support analysis and
reporting tasks, the data warehouse must have high
quality data and make that data accessible through
effective interface technologies. The act of releasing
data in a warehouse is in a very real sense the same as
publishing that data in printed form—retractions in
both media can be very painful.
5.1. Producing CATCH reports
CATCH reports have been refined over the past
decade in the field. The field expertise available in the
interdisciplinary research team infused the require-
ments process and provided a clearly identifiable goal
as the first step in data warehouse construction.
Hundreds of stored procedures, as well as the design
itself, implement many aspects of this domain exper-
tise. The stored procedures generate the health status
indicators and move them upward in the data access
pyramid for final report production. The reports allow
quick and easy access to comprehensive summaries
and more detailed collections of information from the
data warehouse using standard report writing technol-
ogies. This type of pre-defined and thorough reporting
is critical for implementing a more automated CATCH
report and will probably be the preferred format for
many users. For example, comparisons between
target counties and peer counties, as well as state
averages, are fundamental components of the original
CATCH reports and important tools for community
health care planners. In addition, current and historical
trend information is provided on fact sheets for each
health indicator. The final reports are really reference
books (numbering over 300 pages) with several major
parts, such as comparisons, fact sheets, and prioritized
lists.
New features that move beyond the original
CATCH reports include components that enable
user-defined communities to supplement the tradi-
tional county-level perspective. Users can define
smaller communities based on geographic or demo-
graphic criteria, with community fact sheets providing
an exploded view of selected health status indicators.
While CATCH has traditionally focused on large
hardcopy reports, the reports are now produced
directly in Web-friendly formats for electronic distri-
bution. The advantage of this approach is that a strong
methodological structure can be retained as the reports
are much more widely distributed. The interested
reader can refer to the Center for Health Outcomes
Research (CHOR) Web site for examples of current
reports [chor.hsc.usf.edu].
In addition to static reports, the high-level compo-
nents of the data warehouse can be accessed dynam-
ically using data browsing tools. It is usually possible
to constrain the navigation, while still providing
enough freedom to explore many more perspectives
than can be accommodated in a traditional report. Fig.
6 shows an online analytic processing (OLAP) tool
being used to browse through trend information for
specific indicators at the county level. Most of these
tools can support both desktop and Web browser
access, making this an important new avenue for data
dissemination.
5.2. Investigating health care issues
Data warehouse browsing tools provide star query-
like access through a flexible menu-based interface,
with pull-down menus representing important dimen-
sions. These types of tools are easy to use and support
some ad-hoc exploration, but are usually controlled
through an administrative layer that determines the
data available to end-users. In developing a flexible
interface, there is a tradeoff between the ability to
express ad-hoc queries and the ease-of-use that results
from pre-defined constructs implemented by data
warehouse designers and administrators. Of course,
SQL can provide an ad-hoc query facility, but it requires
some care in the data warehouse environment, where
very large tables and ill-formed queries conspire to
sharply degrade performance. In addition, use of SQL
by casual users often produces incorrect queries that
return erroneous results from the data warehouse.
As noted above, OLAP tools can be used to
empower standard report users and allow simple
navigation through many more views than can be
produced using traditional reporting tools, yet still
curtail unwanted operations.
Fig. 6. Browsing screen for community indicators.
A second, and in some ways more important, role
for the browsing tools is to provide a flexible interface
for more customized analysis. Health care issues high-
lighted by preliminary reports can be investigated
more fully using the finer levels of detail maintained
in the data warehouse. These tasks might entail query-
ing the true dimensional star schemas that include age,
gender, race, and other dimensions, or even the event-
oriented data, such as hospital discharges. These data
warehouse resources support much more detailed
analyses, allowing the user to focus on issues such as
differences in age or race with regard to specific health
status indicators. Once decision-makers review the
CATCH report, they may have community-specific
issues that relate to the diverse population groupings
that inevitably fall within somewhat arbitrary political
boundaries. Dealing effectively with such important
issues requires a more careful and focused analysis that
is precluded at the higher levels of aggregation that
make up the generic CATCH reports.
Fig. 7. Browsing screen for hospital disease indicators.
A current research initiative involves the explora-
tion of volume and cost-related issues in health out-
comes. Data browsing tools are used for exploratory
analysis. Used in this manner, OLAP tools provide a
first step in the data mining process [13]. Fig. 7
illustrates a browsing screen in which detailed vol-
ume, length of stay, and cost data are presented for
specific hospitals, or groups of hospitals. In addition
to tabular representations, these tools provide graphic
capabilities that support simple data visualization.
5.3. Security issues
Currently, dynamic access to the CATCH data
warehouse is restricted to the development team and
associated health care researchers. Obviously, some of
the data may be sensitive in nature. Data security is an
important issue in the health care environment and the
CATCH data warehouse attempts to balance the
information requirements for local health care plan-
ning with critical security issues. It is important to
note that no patient identifiers of any kind are incor-
porated within the data warehouse. Security issues are
mostly concerned with reporting rare events in geo-
graphic areas that might allow a person to be identi-
fied through other data sources. Detailed security
policies provide guidance for the manipulation and
reporting of health care data in the data warehouse.
6. Conclusions
The CATCH data warehouse can have an important
impact on our health status by making rigorous, quan-
titative information available to health care decision
makers in local, state, national, and international com-
munities. In this paper, we have described some of the
technical challenges faced in designing and implement-
ing a data warehouse for health care information. We
have presented innovative research contributions in the
areas of data warehouse design, data staging for ETL
processing, data quality assurance, and health care data
warehouse applications.
The CATCH data warehouse is now fully func-
tional. For example, it has been recently used to
produce a comprehensive CATCH report for Miami–Dade
County, Florida’s largest county. As part of
this report we were asked to provide more detailed
assessments of the eight commission districts within
the county. The flexibility of the data warehouse to
provide customized reporting allowed us to provide
these analyses rapidly and effectively. Because this
report was the first to be fully automated, we verified
the accuracy of the report with a complete hand check
of every data table. The discrepancies between the
automated data tables and the manually derived tables
were minimal and easily reconciled.
The CATCH data warehouse remains a work in
progress. We are pursuing an active research agenda
to enhance the technical data warehousing capabilities
and community health care applications. We invite the
reader to follow the progress of the data warehouse at
our CHOR web site [chor.hsc.usf.edu]. In the next
sections (Sections 6.1 and 6.2), we briefly review our
current research directions.
6.1. Data warehouse research directions
The CATCH data warehouse provides a rich
research environment for focused investigation in
the following areas.
• Data Warehouse Design: The variety and volatility
of health care data sources make the maintenance
of the data warehouse design a true challenge.
Changes to source data formats frequently require the
updating of dimension table schemas. Often historical
data cannot be placed into the new format without
information loss. Finding solutions for maintaining
historical accuracy while providing efficient use of all
data in current applications is difficult. We are
researching design techniques to minimize the impact
of dimension table changes on the maintenance and
operations of the data warehouse [2].
• Data Staging: As presented in Section 3, we
have implemented an innovative twin star data staging
procedure. Ongoing research will study the perform-
ance of twin star data staging on various data loads.
Enhancements to the procedure will be proposed and
implemented.
• Data Quality: Issues of data quality dominate
our research agenda. The health care field places
particular emphasis on data accuracy, timeliness, pri-
vacy, and ease of use [27]. We are in close contact
with the communities who receive the CATCH reports
and have interviewed a number of users to elicit data
quality requirements. This information will be used to
drive our research to improve data quality in the
CATCH data warehouse.
• Data Dissemination: The technologies for dis-
seminating CATCH reports to communities are rap-
idly evolving. The requirements of the receiving
communities and the capabilities of the data ware-
house system will drive future research directions.
• Data Mining: We are aggressively investigating
several areas of knowledge discovery in the CATCH
data warehouse [11]. We have a unique capability to
perform detailed studies in such areas as physician and
hospital volume, racial disparities in health care, and
environmental impacts on community health status.
6.2. Community decision-making with CATCH data
The CATCH data warehouse will result in wide-
spread distribution of data previously unavailable to
most communities, as well as online access for spe-
cialized inquiry. Many issues arise as to how the
communities will make the most effective use of the
CATCH data for health care decision-making. This is
an area with considerable research potential.
There is a rich literature on the decision-making
process both with and without information technol-
ogy. The study of group decision support systems and
environments has a strong tradition in the manage-
ment information systems field [8]. In many ways,
this important body of work is appropriate to health
care decision-making, which is usually group-ori-
ented. For example, Dennis et al. [9] study the effects
of minority influence on decision-making and find
that the presence or absence of technology has very
different effects. Another important contributing area
would be the political process and its ramifications for
decision-making [20]. Certainly, policy making in
health care is very much a political process.
The use of the CATCH methodology and state-of-
the-art data warehousing technology across many
Florida communities will provide a rich research
opportunity for studying interesting issues on group
decision-making in community health care organiza-
tions. Such issues would include the composition of
the decision-making group, the community stakehold-
ers and their political influence, the decision-making
process, and dissemination patterns of health care
information in the community. The complexities and
the interrelationships among these issues make the
design of research studies both a challenge and an
opportunity. As the automated CATCH reports are
produced for various communities in Florida, we will
study how effectively the CATCH information is used
for health care planning.
Acknowledgements
The authors gratefully recognize the U.S. Depart-
ment of Commerce, which has provided funding
through a Technology Opportunities Program (TOP)
grant. The Florida Department of Health has been a
research partner. Research collaborators in the College
of Public Health include R. Campbell, E. Gilbert, S.
Luther, B. Myers, and B. Steverson. Contributing
graduate students in the College of Business Admin-
istration include S. Hedge-Desai, R. Marsh, D.
McCorkel, M. Nevrekar, M. Pearl, R. Rajendrababu,
and J. Slayton. The authors also thank Oracle
Corporation for making their state-of-the-art develop-
ment tools available through the Oracle Academic
Initiative.
References
[1] D. Berndt, A. Hevner, J. Studnicki, Data warehouse dissemination
strategies for community health assessments, informatik/informatique,
Journal of the Swiss Informatics Society (1) (February 2001) 27–33.
[2] D. Berndt, J. Fisher, A. Hevner, J. Studnicki, Healthcare data
warehousing and quality assurance, IEEE Computer 34 (12)
(December 2001) 33–42.
[3] D. Berndt, R. Satterfield, Customer and household matching:
resolving entity identity in data warehouses, Proceedings of
AeroSense 2000, Conference on Data Mining and Knowledge
Discovery, Orlando (April 2000).
[4] Centers for Disease Control and Prevention, Principles of
Community Engagement, 1997, Atlanta.
[5] D. Chrislip, C. Larson, Collaborative Leadership: How Citi-
zens and Civic Leaders Can Make a Difference, Jossey-Bass,
San Francisco, 1994.
[6] M. Corey, M. Abbey, Oracle Data Warehousing, Oracle Press
and Osborne McGraw-Hill, New York, 1997.
[7] S. Cropper, Collaborative working and the issue of sustainability,
in: C. Huxham (Ed.), Creating Collaborative Advantage,
SAGE Publishers, London, 1996.
[8] A. Dennis, Information exchange and use in group decision
making: you can lead a group to information but you can’t
make it think, MIS Quarterly 20 (4) (1996) 433–458.
[9] A. Dennis, K. Hilmer, N. Taylor, Information exchange and
use in GSS and verbal group decision making, Journal of MIS
14 (3) (1998) 61–88.
[10] D. Dey, S. Sarkar, P. De, A probabilistic decision model for
entity matching in heterogeneous databases, Management Science
44 (10) (October 1998) 1379–1396.
[11] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy
(Eds.), Advances in Knowledge Discovery and Data Mining,
The AAAI Press, Menlo Park, CA, 1996.
[12] P. Gray, H. Watson, Decision Support in the Data Warehouse,
Prentice-Hall, Englewood Cliffs, NJ, 1998.
[13] J. Han, M. Kamber, Data Mining: Concepts and Techniques,
Morgan Kaufmann Publishers, San Francisco, 2001.
[14] V. Harinarayan, A. Rajaraman, J. Ullman, Implementing data
cubes efficiently, Proceedings of the 1996 ACM SIGMOD,
Montreal (June 1996).
[15] W. Inmon, Building the Data Warehouse, Wiley, New York,
1992.
[16] Institute of Medicine, Summary of recommendations, in: W.
Waterfall (Ed.), The Future of Public Health, National Academy
Press, Washington, DC, 1988.
[17] Institute of Medicine, Healthy Communities: New Partner-
ships for the Future of Public Health, National Academy Press,
Washington, DC, 1996.
[18] Institute of Medicine, Measurement tools for a community
health improvement process, in: J. Durch, L. Bailey, M. Stoto
(Eds.), Improving Health in the Community, a Role for Per-
formance Monitoring, National Academy Press, Washington,
DC, 1997.
[19] R. Kimball, The Data Warehouse Toolkit, Wiley, New York,
1996.
[20] H. Mintzberg, The Nature of Managerial Work, Harper and
Row, New York, 1973.
[21] H. Nakajima, Editorial: new players for a new era, World
Health 50 (3) (1997) 3.
[22] J. Srivastava, P. Chen, Warehouse creation—a potential
roadblock to data warehousing, IEEE Transactions on Knowledge
and Data Engineering 11 (1) (1999) 118–126.
[23] B. Starfield, Primary care and health: a cross-national
comparison, Journal of the American Medical Association 266 (16)
(1991) 2268–2271.
[24] J. Studnicki, Evaluating the performance of public health
agencies: information needs, American Journal of Preventive
Medicine, Research and Measurement in Public Health Prac-
tice 11 (6) (1995) 74–80.
[25] J. Studnicki, B. Steverson, B. Myers, A. Hevner, D. Berndt,
Comprehensive assessment for tracking community health
(CATCH), Best Practices and Benchmarking in Healthcare 2
(5) (September/October 1997) 196–207.
[26] J. Studnicki, A. Hevner, D. Berndt, S. Luther, Comparing
alternative methods for composing community peer groups:
a data warehouse application, Journal of Public Health Man-
agement and Practice 7 (6) (November 2001) 87–94.
[27] R. Wang, M. Ziad, Y. Lee, Data Quality, Kluwer Academic
Publishing, New York, 2001.
[28] J. Weissman, C. Gatsonis, A. Epstein, Rates of avoidable
hospitalization by insurance status in Massachusetts and Maryland,
Journal of the American Medical Association 268 (17)
(November 1992) 2388–2394.
[29] World Health Organization, The World Health Report 1995:
Bridging the Gaps, Report of the Director-General, Geneva,
1995.
Donald J. Berndt is an Assistant Professor
in the Information Systems and Decision
Sciences Department in the College of Busi-
ness Administration at the University of
South Florida. His research interests include
data warehousing, knowledge discovery,
and data mining. Dr. Berndt received a
PhD in Information Systems from the Stern
School of Business at New York University.
He is a member of Beta Gamma Sigma,
ACM, AIS, and INFORMS.
Alan R. Hevner is an Eminent Scholar and
Professor in the Information Systems and
Decision Sciences Department in the Col-
lege of Business Administration at the
University of South Florida. He holds the
Salomon Brothers/HRCP Chair of Distrib-
uted Technology. His research interests
include software engineering, software test-
ing, distributed database systems, and health
care information systems. He received a
PhD in Computer Science from Purdue
University. Dr. Hevner is a member of ACM, IEEE, AIS, and
INFORMS.
James Studnicki is a Professor of Health
Policy and Management at the University
of South Florida College of Public Health.
His research interests include measuring the
health status of communities, evaluating
alternative treatment outcomes, and study-
ing the influence of managed care penetra-
tion on the utilization and quality of health
services. Dr. Studnicki received a ScD from
Johns Hopkins University.
D.J. Berndt et al. / Decision Support Systems 35 (2003) 367–384

বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
eBook.com.bd (প্রয়োজনীয় বাংলা বই)
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
GeorgeMilliken2
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
National Information Standards Organization (NISO)
 

Recently uploaded (20)

South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.PCOS corelations and management through Ayurveda.
PCOS corelations and management through Ayurveda.
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Cognitive Development Adolescence Psychology
Cognitive Development Adolescence PsychologyCognitive Development Adolescence Psychology
Cognitive Development Adolescence Psychology
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
 
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdfবাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 

Chapter 51. Your company needs a small front-end loader .docx

  • 1. Chapter 5
    1. Your company needs a small front-end loader for handling bulk materials at the Wide place plant. It can be leased from the dealer for three years for $4050 per year including all maintenance. It can also be purchased for $14,000. You expect the loader to last for six years and to have a salvage value of $3000. You predict that maintenance will cost $400 the first year and increase by $200 per year in each year after the first. Your MARR is 15% per year. (a) Use AW analysis to determine whether to lease or buy the loader.
    2. You have identified three alternatives for a small project at your plant. Any of the alternatives would save about $30,000 per year in operating costs. (a) Use AW analysis and an MARR of 15% per year to determine which alternative to select.
       Alternative              F         G         H
       Initial Cost, $          40,000    50,000    60,000
       Salvage Value, $         4000      6000      9000
       Annual Cost, $/year      8000      6000      4000
       Life, years              3         4         5
  • 2.
    3. ABC Drinks purchases its 355 ml cans in large bulk from China. The finish on the anodized aluminum surface is produced by mechanical finishing technologies called Brushing and Bead Blasting. Use MARR = 8%.
       Alternatives
       Brush: P = -$400,000, n = 10 years, S = $50,000, AOC = -$50,000 in year 1, decreasing by $5000 annually starting in year 2
       Bead Blasting: P = -$300,000, n = permanent, S = 0, AOC = -$50,000
       Select between the two alternatives.
    4. A contractor has been awarded the contract to construct a six-mile-long tunnel in the mountains of western Wisconsin. During the five-year period, the contractor will need water from the nearby stream. He will construct a pipeline to convey the water to the main construction yard. An analysis of the various pipe sizes is as follows:
       Pipe size                              2"        3"        4"        6"
       Installed cost of pipeline and pump    $22000    $23000    $25000    $30000
       Cost per hour for pumping              $1.20     $0.65     $0.50     $0.40
       The pipe and the pump will have a salvage value at the end of five years equal to the cost to remove them. The pump will operate 2000 hours per year. The lowest rate at which the contractor is willing to invest is 7%. Select the best pipe using annual worth.
  • 3.
    5. The expansion of the Wide-place Mall is delayed over the issue of parking. There is not enough now to support the new facility and more must be added. Let's suppose that there are 3 options: buying more land, filling wetlands at the rear of the site, or building a multilevel garage on the present lot. Assume a forty-year planning horizon and an interest rate of 9% per year. Use annual worth analysis and the data below to determine which option should be selected.
                                     Purchase Land    Fill Wetlands    Garage
       Initial Cost, $               12,000,000       19,000,000       44,000,000
       Annual Benefit, $ per year    0                0                4,000,000 (parking fees)
       Annual Cost, $ per year       200,000          160,000          2,900,000
  • 4. Text Book Problems
    6. Problem 5.24
    7. Problem 5.28 (Use a spreadsheet only to solve this)
    Chapter 6
    8. Two years ago, you bought 100 shares of XYZ stock at $60 per share. The stock paid a dividend of $6 per share per quarter. If you sell the shares now for $98 per share, what is your annual ROR on this investment?
    9. At the end of 1987, you bought a piece of land for $35,000. In addition to the $35,000, you paid $1,700 in closing costs (costs associated with the purchase and title registration). For the years 1988 through 2002, you paid, on average, $950 in property taxes at the end of each year. At the end of 2003, you sold the land for $120,000. At sale time, you paid a 6% commission to the realtor and $1,600 was your share of the closing costs. What was the ROR on this investment?
    10. UW-Stout is considering which of two devices to install to reduce costs in a particular situation. Both devices (A and B) cost $1000, have useful lives of 10 years, and have no salvage value.
        Device A: Annual savings of $300
        Device B: Annual savings of $400 in the first year, declining by $50 annually
        If MARR is 7%, which device should Stout purchase? Use (A-B).
  • 5.
    11. In your uncle's will, you are to choose one of the following two alternatives:
        Alternative 1: $2000 cash
        Alternative 2: $150 now plus $100 per month for twenty months
        a. At what rate of return are the two alternatives equivalent?
        b. If you think the rate of return in (a) is too low, which alternative will you select?
    Chapter 7
    12. Two machines are considered for purchase. Assume 10% interest. Use benefit-cost analysis.
                                  Machine X    Machine Y
        Initial Cost              $200         $700
        Uniform annual benefit    $95          $120
        Salvage value             $50          $150
        Useful life in years      6            12
        a. Which machine should be bought?
        b. List the decision guideline for a single project
        c. List the selection rule for incremental analysis
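As a hedged illustration of how a benefit-cost comparison like Problem 12 can be set up, the sketch below converts each machine's first cost and salvage value into an equivalent uniform annual cost at 10% and divides the uniform annual benefit by it. Two choices here are assumptions of the sketch rather than requirements stated in the problem: salvage is treated as a reduction of cost, and the unequal lives are handled by comparing annual equivalents (which presumes each machine is replaced in kind). The function and variable names are illustrative only.

```python
def capital_recovery(i, n):
    """(A/P, i, n): spreads a present amount over n equal annual amounts."""
    growth = (1 + i) ** n
    return i * growth / (growth - 1)


def sinking_fund(i, n):
    """(A/F, i, n): converts a future amount into n equal annual amounts."""
    return i / ((1 + i) ** n - 1)


def annual_cost(first_cost, salvage, i, n):
    """Equivalent uniform annual cost, with salvage counted as a cost reduction."""
    return first_cost * capital_recovery(i, n) - salvage * sinking_fund(i, n)


RATE = 0.10  # interest rate given in Problem 12

# Data copied from the Problem 12 table.
machines = {
    "X": {"first_cost": 200, "benefit": 95, "salvage": 50, "life": 6},
    "Y": {"first_cost": 700, "benefit": 120, "salvage": 150, "life": 12},
}

costs = {
    name: annual_cost(m["first_cost"], m["salvage"], RATE, m["life"])
    for name, m in machines.items()
}

# Single-project check: an alternative is acceptable when B/C >= 1.
bc = {name: machines[name]["benefit"] / costs[name] for name in machines}

# Incremental check: is the extra annual cost of Y over X justified by the
# extra annual benefit? Keep the cheaper machine unless the ratio is >= 1.
incremental_bc = (machines["Y"]["benefit"] - machines["X"]["benefit"]) / (
    costs["Y"] - costs["X"]
)
choice = "Y" if incremental_bc >= 1 else "X"

for name in machines:
    print(f"Machine {name}: annual cost = {costs[name]:.2f}, B/C = {bc[name]:.2f}")
print(f"Incremental B/C (Y over X) = {incremental_bc:.2f}; select machine {choice}")
```

The incremental step matters because both machines can pass the single-project test while the more expensive one still fails to justify its additional cost.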
  • 6.
    13. Which of the following alternatives will you select using the benefit-to-cost ratio?
        A: First cost = $560, Annual benefit = $140, Salvage value = $40
        B: First cost = $340, Annual benefit = $100, Salvage value = $0
        C: First cost = $120, Annual benefit = $40, Salvage value = $40
        Each alternative has a useful life of 6 years. Assume MARR = 10%.

    The Catch data warehouse: support for community health care decision-making
    Donald J. Berndt a, Alan R. Hevner a,*, James Studnicki b
    a Information Systems and Decision Sciences Department, College of Business Administration, 4202 Fowler Ave., CIS1040,
  • 7. University of South Florida, Tampa, FL 33620, USA
    b College of Public Health, University of South Florida, Tampa, FL 33620, USA
    Accepted 1 April 2002

    Abstract
    The measurement and assessment of health status in communities throughout the world is a massive information technology challenge. Comprehensive Assessment for Tracking Community Health (CATCH) provides systematic methods for community-level assessment that is invaluable for resource allocation and health care policy formulation. CATCH is based on health status indicators from multiple data sources, using an innovative comparative framework and weighted evaluation process to produce a rank-ordered list of critical community health care challenges. The community-level focus is intended to empower local decision-makers by providing a clear methodology for organizing and interpreting relevant health care data. Extensive field experience with the CATCH methods, in combination with expertise in data warehousing technology, has led to an innovative application of information technology in the health care arena. The data
  • 8. warehouse allows a core set of reports to be produced at a reasonable cost for community use. In addition, online analytic processing (OLAP) functionality can be used to gain a deeper understanding of specific health care issues. The data warehouse in conjunction with Web-enabled dissemination methods allows the information to be presented in a variety of formats and to be distributed more widely in the decision-making community. In this paper, we focus on the technical challenges of designing and implementing an effective data warehouse for health care information. Illustrations of actual data designs and reporting formats from the CATCH data warehouse are used throughout the discussion. Ongoing research directions in health care data warehousing and community health care decision-making conclude the paper.
    © 2002 Elsevier Science B.V. All rights reserved.
    Keywords: Health care information systems; Data warehousing; Data staging; Online analytic processing (OLAP); Decision support systems; Community decision-making; Data quality

    1. Introduction
    The United States spends over a trillion dollars
  • 9. annually on health expenditures. Both as a percentage of national productivity and per capita, health care spending by the United States exceeds that of any other nation in the world. However, this tremendous expenditure has not secured the U.S. a rank among the 'healthiest' nations. In fact, for many health indicators, such as infant mortality and measles immunizations, the U.S. ranks below some countries characterized as underdeveloped [23,29]. Prolonged public debates on health care policy in the United States have focused on insurance coverage and medical care financing programs without any serious examination of the true health status of the nation.
    Decision Support Systems 35 (2003) 367–384. 0167-9236/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. doi:10.1016/S0167-9236(02)00114-8. www.elsevier.com/locate/dsw
    * Corresponding author. E-mail addresses: [email protected] (D.J. Berndt), [email protected] (A.R. Hevner), [email protected] (J. Studnicki).
  • 10. The need to assess the health status of U.S. communities in a comprehensive and systematic manner has been widely recognized within the health professions. The Institute of Medicine (IOM) of the National Academy of Sciences has acknowledged the importance of a population-based perspective in two influential reports, emphasizing the need for a regular and systematic collection, assemblage, and analysis of the health status of our nation's communities [16,18]. A community health profile is comprised of socio-demographic characteristics, health status and quality of life indicators, health risk factors, and health resource measures. The intent of such a
  • 11. comprehensive health profile is to assist a community in developing, refining, and monitoring a long-term strategic view of its overall health status.
    Although there are many sources of health data, there are no standard data definitions, formats, or reports across the health care industry. Thus, health care data are widely used (and misused) in an ad-hoc manner to justify managerial objectives of health institutions and agencies, a maze of mandated categorical funding, and a variety of political agendas. Sound information and accepted analytic techniques are even more important as funding is consolidated in block grants and local community decision-making is emphasized.
    As part of the ongoing clarification of the public health role at the community level and the transition from a disease to a health focus and from a treatment to a prevention strategy, there has been recognition
  • 12. that partnerships and collaboration are necessary to support effective action [17,21]. Health organizations, public sector agencies, medical care providers, businesses, the religious community, educational institutions, and other community organizations are interdependent components of a multi-sectoral community health environment. The overall community must be empowered to make the necessary, and sometimes difficult, resource allocation choices to improve health through information, education, behavior change, and social support [7]. Such collaborative action at the community level must be informed by unbiased data describing the community's health status, needs, and resources. The ability is also needed to track progress over time to meet the community's health care goals [24].
    The gap between current practice in community health care spending and the above goals of collaborative
  • 13. community health care decision-making is vast. The availability and quality of health indicators are problematic. There is little empirical evidence on the use, sharing, or integration of health data into decision-making to provide guidance to community health organizations. While most of the literature on collaborative leadership and community engagement emphasizes the process [4,5], little attention has been focused on the effect of the availability of a common set of data, such as the community health profile, on the quality and inclusiveness of decision-making. There is also scant information about the use of data and information technology to support and monitor the process.
    The purpose of this paper is to present an overview of the Comprehensive Assessment for Tracking Community Health (CATCH) methods [25] and then focus on the construction of a comprehensive health care
  • 14. data warehouse that provides automated support for CATCH. The combination of extensive field experience with CATCH and the application of current data warehousing technology make this an innovative interdisciplinary research effort. Section 2 briefly presents the CATCH methods and our motivation for building a data warehouse. In Section 3, we present a detailed discussion of the technical challenges in designing and implementing the data warehouse. Twin star data staging, an effective approach for ensuring quality as data are entered into the warehouse, is highlighted in Section 4. Section 5 discusses the use of the data warehouse for advanced health care applications. The paper concludes with future research directions on data warehousing technical challenges and the use of health profiles to support improved community health care decision-making.
  • 15. 2. The CATCH methods of community assessment
    The University of South Florida's Center for Health Outcomes Research (CHOR) developed CATCH to provide comprehensive and objective health status data for community health planning purposes. CATCH collects, organizes, analyzes, prioritizes, and reports data on over 250 health and social indicators on a local community basis. The CATCH methods have been tested, refined, and validated in the field over the past 10 years. Reports have been prepared for more than 20 U.S. counties both within and outside of Florida.
    D.J. Berndt et al. / Decision Support Systems 35 (2003) 367–384
    The CATCH process can be briefly described as shown in Fig. 1. Community health indicator data are gathered from a variety of sources. Secondary data sources include health care data reported by hospitals,
  • 16. local, state, and federal health agencies, and national health care groups. Primary data sources would involve data gathered from door-to-door or mail-in surveys. All health care data are translated into common formats and integrated with other data warehouse components to support the production of health care report cards.
    Over 250 indicators are used within CATCH and are organized into 10 indicator categories. These indicators and categories represent a wide spectrum of health care issues and have evolved through both research and field practice. Table 1 lists the 10 indicator categories and presents a few representative indicators to lend a sense of perspective to the level of detail provided in CATCH reports. These indicators are collected from a variety of sources.
    Each indicator value is compared against the state average, an average from a peer group of counties, and other interesting values (e.g., a national goal for
  • 17. that indicator) [26]. The results of these comparisons are organized into a multi-dimensional matrix based on favorable or unfavorable comparisons against each comparison dimension. Fig. 1 shows a 2-by-2 comparison matrix based on state averages and peer averages.
    Fig. 1. The CATCH process.
  • 18. Table 1
    Ten indicator groups with representative indicators
    Indicator group                            Representative indicators
    Demographic Characteristics                Total Population; Racial Composition; Net Migration
    Socioeconomic Characteristics              Employment; High School Dropouts; Per Capita Income
    Maternal and Child Health                  Infant Mortality; Low Birthweight; Birth Defects Mortality
    Social and Mental Health                   Domestic Violence; Homicide Rate; Psychiatric Admissions
    Physical Environmental Health              Foodborne Outbreaks; Contaminated Wells; Lead Poisoning
    Health Status: Morbidity and Mortality     Breast Cancer; Cardiovascular Disease; Stroke
    Sentinel Events                            Rubella; Measles; Late Stage Cancer; Avoidable Hospitalizations
    Health Resource Availability               Licensed Hospital Beds; Licensed Medical Doctors; Licensed Registered Nurses
    Infectious Disease                         Syphilis; AIDS; Hepatitis
    Behavioral Risk Factors                    Smoking; Obesity; Mammograms
  • 19. Community indicators that demonstrate unfavorable comparisons on all dimensions are highlighted as community health challenges. After this simple comparison, the health care challenges are prioritized using a set of five filters.
    - Number Affected: number of persons in the community affected by the indicator.
    - Economic Impact: an estimate of the direct cost per case for individuals affected by the indicator.
    - Availability of Efficacious Intervention: an estimate of the relative degree to which treatment or prevention is likely to be effective.
    - Magnitude of Difference: the degree to which the community indicator is worse than the dimensional comparisons.
    - Trend Analysis: for a 5-year period, is the trend favorable or unfavorable and what is the magnitude
  • 20. of change in the trend direction?
    The community stakeholders are given an opportunity to weight the importance of each of the above factors. The final product of the CATCH methodology is a comprehensive, prioritized listing of community health care challenges. A more detailed description of the CATCH methodology with a complete listing of health care indicators can be found in Ref. [25].

    2.1. Limitations
    While the value of CATCH is incontrovertible, the ultimate deployment of CATCH throughout Florida and the nation has been constrained by several serious limitations:
    - The handcrafted process is labor-intensive and slow. Hundreds of individual sources of data must be identified and contacted. Data are often provided in hard copy formats and must be manually checked, validated, and entered into spreadsheets. With manual methods, it takes 3–4 months to complete a CATCH
  • 21. report for a single county.
    - Longitudinal trend analyses over many years are cost prohibitive for most communities. Since each application is expensive and time-consuming, the capability to fund and produce annual assessments in a single community is limited.
    - Most public health funding comes from state and federal governments. A statewide CATCH assessment would help to prioritize funding and serve to enable effective program evaluation based on quantifiable outcome assessment. Since nearly all data indicators available in Florida are available in most other states, there is reason to be confident that CATCH will be expanded nationally and even internationally.
    - With the massive amount of health care data involved, many interesting relationships and correlations between health status indicators can be found and investigated. In the manual system, such discovery was not feasible. A comprehensive and integrated
  • 22. data warehouse provides the infrastructure for such data mining efforts.

    2.2. CATCH data warehouse challenges
    The application of data warehousing technologies for the automated support of CATCH holds tremendous promise. The remainder of this paper describes our work to construct an effective and efficient data warehouse solution, enabling both cost-effective report generation and ad-hoc analyses of critical health care issues. The construction of a data warehouse for public health care data poses major challenges beyond those required for the construction of a commercial data warehouse (e.g., retail sales). Such challenges include the following.
    - Data come from a very diverse set of sources. Health care data are published in a wide variety of formats with differing semantics. There are currently few standards in the health care field for such data. The data integration task to build the data warehouse
  • 23. requires significant effort.
    - CATCH reports are disseminated to a diverse and geographically distributed set of stakeholders.
    - The data warehouse is required to support the activities of public policy formulation. The socio-political issues of health care planning impact design features such as security, availability, data quality, and performance.

    3. The CATCH data warehouse
    The goals of the CATCH data warehouse include the support and enhancement of the CATCH methods, the provision of cost-effective and thorough reports to communities, and the creation of a rich environment for more detailed research into critical health care issues. In addition, a focus on data quality makes the data warehouse an especially valuable asset over time
  • 24. as a rich and trustworthy historical repository is built. Lastly, the data warehouse lends itself to a variety of dissemination strategies based on hardcopy reports, interactive access, and Web-enabled information delivery. The different access technologies allow a diverse group of community planners and stakeholders to investigate important health care issues using comparable data. All of these characteristics make the CATCH data warehouse a unique application of technology in the field of public health. In fact, the implementation of this type of data warehouse and its use in monitoring, as well as improving health status, will become a primary role of public health agencies in the future.
    The CATCH data warehouse includes a variety of components arranged in three broad categories: reporting tables for direct support of the CATCH methods, aggregated dimensional structures, and
  • 25. fine-grained or transaction-oriented dimensional structures. In the sections that follow, examples of these data warehouse components are presented. All of the components draw on the dimensional model or star schema, some components with more than a dozen dimensions and some with a few simple dimensions.

    3.1. The dimensional model
    Important missions of a data warehouse include the support of decision-making activities and the creation of an infrastructure for ad-hoc exploration of very large collections of data. Decision-makers should be able to pursue many of their investigations using browsing tools, without relying on database programmers to construct queries. The emphasis on end-user data access places a premium on an understandable database design that provides an intuitive basis for navigating through the data. The star schema or dimensional model has been recognized as an effective
  • 26. structure for organizing many data warehouse components [12,15,19]. The star schema is characterized by a center fact table, which usually contains numeric information that can be used in summary reports. Radiating from the fact table are dimension tables that provide a rich query environment. This structure provides a logical data cube, with dimensions such as time and location identifying a set of numeric measurements within the cube. Fig. 2 contains a fragment from the hospital discharge transaction-oriented star schema discussed in this paper.

    3.1.1. Fact tables
    The most appropriate facts are additive numeric data items that can be summed, averaged, or combined in other ways across the dimensions to form summary statistics. The only way to compress the millions of data points and produce a reasonably sized answer set is to present some mathematical summarization. No human
  • 27. will want thousands, let alone millions, of items in answer to their queries. As Kimball [19] points out, "the best and most useful facts are numeric, continuously valued, and additive." The CATCH data warehouse includes facts such as counts of hundreds of different health events, population-based rates, age-adjusted rates, and even fine-grained financial data in the case of the hospital discharge data depicted in Fig. 2. For example, using the hospital discharge star it is possible to focus on a single hospital (using the HOSPITAL dimension), select a single disease (using the ICD DIAGNOSIS dimension), and investigate how the length of stay has varied over a specified time period. Using the hierarchical nature of the dimensions, it is also possible to 'roll up' to compare types of hospitals, disease categories, or even patient age bands. While the dimensional structure is simple and readily understandable, it supports a large and very useful universe of queries.
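The kind of query described above can be illustrated with a very small star schema. The sketch below is only an assumption-laden toy: the table names, column names, and every data row are invented for illustration and are not taken from the CATCH design or from Fig. 2. It uses an in-memory SQLite database with one fact table and two dimension tables, first selecting a single hospital and a single diagnosis to track length of stay by year, then rolling up to hospital type.

```python
import sqlite3

# Hypothetical miniature star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hospital (
        hospital_id   INTEGER PRIMARY KEY,
        name          TEXT,
        hospital_type TEXT              -- roll-up level above the hospital
    );
    CREATE TABLE diagnosis (
        diagnosis_id  INTEGER PRIMARY KEY,
        icd_code      TEXT,
        chapter       TEXT              -- roll-up level above the code
    );
    CREATE TABLE discharge_fact (
        hospital_id    INTEGER REFERENCES hospital(hospital_id),
        diagnosis_id   INTEGER REFERENCES diagnosis(diagnosis_id),
        discharge_year INTEGER,
        length_of_stay INTEGER,         -- additive numeric fact
        total_charges  REAL             -- additive numeric fact
    );
    """
)

# Made-up sample rows, purely for demonstration.
conn.executemany(
    "INSERT INTO hospital VALUES (?, ?, ?)",
    [(1, "Hospital A", "teaching"), (2, "Hospital B", "community")],
)
conn.executemany(
    "INSERT INTO diagnosis VALUES (?, ?, ?)",
    [(1, "410", "Circulatory"), (2, "486", "Respiratory")],
)
conn.executemany(
    "INSERT INTO discharge_fact VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1999, 6, 18000.0),
        (1, 1, 1999, 4, 15000.0),
        (1, 1, 2000, 3, 14000.0),
        (2, 1, 2000, 5, 12000.0),
        (2, 2, 2000, 7, 9000.0),
    ],
)

# Slice: one hospital, one diagnosis, average length of stay per year.
los_by_year = conn.execute(
    """
    SELECT f.discharge_year, AVG(f.length_of_stay)
    FROM discharge_fact AS f
    JOIN hospital  AS h ON h.hospital_id  = f.hospital_id
    JOIN diagnosis AS d ON d.diagnosis_id = f.diagnosis_id
    WHERE h.name = 'Hospital A' AND d.icd_code = '410'
    GROUP BY f.discharge_year
    ORDER BY f.discharge_year
    """
).fetchall()

# Roll-up: summarize the same facts one level higher, by hospital type.
by_type = conn.execute(
    """
    SELECT h.hospital_type, COUNT(*), AVG(f.length_of_stay)
    FROM discharge_fact AS f
    JOIN hospital AS h ON h.hospital_id = f.hospital_id
    GROUP BY h.hospital_type
    ORDER BY h.hospital_type
    """
).fetchall()

print(los_by_year)
print(by_type)
```

Both queries touch the same fact table; only the dimension attribute used for filtering or grouping changes, which is why a small set of dimensions can support many different summaries.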
  • 28. 3.1.2. Dimension tables The dimensions define the query environment, the richer the set of dimensions the more ways the data can be accessed via queries. Two of the important characteristics of dimensions are the richness of the attributes that describe the dimension and the hier- archical nature of the dimension. For example, the COUNTY dimension in the CATCH data warehouse includes attributes that describe whether a county is coastal, wealthy, urban, dense, large in area, or includes a military base. Therefore, the counties can be organized by any value in this attribute set. Some of the attributes lend themselves to hierarchical organization. In the case of COUNTY, there is natural geographic hierarchy that includes groups of counties that form regions within the state and the state itself. The county is also composed of finer geographic units D.J. Berndt et al. / Decision Support Systems 35 (2003) 367– 384 371
  • 29. such as communities, ZIP codes, and census tracts. The dimension hierarchies enable roll-up and drill-down operations that control the level of detail in queries. These formally defined hierarchies also provide the framework for navigation or data browsing. In order to describe the dimension hierarchies succinctly to both end-users and developers, dimension hierarchy diagrams have been utilized in the CATCH data warehouse design process. These diagrams show the hierarchical nature so that end-users have an uncluttered view of how they can navigate and designers can easily understand the dimensional structures. Fig. 3 illustrates an important health care dimension based on the International Classification of Diseases (ICD) codes. Currently, we are using versions 9 and 10 of the ICD codes. These codes are divided into chapters and sections, which provide a natural hierarchy for the
  • 30. codes. Fig. 3 shows the hierarchical structure using separate tables, but these tables can be easily denormalized to enhance query performance. In addition, there are several other tables that provide alternative hierarchies for this important dimension. This ICD PROCEDURE dimension is combined with many other dimensions such as patient age, gender, mortality risk, and severity of illness to form star schemas (see Fig. 2) with rich query environments. Fig. 2. Hospital discharge star schema (not all dimensions are shown). 3.2. Data warehouse design: the data access pyramid The mission of the CATCH data warehouse is to support the automated and cost-effective application of CATCH, as well as to enable more detailed analyses that were not possible using the coarse-grained
  • 31. data that typified past CATCH reports. In order to meet these goals, the data warehouse design includes several levels of data granularity, from the coarse-grained data used in generic report production to actual event-level data, such as hospital discharges. The data warehouse design includes major components at all three levels of granularity, as illustrated in the data access pyramid found in Fig. 4. Report indicators: reporting tables with derived or highly aggregated data are used to support the core CATCH reports, including comparisons between a target county and peer counties. These tables also provide fast response for interactive access via data browsing tools and can provide the foundation for simple community-wide Internet access. In addition, the metadata play an important role at the reporting level, providing indicator definitions, state or federal goals, and expert domain knowledge for priority
  • 32. filters (e.g., economic impact and treatment availability). This report level of the data warehouse may not be needed in all data warehouse applications but provides important support for rapid generation of community CATCH reports. Aggregate data: there are families of star schemas that provide true dimensional data warehouse capabilities, such as interactive roll-up and drill-down operations. These components have carefully designed dimensions that can be utilized by more sophisticated data browsing tools. The star schemas are populated using thorough data staging and quality procedures that usually involve processing detailed data sets extracted by various health care agencies and organizations. Typically, the data are aggregated and transformed for loading into a family of related star schemas (a constellation) that share important dimensions and support interactive online analytic processing (OLAP)
  • 33. techniques. Transaction data: for certain types of information, the design calls for retaining very fine-grained or even event-level data. An example is the hospital discharge data, which include each hospital discharge event for the more than 200 hospitals that are mandated to report such information in Florida. These data are retained at the transaction level because of the rich set of facts and dimensions available for analysis and the density of potential aggregations that result in negligible space savings. These three levels of aggregation within the data warehouse combine to meet a wide range of reporting requirements and performance goals, thus providing a flexible basis for disseminating health care information to community decision-makers. The following two sections (Sections 3.3 and 3.4) provide some examples of the major data warehouse components.
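The density argument for retaining transaction-level data can be checked with a few lines of arithmetic. The sketch below uses the approximate cardinalities and row counts the paper itself quotes in Section 3.4 (250 hospitals, 15,000 ICD codes, 5 severity ratings, 10 payers, roughly 2 million discharges per year); the function and variable names are hypothetical, and treating the aggregate as fully dense is the paper's stated assumption, not a measured result.

```python
from math import prod


def key_combinations(cardinalities):
    """Number of possible dimension key combinations."""
    return prod(cardinalities.values())


def density(actual_rows, cardinalities):
    """Actual rows divided by possible dimension key combinations."""
    return actual_rows / key_combinations(cardinalities)


def aggregate_rows(actual_rows, cardinalities):
    """Upper bound on aggregate table rows: it can exceed neither the
    number of source rows nor the number of key combinations."""
    return min(actual_rows, key_combinations(cardinalities))


# Approximate figures quoted in Section 3.4 of the paper.
FINE = {"hospitals": 250, "icd_codes": 15_000, "severity": 5, "payers": 10}
# Hypothetical aggregate: 150 disease categories replace 15,000 ICD codes.
COARSE = {**FINE, "icd_codes": 150}
ROWS_PER_YEAR = 2_000_000

if __name__ == "__main__":
    print(f"fine-grained combinations: {key_combinations(FINE):,}")
    print(f"fine-grained density: {density(ROWS_PER_YEAR, FINE):.2%}")
    rows = aggregate_rows(ROWS_PER_YEAR, COARSE)
    print(f"aggregate rows (if fully dense): {rows:,}")
    print(f"compression ratio: {ROWS_PER_YEAR / rows:.2f}")
```

Under these assumptions the aggregate table is only slightly smaller than the transaction table, far short of the factor of 10 or more that the paper suggests as a reasonable compression ratio.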
  • 34. At the aggregate data level, a coarse-grained component based on the Public Health Information Data System (PHIDS) is used to support CATCH report production and high-level browsing. A second example aggregate is procedure volume information formed from the underlying hospital discharge data. The original hospital discharge data provide an example of transaction-oriented data that supports detailed analyses, along with other data such as vital statistics (e.g., births and deaths) and specific disease registries. Fig. 3. ICD PROCEDURE dimension hierarchy. Fig. 4. Data access pyramid. 3.3. Aggregated Florida Department of Health Data An example of a highly aggregated data warehouse component is the Public Health Information Data
  • 35. System (PHIDS) star schema. The Florida Department of Health collects, analyzes, and reports a large number of public health indicators. These items have always provided critical assessment measures within CATCH. The importance of the PHIDS indicators made them obvious candidates for inclusion in the data warehouse and a natural resource for automation of the traditional CATCH report. The PHIDS indicators are clearly not the fine-grained data that support a detailed OLAP environment. The data are highly aggregated and provided annually at the county level. Therefore, the data set is suitable for generating the traditional CATCH report, but unsuitable for more specific analyses. Essentially, the construction of the data warehouse has been a search for both fine and coarse data that can provide synergies through integration. The simple star schema used to implement the PHIDS-based data warehouse
  • 36. component has only the year reported and the county as explicit dimensions. Currently, many of the PHIDS indicators are maintained using spreadsheets at the Florida Department of Health. For use in the data warehouse, the data are first extracted from the spreadsheets, reformatted using custom staging programs, and then loaded via a bulk loader utility. The twin star staging process, as described in Section 4, is used to ensure data quality. Data correctness is verified by sampling the data and comparing the data warehouse values with published PHIDS reports. 3.4. Transaction-oriented hospital discharge data Florida hospital discharge transactions are collected by the Agency for Health Care Administration (AHCA) from the more than 200 short-term acute care hospitals in the state. These hospitals report every discharge transaction, regardless of payer, throughout the state. Hospital discharge data are used to derive
  • 37. several CATCH indicators such as avoidable hospitalizations due to diabetes and other chronic diseases. Typically, the large volume of hospital discharge transactions is scanned to form derived or aggregated data for CATCH indicators. However, the broader mission of the CATCH data warehouse is both to support the CATCH methods and to enable more detailed investigations of critical local health care issues. It is the ability to fully explore issues at appropriate levels of detail that makes the fine-grained components so important. While first staging and preprocessing the hospital discharge data for use in forming CATCH indicators, the value of the discharge transactions themselves became very apparent. The hospital discharge transactions provide an interesting set of numeric data items, such as length of stay and a breakdown of revenues, which are very well suited for a data warehousing approach. In addition, the
  • 38. transactions include a rich set of attributes that provide many natural dimensions for use in formulating queries. Transaction-based star schemas can provide very useful functionality within a data warehouse framework, making the hospital discharge star an important component of the CATCH data warehouse. The hospital discharge data include over 20 interesting dimensions such as the discharging hospital characteristics, admission criteria, diagnostic codes, procedure codes, reimbursement categories, time, geographic location, and many others. Furthermore, many of these dimensions are hierarchical in nature, easily supporting important roll-up/drill-down operations. Fig. 2 is a partial representation of the discharge star schema. The discharge star is equally rich in additive numeric facts. For instance, length of patient stay is a particularly important measurement for analysis.
  • 39. There is also a measurement indicating elapsed days until the medical procedure. Finally, there is a total revenue item that provides important cost information. In fact, there is also a large text field with embedded revenue items that provides a breakdown of the various costs from room charges to laboratory fees. Procedures to parse this text field have been developed as part of the data staging activities and are used to extract revenue items, providing nearly 30 interesting numeric facts for each transaction. It is not uncommon to have useful information buried in text fields that must be preprocessed using data staging tools or customized procedures. This can be a challenging task since the source database has no understanding of the structure embedded in such text fields and, therefore, simple query access is impossible. In
  • 40. this case, the rich set of facts and highly dimensional structure of the hospital discharge data make it a powerful warehouse component for detailed investigations and customized analyses. The hospital discharge star has repeating groups for diagnoses (ICD DX 1–10) and procedures (ICD PROCEDURE 1–10). This design mirrors the underlying data and simplifies the data staging process for the millions of discharge records used in the project. An alternative design without repeating groups might simplify some queries, but this fine-grained data is at the bottom of the data access pyramid and is typically aggregated for most query processing. The original positional representation also conveys information relevant to health care coding practitioners and is used in several ancillary algorithms. For many purposes, the primary diagnosis or procedure is used in calculating higher-level health care indicators, so
  • 41. this structure is maintained in the transaction-oriented data [28]. It is sometimes preferable to store the actual transactions rather than lightly aggregated data that has been derived from the underlying transactions. Kimball [19] uses the term sparsity failure to describe the size explosion that can occur when creating aggregate data from a sparsely populated fact table. Detailed fact data such as hospital discharge transactions will probably not have all combinations of the dimensions present in the actual data. In other words, not all diseases occur in all hospitals during a particular year and therefore the effect with regard to size is not multiplicative. If we consider only the cardinality of the actual dimensions, then the number of possible combinations of dimension key values is very large for the hospital discharge data. For example, consider the following four dimensions with approximate cardinalities:
  • 42. hospitals (250), ICD codes (15,000), severity ratings (5), and payers (10). This could result in 187.5 million dimension key combinations. Further, we can define density as the actual number of records (roughly 2 million/year for discharges) divided by the potential combinations of dimension keys, yielding a density of 2/187.5 or roughly 1.07%. This remarkably low density makes intuitive sense since the very fine ICD distinctions lead to sparse usage. Imagine that we decide to construct an aggregate table by creating 150 disease categories that summarize the 15,000 ICD codes, reducing the dimension size by a factor of 100. In this case, all 150 categories may appear for each hospital (a reasonable assumption), giving a density of 100% and roughly 1.9 million rows. This rather insignificant space savings comes at the expense of losing the richness of the original ICD codes and the flexibility of having individual cost data for each
  • 43. transaction. Therefore, in the CATCH data warehouse and many other applications, transaction-oriented components make good sense. In fact, to really understand the implications for tasks such as data warehouse capacity planning, it is often necessary to sample the data to discover the actual distribution of dimension values. The design challenge is to carefully consider the number of fine-grained items that are summarized to form the aggregate data and look for a factor of 10 or more as a reasonable compression ratio [19]. 3.5. Performance issues The large volumes of data contained in the CATCH data warehouse coupled with demanding queries can conspire to produce some truly awful performance. As in any database project, good design is the most effective tool for enhancing performance. The CATCH data warehouse design continues to evolve
  • 44. in response to new challenges. In addition to design changes, three other techniques offer avenues for improving performance: aggregate tables, star schema indexing strategies, and physical table partitions. 3.5.1. Aggregates Many data warehouse designers identify aggregates as one of the most effective strategies for improving performance. Kimball [19] notes that "aggregates can have a very significant effect on performance, in some cases speeding queries by a factor of 100 or even 1000." If the aggregate data are useful, having the data physically ready and waiting will certainly improve query speeds. In addition, if sparsity failure is avoided, then the amount of data required may also be substantially reduced. That is, benefits from both reduced space and previously
  • 45. handled computations can accrue through the use of aggregates. In addition, many data warehousing navigation tools are aggregate-aware, making the aggregate structures transparent to the end user. However, a rich set of dimensions makes a potentially large number of aggregates possible. The choice of which aggregate tables to build is based on the type of queries being executed and will naturally change over time [14]. Aggregates play an important role in the CATCH data warehouse. Some data are extracted and loaded in aggregate form, such as the PHIDS indicators discussed above, and other aggregates are derived from more detailed data warehouse components. For instance, vital statistics such as death and birth certificates are used to derive a collection of aggregated mortality and birth-related indicators. There are two
  • 46. somewhat different purposes for aggregates. Highly aggregated data are used to directly support traditional CATCH report production, while lightly aggregated data are used to improve query performance. The continual re-evaluation of aggregates is an important task in data warehouse administration. 3.5.2. Indexing Many database management systems intended for data warehousing support bitmap index structures. Bitmap indexes are especially suited to low-cardinality dimensions such as admission quarter, day of the week, gender, and others. These indexes are space efficient and speed the star queries that characterize access to fine-grained structures such as the hospital discharge data. Another technique is to cache the smaller dimension tables in memory for improved query performance. All of these techniques have been employed and performance tuning continues to be an
  • 47. ongoing activity as the user community grows and explores new uses for the data warehouse. 3.5.3. Partitioning The third important performance tuning technique is the use of physical table partitioning [6]. The use of table partitions is important both for query performance and for data warehouse management. Since the data are loaded or staged at different times, these activities can be isolated through partitioning. This also allows preprocessing and data quality procedures to be run on separate partitions. In addition, partitioned indexes can also be used. One of the most important benefits of partitioned tables is the opportunity for the optimizer to exclude large portions of the data when queries include restrictions on partitioning attributes. An excellent example of partitioned tables in the CATCH data warehouse is the hospital discharge data. In recent years, there have been
  • 48. roughly 2 million discharge transactions/year. The goal is to keep at least 10 years of discharge data, or 20 million transactions, available for analysis, but often only a few years are necessary for any given query, which makes the year an ideal parameter for partitioning. The hospital discharge data are partitioned by year, with roughly 1.5–2 million rows per partition. If a query specifies a single year or a small range of years, the optimizer can create an execution plan that only searches the required partitions, leaving the vast majority of data untouched. Since most of the detailed interactive analyses fit this mold, the performance tends to be quite good. However, the entire collection of data is still available for queries that cover a wide range of years; it just takes more time. 4. Data staging and quality assurance The extraction, transformation, and loading (ETL) functions in a data warehouse are considered the most
  • 49. time-consuming and expensive portion of the development lifecycle [22]. These processes are concerned with the extraction of data from legacy systems, transformation and preprocessing requirements to produce useful, integrated data, and the transportation of the data into the actual data warehouse structures. The CATCH data warehouse involves somewhat unusual challenges with regard to data staging activities. The data are drawn from multiple organizations, which usually apply in-house transformations to data collected by yet another layer of organizations. For instance, the hospital discharge data are originally collected by hospitals and reported to the Florida Department of Health. These data are then integrated, preprocessed, and provided to other interested organizations, including the CATCH data warehouse project. In the case of demographic data, population levels are extracted from the Florida Governor's Office and
  • 50. the Census Bureau. Overall, the data warehouse has continued to grow without the need for a data purging strategy. However, as the size continues to increase and finer geographic levels are used, a purging strategy will become necessary in the near term. The design is already multi-level, as described in Fig. 4, and it is the base of the data pyramid that accounts for most of the space. As space becomes an issue, the earlier years of fine-grained data will be purged and retained offline. These structures are maintained as physical partitions, so the purging operations can be conducted without disrupting data access and data can easily be re-introduced. Two innovative techniques, twin star data staging and data quality filters, have been developed to manage the ETL processing required in the CATCH
  • 51. data warehouse. 4.1. Twin star staging Fig. 5 outlines the twin star staging process and its three component stages. The approach is designed to utilize the power of commercial database systems, especially referential integrity constraints and exception processing. The various stages use a combination of scripting languages, bulk-loading tools, and database procedures. 4.1.1. Stage 1 The process begins with file-based preprocessing and cleansing activities. These procedures can be written in any programming language, but AWK and Perl have been especially useful in the CATCH project with their built-in parsing and pattern matching capabilities. Data transformation, quality checks, and simple reports can all be performed on the initial data file. Even though many checks will be repeated
  • 52. throughout the data staging process, the presence of redundant checks is an asset with regard to data quality. Stage 1 of the twin star strategy involves using a bulk loader to move the data into a staging table within the database system. The staging table is designed for maximum flexibility in storing data, minimizing data type conflicts, and providing a workbench for database-resident transformation procedures. Typically this includes additional attributes that are created as part of the preprocessing and cleansing tasks. Bulk loading utilities are used to quickly populate the staging table and capture problematic data in a series of log files. Data type, uniqueness, and "not null" checks for critical staging table attributes can be used to control the thoroughness of this data staging step. With care, many simple data quality issues can be resolved at this early stage. 4.1.2. Stage 2
  • 53. The temporary star shares the critical data dimensions with the permanent star, and is essentially a 'twin' of the permanent star (though there may be different supporting dimensions for particular tasks). The fact table attributes and important dimensions should be exact duplicates so that any operations or referential integrity checks will be consistent between the stars. Stage 2 entails moving the data from the staging table to the temporary star. Attribute data types should be compatible and referential integrity constraints can be used to check for valid dimension keys. The referential integrity constraints are disabled during the load and then re-enabled sequentially afterwards to perform the checks in one sweep, thereby improving processing time. Most database systems provide a method of capturing invalid rows and it is important to make use of such capabilities during both the Stage
  • 54. 2 and 3 transfers. Fig. 5. Twin star staging. Since the temporary star is the functional equivalent of the permanent star, just much smaller, the interface and data browsing tools developed for the actual data warehouse can be used to exercise the temporary star. Test reports, browsing by power users, and sanity checks based on comparisons with previously loaded data in the permanent star are all useful methods of ensuring high quality data in the temporary star. 4.1.3. Stage 3 The permanent star is the long-term storage area for the data warehouse. This star must be carefully indexed, distributed across storage devices to avoid I/O bottlenecks, and possibly partitioned. As noted earlier, partitioned tables can provide performance improvements by distributing information across
  • 55. physical devices and by allowing the query optimizer to select only the relevant partitions. In addition, partitioned tables ease data warehouse management tasks through creation, loading, and archiving of independent partitions. The Stage 3 transfer from the temporary star to the permanent star should be fast and free of data type and referential integrity violations. The simple transfer will allow large volumes to be processed within most load windows. Redundant referential integrity constraints can be used as a final check (again disabling and re-enabling for efficiency). The resulting exception tables should be empty, but any offending rows are a clear sign that somehow problems survived Stages 1 and 2. This provides a last opportunity to postpone releasing or publishing the data. 4.2. Data quality filters The data quality issues that surface while initially
  • 56. constructing a data warehouse are among the most challenging obstacles, contributing significantly to the time spent in data staging activities. As noted above, the ETL processes and quality assurance procedures can account for the majority of time and resource commitments in a data warehouse project. This has been the case in the CATCH data warehouse project, where there are a large number of data sources and many intermediate stages for errors to be introduced. In addition, the challenge of producing a truly integrated design requires translations to common definitions and shared dimensions. Rather than any "magic bullet," a long-term effort to develop a comprehensive set of preprocessing procedures will produce the best data quality. The procedures under development on the CATCH project include a measure of redundancy to provide added insurance against quality problems surviving various phases of the ETL
  • 57. process. As more procedures have been added to the quality assurance arsenal, an interesting structure has emerged, mirroring the natural structure of the data warehouse, with procedures falling into the following categories of quality filters.
    - Fact filters are the quality procedures used to check the fine-grained data, such as hospital discharge transactions. For example, any discrepancies between itemized fees and total charges should be flagged. Quality procedures at this level compare attributes within a fact table row, or may compare between two rows, but the focus is on fine-grained data.
    - Aggregate filters include quality checks that become possible only when the focus is on summaries of fact-level answer sets. As we have seen, aggregates are important for boosting performance, but they also present data quality assurance opportunities. At this level, 'roll-up' operations over important dimensions allow aggregate averages, maximums, or other summaries
  • 58. to be compared. With regard to hospital discharge transactions, average lengths of stay, maximum costs, or diagnostic volumes can all be usefully compared by hospital and by year. That is, large hospitals can be verified against each other and new data can be compared against previous years. Aggregate filters can be the basis for some very powerful data quality procedures, effectively using the capabilities existing in the data warehouse.
    - Dimension filters are the procedures used to investigate 'dirty' dimensions. For instance, many business-oriented data warehouses include a customer dimension that can be very large and may have severe data quality problems [19]. Duplicate customer entries, household matching, and data obsolescence issues are among the problems inherent in such dimensions [3,10]. In the CATCH data warehouse, dimensions that must be carefully monitored include
  • 59. hospitals, practitioners, and geographic entities such as counties and communities. Dimension filters can be used to monitor many problems with regard to changing dimensions. These three broad categories of quality filters can be further refined based on the type of comparison being used. For instance, the intratuple filters involve comparisons between attributes within a single record. Of course, the record itself may be at an aggregate level and represent a summary of a fact-level answer set. For example, average pharmacy costs may have a fairly predictable relationship with total charges for a given disease. This type of comparison could be used as a quality check within a given aggregate hospital discharge record, an example of an intratuple aggregate filter.
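Two of the intratuple checks described above can be sketched as small predicates. This is only an illustration under stated assumptions: the record layout, field names, tolerance, and acceptable pharmacy share band are all hypothetical and are not the procedures used in the CATCH project.

```python
def fact_filter(row, tolerance=0.01):
    """Intratuple fact filter: flag a discharge row whose itemized fees
    do not add up to its total charges (hypothetical field names)."""
    itemized_sum = sum(row["itemized"].values())
    return abs(itemized_sum - row["total_charges"]) > tolerance


def aggregate_filter(agg, low, high):
    """Intratuple aggregate filter: flag an aggregate record whose average
    pharmacy cost falls outside an expected share of average total charges.
    The band (low, high) is an assumed, user-supplied threshold."""
    share = agg["avg_pharmacy_cost"] / agg["avg_total_charges"]
    return not (low <= share <= high)


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    good_row = {"itemized": {"room": 1200.0, "lab": 300.0, "pharmacy": 150.0},
                "total_charges": 1650.0}
    bad_row = {"itemized": {"room": 1200.0, "lab": 300.0, "pharmacy": 150.0},
               "total_charges": 1700.0}
    print(fact_filter(good_row), fact_filter(bad_row))

    typical = {"avg_pharmacy_cost": 200.0, "avg_total_charges": 2000.0}
    suspect = {"avg_pharmacy_cost": 900.0, "avg_total_charges": 2000.0}
    print(aggregate_filter(typical, 0.05, 0.20),
          aggregate_filter(suspect, 0.05, 0.20))
```

Both functions return True when a record should be flagged for review; in practice the flagged rows would be written to an exception table or report rather than printed.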
  • 60. Comparisons across records, or intertuple filters, provide a rich set of quality assurance opportunities that examine relationships between fact table rows or aggregates of these rows. An example of this type of filter would be comparisons between disease volumes by year. Unlikely disease distributions, after accounting for population growth, might indicate a data quality problem with new data. The distinction between intertuple and intratuple comparisons, combined with the major filter categories, leads to six interesting filter categories that seem to naturally describe the many types of quality procedures being built into our data warehouse. An additional quality assurance strategy involves comparing data warehouse aggregates with known summaries published by outside sources. This process can best be described as a quality benchmark, where externally derived data is used to check internal data
  • 61. warehouse procedures. This type of quality procedure usually includes permanent data quality tables populated with externally produced data summaries based on published reports or spreadsheet calculations. For instance, state-level reports on the number of specific disease occurrences provide a benchmark for data warehouse aggregates based on the underlying hospital data. Automated comparison procedures report only reasonably large departures based on user-defined thresholds. Quality benchmarks are particularly important as ongoing development activities yield both larger volumes of data and new aggregation procedures. Before new versions of procedures or interface tools are deployed, historical quality benchmarks can be used to evaluate their performance. Both quality benchmarks and filters are part of the substantial infrastructure necessary to meet data quality goals. These tools account for a
  • 62. significant portion of the CATCH data warehouse development effort [2]. 5. CATCH data warehouse applications The data warehouse is used to support a variety of activities, from automating the original CATCH reports to supporting current health care research initiatives. The CATCH methods provided a solid foundation for the initial implementation efforts, but as components have been added the synergies have opened new application opportunities. Clearly, the human–computer interface is of paramount importance in the data warehouse environment and the primary determinant of success from the end-user perspective [1]. In order to support analysis and reporting tasks, the data warehouse must have high quality data and make that data accessible through effective interface technologies. The act of releasing data in a warehouse is in a very real sense the same as
  • 63. publishing that data in printed form; retractions in both media can be very painful. 5.1. Producing CATCH reports CATCH reports have been refined over the past decade in the field. The field expertise available in the interdisciplinary research team infused the requirements process and provided a clearly identifiable goal as the first step in data warehouse construction. Hundreds of stored procedures, as well as the design itself, implement many aspects of this domain expertise. The stored procedures generate the health status indicators and move them upward in the data access pyramid for final report production. The reports allow quick and easy access to comprehensive summaries and more detailed collections of information from the data warehouse using standard report writing technologies. This type of pre-defined and thorough reporting is critical for implementing a more automated CATCH
  • 64. report and will probably be the preferred format for many users. For example, the comparisons between target counties and peer counties, as well as state averages, are fundamental components of the original CATCH reports and important tools for community health care planners. In addition, current and historical trend information is provided on fact sheets for each health indicator. The final reports are really reference books (numbering over 300 pages) with several major parts, such as comparisons, fact sheets, and prioritized lists. New features that move beyond the original CATCH reports include components that enable user-defined communities to supplement the traditional county-level perspective. Users can define
  • 65. smaller communities based on geographic or demographic criteria, with community fact sheets providing an exploded view of selected health status indicators. While CATCH has traditionally focused on large hardcopy reports, the reports are now produced directly in Web-friendly formats for electronic distribution. The advantage of this approach is that a strong methodological structure can be retained as the reports are much more widely distributed. The interested reader can refer to the Center for Health Outcomes Research (CHOR) Web site for examples of current reports [chor.hsc.usf.edu]. In addition to static reports, the high-level components of the data warehouse can be accessed dynamically using data browsing tools. It is usually possible to constrain the navigation, while still providing enough freedom to explore many more perspectives than can be accommodated in a traditional report. Fig.
  • 66. 6 shows an online analytic processing (OLAP) tool being used to browse through trend information for specific indicators at the county level. Most of these tools can support both desktop and Web browser access, making this an important new avenue for data dissemination. Fig. 6. Browsing screen for community indicators. 5.2. Investigating health care issues Data warehouse browsing tools provide star query-like access through a flexible menu-based interface, with pull-down menus representing important dimensions. These types of tools are easy to use and support some ad-hoc exploration, but are usually controlled through an administrative layer that determines the data available to end-users. In developing a flexible
  • 67. interface, there is a tradeoff between the ability to express ad-hoc queries and the ease-of-use that results from pre-defined constructs implemented by data warehouse designers and administrators. Of course, SQL can provide an ad-hoc query facility, but it requires some care in the data warehouse environment, where very large tables and ill-formed queries can conspire to degrade performance sharply. In addition, use of SQL by casual users often produces incorrect queries that return erroneous results from the data warehouse. As noted above, OLAP tools can be used to empower standard report users and allow simple navigation through many more views than can be produced using traditional reporting tools, yet still curtail unwanted operations. A second, and in some ways more important, role for the browsing tools is to provide a flexible interface for more customized analysis. Health care issues
  • 68. highlighted by preliminary reports can be investigated more fully using the finer levels of detail maintained in the data warehouse. These tasks might entail querying the true dimensional star schemas that include age, gender, race, and other dimensions, or even the event-oriented data, such as hospital discharges. These data warehouse resources support much more detailed analyses, allowing the user to focus on issues such as differences in age or race with regard to specific health status indicators. Once decision-makers review the CATCH report, they may have community-specific issues that relate to the diverse population groupings that inevitably fall within somewhat arbitrary political boundaries. Fig. 7. Browsing screen for hospital disease indicators. Dealing effectively with such important issues requires a more careful and focused analysis that
  • 69. is precluded at the higher levels of aggregation that make up the generic CATCH reports. A current research initiative involves the exploration of volume and cost-related issues in health outcomes. Data browsing tools are used for exploratory analysis. Used in this manner, OLAP tools provide a first step in the data mining process [13]. Fig. 7 illustrates a browsing screen in which detailed volume, length of stay, and cost data are presented for specific hospitals, or groups of hospitals. In addition to tabular representations, these tools provide graphic capabilities that support simple data visualization. 5.3. Security issues Currently, dynamic access to the CATCH data warehouse is restricted to the development team and associated health care researchers. Obviously, some of the data may be sensitive in nature. Data security is an important issue in the health care environment and the
  • 70. CATCH data warehouse attempts to balance the information requirements for local health care planning with critical security issues. It is important to note that no patient identifiers of any kind are incorporated within the data warehouse. Security issues are mostly concerned with reporting rare events in geographic areas that might allow a person to be identified through other data sources. Detailed security policies provide guidance for the manipulation and reporting of health care data in the data warehouse. 6. Conclusions The CATCH data warehouse can have an important impact on our health status by making rigorous, quantitative information available to health care decision makers in local, state, national, and international communities. In this paper, we have described some of the technical challenges faced in designing and implementing a data warehouse for health care information. We
  • 71. have presented innovative research contributions in the areas of data warehouse design, data staging for ETL processing, data quality assurance, and health care data warehouse applications. The CATCH data warehouse is now fully functional. For example, it has been recently used to produce a comprehensive CATCH report for Miami–Dade County, Florida’s largest county. As part of this report we were asked to provide more detailed assessments of the eight commission districts within the county. The flexibility of the data warehouse to provide customized reporting allowed us to provide these analyses rapidly and effectively. Because this report was the first to be fully automated, we verified the accuracy of the report with a complete hand check of every data table. The discrepancies between the automated data tables and the manually derived tables were minimal and easily reconciled.
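The hand check described above compares each automated data table with its manually derived counterpart. As an illustration only, the comparison step could be sketched in Python as follows. The paper does not describe how the authors carried out the check; the function name `find_discrepancies`, the `(area, indicator)` key layout, the tolerance, and the sample values are all hypothetical and are not CATCH data.

```python
from math import isclose


def find_discrepancies(automated, manual, rel_tol=1e-6):
    """Compare two report tables keyed by (area, indicator).

    Returns a sorted list of (key, automated_value, manual_value) for every
    cell that is missing from one table or differs beyond rel_tol.
    """
    issues = []
    for key in sorted(set(automated) | set(manual)):
        a = automated.get(key)
        m = manual.get(key)
        # A cell is flagged if either side lacks it or the values disagree.
        if a is None or m is None or not isclose(a, m, rel_tol=rel_tol):
            issues.append((key, a, m))
    return issues


# Hypothetical values for illustration only (not taken from any CATCH report).
automated = {
    ("District 1", "infant_mortality_rate"): 6.2,
    ("District 2", "infant_mortality_rate"): 7.9,
}
manual = {
    ("District 1", "infant_mortality_rate"): 6.2,
    ("District 2", "infant_mortality_rate"): 7.8,
}

for key, a, m in find_discrepancies(automated, manual):
    print(key, "automated:", a, "manual:", m)
```

Keying each cell by area and indicator means that a cell missing from either table is reported alongside cells whose values disagree, so the resulting list is what a reviewer would then reconcile by hand.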
  • 72. The CATCH data warehouse remains a work in progress. We are pursuing an active research agenda to enhance the technical data warehousing capabilities and community health care applications. We invite the reader to follow the progress of the data warehouse at our CHOR web site [chor.hsc.usf.edu]. In the next sections (Sections 6.1 and 6.2), we briefly review our current research directions. 6.1. Data warehouse research directions The CATCH data warehouse provides a rich research environment for focused investigation in the following areas. Data Warehouse Design: The variety and volatility of health care data sources make the maintenance of the data warehouse design a true challenge. Changes to source data formats frequently require the updating of dimension table schemas. Often historical data cannot be placed into the new format without
  • 73. information loss. Finding solutions for maintaining historical accuracy while providing efficient use of all data in current applications is difficult. We are researching design techniques to minimize the impact of dimension table changes on the maintenance and operations of the data warehouse [2]. Data Staging: As presented in Section 3, we have implemented an innovative twin star data staging procedure. Ongoing research will study the performance of twin star data staging on various data loads. Enhancements to the procedure will be proposed and implemented. Data Quality: Issues of data quality dominate our research agenda. The health care field places particular emphasis on data accuracy, timeliness, privacy, and ease of use [27]. We are in close contact
  • 74. with the communities who receive the CATCH reports and have interviewed a number of users to elicit data quality requirements. This information will be used to drive our research to improve data quality in the CATCH data warehouse. Data Dissemination: The technologies for disseminating CATCH reports to communities are rapidly evolving. The requirements of the receiving communities and the capabilities of the data warehouse system will drive future research directions. Data Mining: We are aggressively investigating several areas of knowledge discovery in the CATCH data warehouse [11]. We have a unique capability to perform detailed studies in such areas as physician and hospital volume, racial disparities in health care, and environmental impacts on community health status. 6.2. Community decision-making with CATCH data The CATCH data warehouse will result in widespread distribution of data previously unavailable to
  • 75. most communities, as well as online access for specialized inquiry. Many issues arise as to how the communities will make the most effective use of the CATCH data for health care decision-making. This is an area with considerable research potential. There is a rich literature on the decision-making process both with and without information technology. The study of group decision support systems and environments has a strong tradition in the management information systems field [8]. In many ways, this important body of work is appropriate to health care decision-making, which is usually group-oriented. For example, Dennis et al. [9] study the effects of minority influence on decision-making and find that the presence or absence of technology has very different effects. Another important contributing area would be the political process and its ramifications for decision-making [20]. Certainly, policy making in
  • 76. health care is very much a political process. The use of the CATCH methodology and state-of-the-art data warehousing technology across many Florida communities will provide a rich research opportunity for studying interesting issues on group decision-making in community health care organizations. Such issues would include the composition of the decision-making group, the community stakeholders and their political influence, the decision-making process, and dissemination patterns of health care information in the community. The complexities and the interrelationships among these issues make the design of research studies both a challenge and an opportunity. As the automated CATCH reports are produced for various communities in Florida, we will study how effectively the CATCH information is used for health care planning. Acknowledgements
  • 77. The authors gratefully recognize the U.S. Department of Commerce, which has provided funding through a Technology Opportunities Program (TOP) grant. The Florida Department of Health has been a research partner. Research collaborators in the College of Public Health include R. Campbell, E. Gilbert, S. Luther, B. Myers, and B. Steverson. Contributing graduate students in the College of Business Administration include S. Hedge-Desai, R. Marsh, D. McCorkel, M. Nevrekar, M. Pearl, R. Rajendrababu, and J. Slayton. The authors also thank Oracle Corporation for making their state-of-the-art development tools available through the Oracle Academic Initiative. References [1] D. Berndt, A. Hevner, J. Studnicki, Data warehouse dissemination strategies for community health assessments, informatik/informatique, Journal of the Swiss Informatics Society (1) (February
  • 78. 2001) 27–33. [2] D. Berndt, J. Fisher, A. Hevner, J. Studnicki, Healthcare data warehousing and quality assurance, IEEE Computer 34 (12) (December 2001) 33–42. [3] D. Berndt, R. Satterfield, Customer and household matching: resolving entity identity in data warehouses, Proceedings of AeroSense 2000, Conference on Data Mining and Knowledge Discovery, Orlando (April 2000). [4] Centers for Disease Control and Prevention, Principles of Community Engagement, 1997, Atlanta. [5] D. Chrislip, C. Larson, Collaborative Leadership: How Citizens and Civic Leaders Can Make a Difference, Jossey-Bass, San Francisco, 1994. [6] M. Corey, M. Abbey, Oracle Data Warehousing, Oracle Press and Osborne McGraw-Hill, New York, 1997. [7] S. Cropper, Collaborative working and the issue of sustainability,
  • 79. in: C. Huxham (Ed.), Creating Collaborative Advantage, SAGE Publishers, London, 1996. [8] A. Dennis, Information exchange and use in group decision making: you can lead a group to information but you can’t make it think, MIS Quarterly 20 (4) (1996) 433–458. [9] A. Dennis, K. Hilmer, N. Taylor, Information exchange and use in GSS and verbal group decision making, Journal of MIS 14 (3) (1998) 61–88. [10] D. Dey, S. Sarkar, P. De, A probabilistic decision model for entity matching in heterogeneous databases, Management Science 44 (10) (October 1998) 1379–1396. [11] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, The AAAI Press, Menlo Park, CA, 1996. [12] P. Gray, H. Watson, Decision Support in the Data
  • 80. Warehouse, Prentice-Hall, Englewood Cliffs, NJ, 1998. [13] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, San Francisco, 2001. [14] V. Harinarayan, A. Rajaraman, J. Ullman, Implementing data cubes efficiently, Proceedings of the 1996 ACM SIGMOD, Montreal (June 1996). [15] W. Inmon, Building the Data Warehouse, Wiley, New York, 1992. [16] Institute of Medicine, Summary of recommendations, in: W. Waterfall (Ed.), The Future of Public Health, National Academy Press, Washington, DC, 1988. [17] Institute of Medicine, Healthy Communities: New Partnerships for the Future of Public Health, National Academy Press, Washington, DC, 1996. [18] Institute of Medicine, Measurement tools for a community
  • 81. health improvement process, in: J. Durch, L. Bailey, M. Stoto (Eds.), Improving Health in the Community, a Role for Performance Monitoring, National Academy Press, Washington, DC, 1997. [19] R. Kimball, The Data Warehouse Toolkit, Wiley, New York, 1996. [20] H. Mintzberg, The Nature of Managerial Work, Harper and Row, New York, 1973. [21] H. Nakajima, Editorial: new players for a new era, World Health 50 (3) (1997) 3. [22] J. Srivastava, P. Chen, Warehouse creation - a potential roadblock to data warehousing, IEEE Transactions on Knowledge and Data Engineering 11 (1) (1999) 118–126. [23] B. Starfield, Primary care and health: a cross-national comparison, Journal of the American Medical Association 266 (16) (1991) 2268–2271. [24] J. Studnicki, Evaluating the performance of public health
  • 82. agencies: information needs, American Journal of Preventive Medicine, Research and Measurement in Public Health Practice 11 (6) (1995) 74–80. [25] J. Studnicki, B. Steverson, B. Myers, A. Hevner, D. Berndt, Comprehensive assessment for tracking community health (CATCH), Best Practices and Benchmarking in Healthcare 2 (5) (September/October 1997) 196–207. [26] J. Studnicki, A. Hevner, D. Berndt, S. Luther, Comparing alternative methods for composing community peer groups: a data warehouse application, Journal of Public Health Management and Practice 7 (6) (November 2001) 87–94. [27] R. Wang, M. Ziad, Y. Lee, Data Quality, Kluwer Academic Publishing, New York, 2001. [28] J. Weissman, C. Gatsonis, A. Epstein, Rates of avoidable hospitalization by insurance status in Massachusetts and Maryland, Journal of the American Medical Association 268 (17) (November 1992) 2388–2394.
  • 83. [29] World Health Organization, The World Health Report 1995: Bridging the Gaps, Report of the Director-General, Geneva, 1995. Donald J. Berndt is an Assistant Professor in the Information Systems and Decision Sciences Department in the College of Business Administration at the University of South Florida. His research interests include data warehousing, knowledge discovery, and data mining. Dr. Berndt received a PhD in Information Systems from the Stern School of Business at New York University. He is a member of Beta Gamma Sigma, ACM, AIS, and INFORMS. Alan R. Hevner is an Eminent Scholar and Professor in the Information Systems and Decision Sciences Department in the College
  • 84. of Business Administration at the University of South Florida. He holds the Salomon Brothers/HRCP Chair of Distributed Technology. His research interests include software engineering, software testing, distributed database systems, and health care information systems. He received a PhD in Computer Science from Purdue University. Dr. Hevner is a member of ACM, IEEE, AIS, and INFORMS. James Studnicki is a Professor of Health Policy and Management at the University of South Florida College of Public Health. His research interests include measuring the health status of communities, evaluating alternative treatment outcomes, and studying the influence of managed care penetration on the utilization and quality of health
  • 85. services. Dr. Studnicki received a ScD from Johns Hopkins University.