A Comprehensive Approach to Data Warehouse Testing
Matteo Golfarelli
DEIS - University of Bologna
Via Sacchi, 3
Cesena, Italy
[email protected]
Stefano Rizzi
DEIS - University of Bologna
Viale Risorgimento, 2
Bologna, Italy
[email protected]
ABSTRACT
Testing is an essential part of the design life-cycle of any
software product. Nevertheless, while most phases of data
warehouse design have received considerable attention in the
literature, not much has been said about data warehouse
testing. In this paper we introduce a number of data mart-
specific testing activities, we classify them in terms of what
is tested and how it is tested, and we discuss how they can
be framed within a reference design methodology.
Categories and Subject Descriptors
H.4.2 [Information Systems Applications]: Types of
Systems—Decision support; D.2.5 [Software Engineering]:
Testing and Debugging
General Terms
Verification, Design
Keywords
data warehouse, testing
1. INTRODUCTION
Testing is an essential part of the design life-cycle of any
software product. Needless to say, testing is especially crit-
ical to success in data warehousing projects because users
need to trust in the quality of the information they access.
Nevertheless, while most phases of data warehouse design
have received considerable attention in the literature, not
much has been said about data warehouse testing.
As agreed by most authors, the difference between testing data warehouse systems and testing generic software systems, or even transactional systems, stems from several aspects [21, 23, 13]:
• Software testing is predominantly focused on program
code, while data warehouse testing is directed at data
and information. As a matter of fact, the key to data
warehouse testing is to know the data and what the
answers to user queries are supposed to be.
• Differently from generic software systems, data ware-
house testing involves a huge data volume, which sig-
nificantly impacts performance and productivity.
• Data warehouse testing has a broader scope than soft-
ware testing because it focuses on the correctness and
usefulness of the information delivered to users. In
fact, data validation is one of the main goals of data
warehouse testing.
• Though a generic software system may have a large
number of different use scenarios, the valid combina-
tions of those scenarios are limited. On the other hand,
data warehouse systems are aimed at supporting any
views of data, so the possible combinations are virtu-
ally unlimited and cannot be fully tested.
• While most testing activities are carried out before de-
ployment in generic software systems, data warehouse
testing activities still go on after system release.
• Typical software development projects are self-contained.
Data warehousing projects never really come to an end;
it is very difficult to anticipate future requirements for
the decision-making process, so only a few require-
ments can be stated from the beginning. Besides, it
is almost impossible to predict all the possible types
of errors that will be encountered in real operational
data. For this reason, regression testing is inherently
involved.
As for most generic software systems, different types of
tests can be devised for data warehouse systems. For in-
stance, it is very useful to distinguish between unit test,
a white-box test performed on each individual component
considered in isolation from the others, and integration test,
a black-box test where the system is tested in its entirety.
Regression testing, which checks that the system still functions correctly after a change has occurred, is also considered very important for data warehouse systems because of
their ever-evolving nature. However, the peculiar character-
istics of data warehouse testing and the complexity of data
warehouse projects ask for a deep revision and contextualiza-
tion of these test types, aimed in particular at emphasizing
the relationships between testing activities on the one side,
design phases and project documentation on the other.
From the methodological point of view we mention that, while testing issues are often considered only during the very last phases of data warehouse projects, all authors agree that moving accurate test planning up to the early project phases is one of the keys to success. The main reason for this is that, as software engineers know very well, the earlier an error is detected in the software design cycle, the cheaper it is to correct. Besides, planning early
testing activities to be carried out during design and before
implementation gives project managers an effective way to
regularly measure and document the project progress state.
Since the correctness of a system can only be measured with reference to a set of requirements, successful testing begins with the gathering and documentation of end-user requirements [8]. Since most end-user requirements
are about data analysis and data quality, it is inevitable
that data warehouse testing primarily focuses on the ETL
process on the one hand (this is sometimes called back-end
testing [23]), on reporting and OLAP on the other (front-end
testing [23]). While back-end testing aims at ensuring that
data loaded into the data warehouse are consistent with the
source data, front-end testing aims at verifying that data are
correctly navigated and aggregated in the available reports.
From the organizational point of view, several roles are in-
volved with testing [8]. Analysts draw conceptual schemata, which represent the user requirements to be used as a reference for testing. Designers are responsible for the logical schemata of data repositories and for data staging flows, which should be tested for efficiency and robustness. Testers develop and execute test plans and scripts. Developers perform white-box
unit tests. Database administrators test for performance
and stress, and set up test environments. Finally, end-users
perform functional tests on reporting and OLAP front-ends.
In this paper we propose a comprehensive approach to
testing data warehouse systems. More precisely, consider-
ing that data warehouse systems are commonly built in a
bottom-up fashion, by iteratively designing and implement-
ing one data mart at a time, we will focus on the test of a
single data mart. The main features of our approach can be
summarized as follows:
• A substantial portion of the testing effort is moved forward to the design phase to reduce the impact of error correction.
• A number of data mart-specific testing activities are
identified and classified in terms of what is tested and
how it is tested.
• A tight relationship is established between testing ac-
tivities and design phases within the framework of a
reference methodological approach to design.
• When possible, testing activities are related to quality
metrics to allow their quantitative assessment.
The remainder of the paper is organized as follows. Af-
ter briefly reviewing the related literature in Section 2, in
Section 3 we propose the reference methodological frame-
work. Then, Section 4 classifies and describes the data mart-
specific testing activities we devised, while Section 5 briefly
discusses some issues related to test coverage. Section 6 pro-
poses a timeline for testing in the framework of the reference
design methodology, and Section 7 summarizes the lessons
we learnt.
2. RELATED WORKS
The literature on software engineering is huge, and it in-
cludes a detailed discussion of different approaches to the
testing of software systems (e.g., see [16, 19]). However,
only a few works discuss the issues raised by testing in the
data warehousing context.
[13] summarizes the main challenges of data warehouse
and ETL testing, and discusses their phases and goals dis-
tinguishing between retrospective and prospective testing. It
also proposes to base part of the testing activities (those
related to incremental load) on mock data.
In [20] the authors propose a basic set of attributes for a
data warehouse test scenario based on the IEEE 829 stan-
dard. These attributes include the test purpose, its performance requirements, its acceptance criteria, and the activities for its completion.
[2] proposes a process for data warehouse testing centered
on a unit test phase, an integration test phase, and a user
acceptance test phase.
[21] reports eight classical mistakes in data warehouse
testing; among these: not closely involving end users, testing
reports rather than data, skipping the comparison between
data warehouse data and data source data, and using only
mock data. Besides, unit testing, system testing, acceptance
testing, and performance testing are proposed as the main
testing steps.
[23] explains the differences between testing OLTP and
OLAP systems and proposes a detailed list of testing cate-
gories. It also enumerates possible test scenarios and differ-
ent types of data to be used for testing.
[8] discusses the main aspects in data warehouse testing.
In particular, it distinguishes the different roles required in
the testing team, and the different types of testing each role
should carry out.
[4] presents some lessons learnt on data warehouse testing,
emphasizing the role played by constraint testing, source-to-
target testing, and validation of error processing procedures.
All papers mentioned above provide useful hints and list
some key testing activities. On the other hand, our paper
is the first attempt to define a comprehensive framework for
data mart testing.
3. METHODOLOGICAL FRAMEWORK
To discuss how testing relates to the different phases of
data mart design, we adopt as a methodological framework
the one described in [7]. As sketched in Figure 1, this frame-
work includes eight phases:
• Requirement analysis: requirements are elicited from
users and represented either informally by means of
proper glossaries or formally (e.g., by means of goal-
oriented diagrams as in [5]);
• Analysis and reconciliation: data sources are inspected,
normalized, and integrated to obtain a reconciled schema;
• Conceptual design: a conceptual schema for the data
mart –e.g., in the form of a set of fact schemata [6]– is
designed considering both user requirements and data
available in the reconciled schema;
• Workload refinement: the preliminary workload ex-
pressed by users is refined and user profiles are singled
out, using for instance UML use case diagrams;
Figure 1: Reference design methodology
• Logical design: a logical schema for the data mart (e.g.,
in the form of a set of star schemata) is obtained by
properly translating the conceptual schema;
• Data staging design: ETL procedures are designed considering the source schemata, the reconciled schema, and the data mart logical schema;
• Physical design: this includes index selection, schema fragmentation, and all other issues related to physical allocation; and
• Implementation: this includes implementation of ETL procedures and creation of front-end reports.
Note that this methodological framework is general enough to host supply-driven, demand-driven, and mixed approaches to design. While in a supply-driven approach conceptual design is mainly based on an analysis of the available
source schemata [6], in a demand-driven approach user re-
quirements are the driving force of conceptual design [3]. Fi-
nally, in a mixed approach requirement analysis and source
schema analysis are carried out in parallel, and requirements
are used to actively reduce the complexity of source schema
analysis [1].
4. TESTING ACTIVITIES
In order to better frame the different testing activities, we
identify two distinct, though not independent, classification
coordinates: what is tested and how it is tested.
As concerns the first coordinate, what, we already men-
tioned that testing data quality is undoubtedly at the core of
data warehouse testing. Testing data quality mainly entails
an accurate check on the correctness of the data loaded by
ETL procedures and accessed by front-end tools. However,
in the light of the complexity of data warehouse projects and
of the close relationship between good design and good per-
formance, we argue that testing the design quality is almost
equally important. Testing design quality mainly implies
verifying that user requirements are well expressed by the
conceptual schema of the data mart and that the conceptual
and logical schemata are well-built. Overall, the items to be
tested can then be summarized as follows:
• Conceptual schema: it describes the data mart from an
implementation-independent point of view, specifying
the facts to be monitored, their measures, and the hierarchies to be used for aggregation. Our methodological
framework adopts the Dimensional Fact Model to this
end [6].
• Logical schema: it describes the structure of the data
repository at the core of the data mart. If the im-
plementation target is a ROLAP platform, the logical
schema is actually a relational schema (typically, a star
schema or one of its variants).
• ETL procedures: the complex procedures that are in
charge of feeding the data repository starting from
data sources.
• Database: the repository storing data.
• Front-end: the applications accessed by end-users to
analyze data; typically, these are either static reporting
tools or more flexible OLAP tools.
As concerns the second coordinate, how, the seven types of test that, in our experience, best fit the characteristics of data warehouse systems are summarized below:
• Functional test: it verifies that the item is compliant
with its specified business requirements.
• Usability test: it evaluates the item by letting users
interact with it, in order to verify that the item is easy
to use and comprehensible.
• Performance test: it checks that the item performance
is satisfactory under typical workload conditions.
• Stress test: it shows how well the item performs with
peak loads of data and very heavy workloads.
• Recovery test: it checks how well an item is able to re-
cover from crashes, hardware failures and other similar
problems.
• Security test: it checks that the item protects data and
maintains functionality as intended.
• Regression test: it checks that the item still functions correctly after a change has occurred.
Remarkably, these types of test are tightly related to six of
the software quality factors described in [12]: correctness,
usability, efficiency, reliability, integrity, flexibility.
The relationship between what and how is summarized in
Table 1, where each check mark points out that a given type
of test should be applied to a given item. Starting from this
table, in the following subsections we discuss the main test-
ing activities and how they are related to the methodological
framework outlined in Section 3.
Table 1: What vs. how in testing

Test        | Conceptual schema | Logical schema | ETL procedures | Database | Front-end
Functional  |         X         |       X        |       X        |          |     X
Usability   |         X         |       X        |                |          |     X
Performance |                   |       X        |       X        |    X     |     X
Stress      |                   |                |       X        |    X     |     X
Recovery    |                   |                |       X        |    X     |
Security    |                   |                |       X        |    X     |     X
Regression  |         X         |       X        |       X        |    X     |     X

(Conceptual and logical schema tests pertain to analysis & design; ETL, database, and front-end tests to implementation.)
We preliminarily remark that a requirement for an effec-
tive test is the early definition, for each testing activity, of
the necessary conditions for passing the test. These condi-
tions should be verifiable and quantifiable. This means that
proper metrics should be introduced, together with their ac-
ceptance thresholds, so as to get rid of subjectivity and am-
biguity issues. Using quantifiable metrics is also necessary
for automating testing activities.
While for some types of test, such as performance tests,
the metrics devised for generic software systems (such as
the maximum query response time) can be reused and ac-
ceptance thresholds can be set intuitively, for other types of
test data warehouse-specific metrics are needed. Unfortu-
nately, very few data warehouse-specific metrics have been
defined in the literature. Besides, the criteria for setting
their acceptance thresholds often depend on the specific fea-
tures of the project being considered. So, in most cases, ad hoc metrics will have to be defined together with their thresholds.
An effective approach to this activity hinges on the following
main phases [18]:
1. Identify the goal of the metrics by specifying the un-
derlying hypotheses and the quality criteria to be mea-
sured.
2. Formally define the metrics.
3. Theoretically validate the metrics using either axiomatic
approaches or measurement theory-based approaches,
to get a first assessment of the metrics correctness and
applicability.
4. Empirically validate the metrics by applying them to data from previous projects, in order to get practical proof of their capability to measure the target quality criteria. This phase is also aimed at understanding the metrics' implications and fixing the acceptance thresholds.
5. Apply and accredit the metrics. The metrics defini-
tions and thresholds may evolve in time to adapt to
new projects and application domains.
4.1 Testing the Conceptual Schema
Software engineers know very well that the earlier an error is detected in the software design cycle, the cheaper it is to correct. One of the advantages of adopt-
ing a data warehouse methodology that entails a conceptual
design phase is that the conceptual schema produced can
be thoroughly tested for user requirements to be effectively
supported.
We propose two main types of test on the data mart con-
ceptual schema in the scope of functional testing. The first,
that we call fact test, verifies that the workload preliminar-
ily expressed by users during requirement analysis is actu-
ally supported by the conceptual schema. This can be easily
achieved by checking, for each workload query, that the re-
quired measures have been included in the fact schema and
that the required aggregation level can be expressed as a
valid grouping set on the fact schema. We call the second
type of test a conformity test, because it is aimed at as-
sessing how well conformed hierarchies have been designed.
This test can be carried out by measuring the sparseness
of the bus matrix [7], which associates each fact with its
dimensions, thus pointing out the existence of conformed
hierarchies. Intuitively, if the bus matrix is very sparse,
the designer probably failed to recognize the semantic and
structural similarities between apparently different hierar-
chies. Conversely, if the bus matrix is very dense, the de-
signer probably failed to recognize the semantic and struc-
tural similarities between apparently different facts.
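To make these two tests concrete, the following minimal Python sketch shows how they could be automated; the schema representation, the workload format, and all names are hypothetical simplifications, not part of the methodology itself.

# Hypothetical, simplified representation of two fact schemata: the
# measures they expose and the hierarchy levels usable for grouping.
fact_schemata = {
    "SALES":    {"measures": {"quantity", "revenue"},
                 "levels":   {"date", "month", "year", "product",
                              "category", "store", "city"}},
    "SHIPMENT": {"measures": {"shipped_qty"},
                 "levels":   {"date", "month", "year", "product",
                              "category", "warehouse"}},
}

def fact_test(query, schema):
    """A workload query passes if its measures appear in the fact schema
    and its group-by set is a valid grouping set over the fact's levels."""
    return (query["measures"] <= schema["measures"]
            and query["group_by"] <= schema["levels"])

workload = [
    {"fact": "SALES", "measures": {"revenue"}, "group_by": {"month", "category"}},
    {"fact": "SALES", "measures": {"revenue", "discount"}, "group_by": {"year"}},  # fails: no 'discount' measure
]
passed = [q for q in workload if fact_test(q, fact_schemata[q["fact"]])]
print(f"fact test: {len(passed)}/{len(workload)} workload queries supported")

def bus_matrix_sparseness(bus_matrix, dimensions):
    """Conformity test: fraction of empty cells in the fact x dimension
    bus matrix; values close to 0 or to 1 both call for a design review."""
    marked = sum(len(dims) for dims in bus_matrix.values())
    return 1 - marked / (len(bus_matrix) * len(dimensions))

bus_matrix = {"SALES": {"date", "product", "store"},
              "SHIPMENT": {"date", "product", "warehouse"}}
print(f"bus matrix sparseness: "
      f"{bus_matrix_sparseness(bus_matrix, {'date', 'product', 'store', 'warehouse'}):.2f}")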
Other types of test that can be executed on the conceptual
schema are related to its understandability, thus falling in
the scope of usability testing. An example of a quantitative
approach to this test is the one described in [18], where a set
of metrics for measuring the quality of a conceptual schema
from the point of view of its understandability are proposed
and validated. Examples of these metrics are the average number of levels per hierarchy and the number of measures per fact.
4.2 Testing the Logical Schema
Testing the logical schema before it is implemented and
before ETL design can dramatically reduce the impact of
errors due to bad logical design. An effective approach
to functional testing consists in verifying that a sample of
queries in the preliminary workload can correctly be for-
mulated in SQL on the logical schema. We call this the
star test. In putting the sample together, priority should
be given to the queries involving irregular portions of hier-
archies (e.g., those including many-to-many associations or
cross-dimensional attributes), those based on complex ag-
gregation schemes (e.g., queries that require measures ag-
gregated through different operators along the different hi-
erarchies), and those leaning on non-standard temporal sce-
narios (such as yesterday-for-today).
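As a hedged illustration of the star test, assume a hypothetical SALES star schema with tables sales_fact and date_dim; the workload query below aggregates the same measure through different operators along the time hierarchy (SUM per day, then AVG per month), one of the complex aggregation schemes the sample should prioritize.

# Star test sketch: verify that a two-step aggregation query (SUM per
# day, then AVG per month) can be formulated in SQL on the hypothetical
# SALES star schema (sales_fact, date_dim).
star_test_query = """
SELECT d.month, AVG(daily.revenue) AS avg_daily_revenue
FROM (SELECT date_key, SUM(revenue) AS revenue
      FROM sales_fact
      GROUP BY date_key) AS daily
JOIN date_dim d ON d.date_key = daily.date_key
GROUP BY d.month
"""
# Executing the statement against the (possibly still empty) star schema
# is enough to verify that the query can be expressed on the logical design.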
In the scope of usability testing, in [17] some simple met-
rics based on the number of fact tables and dimension ta-
bles in a logical schema are proposed. These metrics can be
adopted to effectively capture schema understandability.
Finally, a performance test can be carried out on the log-
ical schema by checking to what extent it is compliant with the multidimensional normal forms, which ensure summarizability and support an efficient database design [11, 10].
Besides the above-mentioned metrics, focused on usabil-
ity and performance, the literature proposes other metrics
20
for database schemata, and relates them to abstract quality
factors. For instance, in the scope of maintainability, [15]
introduces a set of metrics aimed at evaluating the quality
of a data mart logical schema with respect to its ability to
sustain changes during an evolution process.
4.3 Testing the ETL Procedures
ETL testing is probably the most complex and critical
testing phase, because it directly affects the quality of data.
Since ETL is heavily code-based, most standard techniques
for generic software system testing can be reused here.
A functional test of ETL is aimed at checking that ETL
procedures correctly extract, clean, transform, and load data
into the data mart. The best approach here is to set up unit
tests and integration tests. Unit tests are white-box tests
that each developer carries out on the units (s)he devel-
oped. They allow for breaking down the testing complexity,
and they also enable more detailed reports on the project
progress to be produced. Units for ETL testing can be ei-
ther vertical (one test unit for each conformed dimension,
plus one test unit for each group of correlated facts) or hor-
izontal (separate tests for static extraction, incremental ex-
traction, cleaning, transformation, static loading, incremen-
tal loading, view update); the most effective choice mainly
depends on the number of facts in the data marts, on how
complex cleaning and transformation are, and on how the
implementation plan was allotted to the developers. In par-
ticular, crucial aspects to be considered during the loading
test are related to dimension tables (correctness of roll-
up functions, effective management of dynamic hierarchies
and irregular hierarchies), fact tables (effective management
of late updates), and materialized views (use of correct ag-
gregation functions).
After unit tests have been completed, an integration test
allows the correctness of data flows in ETL procedures to be
checked. Different quality dimensions, such as data coher-
ence (the respect of integrity constraints), completeness (the
percentage of data found), and freshness (the age of data)
should be considered. Some metrics for quantifying these
quality dimensions have been proposed in [22].
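By way of illustration only, the sketch below computes the three quality dimensions just mentioned; the function names and the acceptance thresholds are invented for the example and are not taken from [22].

from datetime import datetime, timezone

def completeness(loaded_rows, source_rows):
    # Percentage of source records actually found in the data mart.
    return 100.0 * loaded_rows / source_rows

def coherence(violations, checked_constraints):
    # Share of checked integrity constraints respected by the loaded data.
    return 1.0 - violations / checked_constraints

def freshness(last_load):
    # Age of data: hours elapsed since the last successful load.
    return (datetime.now(timezone.utc) - last_load).total_seconds() / 3600

# Illustrative acceptance checks for an ETL integration-test run:
assert completeness(998_500, 1_000_000) >= 99.0
assert coherence(violations=3, checked_constraints=1_200) >= 0.99
last_load = datetime.now(timezone.utc)   # recorded by the loading procedure
assert freshness(last_load) <= 24.0      # data no older than one day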
During requirement analysis the designer, together with users and database administrators, should have singled out the most common causes of faulty data and ranked them by severity, with the aim of planning a proper strategy for dealing with ETL errors. Common strategies for dealing with errors
of a given kind are “automatically clean faulty data”, “reject
faulty data”, “hand faulty data to data mart administrator”,
etc. So, not surprisingly, a distinctive feature of ETL func-
tional testing is that it should be carried out with at least
three different databases, including respectively (i) correct
and complete data, (ii) data simulating the presence of faulty
data of different kinds, and (iii) real data. In particular, tests
using dirty simulated data are sometimes called forced-error
tests: they are designed to force ETL procedures into error
conditions aimed at verifying that the system can deal with
faulty data as planned during requirement analysis.
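A minimal sketch of such a forced-error test is given below; run_etl stands for whatever entry point the project's ETL exposes, and both the row format and the handling strategies are hypothetical.

# Forced-error test sketch: each faulty row must be handled according to
# the strategy planned during requirement analysis. `run_etl` is a
# hypothetical entry point returning the action taken for each input row.
faulty_rows = [
    ({"product": None, "qty": 5},   "reject"),    # missing dimension key
    ({"product": "P1", "qty": -2},  "clean"),     # fixable value
    ({"product": "P9", "qty": "x"}, "to_admin"),  # unparsable: hand to the administrator
]

def test_forced_errors(run_etl):
    for row, expected in faulty_rows:
        outcome = run_etl([row])[0]
        assert outcome == expected, f"row {row}: handled as {outcome}, expected {expected}"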
Performance and stress tests are complementary in as-
sessing the efficiency of ETL procedures. Performance tests
evaluate the behavior of ETL with reference to a routine
workload, i.e., when a domain-typical amount of data has
to be extracted and loaded; in particular, they check that the processing time is compatible with the time frames expected for data-staging processes. On the other hand, stress
tests simulate an extraordinary workload due to a signifi-
cantly larger amount of data.
The recovery test of ETL checks for robustness by simu-
lating faults in one or more components and evaluating the
system response. For example, you can cut off the power
supply while an ETL process is in progress, or set a database offline while an OLAP session is in progress, to check the effectiveness of the restore policies.
As concerns ETL security, there is a need for verifying
that the database used to temporarily store the data being
processed (the so-called data staging area) cannot be vio-
lated, and that the network infrastructure hosting the data
flows that connect the data sources and the data mart is
secure.
4.4 Testing the Database
We assume that the logical schema quality has already
been verified during the logical schema tests, and that all is-
sues related to data quality are in charge of ETL tests. Then,
database testing is mainly aimed at checking the database performance using either standard (performance test) or heavy (stress test) workloads. As for ETL, the size of the
tested databases and their data distribution must be dis-
cussed with the designers and the database administrator.
Performance tests can be carried out either on a database
including real data or on a mock database, but the database
size should be compatible with the average expected data
volume. On the other hand, stress tests are typically car-
ried out on mock databases whose size is significantly larger than expected. Standard database metrics –such as
maximum query response time– can be used to quantify the
test results. To advance these testing activities as much as
possible and to make their results independent of front-end
applications, we suggest using SQL to code the workload.
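For instance, a performance test coded this way might look like the sketch below; the connection object is any Python DB-API connection, and both the workload queries and the threshold are illustrative assumptions.

import time

# Database performance test sketch: run the SQL-coded workload and check
# the maximum response time against an agreed acceptance threshold.
WORKLOAD = [
    "SELECT d.month, SUM(f.revenue) FROM sales_fact f "
    "JOIN date_dim d ON f.date_key = d.date_key GROUP BY d.month",
    # ...the remaining workload queries, coded in SQL as suggested above
]
MAX_RESPONSE_TIME = 10.0  # seconds; illustrative threshold

def performance_test(connection):
    worst = 0.0
    for sql in WORKLOAD:
        start = time.perf_counter()
        connection.cursor().execute(sql)
        worst = max(worst, time.perf_counter() - start)
    assert worst <= MAX_RESPONSE_TIME, f"max response time {worst:.1f}s exceeds threshold"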
Recovery tests enable testers to verify the DBMS behav-
ior after critical errors such as power outages during updates, network faults, and hard disk failures.
Finally, security tests mainly concern the possible adop-
tion of some cryptography technique to protect data and the
correct definition of user profiles and database access grants.
4.5 Testing the Front-End
Functional testing of the analysis front-ends must neces-
sarily involve a very large number of end-users, who gener-
ally are so familiar with application domains that they can
detect even the slightest abnormality in data. Nevertheless,
wrong results in OLAP analyses may be difficult to recog-
nize. They can be caused not only by faulty ETL proce-
dures, but even by incorrect data aggregations or selections
in front-end tools. Some errors are not due to the data mart; instead, they result from poor data quality in the source database. In order to allow this situation to be rec-
ognized, a common approach to front-end functional testing
in real projects consists in comparing the results of OLAP
analyses with those obtained by directly querying the source
databases. Of course, though this approach can be effective
on a sample basis, it cannot be extensively adopted due to
the huge number of possible aggregations that characterize
multidimensional queries.
A significant sample of queries to be tested can be selected
in mainly two ways. In the “black-box way”, the workload
specification obtained in output by the workload refinement
phase (typically, a use case diagram where actors stand for
user profiles and use cases represent the most frequent analysis queries) is used to determine the test cases, much like use case diagrams are profitably employed for testing-in-the-large in generic software systems. In the “white-box way”, instead, the subset of data aggregations to be tested can be determined by applying proper coverage criteria to the multidimensional lattice¹ of each fact, much like decision, statement, and path coverage criteria are applied to the control graph of a generic software procedure during testing-in-the-small.

¹The multidimensional lattice of a fact is the lattice whose nodes and arcs correspond, respectively, to the group-by sets supported by that fact and to the roll-up relationships that relate those group-by sets.

Table 2: Coverage criteria for some testing activities; the expected coverage is expressed with reference to the coverage criterion

Testing activity | Coverage criterion | Measurement | Expected coverage
fact test | each information need expressed by users during requirement analysis must be tested | percentage of queries in the preliminary workload that are supported by the conceptual schema | partial, depending on the extent of the preliminary workload
conformity test | all data mart dimensions must be tested | bus matrix sparseness | total
usability test of the conceptual schema | all facts, dimensions, and measures must be tested | conceptual metrics | total
ETL unit test | all decision points must be tested | correct loading of the test data sets | total
ETL forced-error test | all error types specified by users must be tested | correct loading of the faulty data sets | total
front-end unit test | at least one group-by set for each attribute in the multidimensional lattice of each fact must be tested | correct analysis result of a real data set | total
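A sketch of how such a white-box sample could be derived is shown below: it enumerates the group-by sets of a hypothetical fact's multidimensional lattice and greedily selects test cases until every attribute is covered at least once, in the spirit of the front-end unit test criterion of Table 2. Hierarchies and levels are invented for the example.

from itertools import product

# Hypothetical hierarchies of a fact; each lattice node picks one level
# per hierarchy (or None, meaning full aggregation on that hierarchy).
hierarchies = {
    "time":    ["date", "month", "year"],
    "product": ["product", "category"],
    "store":   ["store", "city"],
}
lattice = list(product(*([None] + levels for levels in hierarchies.values())))

def covering_sample(nodes):
    """Greedily keep a group-by set if it covers a not-yet-tested attribute."""
    covered, sample = set(), []
    for node in nodes:
        attrs = {level for level in node if level is not None}
        if attrs - covered:
            sample.append(node)
            covered |= attrs
    return sample

print(f"{len(lattice)} group-by sets in the lattice, "
      f"{len(covering_sample(lattice))} selected as test cases")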
Also in front-end testing it may be useful to distinguish
between unit and integration tests. While unit tests should
be aimed at checking the correctness of the reports involving
single facts, integration tests entail analysis sessions that ei-
ther correlate multiple facts (the so-called drill-across queries)
or take advantage of different application components. For
example, this is the case when dashboard data are “drilled
through” an OLAP application.
An integral part of front-end tests are usability tests, which check that OLAP reports are suitably represented and annotated so as to avoid any misunderstanding about the real meaning of data.
A performance test submits a group of concurrent queries
to the front-end and checks for the time needed to process
those queries. Performance tests imply a preliminary spec-
ification of the standard workload in terms of number of
concurrent users, types of queries, and data volume. On the
other hand, stress tests entail applying progressively heavier workloads than the standard loads to evaluate the system stability. Note that performance and stress tests must be focused on the front-end, which includes the client interface
but also the reporting and OLAP server-side engines. This
means that the access times due to the DBMS should be
subtracted from the overall response times measured.
Finally, in the scope of security tests, it is particularly important to check that user profiles are properly set up. You should also check that single sign-on policies work properly when switching between different analysis applications.
4.6 Regression Tests
A very relevant problem for frequently updated systems, such as data marts, concerns checking for new components
and new add-on features to be compatible with the operation
of the whole system. In this case, the term regression test is
used to define the testing activities carried out to make sure
that any change applied to the system does not jeopardize
the quality of preexisting, already tested features and does not degrade the system performance.
Testing the whole system many times from scratch has
huge costs. Three main directions can be followed to reduce
these costs:
• An expensive part of each test is the validation of the test results. In regression testing, it is often possible to skip this phase by just checking that the test results are consistent with those obtained at the previous iteration (see the sketch after this list).
• Test automation allows the efficiency of testing ac-
tivities to be increased, so that reproducing previous
tests becomes less expensive. Test automation will be
briefly discussed in Section 6.
• Impact analysis can be used to significantly restrict
the scope of testing. In general, impact analysis is
aimed at determining what other application objects
are affected by a change in a single application ob-
ject [9]. Remarkably, some ETL tool vendors already
provide some impact analysis functionalities. An ap-
proach to impact analysis for changes in the source
data schemata is proposed in [14].
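The following sketch illustrates the first point above, i.e., result validation by comparison with the previous iteration; the fingerprinting scheme and all names are our own illustrative choices, not a prescribed technique.

import hashlib
import json

def fingerprint(rows):
    # Order-independent hash of a query result set.
    canonical = json.dumps(sorted(map(str, rows))).encode()
    return hashlib.sha256(canonical).hexdigest()

def regression_check(test_id, rows, baseline):
    """Pass if the result set matches the one recorded at the previous
    iteration; on the first run, just record the baseline."""
    fp = fingerprint(rows)
    if test_id not in baseline:
        baseline[test_id] = fp
        return True
    return baseline[test_id] == fp

baseline = {}  # persisted between test iterations in a real setting
assert regression_check("Q1", [("2009-11", 1250.0)], baseline)  # records baseline
assert regression_check("Q1", [("2009-11", 1250.0)], baseline)  # unchanged: passes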
5. TEST COVERAGE
Testing can reduce the probability of a system fault but
cannot set it to zero, so measuring the coverage of tests is
necessary to assess the overall system reliability. Measuring
test coverage requires first of all the definition of a suitable
coverage criterion. Different coverage criteria, such as state-
ment coverage, decision coverage, and path coverage, were
devised in the scope of code testing. The choice of one or
another criterion deeply affects the test length and cost, as
well as the achievable coverage. So, coverage criteria are
chosen by trading off test effectiveness and efficiency. Ex-
amples of coverage criteria that we propose for some of the
testing activities described above are reported in Table 2.
Figure 2: UML activity diagram for design and testing (swimlanes: analyst/designer, tester, developer, DBA, end-user)
6. A TIMELINE FOR TESTING
From a methodological point of view, the three main phases
of testing are [13]:
• Create a test plan. The test plan describes the tests
that must be performed and their expected coverage
of the system requirements.
• Prepare test cases. Test cases enable the implementation of the test plan by detailing the testing steps together with their expected results. The reference databases for testing should be prepared during this phase, and a wide, comprehensive set of representative workloads should be defined.
• Execute tests. A test execution log tracks each test and its results.
Figure 2 shows a UML activity diagram that frames the
different testing activities within the design methodology
outlined in Section 3. This diagram can be used in a project
as a starting point for preparing the test plan.
It is worth mentioning here that test automation plays a basic role in reducing the costs of testing activities (especially regression tests) on the one hand, and in increasing test coverage on the other [4]. Remarkably, commercial tools (such as QACenter by Compuware) can be used for implementation-related testing activities to simulate spe-
cific workloads and analysis sessions, or to measure a process
outcome. As to design-related testing activities, the metrics
proposed in the previous sections can be measured by writ-
ing ad hoc procedures that access the meta-data repository
and the DBMS catalog.
7. CONCLUSIONS AND LESSONS LEARNT
In this paper we proposed a comprehensive approach which
adapts and extends the testing methodologies proposed for
general-purpose software to the peculiarities of data ware-
house projects. Our proposal builds on a set of tips and sug-
23
gestions coming from our direct experience on real projects,
as well as from some interviews we made to data warehouse
practitioners. As a result, a set of relevant testing activi-
ties were identified, classified, and framed within a reference
design methodology.
In order to experiment with our approach on a case study, we
are currently supporting a professional design team engaged
in a large data warehouse project, which will help us better
focus on relevant issues such as test coverage and test doc-
umentation. In particular, to better validate our approach
and understand its impact, we will apply it to one out of
two data marts developed in parallel, so as to assess the extra effort due to comprehensive testing on the one hand, and the savings in post-deployment error correction activities and the gains in better data and design quality on the other.
To close the paper, we would like to summarize the main
lessons we learnt so far:
• The chance of performing an effective test depends on the completeness and accuracy of the documentation in terms of collected requirements and project description. In
other words, if you did not specify what you want from
your system at the beginning, you cannot expect to get
it right later.
• The test phase is part of the data warehouse life-cycle,
and it acts in synergy with design. For this reason, the
test phase should be planned and arranged at the be-
ginning of the project, when you can specify the goals
of testing, which types of tests must be performed,
which data sets need to be tested, and which quality
level is expected.
• Testing is not a one-man activity. The testing team
should include testers, developers, designers, database
administrators, and end-users, and it should be set up
during the project planning phase.
• Testing of data warehouse systems is largely based on
data. A successful testing must rely on real data, but
it also must include mock data to reproduce the most
common error situations that can be encountered in
ETL. Accurately preparing the right data sets is one
of the most critical activities to be carried out during
test planning.
• No matter how deeply the system has been tested, it is almost certain that, sooner or later, an unexpected data fault that cannot be properly handled by the ETL will occur. So keep in mind that, while testing must come to an end someday, data quality certification is an everlasting process. The borderline between testing and certification clearly depends on how precisely requirements were stated and on the contract that regulates the project.
8. REFERENCES
[1] A. Bonifati, F. Cattaneo, S. Ceri, A. Fuggetta, and
S. Paraboschi. Designing data marts for data
warehouses. ACM Transactions on Software
Engineering Methodologies, 10(4):452–483, 2001.
[2] K. Brahmkshatriya. Data warehouse testing.
http://www.stickyminds.com, 2007.
[3] R. Bruckner, B. List, and J. Schiefer. Developing
requirements for data warehouse systems with use
cases. In Proc. Americas Conf. on Information
Systems, pages 329–335, 2001.
[4] R. Cooper and S. Arbuckle. How to thoroughly test a
data warehouse. In Proc. STAREAST, Orlando, 2002.
[5] P. Giorgini, S. Rizzi, and M. Garzetti. GRAnD: A
goal-oriented approach to requirement analysis in data
warehouses. Decision Support Systems, 5(1):4–21,
2008.
[6] M. Golfarelli, D. Maio, and S. Rizzi. The dimensional
fact model: A conceptual model for data warehouses.
International Journal of Cooperative Information
Systems, 7(2-3):215–247, 1998.
[7] M. Golfarelli and S. Rizzi. Data warehouse design:
Modern principles and methodologies. McGraw-Hill,
2009.
[8] D. Haertzen. Testing the data warehouse.
http://www.infogoal.com, 2009.
[9] R. Kimball and J. Caserta. The Data Warehouse ETL
Toolkit. John Wiley & Sons, 2004.
[10] J. Lechtenbörger and G. Vossen. Multidimensional
normal forms for data warehouse design. Information
Systems, 28(5):415–434, 2003.
[11] W. Lehner, J. Albrecht, and H. Wedekind. Normal
forms for multidimensional databases. In Proc.
SSDBM, pages 63–72, Capri, Italy, 1998.
[12] J. McCall, P. Richards, and G. Walters. Factors in
software quality. Technical Report AD-A049-014, 015,
055, NTIS, 1977.
[13] A. Mookerjea and P. Malisetty. Best practices in data
warehouse testing. In Proc. Test, New Delhi, 2008.
[14] G. Papastefanatos, P. Vassiliadis, A. Simitsis, and
Y. Vassiliou. What-if analysis for data warehouse
evolution. In Proc. DaWaK, pages 23–33, Regensburg,
Germany, 2007.
[15] G. Papastefanatos, P. Vassiliadis, A. Simitsis, and
Y. Vassiliou. Design metrics for data warehouse
evolution. In Proc. ER, pages 440–454, 2008.
[16] R. Pressman. Software Engineering: A practitioner’s
approach. The McGraw-Hill Companies, 2005.
[17] M. Serrano, C. Calero, and M. Piattini. Experimental
validation of multidimensional data models metrics. In
Proc. HICSS, page 327, 2003.
[18] M. Serrano, J. Trujillo, C. Calero, and M. Piattini.
Metrics for data warehouse conceptual models
understandability. Information & Software Technology,
49(8):851–870, 2007.
[19] I. Sommerville. Software Engineering. Pearson
Education, 2004.
[20] P. Tanuška, W. Verschelde, and M. Kopček. The
proposal of data warehouse test scenario. In Proc.
ECUMICT, Gent, Belgium, 2008.
[21] A. van Bergenhenegouwen. Data warehouse testing.
http://www.ti.kviv.be, 2008.
[22] P. Vassiliadis, M. Bouzeghoub, and C. Quix. Towards
quality-oriented data warehouse usage and evolution.
In Proc. CAiSE, Heidelberg, Germany, 1999.
[23] Vv. Aa. Data warehouse testing and implementation.
In Intelligent Enterprise Encyclopedia. BiPM
Institute, 2009.
 

Recently uploaded

1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
rosedainty
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
Celine George
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
Celine George
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
DeeptiGupta154
 

Recently uploaded (20)

1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)Template Jadual Bertugas Kelas (Boleh Edit)
Template Jadual Bertugas Kelas (Boleh Edit)
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
How to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS ModuleHow to Split Bills in the Odoo 17 POS Module
How to Split Bills in the Odoo 17 POS Module
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17How to Make a Field invisible in Odoo 17
How to Make a Field invisible in Odoo 17
 
Overview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with MechanismOverview on Edible Vaccine: Pros & Cons with Mechanism
Overview on Edible Vaccine: Pros & Cons with Mechanism
 

A Comprehensive Approach to Data Warehouse TestingMatteo G.docx

of errors that will be encountered in real operational data. For this reason, regression testing is inherently involved.
Like for most generic software systems, different types of tests can be devised for data warehouse systems. For instance, it is very useful to distinguish between the unit test, a white-box test performed on each individual component considered in isolation from the others, and the integration test, a black-box test where the system is tested in its entirety. The regression test, which checks that the system still functions correctly after a change has occurred, is also considered very important for data warehouse systems because of their ever-evolving nature. However, the peculiar characteristics of data warehouse testing and the complexity of data warehouse projects call for a deep revision and contextualization of these test types, aimed in particular at emphasizing the relationships between testing activities on the one side and design phases and project documentation on the other.

From the methodological point of view, while testing issues are often considered only during the very last phases of data warehouse projects, all authors agree that advancing accurate test planning to the early project phases is one of the keys to success. The main reason for this is that, as software engineers know very well, the earlier an error is detected in the software design cycle, the cheaper it is to correct. Besides, planning early testing activities to be carried out during design and before implementation gives project managers an effective way to regularly measure and document the project's progress.

Since the correctness of a system can only be measured with reference to a set of requirements, successful testing begins with the gathering and documentation of end-user requirements [8].
Since most end-user requirements concern data analysis and data quality, it is inevitable that data warehouse testing primarily focuses on the ETL process on the one hand (sometimes called back-end testing [23]) and on reporting and OLAP on the other (front-end testing [23]). While back-end testing aims at ensuring that the data loaded into the data warehouse are consistent with the source data, front-end testing aims at verifying that data are correctly navigated and aggregated in the available reports.

From the organizational point of view, several roles are involved in testing [8]. Analysts draw the conceptual schemata that represent the user requirements to be used as a reference for testing. Designers are responsible for the logical schemata of data repositories and for the data staging flows, which should be tested for efficiency and robustness. Testers develop and execute test plans and scripts. Developers perform white-box unit tests. Database administrators test for performance and stress, and set up test environments. Finally, end-users perform functional tests on reporting and OLAP front-ends.

In this paper we propose a comprehensive approach to testing data warehouse systems. More precisely, considering that data warehouse systems are commonly built in a bottom-up fashion, by iteratively designing and implementing one data mart at a time, we focus on the test of a single data mart. The main features of our approach can be summarized as follows:

• A consistent portion of the testing effort is advanced to the design phase to reduce the impact of error correction.

• A number of data mart-specific testing activities are identified and classified in terms of what is tested and how it is tested.
• A tight relationship is established between testing activities and design phases within the framework of a reference methodological approach to design.

• When possible, testing activities are related to quality metrics to allow their quantitative assessment.

The remainder of the paper is organized as follows. After briefly reviewing the related literature in Section 2, in Section 3 we propose the reference methodological framework. Then, Section 4 classifies and describes the data mart-specific testing activities we devised, while Section 5 briefly discusses some issues related to test coverage. Section 6 proposes a timeline for testing in the framework of the reference design methodology, and Section 7 summarizes the lessons we learnt.

2. RELATED WORKS

The literature on software engineering is huge, and it includes detailed discussions of different approaches to the testing of software systems (e.g., see [16, 19]). However, only a few works discuss the issues raised by testing in the data warehousing context.

[13] summarizes the main challenges of data warehouse and ETL testing, and discusses their phases and goals, distinguishing between retrospective and prospective testing. It also proposes to base part of the testing activities (those related to incremental load) on mock data.

In [20] the authors propose a basic set of attributes for a data warehouse test scenario based on the IEEE 829 standard. Among these attributes are the test purpose, its performance requirements, its acceptance criteria, and the activities for its completion.
[2] proposes a process for data warehouse testing centered on a unit test phase, an integration test phase, and a user acceptance test phase.

[21] reports eight classical mistakes in data warehouse testing, among them: not closely involving end users, testing reports rather than data, skipping the comparison between data warehouse data and data source data, and using only mock data. Besides, unit testing, system testing, acceptance testing, and performance testing are proposed as the main testing steps.

[23] explains the differences between testing OLTP and OLAP systems and proposes a detailed list of testing categories. It also enumerates possible test scenarios and the different types of data to be used for testing.

[8] discusses the main aspects of data warehouse testing. In particular, it distinguishes the different roles required in the testing team and the different types of testing each role should carry out.

[4] presents some lessons learnt on data warehouse testing, emphasizing the role played by constraint testing, source-to-target testing, and the validation of error processing procedures.

All the papers mentioned above provide useful hints and list some key testing activities. Our paper, on the other hand, is the first attempt to define a comprehensive framework for data mart testing.

3. METHODOLOGICAL FRAMEWORK
To discuss how testing relates to the different phases of data mart design, we adopt as a methodological framework the one described in [7]. As sketched in Figure 1, this framework includes eight phases:

• Requirement analysis: requirements are elicited from users and represented either informally, by means of proper glossaries, or formally (e.g., by means of goal-oriented diagrams as in [5]);

• Analysis and reconciliation: data sources are inspected, normalized, and integrated to obtain a reconciled schema;

• Conceptual design: a conceptual schema for the data mart (e.g., in the form of a set of fact schemata [6]) is designed considering both user requirements and the data available in the reconciled schema;

• Workload refinement: the preliminary workload expressed by users is refined and user profiles are singled out, using for instance UML use case diagrams;

[Figure 1: Reference design methodology]

• Logical design: a logical schema for the data mart (e.g., in the form of a set of star schemata) is obtained by properly translating the conceptual schema;

• Data staging design: ETL procedures are designed considering the source schemata, the reconciled schema, and the data mart logical schema;

• Physical design: this includes index selection, schema fragmentation, and all other issues related to physical allocation; and
• Implementation: this includes the implementation of the ETL procedures and the creation of the front-end reports.

Note that this methodological framework is general enough to host supply-driven, demand-driven, and mixed approaches to design. While in a supply-driven approach conceptual design is mainly based on an analysis of the available source schemata [6], in a demand-driven approach user requirements are the driving force of conceptual design [3]. Finally, in a mixed approach requirement analysis and source schema analysis are carried out in parallel, and requirements are used to actively reduce the complexity of source schema analysis [1].

4. TESTING ACTIVITIES

In order to better frame the different testing activities, we identify two distinct, though not independent, classification coordinates: what is tested and how it is tested.

As concerns the first coordinate, what, we already mentioned that testing data quality is undoubtedly at the core of data warehouse testing. Testing data quality mainly entails an accurate check of the correctness of the data loaded by ETL procedures and accessed by front-end tools. However, in the light of the complexity of data warehouse projects and of the close relationship between good design and good performance, we argue that testing the design quality is almost equally important. Testing design quality mainly implies verifying that user requirements are well expressed by the conceptual schema of the data mart and that the conceptual and logical schemata are well built. Overall, the items to be tested can be summarized as follows:
• Conceptual schema: it describes the data mart from an implementation-independent point of view, specifying the facts to be monitored, their measures, and the hierarchies to be used for aggregation. Our methodological framework adopts the Dimensional Fact Model to this end [6].

• Logical schema: it describes the structure of the data repository at the core of the data mart. If the implementation target is a ROLAP platform, the logical schema is actually a relational schema (typically, a star schema or one of its variants).

• ETL procedures: the complex procedures in charge of feeding the data repository starting from the data sources.

• Database: the repository storing the data.

• Front-end: the applications accessed by end-users to analyze data; typically, these are either static reporting tools or more flexible OLAP tools.

As concerns the second coordinate, how, the eight types of test that, in our experience, best fit the characteristics of data warehouse systems are summarized below:

• Functional test: it verifies that the item is compliant with its specified business requirements.

• Usability test: it evaluates the item by letting users interact with it, in order to verify that the item is easy to use and comprehensible.
• Performance test: it checks that the item's performance is satisfactory under typical workload conditions.

• Stress test: it shows how well the item performs with peak loads of data and very heavy workloads.

• Recovery test: it checks how well an item is able to recover from crashes, hardware failures, and other similar problems.

• Security test: it checks that the item protects data and maintains functionality as intended.

• Regression test: it checks that the item still functions correctly after a change has occurred.

Remarkably, these types of test are tightly related to six of the software quality factors described in [12]: correctness, usability, efficiency, reliability, integrity, and flexibility.

The relationship between what and how is summarized in Table 1, where each check mark points out that a given type of test should be applied to a given item. Starting from this table, in the following subsections we discuss the main testing activities and how they are related to the methodological framework outlined in Section 3.
Table 1: What vs. how in testing

                   Conceptual   Logical   ETL          Database   Front-end
                   schema       schema    procedures
    Functional         X           X          X                       X
    Usability          X           X                                  X
    Performance                    X          X            X          X
    Stress                                    X            X          X
    Recovery                                  X            X
    Security                                  X            X          X
    Regression         X           X          X            X          X

    (The conceptual and logical schemata pertain to analysis and design;
    the ETL procedures, database, and front-end to implementation.)
We preliminarily remark that a requirement for an effective test is the early definition, for each testing activity, of the necessary conditions for passing the test. These conditions should be verifiable and quantifiable. This means that proper metrics should be introduced, together with their acceptance thresholds, so as to get rid of subjectivity and ambiguity issues. Using quantifiable metrics is also necessary for automating testing activities.

While for some types of test, such as performance tests, the metrics devised for generic software systems (such as the maximum query response time) can be reused and acceptance thresholds can be set intuitively, for other types of test data warehouse-specific metrics are needed. Unfortunately, very few data warehouse-specific metrics have been defined in the literature. Besides, the criteria for setting their acceptance thresholds often depend on the specific features of the project being considered. So, in most cases, ad hoc metrics will have to be defined together with their thresholds. An effective approach to this activity hinges on the following main phases [18]:

1. Identify the goal of the metrics by specifying the underlying hypotheses and the quality criteria to be measured.

2. Formally define the metrics.

3. Theoretically validate the metrics using either axiomatic approaches or measurement theory-based approaches, to get a first assessment of the metrics' correctness and applicability.
4. Empirically validate the metrics by applying them to data of previous projects, in order to get a practical proof of their capability of measuring the target quality criteria. This phase is also aimed at understanding the implications of the metrics and at fixing their acceptance thresholds.

5. Apply and accredit the metrics. The definitions and thresholds of the metrics may evolve in time to adapt to new projects and application domains.

4.1 Testing the Conceptual Schema

Software engineers know very well that the earlier an error is detected in the software design cycle, the cheaper it is to correct. One of the advantages of adopting a data warehouse methodology that includes a conceptual design phase is that the resulting conceptual schema can be thoroughly tested to check that user requirements are effectively supported.

We propose two main types of test on the data mart conceptual schema in the scope of functional testing. The first, which we call the fact test, verifies that the workload preliminarily expressed by users during requirement analysis is actually supported by the conceptual schema. This can be easily achieved by checking, for each workload query, that the required measures have been included in the fact schema and that the required aggregation level can be expressed as a valid grouping set on the fact schema; a sketch of such a check is given below.
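As an illustration, the following Python sketch automates a simplified fact test. All names are hypothetical and, in particular, the grouping-set check is reduced to attribute membership, whereas a full implementation would also follow the roll-up relationships within each hierarchy:

    # Simplified fact test: each workload query must only use measures of
    # the fact schema, and its group-by set must only use attributes that
    # belong to the fact's hierarchies.
    def fact_test(fact_schema, workload):
        """fact_schema: {'measures': set, 'attributes': set};
        workload: list of {'measures': set, 'group_by': set} queries."""
        failures = []
        for query in workload:
            missing = query['measures'] - fact_schema['measures']
            invalid = query['group_by'] - fact_schema['attributes']
            if missing or invalid:
                failures.append((query, missing, invalid))
        return failures  # empty when the workload is fully supported

    sales = {'measures': {'quantity', 'revenue'},
             'attributes': {'date', 'month', 'year', 'product', 'category'}}
    workload = [{'measures': {'revenue'}, 'group_by': {'month', 'category'}},
                {'measures': {'discount'}, 'group_by': {'year'}}]
    print(fact_test(sales, workload))  # reports the unsupported second query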
We call the second type of test the conformity test, because it is aimed at assessing how well conformed hierarchies have been designed. This test can be carried out by measuring the sparseness of the bus matrix [7], which associates each fact with its dimensions, thus pointing out the existence of conformed hierarchies. Intuitively, if the bus matrix is very sparse, the designer probably failed to recognize the semantic and structural similarities between apparently different hierarchies. Conversely, if the bus matrix is very dense, the designer probably failed to recognize the semantic and structural similarities between apparently different facts.
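A minimal sketch of the sparseness measurement, under the assumption (ours, not prescribed by the paper) that the bus matrix is represented as a mapping from facts to their dimension sets:

    # Bus matrix sparseness: the fraction of empty cells. Values close to
    # 1 suggest missed conformed hierarchies; values close to 0 suggest
    # facts that should perhaps have been merged.
    def bus_matrix_sparseness(bus_matrix):
        dimensions = set().union(*bus_matrix.values())
        cells = len(bus_matrix) * len(dimensions)
        marked = sum(len(dims) for dims in bus_matrix.values())
        return 1.0 - marked / cells

    matrix = {'sales': {'date', 'product', 'store'},
              'shipments': {'date', 'product', 'warehouse'}}
    print(bus_matrix_sparseness(matrix))  # 0.25: 6 of 8 cells are marked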
Other types of test that can be executed on the conceptual schema are related to its understandability, thus falling within the scope of usability testing. An example of a quantitative approach to this test is the one described in [18], where a set of metrics for measuring the quality of a conceptual schema from the point of view of its understandability is proposed and validated. Examples of these metrics are the average number of levels per hierarchy and the number of measures per fact.

4.2 Testing the Logical Schema

Testing the logical schema before it is implemented, and before ETL design, can dramatically reduce the impact of errors due to bad logical design. An effective approach to functional testing consists in verifying that a sample of queries in the preliminary workload can be correctly formulated in SQL on the logical schema. We call this the star test. In putting the sample together, priority should be given to the queries involving irregular portions of hierarchies (e.g., those including many-to-many associations or cross-dimensional attributes), to those based on complex aggregation schemes (e.g., queries that require measures to be aggregated through different operators along the different hierarchies), and to those leaning on non-standard temporal scenarios (such as yesterday-for-today).
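The following sketch shows how one such check could be automated. The star schema and the workload query are illustrative, and SQLite's EXPLAIN is used merely as a cheap way to verify that the query can be parsed and planned against the schema:

    # Star test step: formulate a preliminary-workload query in SQL on
    # the star schema and verify that the DBMS accepts it.
    import sqlite3

    STAR_QUERY = """
        SELECT d.month, p.category, SUM(f.revenue) AS revenue
        FROM sales_fact f
        JOIN date_dim d ON f.date_key = d.date_key
        JOIN product_dim p ON f.product_key = p.product_key
        GROUP BY d.month, p.category
    """

    def star_test(connection, sql):
        # EXPLAIN parses and plans the query without running it, which is
        # enough to prove that it is expressible on the logical schema.
        try:
            connection.execute("EXPLAIN " + sql)
            return True
        except sqlite3.OperationalError as error:
            print("not supported by the logical schema:", error)
            return False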
In the scope of usability testing, some simple metrics based on the number of fact tables and dimension tables in a logical schema are proposed in [17]; these metrics can be adopted to effectively capture schema understandability.

Finally, a performance test can be carried out on the logical schema by checking to what extent it is compliant with the multidimensional normal forms, which ensure summarizability and support an efficient database design [11, 10].

Besides the above-mentioned metrics, focused on usability and performance, the literature proposes other metrics for database schemata and relates them to abstract quality factors. For instance, in the scope of maintainability, [15] introduces a set of metrics aimed at evaluating the quality of a data mart logical schema with respect to its ability to sustain changes during an evolution process.

4.3 Testing the ETL Procedures

ETL testing is probably the most complex and critical testing phase, because it directly affects the quality of data. Since ETL is heavily code-based, most standard techniques for generic software system testing can be reused here.

A functional test of ETL is aimed at checking that the ETL procedures correctly extract, clean, transform, and load data into the data mart. The best approach here is to set up unit tests and integration tests. Unit tests are white-box tests that each developer carries out on the units (s)he developed. They allow the testing complexity to be broken down, and they also enable more detailed reports on the project progress to be produced. Units for ETL testing can be either vertical (one test unit for each conformed dimension, plus one test unit for each group of correlated facts) or horizontal (separate tests for static extraction, incremental extraction, cleaning, transformation, static loading, incremental loading, and view update); the most effective choice mainly depends on the number of facts in the data mart, on how complex cleaning and transformation are, and on how the implementation plan was allotted to the developers. In particular, crucial aspects to be considered during the loading test concern dimension tables (correctness of roll-up functions, effective management of dynamic and irregular hierarchies), fact tables (effective management of late updates), and materialized views (use of correct aggregation functions).
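As an example, a unit test for a cleaning and transformation unit could look as follows. The cleaning rules and names are invented for illustration, and any unit-testing framework (here, pytest-style plain asserts) would do:

    # Unit test for a hypothetical cleaning routine that normalizes dates
    # and rejects records carrying unknown product codes.
    import datetime

    def clean_row(row, valid_products):
        if row["product"] not in valid_products:
            return None  # rejected: routed to the error flow
        return {"product": row["product"],
                "date": datetime.datetime.strptime(row["date"], "%Y%m%d").date(),
                "quantity": int(row["quantity"])}

    def test_clean_row_accepts_correct_data():
        row = {"product": "P01", "date": "20091106", "quantity": "3"}
        assert clean_row(row, {"P01"}) == {"product": "P01",
                                           "date": datetime.date(2009, 11, 6),
                                           "quantity": 3}

    def test_clean_row_rejects_unknown_products():
        row = {"product": "XXX", "date": "20091106", "quantity": "3"}
        assert clean_row(row, {"P01"}) is None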
After the unit tests have been completed, an integration test allows the correctness of the data flows in the ETL procedures to be checked. Different quality dimensions, such as data coherence (the respect of integrity constraints), completeness (the percentage of data found), and freshness (the age of data), should be considered. Some metrics for quantifying these quality dimensions have been proposed in [22].

During requirement analysis the designer, together with users and database administrators, should have singled out and ranked by gravity the most common causes of faulty data, so as to plan a proper strategy for dealing with ETL errors. Common strategies for dealing with errors of a given kind are “automatically clean faulty data”, “reject faulty data”, “hand faulty data to the data mart administrator”, and so on. Not surprisingly, then, a distinctive feature of ETL functional testing is that it should be carried out with at least three different databases, including respectively (i) correct and complete data, (ii) data simulating the presence of faulty data of different kinds, and (iii) real data. In particular, tests using dirty simulated data are sometimes called forced-error tests: they are designed to force the ETL procedures into error conditions, so as to verify that the system can deal with faulty data as planned during requirement analysis.
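A forced-error test might then be sketched as follows; the faulty rows and the etl_run interface are hypothetical stand-ins for the project's actual ETL flow and error-handling strategy:

    # Forced-error test: feed the ETL flow a dataset containing one faulty
    # record per ranked error type and check that each fault is handled
    # with the strategy planned during requirement analysis.
    FAULTY_ROWS = [
        {"product": None, "date": "20091106", "quantity": "3"},    # missing key
        {"product": "P01", "date": "06/11/09", "quantity": "3"},   # bad date format
        {"product": "P01", "date": "20091106", "quantity": "-2"},  # negative measure
    ]

    def forced_error_test(etl_run):
        outcome = etl_run(FAULTY_ROWS)
        # no faulty row may silently reach the data mart ...
        assert outcome.loaded == []
        # ... and each one must be either rejected or quarantined for the
        # data mart administrator
        assert len(outcome.rejected) + len(outcome.quarantined) == len(FAULTY_ROWS)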
Performance and stress tests are complementary in assessing the efficiency of ETL procedures. Performance tests evaluate the behavior of ETL with reference to a routine workload, i.e., when a domain-typical amount of data has to be extracted and loaded; in particular, they check that the processing time is compatible with the time frames expected for the data-staging process. Stress tests, on the other hand, simulate an extraordinary workload due to a significantly larger amount of data.

The recovery test of ETL checks for robustness by simulating faults in one or more components and evaluating the system response. For example, you can cut off the power supply while an ETL process is in progress, or set a database offline while an OLAP session is in progress, to check the effectiveness of the restore policies.

As concerns ETL security, there is a need for verifying that the database used to temporarily store the data being processed (the so-called data staging area) cannot be violated, and that the network infrastructure hosting the data flows connecting the data sources and the data mart is secure.

4.4 Testing the Database

We assume that the quality of the logical schema has already been verified during the logical schema tests, and that all issues related to data quality are in charge of the ETL tests. Database testing is then mainly aimed at checking the database performance using either standard (performance test) or heavy (stress test) workloads. Like for ETL, the size of the tested databases and their data distribution must be discussed with the designers and the database administrator. Performance tests can be carried out either on a database including real data or on a mock database, but the database size should be compatible with the average expected data volume. Stress tests, on the other hand, are typically carried out on mock databases whose size is significantly larger than what is expected. Standard database metrics, such as the maximum query response time, can be used to quantify the test results. To advance these testing activities as much as possible, and to make their results independent of front-end applications, we suggest using SQL to code the workload.
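A sketch of such a SQL-coded performance test follows; the acceptance threshold and the connection interface are illustrative (any DB-API-style connection would work):

    # Database performance test: replay a SQL-coded workload and flag the
    # queries whose response time exceeds the agreed acceptance threshold.
    import time

    def db_performance_test(connection, workload_sql, max_seconds=10.0):
        slow = []
        for sql in workload_sql:
            start = time.perf_counter()
            connection.execute(sql).fetchall()
            elapsed = time.perf_counter() - start
            if elapsed > max_seconds:
                slow.append((sql, elapsed))
        return slow  # empty when the performance test passes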
Recovery tests enable testers to verify the DBMS behavior after critical errors such as power leaks during update, network faults, and hard disk failures. Finally, security tests mainly concern the possible adoption of cryptography techniques to protect data, and the correct definition of user profiles and database access grants.

4.5 Testing the Front-End

Functional testing of the analysis front-ends must necessarily involve a very large number of end-users, who generally are so familiar with their application domains that they can detect even the slightest abnormality in the data. Nevertheless, wrong results in OLAP analyses may be difficult to recognize. They can be caused not only by faulty ETL procedures, but also by incorrect data aggregations or selections in the front-end tools. Some errors are not due to the data mart at all; they result instead from the overly poor data quality of the source database. To allow such situations to be recognized, a common approach to front-end functional testing in real projects consists in comparing the results of OLAP analyses with those obtained by directly querying the source databases. Of course, though this approach can be effective on a sample basis, it cannot be adopted extensively due to the huge number of possible aggregations that characterize multidimensional queries.
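A sample-based comparison could be sketched as follows; the result representation is our own assumption, with both sides reduced to mappings from a single group-by coordinate to an aggregate value:

    # Compare an aggregate obtained through the OLAP front-end with the
    # same aggregate computed directly on the source database; source_sql
    # is assumed to return (coordinate, value) pairs.
    def compare_with_source(olap_result, source_connection, source_sql,
                            tolerance=1e-6):
        expected = dict(source_connection.execute(source_sql).fetchall())
        mismatches = {}
        for key, value in olap_result.items():
            if key not in expected or abs(value - expected[key]) > tolerance:
                mismatches[key] = (value, expected.get(key))
        return mismatches  # empty when the sample check passes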
A significant sample of queries to be tested can be selected in mainly two ways. In the “black-box way”, the workload specification output by the workload refinement phase (typically, a use case diagram where actors stand for user profiles and use cases represent the most frequent analysis queries) is used to determine the test cases, much like use case diagrams are profitably employed for testing-in-the-large in generic software systems. In the “white-box way”, instead, the subset of data aggregations to be tested is determined by applying proper coverage criteria to the multidimensional lattice of each fact (the lattice whose nodes and arcs correspond, respectively, to the group-by sets supported by that fact and to the roll-up relationships relating those group-by sets), much like decision, statement, and path coverage criteria are applied to the control graph of a generic software procedure during testing-in-the-small.
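A white-box sampling sketch follows; for simplicity the lattice is generated from a flat set of attributes, whereas a real lattice would be pruned by the roll-up relationships inside each hierarchy:

    # Enumerate the group-by sets of a fact and pick a minimal sample that
    # touches each attribute at least once (cf. the front-end unit test
    # criterion of Table 2).
    from itertools import combinations

    def lattice_groupbys(attributes):
        return [set(c) for r in range(len(attributes) + 1)
                for c in combinations(sorted(attributes), r)]

    def covering_sample(attributes):
        return [{a} for a in sorted(attributes)]

    print(len(lattice_groupbys({"date", "product", "store"})))  # 8 group-by sets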
Also in front-end testing it may be useful to distinguish between unit and integration tests. While unit tests should be aimed at checking the correctness of the reports involving single facts, integration tests entail analysis sessions that either correlate multiple facts (the so-called drill-across queries) or take advantage of different application components. This is the case, for example, when dashboard data are “drilled through” an OLAP application.

An integral part of front-end tests are usability tests, which check that OLAP reports are suitably represented and commented to avoid any misunderstanding about the real meaning of the data.

A performance test submits a group of concurrent queries to the front-end and checks the time needed to process those queries. Performance tests imply a preliminary specification of the standard workload in terms of number of concurrent users, types of queries, and data volume. Stress tests, on the other hand, entail applying progressively heavier workloads than the standard ones to evaluate the system stability. Note that performance and stress tests must be focused on the front-end, which includes the client interface but also the reporting and OLAP server-side engines. This means that the access times due to the DBMS should be subtracted from the overall response times measured.
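A minimal harness for submitting such a concurrent workload might look like this, where run_query is a hypothetical callable executing one front-end query (DBMS-side time would be subtracted separately, as noted above):

    # Front-end performance test: submit a group of concurrent queries
    # and measure the overall processing time.
    import concurrent.futures
    import time

    def concurrent_workload(run_query, queries, users=10):
        start = time.perf_counter()
        with concurrent.futures.ThreadPoolExecutor(max_workers=users) as pool:
            list(pool.map(run_query, queries))
        return time.perf_counter() - start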
Finally, in the scope of security testing, it is particularly important to check that user profiles are properly set up. You should also check that single-sign-on policies are properly set up after switching between different analysis applications.

4.6 Regression Tests

A very relevant problem for frequently updated systems, such as data marts, concerns checking that new components and new add-on features are compatible with the operation of the whole system. In this case, the term regression test is used to denote the testing activities carried out to make sure that any change applied to the system does not jeopardize the quality of the preexisting, already tested features, and does not corrupt the system performance.

Testing the whole system many times from scratch has huge costs. Three main directions can be followed to reduce these costs:
  • 25. • Impact analysis can be used to significantly restrict the scope of testing. In general, impact analysis is aimed at determining what other application objects are affected by a change in a single application ob- ject [9]. Remarkably, some ETL tool vendors already provide some impact analysis functionalities. An ap- proach to impact analysis for changes in the source data schemata is proposed in [14]. 5. TEST COVERAGE Testing can reduce the probability of a system fault but cannot set it to zero, so measuring the coverage of tests is necessary to assess the overall system reliability. Measuring test coverage requires first of all the definition of a suitable coverage criterion. Different coverage criteria, such as state- ment coverage, decision coverage, and path coverage, were devised in the scope of code testing. The choice of one or another criterion deeply affects the test length and cost, as well as the achievable coverage. So, coverage criteria are chosen by trading off test effectiveness and efficiency. Ex- amples of coverage criteria that we propose for some of the testing activities described above are reported in Table 2. 22 Requirement analysis Analysis and reconciliation
[Figure 2: UML activity diagram for design and testing. It arranges the testing activities (conceptual schema, logical schema, ETL, database, and front-end tests) along the design phases, from requirement analysis to ETL and front-end implementation, and assigns them to the roles involved: analyst/designer, tester, developer, DBA, and end-user.]
6. A TIMELINE FOR TESTING

From a methodological point of view, the three main phases of testing are [13]:

• Create a test plan. The test plan describes the tests that must be performed and their expected coverage of the system requirements.

• Prepare test cases. Test cases enable the implementation of the test plan by detailing the testing steps together with their expected results. The reference databases for testing should be prepared during this phase, and a wide, comprehensive set of representative workloads should be defined.

• Execute tests. A test execution log tracks each test along with its results.

Figure 2 shows a UML activity diagram that frames the different testing activities within the design methodology outlined in Section 3. This diagram can be used in a project as a starting point for preparing the test plan.

It is worth mentioning here that test automation plays a basic role in reducing the costs of testing activities (especially regression tests) on the one hand, and in increasing test coverage on the other [4]. Remarkably, commercial tools (such as QACenter by Compuware) can be used for implementation-related testing activities, to simulate specific workloads and analysis sessions or to measure a process outcome.
As to design-related testing activities, the metrics proposed in the previous sections can be measured by writing ad hoc procedures that access the meta-data repository and the DBMS catalog.
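Such a procedure could be as simple as the following sketch, which derives the counts needed by the logical-schema usability metrics of [17] from the catalog; the naming convention for fact and dimension tables is an assumption of ours:

    # Ad hoc metric procedure: count fact and dimension tables by reading
    # the DBMS catalog (SQLite's sqlite_master is used for illustration).
    def schema_table_counts(connection):
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
        names = [r[0] for r in rows]
        facts = [n for n in names if n.endswith("_fact")]
        dimensions = [n for n in names if n.endswith("_dim")]
        return len(facts), len(dimensions)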
  • 30. • The chance to perform an effective test depends on the documentation completeness and accuracy in terms of collected requirements and project description. In other words, if you did not specify what you want from your system at the beginning, you cannot expect to get it right later. • The test phase is part of the data warehouse life-cycle, and it acts in synergy with design. For this reason, the test phase should be planned and arranged at the be- ginning of the project, when you can specify the goals of testing, which types of tests must be performed, which data sets need to be tested, and which quality level is expected. • Testing is not a one-man activity. The testing team should include testers, developers, designers, database administrators, and end-users, and it should be set up during the project planning phase. • Testing of data warehouse systems is largely based on data. A successful testing must rely on real data, but it also must include mock data to reproduce the most common error situations that can be encountered in ETL. Accurately preparing the right data sets is one of the most critical activities to be carried out during test planning. • No matter how deeply the system has been tested: it is almost sure that, sooner or later, an unexpected data fault, that cannot be properly handled by ETL, will oc- cur. So keep in mind that, while testing must come to an end someday, data quality certification is an ever lasting process. The borderline between testing and certification clearly depends on how precisely require-
8. REFERENCES

[1] A. Bonifati, F. Cattaneo, S. Ceri, A. Fuggetta, and S. Paraboschi. Designing data marts for data warehouses. ACM Transactions on Software Engineering and Methodology, 10(4):452–483, 2001.
[2] K. Brahmkshatriya. Data warehouse testing. http://www.stickyminds.com, 2007.
[3] R. Bruckner, B. List, and J. Schiefer. Developing requirements for data warehouse systems with use cases. In Proc. Americas Conf. on Information Systems, pages 329–335, 2001.
[4] R. Cooper and S. Arbuckle. How to thoroughly test a data warehouse. In Proc. STAREAST, Orlando, 2002.
[5] P. Giorgini, S. Rizzi, and M. Garzetti. GRAnD: A goal-oriented approach to requirement analysis in data warehouses. Decision Support Systems, 45(1):4–21, 2008.
[6] M. Golfarelli, D. Maio, and S. Rizzi. The dimensional fact model: A conceptual model for data warehouses. International Journal of Cooperative Information Systems, 7(2-3):215–247, 1998.
[7] M. Golfarelli and S. Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009.
[8] D. Haertzen. Testing the data warehouse. http://www.infogoal.com, 2009.
[9] R. Kimball and J. Caserta. The Data Warehouse ETL Toolkit. John Wiley & Sons, 2004.
[10] J. Lechtenbörger and G. Vossen. Multidimensional normal forms for data warehouse design. Information Systems, 28(5):415–434, 2003.
[11] W. Lehner, J. Albrecht, and H. Wedekind. Normal forms for multidimensional databases. In Proc. SSDBM, pages 63–72, Capri, Italy, 1998.
[12] J. McCall, P. Richards, and G. Walters. Factors in software quality. Technical Report AD-A049-014, 015, 055, NTIS, 1977.
[13] A. Mookerjea and P. Malisetty. Best practices in data warehouse testing. In Proc. Test, New Delhi, 2008.
[14] G. Papastefanatos, P. Vassiliadis, A. Simitsis, and Y. Vassiliou. What-if analysis for data warehouse evolution. In Proc. DaWaK, pages 23–33, Regensburg, Germany, 2007.
[15] G. Papastefanatos, P. Vassiliadis, A. Simitsis, and Y. Vassiliou. Design metrics for data warehouse evolution. In Proc. ER, pages 440–454, 2008.
[16] R. Pressman. Software Engineering: A Practitioner's Approach. McGraw-Hill, 2005.
[17] M. Serrano, C. Calero, and M. Piattini. Experimental validation of multidimensional data models metrics. In Proc. HICSS, page 327, 2003.
[18] M. Serrano, J. Trujillo, C. Calero, and M. Piattini. Metrics for data warehouse conceptual models understandability. Information & Software Technology, 49(8):851–870, 2007.
[19] I. Sommerville. Software Engineering. Pearson Education, 2004.
[20] P. Tanuška, W. Verschelde, and M. Kopček. The proposal of data warehouse test scenario. In Proc. ECUMICT, Gent, Belgium, 2008.
[21] A. van Bergenhenegouwen. Data warehouse testing. http://www.ti.kviv.be, 2008.
[22] P. Vassiliadis, M. Bouzeghoub, and C. Quix. Towards quality-oriented data warehouse usage and evolution. In Proc. CAiSE, Heidelberg, Germany, 1999.
[23] Vv. Aa. Data warehouse testing and implementation. In Intelligent Enterprise Encyclopedia. BiPM Institute, 2009.