The document discusses three papers related to data warehouse design.
Paper 1 presents the X-META methodology, which addresses developing a first data warehouse project and integrates metadata creation and management into the development process. It proposes starting with a pilot project and defines three iteration types.
Paper 2 proposes extending the ER conceptual data model to allow modeling of multi-dimensional aggregated entities. It includes entity types for basic dimensions, simple aggregations, and multi-dimensional aggregated entities.
Paper 3 presents a comprehensive UML-based method for designing all phases of a data warehouse, from source data to implementation. It defines four schemas - operational, conceptual, storage, and business - and the mappings between them. It also provides steps
3. Context
In this work ,I’ll present the Data wherehouse
concept, Different methods of modeling and
advantages of each proposed methods.
For this, I worked on 3 papers talking about The
X-META (a methodology for data warehouse
design with metadata management) & gave
conceptual data models for dw design.
4. Introduction
In computing, a data warehouse (DW or DWH), also known as
an enterprise data warehouse (EDW), is a system used
for reporting and data analysis.
Date Warehousing is a powerful tool for supporting decision-
making processes in modern corporations. However, developing a
Data Warehouse (DW) is a complex and costly activity. It requires
strategies, which should be specific to the characteristics and
needs of the organization where it will be introduced.
5. Problematic
How to design a Data
wharehouse ?
What are the method
studied ?
Which are the steps to
design a DW?
6. Papers
A DataWarehouse Conceptual Data Model : Enrico
Franconi and Anand Kamble ( Faculty of Computer Science, Free
University of Bozen-Bolzano, Italy )
X-META: A Methodology for Data Warehouse Design
with Metadata Management : Liane Carneiro - Angelo
Brayner ( University of Fortaleza – UNIFOR Fortaleza - Ceará,
Brazil )
A comprehensive Method for Data warehouse design :
Sergio Lujan - Mora and Juan Trujillo ( Departement of Software
and Computing Systems University of Alicante Spain )
7. Paper 1 : X-META
Abstract:
the contributions of this paper are
twofold. First of all, it proposes a
methodology, which addresses the
development of DW projects.
Second, it presents a strategy to
cope with metadata management
integrated to the DW
development process.
Related Works:
the existent methodologies are
normally addressed to generic
environments. Thus, they do
not address DW development in
organizations that need a
methodology adapted to their
characteristics and
expectations
8. Metadata Creation & Management
Metadata is “Data that provides information about
other data". Two types of metadata exist:
structural metadata & descriptive metadata.
Structural metadata is data about the containers of
data.
Descriptive metadata uses individual instances of
application data or the data content.
9. Metadata Creation & Management
Metadata should exist throughout the development
process and during all the useful life of the DW.
Therefore, metadata, in a DW project, play a key role.
It is a big challenge to many DW projects, because
there exists much heterogeneity among tools and
products for creating and managing metadata in a
Data Warehounsing environment.
10. X-META: A METHODOLOGY FOR DW
DESIGN
The key idea of the proposed
methodology is to start with a
DW pilot project in order to
introduce experience in
constructing a DW into the
internal team and to prove the
viability and the DW
importance to an organization.
The overall methodology
lifecycle is divided into 5 major
phases, as depicted in figure 1.
11. The X-Meta methodology defines three distinct iteration types,
each one with its own specific purpose:
(3) DM/DW Project – In this
level, many pilot projects may
have already been developed
and many uncertainties have
already been eliminated.
Besides, the development
process in this iteration type,
which corresponds to DM/DW
projects, uses the same
methodology used in the pilot
project iteration type.
(2) Pilot Project – it allows the
incremental and evolutionary
development of pilot projects, each
of which with its own purpose, for
instance: testing products,
acquiring experience, developing
metadata repositories, etc. Each
pilot project represents an
iteration, which begins in the
Planning phase and uses one or
more input from the DM/DW
Construction Cycle phase,
depending on the evolutionary
level and on the project goal.
(1) First prototype – it has
the main goal of allowing
the insertion of the DW
technology in the
organization. This iteration,
which is executed only once
in the DW development
process, uses only the
Introduction phase in the
development of the first pilot
project in the organization;
X-META: A METHODOLOGY FOR DW
DESIGN
12. A presented example of possible projects to be developed in an organization using
the methodology presented in this paper. Each spiral represents a project and each
project uses a specific iteration type.
X-META: A METHODOLOGY FOR DW
DESIGN
13. Metadata modeling
It has the goal of defining and controlling the activities related to
the metadata, supplying corporative directions for its creation, use
and maintenance.
there is a list of activities to be performed in this module. Those
activities have the goal of producing a metadata model that will be
used during the construction of a metadata repository and during
the metadata management phase:
Define the metadata management team and its responsibilities;
Define and classify metadata types that will be stored;
Define user types a repository and the equivalent access levels;
Determine the metadata sources in the organization
Define and construct the metadata model and metadata flow.
14. Metadata infrastructure
The metadata repository is an infrastructure component of the DW
environment that is useful to all other environment components,
working as a tool to help in the integration. The goal of this module is to
define the general metadata infrastructure in the organization.
The activities belonging to this module are:
Construction of an architecture for the metadata (centralized,
decentralized, distributed);
Evaluation of metadata tools in order to select the best one to achieve
the project goals.
Physical implementation of the metadata repository;
Defining security procedures.
15. Abstract :
The goal of the work is to extend the standard ER conceptual data
model, as defined in the database textbooks, with constructs which
allow the modelling of multi-dimensional aggregated entities together
with their interrelationships with the other parts of the conceptual
schema.
This work is also based on a similar preliminary work done by one
author on the use of Description Logics as a mean to give precise
semantics to a data warehouse conceptual data model and to study its
computational properties.
Paper 2 : A DataWarehouse Conceptual
Data Model
17. Explication
the entity Calls represents a basic cube whose dimensions are Date,
Destination, Source, which are restricted to the basic levels Day, Point, and
again Point, respectively.
A first extension of the language can be seen with the simple aggregated
entities, non-dimensional aggregations—Weekday and Customertype, which
represent dimensional levels built from the basic dimensional entities Day
and Point, respectively.
A simple aggregations aggregate the collections of objects that are in the
extension of the aggregated entities. So, in our example, since the entities
Mon, . . ., Sun form a partition of the entity Day, the Weekday entity denotes
exactly seven objects, one for all the Mondays, one for all the Tuesdays,etc.
18. On the other hand, the aggregated entity Customertype denotes
exactly two objects, consumer and business. In this way, by
interleaving partitioning and simple aggregations, we are able to
construct level hierarchies starting from some basic dimensional
level.
A second extension of the language is the multidimensional
aggregated entity, by the entity Calls-by-Weekday-and-
Customertype. This entity denotes all the cells of a cube whose
coordinates are the weekdays of the date of the calls, and the
customer types of the originators of the calls.
A multi-dimensional aggregated entity is an entity itself in the ER
diagram, and it can have attributes and can be part of further
relationships or constraints.
Explication
19. Paper 3 : A Comprehensive Method For
DW Design
Abstract:
Various methods and appraches
have been presented for designing
different parts of DW,but, no
general and standard method exists
to date for dealing with the whole
design of a DW .In this paper we
fill this gap by presenting a method
based on UML allows the user to
tackle all DW design phases and
steps from the the operational data
source to the final implementation
and including the definition of ETL
Processes .
Related Works:
Different case of studies of data
marts (DM) are presented.The MD
modeling is based in the use of the
schema and it’s different varaitions.
Most recently , a method for the
DW is proposed which basn a MD
model called IDEA and it proposes
a set of steps to adress the
conceptual , logical and physical
design of a DW.
20. Overview of the method
We have adopted the OO paradigm because it is
semantically richer that others and it offers
numerous advantages.
The design of a DW is a joint effort of DW
developpers ( Technical user ) and final
user ( the user who are only interested in
the business content ).
Therefore, a powerfull method with the correspond
models is needed and we believe the OO paradigm
is the best approach for the DW design .
21. Design for a data wharehouse
Overview of the method
22. Design for a data wharehouse
The architecture of DW is depicted as various layers ( every layer is derived
from the previous layer ) .
Following this consideration , we consider that the developpement of a
DW can be structered into an integrated model with four different shemas :
o Operational Data Shemas (ODS) : Defines the structure of the operational
and external data sources.
o DW Conceptual Schema (DWCS) : Defines the conceptual shema of DW .
o DW Storage Shema (DWSS) : Defines the physical storage of the DW
depending on the target platform .
o Business model (BM) : it defines the different ways or view of accessing the
DW from final user’ point of view .
Overview of the method
23. Two shcema mapping are also needed in ordre to obtain a global and
integrated DW design approach that covers the necessary shcemas:
ETL Process : Defines the mapping between the ODS & DWCS.
Exportation Process : Defines the mapping between the DWCS &
DWSS.
Design for a data wharehouse
Overview of the method
This method accomplish each one of the shcemas and mapping in an
integrated manner : We use a modeling notion based on the UML.
24. Data warehouse conceptual schema (DWCS) : The most important feature of
the MD paradigm is dividing data into facts and dimensions, to provide data
on a suitable level of granuliraty, hierarchies are defined on the dimensions.
Overview of the method
Star
Schema1
Star
Schema2
Dimension1 Dimension2
Fact 1
Dimension2
Dimension2
Level 1
Dimension2
Level 3
Dimension 2
Level 2
The UML Profile includes the use of the UML package , in this way , when
modeling complex and large DW systems, we are not restricted to use flat
UML class diagrams ans , therefore, cluttered diagrams are avoided.
Design for a data wharehouse
25. LeveL 1: Model Definition . A package represent a star schema of a
conceptual MD level. A depency between two package indicates that the
star schemas share at least one dimension.
LeveL 2: Star Schemas Definition . A package represents a fact or a
dimension of a star schema. A depency between two dimension package
indicates that the package share at least one level of a dimension
hierarchy.
LeveL 3: Dimension/fact definition. A package from the second level is
exploded into a set of classes that represent the hierarchy levels in a
dimension package ,or the whole star schema in the case of the fact
package.
Overview of the method
Design for a data wharehouse
26. How to apply the Method
We propose a set of steps to guide
the design of a DW following our
approach .
The UML diagram is devided into
two swimlanes depending ion who
leads the activities : DW final
user ( final users guide the work
of DW designers and
administrators ) and DW
designers & administrators
( they don’t need the help of final
users because all the needed
informations has been previously
gathered ).
Activities where the
schemas are created
Activities where the
mapping between
schemas are created
27. Different substeps can be achieved during requirements gathering ;
specifically, the designer has to :
1) Analysis :
*Determine the
desired data
format,level &
elements users wish
*Classify different
summaries.
*Define acces
control & security
rules.
*Help the final user
to understand wht
they do not know
they need whereas
others are less
common .
How to apply the Method
28. DWCS : two “extreme” strategies can be adopted in this activity : top-down
(definition of DWCS based on the final user requirements) , bottom-down (
definition of DWCS based on the available data sources ).
We suggest to adopt a combined solution : the DW is designed from the
final user’s requirements, but bearing in mind the available data source.
ETL process : it’s defined as a mapping between the data source and the
DWCS.
this activity and the previous one define a cycle, because during the
definition of the ELT process some errors in the DWCS can be detected and ,
therefore, the DWCS may be modified.
Define DMs (BM): different models in the BM are defined from the final
user’s initiale requirements and the DWCS, the BM can be implemented as
real or virtual DMs.
2) Design :
How to apply the Method
29. Define Storage (DWSS) : the target platform is selected ( relational,
OO..) and the corresponding logical schema (DWSS) is defined ; the
query performance can be improved by simplifying the data schema(
so that it only contains the essential data ) or by the definition of
summaries ( aggregates ) based on the final user’s requirement .
Define exportation process : the mapping between DWCS and DWSS
are manually or automatically defined .
Implement Reports & queries : requested by the final users and
implemented in the query tool used.
3) Implementation :
How to apply the Method
30. Validate DW : the solution obtained ( the DW built ) is checked against the
existing problem (final user requirements ) . If any discrepancy exists, some
corrective actions can be taken and the process can return to one of the
previous activities.
4) Test :
How to apply the Method
31. No Pratique
application yet
Based on Uml
modeling
A tutorial is
available
online.
Uses multiple
extended EER or
UML diagrams
Together.
The proposed
methodology is
being used to
construct a DW
for a department
of the Fortaleza
City Hall.
New concept
Compariason
32. This work present a
global data warehouse
design method that is
based on UML for
designing the different
DW schemas and the
corresponding
transformation in an
integrated manner .
The goal of the work is to
extend the standard ER
conceptual data model,
as defined in the
database textbooks, with
constructs which allow
the modelling of multi-
dimensional aggregated
entities together with
their interrelationships
with the other parts of
the conceptual Schema.
X-Meta addresses the
problem of developing a
first DW project in
corporations, which do
not have staff with
practical experience in
such development.
Furthermore, it
integrates metadata
creation and
management to the DW
developing process.
Discussion
Complicated
Notions Brief explanation
with many
missing parts
Complete and
Clear work
33. Data warehousing is a business analyst's dream — all the
informations about the organization's activities gathered in one
place, open to a single set of analytical tools. But how do you
make the dream a reality?
You have to plan your data warehouse system. You must
understand what questions users will ask it because the purpose
of a data warehouse system is to provide decision-makers the
accurate, timely information they need to make the right choices.
Conclusion