A Comprehensive Method for Data Warehouse Design

Published in: 5th International Workshop on Design and Management of Data Warehouses (DMDW'03), p. 1.1-1.14, Berlin (Germany), September 8 2003.

Download: http://gplsi.dlsi.ua.es/almacenes/ver.php?pdf=48

  • Good morning, everybody. My name is Sergio Luján-Mora.
    The work I am going to present, which I developed with my colleague Juan Trujillo, is entitled “A Comprehensive Method for Data Warehouse Design”. This work was carried out in the Department of Software and Computing Systems at the University of Alicante, Spain.
  • I have divided my presentation into six main points.
    Firstly, I will start with the motivation of our work.
    Then, in the second section, I will provide a short background on the UML extension mechanisms.
    Next, I will show the different schemas that we have defined in our data warehouse design approach.
    And then I will propose a set of steps that help the user to apply our method.
    Finally, I will end my presentation with the main conclusions and future work.
    Let us start with the first part of the presentation.
  • Data warehouses are complex information systems.
    Nowadays, data warehouses are a key component of information systems because they provide support to OLAP applications, data mining, decision support systems, and so on.
    It is well known that building a data warehouse is time-consuming, expensive and prone to failure. Many studies have examined how data warehouses are built and the problems involved.
    Therefore, modeling can be crucial to the successful building of a data warehouse.
  • During the last few years, different approaches for modeling data warehouses have appeared. However, they are partial approaches because each one addresses only some parts of a data warehouse. For example, …
    On the other hand, some data warehouse methods have been proposed, but they don’t include a general model for the different design steps of a data warehouse.
  • Therefore, we have been working on the development of “A Comprehensive Method for Data Warehouse Design”.
    Several principles have driven the design of our approach. First, instead of defining our own graphical notation, we use the UML, a standard visual modeling language. We say that our approach is comprehensive because it includes the main phases of data warehouse design. Moreover, the design of a data warehouse is a joint effort between DW developers and final users; therefore, a method that is powerful but also easy to understand is needed. Finally, we provide a method as a starting point, not as a rigid template. Therefore, it is not a software development process that defines the who, what, when and how of developing software.
  • Before continuing, I am going to provide a short background on the UML extension mechanisms.
  • The UML is a general purpose visual modeling language for systems.
    The designers of UML realized that it was simply not possible to design a completely universal modeling language that would satisfy everyone’s present and future needs, so UML incorporates three simple extensibility mechanisms.
    Stereotypes…, Tagged values…, Constraints…
  • The main UML extension mechanism is the stereotype.
    In a UML diagram, there are four possible representations of a stereotyped element: icon (the stereotype icon is displayed instead of the normal representation of the element), decoration (the stereotype decoration is displayed inside the element), label (the stereotype name is displayed and appears inside guillemots), and none (the stereotype is not indicated).
  • Now, I will introduce the different schemas that are part of our proposal.
  • We consider that the development of a data warehouse can be structured into an integrated model with four different schemas (ODS, DWCS, DWSS, BM) and two schema mappings (ETL Process and Exportation Process).
    Let’s discuss each of the schemas in greater detail.
  • I am going to use a motivating example throughout the presentation. This is the general diagram, level 0 of the example.
    Each one of the schemas and mappings is represented as a stereotyped UML package. We have defined 6 stereotypes for this level: ODS, DWCS, DWSS, BM, ETL, and Exportation.
  • The ODS reflects the structure of the operational data sources and external sources.
    Currently, there is no accepted UML extension for modeling different types of data sources. Therefore, we have to use different UML extensions to model the ODS, depending on the type of source.
  • For example, if the data source is a relational database…
    However, if the data source is an object-relational database…
    And if the data source is an XML document, we use…
  • We use UML packages to divide the design process into three levels. In this way, we avoid flat diagrams.
  • Our UML profile includes the definition of different stereotypes for package, class and attribute. The most important stereotypes are…
  • The DWSS defines the storage of the data warehouse depending on the target platform.
  • We have defined a small yet powerful set of ETL mechanisms. We deliberately kept the number of mechanisms small in order to reduce the complexity of our proposal.
  • Providing a graphical notation is not enough to define a method; a method must also specify how to use that notation properly. Therefore, we propose a set of steps to guide the design of a data warehouse following our approach.
  • Moreover, thanks to the use of UML packages, we avoid flat diagrams, and our method can scale up to handle huge and complex DWs.
  • We also plan to incorporate more stages of the DW life cycle into our method, such as the design of the refresh processes.
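
The level-0 structure described in these notes, four schemas (ODS, DWCS, DWSS, BM) connected by two schema mappings (ETL Process and Exportation Process), each drawn as a stereotyped UML package, can be captured as plain data and checked mechanically. The following is a minimal Python sketch under stated assumptions, not the authors' tooling: the `Package` class and the `check_mappings` helper are hypothetical, and the rule that the ETL Process maps ODS to DWCS while the Exportation Process maps DWCS to DWSS is taken from the slides.

```python
from dataclasses import dataclass

# Level-0 package stereotypes named in the talk.
SCHEMA_STEREOTYPES = {"ODS", "DWCS", "DWSS", "BM"}
MAPPING_STEREOTYPES = {"ETL", "Exportation"}

# Source and target schema of each mapping, as described in the slides:
# the ETL Process maps ODS -> DWCS, the Exportation Process maps DWCS -> DWSS.
MAPPING_ENDPOINTS = {"ETL": ("ODS", "DWCS"), "Exportation": ("DWCS", "DWSS")}


@dataclass(frozen=True)
class Package:
    """A stereotyped UML package in the level-0 diagram (hypothetical model)."""
    stereotype: str
    name: str

    def __post_init__(self):
        if self.stereotype not in SCHEMA_STEREOTYPES | MAPPING_STEREOTYPES:
            raise ValueError(f"unknown stereotype: {self.stereotype}")

    def label(self):
        # "Label" notation: the stereotype name inside guillemots.
        return f"\u00ab{self.stereotype}\u00bb {self.name}"


def check_mappings(links):
    """Return a problem description for each (mapping, source, target) that breaks the rules."""
    problems = []
    for mapping, source, target in links:
        if mapping.stereotype not in MAPPING_STEREOTYPES:
            problems.append(f"{mapping.label()} is not a mapping package")
            continue
        expected = MAPPING_ENDPOINTS[mapping.stereotype]
        actual = (source.stereotype, target.stereotype)
        if actual != expected:
            problems.append(f"{mapping.label()} maps {actual[0]} -> {actual[1]}, "
                            f"expected {expected[0]} -> {expected[1]}")
    return problems


# Packages from the level-0 example (slide 12).
sales = Package("ODS", "Sales data")
dw = Package("DWCS", "Data warehouse")
metacube = Package("DWSS", "Informix Metacube")
etl = Package("ETL", "Transformations")
export = Package("Exportation", "Mappings")

print(check_mappings([(etl, sales, dw), (export, dw, metacube)]))  # []
print(check_mappings([(etl, dw, metacube)]))  # one problem reported
```

Keeping the endpoint rules in a table (`MAPPING_ENDPOINTS`) rather than in code makes the sketch easy to adapt if a project adds further mappings.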

    1. Department of Software and Computing Systems
       A Comprehensive Method for Data Warehouse Design
       Sergio Luján-Mora, Juan Trujillo (sergio.lujan@ua.es / @sergiolujanmora)
    2. Department of Software and Computing Systems
       A Comprehensive Method for Data Warehouse Design
       Sergio Luján-Mora, Juan Trujillo, DMDW 2003
    3. Contents
       • Motivation
       • UML extension mechanisms
       • DW modeling schemas
       • Applying modeling schemas
       • Conclusions
       • Future Work
    4. Motivation
       • Data warehouses are complex information systems
       • Support:
         – OLAP
         – Data mining
         – Decision Support Systems
         – …
       • Building a DW: time-consuming, expensive and prone to fail
    5. Motivation
       • Partial approaches:
         – ETL processes
         – Logical and conceptual design of the DW based on the multidimensional paradigm
         – Derivation of the DW schema from ER schemas of the data sources
         – …
       • DW methods, but no general model for the different phases
    6. Motivation
       • Goal: A Comprehensive Method for Data Warehouse Design
       • Principles that drive our approach:
         – Standard modeling notation → UML
         – Comprehensive → Include main phases of DW design
         – Powerful but easy to understand → Different levels of detail for different users (technical and final users)
         – Method → Starting point, not a rigid template
    7. Contents (section: UML extension mechanisms)
    8. UML extension mechanisms
       • UML is a general purpose visual modeling language for systems
       • Extension mechanisms allow the user to tailor it to specific domains
       • Mechanisms:
         – Stereotypes → New building elements
         – Tagged values → New properties
         – Constraints → New semantics
    9. UML extension mechanisms
       [Figure: the four representations of a stereotyped element: Icon, Decoration, Label, None]
    10. Contents (section: DW modeling schemas)
    11. [Figure: the model, comprising the Operational Data Schema (ODS), the Data Warehouse Conceptual Schema (DWCS), the Data Warehouse Storage Schema (DWSS) and the Business Model (BM), together with the ETL Process, the Exportation Process and an "Analyze" step; diagrams are windows or views into the model]
    12. General diagram (level 0)
       Stereotypes: <<ODS>>, <<DWCS>>, <<DWSS>>, <<BM>>, <<ETL>>, <<Exportation>>
       [Figure: packages <<ODS>> Sales data, <<ODS>> Production data, <<ODS>> Syndicated data, <<ETL>> Transformations, <<DWCS>> Data warehouse, <<Exportation>> Mappings, <<DWSS>> Informix Metacube, <<DWSS>> Cognos PowerPlay, <<BM>> Manager and <<BM>> Accounting]
    13. [Figure: model overview, repeated as the divider for the ODS section]
    14. ODS
       • Operational Data Schema
       • Represents:
         – Transaction processing systems (OLTP)
         – External sources (census data, economic data, competitors’ data, etc.)
       • There is no UML extension for modeling different types of data sources
    15. ODS
       • RDBMS → Rational’s UML Profile for Database Design: <<Database>>, <<Schema>>, <<Table>>, …
       • ORDBMS → Marcos et al. UML Profile for Object-Relational Database Design: <<array>>, <<row>>, <<ref>>, …
       • XML → Rational’s XML-DTD UML Profile: <<DTDElement>>, <<DTDElementEmpty>>, <<DTDEntity>>, …
       • …
    16. [Figure: the <<ODS>> packages Sales data, Production data and Syndicated data, together with a class diagram of the classes Salesmen, Cities, Counties, States, Customers, Agents, Invoices, Lines, Products, Families, Groups, Categories, Packages, Storage conditions and Discount policies, linked by associations with 1, 0..n and 1..n multiplicities]
    17. [Figure: model overview, repeated as the divider for the DWCS section]
    18. DWCS
       • Data Warehouse Conceptual Schema
       • UML Profile for Multidimensional Modeling
       • Basic components:
         – Facts: the transactions or values being analyzed
         – Dimensions: descriptive information about the facts
       • Properties:
         – Shared dimensions
         – Heterogeneous dimensions
         – Degenerate facts and dimensions
         – Multiple and alternative path classification hierarchies
         – …
    19. DWCS levels
       • Level 1: Model definition
       • Level 2: Star schema definition
       • Level 3: Dimension/fact definition
    20. DWCS stereotypes
       • Package stereotypes: StarPackage (Level 1), FactPackage (Level 2), DimensionPackage (Level 2)
       • Class stereotypes: Fact (Level 3), Dimension (Level 3), Base (Level 3)
    21. Model definition (level 1): <<StarPackage>>
       [Figure: star packages Production schema, Sales schema and Salesmen schema]
    22. Star schema definition (level 2): <<FactPackage>>, <<DimensionPackage>>
       [Figure: star packages Production schema, Sales schema and Salesmen schema; inside a star package, the fact package Sales fact and the dimension packages Stores dimension, Times dimension, Products dimension and Customers dimension]
    23. Dimension/fact definition (level 3): <<Fact>>, <<Dimension>>, <<Base>>
       [Figure: the Customers dimension, with the dimension class Customers dim and the base classes Customers, ZIPs, Cities and States linked by +parent/+child classification associations]
    24. [Figure: model overview, repeated as the divider for the DWSS section]
    25. DWSS
       • Data Warehouse Storage Schema
       • Depending on the implementation (RDBMS, ORDBMS, MD, …) → Similar to the ODS
       • Two possibilities: manual or automatic
    26. [Figure: model overview, repeated as the divider for the BM section]
    27. BM
       • Business Model
       • Adapt the DW to final users:
         – Easier to understand
         – Security concerns
         – …
       • UML importing mechanism → Different submodels of DWCS
    28. [Figure: the <<BM>> Accounting package imports Sales schema (from Data warehouse) from the <<DWCS>> Data warehouse package, which contains Production schema, Sales schema and Salesmen schema]
    29. [Figure: model overview, repeated as the divider for the ETL Process section]
    30. ETL Process
       • Extraction-Transformation-Loading
       • Mapping between ODS and DWCS
       • UML Profile for Modeling ETL Processes
       • Common mechanisms:
         – Integration of different data sources
         – Transformations
         – Generation of surrogate keys
         – …
    31. ETL Process mechanisms: Aggregation, Conversion, Filter, Incorrect, Join, Loader, Log, Merge, Surrogate, Wrapper
    32. [Figure: ETL example for the Products dimension. Products (IdProduct, Name, Price, Family, Storage) and Storage conditions (IdStorage, Name, Description), both from Sales data, are combined with LeftJoin(Storage = IdStorage) into NewClass2 (IdProduct, Name, Price, Family, StName, StDescription), with Name = Products.Name, StName = [Storage conditions].Name and StDescription = [Storage conditions].Description; ProdEuro applies Price = DollarToEuro(Price), and ProdLoader loads the data into Products dim and ProdDescription (from Products dimension)]
    33. [Figure: model overview, repeated as the divider for the Exportation Process section]
    34. Exportation Process
       • Mapping between DWCS and DWSS
       • Two possibilities: manual or automatic
    35. Contents (section: Applying modeling schemas)
    39. Contents (section: Conclusions)
    40. Conclusions
       • Global DW design method
       • Main advantages:
         – Same standard notation (UML)
         – Integration of different design phases in a single and coherent framework
         – Scale up to handle huge and complex DWs
       • CASE tool support with Rational Rose → Add-in
    41. Contents (section: Future Work)
    42. Future work
       • Data mapping at attribute level
       • Diagramming and style guidelines for creating better diagrams
       • More stages of the DW life cycle (e.g., refresh processes)
    43. Department of Software and Computing Systems
       A Comprehensive Method for Data Warehouse Design
       Sergio Luján-Mora, Juan Trujillo
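
Slides 19 to 23 describe how the DWCS is layered: StarPackage at level 1, FactPackage and DimensionPackage at level 2, and Fact, Dimension and Base classes at level 3. As a hedged sketch (not the authors' UML profile or their Rational Rose add-in), the nesting can be checked in Python. The `Element` class and `validate` function are hypothetical, the containment rules are an assumption inferred from the level definitions, and the example reuses names from the slides' Sales schema.

```python
from dataclasses import dataclass, field

# Level at which each DWCS stereotype is used (slides 19 and 20).
LEVEL = {
    "StarPackage": 1,
    "FactPackage": 2,
    "DimensionPackage": 2,
    "Fact": 3,
    "Dimension": 3,
    "Base": 3,
}

# Assumed containment rules (not stated verbatim in the slides): a star package
# holds fact and dimension packages; a fact package holds a Fact class; a
# dimension package holds a Dimension class and its Base classes.
ALLOWED_CHILDREN = {
    "StarPackage": {"FactPackage", "DimensionPackage"},
    "FactPackage": {"Fact"},
    "DimensionPackage": {"Dimension", "Base"},
}


@dataclass
class Element:
    """A stereotyped package or class in the DWCS (hypothetical model)."""
    stereotype: str
    name: str
    children: list = field(default_factory=list)


def validate(element, path=""):
    """Return containment errors found in a DWCS package tree."""
    errors = []
    here = f"{path}/{element.name}"
    if not path and LEVEL.get(element.stereotype) != 1:
        errors.append(f"{here}: the root must be a level-1 <<StarPackage>>")
    allowed = ALLOWED_CHILDREN.get(element.stereotype, set())
    for child in element.children:
        if child.stereotype not in allowed:
            errors.append(f"{here}: <<{child.stereotype}>> {child.name} is not allowed here")
        errors.extend(validate(child, here))
    return errors


# The Sales schema from slides 21-23; the fact class name "Sales" is hypothetical,
# and only the Customers dimension is expanded down to level 3.
sales_schema = Element("StarPackage", "Sales schema", [
    Element("FactPackage", "Sales fact", [Element("Fact", "Sales")]),
    Element("DimensionPackage", "Customers dimension", [
        Element("Dimension", "Customers dim"),
        Element("Base", "Customers"),
        Element("Base", "ZIPs"),
        Element("Base", "Cities"),
        Element("Base", "States"),
    ]),
    Element("DimensionPackage", "Products dimension"),
    Element("DimensionPackage", "Stores dimension"),
    Element("DimensionPackage", "Times dimension"),
])

print(validate(sales_schema))  # []
```

The recursive check mirrors the package-based drill-down used in the talk: each level only has to know which stereotypes may appear one level below it, which is what keeps the diagrams from becoming flat.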

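Slide 32 shows one ETL process for the Products dimension: Products and Storage conditions (both from the Sales data ODS) are combined with LeftJoin(Storage = IdStorage), the Storage conditions attributes become StName and StDescription, ProdEuro applies Price = DollarToEuro(Price), and ProdLoader loads the result. The Python sketch below mimics that flow on in-memory rows. It is only an illustration: the sample rows, the function names and the fixed exchange rate inside `dollar_to_euro` are hypothetical, since the slides name DollarToEuro but do not define it.

```python
# Hypothetical exchange rate: the slides name DollarToEuro but do not define it.
USD_TO_EUR = 0.9


def dollar_to_euro(price):
    return round(price * USD_TO_EUR, 2)


def left_join(left, right, left_key, right_key):
    """Pair every left row with its matching right row, or None if there is none."""
    index = {row[right_key]: row for row in right}
    return [(row, index.get(row[left_key])) for row in left]


def join_products(products, storage_conditions):
    """LeftJoin(Storage = IdStorage), producing NewClass2-style rows."""
    rows = []
    for product, storage in left_join(products, storage_conditions, "Storage", "IdStorage"):
        rows.append({
            "IdProduct": product["IdProduct"],
            "Name": product["Name"],  # Name = Products.Name
            "Price": product["Price"],
            "Family": product["Family"],
            # StName / StDescription come from [Storage conditions]; None if unmatched.
            "StName": storage["Name"] if storage else None,
            "StDescription": storage["Description"] if storage else None,
        })
    return rows


def prod_euro(rows):
    """ProdEuro step: Price = DollarToEuro(Price)."""
    return [{**row, "Price": dollar_to_euro(row["Price"])} for row in rows]


def prod_loader(rows, target):
    """ProdLoader stand-in: append the rows to an in-memory target table."""
    target.extend(rows)
    return len(rows)


# Made-up sample rows for illustration only.
products = [
    {"IdProduct": 1, "Name": "Milk", "Price": 2.0, "Family": 10, "Storage": 1},
    {"IdProduct": 2, "Name": "Rice", "Price": 1.5, "Family": 20, "Storage": None},
]
storage_conditions = [{"IdStorage": 1, "Name": "Cold", "Description": "Keep refrigerated"}]

products_dim = []
loaded = prod_loader(prod_euro(join_products(products, storage_conditions)), products_dim)
print(loaded, products_dim[0]["StName"], products_dim[1]["StName"])  # 2 Cold None
```

The left join keeps products without storage conditions (the second row), which is why StName can be None. Indexing the right-hand table in a dictionary assumes IdStorage is unique; with duplicate keys, the last matching row would silently win.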