
Using SLE for creation of data warehouses



Doctoral Symposium on Software Language Engineering 2010.
This presentation accompanies a paper that describes how software language
engineering (SLE) is applied to the process of data warehouse creation. Creating
a data warehouse is a complex and therefore costly process. The introduced
approach decomposes the data warehouse creation process into different aspects.
These aspects are described with different languages, which are integrated by a
metamodel. Based on this metamodel, large parts of the data warehouse
creation process can be generated. With this approach, data warehouses
can be created more conveniently and in less time.

Published in: Software

Using SLE for creation of data warehouses

  1. Using SLE for creation of Data Warehouses
     Yvette Teiken, OFFIS Institute for Information Technology, Escherweg 2, 26121 Oldenburg, Germany
     22.11.2015
  2. Problem Description and Motivation I
     ► Goals of a data warehouse (DWH):
        ► Perform complex analysis of all organizational data
        ► Used for decision support
        ► Time-variant
        ► Non-volatile
        ► Data from different sources and in different formats, integrated into one dataset
        ► Use of the OLAP paradigm to allow easy analysis and accessibility
     ► Problem addressed in my thesis:
        ► Efficient creation of domain-specific DWHs
     ► Example of use:
        ► Health reporting: preparation and presentation of health-relevant issues relating to the population
  3. Problem Description and Motivation II
     ► Problems during DWH creation:
        ► No standardized process exists
        ► Documentation is spread over many large documents
        ► Missing, distributed, inconsistent information
        ► A lot of schematic work is performed during realization
        ► Many different user roles are involved
        ► Initial build-up is a complex task
     ► Expected benefits:
        ► Faster realization of the DWH
        ► Better documentation of the whole creation process
        ► Less experienced staff can realize a DWH
     [Figure labels: Analysis organizational data, Define information demand, Data source transformation, Multidimensional model, Data quality]
  4. Related Work
     ► Languages covering aspects of DWH creation:
        ► Application Design for Analytical Processing Technologies (ADAPT)
        ► R2O mapping for relational databases
        ► InDaQu for data quality
        ► MDA and DWA
        ► Rizzi et al.: Modelling different aspects of DWHs
        ► Each deals only with a certain aspect, not the whole process
     ► My approach:
        ► Use languages that cover the whole process of DWH creation
        ► Integrated through a common metamodel
        ► Deal with multidimensional structures
        ► Transformations generate large parts of the DWH
        ► Process model that orders, connects, and refines the different aspects
  5. Proposed Solution I
     ► Idea: Describe the DWH with SLE techniques and generate the DWH semi-automatically
     ► Decompose the DWH into different aspects and describe each aspect with a language
     ► Aspects:
        ► Data Source Schemas: Subject, representation, and technical accessibility of the sources
        ► Data Source Transformation: Use existing languages like R2O
        ► Analysis Schema: Multidimensional data models, based on ADAPT
        ► Measures: Mathematical functions on multidimensional data
        ► Hierarchy: Central aspect, complex tree structures
        ► Data Quality: Integrate consistency constraints (InDaQu)
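To make the idea of aspects integrated by one metamodel more concrete, here is a minimal sketch in Python. It is not the metamodel from the thesis: every class and field name below (`SourceField`, `DataSourceSchema`, `Hierarchy`, `Measure`, `DwhModel`, `unresolved_inputs`) is a hypothetical stand-in. It only illustrates how separately described aspects could be checked against each other once they share a common model.

```python
from dataclasses import dataclass, field


@dataclass
class SourceField:
    name: str
    type: str
    length: int


@dataclass
class DataSourceSchema:
    """Aspect: data source schema (hypothetical structure)."""
    name: str
    fields: list[SourceField]


@dataclass
class Hierarchy:
    """Aspect: hierarchy, simplified here to an ordered list of levels."""
    name: str
    levels: list[str]


@dataclass
class Measure:
    """Aspect: measure, naming the data sources it reads."""
    name: str
    inputs: list[str]


@dataclass
class DwhModel:
    """Common model that ties the aspects together (illustrative only)."""
    sources: dict[str, DataSourceSchema] = field(default_factory=dict)
    hierarchies: dict[str, Hierarchy] = field(default_factory=dict)
    measures: dict[str, Measure] = field(default_factory=dict)

    def unresolved_inputs(self) -> list[str]:
        """Return measure inputs that no data source schema defines."""
        missing = {
            name
            for measure in self.measures.values()
            for name in measure.inputs
            if name not in self.sources
        }
        return sorted(missing)


model = DwhModel()
model.sources["OwnCases"] = DataSourceSchema(
    "OwnCases", [SourceField("Gender", "String", 1)]
)
model.measures["MarketShareOfBirth"] = Measure(
    "MarketShareOfBirth", ["OwnCases", "AllCases"]
)
print(model.unresolved_inputs())
```

A cross-aspect check like `unresolved_inputs` is the kind of consistency that becomes possible only when the languages share one model rather than living in separate documents.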
  6. Example
     ► Hospital market analysis:
        ► Find out the market share of births
     ► Measure:
        ► MarketShareOfBirth = OwnCases / AllCases
     ► Data Source Schema:
        ► Own Cases: Hospital information system: „§21 Data“
        ► All Cases: Bought from an external source

        Name                 Type     Arity
        Id of Insurance      Numeric  10
        Year of Birth        Numeric  4
        Month of Birth       Numeric  2
        Gender               String   1
        PLZ                  Numeric  5
        Start date           Numeric  12
        Reason of admission  String   1
        End date             String   12
        Age in years         String   3
        DRG                  String   4
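The measure on this slide can be sketched as a small function, reading it as the ratio of own birth cases to all birth cases. The function name, the grouping key (a postal code is used here), and the sample counts are assumptions made for illustration and are not taken from the slides.

```python
def market_share_of_birth(own_cases: dict[str, int],
                          all_cases: dict[str, int]) -> dict[str, float]:
    """Return own cases divided by all cases for each grouping key.

    Keys without any cases in all_cases are skipped, because the
    share is undefined when the market has no cases.
    """
    shares = {}
    for key, total in all_cases.items():
        if total <= 0:
            continue
        shares[key] = own_cases.get(key, 0) / total
    return shares


# Hypothetical counts of birth cases per postal code.
own = {"26121": 30}
everyone = {"26121": 120, "26122": 40}
print(market_share_of_birth(own, everyone))
```

In a DWH the same ratio would be evaluated along the dimensions of the analysis schema rather than over plain dictionaries.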
  7. Example
     ► Analysis Schema
     ► Generated relational schema
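The step from an analysis schema to a generated relational schema can be illustrated with a small generator. This is a sketch under strong assumptions: it emits a simple star schema, the function `generate_star_schema`, the table naming, and the column types are all invented for this example, and the schema actually generated in the presented work is not shown in this transcript.

```python
def generate_star_schema(fact: str,
                         dimensions: dict[str, list[str]],
                         measures: list[str]) -> list[str]:
    """Emit CREATE TABLE statements for one fact table and its dimensions.

    dimensions maps a dimension name to its level names; every level
    becomes a text column of the dimension table (a simplification).
    """
    statements = []
    for dim, levels in dimensions.items():
        cols = [f"{dim}_id INTEGER PRIMARY KEY"]
        cols += [f"{level} TEXT" for level in levels]
        statements.append(f"CREATE TABLE dim_{dim} ({', '.join(cols)});")

    # The fact table references every dimension and stores the measures.
    fact_cols = [
        f"{dim}_id INTEGER REFERENCES dim_{dim}({dim}_id)" for dim in dimensions
    ]
    fact_cols += [f"{measure} INTEGER" for measure in measures]
    statements.append(f"CREATE TABLE fact_{fact} ({', '.join(fact_cols)});")
    return statements


for statement in generate_star_schema(
    "cases",
    {"time": ["day", "year"], "gender": ["gender"]},
    ["case_count"],
):
    print(statement)
```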
  8. Example
     ► Data Source Transformation:
        ► Source (Own Cases): Start date, Reason of admission, Year of Birth, DRG, Gender, Id of Insurance, Month of Birth, End date, Age in years, PLZ
        ► Target schema: day, ICD, Year, DRG, Gender
        ► Mapping expressions: = new Datetime(Q[10,11], Q[4,5], Q[0-3]); (G==m → M || G==w → F)
     ► Consistency Rules:
        ► ICD=O10-O16 & G=M → invalid
        ► DRG=O01F & G=M → invalid
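The gender mapping and the two consistency rules from this slide can be expressed in a few lines of Python. This is a sketch and not the DSL from the thesis: the function names and the record layout (a dictionary with the keys `ICD`, `DRG`, and `Gender`) are assumptions. The date expression is left out because the layout of the source date field is not specified on the slide.

```python
def map_gender(source_value: str) -> str:
    """Map the source codes m/w to the target codes M/F."""
    mapping = {"m": "M", "w": "F"}
    if source_value not in mapping:
        raise ValueError(f"unknown gender code: {source_value!r}")
    return mapping[source_value]


def icd_in_range(icd: str, letter: str = "O", low: int = 10, high: int = 16) -> bool:
    """Check whether an ICD code such as 'O14.1' falls in O10 to O16."""
    return (
        len(icd) >= 3
        and icd[0] == letter
        and icd[1:3].isdigit()
        and low <= int(icd[1:3]) <= high
    )


def violations(record: dict[str, str]) -> list[str]:
    """Return the consistency rules that a target record violates."""
    found = []
    if icd_in_range(record["ICD"]) and record["Gender"] == "M":
        found.append("ICD=O10-O16 & G=M")
    if record["DRG"] == "O01F" and record["Gender"] == "M":
        found.append("DRG=O01F & G=M")
    return found


# Hypothetical record: a pregnancy-related diagnosis recorded for a male patient.
record = {"ICD": "O14.1", "DRG": "O01F", "Gender": map_gender("m")}
print(violations(record))
```

Keeping the rules as data-independent checks on the target record mirrors the separation between the transformation aspect and the data quality aspect.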
  9. Current Status
     ► Already done:
        ► Analysis Schema DSL
        ► Hierarchy DSL
        ► Data Quality DSL
        ► Transformations for Data Integration and Cubes
        ► Integrated Metamodel for these aspects
     ► Left to be done:
        ► Data Source Schema
        ► Measures
        ► Data Source Transformation
        ► Integrate these aspects
  10. Research Method and Conclusion
     ► Research Method:
        ► Validation via implementation
        ► The described languages, metamodels, and transformations are implemented on the basis of the MUSTANG platform
        ► Ability to generate a configuration for a DWH
     ► Conclusion:
        ► Experts can design and analyze all aspects of the DWH independently in DSLs
        ► Enables semi-automatic DWH creation
        ► Makes development faster