A Survey of the following papers:
[22] Felix Naumann & Stefano Rizzi (2013) – Fusion Cubes
[30] MaSM: Efficient Online Updates in Data Warehouses
[37] Matteo Golfarelli & Stefano Rizzi (2007) – Managing Late Measurements in Data Warehouses
[38] A SchemaGuide for Accelerating the View Adaptation Process
[35] Temporal Query Processing in Teradata
[33] Toward Propagating the Evolution of Data Warehouse on Data Marts
[31] Wrembel and Bebel (2007)
4. Data Warehouse & Data Marts
• A data warehouse contains data from several databases maintained by different business units, together with historical and summary information.
• It is a database used for reporting and data analysis, in which the data is arranged into hierarchical groups, often called dimensions, and represented as facts and aggregate facts.
• Data warehouses can be subdivided into data marts, which store subsets of the data from the warehouse.
5. Temporal Database
• A temporal database is a database with built-in support for handling data involving time; temporal data is data that keeps track of changes over time
• It supports the following kinds of time:
– Valid time
– Transaction time
– Bitemporal data, which combines both valid and transaction time (see the sketch below)
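To make the valid-time/transaction-time distinction concrete, here is a minimal Python sketch with hypothetical field names (not from the surveyed papers): valid time states when a fact holds in the real world, transaction time states when the database learned about it.

from dataclasses import dataclass
from datetime import date

@dataclass
class BitemporalRow:
    """One bitemporal fact, e.g. 'price of share X was 10.5'."""
    key: str
    value: float
    valid_from: date   # when the fact became true in the real world
    valid_to: date     # when it stopped being true (date.max = still true)
    tx_from: date      # when the row was recorded in the database
    tx_to: date        # when the row was logically superseded

def as_of(rows, valid_at: date, known_at: date):
    """Rows that were true at `valid_at`, as the database knew it at `known_at`."""
    return [r for r in rows
            if r.valid_from <= valid_at < r.valid_to
            and r.tx_from <= known_at < r.tx_to]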
6. Multidimensional Data Model
• The multidimensional data model is designed to answer complex queries in real time. It produces a cube, which is like a 3-D spreadsheet: information is represented as dimensions and facts, usually maintained in a star schema
7. Multidimensional Model Terms
• Fact – a business performance measurement, typically numeric and additive
• Dimension – an object whose attributes allow the user to explore the measures from different perspectives of analysis
• Measures – the numeric records stored in the fact table
• Hierarchy – an ordered collection of dimension levels, such as country/state/city
• Property – an additional descriptive attribute of a dimension
8. Multidimensional Model Operations
• Drill-down/Roll-up: moving from a summary category to individual categories and vice versa (a minimal roll-up sketch follows this list)
• Pivot (rotate): in the example given, cities ran vertically and products horizontally; after the operation the two dimensions are swapped
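As an illustration only (not from the surveyed papers), a minimal Python sketch of a roll-up: aggregating fact rows from the city level up to the country level, using hypothetical (country, city, sales) tuples.

from collections import defaultdict

# Hypothetical fact rows at the (country, city) level.
facts = [
    ("DE", "Berlin", 120.0),
    ("DE", "Munich", 80.0),
    ("FR", "Paris", 200.0),
]

def roll_up(rows):
    """Roll up the additive sales measure from the city level to the country level."""
    totals = defaultdict(float)
    for country, _city, sales in rows:
        totals[country] += sales
    return dict(totals)

print(roll_up(facts))  # {'DE': 200.0, 'FR': 200.0}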
9. Schema & Data Changes
• Database schema changes:
– With loss of data, by simply changing the schema
– Without loss of data, where the data is evolved while keeping the attributes of the old schema (evolution)
– Without loss of data, by changing schemas while keeping old versions of the old schemas (versioning)
• Data changes in warehouses (contrasted in the sketch below):
– Transient data: deletions and updates are applied without maintaining the old data
– Periodic data: deletions and updates are handled by adding new records
– Semi-periodic data: the same as periodic, but only a recent collection of changes is kept
– Snapshots of the complete data, which is popular in data marts
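A minimal Python sketch (hypothetical structures, for illustration only) contrasting transient and periodic handling of an update: transient storage overwrites in place, while periodic storage appends a new timestamped record so history survives.

from datetime import datetime

# Transient data: the update overwrites the old value, history is lost.
transient = {"price_ACME": 10.0}
transient["price_ACME"] = 12.0  # the old value 10.0 is gone

# Periodic data: each update appends a new timestamped record.
periodic = [("price_ACME", 10.0, datetime(2024, 1, 1))]
periodic.append(("price_ACME", 12.0, datetime(2024, 2, 1)))  # history kept

def current_value(records, key):
    """The latest record for `key` is the current value."""
    matches = [r for r in records if r[0] == key]
    return max(matches, key=lambda r: r[2])[1]

assert current_value(periodic, "price_ACME") == 12.0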
10. Materialized Views
• Materialized view – a stored view query result, which acts like a cache and is used by the query optimizer to speed up querying
• View maintenance – the process of updating a materialized view in response to changes to the underlying data (see the sketch below)
• View adaptation – aims to leverage the previously materialized view to generate the new view, since rebuilding the materialized view from scratch may be expensive
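To illustrate view maintenance (an illustration, not any paper's algorithm): a materialized aggregate can often be maintained incrementally by applying only the delta of the base data instead of recomputing from scratch. A minimal Python sketch with a hypothetical sales-total view:

# Base fact table and a materialized view: total sales per product.
base = [("p1", 100.0), ("p2", 50.0)]
view = {}
for product, amount in base:
    view[product] = view.get(product, 0.0) + amount

def apply_delta(view, inserts):
    """Incrementally maintain the SUM view: apply only the newly inserted rows."""
    for product, amount in inserts:
        view[product] = view.get(product, 0.0) + amount

apply_delta(view, [("p1", 25.0), ("p3", 10.0)])  # no full recomputation needed
assert view == {"p1": 125.0, "p2": 50.0, "p3": 10.0}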
13. A SchemaGuide for Accelerating the View Adaptation Process
• An efficient process for view adaptation in XML databases, built on a fragment-based view representation: the materialized data is segmented into fragments, and algorithms update only those materialized fragments affected by view definition changes [38]
• Their adaptation process:
– Call an optimized containment check for the most suitable fragment that contains the requested fragment
– Adapt the XFM structure to the fragments found
– Find a materialized fragment that is affected by the change
– Search for existing materialized fragments that can be reused and mapped to the affected materialized fragment
• It showed significant improvement, reducing the work of recomposing the materialized view by up to 2.6%
15. Multi-version Data Warehouse
• Automatic detection of structural and content changes in the data sources, reflected in the data warehouse by keeping a sequence of persistent versions [31]
• Their solution supports:
– Monitoring external data sources with respect to content and structural changes
– Automatic generation of the processes that monitor external data sources
– Applying discovered external data source changes to a selected DW version
– Describing the structure of every DW version
– Querying multiple DW versions at the same time and presenting the results coming from multiple versions
– Visualizing the schema
16. MaSM (Materialized Sort Merge)
• Efficient Online Updates in Data Warehouses [30]
• An approach for supporting online updates by using SSDs to cache incoming updates
• Models query processing with differential updates as a type of outer join between the data residing on disks and the updates residing on SSDs (see the sketch below)
• Presents algorithms for performing such joins and for periodic migrations; for example, updates are migrated to disks only when the system load is low or when updates reach a certain threshold (e.g., 90%) of the SSD size
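A minimal Python sketch of the idea (an illustration, not MaSM's actual algorithm): a query is answered by merging the main data on disk with the cached updates, in the spirit of an outer join keyed on the record id. All names are hypothetical.

# Main data as stored on disk, and newer updates cached on SSD.
disk_rows = {1: {"qty": 10}, 2: {"qty": 5}, 3: {"qty": 7}}
ssd_updates = {2: {"qty": 9}, 4: {"qty": 1}}  # update of key 2, insert of key 4

def merged_scan(disk, updates):
    """Full-outer-join-style merge: updates override disk rows, inserts appear too."""
    for key in sorted(disk.keys() | updates.keys()):
        yield key, updates.get(key, disk.get(key))

for key, row in merged_scan(disk_rows, ssd_updates):
    print(key, row)
# 1 {'qty': 10} / 2 {'qty': 9} / 3 {'qty': 7} / 4 {'qty': 1}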
18. Data Changes in the Data Mart
• Changes can first be applied in the warehouse and then propagated to the data marts under it
• Changes in the data mart are categorized as:
– Dimensional data changes
– Factual data changes
– Schema changes
19. Dimensional Data Changes
• These are changes in a hierarchy; the change can affect a dimension, a level, or a property
• Kimball proposes three solutions to changes in ROLAP multidimensional models
20. ROLAP Multidimensional Models
• In the Type 1 solution, old tuples in dimension tables are simply overwritten with the new data. The data mart stays up to date, but changes cannot be tracked
• In the Type 2 solution, each change produces a new record in the dimension table; surrogate keys must be used. Changes can be kept and tracked along with the new data (see the sketch below)
• The Type 3 solution augments the schema of the dimension table to represent both the current and the previous value for each level or attribute subject to change
• Type 6 (1+2+3) keeps the complete history; the more data kept in the hierarchy, the more expensive it is, and additional timestamps are needed
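A minimal Python sketch (illustrative only, with hypothetical columns) of Kimball's Type 2 handling: instead of overwriting, a change closes the current dimension row and appends a new one under a fresh surrogate key.

from datetime import date

# Dimension rows: (surrogate_key, natural_key, city, valid_from, valid_to).
dim = [(1, "cust42", "Berlin", date(2020, 1, 1), date.max)]

def scd_type2_update(dim, natural_key, new_city, change_date):
    """Close the current row and append a new one with a new surrogate key."""
    new_rows = []
    for sk, nk, city, vfrom, vto in dim:
        if nk == natural_key and vto == date.max:
            new_rows.append((sk, nk, city, vfrom, change_date))  # close old row
        else:
            new_rows.append((sk, nk, city, vfrom, vto))
    next_sk = max(sk for sk, *_ in dim) + 1
    new_rows.append((next_sk, natural_key, new_city, change_date, date.max))
    return new_rows

dim = scd_type2_update(dim, "cust42", "Munich", date(2024, 6, 1))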
21. Changes in Factual Data
• Such changes happen, for example, when there are errors in measurements, e.g., sea levels that were captured incorrectly and fixed later
• The facts are classified by their conceptual role into flow facts and stock facts
22. Managing Late Measurements in Data Warehouses
• A proposal to couple valid time and transaction time, distinguishing two solutions for managing late measurements [37]
• Flow model – delta solution: each new measurement for an event is represented as a delta (current registration minus previous registration) with respect to the previous measurement. Transaction time is modeled by adding to the schema a new temporal dimension recording when each registration was made in the data mart. Current queries are answered by summing all registrations for each event; historical queries are answered by selectively summing the registrations for an event up to the time queried (see the sketch below)
• Stock model – consolidated solution: late measurements are represented by recording the consolidated value for the event with two timestamps; transaction time is modeled by two temporal dimensions delimiting the time interval during which each registration is current (the currency and its time interval)
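A minimal Python sketch of the delta idea (an illustration under hypothetical structures, not the paper's actual schema): each registration stores a delta and the transaction time it was recorded, and the value of an event as known at time t is the sum of the deltas registered up to t.

from datetime import date

# Registrations: (event_id, delta, registered_on). A late correction of -2
# for event "e1" arrives on 2024-03-01.
registrations = [
    ("e1", 10.0, date(2024, 1, 1)),
    ("e1", -2.0, date(2024, 3, 1)),  # late measurement, stored as a delta
]

def value_as_known_at(regs, event_id, known_at):
    """Sum all deltas for the event that were registered up to `known_at`."""
    return sum(d for eid, d, reg_on in regs
               if eid == event_id and reg_on <= known_at)

assert value_as_known_at(registrations, "e1", date(2024, 2, 1)) == 10.0
assert value_as_known_at(registrations, "e1", date(2024, 4, 1)) == 8.0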
24. Handling Schema Changes
• Two approaches are followed for handling schema changes:
– Schema evolution: old information is maintained without data loss, but the old schema is lost
– Schema versioning: separate schema versions are stored, and the user can access the different versions
25. Schema Evolution
• Toward Propagating the Evolution of Data Warehouse on Data Marts [33]
• Operators to support changing the data mart schema:
– Evolution operators for the data warehouse (basic operations and composite operations)
– Evolution operators for the data mart
– A mapping function
– A set of rules for the evolution
26. Propagating the Evolution of Data Warehouse on Data Marts
• The mapping function is embedded into the Extract-Transform-Load process from the data warehouse to the data mart. Examples of these functions:
– Fact(Table): returns a set of facts from the data warehouse tables
– Dim(Table): returns a superset of dimensions from the data warehouse table; each superset contains all dimensions of the data mart cube
27. Propagating the Evolution of Data Warehouse on Data Marts
Propagation operations (see the sketch below):
• Add_Dim(Dname, Fi, T): adds a new dimension named "Dname" to the data mart fact "Fi"; it takes the primary key of table T from the data warehouse and a subset of the textual attributes contained in T
• Add_Fact(Fname, T, set(D)): adds a new fact "Fname" with dimensions set(D); the fact measures are the numeric attributes of T
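A minimal Python sketch of what an Add_Dim-style operator might do (an interpretation of the paper's description, with hypothetical table structures): it derives a new dimension from a warehouse table's primary key plus its textual attributes, and attaches it to a fact.

# Hypothetical warehouse table: a primary key plus typed columns.
supplier_table = {
    "pk": "supplier_id",
    "columns": {"supplier_id": int, "name": str, "city": str, "rating": float},
}

def add_dim(dname, fact, table):
    """Build a dimension from the table's key and textual attributes,
    then attach it to the fact (in the spirit of Add_Dim(Dname, Fi, T))."""
    textual = [c for c, t in table["columns"].items() if t is str]
    dimension = {"name": dname, "key": table["pk"], "attributes": textual}
    fact["dimensions"].append(dimension)
    return fact

sales_fact = {"name": "Sales", "dimensions": [], "measures": ["amount"]}
add_dim("Supplier", sales_fact, supplier_table)
print(sales_fact["dimensions"])  # [{'name': 'Supplier', 'key': 'supplier_id', ...}]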
28. Propagating the Evolution of Data Warehouse on Data Marts
A set of rules applies to the data-warehouse-to-data-mart mapping process, such as:
• If a table T added to the data warehouse has foreign keys in another data warehouse table that concerns a fact, T adds a new dimension for that fact, with the attributes of T becoming the attributes of the dimension
• If T has no foreign keys in another data warehouse table, but has foreign keys pointing to tables that load dimensions in the data mart, and T has numeric attributes, then T will probably create a new fact
Note: commercially, SQL Compare and Oracle Change Management Pack support evolving schemas by comparing them and generating scripts
29. Schema Versioning
• Decision makers may have based their decisions on an old schema, with changes (possibly including measure changes) appearing after their queries were executed. To run the same query again and produce the same result, non-volatility is required; with changes at the schema level, some versioning approach is therefore needed
30. Schema Versioning
• A comprehensive approach to versioning is presented in the multiversion data warehouse [31]. Two metamodels are proposed: one for managing a multi-version data mart and one for detecting changes in the operational sources. Besides "real" versions, which are used in the application domain, "alternative" versions are introduced for simulating and managing hypothetical business scenarios within what-if analysis settings
• Commercially, several database management systems (DBMSs) offer support for valid and transaction time: Oracle 11g, IBM DB2 10 for z/OS, and Teradata Database 14. Part 2 (SQL Foundation) of SQL:2011 was only recently released
31. Querying Temporal Data
• Cross-version querying
– The multiversion data warehouse allows users either to specify a time interval for a data warehouse query or to specify the versions to query
• Temporal querying
– Temporal queries on Teradata [35]:
• Native temporal implementation
• Rewriting approach
32. Querying Temporal Data
Disadvantages of the native approach:
• Since temporal data is stored in a new data type, SQL execution code needs to be modified for joins and aggregation on temporal data
• Query optimization needs to be adapted to the new temporal tables
• Some duplication may occur in the code of DBMS functions to support temporal data
33. Rewrite Approach
• There is no impact on execution code
• There is a small impact on the optimizer
• No duplication, as it adds a step before the query optimizer
• But it adds complexity to the query structure
34. Rewrite Approach
• Rewrites modify projection, selection, and joins (see the sketch below):
– SELECT *, for example, will exclude the time dimension if the qualifier is CURRENT
– For CURRENT and SEQUENCED qualifiers, the corresponding time predicate is added
– For joins, temporal qualifiers are applied before the join. For inner joins this works directly, but for outer joins the qualifiers must be applied on each table separately and the join then performed on the derived tables
• Their study showed that rewrites added only about 5% to query execution time in comparison with the native implementation
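To illustrate the flavor of the rewrite (purely illustrative; this is not Teradata's rewriter, and the syntax is simplified), a CURRENT qualifier on a valid-time table can be turned into an ordinary predicate over hypothetical period columns:

def rewrite_current(table, columns, valid_from="valid_from", valid_to="valid_to"):
    """Rewrite 'CURRENT VALIDTIME SELECT <columns> FROM <table>' into plain SQL:
    keep only rows whose valid-time period contains 'now', and project the
    time columns away (as a CURRENT SELECT * would)."""
    cols = ", ".join(c for c in columns if c not in (valid_from, valid_to))
    return (f"SELECT {cols} FROM {table} "
            f"WHERE {valid_from} <= CURRENT_DATE AND CURRENT_DATE < {valid_to}")

print(rewrite_current("policy", ["policy_id", "premium", "valid_from", "valid_to"]))
# SELECT policy_id, premium FROM policy
# WHERE valid_from <= CURRENT_DATE AND CURRENT_DATE < valid_to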
36. Fusion Cubes
• A framework to support self-service business intelligence with multidimensional cubes that can be dynamically extended both in their schema and in their instances [22]
• A fusion cube can include both stationary and situational data
37. Situational Query
• A user poses an OLAP-like situational query, one that cannot be answered on stationary data only
• The system discovers potentially relevant data sources
• The system fetches relevant situational data from the selected sources
• The system integrates the situational data with the user's data, if any
• The system visualizes the results, and the user employs them for making her decision
• The user stores and shares the results
40. Situational Query
• Integration of external data sources can involve:
– RDF data, integrated into the data warehouse on the fly
– Social networks; there is an implementation called MicroStrategy that analyzes social networks
– Blogs, where "opinion mining" can be applied; integrating them is challenging because the data is unstructured
41. Drill Beyond
• The Drill-Beyond operator can go beyond:
– The schema: a user can click on a dimension or a fact that is not available
– The instances: a user can request new instances for an attribute, such as a new country, and the values will be retrieved
• Query formulation can involve different technologies for situational data (see the sketch below):
– SPARQL for querying RDF data
– Web APIs that provide data in XML or JSON format
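As a small illustration of the Web-API path (hypothetical endpoint and field names; not from the paper), situational data returned as JSON can be fetched and reshaped into the members of a new dimension:

import json
from urllib.request import urlopen

def fetch_dimension_members(url, name_field):
    """Fetch situational data from a (hypothetical) JSON Web API and return
    the distinct values to be used as members of a new dimension."""
    with urlopen(url) as resp:
        records = json.load(resp)  # expects a JSON array of objects
    return sorted({rec[name_field] for rec in records})

# Hypothetical usage: populating a 'country' dimension drilled beyond the schema.
# members = fetch_dimension_members("https://api.example.org/countries", "name")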
42. Integration
• Once the data is available, it has to be integrated with the stationary data to form the fusion cubes:
– Extract the structure of the different situational data
• Google Fusion Tables (Gonzalez et al., 2010) offers cloud-based storage of basic relational tables that can be shared with others, annotated, and fused with other data; this helps in extracting relations from unstructured data
– Map the schemas of the data sources
• Using XFM, for example
– Reconcile with the stationary data to form the fusion cube
• Google Refine (code.google.com/p/google-refine/) provides functionalities for working with messy data: cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases
43. Support in Commercial Systems
• Several commercial systems support the fusion cubes idea:
– The illo system (illo.com) allows non-business users to store small cubes in a cloud-based environment, analyze the data using powerful mechanisms, and produce visual analysis results that can be shared with others and annotated
– The Stratosphere project (www.stratosphere.eu) explores the power of massively parallel computing for Big Data analytics
46. Conclusion
• We have explored different methodologies for handling temporal data and changes to schema, factual data, and dimensional data in data warehouses. Researchers are producing bright ideas that help different scenarios, facilitate dynamic features in a data warehouse, and speed up its query processing. We encourage commercial systems to look into these new methodologies and make them available for practical use by the public
47. Conclusion
[Comparison matrix] The surveyed approaches (Propagating the Evolution of Data Warehouse on Data Marts; A SchemaGuide for Accelerating the View Adaptation Process; ROLAP multidimensional models; Managing Late Measurements in Data Warehouses; MaSM (Materialized Sort Merge); Multi-version Data Warehouse; Temporal Queries on Teradata; Fusion Cubes) are compared along these criteria: view adaptation, keeping the DW in sync with the sources, handling of changes in the data mart (dimensional data changes, factual data changes, and schema changes via evolution or versioning), and availability of a software prototype.
48. Issues & Future Work
• There is a need to formulate a query that can span different database schemas and produce results at once. We suggest an approach that uses an extra attribute in the data warehouse to store heterogeneous and non-normalized data in an XML format, which is highly extensible
• The data warehouse design needs to support dynamic schema updates without physically modifying or replicating the current store. As above, an XML extension can help store attributes that change frequently. Although this may be a drawback for performance, one approach is to identify fixed dimensions and properties that can be structured and queried appropriately, followed by a second query-processing phase in which the XML data is explored and embedded into the multidimensional form
• Queries spanning different versions under version-based schemas can be time consuming. Versioning could be restructured so that unchanged values are stored in a different version than the changed values; this can significantly reduce the space and time requirements and make querying more feasible
49. Main References
• [22] Felix Naumann & Stefano Rizzi (2013) – Fusion Cubes
• [30] MaSM: Efficient Online Updates in Data Warehouses. http://www.cs.cmu.edu/~chensm/papers/MaSM-sigmod11.pdf
• [37] Managing Late Measurements in Data Warehouses (Matteo Golfarelli & Stefano Rizzi, 2007)
• [38] A SchemaGuide for Accelerating the View Adaptation Process (Jun Liu, Mark Roantree, and Zohra Bellahsene, 2010)
• [35] Temporal Query Processing in Teradata (Mohammed Al-Kateb, Ahmad Ghazal, Alain Crolotte, 2013)
• [33] Toward Propagating the Evolution of Data Warehouse on Data Marts (Saïd Taktak and Jamel Feki, 2012)
• [31] Wrembel and Bebel (2007)
50. Supporting References
• [1] Ramakrishnan, Database Management Systems (3rd ed.), Chapter 25
• [2] http://en.wikipedia.org/wiki/Data_warehouse
• [3] Introduction to Information Systems (Marakas & O'Brien, 2009)
• [4] Kimball, The Data Warehouse Toolkit, 2nd ed. (2002), Chapter 1
• [5] http://en.wikipedia.org/wiki/Temporal_database
• [6] http://www.olapcouncil.org/research/glossaryly.htm
• [7] http://en.wikipedia.org/wiki/OLAP_cube
• [8] http://docs.oracle.com/cd/B12037_01/olap.101/b10333/multimodel.htm
• [9] Bach Pedersen, Torben; S. Jensen, Christian (December 2001). Multidimensional Database Technology
• [10] TSQL2 Language Specification. https://cs.arizona.edu/~rts/initiatives/tsql2/finalspec.pdf
• [11] Sybase Infocenter. http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00269.1571/doc/html/bde1279401694270.html
• [12] Roddick (1995)
• [13] Grandi (2002)
• [15] Devlin (1997)
• [16] "Information technology -- Database languages -- SQL -- Part 2: Foundation (SQL/Foundation)," International Standards Organization, December 2011
• [18] Gupta, Maintenance of Materialized Views: Problems, Techniques, and Applications
• [20] De Amo & Halfeld Ferrari Alves (2000)
• [21] Mohania, Avoiding Re-computation: View Adaptation in Data Warehouses (1997)
• [23] http://en.wikipedia.org/wiki/Resource_Description_Framework
Slicing (filtering data) and dicing (grouping data) are performed on the additive measures located in the fact table of the dimensional model; the attributes involved are called "dimension levels" (e.g., quantity, cost, number of customers). The hierarchy of dimensions can offer both a summarized and a detailed view of an analysis.
The main problem with the multidimensional model is that it relies on static dimensions, which is unrealistic, for example because product catalogues change through additions and removals. Another problem is that commonly only additions, not modifications, are supported, which is again unrealistic: after Extract-Transform-Load runs, there may be mistakes in the data that need to be modified.
The facts are classified based on their conceptual role [37]: flow facts group transactions happening in a time interval into a single transaction (e.g., purchases of items and enrollments); stock facts monitor items by their time state (e.g., the price of a share or the level of a river).
There are different factors behind schema changes in data marts, for example: changes in user requirements, triggered for instance by the need to produce more sophisticated reports or by new categories of users subscribing to the data mart; changes in the application domain, arising from modifications in the business world, such as a change in the way business is done or in the organizational structure of the company; new versions of software components being installed; and system tuning activities.
Keeping temporal data warehouses is useless unless companies use queries that support them; with standard SQL this is possible but infeasible.
For query-rewrite optimizations, suppose the last scenario above occurs: if there is an additional predicate on the original query, it should be applied on the appropriate table before the join. The Teradata optimizer solves this by folding derived tables into their parent queries; view folding is an internal feature of the Teradata optimizer that converts queries on derived tables requiring a temporary table into a parent query that does not require one.
The above scenario should be controlled by the user; the user can also perform many iterations to improve the data.
A user interface is available so that users can submit situational queries in an OLAP-like fashion. Queries are then handed to a query processor, which translates them into executable query-processing code. A submitted query can refer to a stationary cube or to a fusion cube already defined by another user; otherwise the user may need to create a new fusion cube, in which case new situational data must be found. A data finder then uses external registries and external ontologies, or simply accesses the metadata already in the catalog and the internal ontology. Registries may be complex services or just a simple search engine.
For reducing processing time, MapReduce approaches can cut query times by splitting the query across nodes.
The drill-beyond feature involves an operation called cube discovery, which derives dimensions and hierarchies from situational data. This part is semi-automated: the user types a keyword, the system proposes data, and if the result is unsatisfactory the user iterates again.