The document discusses dimensional data modeling concepts. It begins with definitions of data modeling and dimensional modeling. It then covers dimensional concepts like facts, dimensions, and star schemas. It discusses challenges like slowly changing dimensions, bridge tables, and recursive hierarchies. It provides advice on how to approach dimensional modeling, including starting simple and iterating as understanding improves. It emphasizes letting the data define the optimal solution rather than rigidly applying patterns. Finally, it suggests data modeling and migration skills will grow in importance as databases are increasingly used for analysis.
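To make these concepts concrete, a minimal star-schema sketch in T-SQL follows; the table and column names are illustrative assumptions rather than anything taken from the presentation, and the customer dimension carries the effective/expiry/current columns typical of a type 2 slowly changing dimension.

-- Hypothetical star schema: all names are illustrative
CREATE TABLE DimCustomer (
    CustomerKey    INT IDENTITY(1,1) PRIMARY KEY,   -- surrogate key
    CustomerNumber VARCHAR(20)  NOT NULL,           -- natural/business key
    CustomerName   VARCHAR(100) NOT NULL,
    EffectiveDate  DATE NOT NULL,                   -- type 2 SCD tracking
    ExpiryDate     DATE NULL,
    IsCurrent      BIT NOT NULL DEFAULT 1
);

CREATE TABLE DimDate (
    DateKey  INT PRIMARY KEY,                       -- e.g. 20150411
    FullDate DATE NOT NULL,
    [Year]   INT NOT NULL,
    [Month]  INT NOT NULL
);

CREATE TABLE FactClaim (
    ClaimKey    BIGINT IDENTITY(1,1) PRIMARY KEY,
    CustomerKey INT NOT NULL REFERENCES DimCustomer (CustomerKey),
    DateKey     INT NOT NULL REFERENCES DimDate (DateKey),
    ClaimAmount DECIMAL(12,2) NOT NULL              -- additive measure
);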
4. Definition
• “A database model is a specification
describing how a database is
structured and used” – Wikipedia
5. Definition
• “A database model is a specification
describing how a database is
structured and used” – Wikipedia
• “A data model describes how the
data entities are related to each other
in the real world” - Terry
6. Data Model Characteristics
• Organize/Structure like Data Elements
• Define relationships between Data
Entities
• Highly Cohesive
• Loosely Coupled
7. The Project
• Major Health Service provider is
switching their claims system to SAP
• As part of this, they are totally
redeveloping their entire Data
Warehouse solution
8. The Project
• 3+ years duration
• 100+ Integration Projects
• 200+ Systems Personnel
9. Integration Project Streams
• Client Administration – Policy Systems
• Data Warehouse
• Legacy – Conversion from Legacy
• Queries – Queries for internal and external use
• Web – External Web Applications
10. Data Warehouse Team
• Terry – Data Architect/Modeler and
PM
• Hanaa – Lead Analyst and Data
Analyst
• Kevin – Lead Data Migration
Developer
• Lisa – Lead Report Analyst
• Les – Lead Report Developer
11. Current State
• Sybase Data Warehouse
– Combination of Normalized and
Dimensional design
• Data Migration
– Series of SQL Scripts that move data from
Legacy (Cobol) and Java Applications
• Impromptu
– 1000+ Reports
12. Target State
• SQL Server 2012
• SQL Server Integration Services for
Data Migration
• SQL Server Reporting Services for
Report Development
• SharePoint for Report Portal
13. Target Solution
• Initial load moves 2.5 Terabytes of data
• Initial load runs once
• Incremental load runs every hour
14. Target Solution
• Operational Data Store
– Normalized
– 400+ tables
• Data Warehouse
– Dimensional
– 60+ tables
• Why both?
– ODS does not have history (Just
Transactions)
15. Our #1 Challenge
• We needed to be Agile like the other projects!
– We are now on revision 3500
• We spent weeks planning how to be flexible
• Instead of spending that time planning the solution itself, we spent it planning how we could quickly change and adapt
• This also meant we created a new automated test framework
17. Beef?
• Where are the hot topics like
– Big Data
– NoSQL
– MySQL
– Data Warehouse Appliances
– Cloud
– Open Source Databases
18. Big Data
• “Commercial Databases” have come a long way in handling large data volumes
• Big Data is still important but probably is not required for the vast majority of databases
– But it is applicable for the Facebooks and Amazons out there
19. Big Data
• For example, many of the Big Data
solutions featured ColumnStore
Indexing
• Now almost all commercial databases
offer ColumnStore Indexes
20. NoSQL
• NoSQL was heralded a few years ago
as the death of structured databases
• Mainly promoted from the developer
community
• Seems to have found a niche supporting mainly unstructured and dynamic data
• Traditional databases still the most
efficient for structured data
21. MySQL
• MySQL was also promoted as a great lightweight, high-performance option
• We actually investigated it as an
option for the project
• Great example of never trusting what
you hear
22. MySQL
• All of the great MySQL benchmarks use
the simplest database engine with no
ACID compliance
– MySQL has the option to use different
engines with different features
• Once you use the ACID-compliant engine, the performance is equivalent to (or worse than) SQL Server and PostgreSQL
23. Data Warehouse Appliances
• “marketing term for an integrated set
of servers, storage, operating
system(s), DBMS and software
specifically pre-installed and pre-
optimized for data warehousing”
24. Data Warehouse Appliances
• Recently in the Data Warehouse industry, there has been the rise of the Data Warehouse appliance
• These appliances are a one-stop
solution that builds in Big Data
capabilities
25. Data Warehouse Appliances
• Cool Names like:
– Teradata
– GreenPlum
– Netezza
– InfoSphere
– EMC
• Like Big Data, these solutions are valuable if you need to play in the Big Data/Big Analysis arena
• Most solutions don’t require them
26. Cloud
• Great to store pictures and music – the
concept still makes businesses nervous
– Also regulatory requirements sometimes prevent it
• Business is starting to become more
comfortable
– Still a ways to go
• Very few businesses go to the Cloud unless they have to
– Amazon and Microsoft are changing this with their services
27. Open Source Databases
• We investigated Open Source databases for our solution. We looked at:
– MySQL
– PostgreSQL
– others
28. Open Source Databases
• We were surprised to learn that once you factor in all the things you get with SQL Server, it actually is cheaper over 10 years than Open Source!!
• So we selected SQL Server
30. Two design methods
• Relational
– “Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.”
31. Two design methods
• Dimensional
– “Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts.”
33. Relational
• Relational Analysis
– Database design is usually in Third Normal
Form
– Database is optimized for transaction
processing. (OLTP)
– Normalized tables are optimized for
modification rather than retrieval
34. Normal forms
• 1st - Under first normal form, all occurrences of a
record type must contain the same number of
fields.
• 2nd - Second normal form is violated when a non-
key field is a fact about a subset of a key. It is only
relevant when the key is composite
• 3rd - Third normal form is violated when a non-key
field is a fact about another non-key field
Source: William Kent - 1982
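As a minimal illustration (all table and column names below are hypothetical, not from the project), this sketch shows a third normal form violation and the split that fixes it:

  -- Violates 3rd normal form: ProviderCity is a fact about ProviderId,
  -- a non-key field, rather than about the key ClaimId
  CREATE TABLE ClaimDenormalized (
      ClaimId      INT PRIMARY KEY,
      ProviderId   INT,
      ProviderCity VARCHAR(50),
      ClaimAmount  DECIMAL(12,2)
  );

  -- Normalized: the provider attribute moves to its own table
  CREATE TABLE Provider (
      ProviderId   INT PRIMARY KEY,
      ProviderCity VARCHAR(50)
  );

  CREATE TABLE Claim (
      ClaimId     INT PRIMARY KEY,
      ProviderId  INT REFERENCES Provider (ProviderId),
      ClaimAmount DECIMAL(12,2)
  );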
36. Dimensional
• Dimensional Analysis
– Star Schema/Snowflake
– Database is optimized for analytical
processing. (OLAP)
– Facts and Dimensions optimized for
retrieval
• Facts – Business events – Transactions
• Dimensions – context for Transactions
– People
– Accounts
– Products
– Date
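To make this structure concrete, a minimal star schema sketch along these lines might look as follows (table and column names are hypothetical, not the project's actual model):

  -- Dimensions: context for transactions, optimized for retrieval
  CREATE TABLE DimDate (
      DateKey      INT PRIMARY KEY,   -- e.g. 20140410
      CalendarDate DATE,
      MonthName    VARCHAR(20),
      CalendarYear INT
  );

  CREATE TABLE DimAccount (
      AccountKey    INT IDENTITY PRIMARY KEY,  -- surrogate key
      AccountNumber VARCHAR(20),               -- natural key
      AccountType   VARCHAR(50)
  );

  CREATE TABLE DimPerson (
      PersonKey  INT IDENTITY PRIMARY KEY,
      PersonName VARCHAR(100)
  );

  -- Fact: one row per business transaction
  CREATE TABLE FactTransaction (
      DateKey    INT REFERENCES DimDate (DateKey),
      AccountKey INT REFERENCES DimAccount (AccountKey),
      PersonKey  INT REFERENCES DimPerson (PersonKey),
      Amount     DECIMAL(12,2)
  );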
37. Relational
• 3 Dimensions
• Spatial Model
– No historical components except for
transactional tables
• Relational – Models the one truth of
the data
– One account ‘11’
– One person ‘Terry Bunio’
– One transaction of ‘$100.00’ on April 10th
38. Dimensional
• 4 Dimensions
• Temporal Model
– All tables have a time component
• Dimensional – Models the one truth of
the data at a point in time
– Multiple versions of Accounts over time
– Multiple versions of people over time
– One transaction
• Transactions are already temporal
39. Fact Tables
• Contains the measurements or facts
about a business process
• Are thin and deep
• Usually is:
– Business transaction
– Business Event
• The grain of a Fact table is the level of
the data recorded
– Order, Invoice, Invoice Item
40. Special Fact Tables
• Degenerate Dimensions
– Degenerate Dimensions are Dimensions
that can typically provide additional
context about a Fact
• For example, flags that describe a transaction
• Degenerate Dimensions can either be
a separate Dimension table or be
collapsed onto the Fact table
– My preference is the latter
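For example, a handful of flags collapsed directly onto the Fact table (a hypothetical variant of the FactTransaction sketch above) rather than kept in a separate Dimension might look like:

  -- Degenerate/flag attributes carried on the Fact row itself
  CREATE TABLE FactClaimTransaction (
      DateKey      INT,
      AccountKey   INT,
      Amount       DECIMAL(12,2),
      IsReversal   BIT,        -- flag describing the transaction
      IsAdjustment BIT,        -- another descriptive flag
      SourceSystem CHAR(3)     -- small code collapsed onto the fact
  );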
41. Dimension Tables
• Unlike fact tables, dimension tables
contain descriptive attributes that are
typically textual fields
• These attributes are designed to serve
two critical purposes:
– query constraining and/or filtering
– query result set labeling.
Source: Wikipedia
42. Dimension Tables
• Shallow and Wide
• Usually corresponds to entities that the
business interacts with
– People
– Locations
– Products
– Accounts
43. Time Dimension
• All Dimensional Models need a time
component
• This is either a:
– Separate Time Dimension
(recommended)
– Time attributes on each Fact Table
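Either way, reporting ends up slicing facts by time; with a separate Date Dimension (reusing the hypothetical names from the earlier sketch) a typical query might be:

  -- Total transaction amount by year and month via the Date Dimension
  SELECT   d.CalendarYear,
           d.MonthName,
           SUM(f.Amount) AS TotalAmount
  FROM     FactTransaction f
  JOIN     DimDate        d ON d.DateKey = f.DateKey
  GROUP BY d.CalendarYear, d.MonthName;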
45. Mini-Dimensions
• Splitting a Dimension up based on how frequently a set of attributes changes
• Helps to reduce the growth of the
Dimension table
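A hypothetical sketch: a set of frequently changing member attributes split out of DimPerson into a mini-dimension, with the Fact carrying both keys so churn in those attributes does not bloat DimPerson:

  -- Mini-dimension for volatile attributes (hypothetical names)
  CREATE TABLE DimPersonProfile (
      ProfileKey   INT IDENTITY PRIMARY KEY,
      AgeBand      VARCHAR(10),   -- e.g. '30-39'
      CoverageTier VARCHAR(20)
  );

  -- The fact references the stable dimension and the mini-dimension separately
  CREATE TABLE FactClaim (
      PersonKey   INT,             -- stable DimPerson row
      ProfileKey  INT,             -- profile in effect at the time of the claim
      ClaimAmount DECIMAL(12,2)
  );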
46. Slowly Changing Dimensions
• Type 1 – Overwrite the row with the
new values and update the effective
date
– Pre-existing Facts now refer to the
updated Dimension
– May cause inconsistent reports
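A minimal Type 1 sketch, assuming a hypothetical DimPerson with a natural key and effective date column; the existing row is simply overwritten, so history is lost:

  -- Type 1: overwrite in place (the prior value is gone)
  UPDATE DimPerson
  SET    PersonName    = 'Terry B.',     -- new value replaces the old one
         EffectiveDate = '2014-04-10'
  WHERE  PersonNaturalId = 10123;        -- natural/business key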
47. Slowly Changing Dimensions
• Type 2 – Insert a new Dimension row with
the new data and new effective date
– Update the expiry date on the prior row
• Don’t update old Facts that refer to the old
row
– Only new Facts will refer to this new Dimension
row
• Type 2 Slowly Changing Dimension
maintains the historical context of the data
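A minimal Type 2 sketch under the same hypothetical naming: expire the current row, then insert a new version that only new Facts will reference:

  -- 1. Expire the current version of the dimension row
  UPDATE DimPerson
  SET    ExpiryDate = '2014-04-09'
  WHERE  PersonNaturalId = 10123
    AND  ExpiryDate = '9999-12-31';      -- the open-ended "current" row

  -- 2. Insert the new version with a new surrogate key
  INSERT INTO DimPerson (PersonNaturalId, PersonName, EffectiveDate, ExpiryDate)
  VALUES (10123, 'Terry B.', '2014-04-10', '9999-12-31');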
48. Slowly Changing Dimensions
• No longer do I have one row to represent:
– Account 10123
– Terry Bunio
– Sales Representative 11092
• This changes the mindset and query
syntax to retrieve data
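For example, with Type 2 rows in place, even the question "what does Account 10123 look like today?" needs a date filter, because the natural key now matches several rows (hypothetical names again):

  -- Pick the version of the account that is in effect right now
  SELECT *
  FROM   DimAccount
  WHERE  AccountNumber = '10123'
    AND  GETDATE() BETWEEN EffectiveDate AND ExpiryDate;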
49. Slowly Changing Dimensions
• Type 3 – The Dimension stores multiple
versions for the attribute in question
• This usually involves a current and
previous value for the attribute
• When a change occurs, no rows are
added but both the current and
previous attributes are updated
• Like Type 1, Type 3 does not retain full
historical context
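A minimal Type 3 sketch (the Previous column is a hypothetical addition): the row keeps a current and a previous value, and a change just shuffles them with no new row:

  -- Type 3: demote the current value, then apply the new one
  UPDATE DimPerson
  SET    PreviousPersonName = PersonName,   -- only one prior version is kept
         PersonName         = 'Terry B.'
  WHERE  PersonNaturalId = 10123;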
50. Complexity
• Most textbooks stop here and only show the simplest Dimensional Models
• Unfortunately, I’ve never run into a
Dimensional Model like that
55. Snowflake vs Star Schema
• These extra tables are termed
outriggers
• They are used to address real world
complexities with the data
– Excessive row length
– Repeating groups of data within the
Dimension
• I will use outriggers in a limited way for
repeating data
56. Multi-Valued Dimensions
• Multi-Valued Dimensions are when a
Fact needs to connect more than
once to a Dimension
– Primary Sales Representative
– Secondary Sales Representative
57. Multi-Valued Dimensions
• Two possible solutions
– Create copies of the Dimensions for each
role
– Create a Bridge table to resolve the many
to many relationship
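The first option is sketched below with hypothetical names: the Fact carries one key per role against the same Dimension (physical copies or role-named views can make reporting friendlier):

  CREATE TABLE DimSalesRep (
      SalesRepKey INT IDENTITY PRIMARY KEY,
      RepName     VARCHAR(100)
  );

  -- One foreign key column per role, both pointing at the same dimension
  CREATE TABLE FactSale (
      PrimaryRepKey   INT REFERENCES DimSalesRep (SalesRepKey),
      SecondaryRepKey INT REFERENCES DimSalesRep (SalesRepKey),
      Amount          DECIMAL(12,2)
  );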
60. Bridge Tables
• Bridge Tables can be used to resolve any
many to many relationships
• This is frequently required with more
complex data areas
• These bridge tables need to be
considered a Dimension and they need
to use the same Slowly Changing
Dimension Design as the base Dimension
– My Recommendation
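The second option, a bridge table, might be sketched as follows (hypothetical names; note it carries the same effective/expiry dates as a Type 2 Dimension, per the recommendation above):

  -- Bridge resolving the many-to-many between sales facts and reps
  CREATE TABLE BridgeSaleSalesRep (
      SaleKey       INT,            -- fact (or group) key
      SalesRepKey   INT,            -- dimension key
      RepRole       VARCHAR(20),    -- e.g. 'Primary', 'Secondary'
      EffectiveDate DATE,
      ExpiryDate    DATE
  );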
62. Why?
• Why Dimensional Model?
• Allows for a concise representation of
data for reporting. This is especially
important for Self-Service Reporting
– We reduced from 400+ tables in our
Operational Data Store to 60+ tables in
our Data Warehouse
– Aligns with real world business concepts
63. Why?
• The most important reason –
– Requires detailed understanding of the
data
– Validates the solution
– Uncovers inconsistencies and errors in the
Normalized Model
• Easy for inconsistencies and errors to hide in
400+ tables
• No place to hide when those tables are
reduced down
64. Why?
• Ultimately there must be a business
requirement for a temporal data
model and not just a spatial one.
• Although you could go through the
exercise to validate your
understanding and not implement the
Dimensional Data Model
65. How?
• Start with your simplest Dimension and Fact
tables and define the Natural Keys for them
– i.e. People, Product, Transaction, Time
• De-Normalize Reference tables to Dimensions
(And possibly Facts based on how large the
Fact tables will be)
– I place both codes and descriptions on the
Dimension and Fact tables
• Look to De-normalize other tables with the
same Cardinality into one Dimension
– Validate the Natural Keys still define one row
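One simple way to validate that the Natural Key still defines exactly one row after de-normalizing (hypothetical names); any rows returned mean the key is no longer unique:

  SELECT   AccountNumber, EffectiveDate, COUNT(*) AS DuplicateCount
  FROM     DimAccount
  GROUP BY AccountNumber, EffectiveDate
  HAVING   COUNT(*) > 1;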
66. How?
• Don’t force entities on the same
Dimension
– Tempting but you will find it doesn’t
represent the data and will cause issues
for loading or retrieval
– Bridge table or mini-snowflakes are not
bad
• I don’t like a deep snowflake, but shallow
snowflakes can be appropriate
• Don’t fall into the Star-Schema/Snowflake Holy
War – Let your data define the solution
67. How?
• Iterate, Iterate, Iterate
– Your initial solution will be wrong
– Create it and start to define the load
process and reports
– You will learn more by using the data than
months of analysis to try and get the
model right
69. Two things to Ponder
• In the Information Age ahead, databases will be used more for analysis than for operational processing
– More Dimensional Models and analytical processes
70. Two things to Ponder
• Critical skills going forward will be:
– Data Modeling/Data Architecture
– Data Migration
• There is a whole subject area here for a
subsequent presentation. More of an art than
science
– Data Verbalization
• Again a real art form to take a huge amount
of data and present it in a readable form