This document discusses key concepts in data warehousing and modeling. It describes a multitier architecture for data warehousing consisting of a bottom tier warehouse database, middle tier OLAP server, and top tier front-end client tools. It also discusses different data warehouse models including enterprise warehouses, data marts, and virtual warehouses. The document outlines the extraction, transformation, and loading process used to populate data warehouses and the role of metadata repositories.
Unit-IV-Introduction to Data Warehousing.pptx (Harsha Patel)
Data warehousing combines data from multiple sources to ensure data quality and accuracy. It separates analytics processing from transactional databases. A data warehouse stores historical data and allows fast querying of all data, using OLAP, while a database stores current transactions for online processing using OLTP. A multidimensional data model organizes data into cubes with dimensions and facts to allow analyzing data from different perspectives. Key components of a data warehouse architecture include external data sources, a staging area using ETL, the data warehouse, and data marts containing subsets of warehouse data.
UNIT 2 DATA WAREHOUSING AND DATA MINING PRESENTATION.pptx (shruthisweety4)
The document discusses data warehousing and data warehouse architectures. It defines a data warehouse as a system that aggregates data from different sources into a consistent data store to support analysis and machine learning on huge volumes of historical data. It describes three common types of data warehouses and characteristics like being subject-oriented, integrated, and time-variant. It then outlines common data warehouse architectures including single tier, two tier, and three tier architectures and discusses components like the source layer, data staging, data warehouse layer, and analysis layer. Finally, it discusses properties of data warehouse architectures like separation of analytical and transactional processing and scalability.
Implementation of Data Marts in Data ware house (IJARIIT)
A data mart is a persistent physical store of operational, aggregated, and statistically processed data that supports businesspeople in making decisions based primarily on analyses of past activities and results. A data mart contains a predefined subset of enterprise data organized for rapid analysis and reporting. Data warehousing came into being because the file structure of large mainframe core business systems is inimical to information retrieval. The purpose of the data warehouse is to combine core business data and data from other sources in a format that facilitates reporting and decision support. In just a few years, data warehouses have evolved from large, centralized data repositories to subject-specific but independent data marts, and now to dependent marts that load data from a central repository of data staging files previously extracted from the institution’s operational business systems (e.g., student record, finance, and human resource systems).
Operational database systems are designed to support transaction processing while data warehouses are designed to support analytical processing and report generation. Operational systems focus on business processes, contain current data, and are optimized for fast updates. Data warehouses are subject-oriented, contain historical data that is rarely changed, and are optimized for fast data retrieval. The three main components of a data warehouse architecture are the database server, OLAP server, and client tools. Data is extracted from operational systems, transformed, cleansed, and loaded into fact and dimension tables in the data warehouse using the ETL process. Multidimensional schemas like star, snowflake, and constellation organize this data. Common OLAP operations performed on the data include roll-up,
The document provides an overview of data warehousing. It defines a data warehouse as a repository of information gathered from multiple sources and organized under a unified schema for analysis and reporting. It describes the typical architecture of a data warehouse including data sources, extraction/transformation/loading, the data repository, reporting tools, and metadata. It also covers dimensional modeling, normalization, advantages like increased access and consistency, and concerns around extraction/loading time and compatibility.
A data warehouse is a central repository of historical data from an organization's various sources designed for analysis and reporting. It contains integrated data from multiple systems optimized for querying and analysis rather than transactions. Data is extracted, cleaned, and loaded from operational sources into the data warehouse periodically. The data warehouse uses a dimensional model to organize data into facts and dimensions for intuitive analysis and is optimized for reporting rather than transaction processing like operational databases. Data warehousing emerged to meet the growing demand for analysis that operational systems could not support due to impacts on performance and limitations in reporting capabilities.
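The extract, clean, and load cycle described above can be sketched in a few lines. This is a minimal illustration, not any particular tool's pipeline: the source rows, the `sales_history` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows as they might arrive from an operational system.
SOURCE_ROWS = [
    {"order_id": 1, "customer": " Alice ", "amount": "120.50", "order_date": "2024-01-05"},
    {"order_id": 2, "customer": "BOB", "amount": "80", "order_date": "2024-01-06"},
    {"order_id": 3, "customer": "", "amount": "n/a", "order_date": "2024-01-06"},  # dirty row
]


def extract():
    """Extract: return raw records from the (simulated) operational source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: normalize names, parse amounts, drop rows that fail cleaning."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # unparseable amount: reject the row
        if not name:
            continue  # missing customer: reject the row
        cleaned.append((row["order_id"], name, amount, row["order_date"]))
    return cleaned


def load(conn, rows):
    """Load: add cleaned rows to the warehouse table, skipping known order ids."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_history ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT OR IGNORE INTO sales_history VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run one periodic ETL cycle and return the warehouse contents."""
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT order_id, customer, amount FROM sales_history ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

Because the load step ignores order ids that are already present, running the cycle again on the same source data leaves the warehouse unchanged, which suits periodic loading.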
The document discusses databases versus data warehousing. It notes that databases are for operational purposes like storage and retrieval for applications, while data warehouses are used for informational purposes like business reporting and analysis. A data warehouse contains integrated, subject-oriented data from multiple sources that is used to support management decisions.
This document provides an overview of data warehousing. It defines data warehousing as collecting data from multiple sources into a central repository for analysis and decision making. The document outlines the history of data warehousing and describes its key characteristics like being subject-oriented, integrated, and time-variant. It also discusses the architecture of a data warehouse including sources, transformation, storage, and reporting layers. The document compares data warehousing to traditional DBMS and explains how data warehouses are better suited for analysis versus transaction processing.
- A data warehouse is a central repository for an organization's historical data that is used to support management reporting and decision making. It contains data from multiple sources integrated into a consistent structure.
- Data warehouses are optimized for querying and analysis rather than transactions. They use a dimensional model and denormalized structures to improve query performance for business users.
- There are two main approaches to data warehouse design - the dimensional model advocated by Kimball and the normalized model advocated by Inmon. Both have advantages and disadvantages for query performance and ease of use.
A data warehouse consists of several key components:
- Current detail data from operational systems of record which is stored for analysis.
- Integration and transformation programs that convert operational data into a common format for the data warehouse.
- Summarized and archived data used for reporting and analysis over time.
- Metadata that describes the structure and meaning of the data.
Data warehouses are used for standard reporting, queries on summarized data, and data mining of patterns in large datasets to gain business insights.
This document is about Data Warehouse Tools such as:
- OLAP (Online Analytical Processing)
- OLTP (Online Transaction Processing)
- Business Intelligence
- Driving Force
- Data Mart
- Metadata
This document discusses a student's assignment submission on the topics of data warehousing and data mining. It provides definitions and explanations of key concepts related to data warehousing such as the three layers of a data warehouse (staging, integration, access), ETL processes, dimensional vs normalized data storage approaches, and top-down vs bottom-up design methodologies. For data mining, it outlines the typical processes of pre-processing, data mining tasks like classification and clustering, and results validation. Sample applications are also listed for both data warehousing and data mining.
The document discusses data warehousing concepts and technologies. It defines a data warehouse as a subject-oriented, integrated, time-variant, and non-volatile collection of data used for decision making. Key aspects covered include multidimensional data modeling using facts, dimensions, and cubes; data warehouse architectures; and efficient cube computation methods such as ROLAP-based algorithms.
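To make the cube idea concrete, the sketch below computes every group-by (cuboid) of a tiny fact table by brute force. It is a naive illustration under assumed data, not the ROLAP-based algorithms the document refers to; the dimension names and fact rows are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (city, product, year, sales amount).
DIMENSIONS = ("city", "product", "year")
FACTS = [
    ("Pune", "laptop", 2023, 10.0),
    ("Pune", "phone", 2023, 5.0),
    ("Delhi", "laptop", 2023, 7.0),
    ("Delhi", "laptop", 2024, 3.0),
]


def compute_cube(facts, dimensions):
    """Return {kept dimensions: {dimension values: summed measure}} for every cuboid."""
    cube = {}
    indexes = range(len(dimensions))
    for size in range(len(dimensions) + 1):
        for kept in combinations(indexes, size):
            totals = defaultdict(float)
            for row in facts:
                key = tuple(row[i] for i in kept)
                totals[key] += row[-1]  # the measure is the last column
            cube[tuple(dimensions[i] for i in kept)] = dict(totals)
    return cube


cube = compute_cube(FACTS, DIMENSIONS)

if __name__ == "__main__":
    print(cube[("city",)])  # sales rolled up to city level
    print(cube[()])         # grand total (apex cuboid)
```

With three dimensions there are 2^3 = 8 cuboids. Rolling up corresponds to moving from a cuboid such as `("city", "year")` to a coarser one such as `("city",)`. Practical cube algorithms avoid rescanning the facts for every cuboid, which this sketch does not attempt.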
The document discusses two common data warehouse architectures: independent data marts and a three-layer approach. With independent data marts, data is extracted from source systems into separate data marts, each with their own ETL process. This can result in redundant work and inconsistent data across marts. The three-layer approach includes an enterprise data warehouse, operational data store, and dependent data marts filled from the warehouse, allowing for consistent, consolidated data and easier analysis across subjects.
A data mart is a smaller subset of data from a data warehouse that is tailored to a specific business unit or function. It provides faster access to relevant data than searching an entire data warehouse. There are three main types of data marts - dependent, which get data from a data warehouse; independent, which access data directly from sources; and hybrid, which integrate multiple data sources. Data marts use either a star or snowflake schema to logically structure the data in dimension and fact tables for analysis. Implementing a data mart involves designing it, constructing the logical and physical structures, transferring data using ETL tools, configuring access, and ongoing management.
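A star schema for a small sales data mart might look like the following sketch, with one fact table referencing two dimension tables. All table names, column names, and rows are hypothetical, and an in-memory SQLite database is assumed purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The fact table holds measures plus a foreign key to each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER,
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Computers'), (2, 'Mouse', 'Accessories');
INSERT INTO dim_date VALUES (10, 2024, 1), (11, 2024, 2);
INSERT INTO fact_sales VALUES (1, 10, 2, 2000.0), (2, 10, 5, 100.0), (1, 11, 1, 1000.0);
""")


def revenue_by_category(conn):
    """Join the fact table to one dimension and aggregate a measure."""
    return conn.execute("""
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


if __name__ == "__main__":
    print(revenue_by_category(conn))
```

In a snowflake variant of the same mart, `category` would move out of `dim_product` into its own table, trading an extra join for less redundancy in the dimension.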
1. The document discusses data warehousing and data mining. Data warehousing involves collecting and integrating data from multiple sources to support analysis and decision making. Data mining involves analyzing large datasets to discover patterns.
2. Web mining is discussed as a type of data mining that analyzes web data. There are three domains of web mining: web content mining, web structure mining, and web usage mining. Common techniques for web mining include clustering, association rules, path analysis, and sequential patterns.
3. Web mining has benefits like addressing ineffective search engines and monitoring user visit habits to improve website design. Data warehousing and data mining can provide useful business intelligence when the right analysis techniques are applied to large amounts of integrated
This document provides an overview of data warehousing concepts. It defines a data warehouse as a collection of data marts representing historical data from different company operations. It discusses the top-down and bottom-up approaches to building a data warehouse, as well as considerations for data warehouse design including data content, metadata, data distribution, and tools. Finally, it briefly describes different architectures for mapping a data warehouse to a multiprocessor system, including shared memory, shared disk, and shared nothing architectures.
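As a rough illustration of the shared-nothing idea, the sketch below hash-partitions rows across nodes that each own their data exclusively, aggregates locally, and then merges the results. The node count, the row layout, and the use of CRC32 as the partitioning hash are assumptions made for this example only.

```python
from collections import defaultdict
import zlib

NUM_NODES = 3  # hypothetical cluster size


def node_for(key, num_nodes=NUM_NODES):
    """Pick the owning node with a stable hash of the partitioning key."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def partition(rows, num_nodes=NUM_NODES):
    """Distribute (customer, amount) rows so each customer lives on one node."""
    nodes = [[] for _ in range(num_nodes)]
    for customer, amount in rows:
        nodes[node_for(customer, num_nodes)].append((customer, amount))
    return nodes


def local_totals(node_rows):
    """Aggregate using only the data held by a single node."""
    totals = defaultdict(float)
    for customer, amount in node_rows:
        totals[customer] += amount
    return dict(totals)


def global_totals(nodes):
    """Merge per-node results; keys never overlap because of the partitioning."""
    merged = {}
    for node_rows in nodes:
        merged.update(local_totals(node_rows))
    return merged


if __name__ == "__main__":
    rows = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 1.0)]
    print(global_totals(partition(rows)))
```

Because every customer is owned by exactly one node, the per-node aggregates can be combined without any coordination between nodes. In shared-memory and shared-disk designs the processors would instead read from a common store.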
Top 60+ Data Warehouse Interview Questions and Answers.pdf (Datacademy.ai)
This is a comprehensive guide to the most frequently asked data warehouse interview questions and answers. It covers a wide range of topics including data warehousing concepts, ETL processes, dimensional modeling, data storage, and more. The guide aims to assist job seekers, students, and professionals in preparing for data warehouse job interviews and exams.
The document discusses emerging trends in database systems, including data warehousing and data mining. It provides definitions and characteristics of data warehousing, including that it involves centralizing organizational data in a central repository for analysis. It contrasts operational and transactional databases with data warehouses, noting that data warehouses are designed for analysis rather than transactions and contain historical, aggregated data. It also discusses data marts, ETL processes, and the components of a typical data warehouse architecture.
The document discusses emerging trends in database systems, including data warehousing and data mining. It provides definitions and characteristics of data warehousing, including that it involves centralizing organizational data in a central repository for analysis. It contrasts operational and transactional databases with data warehouses, noting that data warehouses are designed for analysis rather than transactions and contain historical, summarized data. It also discusses data marts, ETL processes, and the components of a typical data warehouse architecture.
The document discusses key concepts related to data warehousing including:
1) What data warehousing is, its main components, and differences from OLTP systems.
2) The typical architecture of a data warehouse including operational data sources, storage, and end-user access tools.
3) Important considerations like data flows, integration, management of metadata, and tools/technologies used.
4) Additional topics such as benefits, challenges, administration, and data marts.
Data Warehouse – Introduction, characteristics, architecture, schema and modelling, and differences between operational database systems and data warehouses.
A data warehouse is defined as "a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process."
The document provides information about data warehousing concepts. It defines a data warehouse as a relational database designed for query and analysis rather than transactions. It contains historical data from various sources and separates analysis from transaction workloads. The goals of a data warehouse are to provide a single source of integrated information, give users direct access to data without relying on IT, and allow predictive modeling. Factors like significant user requests for related historical data and advanced decision support needs should be considered when implementing a data warehouse.
This presentation contains definitions of data warehouse, data, and warehouse, and covers data modeling, data warehouse architecture and its types (single-tier, two-tier, and three-tier), and data warehouse types.
1. Storage challenges - The exponentially growing volumes of data can overwhelm traditional storage systems and databases.
2. Processing challenges - Analyzing large and diverse datasets in a timely manner requires massively parallel processing across thousands of CPU cores.
3. Skill challenges - There is a shortage of data scientists and engineers with the skills needed to unlock insights from big data. Traditional IT skills are insufficient.
International Conference on NLP, Artificial Intelligence, Machine Learning an... (gerogepatton)
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
ACEP Magazine edition 4th launched on 05.06.2024 (Rahul)
This document provides information about the third edition of the magazine "Sthapatya" published by the Association of Civil Engineers (Practicing) Aurangabad. It includes messages from current and past presidents of ACEP, memories and photos from past ACEP events, information on life time achievement awards given by ACEP, and a technical article on concrete maintenance, repairs and strengthening. The document highlights activities of ACEP and provides a technical educational article for members.
6th International Conference on Machine Learning & Applications (CMLA 2024) (ClaraZara1)
The 6th International Conference on Machine Learning & Applications (CMLA 2024) will provide an excellent international forum for sharing knowledge and results in the theory, methodology, and applications of Machine Learning & Applications.
Low power architecture of logic gates using adiabatic techniques (nooriasukmaningtyas)
The growing need for portable systems to limit power consumption in very high density ultra-large-scale-integration chips has recently led to rapid and inventive progress in low-power design. The most effective technique for energy-efficient hardware is adiabatic logic circuit design. This paper presents two adiabatic approaches for the design of low-power circuits: modified positive feedback adiabatic logic (modified PFAL) and direct current diode based positive feedback adiabatic logic (DC-DB PFAL). Logic gates are the basic components of any digital circuit design. By improving the performance of basic gates, one can improve the performance of the whole system. The paper presents low-power designs of OR/NOR, AND/NAND, and XOR/XNOR gates using these approaches. The results are analyzed for power dissipation, delay, power-delay product, and rise time, and compared with other adiabatic techniques and with the conventional complementary metal oxide semiconductor (CMOS) designs reported in the literature. The designs using the DC-DB PFAL technique outperform the modified PFAL designs at 10 MHz, with improvements of 65% for the NOR gate, 7% for the NAND gate, and 34% for the XNOR gate.
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...University of Maribor
Slides from talk presenting:
Aleš Zamuda: Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapter and Networking.
Presentation at IcETRAN 2024 session:
"Inter-Society Networking Panel GRSS/MTT-S/CIS
Panel Session: Promoting Connection and Cooperation"
IEEE Slovenia GRSS
IEEE Serbia and Montenegro MTT-S
IEEE Slovenia CIS
11TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONIC AND COMPUTING ENGINEERING
3-6 June 2024, Niš, Serbia
Presentation of IEEE Slovenia CIS (Computational Intelligence Society) Chapte...
DMDW 1st module.pdf
1. VTUPulse.com
18CS641
Dept. of CSE, ATMECE, Mysuru
Module-1
DATA WAREHOUSING & MODELLING
1.1. Introduction
1.2. Data Warehousing: A multitier Architecture
1.3. Data warehouse models: Enterprise warehouse, Data mart and virtual warehouse
1.4. Extraction, Transformation and loading
1.5. Data Cube: A multidimensional data model, Stars, Snowflakes
1.6. Fact constellations: Schemas for multidimensional Data models
1.7. Dimensions: The role of concept Hierarchies
1.8. Measures: Their Categorization and computation
1.9. Typical OLAP Operations
1.10. Outcome
1.11. Important Questions
1.1 Introduction
Data warehouses generalize and consolidate data in multidimensional space. The construction of data
warehouses involves data cleaning, data integration, and data transformation, and can be viewed as an
important preprocessing step for data mining. Moreover, data warehouses provide online analytical
processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities,
which facilitates effective data generalization and data mining.
Many other data mining functions, such as association, classification, prediction, and clustering, can be
integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of
abstraction. Hence, the data warehouse has become an increasingly important platform for data analysis
and OLAP and will provide an effective platform for data mining. Therefore, data warehousing and OLAP
form an essential step in the knowledge discovery process.
1.2 Data Warehouse: Basic Concepts
What Is a Data Warehouse?
Data warehousing provides architectures and tools for business executives to systematically organize,
understand, and use their data to make strategic decisions. Data warehouse systems are valuable tools in
today’s competitive, fast-evolving world. In the last several years, many firms have spent millions of
dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in
every industry, data warehousing is the latest must-have marketing weapon—a way to retain customers by
learning more about their needs.
A data warehouse refers to a data repository that is maintained separately from an organization’s
operational databases. Data warehouse systems allow for integration of a variety of application systems.
They support information processing by providing a solid platform of consolidated historic data for
analysis.
According to William H. Inmon, a leading architect in the construction of data warehouse systems, “A
data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support
of management’s decision making process”.
Key features:
1. Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example,
"sales" can be a particular subject.
2. Integrated: A data warehouse integrates data from multiple data sources. For example, source A
and source B may have different ways of identifying a product, but in a data warehouse, there will be only
a single way of identifying a product.
3. Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from
3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a
transactions system, where often only the most recent data is kept. For example, a transaction system may
hold the most recent address of a customer, whereas a data warehouse can hold all addresses associated with
a customer.
4. Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data
warehouse should never be altered.
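The time-variant and non-volatile properties can be illustrated with a minimal Python sketch. The customer record, the addresses, and the helper address_as_of are hypothetical and exist only for illustration: the transaction-style record is overwritten in place, while the warehouse-style history only ever gains rows.

```python
from datetime import date

# Transaction-system style: an update overwrites the previous value.
oltp_customer = {"customer_id": 1, "address": "12 Oak St"}
oltp_customer["address"] = "98 Pine Ave"  # the old address is no longer available

# Warehouse style: every address is kept with the date it became valid,
# and rows that are already loaded are never modified (non-volatile).
warehouse_addresses = [
    (1, "12 Oak St", date(2022, 1, 10)),
    (1, "98 Pine Ave", date(2023, 6, 1)),
]

def address_as_of(history, customer_id, as_of):
    """Return the latest address recorded on or before `as_of`, or None."""
    candidates = [
        (valid_from, address)
        for cid, address, valid_from in history
        if cid == customer_id and valid_from <= as_of
    ]
    return max(candidates)[1] if candidates else None
```

Because the history is kept, a query such as address_as_of(warehouse_addresses, 1, date(2022, 12, 31)) can still be answered after the customer has moved.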
Differences between Operational Database Systems and Data Warehouses
The major task of online operational database systems is to perform online transaction and query
processing. These systems are called online transaction processing (OLTP) systems. They cover most
of the day-to-day operations of an organization such as purchasing, inventory, manufacturing, banking,
payroll, registration, and accounting.
Data warehouse systems, on the other hand, serve users or knowledge workers in the role of data analysis
and decision making. Such systems can organize and present data in various formats in order to
accommodate the diverse needs of different users. These systems are known as online analytical
processing (OLAP) systems. The major distinguishing features of OLTP and OLAP are summarized as
follows:
Data Warehousing: A Multitiered Architecture
Tier-1:
The bottom tier is a warehouse database server that is almost always a relational database system. Back-
end tools and utilities are used to feed data into the bottom tier from operational databases or other
external sources (such as customer profile information provided by external consultants). These tools and
utilities perform data extraction, cleaning, and transformation (e.g., to merge similar data from different
sources into a unified format), as well as load and refresh functions to update the data warehouse. The
data are extracted using application program interfaces known as gateways. A gateway is supported by the
underlying DBMS and allows client programs to generate SQL code to be executed at a server.
Examples of gateways include ODBC (Open Database Connectivity) and OLE DB (Object Linking and
Embedding, Database) by Microsoft, and JDBC (Java Database Connectivity). This tier also contains a
metadata repository, which stores information about the data warehouse and its contents.
Tier-2:
The middle tier is an OLAP server that is typically implemented using either a relational OLAP (ROLAP)
model or a multidimensional OLAP (MOLAP) model.
A relational OLAP (ROLAP) model is an extended relational DBMS that maps operations on
multidimensional data to standard relational operations.
A multidimensional OLAP (MOLAP) model is a special-purpose server that directly
implements multidimensional data and operations.
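As a rough illustration of the ROLAP idea, the sketch below expresses a multidimensional operation (rolling sales up from city to country) as an ordinary relational GROUP BY. It uses Python's built-in sqlite3 module as a stand-in for the relational back end; the sales table, its columns, and its values are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (city TEXT, country TEXT, quarter TEXT, dollars_sold REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Vancouver", "Canada", "Q1", 100.0),
        ("Toronto", "Canada", "Q1", 150.0),
        ("Chicago", "USA", "Q1", 200.0),
        ("Chicago", "USA", "Q2", 50.0),
    ],
)

def rollup_to_country(connection):
    """Roll up from city to country, expressed as a standard GROUP BY."""
    rows = connection.execute(
        "SELECT country, quarter, SUM(dollars_sold) FROM sales "
        "GROUP BY country, quarter ORDER BY country, quarter"
    )
    return rows.fetchall()
```

A MOLAP server would instead keep the aggregated cells in a multidimensional array structure rather than translating each request into SQL.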
Tier-3:
The top tier is a front-end client layer, which contains query and reporting tools, analysis
tools, and/or data mining tools (e.g., trend analysis, prediction, and so on).
1.3 Data Warehouse Models
There are three data warehouse models.
1. Enterprise warehouse:
An enterprise warehouse collects all of the information about subjects spanning the entire
organization.
It provides corporate-wide data integration, usually from one or more operational systems or
external information providers, and is cross-functional in scope.
It typically contains detailed data as well as summarized data, and can range in size from a few
gigabytes to hundreds of gigabytes, terabytes, or beyond.
An enterprise data warehouse may be implemented on traditional mainframes, computer super
servers, or parallel architecture platforms. It requires extensive business modeling and may take years to
design and build.
2. Data mart:
A data mart contains a subset of corporate-wide data that is of value to a specific group of users.
The scope is confined to specific selected subjects. For example, a marketing data mart may confine its
subjects to customer, item, and sales. The data contained in data marts tend to be summarized.
Data marts are usually implemented on low-cost departmental servers that are UNIX/Linux- or
Windows-based. The implementation cycle of a data mart is more likely to be measured in weeks rather
than months or years. However, it may involve complex integration in the long run if its design and
planning were not enterprise-wide.
Depending on the source of data, data marts can be categorized as independent or dependent. Independent
data marts are sourced from data captured from one or more operational systems or external information
providers, or from data generated locally within a particular department or geographic area. Dependent
data marts are sourced directly from enterprise data warehouses.
3. Virtual warehouse:
A virtual warehouse is a set of views over operational databases. For efficient query processing,
only some of the possible summary views may be materialized.
A virtual warehouse is easy to build but requires excess capacity on operational database
servers.
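A virtual warehouse can be sketched as nothing more than a view defined over an operational table. The example below is a minimal illustration using Python's built-in sqlite3 module; the orders table, the view name sales_by_item, and the sample rows are hypothetical.

```python
import sqlite3

ops = sqlite3.connect(":memory:")  # stands in for an operational database
ops.execute("CREATE TABLE orders (order_id INTEGER, item TEXT, amount REAL)")
ops.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "TV", 500.0), (2, "TV", 450.0), (3, "Phone", 300.0)],
)

# The "warehouse" is only a view definition: no data is copied.
ops.execute(
    "CREATE VIEW sales_by_item AS "
    "SELECT item, SUM(amount) AS total FROM orders GROUP BY item"
)

def sales_by_item(connection):
    """Query the view; the aggregation runs on the operational server each time."""
    return dict(connection.execute("SELECT item, total FROM sales_by_item"))
```

Every query against the view is evaluated against the operational table itself, which is why this approach is easy to build but consumes spare capacity on the operational server.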
1.4 Extraction, Transformation, and Loading
Data warehouse systems use back-end tools and utilities to populate and refresh their data.
These tools and utilities include the following functions:
Data extraction, which typically gathers data from multiple, heterogeneous, and external
sources.
Data cleaning, which detects errors in the data and rectifies them when possible.
Data transformation, which converts data from legacy or host format to warehouse format.
Load, which sorts, summarizes, consolidates, computes views, checks integrity, and builds
indices and partitions.
Refresh, which propagates the updates from the data sources to the warehouse.
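The functions listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the two in-memory sources, their field names, and the cleaning rule (drop rows with a missing amount) are invented for the example, sqlite3 stands in for the warehouse database, and the refresh step is not shown.

```python
import sqlite3

# Hypothetical rows from two sources that format the same facts differently.
source_a = [{"product": "tv ", "amount": "500"}, {"product": "Phone", "amount": "300"}]
source_b = [{"item_name": "TV", "sales": 450.0}, {"item_name": "phone", "sales": None}]

def extract():
    """Extraction: gather rows from heterogeneous sources."""
    for row in source_a:
        yield row["product"], row["amount"]
    for row in source_b:
        yield row["item_name"], row["sales"]

def clean(rows):
    """Cleaning: drop rows whose amount is missing."""
    return [(name, amount) for name, amount in rows if amount is not None]

def transform(rows):
    """Transformation: unify product names and numeric types."""
    return [(name.strip().lower(), float(amount)) for name, amount in rows]

def load(rows, connection):
    """Load: store the rows and build an index."""
    connection.execute("CREATE TABLE sales (product TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    connection.execute("CREATE INDEX idx_sales_product ON sales (product)")
    connection.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(clean(extract())), warehouse)
totals = dict(
    warehouse.execute("SELECT product, SUM(amount) FROM sales GROUP BY product")
)
```

After loading, the two spellings of each product have been merged into a single identifier, which is the "integrated" property in miniature.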
Meta Data Repository:
Metadata are data about data. When used in a data warehouse, metadata are the data that define warehouse
objects. Metadata are created for the data names and definitions of the given warehouse. Additional
metadata are created and captured for time stamping any extracted data, the source of the extracted data,
and missing fields that have been added by data cleaning or integration processes.
A metadata repository should contain the following:
A description of the structure of the data warehouse, which includes the warehouse schema,
view, dimensions, hierarchies, and derived data definitions, as well as data mart locations and contents.
Operational metadata, which include data lineage (history of migrated data and the sequence of
transformations applied to it), currency of data (active, archived, or purged), and monitoring information
(warehouse usage statistics, error reports, and audit trails).
The algorithms used for summarization, which include measure and dimension definition
algorithms, data on granularity, partitions, subject areas, aggregation, summarization, and predefined
queries and reports.
The mapping from the operational environment to the data warehouse, which includes source
databases and their contents, gateway descriptions, data partitions, data extraction, cleaning,
transformation rules and defaults, data refresh and purging rules, and security (user authorization and
access control).
Data related to system performance, which include indices and profiles that improve data
access and retrieval performance, in addition to rules for the timing and scheduling of refresh, update, and
replication cycles.
Business metadata, which include business terms and definitions, data ownership information,
and charging policies.
1.5 Data Warehouse Modeling: Data Cube and OLAP
Data warehouses and OLAP tools are based on a multidimensional data model. This model views data in
the form of a data cube.
Data Cube: A Multidimensional Data Model
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.
Dimension tables, such as item (item_name, brand, type) or time (day, week, month,
quarter, year), describe each dimension.
The fact table contains measures (such as dollars_sold) and keys to each of the related
dimension tables.
In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D
cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids
forms a data cube.
Given a set of dimensions, we can generate a cuboid for each of the possible subsets of the given
dimensions. The result would form a lattice of cuboids, each showing the data at a different level of
summarization, or group-by. The lattice of cuboids is then referred to as a data cube. Figure shows a
lattice of cuboids forming a data cube for the dimensions time, item, location, and supplier. The cuboid
that holds the lowest level of summarization is called the base cuboid.
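The lattice can be enumerated directly: each subset of the dimensions is one cuboid (one group-by), so n dimensions without concept hierarchies give 2^n cuboids. The following minimal sketch lists them for the four dimensions named above; the function name cuboids is just an illustrative choice.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "supplier")

def cuboids(dims):
    """Return every subset of `dims`, from the apex cuboid () to the base cuboid."""
    return [combo for k in range(len(dims) + 1) for combo in combinations(dims, k)]

lattice = cuboids(dimensions)
```

The first entry, the empty tuple, corresponds to the apex cuboid (a single grand total), and the last entry, containing all four dimensions, corresponds to the base cuboid.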
1.6 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models
The most popular data model for a data warehouse is a multidimensional model, which can exist in the
form of a star schema, a snowflake schema, or a fact constellation schema.
Schemas for multidimensional data models
Star schema: A fact table in the middle connected to a set of dimension tables.
Snowflake schema: A refinement of the star schema where some dimensional hierarchy is
normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore
called a galaxy schema or fact constellation.
Star schema: The most common modeling paradigm is the star schema, in which the data warehouse
contains (1) a large central table (fact table) containing the bulk of the data, with no redundancy, and
(2) a set of smaller attendant tables (dimension tables), one for each dimension. The schema graph
resembles a starburst, with the dimension tables displayed in a radial pattern around the central fact table.
Snowflake schema: The snowflake schema is a variant of the star schema model, where some dimension
tables are normalized, thereby further splitting the data into additional tables. The resulting schema graph
forms a shape similar to a snowflake.
Fact constellation: Sophisticated applications may require multiple fact tables to share dimension tables.
This kind of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact
constellation.
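A star schema can be sketched with one fact table and two dimension tables. The example below is illustrative only: it uses Python's built-in sqlite3 module, the table names item_dim, time_dim, and sales_fact are assumptions, and the rows are made up. The dimension attributes follow the item and time examples given earlier.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                       quarter TEXT, year INTEGER);
-- The fact table holds the measure plus one key per dimension table.
CREATE TABLE sales_fact (
    item_key INTEGER REFERENCES item_dim(item_key),
    time_key INTEGER REFERENCES time_dim(time_key),
    dollars_sold REAL
);
INSERT INTO item_dim VALUES (1, 'TV', 'Acme', 'electronics'), (2, 'Phone', 'Acme', 'electronics');
INSERT INTO time_dim VALUES (1, 5, 1, 'Q1', 2024), (2, 9, 4, 'Q2', 2024);
INSERT INTO sales_fact VALUES (1, 1, 500.0), (2, 1, 300.0), (1, 2, 450.0);
""")

def dollars_by_item_and_quarter(connection):
    """Join the fact table to its dimensions and aggregate the measure."""
    query = """
        SELECT i.item_name, t.quarter, SUM(f.dollars_sold)
        FROM sales_fact AS f
        JOIN item_dim AS i ON i.item_key = f.item_key
        JOIN time_dim AS t ON t.time_key = f.time_key
        GROUP BY i.item_name, t.quarter
        ORDER BY i.item_name, t.quarter
    """
    return connection.execute(query).fetchall()
```

A snowflake variant would move, for example, brand into its own table referenced from item_dim, and a fact constellation would add a second fact table that reuses item_dim and time_dim.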
1.7 Dimensions: The Role of Concept Hierarchies
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level,
more general concepts. Consider a concept hierarchy for the dimension location. City values for location
include Vancouver, Toronto, New York, and Chicago. Each city, however, can be mapped to the
province or state to which it belongs.
For example, suppose that the dimension location is described by the attributes number, street, city,
province or state, zip code, and country. These attributes are related by a total order, forming a concept
hierarchy such as “street < city < province or state < country.” This hierarchy is shown in Figure
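A concept hierarchy for location can be represented as a chain of mappings, one per level. The sketch below covers the levels city < province or state < country for the four cities mentioned above; the dictionary names and the generalize helper are illustrative assumptions.

```python
# One mapping per step of the hierarchy: city -> province/state -> country.
city_to_state = {
    "Vancouver": "British Columbia",
    "Toronto": "Ontario",
    "New York": "New York",
    "Chicago": "Illinois",
}
state_to_country = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "New York": "USA",
    "Illinois": "USA",
}

def generalize(city, level):
    """Map a city to the requested level of the location hierarchy."""
    if level == "city":
        return city
    state = city_to_state[city]
    if level == "province_or_state":
        return state
    if level == "country":
        return state_to_country[state]
    raise ValueError(f"unknown level: {level}")
```

Replacing each city value by generalize(city, "country") is the mapping step that a roll-up along the location dimension relies on.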
1.8 Measures: Their Categorization and Computation
Distributive: an aggregate function is distributive if the result derived by applying the function to n
aggregate values is the same as that derived by applying the function on all the data without partitioning
(e.g., count, sum, min, max).
Algebraic: an aggregate function is algebraic if it can be computed by an algebraic function with M
arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate
function (e.g., avg, computed as sum/count).
Holistic: an aggregate function is holistic if there is no constant bound on the storage size needed to
describe a subaggregate (e.g., median, mode, rank).
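The three categories can be checked on a small partitioned data set. In this sketch the partitions are arbitrary made-up numbers; sum plays the distributive role, average the algebraic role, and median the holistic role.

```python
from statistics import median

partitions = [[4, 8], [1, 9, 3], [5]]
all_values = [v for part in partitions for v in part]

# Distributive: the sum of the partition sums equals the sum over all the data.
partial_sums = [sum(p) for p in partitions]
total = sum(partial_sums)

# Algebraic: the average is computed from two distributive results, sum and count.
partial_counts = [len(p) for p in partitions]
average = sum(partial_sums) / sum(partial_counts)

# Holistic: combining per-partition medians does not, in general, give the true median.
median_of_medians = median(median(p) for p in partitions)
true_median = median(all_values)
```

Here the per-partition sums and counts are enough to recover the exact total and average, while the median of the partition medians differs from the median of the full data.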
1.9 Typical OLAP Operations
ROLL-UP
• This is like zooming out on the data cube.
• This is required when the user needs further abstraction or less detail.
• Initially, the location hierarchy was "street < city < province < country".
• On rolling up, the data is aggregated by ascending the location hierarchy from the level of city to the
level of country.
DRILL DOWN
• This is like zooming in on the data. It is the reverse of roll-up.
• This is an appropriate operation when the user needs further details, wants to partition more finely, or
wants to focus on particular values of certain dimensions.
• This adds more detail to the data.
• Initially, the time hierarchy was "day < month < quarter < year".
• On drilling down, the time dimension is descended from the level of quarter to the level of month.
PIVOT (OR ROTATE)
This is used when the user wishes to re-orient the view of the data cube. This may involve swapping
the rows and columns, or moving one of the row dimensions into the column dimension.
SLICE & DICE
• These are operations for browsing the data in the cube.
• These operations allow the user to look at information from different viewpoints.
• A slice is a subset of the cube corresponding to a single value for one or more members of the
dimensions.
• A dice operation is done by performing a selection on two or more dimensions.
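The operations above can be sketched on a tiny cube held in a Python dictionary. This is a simplified illustration, not how an OLAP server stores data: the cell layout (city, country, quarter, item), the values, and the function names are all assumptions.

```python
from collections import defaultdict

# Base cells: (city, country, quarter, item) -> dollars_sold (made-up values).
cells = {
    ("Vancouver", "Canada", "Q1", "TV"): 100,
    ("Toronto", "Canada", "Q1", "TV"): 150,
    ("Chicago", "USA", "Q1", "TV"): 200,
    ("Chicago", "USA", "Q2", "Phone"): 50,
}

def roll_up_to_country(cube):
    """Roll-up: ascend location from city to country, summing the measure."""
    out = defaultdict(int)
    for (city, country, quarter, item), value in cube.items():
        out[(country, quarter, item)] += value
    return dict(out)

def slice_quarter(cube, quarter):
    """Slice: fix a single value on one dimension (here, time)."""
    return {key: value for key, value in cube.items() if key[2] == quarter}

def dice(cube, countries, items):
    """Dice: select on two or more dimensions (here, country and item)."""
    return {
        key: value
        for key, value in cube.items()
        if key[1] in countries and key[3] in items
    }

def pivot(table):
    """Pivot: swap the row and column axes of a 2-D view {row: {col: value}}."""
    out = defaultdict(dict)
    for row, cols in table.items():
        for col, value in cols.items():
            out[col][row] = value
    return dict(out)
```

Drill-down is the reverse of roll_up_to_country: going from the country-level result back to the city-level cells requires the more detailed data to be available.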
1.11 Question Bank
1. What is a data warehouse? Discuss its key features.
2. Differentiate between operational database systems and data warehouses.
3. Differentiate between OLAP and OLTP.
4. Why are multidimensional views of data and data cubes used?
5. With a neat diagram, explain data cube implementations.
6. Describe the multitiered architecture of data warehousing.
7. Explain the data warehouse models.