OLTP systems emphasize short, frequent transactions with a focus on data integrity and query speed. OLAP systems handle fewer but more complex queries involving data aggregation. OLTP uses a normalized schema for transactional data while OLAP uses a multidimensional schema for aggregated historical data. A data warehouse stores a copy of transaction data from operational systems structured for querying and reporting, and is used for knowledge discovery, consolidated reporting, and data mining. It differs from operational systems in being subject-oriented, larger in size, containing historical rather than current data, and optimized for complex queries rather than transactions.
This document provides an overview of key concepts related to data warehousing including what a data warehouse is, common data warehouse architectures, types of data warehouses, and dimensional modeling techniques. It defines key terms like facts, dimensions, star schemas, and snowflake schemas and provides examples of each. It also discusses business intelligence tools that can analyze and extract insights from data warehouses.
Data warehousing combines data from multiple sources into a single database to provide businesses with analytics results from data mining, OLAP, scorecarding and reporting. It extracts, transforms and loads data from operational data stores and data marts into a data warehouse and staging area to integrate and store large amounts of corporate data. Data mining analyzes large databases to extract previously unknown and potentially useful patterns and relationships to improve business processes.
This document discusses data warehousing and decision support systems. It defines a data warehouse as a subject-oriented, integrated, time-variant, and non-volatile collection of data used to support management decision making. It describes key features of a data warehouse including being subject-oriented, integrated, time-variant, and non-volatile. The document also discusses the need for decision support systems in business and different architectural styles for data warehousing like OLTP and OLAP.
The document provides an overview of key concepts in data warehousing and business intelligence, including:
1) It defines data warehousing concepts such as the characteristics of a data warehouse (subject-oriented, integrated, time-variant, non-volatile), grain/granularity, and the differences between OLTP and data warehouse systems.
2) It discusses the evolution of business intelligence and key components of a data warehouse such as the source systems, staging area, presentation area, and access tools.
3) It covers dimensional modeling concepts like star schemas, snowflake schemas, and slowly and rapidly changing dimensions.
The document discusses multidimensional databases and data warehousing. It describes multidimensional databases as optimized for data warehousing and online analytical processing to enable interactive analysis of large amounts of data for decision making. It discusses key concepts like data cubes, dimensions, measures, and common data warehouse schemas including star schema, snowflake schema, and fact constellations.
This document provides an overview of data warehousing concepts including dimensional modeling, online analytical processing (OLAP), and indexing techniques. It discusses the evolution of data warehousing, definitions of data warehouses, architectures, and common applications. Dimensional modeling concepts such as star schemas, snowflake schemas, and slowly changing dimensions are explained. The presentation concludes with references for further reading.
The document discusses data warehouses and their advantages. It describes the different views of a data warehouse including the top-down view, data source view, data warehouse view, and business query view. It also discusses approaches to building a data warehouse, including top-down and bottom-up, and steps involved including planning, requirements, design, integration, and deployment. Finally, it discusses technologies used to populate and refresh data warehouses like extraction, cleaning, transformation, load, and refresh tools.
This document provides an overview of data warehousing. It defines data warehousing as collecting data from multiple sources into a central repository for analysis and decision making. The document outlines the history of data warehousing and describes its key characteristics like being subject-oriented, integrated, and time-variant. It also discusses the architecture of a data warehouse including sources, transformation, storage, and reporting layers. The document compares data warehousing to traditional DBMS and explains how data warehouses are better suited for analysis versus transaction processing.
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.[1] DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data in a single place and are used to create analytical reports for knowledge workers throughout the enterprise.
This document discusses data warehousing and OLAP (online analytical processing) technology. It defines a data warehouse as a subject-oriented, integrated, time-variant, and nonvolatile collection of data to support management decision making. It describes how data warehouses use a multi-dimensional data model with facts and dimensions to organize historical data from multiple sources for analysis. Common data warehouse architectures like star schemas and snowflake schemas are also summarized.
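To make the fact-and-dimension idea concrete, here is a minimal star-schema sketch using Python's built-in sqlite3. The table and column names (sales_fact, date_dim, product_dim) are illustrative assumptions, not taken from the summarized slides.

```python
import sqlite3

# Minimal star schema: one fact table keyed to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, year INT, month INT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE sales_fact  (
    date_key    INTEGER REFERENCES date_dim(date_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    units_sold  INTEGER,
    revenue     REAL
);
INSERT INTO date_dim    VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO product_dim VALUES (10, 'books'), (11, 'games');
INSERT INTO sales_fact  VALUES (1, 10, 5, 50.0), (1, 11, 2, 80.0), (2, 10, 3, 30.0);
""")

# A typical analytical query: aggregate a measure over dimension attributes.
for row in conn.execute("""
    SELECT d.month, p.category, SUM(f.revenue)
    FROM sales_fact f
    JOIN date_dim d    ON f.date_key = d.date_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
"""):
    print(row)
```

A snowflake variant would differ only in normalizing the dimension tables (for example, moving category into its own table referenced from product_dim).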
This document discusses data warehousing, including its definition, importance, components, strategies, ETL processes, and considerations for success and pitfalls. A data warehouse is a collection of integrated, subject-oriented, non-volatile data used for analysis. It allows more effective decision making through consolidated historical data from multiple sources. Key components include summarized and current detailed data, as well as transformation programs. Common strategies are enterprise-wide and data mart approaches. ETL processes extract, transform and load the data. Clean data and proper implementation, training and maintenance are important for success.
This document defines a data warehouse as a collection of corporate information derived from operational systems and external sources to support business decisions rather than operations. It discusses the purpose of data warehousing: to realize the value of data and make better decisions. Key components like staging areas, data marts, and operational data stores are described. The document also outlines the evolution of data warehouse architectures and best practices for implementation.
A data warehouse is a central repository for storing historical and integrated data from multiple sources to be used for analysis and reporting. It contains a single version of the truth and is optimized for read access. In contrast, operational databases are optimized for transaction processing and contain current detailed data. A key aspect of data warehousing is using a dimensional model with fact and dimension tables. This allows for analyzing relationships between measures and dimensions in a multi-dimensional structure known as a data cube.
Online analytical processing (OLAP) allows users to easily extract and analyze data from different perspectives. It originated in the 1970s and was formalized in 1993, with OLAP cubes organizing numeric facts by dimensions to enable fast analysis. OLAP provides operations like roll-up, drill-down, slice, and dice to analyze aggregated data across multiple systems. It offers advantages over relational databases for consistent reporting and analysis.
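As a rough illustration of those four operations, the sketch below models roll-up, drill-down, slice, and dice with pandas group-bys and filters over a toy fact table; the column names and the use of pandas are assumptions for illustration, not the document's own example.

```python
import pandas as pd

# Toy fact data: one row per (year, region, product) with a sales measure.
sales = pd.DataFrame({
    "year":    [2022, 2022, 2023, 2023],
    "region":  ["east", "west", "east", "west"],
    "product": ["A", "B", "A", "B"],
    "sales":   [100, 150, 120, 170],
})

# Roll-up: aggregate away dimensions, summarizing by year only.
rollup = sales.groupby("year")["sales"].sum()

# Drill-down: reintroduce a dimension for finer granularity.
drilldown = sales.groupby(["year", "region"])["sales"].sum()

# Slice: fix one dimension to a single value.
slice_2023 = sales[sales["year"] == 2023]

# Dice: restrict several dimensions to subsets of values.
dice = sales[(sales["year"] == 2023) & (sales["region"].isin(["east"]))]

print(rollup, drilldown, slice_2023, dice, sep="\n\n")
```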
This document provides an overview of the 3-tier data warehouse architecture. It discusses the three tiers: the bottom tier contains the data warehouse server which fetches relevant data from various data sources and loads it into the data warehouse using backend tools for extraction, cleaning, transformation and loading. The bottom tier also contains the data marts and metadata repository. The middle tier contains the OLAP server which presents multidimensional data to users from the data warehouse and data marts. The top tier contains the front-end tools like query, reporting and analysis tools that allow users to access and analyze the data.
Basic Introduction of Data Warehousing from Adiva Consulting (adivasoft)
This document provides an overview of Hyperion Essbase & Planning Training. It discusses key concepts like raw data transformation into information, online transaction processing (OLTP) systems, challenges with current data management, the purpose of data warehousing and data marts. It also covers dimensional modeling best practices, types of fact and dimension tables, and how Essbase is tuned for analysis and provides advantages over traditional databases for analytics.
This document provides an overview of data warehousing concepts including:
- Data warehouses store historical data from operational systems for analysis and reporting. The data passes through a staging area and operational data store for cleaning before loading into the data warehouse.
- Common data warehouse architectures include star schemas with fact and dimension tables and snowflake schemas with normalized dimensions. Data marts contain summarized data for specific business questions.
- ETL processes extract, transform, and load the data in three phases. Transformation cleans and prepares the data before loading into dimensional schemas (a minimal Python sketch of the three phases follows this list).
- Data warehouses typically contain historical data, derived data generated from existing data, and metadata describing the data and schemas.
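The ETL sketch promised above: a toy extract-transform-load pass in plain Python, assuming a CSV export as the operational source and sqlite3 as the warehouse side. The table name orders_fact and the cleaning rules are hypothetical.

```python
import csv, io, sqlite3

# Extract: read raw records from a source (here, an in-memory CSV standing
# in for an operational system export).
raw = io.StringIO("id,amount,country\n1, 19.99 ,us\n2,5.00,DE\n")
rows = list(csv.DictReader(raw))

# Transform: clean and standardize before loading (trim whitespace,
# normalize case, cast types) -- the cleaning step the summary describes.
def transform(row):
    return (int(row["id"]),
            float(row["amount"].strip()),
            row["country"].strip().upper())

# Load: write the cleaned rows into the warehouse-side table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE orders_fact (id INTEGER, amount REAL, country TEXT)")
dw.executemany("INSERT INTO orders_fact VALUES (?, ?, ?)", map(transform, rows))
print(dw.execute("SELECT * FROM orders_fact").fetchall())
```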
Data warehousing involves collecting data from different sources and organizing it in a way that allows for analysis to make business decisions. It provides a single, complete view of data that end users can easily understand. A data warehouse stores integrated data from multiple sources and provides historical views of data to support analysis. It allows organizations to access critical information to support reporting, queries and decision making. Common applications of data warehousing include banking, healthcare, airlines and telecommunications.
ETL tools extract data from various sources, transform it for reporting and analysis, cleanse errors, and load it into a data warehouse. They save time and money compared to manual coding by automating this process. Popular open-source ETL tools include Pentaho Kettle and Talend, while Informatica is a leading commercial tool. A comparison found that Pentaho Kettle uses a graphical interface and standalone engine, has a large user community, and includes data quality features, while Talend generates code to run ETL jobs.
This document discusses various techniques for data preprocessing including data cleaning, integration, transformation, reduction, and discretization.
Data cleaning involves filling in missing values, smoothing noisy data, identifying outliers, and resolving inconsistencies. Data integration combines data from multiple sources by integrating schemas and resolving value conflicts. Data transformation techniques include normalization, aggregation, generalization, and smoothing.
Data reduction aims to reduce the volume of data while maintaining similar analytical results. This includes data cube aggregation, dimensionality reduction by removing unimportant attributes, data compression, and discretization which converts continuous attributes to categorical bins.
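As one concrete instance of discretization, here is an equal-width binning sketch in Python that converts a continuous attribute into categorical bins; the three-bin choice and the sample ages are arbitrary illustrations.

```python
# Equal-width binning: split the value range into n_bins intervals of the
# same width and map each value to its interval's index.
def equal_width_bins(values, n_bins):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1   # guard against an all-equal column
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [22, 25, 31, 44, 47, 52, 61, 70]
print(equal_width_bins(ages, 3))   # bin index per value: [0, 0, 0, 1, 1, 1, 2, 2]
```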
This document provides an overview of data warehousing and related concepts. It defines a data warehouse as a centralized database for analysis and reporting that stores current and historical data from multiple sources. The document describes key elements of data warehousing including Extract-Transform-Load (ETL) processes, multidimensional data models, online analytical processing (OLAP), and data marts. It also outlines advantages such as enhanced access and consistency, and disadvantages like time required for data extraction and loading.
This document defines key concepts in data warehousing including dimensions, facts, star schemas, and snowflake schemas. Dimensions categorize data and provide structure through labeling and tagging. Facts contain measures and aggregates. A star schema consists of dimension tables modeled around a fact table, optimized for querying large datasets. A snowflake schema is similar to a star schema but with normalized dimension tables to separate null data for faster lookups. Conformed dimensions have a common structure that can be applied across multiple fact tables.
This document provides an introduction to data warehousing. It discusses why data warehouses are used, as they allow organizations to store historical data and perform complex analytics across multiple data sources. The document outlines common use cases and decisions in building a data warehouse, such as normalization, dimension modeling, and handling changes over time. It also notes some potential issues like performance bottlenecks and discusses strategies for addressing them, such as indexing and considering alternative data storage options.
OLAP provides multidimensional analysis of large datasets to help solve business problems. It uses a multidimensional data model to allow for drilling down and across different dimensions like students, exams, departments, and colleges. OLAP tools are classified as MOLAP, ROLAP, or HOLAP based on how they store and access multidimensional data. MOLAP uses a multidimensional database for fast performance while ROLAP accesses relational databases through metadata. HOLAP provides some analysis directly on relational data or through intermediate MOLAP storage. Web-enabled OLAP allows interactive querying over the internet.
Data is often incomplete, noisy, and inconsistent which can negatively impact mining results. Effective data cleaning is needed to fill in missing values, identify and remove outliers, and resolve inconsistencies. Other important tasks include data integration, transformation, reduction, and discretization to prepare the data for mining and obtain reduced representation that produces similar analytical results. Proper data preparation is essential for high quality knowledge discovery.
Data Mining, KDD Process, Data Mining Functionalities, Characterization, Discrimination, Association, Classification, Prediction, Clustering, Outlier Analysis, Data Cleaning as a Process
Designing Scalable Data Warehouse Using MySQL (Venu Anuganti)
The document discusses designing scalable data warehouses using MySQL. It covers topics like the role of MySQL in data warehousing and analytics, typical data warehouse architectures, scaling out MySQL, and limitations of MySQL for large datasets or as a scalable warehouse solution. Real-time analytics are also discussed, noting the challenges of performance and scalability for near real-time analytics.
Cloud computing is creating a new era for IT by providing a set of services that appear to have infinite capacity, immediate deployment and high availability at trivial cost. These are all appealing to someone running a data warehouse when data volume, use and cost are growing at a rapid rate.
Today most organizations look at cloud as a way to lower data center and IT costs. While cost reduction is a real benefit, there is more value in the increased scalability, speed to procure (and give up) resources, and ease of delivery in cloud environments.
Database workloads are particularly challenging in the cloud. Cloud deployments beyond a moderate scale favor shared-nothing database architectures designed to run transparently in a multi-node environment. We are still in an early period of standardization and design of software to run in the cloud. Not all workloads are suitable for deployment on a collection of small virtualized servers today. Business intelligence and analytic database workloads fall into this area, raising the importance of analysis for fit with public and private cloud options.
This document provides an overview of data mining and data warehousing concepts. It defines data mining as the process of identifying patterns in data. The data mining process involves tasks like classification, clustering, and association rule mining. It also discusses data warehousing concepts like dimensional modeling using star schemas and snowflake schemas to organize data for analysis. Common data mining techniques like decision trees, neural networks, and association rule mining are also summarized.
The document discusses data warehousing, including its purpose of realizing value from data to support better business decisions. A data warehouse contains integrated data from multiple sources to support analysis. It discusses the components of a data warehouse like staging areas, data marts, and operational data stores. The document also covers topics like the evolution of data warehouse architectures, complexities in creating a data warehouse, potential pitfalls, and best practices.
A data warehouse is a subject-oriented, integrated, time-variant collection of data that supports management's decision-making processes. It contains data extracted from various operational databases and data sources. The data is cleaned, transformed, integrated and loaded into the data warehouse for analysis. A data warehouse uses a multidimensional model with facts and dimensions to allow for complex analytical and ad-hoc queries from multiple perspectives. It is separately administered from operational databases to avoid impacting transaction processing systems and allow optimized access for decision support.
Designing Fast Data Architecture for Big Data using Logical Data Warehouse a... (Denodo)
Companies such as Autodesk are fast replacing the tried-and-true physical data warehouses with logical data warehouses/data lakes. Why? Because they are able to accomplish the same results in 1/6th of the time and with 1/4th of the resources.
In this webinar, Autodesk's Platform Lead, Kurt Jackson, will describe how they designed a modern fast data architecture as a single unified logical data warehouse/data lake using data virtualization and contemporary big data analytics like Spark.
A logical data warehouse/data lake is a virtual abstraction layer over the physical data warehouse, big data repositories, cloud, and other enterprise applications. It unifies both structured and unstructured data in real time to power analytical and operational use cases.
Data Migration Using AWS Snowball, Snowball Edge & Snowmobile (Amazon Web Services)
The document discusses AWS Snowball, Snowball Edge, and Snowmobile - physical data transport solutions for migrating large amounts of data into AWS. Snowball is designed for petabyte-scale data migration, Snowball Edge provides petabyte-scale hybrid storage and compute capabilities, and Snowmobile is for exabyte-scale data migration using a 45-foot shipping container. The document provides details on their capabilities, use cases, security features, and cost. It also includes examples like how Oregon State University uses Snowball to migrate terabytes of oceanic research data and how DigitalGlobe migrated 100 petabytes of satellite imagery to AWS using Snowmobile.
Cloud Computing and your Data Warehouse (drluckyspin)
This document provides an overview of cloud computing. It defines cloud computing as "a mechanism for delivering scalable business services that execute on a decentralized computing fabric composed of commodity software and hardware." The document discusses why organizations use cloud computing, including that it allows for faster infrastructure on demand, lower costs by moving to an operational expenditure model rather than capital expenditure, and the ability to focus on core business needs by making infrastructure "someone else's problem." It also outlines some of the major players in cloud computing and different cloud computing models including private, public, hybrid and community clouds. The document discusses concepts like cloud stratification, APIs, and virtualization. It provides examples of how Teradata delivers analytics capabilities in private, public and hybrid clouds.
Data Warehouse Design on Cloud, A Big Data Approach Part_One (Panchaleswar Nayak)
This document discusses data warehouse design on the cloud using a big data approach. It covers topics such as business intelligence, data warehousing, data marts, data mining, ETL architecture, data warehouse design methodologies, Bill Inmon's top-down approach, Ralph Kimball's bottom-up approach, and addressing the new challenges of volume, velocity and variety of big data with Hadoop. The document proposes an architecture for next generation data warehousing using Hadoop to handle these new big data challenges.
AWS re:Invent 2016: Migrating Your Data Warehouse to Amazon Redshift (DAT202) (Amazon Web Services)
Amazon Redshift is a fast, simple, cost-effective data warehousing solution, and in this session, we look at the tools and techniques you can use to migrate your existing data warehouse to Amazon Redshift. We will then present a case study on Scholastic’s migration to Amazon Redshift. Scholastic, a large 100-year-old publishing company, was running their business with older, on-premise, data warehousing and analytics solutions, which could not keep up with business needs and were expensive. Scholastic also needed to include new capabilities like streaming data and real time analytics. Scholastic migrated to Amazon Redshift, and achieved agility and faster time to insight while dramatically reducing costs. In this session, Scholastic will discuss how they achieved this, including options considered, technical architecture implemented, results, and lessons learned.
Business intelligence norms are evolving across the retail industry, and leading retailers are prioritizing analytics initiatives as a result. While the trend toward retail analytics isn’t new, maturing technologies and techniques are. Here are the trends that will shape retail analytics in 2017.
The document discusses several retail trends for 2017 including the continued growth of e-commerce, the evolution of the in-store experience to be more focused on fulfillment, inspiration and frictionless transactions, the rise of conversational commerce through voice assistants, the impact of automation and robotics on jobs, and the need for agility at large retailers to keep up with innovation. It also touches on trends like the sharing economy, crowdsourcing, mobile payments, IoT, VR and blockchain. Additionally, it discusses changes in the role of the physical store, optimizing real estate footprints, increasing productivity and prioritizing leisure experiences downtown.
The document outlines a 7-step process for mapping an entity-relationship (ER) schema to a relational database schema. The steps include mapping regular and weak entity types, binary 1:1, 1:N, and M:N relationship types, multivalued attributes, and n-ary relationship types to tables. For each type of schema element, the document describes how to represent it as a table with primary keys and foreign key attributes that preserve the relationships in the original ER schema.
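A hedged illustration of some of those steps: the DDL below (run through Python's built-in sqlite3) maps regular entity types, a 1:N relationship, and an M:N relationship to tables with primary and foreign keys. The department/employee/project names are the usual textbook example, not necessarily the document's own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite enforces FKs only with this pragma
conn.executescript("""
-- Regular entity types become tables keyed by their entity key.
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee   (emp_id  INTEGER PRIMARY KEY, name TEXT,
    -- 1:N WORKS_FOR: a foreign key on the N side references the 1 side.
    dept_id INTEGER REFERENCES department(dept_id));
CREATE TABLE project    (proj_id INTEGER PRIMARY KEY, title TEXT);
-- M:N WORKS_ON: its own table whose primary key combines both foreign keys.
CREATE TABLE works_on (
    emp_id  INTEGER REFERENCES employee(emp_id),
    proj_id INTEGER REFERENCES project(proj_id),
    hours   REAL,   -- an attribute of the relationship travels with its table
    PRIMARY KEY (emp_id, proj_id)
);
""")
```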
Retail 2020: Retail Will Change More in the Next 5 Years than the Last 50 (FITCH)
Against a backdrop of seismic shifts in our retail landscape, Christian Davies, Executive Creative Director, Americas at FITCH, took the audience on a global tour of the major trends that will be the norm by the time we're ringing in the New Year of 2020. Emerging trends are mapped against new shopper behaviors and the rise of Gen Z, set to be the largest group of shoppers globally by 2020, and against the new realities of retail operations, language and purpose. This presentation was given at GlobalShop in Las Vegas on March 26th, 2015.
The document discusses data warehousing concepts and technologies. It defines a data warehouse as a subject-oriented, integrated, time-variant, and non-volatile collection of data used for decision making. Key aspects covered include multidimensional data modeling using facts, dimensions, and cubes; data warehouse architectures; and efficient cube computation methods such as ROLAP-based algorithms.
The document defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data to support management decision making. A data warehouse is maintained separately from operational databases and provides a platform for consolidated historical data analysis. Key features of a data warehouse include dimensional modeling using facts, dimensions, and star or snowflake schemas.
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt (Subrata Kumer Paul)
Jiawei Han, Micheline Kamber and Jian Pei, Data Mining: Concepts and Techniques, 3rd ed., The Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann Publishers, July 2011. ISBN 978-0123814791.
Data Mining Concept & Technique-ch04.ppt (MutiaSari53)
This chapter discusses data warehousing and online analytical processing (OLAP). It defines a data warehouse as a subject-oriented collection of integrated and nonvolatile data used for analysis. Key concepts covered include the multidimensional data cube model used to organize warehouse data, ETL processes for loading data into the warehouse, and star and snowflake schemas for conceptual modeling. The chapter also distinguishes between OLTP and OLAP systems and operations.
Data warehousing and online analytical processing (VijayasankariS)
The document discusses data warehousing and online analytical processing (OLAP). It defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data used to support management decision making. It describes key concepts such as data warehouse modeling using data cubes and dimensions, extraction, transformation and loading of data, and common OLAP operations. The document also provides examples of star schemas and how they are used to model data warehouses.
The document discusses data warehousing and OLAP technology. It defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data used to support management decision making. It describes key aspects of a data warehouse including its multi-dimensional schema with fact and dimension tables and use of data cubes to enable OLAP. It also contrasts data warehouses with operational databases and discusses ETL, architecture, and design approaches.
The document discusses data warehousing and OLAP (online analytical processing). It defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data used to support management decision making. The document outlines common data warehouse architectures like star schemas and snowflake schemas and discusses how data is modeled and organized in multidimensional data cubes. It also describes typical OLAP operations for analyzing and exploring cube data like roll-up, drill-down, slice and dice.
A data warehouse is a subject-oriented, consolidated collection of integrated data from multiple sources used to support management decision making. It is separate from operational databases and contains historical data for analysis. Data warehouses use a star schema with fact and dimension tables and support online analytical processing (OLAP) for complex analysis and reporting.
What is a Data Warehouse? OLTP vs. OLAP, Conceptual Modeling of Data Warehouses, Data Warehousing Components, Building a Data Warehouse, Mapping the Data Warehouse to a Multiprocessor Architecture, Database Architectures for Parallel Processing
Data mining 2 - Data warehouse (cheat sheet - printable) (yesheeka)
A data warehouse is a subject-oriented database that integrates data from multiple sources to support analysis and decision making. It contains consolidated historical data separate from operational databases. Data is cleaned, integrated, and transformed from source systems. Unlike operational databases, a data warehouse is time-variant, containing data over long periods for analysis, and non-volatile, with read-only access and no updates. It provides a consolidated multidimensional view of data to support online analytical processing rather than transaction processing.
This document discusses data warehousing and online analytical processing (OLAP). It defines a data warehouse as a subject-oriented, integrated, time-variant, and nonvolatile collection of data used for analysis and decision making. The key aspects of a data warehouse covered are its multidimensional data model using cubes and dimensions, extraction of data from multiple sources, and usage for querying, reporting, analytical processing, and data mining. Common data warehouse architectures and operations like star schemas, snowflake schemas, and OLAP functions such as roll-up and drill-down are also summarized.
The document discusses OLAP cubes and data warehousing. It defines OLAP as online analytical processing used to analyze aggregated data in data warehouses. Key concepts covered include star schemas, dimensions and facts, cube operations like roll-up and drill-down, and different OLAP architectures like MOLAP and ROLAP that use multidimensional or relational storage respectively.
The document discusses emerging trends in database systems, including data warehousing and data mining. It provides definitions and characteristics of data warehousing, including that it involves centralizing organizational data in a central repository for analysis. It contrasts operational and transactional databases with data warehouses, noting that data warehouses are designed for analysis rather than transactions and contain historical, aggregated data. It also discusses data marts, ETL processes, and the components of a typical data warehouse architecture.
The document discusses emerging trends in database systems, including data warehousing and data mining. It provides definitions and characteristics of data warehousing, including that it involves centralizing organizational data in a central repository for analysis. It contrasts operational and transactional databases with data warehouses, noting that data warehouses are designed for analysis rather than transactions and contain historical, summarized data. It also discusses data marts, ETL processes, and the components of a typical data warehouse architecture.
This document contains a question bank for the subject of data warehousing and mining. It provides definitions and characteristics of data warehouses, including that they are subject-oriented, integrated, time-variant, and non-volatile stores of data from multiple sources made available for analysis. It also defines multidimensional data models using fact and dimension tables, and classifies OLAP tools as relational, multidimensional, or hybrid. Key differences between star and snowflake schemas are that snowflake schemas further normalize dimension tables. Metadata is defined as data about data.
The document discusses various concepts related to data warehousing including:
1. The key characteristics of a data warehouse including being subject-oriented, integrated, time-variant, and non-updatable.
2. Common data warehouse architectures including two-level, independent data marts, dependent data marts with an operational data store, logical data marts with an active warehouse, and a three-layer architecture.
3. The Extract, Transform, Load (ETL) process and data reconciliation to integrate and transform data from source systems into the data warehouse.
Data Warehousing for students education.pptx (jainyshah20)
This document discusses data warehousing and OLAP technology. It defines a data warehouse as a subject-oriented, integrated, time-variant, and nonvolatile collection of data used to support management decision making. Key aspects covered include the multi-dimensional data model using cubes and dimensions, various data warehouse architectures like star schemas and snowflake schemas, and OLAP operations for analysis like roll-up, drill-down, slice and dice. Building a data warehouse requires a range of business, technology, and program management skills.
This document discusses key concepts in data warehousing and modeling. It describes a multitier architecture for data warehousing consisting of a bottom tier warehouse database, middle tier OLAP server, and top tier front-end client tools. It also discusses different data warehouse models including enterprise warehouses, data marts, and virtual warehouses. The document outlines the extraction, transformation, and loading process used to populate data warehouses and the role of metadata repositories.
The document discusses backtracking algorithms. It begins by defining backtracking as a methodical way to try different sequences of decisions to solve a problem until a solution is found. It then provides examples of backtracking for finding a maze path and coloring a map. The key aspects of a backtracking algorithm are that it uses depth-first search to recursively explore choices, pruning paths that do not lead to solutions.
The document discusses various backtracking algorithms and problems. It begins with an overview of backtracking as a general algorithm design technique for problems that involve traversing decision trees and exploring partial solutions. It then provides examples of specific problems that can be solved using backtracking, including the N-Queens problem, map coloring problem, and Hamiltonian circuits problem. It also discusses common terminology and concepts in backtracking algorithms like state space trees, pruning nonpromising nodes, and backtracking when partial solutions are determined to not lead to complete solutions.
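Since the N-Queens problem is named as a standard backtracking example, here is a minimal sketch in Python; the solution format (one column index per row) is an implementation choice, not something dictated by the source.

```python
def n_queens(n):
    """Return one solution as a list of column positions per row, or None."""
    cols = [-1] * n

    def safe(row, col):
        # A placement conflicts if it shares a column or a diagonal
        # with any queen already placed in an earlier row.
        return all(cols[r] != col and abs(cols[r] - col) != row - r
                   for r in range(row))

    def place(row):
        if row == n:
            return True
        for col in range(n):
            if safe(row, col):          # prune non-promising branches
                cols[row] = col
                if place(row + 1):      # extend the partial solution
                    return True
                cols[row] = -1          # backtrack
        return False

    return cols if place(0) else None

print(n_queens(8))
```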
The document discusses the divide and conquer algorithm design strategy. It begins by explaining the general concept of divide and conquer, which involves splitting a problem into subproblems, solving the subproblems, and combining the solutions. It then provides pseudocode for a generic divide and conquer algorithm. Finally, it gives examples of divide and conquer algorithms like quicksort, binary search, and matrix multiplication.
This document discusses randomized data structures and algorithms. It begins by motivating randomized data structures as a way to transform average case runtimes into expected runtimes that are not dependent on specific inputs. It then provides examples of randomized data structures like treaps and randomized skip lists that provide efficient operations like insertion, deletion, and search in expected logarithmic time. It also discusses how randomization can be applied in algorithms like primality testing.
This document discusses randomized data structures and algorithms. It begins by motivating randomized data structures by noting that some data structures like binary search trees have average case performance but worst case inputs. Randomizing the data structure removes dependency on inputs and provides expected case performance. The document then discusses treaps and randomized skip lists as examples of randomized data structures that provide efficient expected case performance for operations like insertion, deletion, and search. It also covers topics like randomized number generation, primality testing, and how randomization can transform average case runtimes into expected case runtimes.
Skip lists are a data structure for implementing dictionaries. They consist of multiple sorted lists, with the top list containing all elements and lower lists being subsequences. Searching works by dropping down lists until finding the target element or determining it is absent. Insertion and deletion use a randomized algorithm adding/removing elements from the appropriate lists. Analysis shows the expected space is O(n) and search, insertion and deletion times are O(log n), with these bounds also holding with high probability. Skip lists provide fast, simple dictionary implementation in practice.
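A compact skip-list sketch in Python showing the search and randomized insertion the summary describes; MAX_LEVEL = 16 and the 1/2 promotion probability are conventional choices, not values from the source.

```python
import random

class Node:
    def __init__(self, key, level):
        self.key = key
        self.next = [None] * level       # one forward pointer per level

class SkipList:
    MAX_LEVEL = 16

    def __init__(self):
        self.head = Node(None, self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        # Coin flips: each extra level is kept with probability 1/2,
        # which yields the expected O(log n) height.
        lvl = 1
        while random.random() < 0.5 and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key):
        node = self.head
        # Start in the top list and drop down when the next key overshoots.
        for i in range(self.level - 1, -1, -1):
            while node.next[i] and node.next[i].key < key:
                node = node.next[i]
        node = node.next[0]
        return node is not None and node.key == key

    def insert(self, key):
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.next[i] and node.next[i].key < key:
                node = node.next[i]
            update[i] = node             # last node before key at level i
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = Node(key, lvl)
        for i in range(lvl):             # splice into each chosen level
            new.next[i] = update[i].next[i]
            update[i].next[i] = new

sl = SkipList()
for k in [3, 1, 4, 5, 9, 2]:
    sl.insert(k)
print(sl.search(4), sl.search(7))        # True False
```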
Dynamic programming is an algorithm design paradigm that can be applied to problems exhibiting optimal substructure and overlapping subproblems. It works by breaking down a problem into subproblems and storing the results of already solved subproblems, rather than recomputing them multiple times. This allows for an efficient bottom-up approach. Examples where dynamic programming can be applied include the matrix chain multiplication problem, the 0-1 knapsack problem, and finding the longest common subsequence between two strings.
Dynamic programming is a technique for solving problems with overlapping subproblems and optimal substructure. It works by breaking problems down into smaller subproblems and storing the results in a table to avoid recomputing them. Examples where it can be applied include the knapsack problem, longest common subsequence, and computing Fibonacci numbers efficiently through bottom-up iteration rather than top-down recursion. The technique involves setting up recurrences relating larger instances to smaller ones, solving the smallest instances, and building up the full solution using the stored results.
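To show the bottom-up iteration the summary contrasts with top-down recursion, here is the standard dynamic-programming Fibonacci sketch in Python; the choice of n = 40 is arbitrary.

```python
def fib_bottom_up(n):
    # Table of solved subproblems: fib[i] is computed exactly once and
    # reused, unlike naive top-down recursion, which recomputes the same
    # subproblems exponentially many times.
    if n < 2:
        return n
    fib = [0] * (n + 1)
    fib[1] = 1
    for i in range(2, n + 1):
        fib[i] = fib[i - 1] + fib[i - 2]
    return fib[n]

print(fib_bottom_up(40))   # 102334155, computed in O(n) time
```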
This document summarizes a talk on dynamic graph algorithms. It begins with an introduction to dynamic graph algorithms, which involve maintaining a graph structure and answering queries efficiently as the graph undergoes a sequence of edge insertions and deletions. It then discusses several examples of fully dynamic algorithms for problems like connectivity, minimum spanning trees, and graph spanners. A key data structure introduced is the Euler tour tree, which represents a dynamic tree as a one-dimensional structure to support efficient updates and queries. The document concludes by outlining a fully dynamic randomized algorithm for maintaining connectivity under edge updates with polylogarithmic update time, using a hierarchical approach with multiple levels of edge partitions and ET trees.
This document discusses divide-and-conquer algorithms and their time complexities. It begins with examples of finding the maximum of a set and binary search. It then presents the general steps of a divide-and-conquer algorithm and analyzes time complexity. Several algorithms are discussed including quicksort, merge sort, 2D maxima finding, closest pair problem, convex hull problem, and matrix multiplication. Strategies like divide, conquer, and merge are used to solve problems recursively in fewer comparisons than brute force methods. Many algorithms have a time complexity of O(n log n).
This document discusses the merge sort algorithm for sorting a sequence of numbers. It begins by introducing the divide and conquer approach, which merge sort uses. It then provides an example of how merge sort works, dividing the sequence into halves, sorting the halves recursively, and then merging the sorted halves together. The document proceeds to provide pseudocode for the merge sort and merge algorithms. It analyzes the running time of merge sort using recursion trees, determining that it runs in O(n log n) time. Finally, it covers techniques for solving recurrence relations that arise in algorithms like divide and conquer approaches.
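A runnable version of the merge sort the summary walks through, as a Python sketch: divide the sequence in half, sort the halves recursively, and merge. The recurrence T(n) = 2T(n/2) + O(n) gives the O(n log n) bound mentioned above.

```python
def merge_sort(a):
    # Divide: split in half; conquer: sort each half; combine: merge.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]   # one side may have leftovers

print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))   # [1, 2, 2, 3, 4, 5, 6, 7]
```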
The document discusses greedy algorithms and provides examples of problems that can be solved using greedy techniques. It introduces the coin changing problem and the activity selection problem. For activity selection, it demonstrates that a greedy approach of always selecting the activity with the earliest finish time yields an optimal solution. It provides pseudo-code for a greedy algorithm and proves optimality for activity selection by showing that some optimal solution always makes the greedy choice, and that combining the greedy choice with an optimal solution to the remaining subproblem yields an optimal solution to the original problem.
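A sketch of that earliest-finish-time rule in Python; the sample intervals are illustrative, not from the document.

```python
def select_activities(activities):
    """activities: list of (start, finish) pairs. Greedy rule: always take
    the compatible activity with the earliest finish time."""
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:     # compatible with everything chosen so far
            chosen.append((start, finish))
            last_finish = finish
    return chosen

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(select_activities(acts))       # [(1, 4), (5, 7), (8, 11)]
```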
The document discusses various optimization problems that can be solved using the greedy method. It begins by explaining that the greedy method involves making locally optimal choices at each step that combine to produce a globally optimal solution. Several examples are then provided to illustrate problems that can and cannot be solved with the greedy method. These include shortest path problems, minimum spanning trees, activity-on-edge networks, and Huffman coding. Specific greedy algorithms like Kruskal's algorithm, Prim's algorithm, and Dijkstra's algorithm are also covered. The document concludes by noting that the greedy method can only be applied to solve a small number of optimization problems.
This document discusses greedy algorithms and provides examples of their use. It begins by defining characteristics of greedy algorithms, such as making locally optimal choices that reduce a problem into smaller subproblems. The document then covers designing greedy algorithms, proving their optimality, and analyzing examples like the fractional knapsack problem and minimum spanning tree algorithms. Specific greedy algorithms covered in more depth include Kruskal's and Prim's minimum spanning tree algorithms and Huffman coding.
The document outlines various data structures and algorithms for implementing dictionaries and hash tables, including:
- Separate chaining, which handles collisions by storing elements that hash to the same value in a linked list. Find, insert, and delete take average time of O(1).
- Open addressing techniques like linear probing and quadratic probing, which handle collisions by probing to alternate locations until an empty slot is found. These have faster search but slower inserts and deletes.
- Double hashing, which uses a second hash function to determine probe distances when collisions occur, reducing clustering compared to linear probing.
This document discusses hashing and hash tables. It begins by introducing hash tables and describing how hashing works by mapping keys to array indices using a hash function to allow for fast insertion, deletion and search operations in O(1) average time. However, hash tables do not support ordering of elements efficiently. The document then discusses issues with hash functions such as collisions when different keys map to the same index. It describes techniques for collision resolution including separate chaining, where each index points to a linked list, and open addressing techniques like linear probing, quadratic probing and double hashing that resolve collisions by probing alternate indices in the array.
This document summarizes key points about extendible hashing and discusses its use for a spelling dictionary case study. Extendible hashing is a hashing technique that optimizes disk accesses for huge datasets by storing hash buckets in disk blocks. It uses a directory to hash to the correct bucket. The document explains how to insert keys, split buckets, and rehash the table. It also discusses solutions for the spelling dictionary case study, comparing storage and time efficiency of sorted arrays, open hashing, and closed hashing with linear probing.
This document discusses extendible hashing, which is a hashing technique for dynamic files that allows efficient insertion and deletion of records. It works by using a directory to map hash values to buckets, and dynamically expanding the directory size and number of buckets as needed to accommodate new records. When a bucket overflows, it is split into two buckets, and the directory is expanded to distinguish them. The directory size can also be contracted when buckets can be combined due to deletions. Alternative approaches like dynamic hashing and linear hashing that address the same problem of dynamic files are also overviewed.
The document discusses searching data structures like binary search trees and linked lists. It provides pseudocode for iterative searching algorithms on both ordered and unordered linked lists. The algorithms traverse the list by iterating through each node until the target value is found or the end is reached. For ordered lists, searching can stop early if the current node value is greater than the target. Tracing examples are provided to demonstrate searching for a value in sample linked lists.
Data Warehouse: Definition

- A decision-support database that is maintained separately from the organization's operational database.
- Supports information processing by providing a solid platform of consolidated, historical data for analysis.
- "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." —W. H. Inmon
- Data warehousing: the process of constructing and using data warehouses.
Data Warehouse—Subject-Oriented

- Organized around major subjects, such as customer, product, and sales.
- Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
- Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision-support process.
Data Warehouse—Integrated

- Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, online transaction records.
- Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, etc. among the different data sources (e.g., hotel price: which currency, whether tax and breakfast are included).
- When data is moved into the warehouse, it is converted to the agreed representation.
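To make the conversion step concrete, here is a minimal SQL sketch of such an integration transform. The staging table stg_hotel_rates, the target table dw_hotel_rates, and the exchange rate are illustrative assumptions, not part of the original deck.

-- Hypothetical staging-to-warehouse transform: unify currency and
-- encoding conventions before loading. All names are assumptions.
INSERT INTO dw_hotel_rates (hotel_id, rate_date, price_usd, breakfast_included)
SELECT
    hotel_id,
    rate_date,
    -- Convert all prices to one currency (rate is illustrative;
    -- other currencies are omitted in this sketch)
    CASE currency
        WHEN 'USD' THEN price
        WHEN 'EUR' THEN price * 1.08
    END AS price_usd,
    -- Normalize inconsistent encodings ('Y'/'1'/'yes') to a boolean
    CASE WHEN breakfast_flag IN ('Y', '1', 'yes') THEN TRUE ELSE FALSE END
FROM stg_hotel_rates;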
Data Warehouse—Time Variant

- The time horizon for the data warehouse is significantly longer than that of operational systems.
- Operational database: current-value data. Data warehouse: information from a historical perspective (e.g., the past 5-10 years).
- Every key structure in the data warehouse contains an element of time, explicitly or implicitly, whereas the key of operational data may or may not contain a time element.
Data Warehouse—Nonvolatile

- A physically separate store of data, transformed from the operational environment.
- Because warehouse data is not updated in place, the warehouse does not require transaction processing, recovery, or concurrency-control mechanisms.
- Major operations: initial loading of data, access of data, and periodic updates.
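As an illustration of this load-and-read pattern, a nightly refresh might append new facts in bulk. This is a hedged sketch: the table names and the assumption that the operational table already carries the warehouse surrogate keys are both illustrative.

-- Hypothetical periodic (e.g., nightly) bulk load: append yesterday's
-- transactions to the fact table; existing rows are never updated.
INSERT INTO sales_fact (time_key, item_key, branch_key, location_key,
                        units_sold, dollars_sold)
SELECT time_key, item_key, branch_key, location_key,
       SUM(quantity), SUM(amount)
FROM   operational_sales
WHERE  sale_date = CURRENT_DATE - 1   -- only the newly arrived batch
GROUP BY time_key, item_key, branch_key, location_key;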
Data Warehouse vs. Heterogeneous DBMS

- Traditional heterogeneous database integration is query-driven: wrappers/mediators are built on top of the heterogeneous databases, and when a query is posed at a client site, a meta-dictionary translates it into queries appropriate for the individual heterogeneous sites involved; the results are then integrated into a global answer set.
- A data warehouse is update-driven and high-performance: information from heterogeneous sources is integrated in advance and stored in the warehouse for direct query and analysis.
Data Warehouse vs. Operational DBMS

- OLTP (online transaction processing): the major task of traditional relational DBMSs; day-to-day operations such as purchasing, inventory, banking, manufacturing, payroll, registration, and accounting.
- OLAP (online analytical processing): the major task of a data warehouse system; data analysis and decision making.
Distinct features (OLTP vs. OLAP):

- User and system orientation: customer vs. market
- Data contents: current, detailed vs. historical, consolidated
- Database design: ER + application vs. star + subject
- View: current, local vs. evolutionary, integrated
- Access patterns: update vs. read-only but complex queries
OLTP vs. OLAP

Feature          OLTP                                  OLAP
Characteristic   operational processing                informational processing
Orientation      transaction                           analysis
Summarization    primitive, highly detailed            summarized, consolidated
View             detailed, flat relational             summarized, multidimensional
Focus            data in                               information out
Operations       index/hash on primary key             lots of scans
Priority         high performance, high availability   high flexibility, end-user autonomy
OLTP vs. OLAP (continued)

Feature             OLTP                             OLAP
Users               clerk, IT professional           knowledge worker
Function            day-to-day operations            decision support
DB design           application-oriented             subject-oriented
Data                current, up-to-date; detailed,   historical; summarized,
                    flat relational; isolated        multidimensional; integrated,
                                                     consolidated
Usage               repetitive                       ad hoc
Access              read/write                       mostly read
Unit of work        short, simple transaction        complex query
# records accessed  tens                             millions
# users             thousands                        hundreds
DB size             100 MB to GB                     100 GB to TB
Metric              transaction throughput           query throughput, response time
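The contrast is easy to see in SQL. Below is a hedged sketch: the first statement is a typical OLTP unit of work (a short, keyed write), the second a typical OLAP query (a read-only scan-and-aggregate over many fact rows). The account table is an illustrative assumption; the OLAP query uses the star-schema names from the example later in this deck.

-- OLTP: short, simple transaction touching a handful of rows by key.
UPDATE account
SET    balance = balance - 100
WHERE  account_id = 4711;

-- OLAP: complex, read-only query scanning millions of fact rows
-- and consolidating them for analysis.
SELECT t.year, l.country, SUM(f.dollars_sold) AS total_sales
FROM   sales_fact f
JOIN   time     t ON t.time_key     = f.time_key
JOIN   location l ON l.location_key = f.location_key
GROUP BY t.year, l.country
ORDER BY t.year, total_sales DESC;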
Need for a Separate Data Warehouse

- High performance for both systems:
  - The operational DBMS is tuned for OLTP: access methods, indexing, concurrency control, recovery.
  - The warehouse is tuned for OLAP: complex OLAP queries, multidimensional views, consolidation.
- Different functions and different data:
  - Missing data: decision support requires historical data, which operational databases do not typically maintain.
  - Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources.
  - Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.
Three-Tier Data Warehouse Architecture

- Bottom tier: the data warehouse server, usually a relational database system. Data is extracted from operational sources through gateway APIs such as ODBC, OLE DB, and JDBC.
- Middle tier: an OLAP server, either
  - ROLAP: an extended relational DBMS that maps operations on multidimensional data to standard relational operations, or
  - MOLAP: a special-purpose server that directly supports multidimensional data and operations.
- Top tier: client tools for querying and reporting, analysis, and data mining.
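As a sketch of what the ROLAP mapping means in practice: a multidimensional roll-up of sales from city to country can be expressed as an ordinary relational GROUP BY. The query below is illustrative, borrowing the star-schema table names from the example later in this deck.

-- A cube roll-up (city -> country along the location dimension),
-- expressed by a ROLAP server as standard relational operations.
SELECT l.country, t.quarter, SUM(f.dollars_sold) AS dollars_sold
FROM   sales_fact f
JOIN   location l ON l.location_key = f.location_key
JOIN   time     t ON t.time_key     = f.time_key
GROUP BY l.country, t.quarter;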
Three Data Warehouse Models

- Enterprise warehouse: collects all of the information about subjects spanning the entire organization, containing both detailed and summarized data. Typically implemented on mainframes and takes years to build.
- Data mart: a subset of corporate-wide data that is of value to a specific group of users; its scope is confined to specific, selected subjects (e.g., a marketing data mart). Data marts may be independent or dependent (fed directly from the warehouse), are implemented on departmental servers, and can be built in weeks.
- Virtual warehouse: a set of views over operational databases, of which only some of the possible summary views may be materialized.
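A minimal sketch of the virtual-warehouse idea, assuming an operational orders table (an illustrative name): summary views are defined directly over the operational database, and selected ones may be materialized. PostgreSQL syntax is shown; materialized-view support varies by DBMS.

-- Virtual warehouse: summary views defined over operational tables.
CREATE VIEW monthly_sales AS
SELECT date_trunc('month', order_date) AS month,
       product_id,
       SUM(amount) AS total_amount
FROM   orders
GROUP BY date_trunc('month', order_date), product_id;

-- Only some summary views may be materialized for performance.
CREATE MATERIALIZED VIEW monthly_sales_mat AS
SELECT * FROM monthly_sales;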
Conceptual Modeling of Data Warehouses

- Data warehouses require a subject-oriented schema; modeling revolves around dimensions and measures.
- Star schema: a fact table in the middle, connected to a set of dimension tables.
- Snowflake schema: a refinement of the star schema in which some dimensional hierarchies are normalized into sets of smaller dimension tables, forming a shape similar to a snowflake.
- Fact constellation: multiple fact tables share dimension tables; viewed as a collection of stars, it is therefore also called a galaxy schema.
Star Schema

- The most common dimensional model.
- Contains a single central fact table, which is bulky but holds no redundancy, and a set of dimension tables.
- The dimension tables are arranged radially around the fact table.
Example of Star Schema

Sales fact table: time_key, item_key, branch_key, location_key, plus the measures units_sold, dollars_sold, avg_sales.

- time dimension: time_key, day, day_of_the_week, month, quarter, year
- item dimension: item_key, item_name, brand, type, supplier_type
- branch dimension: branch_key, branch_name, branch_type
- location dimension: location_key, street, city, state_or_province, country
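A hedged SQL sketch of this star schema follows; the data types are illustrative assumptions, and only two dimensions are written out to keep the sketch short.

CREATE TABLE time (
    time_key        INT PRIMARY KEY,
    day             INT,
    day_of_the_week VARCHAR(10),
    month           INT,
    quarter         INT,
    year            INT
);

CREATE TABLE location (
    location_key      INT PRIMARY KEY,
    street            VARCHAR(100),
    city              VARCHAR(50),
    state_or_province VARCHAR(50),
    country           VARCHAR(50)
);

-- item and branch are defined analogously.
CREATE TABLE sales_fact (
    time_key     INT REFERENCES time(time_key),
    item_key     INT,          -- REFERENCES item(item_key)
    branch_key   INT,          -- REFERENCES branch(branch_key)
    location_key INT REFERENCES location(location_key),
    units_sold   INT,
    dollars_sold DECIMAL(12,2),
    avg_sales    DECIMAL(12,2)
);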
Defining the Star Schema in DMQL

define cube sales_star [time, item, branch, location]:
    dollars_sold = sum(sales_in_dollars),
    avg_sales = avg(sales_in_dollars),
    units_sold = count(*)

define dimension time as (time_key, day, day_of_week, month, quarter, year)
define dimension item as (item_key, item_name, brand, type, supplier_type)
define dimension branch as (branch_key, branch_name, branch_type)
define dimension location as (location_key, street, city, province_or_state, country)
Snowflake Schema

- Some dimension tables are normalized to reduce redundancy.
- More joins may be required to execute a query, so system performance can suffer.
Example of Snowflake Schema

Sales fact table: time_key, item_key, branch_key, location_key, plus the measures units_sold, dollars_sold, avg_sales.

- time dimension: time_key, day, day_of_the_week, month, quarter, year
- item dimension: item_key, item_name, brand, type, supplier_key (normalized out to supplier)
- supplier dimension: supplier_key, supplier_type
- branch dimension: branch_key, branch_name, branch_type
- location dimension: location_key, street, city_key (normalized out to city)
- city dimension: city_key, city, state_or_province, country
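The normalization step can be sketched in SQL as follows; compared with the star-schema DDL above, only the item and location dimensions change (types again illustrative).

CREATE TABLE supplier (
    supplier_key  INT PRIMARY KEY,
    supplier_type VARCHAR(50)
);

CREATE TABLE item (
    item_key     INT PRIMARY KEY,
    item_name    VARCHAR(100),
    brand        VARCHAR(50),
    type         VARCHAR(50),
    supplier_key INT REFERENCES supplier(supplier_key)  -- was supplier_type inline
);

CREATE TABLE city (
    city_key          INT PRIMARY KEY,
    city              VARCHAR(50),
    state_or_province VARCHAR(50),
    country           VARCHAR(50)
);

CREATE TABLE location (
    location_key INT PRIMARY KEY,
    street       VARCHAR(100),
    city_key     INT REFERENCES city(city_key)  -- city attributes moved out
);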
Fact Constellation

- Multiple fact tables may share dimension tables; the resulting collection of stars is called a galaxy schema or fact constellation.
- In practice, a fact constellation is commonly used for the data warehouse as a whole, while a star or snowflake schema is used for each data mart.
Example of Fact Constellation

Sales fact table: time_key, item_key, branch_key, location_key, plus the measures units_sold, dollars_sold, avg_sales.

Shipping fact table: time_key, item_key, shipper_key, from_location, to_location, plus the measures dollars_cost, units_shipped.

Shared dimensions:
- time: time_key, day, day_of_the_week, month, quarter, year (shared by both fact tables)
- item: item_key, item_name, brand, type, supplier_type (shared by both fact tables)
- location: location_key, street, city, province_or_state, country (referenced by sales via location_key, and by shipping via from_location and to_location)

Other dimensions:
- branch: branch_key, branch_name, branch_type (sales only)
- shipper: shipper_key, shipper_name, location_key, shipper_type (shipping only)
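To illustrate why shared dimensions matter, here is a hedged sketch of a drill-across query that combines both fact tables through the common time and item dimensions. Table names follow the example above; aggregating each fact table separately before joining is one of several reasonable strategies (it avoids fan-out between the two fact tables).

-- Drill-across: compare sales revenue with shipping cost per item
-- and quarter, joining on the shared dimension keys.
SELECT s.item_key, s.quarter, s.revenue, sh.ship_cost
FROM  (SELECT f.item_key, t.quarter, SUM(f.dollars_sold) AS revenue
       FROM   sales_fact f
       JOIN   time t ON t.time_key = f.time_key
       GROUP BY f.item_key, t.quarter) s
JOIN  (SELECT f.item_key, t.quarter, SUM(f.dollars_cost) AS ship_cost
       FROM   shipping_fact f
       JOIN   time t ON t.time_key = f.time_key
       GROUP BY f.item_key, t.quarter) sh
  ON  sh.item_key = s.item_key AND sh.quarter = s.quarter;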