SlideShare a Scribd company logo
Data Warehousing: Defined
and Its Applications
Pete Johnson
April 2002
Introduction
• The Past and The Problem
• What is a Data Warehouse?
• Components of a Data Warehouse
• OLAP, Metadata, Data Mining
• Getting the Data in
• Benefits vs. Costs
• Conclusion & Questions
The Past and The Problem
• Only had scattered transactional systems in the
organization – data spread among different
systems
• Transactional systems were not designed for
decision support analysis
• Data constantly changes on transactional systems
• Lack of historical data
• Often resources were taxed with both needs on the
same systems
The Past and The Problem
• Operational databases are designed to keep
transactions from daily operations. It is
optimized to efficiently update or create
individual records
• A database for analysis on the other hand
needs to be geared toward flexible requests
or queries (Ad hoc, statistical analysis)
What is a Data Warehouse?
Data warehousing is an architectural model
designed to gather data from various
sources into a single unified data model for
analysis purposes.
What Is a Data Warehouse?
Term was introduced in 1990 by William
Immon
A managed database in which the data is:
• Subject Oriented
• Integrated
• Time Variant
• Non Volatile
Subject Oriented
• Organized around major subject areas in the
enterprise (Sales, Inventory, Financial, etc.)
• Only includes data which is used in the
decision making processes
• Elements used for transactional processing are
removed
Integrated
• Data from different sources are brought
together and consolidated
• The data is cleaned and made consistent
Example – Bank Systems using Different Codes
Loan Department – COMM
Transactional System - C
Time Variant
• Data in a Data Warehouse contains both
current and historical information
• Operational Systems contain only current
data
Systems typically retain data:
Operational Systems – 60 to 90 Days
Data Warehouse – 5 to 10 Years
Non Volatile
• Operational systems have continually
changing data
• Data Warehouses continually absorb current
data and integrates it with its existing data
(Aggregate or Summary tables)
Example of volatile data would be an account balance at a bank
What Is a Data Warehouse?
• Not a product, it is a process
• Combination of hardware and software
• Concept of a Data Warehouse is not new,
but the technology that allows it is
What Is a Data Warehouse?
Can often be set up as one VLDB (Very Large
Database) or a collection of subject areas
called Data Marts.
There are now tools which “unify” these Data
Marts and make it appear as a single
database.
What Is a Data Warehouse?
Transformation of Data to Information
Transaction Processing
Cleansing / & Normalization
Relational Warehouse
SQL reporting
Exploration / Analysis
Data
Information
Components of a Data Warehouse
Four General Components:
• Hardware
• DBMS - Database Management System
• Front End Access Tools
• Other Tools
In all components scalability is vital
Scalability is the ability to grow as your data and processing needs
increase
Components of a Data Warehouse -
Hardware
• Power - # of Processors, Memory, I/O Bandwidth,
and Speed of the Bus
• Availability – Redundant equipment
• Disk Storage - Speed and enough storage for the
loaded data set
• Backup Solution - Automated and be able to allow
for incremental backups and archiving older data
Components of a Data Warehouse -
DBMS
• Physical storage capacity of the DBMS
• Loading, indexing, and processing speed
• Availability
• Handle your data needs
• Operational integrity, reliability, and
manageability
Components of a Data Warehouse -
Front End & Other Tools
• Query Tools (SQL & GUI based)
• Report Writers
• Metadata Repositories
• OLAP (Online Analytical Processing)
• Data Mining Products
Components of a Data Warehouse –
Metadata Repositories
Metadata is Data about Data. Users and Developers
often need a way to find information on the data
they use. Information can include:
• Source System(s) of the Data, contact information
• Related tables or subject areas
• Programs or Processes which use the data
• Population rules (Update or Insert and how often)
• Status of the Data Warehouse’s processing and
condition
Components of a Data Warehouse –
OLAP Tools
OLAP - Online Analytical Processing. It works by
aggregating detail data and looks at it by dimensions
• Gives the ability to “Drill Down” in to the detail data
• Decision Support Analysis Tool
• Multidimensional DB focusing on retrieval of
precalculated data
• Ends the “big reports” with large amounts of detailed data
• These tools are often graphical and can run on a “thin
client” such as a web browser
Components of a Data Warehouse –
Data Mining
• Answers the questions you didn’t know to
ask
• Analyzes great amounts of data (usually
contained in a Data Warehouse) and looks
for trends in the data
• Technology now allows us to do this better
than in the past
Components of a Data Warehouse –
Data Mining
• Most famous example is the Huggies -
Heineken case
• Used in Retail sector to analyze buying
habits
• Used in financial areas to detect fraud
• Used in the stock market to find trends
• Used in scientific research
• Used in national security
Getting the Data In
• Data will come from multiple databases and
files within the organization
• Also can come from outside sources
• Examples:
• Weather Reports
• Demographic information by Zip Code
Getting the Data In
Three Steps :
1. Extraction Phase
2. Transformation Phase
3. Loading Phase
Getting the Data In
Extraction Phase:
• Source systems export data via files or
populates directly when the databases can
“talk” to each other
• Transfers them to the Data Warehouse
server and puts it into some sort of staging
area
Getting the Data In
Transformation Phase:
• Takes data and turns it into a form that is suitable
for insertion into the warehouse
• Combines related data
• Removes redundancies
• Common Codes (Commercial Customer)
• Spelling Mistakes (Lozenges)
• Consistency (PA,Pa,Penna,Pennsylvania)
• Formatting (Addresses)
Getting the Data In
Loading Phase:
• Places the cleaned data into the DBMS in
its final, useable form
• Compare data from source systems and the
Data Warehouse
• Document the load information for the users
Benefits vs. Costs
Benefits
• Creates a single point for all data
• System is optimized and designed
specifically for analysis
• Access data without impacting the
operational systems
• Users can access the data directly without
the direct help from IT dept
Costs
• Cost of implementation & maintenance
(hardware, software, and staffing)
• Lack of compatibility between components
• Data from many sources are hard to
combine, data integrity issues
• Bad designs and practices can lead to costly
failures
Conclusion
• What is a Data Warehouse?
• Components of a Data Warehouse
• How the Data Gets In
• OLAP, Metadata, and Data Mining
• Benefits vs. Costs
Questions

More Related Content

Similar to DW (1).ppt

Data warehouse
Data warehouseData warehouse
Data warehouse
Shwetabh Jaiswal
 
ETL-Datawarehousing.ppt.pptx
ETL-Datawarehousing.ppt.pptxETL-Datawarehousing.ppt.pptx
ETL-Datawarehousing.ppt.pptx
karanamlakshminarasa
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
Yogendra Uikey
 
ETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptxETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptx
ParnalSatle
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
Vibrant Technologies & Computers
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
Vibrant Technologies & Computers
 
Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse
Lesa Cote
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
AttaUrRahman78
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
ReyersonMax
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
AttaUrRahman78
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
GenrlUse1
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data mart
Amit Sarkar
 
data warehousing need and characteristics. types of data w data warehouse arc...
data warehousing need and characteristics. types of data w data warehouse arc...data warehousing need and characteristics. types of data w data warehouse arc...
data warehousing need and characteristics. types of data w data warehouse arc...
aasifkuchey85
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptx
Anusuya123
 
Building Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerBuilding Data Warehouse in SQL Server
Building Data Warehouse in SQL Server
Antonios Chatzipavlis
 
Cognos datawarehouse
Cognos datawarehouseCognos datawarehouse
Cognos datawarehouse
ssuser7fc7eb
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
Er. Nawaraj Bhandari
 
DWH_Session_1.pptx
DWH_Session_1.pptxDWH_Session_1.pptx
DWH_Session_1.pptx
umashanker manthena
 
Data Management
Data ManagementData Management
Data Management
Mufaddal Nullwala
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousingsumit621
 

Similar to DW (1).ppt (20)

Data warehouse
Data warehouseData warehouse
Data warehouse
 
ETL-Datawarehousing.ppt.pptx
ETL-Datawarehousing.ppt.pptxETL-Datawarehousing.ppt.pptx
ETL-Datawarehousing.ppt.pptx
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
ETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptxETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptx
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
 
Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data mart
 
data warehousing need and characteristics. types of data w data warehouse arc...
data warehousing need and characteristics. types of data w data warehouse arc...data warehousing need and characteristics. types of data w data warehouse arc...
data warehousing need and characteristics. types of data w data warehouse arc...
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptx
 
Building Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerBuilding Data Warehouse in SQL Server
Building Data Warehouse in SQL Server
 
Cognos datawarehouse
Cognos datawarehouseCognos datawarehouse
Cognos datawarehouse
 
Introduction to data mining and data warehousing
Introduction to data mining and data warehousingIntroduction to data mining and data warehousing
Introduction to data mining and data warehousing
 
DWH_Session_1.pptx
DWH_Session_1.pptxDWH_Session_1.pptx
DWH_Session_1.pptx
 
Data Management
Data ManagementData Management
Data Management
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 

Recently uploaded

Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 

Recently uploaded (20)

Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 

DW (1).ppt

  • 1. Data Warehousing: Defined and Its Applications Pete Johnson April 2002
  • 2. Introduction • The Past and The Problem • What is a Data Warehouse? • Components of a Data Warehouse • OLAP, Metadata, Data Mining • Getting the Data in • Benefits vs. Costs • Conclusion & Questions
  • 3. The Past and The Problem • Only had scattered transactional systems in the organization – data spread among different systems • Transactional systems were not designed for decision support analysis • Data constantly changes on transactional systems • Lack of historical data • Often resources were taxed with both needs on the same systems
  • 4. The Past and The Problem • Operational databases are designed to keep transactions from daily operations. It is optimized to efficiently update or create individual records • A database for analysis on the other hand needs to be geared toward flexible requests or queries (Ad hoc, statistical analysis)
  • 5. What is a Data Warehouse? Data warehousing is an architectural model designed to gather data from various sources into a single unified data model for analysis purposes.
  • 6. What Is a Data Warehouse? Term was introduced in 1990 by William Immon A managed database in which the data is: • Subject Oriented • Integrated • Time Variant • Non Volatile
  • 7. Subject Oriented • Organized around major subject areas in the enterprise (Sales, Inventory, Financial, etc.) • Only includes data which is used in the decision making processes • Elements used for transactional processing are removed
  • 8. Integrated • Data from different sources are brought together and consolidated • The data is cleaned and made consistent Example – Bank Systems using Different Codes Loan Department – COMM Transactional System - C
  • 9. Time Variant • Data in a Data Warehouse contains both current and historical information • Operational Systems contain only current data Systems typically retain data: Operational Systems – 60 to 90 Days Data Warehouse – 5 to 10 Years
  • 10. Non Volatile • Operational systems have continually changing data • Data Warehouses continually absorb current data and integrates it with its existing data (Aggregate or Summary tables) Example of volatile data would be an account balance at a bank
  • 11. What Is a Data Warehouse? • Not a product, it is a process • Combination of hardware and software • Concept of a Data Warehouse is not new, but the technology that allows it is
  • 12. What Is a Data Warehouse? Can often be set up as one VLDB (Very Large Database) or a collection of subject areas called Data Marts. There are now tools which “unify” these Data Marts and make it appear as a single database.
  • 13. What Is a Data Warehouse? Transformation of Data to Information Transaction Processing Cleansing / & Normalization Relational Warehouse SQL reporting Exploration / Analysis Data Information
  • 14. Components of a Data Warehouse Four General Components: • Hardware • DBMS - Database Management System • Front End Access Tools • Other Tools In all components scalability is vital Scalability is the ability to grow as your data and processing needs increase
  • 15. Components of a Data Warehouse - Hardware • Power - # of Processors, Memory, I/O Bandwidth, and Speed of the Bus • Availability – Redundant equipment • Disk Storage - Speed and enough storage for the loaded data set • Backup Solution - Automated and be able to allow for incremental backups and archiving older data
  • 16. Components of a Data Warehouse - DBMS • Physical storage capacity of the DBMS • Loading, indexing, and processing speed • Availability • Handle your data needs • Operational integrity, reliability, and manageability
  • 17. Components of a Data Warehouse - Front End & Other Tools • Query Tools (SQL & GUI based) • Report Writers • Metadata Repositories • OLAP (Online Analytical Processing) • Data Mining Products
  • 18. Components of a Data Warehouse – Metadata Repositories Metadata is Data about Data. Users and Developers often need a way to find information on the data they use. Information can include: • Source System(s) of the Data, contact information • Related tables or subject areas • Programs or Processes which use the data • Population rules (Update or Insert and how often) • Status of the Data Warehouse’s processing and condition
  • 19. Components of a Data Warehouse – OLAP Tools OLAP - Online Analytical Processing. It works by aggregating detail data and looks at it by dimensions • Gives the ability to “Drill Down” in to the detail data • Decision Support Analysis Tool • Multidimensional DB focusing on retrieval of precalculated data • Ends the “big reports” with large amounts of detailed data • These tools are often graphical and can run on a “thin client” such as a web browser
  • 20. Components of a Data Warehouse – Data Mining • Answers the questions you didn’t know to ask • Analyzes great amounts of data (usually contained in a Data Warehouse) and looks for trends in the data • Technology now allows us to do this better than in the past
  • 21. Components of a Data Warehouse – Data Mining • Most famous example is the Huggies - Heineken case • Used in Retail sector to analyze buying habits • Used in financial areas to detect fraud • Used in the stock market to find trends • Used in scientific research • Used in national security
  • 22. Getting the Data In • Data will come from multiple databases and files within the organization • Also can come from outside sources • Examples: • Weather Reports • Demographic information by Zip Code
  • 23. Getting the Data In Three Steps : 1. Extraction Phase 2. Transformation Phase 3. Loading Phase
  • 24. Getting the Data In Extraction Phase: • Source systems export data via files or populates directly when the databases can “talk” to each other • Transfers them to the Data Warehouse server and puts it into some sort of staging area
  • 25. Getting the Data In Transformation Phase: • Takes data and turns it into a form that is suitable for insertion into the warehouse • Combines related data • Removes redundancies • Common Codes (Commercial Customer) • Spelling Mistakes (Lozenges) • Consistency (PA,Pa,Penna,Pennsylvania) • Formatting (Addresses)
  • 26. Getting the Data In Loading Phase: • Places the cleaned data into the DBMS in its final, useable form • Compare data from source systems and the Data Warehouse • Document the load information for the users
  • 28. Benefits • Creates a single point for all data • System is optimized and designed specifically for analysis • Access data without impacting the operational systems • Users can access the data directly without the direct help from IT dept
  • 29. Costs • Cost of implementation & maintenance (hardware, software, and staffing) • Lack of compatibility between components • Data from many sources are hard to combine, data integrity issues • Bad designs and practices can lead to costly failures
  • 30. Conclusion • What is a Data Warehouse? • Components of a Data Warehouse • How the Data Gets In • OLAP, Metadata, and Data Mining • Benefits vs. Costs

Editor's Notes

  1. Welcome Honor to give back to PSU Resume Announce Informal Rules
  2. Review Slide
  3. Last Bullet - Resources being taxed is the lead in to the next slide
  4. Operational Systems – Many Small Writes, Data Warehouse – Heavy “Off” hour loads – Also can have a “trickle feed”
  5. Go over part that is in pink. - The only thing for them to remember Word “warehouse” – Reminds me of the end Indiana Jones when they had the Ark and put it in the large government warehouse never to be seen again. It implies storage. Data Warehouse is not only for business. Military, Intelligence, and Scientific Needs: 9/11 & Human Genome Project
  6. Lead in to next 4 slides
  7. Examples of what is removed is control fields and timestamps needed for transactional processing. Additionally Elements could be too small for analysis purposes – Retail environment recording the opening of a cash register drawer.
  8. Example with Bank is Commercial Customers Goal is for the data from the different sources to work seamlessly together.
  9. Operational systems usually need to purge some of this data in order for lookups to be efficient. Additional Hardware Costs as well
  10. Depending on the time of day the balance can change. Difficult to make decisions. DW – The data is usually checked and finalized. Once the data is finalized it should not change. Corrections of errors do happen after the fact but the should be communicated to the users.
  11. Despite marketing claims, can’t just buy a Data Warehouse Collection of tools and processes Improvement in Processing and Lower costs for this technology
  12. Data Marts can be divided up by subject areas or geographically Data Marts tend to be small departmental specific Geographic division is an example of Distributed System. Users don’t know where the data is coming from. In many cases, they don’t need to. However there should be documentation that the users can get to.
  13. Give example of 20,000 is data? What does it mean? Sales In The West of BBQ Grills in 2000? Now if I say Wilt Chamberlain – Something completely different comes to mind. Data Warehousing is a step taking low level transactional data and turning it into useful information.
  14. Growth is important, to switch to another solution can be very costly. Data – More years of data from more subject areas Processing – Load times, Number of users, Query or request times.
  15. Power – Requires powerful machines such as a main frame or clusters of servers like UNIX that can work together Storage – Not only disk, there is tape and optical
  16. Data Needs – No longer values and text. Searchable Pictures, fingerprint, video, audio, documents. Someday DNA. Operational integrity – Enforce its rules – Security and “Atomic Transactions” Reliability - Can it recover from failure quickly and easily Manageability – Ability to do the day to day tasks with little or no effort.
  17. Last 3 we will go into more detail Query Tools – Not uncommon to have different tools for different types of users – Power Users Report Writers – Prepare and distributed to many different mediums (email,web,wireless,voice). Can be electronic and centrally stored also.
  18. Value of metadata is sometimes over looked – Cost of Man hours. Column names alone are not enough. Field named “Date” Organizations without this sometime spend a lot of time researching this. Changes in Data – Impact is Known
  19. Example: White Board: Dimensions represent the core components of a business plan and often relate to departmental functions. Typical standard dimensions are Time, Accounts, Product Line, Market, and Division. Dimensions are static in most databases; database dimensions rarely change over the life of the application. Dimensions Organization – East,West Time – 2000, 2001 Product – Gas Grills, Charcoal Grills
  20. Technology Will Only Improve Data Mining
  21. Better connectivity is eliminating the need for raw files
  22. Let users know what data got updated
  23. Very High Maintenance: Staffing & Roles DBAs, Data Architects, Hardware/OS Specialists, Data Auditors, Support for Front End Tools, Documentation, Security