Data Warehousing
Gurpreet Singh, MGN646
Agenda
• Concept of Data warehousing
• Data Integration and the Extraction,
Transformation, and Load (ETL) Process
• Data Warehouse Development
• Administration
• Issues
Concept of Data warehousing
• A data warehouse is a pool of data produced
to support decision making.
• Data is usually structured to be available in a
form ready for OLAP, mining, querying,
reporting and other decision support
applications
Definition
“A data warehouse is simply a single, complete,
and consistent store of data obtained from a
variety of sources and made available to end
users in a way they can understand and use it in
a business context.”
-- Barry Devlin, IBM Consultant
Characteristics of data warehouse
• Subject oriented
• Integrated
• Time variant or time series
• Nonvolatile
• Multidimensional
• Client/server
• Real time
• Includes metadata
1. Subject-Oriented
Data is categorized and stored by business subject
rather than by application
[Diagram: OLTP applications (Equity Plans, Shares, Savings, Insurance, Loans) feed a single data warehouse subject, Customer Financial Information.]
2. Integrated
Data on a given subject is defined and stored once.
[Diagram: OLTP applications (Savings, Current Accounts, Loans) are integrated into a single Customer subject in the data warehouse.]
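The idea of storing each subject once can be sketched in a few lines of Python. This is only an illustration: the three source lists, their field names, and the helper integrate_by_customer are assumptions made for the example, not part of the original slides.

```python
from collections import defaultdict

# Hypothetical extracts from three OLTP applications; field names are
# assumptions made for this sketch.
savings = [{"customer_id": 1, "name": "A. Rao", "savings_balance": 1200.0}]
current_accounts = [{"customer_id": 1, "name": "A. Rao", "current_balance": 300.0}]
loans = [
    {"customer_id": 1, "name": "A. Rao", "loan_outstanding": 5000.0},
    {"customer_id": 2, "name": "B. Kaur", "loan_outstanding": 800.0},
]


def integrate_by_customer(*sources):
    """Merge per-application rows into one record per customer."""
    customers = defaultdict(dict)
    for rows in sources:
        for row in rows:
            customers[row["customer_id"]].update(row)
    return dict(customers)


customer_subject = integrate_by_customer(savings, current_accounts, loans)
print(customer_subject[1])
```

Each customer ends up with one record that carries the attributes contributed by every application, which is the sense in which the subject is "defined and stored once".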
3. Time-Variant
Data is stored as a series of snapshots, each
representing a period of time
Time Data
Jan-97 January
Feb-97 February
Mar-97 March
4. Nonvolatile
Typically, data in the data warehouse is not updated or deleted.
[Diagram: operational systems support Insert, Update, Delete, and Read; the warehouse supports only Load and Read.]
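A minimal sketch of time-variant, nonvolatile storage, using Python's built-in sqlite3 module. The table balance_snapshot, its columns, and the helper load_snapshot are hypothetical names chosen for illustration. Each load appends a new period, and earlier periods are only read, never changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical warehouse table: one row per account per monthly snapshot.
conn.execute(
    "CREATE TABLE balance_snapshot ("
    " period TEXT, account_id INTEGER, balance REAL,"
    " PRIMARY KEY (period, account_id))"
)


def load_snapshot(period, rows):
    """Append one period's snapshot; earlier periods are left untouched."""
    conn.executemany(
        "INSERT INTO balance_snapshot VALUES (?, ?, ?)",
        [(period, account_id, balance) for account_id, balance in rows],
    )
    conn.commit()


load_snapshot("1997-01", [(1, 100.0)])
load_snapshot("1997-02", [(1, 150.0)])

# Reads can compare periods because earlier snapshots are still present.
history = conn.execute(
    "SELECT period, balance FROM balance_snapshot"
    " WHERE account_id = 1 ORDER BY period"
).fetchall()
print(history)
```

The primary key on (period, account_id) is one possible way to make an accidental reload of the same period fail rather than silently overwrite history.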
5. Client/Server
• A data warehouse uses the client/server
architecture to provide easy access for end
users.
6. Real Time
• Data warehouses provide real-time, or active,
data access and analysis capabilities.
7. Metadata
• Metadata is data that describes other data.
• It provides information about a certain item’s
content.
• E.g., information about how long a
document is, who the author is, when the
document was written, and a short summary
of the document.
• Structural metadata (data describing the
structure of data)
• Semantic metadata (data describing the
meaning of data)
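The two kinds of metadata can be illustrated with a small sketch. The dictionaries and the describe helper below are hypothetical; they only show how structural and semantic descriptions of the same column might sit side by side.

```python
# Hypothetical metadata for one warehouse column; all names are illustrative.
structural_metadata = {
    "table": "customer",
    "column": "balance",
    "type": "DECIMAL(12,2)",
    "nullable": False,
}
semantic_metadata = {
    "column": "balance",
    "meaning": "End-of-month account balance",
    "unit": "USD",
    "source_system": "savings",
}


def describe(column):
    """Combine structural and semantic metadata into a one-line description."""
    s, m = structural_metadata, semantic_metadata
    if not (s["column"] == m["column"] == column):
        raise KeyError(column)
    return f'{s["table"]}.{column} ({s["type"]}): {m["meaning"]} [{m["unit"]}]'


print(describe("balance"))
```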
Data Marts
• A data mart is usually smaller and focuses on a
particular subject or department.
• It typically consists of a single subject area.
• A data mart can be dependent or independent.
Data Warehouses Versus Data Marts
Property              Data Warehouse     Data Mart
Scope                 Enterprise         Department
Subject               Multiple           Single-subject
Data source           Many               Few
Size (typical)        100 GB to >1 TB    <100 GB
Implementation time   Months to years    Months
Data Mart
– Dependent data mart
A subset that is created directly from a data
warehouse
– Independent data mart
A small data warehouse designed for a strategic
business unit or a department
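A dependent data mart can be sketched as a subset derived directly from the warehouse. The example below uses sqlite3 and models the mart as a view. The table warehouse_sales, the view marketing_mart, and the sample rows are assumptions for illustration; real dependent marts are often materialized as separate tables instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical enterprise warehouse table covering several departments.
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "north", 10.0),
        ("finance", "north", 7.0),
        ("marketing", "south", 5.0),
    ],
)

# Dependent data mart: a single-department subset created from the warehouse.
conn.execute(
    "CREATE VIEW marketing_mart AS"
    " SELECT region, amount FROM warehouse_sales WHERE dept = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
print(mart_rows)
```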
Dependent Data Mart
[Diagram: operational systems, flat files, and external data feed the data warehouse, which in turn feeds the Marketing, Sales, Finance, and Human Resources data marts.]
Independent Data Mart
[Diagram: operational systems, flat files, and external data feed a Sales or Marketing data mart directly, with no data warehouse in between.]
Operational Data stores
• Recent form of customer information file.
• Contents are updated throughout the course
of business operations.
• Used for short term decisions
• It stores only recent information
• Short term memory
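The "short term memory" behavior of an operational data store can be sketched as a retention filter. The 30-day window, the record layout, and the helper keep_recent are assumptions for illustration; the slides do not specify a retention period.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # assumed window; the slides do not give a figure


def keep_recent(records, today):
    """Drop records last updated before the retention cutoff."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [r for r in records if r["updated"] >= cutoff]


ods_records = [
    {"customer_id": 1, "updated": date(2024, 3, 20)},
    {"customer_id": 2, "updated": date(2024, 1, 5)},
]
recent = keep_recent(ods_records, today=date(2024, 3, 31))
print(recent)
```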
Enterprise Data warehouse
• Large scale data warehouse that is used across
the enterprise for decision support.
Data Warehousing Process
• Data Sources (from legacy systems and external
data providers)
• Data extraction (using commercial software
called ETL)
• Data loading
• Comprehensive database
• Metadata
• Middleware tools (enable access to data
warehouse)
Data Warehouse Architecture
• The data warehouse itself
• Data acquisition software
• Client software, which allows users to access
and analyze data from the warehouse.
Data Integration and the Extraction,
Transformation, and Load (ETL) Process
• A major purpose of a data warehouse is to
integrate data from multiple systems.
Continued…
• Various integration technologies enable data and metadata
integration:
I. Enterprise application integration (EAI): provides a vehicle,
in software, for pushing data from source systems into the
data warehouse.
II. Enterprise information integration (EII): provides real-time data
integration from a variety of sources; a mechanism for pulling
data from source systems.
Continued…
III. Extraction, transformation, and load (ETL): The ETL process is an
integral component of any data-centric project.
• It typically consumes about 70% of the time in a data-centric project.
• The ETL process consists of extraction (reading data from one
or more databases), transformation (converting the extracted
data from its previous form into the form it needs to be in),
and load (putting the data into the data warehouse).
• The purpose of the ETL process is to load the warehouse with
integrated and cleansed data.
• Data can come from any source, such as flat files or Excel
spreadsheets.
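The three ETL steps can be sketched with Python's standard csv and sqlite3 modules. This is a minimal sketch, not a production pipeline: the in-memory flat file, the customer table, and the function names extract, transform, and load are assumptions chosen to mirror the steps described above.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract; in practice this would be a real file.
flat_file = io.StringIO(
    "customer_id,name,balance\n1, a. rao ,1200\n2,B. KAUR,800.50\n"
)


def extract(source):
    """Extraction: read raw rows from a CSV-style source."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transformation: convert types and normalize names."""
    return [
        (int(r["customer_id"]), r["name"].strip().title(), float(r["balance"]))
        for r in rows
    ]


def load(conn, rows):
    """Load: put the transformed rows into the warehouse table."""
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER, name TEXT, balance REAL)")
load(conn, transform(extract(flat_file)))

loaded = conn.execute("SELECT * FROM customer ORDER BY customer_id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions makes it straightforward to swap the source (for example, a spreadsheet export) without touching the load step.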
Continued…
• Any data quality issues pertaining to the source files need to
be corrected before the data are loaded into the data
warehouse.
• The process of loading data into a data warehouse can be
performed through data transformation tools or using
programming languages
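Correcting quality issues before loading can be sketched as a validation pass that separates clean rows from rows needing attention. The specific checks (missing or duplicate customer_id, non-numeric balance) and the helper split_by_quality are illustrative assumptions; real rules depend on the source systems.

```python
def split_by_quality(rows):
    """Separate rows that pass basic checks from rows needing correction."""
    clean, rejected = [], []
    seen_ids = set()
    for row in rows:
        problems = []
        if row.get("customer_id") in (None, ""):
            problems.append("missing customer_id")
        elif row["customer_id"] in seen_ids:
            problems.append("duplicate customer_id")
        try:
            float(row.get("balance", ""))
        except (TypeError, ValueError):
            problems.append("non-numeric balance")
        if problems:
            rejected.append((row, problems))
        else:
            seen_ids.add(row["customer_id"])
            clean.append(row)
    return clean, rejected


# Hypothetical source rows with two deliberate quality problems.
source_rows = [
    {"customer_id": "1", "balance": "1200"},
    {"customer_id": "1", "balance": "900"},
    {"customer_id": "", "balance": "n/a"},
]
clean_rows, rejected_rows = split_by_quality(source_rows)
print(len(clean_rows), len(rejected_rows))
```

Only clean_rows would go on to the load step; rejected_rows carries the reasons so the source data can be fixed.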
ETL Process
[Diagram: data from packaged applications, legacy systems, other internal applications, and transient data sources passes through Extract, Transform, Cleanse, and Load into the data warehouse and data marts.]
Data Warehouse Development
Benefits:
• End users can perform extensive analysis in
numerous ways.
• A consolidated view of corporate data is
possible.
• Better and more timely information.
• Data access is simplified.
Data warehouse development
approaches
• Inmon Model
• Kimball Model
Inmon Model
• EDW approach
• Emphasis on top-down development
• Inmon’s approach starts with an enterprise
data warehouse, creating data marts as
subsets if appropriate.
Kimball Model
• Data mart approach
• Emphasis on bottom-up development
• Kimball’s approach starts with data marts,
consolidating them into an EDW later if
appropriate.
Best Model
• There is no one-size-fits-all strategy for data
warehousing.
[Figure: DW Development Approaches]
Similarities and differences between the Inmon and
Kimball data warehouse development approaches
• Similarities: Both methods can produce an enterprise data
warehouse and subset data marts.
• Differences: Inmon’s approach starts with an enterprise data
warehouse, creating data marts as subsets of that EDW if
appropriate. The focus is on proven, traditional methods and
technologies. Kimball’s approach starts with data marts, consolidating
them into an EDW later if appropriate. It focuses on creating a
useful end-user capability quickly.
Real Time Data Warehouse
• Real time data warehousing is the process of
loading and providing data via the data
warehouse as they become available.
• Also known as active data warehousing.
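Loading data as they become available can be sketched as draining a feed of incoming events into the warehouse whenever new ones arrive. The queue below stands in for a hypothetical change feed from source systems, and the txn table and drain_and_load helper are assumptions for illustration, not a description of any particular product.

```python
import sqlite3
from queue import Queue, Empty

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (txn_id INTEGER PRIMARY KEY, amount REAL)")
incoming = Queue()  # stands in for a hypothetical change feed


def drain_and_load(queue):
    """Load every event currently waiting; return how many were loaded."""
    loaded = 0
    while True:
        try:
            txn_id, amount = queue.get_nowait()
        except Empty:
            break
        conn.execute("INSERT INTO txn VALUES (?, ?)", (txn_id, amount))
        loaded += 1
    conn.commit()
    return loaded


incoming.put((1, 25.0))
first = drain_and_load(incoming)
incoming.put((2, 40.0))
incoming.put((3, 10.0))
second = drain_and_load(incoming)

# Queries see the new rows as soon as each small load commits.
total = conn.execute("SELECT SUM(amount) FROM txn").fetchone()[0]
print(first, second, total)
```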
Concerns about real-time BI
• Not all data should be updated continuously
• May be cost prohibitive
• May also be infeasible
Example
• Egg plc (egg.com) is the world’s largest online
bank.
• It provides banking, insurance, investments
and mortgages to more than 3.6 million
customers through its internet site.
• In 1998, Egg selected Sun Microsystems to
create a reliable, scalable, secure
infrastructure to support its more than 2.5
million daily transactions.
Continued…
• In 2001, the system was upgraded.
• This new customer data warehouse used Sun,
Oracle and SAS software products.
• The system provides near real-time data access.
• It provides data warehouse and data mining
services to users.
• Hundreds of sales and marketing campaigns are
constructed using near real time data.
• It enables faster decision making about specific
customers.
Data Warehouse Administration
• Due to its huge size, a DW requires especially strong
monitoring in order to sustain its efficiency,
productivity and security.
• The successful administration and management of a
data warehouse entails skills and proficiency.
• A data warehouse administrator should be familiar
with high performance software, hardware and
networking technologies.
DW Scalability and Security
• Scalability
– The main issues pertaining to scalability:
• The amount of data in the warehouse
• How quickly the warehouse is expected to grow
• The number of concurrent users
• The complexity of user queries
– Good scalability means that the cost of queries and other
data-access functions grows, ideally, linearly with the size
of the warehouse
• Security
– Emphasis on security and privacy
Security concerns involved in building a
data warehouse.
1. Laws and regulations, in the U.S. and elsewhere,
require certain safeguards on databases that contain
the type of information typically found in a DW.
2. The large amount of valuable corporate data in a data
warehouse can make it an attractive target.
3. The need to allow a wide variety of unplanned
queries in a DW makes it impractical to restrict end-user
access to specific, carefully constrained screens,
which is otherwise one way to limit potential violations.
Effective security in a data warehouse should
focus on four main areas:
• Step 1. Establishing effective corporate and security
policies and procedures. An effective security policy should
start at the top and be communicated to everyone in the
organization.
• Step 2. Implementing logical security procedures and
techniques to restrict access. This includes user
authentication, access controls, and encryption.
• Step 3. Limiting physical access to the data center
environment.
• Step 4. Establishing an effective internal control review
process for security and privacy.
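The access controls mentioned in Step 2 can be sketched as a role-to-column mapping applied before results reach the user. The roles, column names, and the authorized_view helper are hypothetical; a real warehouse would normally enforce this in the database's own permission system rather than in application code.

```python
# Hypothetical roles and the columns each may read; illustrative only.
ROLE_COLUMNS = {
    "analyst": {"region", "amount"},
    "auditor": {"region", "amount", "customer_name"},
}


def authorized_view(role, rows):
    """Return rows restricted to the columns the given role may read."""
    allowed = ROLE_COLUMNS.get(role)
    if allowed is None:
        raise PermissionError(f"unknown role: {role}")
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


sample_rows = [{"region": "north", "amount": 10.0, "customer_name": "A. Rao"}]
print(authorized_view("analyst", sample_rows))
```

Because unplanned queries make fixed screens impractical, filtering by role at the data level is one way to keep flexible querying while still limiting what each user can see.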
DIRECTV THRIVES WITH ACTIVE DATA
WAREHOUSING
• DIRECTV, which is known for its direct
television broadcast satellite service, has been
a regular contributor to the evolution of TV
with its advanced HD programming,
interactive features, digital video recording
services, and electronic program guides.
Problem
• DIRECTV faced the challenge of dealing with
high transactional data volumes created by
the large number of daily customer calls.
• Accommodating such a large data volume,
along with changing market conditions, was
one of the key challenges.
Solution
• DIRECTV used software solutions from Teradata and
GoldenGate to develop a product that
integrates its data assets in near real time
throughout the enterprise.
• The goal of the new data warehouse system
was to send fresh data to the call center at
least daily.
Results
• Sales personnel were able to retain customers.
