The document provides an overview of data warehousing concepts. It defines a data warehouse as a subject-oriented, integrated, time-variant, and non-volatile collection of data. It discusses the differences between OLTP and OLAP systems. It also covers data warehouse architectures, components, and processes. Additionally, it explains key concepts like facts and dimensions, star schemas, normalization forms, and metadata.
2. Intelligent Ideas. Swift Execution.
Agenda
• Overview of Data-warehousing
• Difference between OLTP and Data-warehouse(OLAP)
• Terminologies in the data-warehouse world
• Architecture of data warehouse
• Components of data warehouse
• Process flows in data warehouse
• Relational v/s Dimensional modelling
• Kimball v/s Inmon Approach
• Zachman Framework
• Facts and Dimensions
• Different kinds of schemas
• Different Kind of Keys
• Normalization
Overview of Datawarehousing
• The term "Data Warehouse" was first coined by Bill Inmon in 1990.
• “A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data.”
• Subject-oriented: the warehouse is organized around major subjects such as Finance, Marketing and Sales.
• Integrated: it presents a unified view of data coming from various sources such as SAP and legacy systems.
• Non-volatile: new data is added incrementally and does not replace existing data.
• Time-variant: data is accurate only as of a particular point in time and is refreshed at intervals such as daily, weekly or monthly.
Overview of Datawarehousing
• A data warehouse is a database that is kept separate from the organization's operational database.
• Data in the warehouse is not updated frequently.
• It holds consolidated historical data, which helps the organization analyze its business.
• A data warehouse helps executives organize, understand, and use their data to make strategic decisions.
• Data warehouse systems help integrate diverse application systems.
• A data warehouse system supports analysis of consolidated historical data.
• Used in financial services, banking, consumer goods, retail, etc.
Difference Between OLAP and OLTP
Data warehouse (OLAP) | OLTP
Historical processing of information | Day-to-day processing of transactions
Used by executives, managers and analysts | Used by clerks, DBAs and other database professionals
Used to analyze the business | Used to run the business
Focuses on information out | Focuses on data in
Based on star, snowflake and fact constellation schemas | Based on the entity-relationship model
Provides a multidimensional view of summarized and consolidated data | Provides a relational view of highly detailed data
Number of records accessed is in the millions | Number of records accessed is in the tens or hundreds
Database size is 100 GB to 100 TB | Database size is 100 MB to 100 GB
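To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sales table, its columns and the sample rows are hypothetical and not part of the original slides. The point is only the shape of the two queries: an OLTP-style lookup touches one detailed row, while an OLAP-style query summarizes many rows.

```python
import sqlite3

# Hypothetical sales table with a handful of illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [(1, "North", 2023, 100.0), (2, "North", 2024, 150.0),
     (3, "South", 2023, 200.0), (4, "South", 2024, 50.0)],
)

# OLTP style: fetch one detailed record by its key.
oltp_row = conn.execute("SELECT * FROM sales WHERE order_id = ?", (3,)).fetchone()

# OLAP style: summarize many records along a dimension (here, region).
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(oltp_row)
print(olap_rows)
```

In a real deployment the two workloads would normally run on separate systems, as the slides describe; a single in-memory database is used here only to keep the sketch short.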
Problems in Datawarehousing
Problems
• Underestimation of resources for data loading
• Hidden problems with source systems
• Required data not captured
• Increased end-user demands
• Data homogenization
• High demand for resources
• Data ownership
• High maintenance
• Long-duration projects
• Complexity of integration
Terminologies in Datawarehousing
Terminologies
Metadata: Metadata is data about data, that is, data used to describe other data.
In terms of the data warehouse:
• Metadata is the roadmap to the warehouse.
• Metadata describes the warehouse objects.
• Metadata acts as a directory and helps in locating the contents of the data warehouse.
Metadata Repository: An integral part of the data warehouse. It contains the following metadata:
• Business metadata
• Operational metadata
• Data for mapping the operational environment to the data warehouse
• Algorithms for summarization
Data Cube: A data cube helps us represent data in multiple dimensions. It is defined by
dimensions and facts.
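As a rough illustration of a cube defined by dimensions and facts, the sketch below stores one fact (units sold) per combination of three dimensions. The dimension names, the sample values and the helper functions slice_cube and aggregate are all hypothetical; a real OLAP engine would store and index the cube very differently.

```python
from collections import defaultdict

# Hypothetical cube: (product, region, quarter) -> units sold.
DIMENSIONS = ("product", "region", "quarter")
cube = {
    ("Laptop", "North", "Q1"): 10,
    ("Laptop", "South", "Q1"): 7,
    ("Phone", "North", "Q1"): 20,
    ("Phone", "North", "Q2"): 25,
}

def slice_cube(cube, dimension, value):
    """Keep only the cells where one dimension has a fixed value."""
    axis = DIMENSIONS.index(dimension)
    return {key: fact for key, fact in cube.items() if key[axis] == value}

def aggregate(cube, keep):
    """Sum the fact over every dimension that is not listed in `keep`."""
    axes = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for key, fact in cube.items():
        totals[tuple(key[a] for a in axes)] += fact
    return dict(totals)

north = slice_cube(cube, "region", "North")
by_product = aggregate(cube, ["product"])
print(north)
print(by_product)
```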
Data Mart: A subset of organization-wide data that is specific to one group of people within the organization.
• Data marts are smaller in size.
• Data marts are customized for a department.
• Data marts are flexible.
• The source of a data mart is a departmentally structured data warehouse.
• Implementation generally takes weeks rather than months or years.
Datawarehouse Architecture
Architecture of a typical data warehouse
[Figure: architecture diagram. Labelled components: operational data sources 1 to n; operational data store (ODS); load manager; warehouse manager; query manager; DBMS; metadata; detailed data; lightly summarized data; highly summarized data; archive/backup data; end-user access tools (reporting, query, application development and EIS (executive information system) tools; OLAP (online analytical processing) tools; data mining).]
Datawarehouse Architecture
Components of Data warehouse
• Query Manager
• Detailed and summarized data
• Archive/backup data
• End user access tools
Process flow in Data warehouse
• Extract and load data
• Cleaning and transformation of data
• Backup and archive data
• Managing queries
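The first two steps of the process flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw rows, the field names and the functions extract, clean_and_transform and load are hypothetical, and the "warehouse" is a plain list rather than a real database.

```python
# Hypothetical raw rows as they might arrive from an operational source.
raw_rows = [
    {"customer": " alice ", "amount": "100.50"},
    {"customer": "BOB", "amount": "n/a"},   # unparseable amount: rejected during cleaning
    {"customer": "Carol", "amount": "20"},
]

def extract(source):
    """Extract: read rows from the source (here, an in-memory list)."""
    return list(source)

def clean_and_transform(rows):
    """Clean and transform: trim and normalize names, parse amounts."""
    cleaned, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        cleaned.append({"customer": row["customer"].strip().title(), "amount": amount})
    return cleaned, rejected

def load(rows, warehouse):
    """Load: append new rows; existing warehouse rows are never replaced."""
    warehouse.extend(rows)
    return warehouse

warehouse = []
cleaned, rejected = clean_and_transform(extract(raw_rows))
load(cleaned, warehouse)
print(warehouse)
print(len(rejected))
```

Keeping the rejected rows, instead of silently dropping them, makes it easier to trace problems back to the source systems.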
Relational v/s Dimensional Modelling
Relational Modelling | Dimensional Modelling
Entity Relationship (ER) model | Facts and dimensions, star schema
Normalization rules | Fewer tables, but with duplicate/redundant data
Many tables using joins | Easier for business users to understand
History tables using natural keys | Slowly changing dimensions, surrogate keys
Good for indirect end-user access of data | Good for end-user access of data
Kimball v/s Inmon approach
Kimball | Inmon
Logical data warehouse made up of various data marts | Enterprise data warehouse
Business driven, users participate actively | IT driven, users participate passively
Decentralized data marts | Centralized, atomic, normalized tables
Independent dimensional data marts for analytics | Dependent data marts are created later
2 tier (data mart, cube): less ETL, no data duplication | 3 tier (data warehouse, data mart, cubes): data duplication
Zachman Enterprise framework
Row 1 – Scope
• External products and drivers
• Business Function Modelling
Row 2 – Enterprise Model
• Business Process Models
Row 3 – System Model
• Logical Models
• Requirement Definitions
Row 4 – Technology Model
• Physical Models
• Solution Definition and Development
Row 5 – As Built
• As Built
• Deployment
Row 6 – Functioning Enterprise
• Functioning Enterprise
• Evaluation
Facts and Dimensions
Facts
• Business facts (or measures), generally numeric.
• Can be aggregated.
• Examples are sales price, quantity and revenue.
• A fact table contains keys to the dimension tables as well as the measurable facts.
• Fact tables can grow very large, to millions or billions of rows.
Dimensions
• A data set composed of individual, non-overlapping data elements.
• Commonly used dimensions are date, customer and product.
• Customer dimension attributes may be first name, last name, birth date and gender.
• A dimension contains a hierarchy of attributes, e.g. date dimension: year > quarter > month > week > date.
• Going up a level in the hierarchy is called rolling up.
• Going down a level in the hierarchy is called drilling down.
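Rolling up and drilling down amount to aggregating the same facts at a coarser or finer level of a dimension hierarchy. The sketch below shows this for part of the date hierarchy (year > quarter > month). The revenue figures and the summarize helper are hypothetical, and weeks and dates are left out to keep it short.

```python
from collections import defaultdict

# Hypothetical monthly revenue facts keyed by (year, quarter, month).
LEVELS = ("year", "quarter", "month")
facts = [
    ((2024, "Q1", "Jan"), 100),
    ((2024, "Q1", "Feb"), 150),
    ((2024, "Q2", "Apr"), 200),
    ((2025, "Q1", "Jan"), 50),
]

def summarize(facts, level):
    """Aggregate revenue at the given level of the date hierarchy."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for date_key, revenue in facts:
        totals[date_key[:depth]] += revenue
    return dict(totals)

by_quarter = summarize(facts, "quarter")  # starting level
by_year = summarize(facts, "year")        # roll up: quarter -> year
by_month = summarize(facts, "month")      # drill down: quarter -> month
print(by_year)
print(by_quarter)
```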
Different Kinds of schema
• Star Schema
• Snowflake Schema
• Fact Constellation Schema
Star schema
• One fact table and multiple dimension tables.
• Results can be retrieved easily.
• Dimension tables hold descriptive data reflecting the dimensions, or attributes, of the facts.
• The fact table sits in the middle, surrounded by the dimension tables.
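A star schema of this shape can be sketched with Python's built-in sqlite3 module. The table names (fact_sales, dim_date, dim_product, dim_customer), their columns and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
-- The fact table holds a foreign key to each dimension plus numeric measures.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,
    revenue      REAL
);
INSERT INTO dim_date VALUES (1, 2024, 'Jan'), (2, 2024, 'Feb');
INSERT INTO dim_product VALUES (1, 'Laptop', 'Hardware'), (2, 'Antivirus', 'Software');
INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO fact_sales VALUES (1, 1, 1, 2, 2000.0), (1, 2, 2, 5, 250.0), (2, 1, 2, 1, 1000.0);
""")

# One join per dimension is enough to answer a typical business question.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(rows)
```

Each dimension is reached from the fact table in a single join, which is what makes results easy to retrieve.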
Snowflake Schema
• A refinement of the star schema.
• Some dimensional hierarchies are normalized into a set of smaller dimension tables.
• The smaller tables connect to dimension tables rather than to the fact table.
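As a sketch of that normalization, the example below moves the category level of a product dimension into its own table, which links to the dimension table and not to the fact table. All table names, columns and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category level of the product hierarchy gets its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    revenue     REAL
);
INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_product VALUES (1, 'Laptop', 1), (2, 'Monitor', 1), (3, 'Antivirus', 2);
INSERT INTO fact_sales VALUES (1, 2000.0), (2, 300.0), (3, 250.0);
""")

# Reaching the category now takes two joins: fact -> product -> category.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p  ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(rows)
```

The trade-off is less redundancy in the dimension data in exchange for an extra join per normalized level.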
Different Kind of keys
Natural Key: A column or set of columns, taken from the business data itself, that uniquely identifies each row in a table. E.g. the natural key for a customer table may be the customer ID.
Surrogate Key: An unintelligent ("dumb") key that is not derived from application data. It is generated artificially so that rows stay uniquely identifiable when the data in fact and dimension tables changes.
Alternate Key: Any candidate key that was not selected as the primary key.
Candidate Key: A field or combination of fields that can act as the primary key of the table.
Compound Key: Also called a composite or concatenated key; it consists of two or more attributes.
Primary Key: The candidate key chosen to identify each row uniquely.
Foreign Key: An attribute or combination of attributes in one base table that points to the primary key of another table.
Super Key: An attribute or combination of attributes that uniquely identifies a row in the table.
Secondary Key: Attributes that are not a super key but are still used to look up records in the database, e.g. Name.
Artificial Key: If no obvious key is available, a key is created by assigning a unique number to each row.
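The difference between a natural key and a surrogate key is easiest to see when a dimension row changes over time. The sketch below keeps every version of a customer as a separate row: the natural key (customer_id) repeats, while the surrogate key (customer_key) is unique per version. This is one common pattern, often called a Type 2 slowly changing dimension; the table, columns and record_change helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT NOT NULL,                      -- natural key from the source system
        city         TEXT,
        is_current   INTEGER
    )
""")

def record_change(conn, customer_id, city):
    """Retire the current row for this customer and add a new version."""
    conn.execute(
        "UPDATE dim_customer SET is_current = 0 WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    )
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, city, is_current) VALUES (?, ?, 1)",
        (customer_id, city),
    )
    return cur.lastrowid  # the surrogate key of the new version

first_key = record_change(conn, "C123", "Chester")
second_key = record_change(conn, "C123", "Boston")  # the same customer moves city
history = conn.execute(
    "SELECT customer_key, customer_id, city, is_current FROM dim_customer ORDER BY customer_key"
).fetchall()
print(history)
```

Fact rows that reference customer_key keep pointing at the version of the customer that was valid when the fact was recorded.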
First Normal Form
First Normal Form: A relation is in first normal form if every attribute holds a single value, there are no repeating groups of columns, and each record is unique.
Issues with the un-normalized Customer table:
• Which customer has telephone number 01245847698? The query has to search two columns.
• What if a customer has more than two telephone numbers?
• What if a customer has the same value for Contact No1 and Contact No2?
Solution:
Use two tables, one for Customer and one for Contact No.
Before:
Customer
Cust_Id | Name | Contact No1 | Contact No2
123 | Navin Kumar | 01202536145 |
456 | Richard Branson | 01245847698 | 01245847699
789 | Harry DSouza | 01132445465 |
After:
Customer
Cust_Id | Name
123 | Navin Kumar
456 | Richard Branson
789 | Harry DSouza
Contact
Cust_Id | Contact No
123 | 01202536145
456 | 01245847698
456 | 01245847699
789 | 01132445465
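The same split can be sketched in Python using the customer rows from the slide. The function names to_first_normal_form and owner_of are hypothetical, and placing a customer's only number in the first contact column is an assumption about the original table layout.

```python
# Un-normalized rows from the slide: one column per phone number.
customers_wide = [
    {"cust_id": 123, "name": "Navin Kumar", "contact_no1": "01202536145", "contact_no2": None},
    {"cust_id": 456, "name": "Richard Branson", "contact_no1": "01245847698", "contact_no2": "01245847699"},
    {"cust_id": 789, "name": "Harry DSouza", "contact_no1": "01132445465", "contact_no2": None},
]

def to_first_normal_form(rows):
    """Split wide rows into a Customer table and a Contact table."""
    customers, contacts = [], []
    for row in rows:
        customers.append((row["cust_id"], row["name"]))
        # A set drops the case where both columns hold the same number.
        numbers = {row["contact_no1"], row["contact_no2"]} - {None}
        for number in sorted(numbers):
            contacts.append((row["cust_id"], number))
    return customers, contacts

customers, contacts = to_first_normal_form(customers_wide)

def owner_of(number):
    """Answer 'which customer has this number?' by searching one column only."""
    return [cust_id for cust_id, contact in contacts if contact == number]

print(contacts)
print(owner_of("01245847698"))
```

A customer with a third number would simply get one more Contact row, with no change to the table structure.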
Second Normal Form
Second Normal Form: A relation is in second normal form if it is in first normal form and every non-key field depends on the whole primary key, not on just part of it.
Before:
ISBN | Title | Pages | Publisher
0072958863 | Database System Concepts | 1168 | 1
0471688465 | Operating System Concepts | 844 | 1
0072958863 | Database Management Systems | 1164 | 2
After:
ISBN | Publisher Id
0072958863 | 1
0072958863 | 2
0471688465 | 1
Title | Pages
Database System Concepts | 1168
Operating System Concepts | 844
Database Management Systems | 1164
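A partial dependency can be spotted by checking whether part of a composite key is enough to determine a non-key field. The sketch below does this on a small, hypothetical order-lines table (not the book table from the slide) whose primary key is assumed to be (order_id, product_id). The determines helper is also hypothetical, and a check on sample rows can only show that a dependency is not contradicted by the data, not prove that it holds in general.

```python
def determines(rows, lhs, rhs):
    """Return True if, in these rows, the columns in `lhs` determine the columns in `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical order lines; the primary key is (order_id, product_id).
order_lines = [
    {"order_id": 1, "product_id": "P1", "product_name": "Laptop", "quantity": 2},
    {"order_id": 1, "product_id": "P2", "product_name": "Mouse", "quantity": 1},
    {"order_id": 2, "product_id": "P1", "product_name": "Laptop", "quantity": 5},
]

# product_name depends on product_id alone: a partial dependency, so not 2NF.
partial = determines(order_lines, ["product_id"], ["product_name"])
# quantity is not determined by product_id alone; it needs the whole key.
quantity_on_part = determines(order_lines, ["product_id"], ["quantity"])
quantity_on_key = determines(order_lines, ["order_id", "product_id"], ["quantity"])
print(partial, quantity_on_part, quantity_on_key)
```

Moving product_name into its own table keyed by product_id would remove the partial dependency.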
Third Normal Form
Third Normal Form: A relation is in third normal form if it is in second normal form and no non-key field depends on another non-key field; every non-key field depends only on the primary key.
Before:
Author | BookName | ISBN | Printed At
Chetan Bhagat | Two States | 12234 | Bangalore
Amish Tripathi | The Secret of the Nagas | 45567 | Kolkata
J K Rowling | The Casual Vacancy | 75535 | London
After:
Author | BookName
Chetan Bhagat | Two States
Amish Tripathi | The Secret of the Nagas
J K Rowling | The Casual Vacancy
ISBN | Printed At
12234 | Bangalore
45567 | Kolkata
75535 | London
Boyce Codd Normal Form
BCNF (Boyce-Codd Normal Form): A relation is in BCNF if and only if every determinant in the table is a candidate key. Every relation in BCNF is also in 3NF; BCNF is a stricter form of 3NF.
Before:
Patient
Patient# | Patient Name | Address
1111 | Johny Brown | 55, Boston Road, Chester
1234 | Anita Sood | 12, Brook Road, Princetown
2345 | Mary Jones | Hilton Road, New Jersey, NY
After:
Patient# | Patient Name
1111 | Johny Brown
1234 | Anita Sood
2345 | Mary Jones
Patient# | Address
1111 | 55, Boston Road, Chester
1234 | 12, Brook Road, Princetown
2345 | Hilton Road, New Jersey, NY
Other Normal Forms
Fourth Normal Form: A relation is in fourth normal form if it is in BCNF and all of its multivalued dependencies are trivial.
Fifth Normal Form: A relation is in fifth normal form if every join dependency in the relation is implied by the keys of the relation.