INTRODUCTION TO DATA 
WAREHOUSING 
BY 
INFORMATICATRAININGCLASSES 
PHONE : (404)-900-9988 
EMAIL : INFO@INFORMATICATRAININGCLASSES.COM 
WEBSITE : WWW.INFORMATICATRAININGCLASSES.COM
DATA WAREHOUSE
• Maintains historical data
• Supports analysis for a better understanding of the business
• Enables better decision making
• Definition: A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data that is used primarily in organizational decision making.
-- Bill Inmon, Building the Data Warehouse, 1996
SUBJECT ORIENTED
• A data warehouse is organized around subjects such as sales, product, and customer.
• It focuses on modeling and analysis of data for decision makers.
• It excludes data that is not useful in the decision support process.
INTEGRATED
• A data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing is applied to ensure consistency.
[Figure: RDBMS, legacy system, and flat file sources pass through data processing and data transformation into the data warehouse.]
NON-VOLATILE
• Once recorded, data is mostly not updated.
• A data warehouse requires only two data operations:
- Incremental loading of data
- Access of data
[Figure: load and access operations on the warehouse]
TIME VARIANT
• Provides information from a historical perspective, e.g. the past 5-10 years.
• Every key structure contains, either implicitly or explicitly, an element of time.
WHY DATA WAREHOUSE?
Problem Statement:
• ABC Pvt Ltd is a company with branches in the USA, the UK, Canada, and India.
• The Sales Manager wants a quarterly sales report across the branches.
• Each branch has a separate operational system where sales transactions are recorded.
WHY DATA WAREHOUSE?
[Figure: the Sales Manager gets the quarterly sales figure from each branch (USA, UK, Canada, India) and manually calculates the sales figure across branches.]
What if the manager needs a daily sales report across the branches?
WHY DATA WAREHOUSE? 
Solution: 
• Extract sales information from each database. 
• Store the information in a common repository at a single site.
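As a rough illustration of this solution, the sketch below copies sales rows from four per-branch extracts into one shared table and then answers the quarterly question with a single query. The in-memory SQLite database, the `sales` table, the `quarter_of` helper, and the sample figures are all hypothetical; real branch systems would be separate databases reached through an ETL tool.

```python
import sqlite3

# Hypothetical per-branch extracts: (sale_date, amount) rows from each operational system.
branch_sales = {
    "USA": [("2010-01-15", 1200.0), ("2010-04-02", 800.0)],
    "UK": [("2010-02-10", 500.0)],
    "CANADA": [("2010-03-30", 300.0)],
    "INDIA": [("2010-01-05", 700.0), ("2010-05-20", 400.0)],
}

def quarter_of(iso_date: str) -> str:
    """Return a label such as '2010-Q1' for an ISO date string."""
    year, month = iso_date[:4], int(iso_date[5:7])
    return f"{year}-Q{(month - 1) // 3 + 1}"

# The common repository: one table holding sales from every branch.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (branch TEXT, quarter TEXT, amount REAL)")
for branch, rows in branch_sales.items():
    warehouse.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(branch, quarter_of(d), amt) for d, amt in rows],
    )

# One query replaces the manual per-branch calculation.
quarterly_totals = dict(
    warehouse.execute(
        "SELECT quarter, SUM(amount) FROM sales GROUP BY quarter ORDER BY quarter"
    ).fetchall()
)
print(quarterly_totals)
```

Switching from a quarterly to a daily report would then only mean storing and grouping by a finer time column, not repeating the manual work per branch.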
WHY DATA WAREHOUSE?
[Figure: data from the USA, UK, Canada, and India branches flows into the data warehouse, which the Sales Manager reaches through query and analysis tools.]
CHARACTERISTICS OF A DATA WAREHOUSE
• Relational or multidimensional database
• Designed for query and analysis rather than transaction processing
• Holds historical data derived from transactions
• Consolidates multiple data sources
• Separates the query load from transactions
• Mostly non-volatile
• Large amounts of data, on the order of terabytes
WHEN WE SAY LARGE - WE MEAN IT!
• Terabytes -- 10^12 bytes: Yahoo! (300 terabytes and growing)
• Petabytes -- 10^15 bytes: geographic information systems
• Exabytes -- 10^18 bytes: national medical records
• Zettabytes -- 10^21 bytes: weather images
• Yottabytes -- 10^24 bytes: intelligence agency videos
OLTP VS DATA WAREHOUSE (OLAP)
                              OLTP         Data Warehouse (OLAP)
Indexes                       Few          Many
Data                          Normalized   Generally de-normalized
Joins                         Many         Some
Derived data and aggregates   Rare         Common
DATA WAREHOUSE ARCHITECTURE
[Figure: operational systems and flat files feed an ETL (Extract, Transform and Load) process that loads the data warehouse; the warehouse supplies the sales, inventory, and generic data marts, which serve analysis, data mining, and reporting.]
ETL
ETL stands for Extract, Transform and Load.
• Data is distributed across different sources
– Flat files, streaming data, DB systems, XML, JSON
• Data can be in different formats
– CSV, key-value pairs
• Different units and representations
– Country: IN or India
– Date: 20 Nov 2010 or 20101120
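A minimal sketch of how such differences in representation might be reconciled: a country written as IN or India, and a date written as 20 Nov 2010 or 20101120, are each mapped to one standard form. The lookup table, the list of date layouts, and the function names are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical lookup: map the spellings seen in the sources to one code.
COUNTRY_CODES = {"in": "IN", "india": "IN", "us": "US", "usa": "US"}

# Date layouts assumed to occur in the sources.
DATE_FORMATS = ("%d %b %Y", "%Y%m%d", "%Y-%m-%d")

def normalize_country(value: str) -> str:
    """Return the standard code for a country spelling; raises KeyError if unknown."""
    return COUNTRY_CODES[value.strip().lower()]

def normalize_date(value: str) -> str:
    """Try each known layout and return the date in ISO form (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

print(normalize_country("India"), normalize_date("20 Nov 2010"))
```

Unknown values raise an error here instead of passing through silently, so that inconsistent source data is caught before it reaches the warehouse.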
ETL FUNCTIONS
• Extract
– Collect data from different sources
– Parse data
– Remove unwanted data
• Transform
– Project
– Generate surrogate keys
– Encode data
– Join data from different sources
– Aggregate
• Load
ETL STEPS
• The first step in the ETL process is mapping the data between the source systems and the target database.
• The second step is cleansing the source data in the staging area.
• The third step is transforming the cleansed source data.
• The fourth step is loading the data into the target system.
[Figure: sample data before and after ETL processing]
ETL GLOSSARY
Mapping:
Defining the relationship between source and target objects.
Cleansing:
The process of resolving inconsistencies in source data.
Transformation:
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include aggregating data and integrating data from multiple sources.
Staging Area:
A place where data is processed before entering the warehouse.
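The mapping, cleansing, transformation, and loading terms can be tied together in a small sketch. Everything here is hypothetical: the staging rows, the field and column names, and the target tables `dim_customer` and `fact_sales`. An in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows as they might arrive in the staging area.
staging = [
    {"cust": "Asha ", "country": "India", "amt": "100.50"},
    {"cust": "Bob", "country": "IN", "amt": "20"},
    {"cust": "asha", "country": "IN", "amt": "9.50"},
]

# Mapping: source field -> target column.
MAPPING = {"cust": "customer_name", "country": "country_code", "amt": "amount"}

def cleanse(row):
    """Cleansing: resolve inconsistencies (whitespace, casing, country spelling, types)."""
    country = row["country"].strip()
    return {
        "cust": row["cust"].strip().title(),
        "country": "IN" if country.lower() in ("in", "india") else country,
        "amt": float(row["amt"]),
    }

def transform(rows):
    """Transformation: rename per the mapping, assign surrogate keys, aggregate per customer."""
    surrogate_keys, totals = {}, {}
    for row in rows:
        target = {MAPPING[k]: v for k, v in row.items()}
        name = target["customer_name"]
        key = surrogate_keys.setdefault(name, len(surrogate_keys) + 1)
        totals[key] = totals.get(key, 0.0) + target["amount"]
    customers = [(key, name) for name, key in surrogate_keys.items()]
    return customers, sorted(totals.items())

def load(conn, customers, totals):
    """Loading: write the results into the target tables."""
    conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT)")
    conn.execute("CREATE TABLE fact_sales (customer_key INTEGER, total_amount REAL)")
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", totals)

conn = sqlite3.connect(":memory:")
customers, totals = transform([cleanse(r) for r in staging])
load(conn, customers, totals)
loaded = conn.execute(
    "SELECT c.customer_name, f.total_amount FROM fact_sales f "
    "JOIN dim_customer c USING (customer_key) ORDER BY c.customer_key"
).fetchall()
print(loaded)
```

Because cleansing runs before the surrogate keys are assigned, "Asha " and "asha" end up as one customer with one key, which is the point of doing it in the staging area first.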
DIMENSION
• Categorizes the data. For example: time, location, etc.
• A dimension can have one or more attributes. For example, day, week, and month are attributes of the time dimension.
• Role of dimensions in data warehousing:
- Slice and dice
- Filter by dimensions
TYPES OF DIMENSIONS
• Conformed Dimension - A dimension that is shared across fact tables.
• Junk Dimension - A convenient grouping of flags and indicators. For example, payment method and shipping method.
• Degenerate Dimension - A dimension key that has no attributes and hence does not have its own dimension table. For example, transaction number or invoice number. Values of these dimensions are mostly unique within a fact table.
• Role-Playing Dimension - A dimension that plays different roles in fact tables depending on the context. For example, the Date dimension can be used for the order date, shipment date, and invoice date.
• Slowly Changing Dimension - A dimension whose data changes slowly, rather than on a regular, time-based schedule.
TYPES OF SLOWLY CHANGING DIMENSION
• Type 1 - The Type 1 method overwrites old data with new data, and therefore does not track historical data at all.
• Type 2 - The Type 2 method tracks historical data by creating multiple records for a given natural key in the dimension table, each with a separate surrogate key.
• Type 3 - The Type 3 method tracks changes using separate columns. Whereas Type 2 preserves unlimited history, Type 3 preserves limited history, because it is limited to the number of columns designated for storing historical data.
• Type 4 - The Type 4 method is usually referred to as using "history tables": one table keeps the current data, and an additional table keeps a record of all changes.
Types 1, 2 and 3 are commonly used.
Some books also discuss Types 0 and 6.
http://en.wikipedia.org/wiki/Slowly_changing_dimension
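To make the difference between Type 1 and Type 2 concrete, here is a small sketch that applies the same change (a customer moving city) both ways. The row layout, with `start_date` and `end_date` columns marking the validity of each version, is one common convention and is assumed here; the customer and dates are invented.

```python
from datetime import date

def apply_type1(dim_rows, customer_id, new_city):
    """Type 1: overwrite in place, so the previous city is lost."""
    for row in dim_rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city

def apply_type2(dim_rows, customer_id, new_city, change_date):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    for row in dim_rows:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            row["end_date"] = change_date
    dim_rows.append({
        "customer_key": max(r["customer_key"] for r in dim_rows) + 1,
        "customer_id": customer_id,   # natural key stays the same
        "city": new_city,
        "start_date": change_date,
        "end_date": None,             # None marks the current version
    })

def new_dimension():
    """Hypothetical starting state: one customer living in Atlanta."""
    return [{"customer_key": 1, "customer_id": "C100", "city": "Atlanta",
             "start_date": date(2009, 1, 1), "end_date": None}]

type1_dim = new_dimension()
apply_type1(type1_dim, "C100", "Boston")

type2_dim = new_dimension()
apply_type2(type2_dim, "C100", "Boston", date(2010, 6, 1))

print(len(type1_dim), len(type2_dim))
```

After the change, the Type 1 table still has one row, while the Type 2 table has two rows for the same natural key, so facts recorded before the move can keep pointing at the Atlanta version.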
FACTS
• Facts are values that can be examined and analyzed.
• For example: page views, unique users, pieces sold, profit.
• Fact and measure are synonymous.
• Types of facts:
– Additive - Measures that can be added across all dimensions.
– Non-Additive - Measures that cannot be added across any dimension.
– Semi-Additive - Measures that can be added across some dimensions but not others.
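The distinction between additive and semi-additive measures is easiest to see with numbers. The sketch below uses invented daily account snapshots: deposits are additive, while an end-of-day balance can be summed across accounts but not across days.

```python
# Hypothetical daily snapshot rows: (day, account, deposits_that_day, end_of_day_balance)
snapshots = [
    ("2010-11-01", "A", 100, 100),
    ("2010-11-01", "B", 50, 50),
    ("2010-11-02", "A", 20, 120),
    ("2010-11-02", "B", 0, 50),
]

# Additive: deposits can be summed across every dimension (accounts and days).
total_deposits = sum(dep for _, _, dep, _ in snapshots)

# Semi-additive: balances can be summed across accounts for a single day...
balance_on_day2 = sum(bal for day, _, _, bal in snapshots if day == "2010-11-02")

# ...but summing them across days counts the same money twice.
misleading_sum_over_time = sum(bal for _, acct, _, bal in snapshots if acct == "A")

# Across time, take the latest snapshot instead of a sum.
latest_balance_a = max((day, bal) for day, acct, _, bal in snapshots if acct == "A")[1]

print(total_deposits, balance_on_day2, misleading_sum_over_time, latest_balance_a)
```

Ratios and percentages are the usual non-additive case: they are recomputed from their additive parts at each level instead of being summed.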
HOW TO STORE DATA? 
Facts and Dimensions: 
1. Select the business process to model 
2. Declare the grain of the business process 
3. Choose the dimensions that apply to each fact table row 
4. Identify the numeric facts that will populate each fact table 
row
DIMENSION TABLE
• Contains attributes of dimensions, e.g. month is an attribute of the Time dimension.
• Can also have foreign keys to another dimension table
• Usually identified by a unique integer primary key called a surrogate key
FACT TABLE
• Contains facts
• Contains foreign keys to dimension tables
• Primary key: usually a composite key of all the foreign keys
TYPES OF SCHEMA USED IN DATA WAREHOUSE
• Star Schema
• Snowflake Schema
• Fact Constellation Schema
STAR SCHEMA
• Multi-dimensional data
• Dimension and fact tables
• A fact table with pointers to dimension tables
[Figure: star schema diagram]
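A minimal star schema sketch, using SQLite from Python. The table and column names (`dim_time`, `dim_product`, `dim_region`, `fact_sales`) and the sample rows are assumptions for illustration, not the schema from the original slide image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product TEXT, category TEXT);
CREATE TABLE dim_region  (region_key INTEGER PRIMARY KEY, region TEXT);
-- Fact table: foreign keys to each dimension plus the numeric facts.
CREATE TABLE fact_sales (
    time_key    INTEGER REFERENCES dim_time(time_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    region_key  INTEGER REFERENCES dim_region(region_key),
    units_sold  INTEGER,
    revenue     REAL,
    PRIMARY KEY (time_key, product_key, region_key)
);
INSERT INTO dim_time VALUES (1, '2010-11-20', '2010-11', 2010), (2, '2010-12-05', '2010-12', 2010);
INSERT INTO dim_product VALUES (1, 'Toaster', 'Kitchen Appliances'), (2, 'Kettle', 'Kitchen Appliances');
INSERT INTO dim_region VALUES (1, 'USA'), (2, 'INDIA');
INSERT INTO fact_sales VALUES (1, 1, 1, 3, 90.0), (1, 2, 2, 1, 25.0), (2, 1, 2, 2, 60.0);
""")

# A typical star join: group by a dimension attribute and sum the facts.
revenue_by_product = conn.execute("""
    SELECT p.product, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.product
    ORDER BY p.product
""").fetchall()
print(revenue_by_product)
```

Each dimension is one join away from the fact table, which is what gives the schema its star shape. The composite primary key on the fact table is built from the dimension foreign keys.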
SNOWFLAKE SCHEMA
• An extension of the star schema in which the dimension tables are partly or fully normalized.
• Dimension table hierarchies are broken down into simpler tables.
[Figure: snowflake schema diagram]
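For comparison, a snowflake sketch in which a product dimension is normalized into product, sub-category, and category tables. As before, the table names and rows are hypothetical and SQLite is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product hierarchy is split into its own tables instead of one wide dimension.
CREATE TABLE dim_category    (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_subcategory (subcategory_key INTEGER PRIMARY KEY, subcategory TEXT,
                              category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE dim_product     (product_key INTEGER PRIMARY KEY, product TEXT,
                              subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key));
CREATE TABLE fact_sales      (product_key INTEGER REFERENCES dim_product(product_key), revenue REAL);
INSERT INTO dim_category VALUES (1, 'Home Appliances');
INSERT INTO dim_subcategory VALUES (1, 'Kitchen Appliances', 1);
INSERT INTO dim_product VALUES (1, 'Toaster', 1), (2, 'Kettle', 1);
INSERT INTO fact_sales VALUES (1, 90.0), (2, 25.0), (1, 60.0);
""")

# Reaching the category takes a chain of joins through the normalized tables.
revenue_by_category = conn.execute("""
    SELECT c.category, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p     ON p.product_key = f.product_key
    JOIN dim_subcategory s ON s.subcategory_key = p.subcategory_key
    JOIN dim_category c    ON c.category_key = s.category_key
    GROUP BY c.category
""").fetchall()
print(revenue_by_category)
```

The trade-off is visible in the query: each category name is stored once, but a category-level report needs three joins where a star schema would need one.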
FACT CONSTELLATION SCHEMA
• A fact constellation schema allows dimension tables to be shared between fact tables.
• This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension.
• For example, separate fact tables for daily, weekly, and monthly reporting requirements.
[Figure: fact constellation schema. In this example, the dimension tables for time, item, and location are shared between the sales and shipping fact tables.]
OPERATIONS ON A DATA WAREHOUSE
• Drill Down
• Roll Up
• Slice & Dice
• Pivoting
DRILL DOWN
[Figure: time by product cube; drilling down the product dimension goes from category (e.g. Home Appliances) to sub-category (e.g. Kitchen Appliances) to product (e.g. Toaster).]
ROLL UP
[Figure: time hierarchies for rolling up from Day. Calendar levels: Month, Quarter, Year. Fiscal levels: Fiscal Week, Fiscal Month, Fiscal Quarter, Fiscal Year.]
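Roll up and drill down are the same aggregation run at different levels of a hierarchy. The sketch below assumes a simple calendar hierarchy (year, quarter, month, day) and invented daily revenue figures; the `aggregate` helper is hypothetical.

```python
from collections import defaultdict

# Hypothetical daily sales facts: (year, quarter, month, day, revenue)
daily_sales = [
    (2010, "Q4", "2010-10", "2010-10-03", 40.0),
    (2010, "Q4", "2010-11", "2010-11-20", 90.0),
    (2010, "Q4", "2010-11", "2010-11-21", 10.0),
    (2011, "Q1", "2011-01", "2011-01-15", 60.0),
]

# Number of leading columns that identify each level of the time hierarchy.
LEVELS = {"year": 1, "quarter": 2, "month": 3, "day": 4}

def aggregate(rows, level):
    """Sum revenue at the given level of the time hierarchy."""
    depth = LEVELS[level]
    totals = defaultdict(float)
    for row in rows:
        totals[row[:depth]] += row[-1]
    return dict(totals)

by_month = aggregate(daily_sales, "month")      # finer grain
by_quarter = aggregate(daily_sales, "quarter")  # roll up: month -> quarter
by_year = aggregate(daily_sales, "year")        # roll up again: quarter -> year
print(by_year)
```

Going from `by_year` back to `by_quarter` or `by_month` is the drill down direction. It only works because the detailed rows were kept, which is why the grain of the fact table matters.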
SLICE & DICE
[Figure: time by product cube; the slice Product = Toaster keeps only the time axis for that product.]
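A rough sketch of the two operations on a tiny cube held as a list of cells. The dimensions (product, region, quarter), the values, and the helper names `slice_cube` and `dice_cube` are assumptions for illustration.

```python
# Hypothetical cube cells: (product, region, quarter, units)
cells = [
    ("Toaster", "USA", "Q1", 5),
    ("Toaster", "INDIA", "Q1", 3),
    ("Toaster", "USA", "Q2", 4),
    ("Kettle", "USA", "Q1", 2),
    ("Kettle", "UK", "Q2", 7),
]

def slice_cube(rows, product):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[0] == product]

def dice_cube(rows, products, regions, quarters):
    """Dice: keep a sub-cube by restricting several dimensions at once."""
    return [r for r in rows if r[0] in products and r[1] in regions and r[2] in quarters]

toaster_slice = slice_cube(cells, "Toaster")
sub_cube = dice_cube(cells, {"Toaster", "Kettle"}, {"USA"}, {"Q1"})
print(toaster_slice)
print(sub_cube)
```

In a relational warehouse both operations amount to WHERE conditions on dimension attributes: one condition for a slice, several for a dice.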
PIVOTING
• Also called rotation
• Rotate on an axis
• Interchange rows and columns
[Figure: a Time by Product view pivoted into a Region by Product view]
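Interchanging rows and columns can be sketched with a nested dictionary standing in for a report. The products, regions, and numbers are invented, and `pivot` is a hypothetical helper.

```python
# Hypothetical report: rows are products, columns are regions.
by_product = {
    "Toaster": {"USA": 5, "INDIA": 3},
    "Kettle": {"USA": 2, "INDIA": 7},
}

def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_label, columns in table.items():
        for col_label, value in columns.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated

by_region = pivot(by_product)
print(by_region)
```

The values themselves do not change; only the axis they are read along does, so pivoting twice returns the original layout.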
ADVANTAGES OF DATA WAREHOUSE 
• One consistent data store for reporting, forecasting, and 
analysis 
• Easier and timely access to data 
• Scalability 
• Trend analysis and detection 
• Drill down analysis
DISADVANTAGES OF DATA WAREHOUSE 
• Preparation may be time consuming. 
• High associated cost
CASE STUDY: WHY DATA WAREHOUSE
• G2G Courier Pvt. Ltd. is an established brand in the courier industry. It has its own network in the main cities and has subcontracted delivery in rural areas across the country to various partners.
• The President of the company wants to look deeply into the company's financial health and different aspects of its performance.
CHALLENGES
• Apart from G2G's own transaction system, each partner has its own system, which makes the data very heterogeneous.
• The granularity of data also differs between systems, e.g. minute accuracy versus day accuracy.
• To analyze metrics such as revenue and timely delivery across geographical locations and partners, we need a unified system.
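One way the granularity mismatch might be handled is to bring every feed to the coarsest common grain before combining them. In this sketch, G2G's minute-level records are truncated to the day so they can be added to a partner's day-level records. The feed layouts and figures are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical feeds: G2G records deliveries to the minute, a partner only by day.
g2g_deliveries = [
    ("2010-11-20 09:15", 120.0),
    ("2010-11-20 17:40", 80.0),
    ("2010-11-21 08:05", 50.0),
]
partner_deliveries = [("2010-11-20", 30.0), ("2010-11-21", 45.0)]

def to_day(timestamp: str) -> str:
    """Truncate a minute-level timestamp to the day, the coarser common grain."""
    return datetime.strptime(timestamp, "%Y-%m-%d %H:%M").date().isoformat()

totals = defaultdict(float)
for ts, revenue in g2g_deliveries:
    totals[to_day(ts)] += revenue
for day, revenue in partner_deliveries:
    totals[day] += revenue

daily_revenue = dict(totals)
print(daily_revenue)
```

The cost of this choice is that minute-level detail is no longer available in the combined view, so questions such as delivery time within a day would still need G2G's own detailed data.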
DATA WAREHOUSE MODEL
[Figure: a Sales fact table linked to the Region, Product, Product Category, and Time dimensions]
THANK YOU

Dataware house introduction by InformaticaTrainingClasses

  • 1. INTRODUCTION TO DATA WAREHOUSING BY INFORMATICATRAININGCLASSES PHONE : (404)-900-9988 EMAIL : INFO@INFORMATICATRAININGCLASSES.COM WEBSITE : WWW.INFORMATICATRAININGCLASSES.COM
  • 2. DATAWAREHOUSE  Maintain historic data  Analysis to get better understanding of business  Better Decision making  Definition: A data warehouse is a  subject-oriented  integrated  time-varying non-volatile collection of data that is used primarily in organizational decision making. -- Bill Inmon, Building the Data Warehouse 1996
  • 3. SUBJECT ORIENTED • Data warehouse is organized around subjects such as sales, product, customer. • It focuses on modeling and analysis of data for decision makers. • Excludes data not useful in decision support process.
  • 4. INTEGRATED • A data warehouse is constructed by integrating multiple heterogeneous sources. • Data preprocessing is applied to ensure consistency. [Diagram: RDBMS, legacy system, and flat file sources pass through data processing and data transformation into the data warehouse.]
  • 5. NON-VOLATILE • Mostly, data once recorded will not be updated. • A data warehouse requires two data-access operations: - Incremental loading of data - Access of data [Diagram: load, access]
  • 6. TIME VARIANT • Provides information from a historical perspective, e.g. the past 5-10 years. • Every key structure contains, either implicitly or explicitly, an element of time.
  • 7. WHY DATA WAREHOUSE? Problem Statement: • ABC Pvt Ltd is a company with branches in the USA, UK, Canada, and India. • The Sales Manager wants a quarterly sales report across the branches. • Each branch has a separate operational system where sales transactions are recorded.
  • 8. WHY DATA WAREHOUSE? [Diagram: USA, UK, CANADA, INDIA, Sales Manager] The Sales Manager gets the quarterly sales figure for each branch and manually calculates the sales figure across branches. What if he needs a daily sales report across the branches?
  • 9. WHY DATA WAREHOUSE? Solution: • Extract sales information from each database. • Store the information in a common repository at a single site.
  • 10. WHY DATA WAREHOUSE? USA UK CANADA INDIA Data Warehouse Sales Manager Query & Analysis tools
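
The consolidation idea in slides 7-10 can be sketched in a few lines of Python. This is only an illustration: the branch extracts, the amounts, and the helper names (`quarter_of`, `quarterly_sales`) are invented for the example, and amounts are assumed to be already converted to a common currency.

```python
from collections import defaultdict
from datetime import date

# Hypothetical extracts from each branch's operational system:
# (sale date, amount in an assumed common currency).
branch_sales = {
    "USA": [(date(2024, 1, 15), 1200.0), (date(2024, 4, 2), 800.0)],
    "UK": [(date(2024, 2, 10), 950.0)],
    "CANADA": [(date(2024, 3, 30), 400.0)],
    "INDIA": [(date(2024, 5, 21), 650.0)],
}


def quarter_of(d):
    """Return a label such as '2024-Q1' for a calendar date."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


# Store every branch's rows in one common repository (a plain list here).
warehouse = [
    (branch, sale_date, amount)
    for branch, rows in branch_sales.items()
    for sale_date, amount in rows
]


def quarterly_sales(rows):
    """Total sales per quarter across all branches."""
    totals = defaultdict(float)
    for _branch, sale_date, amount in rows:
        totals[quarter_of(sale_date)] += amount
    return dict(totals)


report = quarterly_sales(warehouse)
print(report)
```

Once the rows sit in one place, a daily report is the same aggregation with a different grouping key, which is the point of the slide's question.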
  • 11. CHARACTERISTICS OF DATA WAREHOUSE • Relational / multidimensional database • Query and analysis rather than transaction processing • Historical data from transactions • Consolidates multiple data sources • Separates query load from transactions • Mostly non-volatile • Large amounts of data, on the order of terabytes
  • 12. WHEN WE SAY LARGE - WE MEAN IT! • Terabytes -- 10^12 bytes: Yahoo! – 300 terabytes and growing • Petabytes -- 10^15 bytes: Geographic Information Systems • Exabytes -- 10^18 bytes: National Medical Records • Zettabytes -- 10^21 bytes: Weather images • Yottabytes -- 10^24 bytes: Intelligence Agency Videos
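  • 13. OLTP VS DATA WAREHOUSE (OLAP) • Indexes: few in OLTP, many in a data warehouse • Data: normalized in OLTP, generally de-normalized in a data warehouse • Joins: many in OLTP, some in a data warehouse • Derived data and aggregates: rare in OLTP, common in a data warehouse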
  • 14. DATA WAREHOUSE ARCHITECTURE [Diagram: operational systems and flat files feed ETL (Extract, Transform and Load), which loads the data warehouse; the warehouse feeds the sales, inventory, and generic data marts, which serve analysis, reporting, and data mining.]
  • 15. ETL ETL stands for Extract, Transform and Load. • Data is distributed across different sources: flat files, streaming data, DB systems, XML, JSON • Data can be in different formats: CSV, key-value pairs • Different units and representations: - Country: IN or India - Date: 20 Nov 2010 or 20101120
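
A minimal sketch of reconciling the differing representations on this slide, assuming a hand-written country lookup and two accepted date layouts (a `20 Nov 2010` style and a compact `YYYYMMDD` style). The names `COUNTRY_NAMES`, `DATE_FORMATS`, `normalize_country`, and `normalize_date` are illustrative, not part of any ETL tool.

```python
from datetime import datetime

# Assumed lookup from the codes and spellings seen in sources to one canonical name.
COUNTRY_NAMES = {"IN": "India", "INDIA": "India"}

# Assumed set of date layouts the sources may use.
# Note: "%b" matches English month abbreviations under the default C locale.
DATE_FORMATS = ("%d %b %Y", "%Y%m%d")


def normalize_country(value):
    """Map a country code or spelling to its canonical name when known."""
    cleaned = value.strip()
    return COUNTRY_NAMES.get(cleaned.upper(), cleaned)


def normalize_date(value):
    """Parse any accepted layout and return an ISO 8601 date string."""
    cleaned = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


print(normalize_country("IN"), normalize_date("20 Nov 2010"))
print(normalize_country("India"), normalize_date("20101120"))
```

Rejecting unknown layouts with an error, rather than guessing, keeps bad rows visible so they can be handled during cleansing.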
  • 16. ETL FUNCTIONS • Extract: - Collect data from different sources - Parse data - Remove unwanted data • Transform: - Project - Generate surrogate keys - Encode data - Join data from different sources - Aggregate • Load
  • 17. ETL STEPS • The first step in the ETL process is mapping the data between the source systems and the target database. • The second step is cleansing the source data in the staging area. • The third step is transforming the cleansed source data. • The fourth step is loading into the target system. [Figures: data before ETL processing; data after ETL processing]
  • 18. ETL GLOSSARY • Mapping: Defining the relationship between source and target objects. • Cleansing: The process of resolving inconsistencies in source data. • Transformation: The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include aggregating data and integrating data from multiple sources. • Staging Area: A place where data is processed before entering the warehouse.
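
The extract, transform, and load functions described in slides 16-18 might look roughly like the following, using only the Python standard library. The CSV sample, the `customer_sales` table, and all column names are assumptions made for the sketch; an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source extract with inconsistent country values and one bad row.
RAW = """customer,country,amount
alice,IN,100
bob,India,250
alice,IN,50
,IN,75
"""


def extract(text):
    """Parse the CSV and remove unwanted rows (here: rows with no customer)."""
    return [row for row in csv.DictReader(io.StringIO(text)) if row["customer"]]


def transform(rows):
    """Cleanse country values, generate surrogate keys, aggregate per customer."""
    surrogate_keys = {}
    totals = {}
    for row in rows:
        name = row["customer"]
        key = surrogate_keys.setdefault(name, len(surrogate_keys) + 1)
        country = "India" if row["country"] in ("IN", "India") else row["country"]
        _, running = totals.get(key, (country, 0.0))
        totals[key] = (country, running + float(row["amount"]))
    return [(key, name, *totals[key]) for name, key in surrogate_keys.items()]


def load(rows, conn):
    """Write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE customer_sales ("
        "customer_key INTEGER PRIMARY KEY, customer TEXT, country TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO customer_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
loaded = conn.execute(
    "SELECT * FROM customer_sales ORDER BY customer_key"
).fetchall()
print(loaded)
```

In practice the cleansing would happen in a staging area, as the glossary notes; here the intermediate Python lists play that role.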
  • 19. DIMENSION • Categorizes the data. For example: time, location, etc. • A dimension can have one or more attributes. For example, day, week and month are attributes of the time dimension. • Role of dimensions in data warehousing: - Slice and dice - Filter by dimensions
  • 20. TYPES OF DIMENSIONS • Conformed Dimension - A dimension that is shared across fact tables. • Junk Dimension - A convenient grouping of flags and indicators. For example, payment method, shipping method. • Degenerate Dimension - A dimension key that has no attributes and hence does not have its own dimension table. For example, transaction number, invoice number. Values of these dimensions are mostly unique within a fact table. • Role-Playing Dimension - A dimension that plays different roles in fact tables depending on the context. For example, the Date dimension can be used for the order date, shipment date, and invoice date. • Slowly Changing Dimension - A dimension whose data changes slowly, rather than on a regular, time-based schedule.
  • 21. TYPES OF SLOWLY CHANGING DIMENSION • Type 1 - The Type 1 method overwrites old data with new data, and therefore does not track historical data at all. • Type 2 - The Type 2 method tracks historical data by creating multiple records for a given value in the dimension table, each with a separate surrogate key. • Type 3 - The Type 3 method tracks changes using separate columns. Whereas Type 2 preserves unlimited history, Type 3 preserves limited history, as it is limited to the number of columns we designate for storing historical data. • Type 4 - The Type 4 method is usually referred to as using "history tables", where one table keeps the current data, and an additional table is used to keep a record of all changes. Types 1, 2 and 3 are commonly used. Some books also discuss Types 0 and 6. http://en.wikipedia.org/wiki/Slowly_changing_dimension
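
To make the difference between Type 1 and Type 2 concrete, here is a small sketch on an in-memory customer dimension. The column names (`surrogate_key`, `customer_id`, `city`) are invented for the example, and the `start_date` / `end_date` columns are a common convention for marking the current Type 2 row, not something the slide prescribes.

```python
import copy
from datetime import date


def apply_type1(dimension, natural_key, new_city):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dimension:
        if row["customer_id"] == natural_key:
            row["city"] = new_city


def apply_type2(dimension, natural_key, new_city, change_date):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    for row in dimension:
        if row["customer_id"] == natural_key and row["end_date"] is None:
            row["end_date"] = change_date
    dimension.append({
        "surrogate_key": max(r["surrogate_key"] for r in dimension) + 1,
        "customer_id": natural_key,
        "city": new_city,
        "start_date": change_date,
        "end_date": None,  # None marks the current version (assumed convention)
    })


original = [{
    "surrogate_key": 1,
    "customer_id": "C1",
    "city": "Pune",
    "start_date": date(2020, 1, 1),
    "end_date": None,
}]

type1_dim = copy.deepcopy(original)
apply_type1(type1_dim, "C1", "Mumbai")

type2_dim = copy.deepcopy(original)
apply_type2(type2_dim, "C1", "Mumbai", date(2024, 6, 1))

print(len(type1_dim), [r["city"] for r in type1_dim])
print(len(type2_dim), [r["city"] for r in type2_dim])
```

After the change, the Type 1 table still has one row and no trace of the old city, while the Type 2 table has two rows for the same customer, so facts recorded before the move can still join to the old version.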
  • 22. FACTS • Facts are values that can be examined and analyzed. • For example: page views, unique users, pieces sold, profit. • Fact and measure are synonymous. • Types of facts: - Additive: measures that can be added across all dimensions. - Non-additive: measures that cannot be added across any dimension. - Semi-additive: measures that can be added across some dimensions but not others.
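
The three kinds of fact can be illustrated with a tiny made-up data set. The rows, stores, and column layout below are assumptions for the example: units sold and revenue are treated as additive, stock on hand as semi-additive (it adds across stores but not across days), and unit price as non-additive.

```python
# Hypothetical fact rows: (day, store, units_sold, revenue, stock_on_hand).
rows = [
    ("Mon", "A", 10, 200.0, 100),
    ("Mon", "B", 5, 150.0, 50),
    ("Tue", "A", 20, 400.0, 80),
    ("Tue", "B", 5, 150.0, 45),
]

# Additive: units and revenue can be summed across both day and store.
total_units = sum(units for _, _, units, _, _ in rows)
total_revenue = sum(revenue for _, _, _, revenue, _ in rows)

# Semi-additive: stock on hand can be summed across stores for a single day,
# but summing it across days would count the same inventory repeatedly.
stock_tuesday = sum(stock for day, _, _, _, stock in rows if day == "Tue")

# Non-additive: unit prices cannot be summed at all; the overall price
# has to be recomputed from the additive facts it is derived from.
unit_prices = [revenue / units for _, _, units, revenue, _ in rows]
overall_unit_price = total_revenue / total_units

print(total_units, total_revenue, stock_tuesday, overall_unit_price)
```

Storing the additive components (revenue and units) in the fact table and deriving ratios at query time is one common way to avoid the non-additive trap.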
  • 23. HOW TO STORE DATA? Facts and Dimensions: 1. Select the business process to model 2. Declare the grain of the business process 3. Choose the dimensions that apply to each fact table row 4. Identify the numeric facts that will populate each fact table row
  • 24. DIMENSION TABLE • Contains attributes of dimensions, e.g. month is an attribute of the Time dimension. • Can also have foreign keys to another dimension table • Usually identified by a unique integer primary key called a surrogate key
  • 25. FACT TABLE • Contains facts • Foreign keys to dimension tables • Primary key: usually a composite key of all FKs
  • 26. TYPES OF SCHEMA USED IN DATA WAREHOUSE • Star Schema • Snowflake Schema • Fact Constellation Schema
  • 27. STAR SCHEMA • Multi-dimensional data • Dimension and fact tables • A fact table with pointers to dimension tables
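
A star schema of the kind described in slides 24-27 can be sketched with Python's built-in `sqlite3` module. The table names (`dim_time`, `dim_product`, `fact_sales`), their columns, and the sample rows are all hypothetical; the point is only the shape: dimension tables with surrogate keys and a fact table whose primary key is the composite of its foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one surrogate key plus descriptive attributes.
CREATE TABLE dim_time (
    time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, name TEXT, category TEXT
);
-- Fact table: foreign keys to each dimension plus the numeric fact.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold INTEGER,
    PRIMARY KEY (time_key, product_key)
);
INSERT INTO dim_time VALUES (1, '2024-01-15', 'Jan', 2024);
INSERT INTO dim_time VALUES (2, '2024-02-03', 'Feb', 2024);
INSERT INTO dim_product VALUES (1, 'Toaster', 'Kitchen Appliances');
INSERT INTO dim_product VALUES (2, 'Kettle', 'Kitchen Appliances');
INSERT INTO fact_sales VALUES (1, 1, 3);
INSERT INTO fact_sales VALUES (1, 2, 4);
INSERT INTO fact_sales VALUES (2, 1, 5);
""")

# Join the fact table to a dimension and aggregate by one of its attributes.
units_by_month = conn.execute("""
    SELECT t.month, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON f.time_key = t.time_key
    GROUP BY t.month
    ORDER BY MIN(t.time_key)
""").fetchall()
print(units_by_month)
```

The grain assumed here is one row per product per day; a different grain would change the fact table's key.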
  • 29. SNOWFLAKE SCHEMA • An extension of the star schema in which the dimension tables are partly or fully normalized. • Dimension table hierarchies are broken down into simpler tables.
  • 31. FACT CONSTELLATION SCHEMA • A fact constellation schema allows dimension tables to be shared between fact tables. • This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. • For example, separate fact tables for daily, weekly and monthly reporting requirements.
  • 32. FACT CONSTELLATION SCHEMA In this example, the dimension tables for time, item, and location are shared between both the sales and shipping fact tables.
  • 33. OPERATIONS ON DATA WAREHOUSE • Drill Down • Roll Up • Slice & Dice • Pivoting
  • 34. DRILL DOWN Time Product Category e.g. Home Appliances Sub Category e.g. Kitchen Appliances Product e.g. Toaster
  • 35. ROLL UP Year Quarter Month Fiscal Year Fiscal Quarter Fiscal Month Fiscal Week Day
  • 36. SLICE & DICE Time Product Product = Toaster Time
  • 37. PIVOTING Time Product • Also called rotation • Rotate on an axis • Interchange Rows and Columns Region Product
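
The four operations in slides 33-37 can be tried on a tiny in-memory cube. The rows and the `aggregate` helper below are made up for illustration; a real warehouse would run equivalent queries in its OLAP or SQL engine.

```python
from collections import defaultdict

# Hypothetical cube cells: (year, quarter, region, product, units).
cube = [
    (2024, "Q1", "East", "Toaster", 10),
    (2024, "Q1", "West", "Toaster", 4),
    (2024, "Q2", "East", "Toaster", 6),
    (2024, "Q1", "East", "Kettle", 3),
    (2024, "Q2", "West", "Kettle", 7),
]


def aggregate(rows, key):
    """Sum units grouped by whatever key(row) returns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    return dict(totals)


# Roll up: summarize at a coarser level of the time hierarchy (year).
by_year = aggregate(cube, lambda r: r[0])

# Drill down: move to a finer level of the same hierarchy (quarter).
by_quarter = aggregate(cube, lambda r: (r[0], r[1]))

# Slice: fix one dimension to a single value (Product = Toaster).
toaster_slice = [r for r in cube if r[3] == "Toaster"]

# Dice: restrict several dimensions at once to get a sub-cube.
east_q1 = [r for r in cube if r[2] == "East" and r[1] == "Q1"]

# Pivot: same cells, with the row and column dimensions interchanged.
product_by_region = aggregate(cube, lambda r: (r[3], r[2]))
region_by_product = {
    (region, product): units
    for (product, region), units in product_by_region.items()
}

print(by_year, by_quarter)
```

Note that pivoting changes only the orientation of the result, not the values, which is why it can be done on the already aggregated cells.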
  • 38. ADVANTAGES OF DATA WAREHOUSE • One consistent data store for reporting, forecasting, and analysis • Easier and timely access to data • Scalability • Trend analysis and detection • Drill down analysis
  • 39. DISADVANTAGES OF DATA WAREHOUSE • Preparation may be time consuming. • High associated cost
  • 40. CASE STUDY: WHY DATA WAREHOUSE • G2G Courier Pvt. Ltd. is an established brand in the courier industry. It has its own network in the main cities and has subcontracted delivery in rural areas across the country to various partners. • The President of the company wants to look deep into the financial health of the company and different aspects of its performance.
  • 41. CHALLENGES • Apart from G2G's own transaction system, each partner has its own system, which makes the data very heterogeneous. • The granularity of data in the various systems also differs, for example minute accuracy versus day accuracy. • To analyze metrics like revenue and timely delivery across geographical locations and partners, we need a unified system.
  • 42. DATA WAREHOUSE MODEL Sales Fact Region Product Product Category Time