This document outlines an introductory session on data warehousing. It introduces the course instructor and participants. The course topics include introduction and background, de-normalization, online analytical processing, dimensional modeling, extract-transform-load, data quality management, and data mining. Students are advised to attend class, strive to learn, be on time, pay attention, ask questions, be prepared, and not use phones or eat in class. The goal is for students to understand database concepts in very large databases and data warehouses.
1. Intro to Data Warehousing
Ch Anwar ul Hassan (Lecturer)
Department of Computer Science and Software Engineering
Capital University of Sciences & Technology, Islamabad Pakistan
anwarchaudary@gmail.com
2. Today is Introductory Session
Today’s Agenda
Resource Person
Participants
Course
4. Introduction of Participants
Name
Previous qualification & expertise
Courses studied in the area of programming
Expectations from this course
5. How to be successful in my class
1. Come to class
2. Strive to learn
3. Be on time for class
4. Pay attention in class. Ask questions.
5. If you don’t understand a topic and/or don’t understand why it’s relevant, ASK.
6. Be prepared to answer questions in class (revise previous lectures).
7. Do not use your cell phone or walk out of the room during class.
8. Do not bring food or drink into the classroom, not even water.
6. Course Profile
Credit Hours: 03
Evaluation Criteria
Assignments 10%
Research work 10%
Quizzes 20%
Mid term 20%
Final exam 40%
8. Reference Books
S. Mohanty, “Data Warehousing: Design, Development and Best Practices” (First Edition).
A. Abdullah, “Data Warehousing for Beginners: Concepts & Issues” (First Edition).
J. Mundy and W. Thornthwaite, “The Microsoft Data Warehouse Toolkit” (Second Edition).
P. Ponniah, “Data Warehousing Fundamentals”, John Wiley & Sons Inc., NY.
9. DWH-Rizwana Irfan
Summary of course
Topics
1. Introduction & Background
2. De-normalization
3. On Line Analytical Processing (OLAP)
4. Dimensional modeling
5. Extract – Transform – Load (ETL)
6. Data Quality Management (DQM)
7. Need for speed (Parallelism, Join and Indexing techniques)
8. Data Mining
9. DWH Implementation steps
10. Complete implementation case study
11. Lab and tool usage
12. Others
10. Summary of course
Topics
1. Introduction & Background
2. De-normalization
3. On Line Analytical Processing (OLAP)
4. Dimensional modeling
11. Summary of course
Topics
5. Extract – Transform – Load (ETL)
6. Data Quality Management (DQM)
7. Need for speed (Parallelism, Join and Indexing techniques)
8. Data Mining
9. DWH Implementation steps
10. Research paper (Final Project)
12. Approach of the course
Develop an understanding of underlying RDBMS concepts.
Apply these concepts to VLDB DSS environments and understand where and why they break down.
Expose the differences between RDBMS and Data Warehouse in the context of VLDB.
Provide the basics of DSS tools such as OLAP and Data Mining, and demonstrate their application.
Research contribution.
13. Why this course?
The world is changing (actually, it has already changed): either change or be left behind.
Missing opportunities or going in the wrong direction has prevented us from growing.
What is the right direction?
Harnessing the data in a knowledge-driven economy.
14. The need
Knowledge is power, Intelligence is absolute power!
“Drowning in data and starving for information”
16. Historical overview
1960: Master Files & Reports
1965: Lots of Master files!
1970: Direct Access Memory & DBMS
1975: Online high performance transaction processing
1980: PCs and 4GL Technology (MIS/DSS)
1985 & 1990: Extract programs, extract processing, the legacy system’s web
17. Why a Data Warehouse (DWH)?
Data recording and storage is growing.
History is an excellent predictor of the future.
Gives a total view of the organization.
Intelligent decision-support is required for decision-making.
18. Reason-1: Why a Data Warehouse?
Data sets are growing. How much data is that?
1 MB = 2^20 (about 10^6) bytes: a small novel; one 3½-inch disk
1 GB = 2^30 (about 10^9) bytes: paper reams that could fill the back of a pickup van
1 TB = 2^40 (about 10^12) bytes: 50,000 trees chopped, converted into paper and printed
2 PB (1 PB = 2^50, about 10^15 bytes): academic research libraries across the U.S.
5 EB (1 EB = 2^60, about 10^18 bytes): all words ever spoken by human beings
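The unit sizes listed for this slide can be checked with a short Python sketch. It is only an illustration: the helper name `unit_sizes` and the printed layout are assumptions made here, not part of the course material. It shows that the power-of-two and power-of-ten values for each unit are close but not equal, and that the gap widens for larger units.

```python
# Units named on the slide, in increasing order of size.
UNITS = ["MB", "GB", "TB", "PB", "EB"]


def unit_sizes(index):
    """Return (binary_bytes, decimal_bytes) for UNITS[index].

    MB is the 2nd power of 1024 (binary) or of 1000 (decimal),
    GB the 3rd, and so on.
    """
    power = index + 2
    return 1024 ** power, 1000 ** power


for i, name in enumerate(UNITS):
    binary, decimal = unit_sizes(i)
    # Relative difference between the binary and decimal definitions.
    gap_percent = (binary - decimal) / decimal * 100
    print(f"1 {name}: 2^{10 * (i + 2)} = {binary:,} bytes, "
          f"10^{3 * (i + 2)} = {decimal:,} bytes "
          f"({gap_percent:.1f}% apart)")
```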
19. Reason-1: Why a Data Warehouse?
The size of data sets is going up.
The cost of data storage is coming down.
The amount of data the average business collects and stores is doubling every year.
Total hardware and software cost to store and manage 1 MB of data:
1990: ~$15
2002: ~15¢ (down 100 times); 1¢ = 0.08 PKR
By 2007: < 1¢ (down more than 1,500 times from 1990)
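The cost figures above can be turned into reduction factors with a few lines of Python. This is a rough sketch: the function name `reduction_factor` is hypothetical, costs are held in US cents to keep the arithmetic exact, and the 2007 value uses 1 cent as an upper bound because the slide only says "less than 1 cent".

```python
def reduction_factor(old_cost_cents, new_cost_cents):
    """How many times cheaper the new cost is compared with the old one."""
    return old_cost_cents / new_cost_cents


# Cost to store and manage 1 MB of data, in US cents (figures from the slide).
COST_1990 = 1500       # ~ $15
COST_2002 = 15         # ~ 15 cents
COST_2007_UPPER = 1    # "< 1 cent": 1 cent is used as an upper bound (assumption)

print("1990 -> 2002:", reduction_factor(COST_1990, COST_2002), "times cheaper")
print("2002 -> 2007: at least",
      reduction_factor(COST_2002, COST_2007_UPPER), "times cheaper")
print("1990 -> 2007: at least",
      reduction_factor(COST_1990, COST_2007_UPPER), "times cheaper")
```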
20. Reason-1: Why a Data Warehouse?
A Few Examples
Walmart: 24 TB (2015); now 40 PB (2020)
France Telecom: ~500 TB
CERN (European Organization for Nuclear Research): up to 20 PB by 2006; 200 PB
Stanford Linear Accelerator Center (SLAC): 100 EB
23. Reason-2: Why a Data Warehouse?
Businesses demand Intelligence (BI).
Complex questions from integrated data.
“Intelligent Enterprise”
24. Reason-2: Why a Data Warehouse?
DBMS Approach
List of all items that were sold last month?
List of all items purchased by XYZ?
The total sales of the last month grouped by branch?
How many sales transactions occurred during the month of January?
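The DBMS-style questions on this slide each map to a single, simple SQL query. The sketch below runs them with Python's built-in `sqlite3` module against an in-memory database. The `sales` table, its columns, the sample rows, and the helper function names are all hypothetical and chosen only for illustration; "last month" is passed in explicitly as a `YYYY-MM` string.

```python
import sqlite3

# In-memory database with a hypothetical sales table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, sale_date TEXT, branch TEXT, "
    "customer TEXT, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (sale_date, branch, customer, item, amount) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("2024-01-05", "Islamabad", "XYZ", "Milk", 2.50),
        ("2024-01-12", "Lahore", "ABC", "Bread", 1.25),
        ("2024-01-20", "Islamabad", "XYZ", "Eggs", 3.00),
        ("2024-02-02", "Lahore", "XYZ", "Milk", 2.50),
    ],
)


def items_sold(conn, month):
    """List of all items sold in the given month (format 'YYYY-MM')."""
    sql = ("SELECT DISTINCT item FROM sales "
           "WHERE strftime('%Y-%m', sale_date) = ? ORDER BY item")
    return [row[0] for row in conn.execute(sql, (month,))]


def items_purchased_by(conn, customer):
    """List of all items purchased by one customer."""
    sql = "SELECT DISTINCT item FROM sales WHERE customer = ? ORDER BY item"
    return [row[0] for row in conn.execute(sql, (customer,))]


def sales_by_branch(conn, month):
    """Total sales of the given month, grouped by branch."""
    sql = ("SELECT branch, SUM(amount) FROM sales "
           "WHERE strftime('%Y-%m', sale_date) = ? "
           "GROUP BY branch ORDER BY branch")
    return conn.execute(sql, (month,)).fetchall()


def transaction_count(conn, month):
    """Number of sales transactions in the given month."""
    sql = "SELECT COUNT(*) FROM sales WHERE strftime('%Y-%m', sale_date) = ?"
    return conn.execute(sql, (month,)).fetchone()[0]


print(items_sold(conn, "2024-01"))
print(items_purchased_by(conn, "XYZ"))
print(sales_by_branch(conn, "2024-01"))
print(transaction_count(conn, "2024-01"))
```

Each question is answered by filtering, grouping, or counting rows that already exist, which is what an operational DBMS is designed for.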
25. Reason-2: Why a Data Warehouse?
Intelligent Enterprise
Which items sell together? Which items to stock?
Where and how to place the items?
What discounts to offer?
How best to target customers to increase sales at a branch?
Which customers are most likely to respond to my next promotional campaign, and why?
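Questions such as "Which items sell together?" need analysis across many transactions rather than a simple lookup. As a rough sketch of the idea, the Python below counts how often each pair of items appears in the same basket. The basket data and the function name `pair_counts` are made up for illustration, and plain pair counting stands in here for the fuller data mining techniques covered later in the course.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair has one canonical form per basket.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical transactions: each inner list is one customer's basket.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "eggs"],
    ["bread", "milk", "eggs"],
]

counts = pair_counts(baskets)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Pairs with high counts are candidates for joint stocking, shelf placement, or bundled discounts, which ties back to the other questions on the slide.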
26. Reason-3: Why a Data Warehouse?
Businesses want much more…
Stages of Data Warehouse
What happened?
Why did it happen?
What will happen?
What is happening?
What do you want to happen?