Dwh lecture slides-week3&4
1. Dr. Abdul Basit Siddiqui
Assistant Professor
FURC
(Lecture Slides Week # 2)
2. Why a Data Warehouse (DWH)?
• Data recording and storage are growing:
  - Almost every industry has a huge amount of operational data.
• Careful use and analysis of historical information may result in excellent predictions for the future:
  - Knowledge workers want to turn available data into useful information.
  - They use this information to support strategic decision making.
• Gives a total view of the organization:
  - It is a platform for consolidated historical data for analysis.
  - It stores good-quality data so that knowledge workers can make correct decisions.
• Intelligent decision support is required for decision making.
Data Warehouse & Mining - Spring 2014, 04/19/15
3. Why a Data Warehouse? (Contd.)
• From a business perspective:
  - It is the latest marketing weapon.
  - It helps to keep customers by learning more about their needs.
  - It is a valuable tool in today's competitive, fast-evolving world.
4. Reason-1: Why a Data Warehouse (DWH)?
• Data sets are growing:
How much data is that?
Size    In bytes                      Roughly equivalent to
1 MB    2^20 or 10^6 bytes            A small novel; a 3½-inch disk.
1 GB    2^30 or 10^9 bytes            Paper reams that could fill the back of a pickup van.
1 TB    2^40 or 10^12 bytes           50,000 trees chopped, converted into paper, and printed.
2 PB    1 PB = 2^50 or 10^15 bytes    Academic research libraries across the USA.
5 EB    1 EB = 2^60 or 10^18 bytes    All words ever spoken by human beings.
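The "2^n or 10^n" pairs in the table are close but not equal. A minimal Python sketch of the arithmetic follows; the `UNITS` table and the `size_in_bytes` helper are illustrative names, not part of the lecture.

```python
# Binary (2**n) and decimal (10**n) exponents for the prefixes in the table above.
UNITS = {"MB": (20, 6), "GB": (30, 9), "TB": (40, 12), "PB": (50, 15), "EB": (60, 18)}


def size_in_bytes(count, unit, binary=True):
    """Return `count` units as bytes, using the binary or the decimal reading."""
    binary_exp, decimal_exp = UNITS[unit]
    return count * (2 ** binary_exp if binary else 10 ** decimal_exp)


for unit, (b, d) in UNITS.items():
    # Relative gap between the two readings; it widens as the prefix grows.
    gap = (2 ** b) / (10 ** d) - 1
    print(f"1 {unit}: 2^{b} = {2 ** b:,} bytes, 10^{d} = {10 ** d:,} bytes, gap {gap:.1%}")
```

The gap between the two readings grows with each prefix, which is why the slide treats them only as rough equivalents.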
5. Reason-1: Why a Data Warehouse (DWH)?
• The size of data sets is going up.
• The cost of data storage is coming down.
• The amount of data an average business collects and stores is doubling every year.
• Total hardware and software cost to store and manage 1 MB of data:
  - 1990: $15
  - 2002: 15¢ (down 100 times)
  - 2010: < 1¢ (down 150 times)
• A few examples:
  - Walmart: 24+ TB
  - France Telecom: 100+ TB
  - CERN: up to 20 PB by 2006
  - Stanford Linear Accelerator Center (SLAC): 500 TB
  - Telenor, Ufone, Mobilink, Warid, Zong ???
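To get a feel for "doubling every year", here is a small sketch that projects data volume under a constant growth factor. The constant rate is an assumption taken from the slide's rule of thumb, and the 24 TB starting point is borrowed from the examples above purely for illustration. `projected_size_tb` is a hypothetical helper.

```python
def projected_size_tb(initial_tb, years, growth_factor=2.0):
    """Project data volume after `years`, assuming a constant yearly growth factor."""
    return initial_tb * growth_factor ** years


# Illustrative only: start from 24 TB and double once per year.
for year in range(6):
    print(f"year {year}: {projected_size_tb(24, year):,.0f} TB")
```

Under this assumption a 24 TB store passes 1 PB (1,024 TB) in the sixth year, which is the scale at which storage cost and retention policy become design questions.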
6. Caution!
A Warehouse of Data is NOT a Data Warehouse.
8. Reason-2: Why a Data Warehouse (DWH)?
DBMS Approach
• List of all items that were sold last month?
• List of all makeup items purchased by Sassi?
• The total sales of the last month, grouped by branch?
• How many sales transactions occurred during the month of January?
Intelligent Enterprise
• Which items sell together? Which items to stock?
• Where and how to place the items? What discounts to offer?
• How best to target customers to increase sales at a branch?
• Which customers are most likely to respond to my next promotional campaign, and why?

• Businesses demand Intelligence (BI).
• Complex questions from integrated data.
• "Intelligent Enterprise"
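The "DBMS Approach" questions map directly onto ordinary SQL, which is what separates them from the "Intelligent Enterprise" questions. A minimal sketch using Python's built-in `sqlite3` module follows. The `sales` table, its columns, and every row in it are hypothetical data invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, sale_date TEXT, branch TEXT, "
    "item TEXT, customer TEXT, amount REAL)"
)
# Hypothetical transactions, for illustration only.
rows = [
    ("2015-01-05", "Islamabad", "lipstick", "Sassi", 450.0),
    ("2015-01-17", "Lahore", "notebook", "Ali", 120.0),
    ("2015-01-29", "Islamabad", "mascara", "Sassi", 600.0),
    ("2015-02-02", "Lahore", "lipstick", "Sara", 450.0),
]
conn.executemany(
    "INSERT INTO sales (sale_date, branch, item, customer, amount) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# "The total sales of the last month, grouped by branch?" (here: January 2015)
sales_by_branch = conn.execute(
    "SELECT branch, SUM(amount) FROM sales "
    "WHERE sale_date BETWEEN '2015-01-01' AND '2015-01-31' "
    "GROUP BY branch ORDER BY branch"
).fetchall()

# "How many sales transactions occurred during the month of January?"
january_count = conn.execute(
    "SELECT COUNT(*) FROM sales WHERE strftime('%m', sale_date) = '01'"
).fetchone()[0]

print(sales_by_branch, january_count)
```

Questions such as "which items sell together?" have no single predefined query of this kind. They need integrated history and analysis on top of it, which is the case the slide makes for a DWH.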
9. Reason-3: Why a Data Warehouse (DWH)?
• Businesses want much more ...
  - What happened?
  - Why it happened?
  - What will happen?
  - What is happening?
  - What do you want to happen?
10. What is a Data Warehouse?
A complete repository of historical
corporate data extracted from
transaction systems that is
available for ad-hoc access by
knowledge workers.
11. What is a Data Warehouse?
• Transaction System:
  - Management Information System (MIS)
  - Could be typed sheets (NOT transaction system)
• Ad-Hoc Access:
  - Does not have a certain access pattern
  - Queries not known in advance
  - Difficult to write SQL in advance
• Knowledge Workers:
  - Typically NOT IT literate (Executives, Analysts, Managers)
  - NOT clerical workers
  - Decision makers
12. What is a Data Warehouse?
• Inmon's Definition:
• A Data Warehouse is:
  - Subject-oriented
  - Integrated
  - Time-variant
  - Nonvolatile
• Collection of data in support of management's decision-making process.
13. Another View of a DWH
[Figure: the data warehouse viewed as Subject Oriented, Integrated, Time Variant, and Non Volatile]
14. Subject-oriented
• A data warehouse is organized around subjects such as sales, product, and customer.
• It focuses on modeling and analysis of data for decision makers.
• It excludes data not useful in the decision support process.
15. Integration
• A data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing is applied to ensure consistency.
[Figure: RDBMS, legacy system, and flat file sources pass through data processing and data transformation into the data warehouse]
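A minimal sketch of what integrating heterogeneous sources can look like in practice: three sources encode the same facts differently, and each is mapped into one uniform record format. The source layouts, field names, and gender codes below are all hypothetical, chosen only to show the kind of inconsistency that preprocessing has to remove.

```python
from datetime import date

# Hypothetical extracts from three heterogeneous sources.
rdbms_rows = [{"cust_id": 101, "gender": "M", "sale_date": "2015-01-05", "amount": 450.0}]
legacy_rows = [("0102", "female", "17/01/2015", "120.50")]  # fixed tuple, DD/MM/YYYY
flat_file_lines = ["103|1|2015-01-29|600"]                  # pipe-delimited, 0/1 gender

# One agreed encoding for the warehouse (assumed convention: 0 = male, 1 = female).
GENDER_CODES = {"m": "M", "male": "M", "0": "M", "f": "F", "female": "F", "1": "F"}


def unify(cust_id, gender, sale_date, amount):
    """Build a record in the single warehouse format."""
    return {
        "cust_id": int(cust_id),
        "gender": GENDER_CODES[str(gender).lower()],
        "sale_date": sale_date,
        "amount": float(amount),
    }


def from_rdbms(row):
    return unify(row["cust_id"], row["gender"],
                 date.fromisoformat(row["sale_date"]), row["amount"])


def from_legacy(record):
    cust_id, gender, dmy, amount = record
    day, month, year = (int(part) for part in dmy.split("/"))
    return unify(cust_id, gender, date(year, month, day), amount)


def from_flat_file(line):
    cust_id, gender, iso_date, amount = line.split("|")
    return unify(cust_id, gender, date.fromisoformat(iso_date), amount)


warehouse_rows = (
    [from_rdbms(r) for r in rdbms_rows]
    + [from_legacy(r) for r in legacy_rows]
    + [from_flat_file(line) for line in flat_file_lines]
)
for row in warehouse_rows:
    print(row)
```

After the transformation every record has the same keys, types, date representation, and gender coding, regardless of where it came from.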
16. Time-variant
• Provides information from a historical perspective, e.g. the past 5-10 years.
• Every key structure contains, either implicitly or explicitly, an element of time.
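One way to read "every key contains an element of time" is that the time element is part of the key itself, so several versions of the same fact can coexist. The sketch below assumes a hypothetical account-balance example; `BalanceKey`, `balances`, and `balance_as_of` are illustrative names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BalanceKey:
    account_id: int
    snapshot_date: date  # the explicit time element of the key


# Hypothetical month-end snapshots for one account.
balances = {
    BalanceKey(7, date(2014, 12, 31)): 1000.0,
    BalanceKey(7, date(2015, 1, 31)): 1250.0,
}


def balance_as_of(account_id, as_of):
    """Return the latest recorded balance on or before `as_of`, or None."""
    candidates = [
        key for key in balances
        if key.account_id == account_id and key.snapshot_date <= as_of
    ]
    if not candidates:
        return None
    return balances[max(candidates, key=lambda key: key.snapshot_date)]


print(balance_as_of(7, date(2015, 1, 15)))
```

Because the date is part of the key, a new snapshot adds a row instead of overwriting the old one, which is what makes historical questions answerable.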
17. Nonvolatile
• Data, once recorded, cannot be updated.
• A data warehouse requires only two operations in data accessing:
  - Initial loading of data
  - Access of data
[Figure: load and access operations on the data warehouse]
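The two-operation idea can be sketched as a store that exposes only `load` and `access`, with no update or delete. This is a toy illustration of the principle, not how any particular warehouse product enforces it. `NonvolatileStore` is a hypothetical class.

```python
class NonvolatileStore:
    """Toy store that allows only loading and read access."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append new rows; existing rows are never modified."""
        self._rows.extend(dict(row) for row in rows)

    def access(self, predicate=lambda row: True):
        """Return copies of matching rows so callers cannot alter stored data."""
        return [dict(row) for row in self._rows if predicate(row)]


store = NonvolatileStore()
store.load([{"id": 1, "amount": 10.0}, {"id": 2, "amount": 25.0}])

large = store.access(lambda row: row["amount"] > 20)
large[0]["amount"] = 0.0  # changes only the returned copy
print(store.access())
```

Corrections in such a design arrive as new loaded rows, so the record of what was previously known is preserved.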
18. Summary: What is a Data Warehouse?
• It is a blend of many technologies, the basic concept being:
  - Take all data from different operational systems
  - If necessary, add relevant data from industry
  - Transform all data and bring it into a uniform format
  - Integrate all data as a single entity
  - Store data in a format supporting easy access for decision support
  - Create performance-enhancing indices
  - Implement performance-enhancing joins
  - Run ad-hoc queries with low selectivity
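The last steps of the list (store, index, query ad hoc) can be sketched with Python's built-in `sqlite3` module. The `sales_fact` table, the index name, and the rows are hypothetical, and a real warehouse would use a dedicated engine and far larger volumes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Store integrated data in one uniform, query-friendly table.
conn.execute(
    "CREATE TABLE sales_fact (sale_date TEXT, branch TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        ("2015-01-05", "Islamabad", "lipstick", 450.0),
        ("2015-01-17", "Lahore", "notebook", 120.0),
        ("2015-02-02", "Lahore", "lipstick", 450.0),
        ("2015-02-09", "Lahore", "notebook", 80.0),
    ],
)

# A performance-enhancing index on the columns analysts filter by most often.
conn.execute("CREATE INDEX idx_sales_branch_date ON sales_fact (branch, sale_date)")


def ad_hoc(sql, params=()):
    """Run a query that was not known when the warehouse was designed."""
    return conn.execute(sql, params).fetchall()


result = ad_hoc(
    "SELECT product, SUM(amount) FROM sales_fact "
    "WHERE branch = ? GROUP BY product ORDER BY product",
    ("Lahore",),
)
print(result)
```

The index is chosen from expected filter columns because the exact queries are not known in advance.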
19. Benefits of Data Warehouse
• High returns on investment.
• Substantial competitive advantage.
• Increased productivity of corporate decision-makers.
• Fast reporting for decision making process.
• Reduced reporting load on transactional systems.
• Making institutional data more user-friendly and accessible for knowledge workers.
• Integrated data from different source systems.
• Enabled "point-in-time" analysis and trending over time.
• Helps in identifying and resolving data integrity issues, either in the warehouse itself or in the source systems that collect the data.
20. Data Warehouse: How is it Different?
1. Decision making is Ad-Hoc
21. Data Warehouse: How is it Different?
2. Different patterns of hardware utilization
Bus Service vs. Train
22. Data Warehouse: How is it Different?
3. Combines operational and historic data
• Don't do data entry into a DWH. OLTP or ERP systems are the source systems.
• OLTP systems don't keep history; you cannot get a balance statement more than a year old.
• A DWH keeps historical data, even of bygone customers. Why?
  - In the context of a bank, you want to know why the customer left.
  - What are the events that led to his/her leaving? Why?
  - Customer retention
23. Data Warehouse: How is it Different?
How much history?
• Depends on:
  - Industry
  - Cost of storing historical data
  - Economic value of historical data
• Industry and history
  - Telecom calls far outnumber bank transactions
    - 18 months
  - Retailers are interested in analyzing yearly seasonal patterns
    - 65 weeks, why?
  - Insurance companies want to do actuarial analysis, using the historical data to predict risk
    - 7 years
Hence NOT a complete repository of data.
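The retention periods above can be expressed as a per-industry window that decides which records stay in the warehouse. In this sketch the 18 months and 7 years are approximated as 30-day months and 365-day years, which is a simplifying assumption. `RETENTION`, `cutoff`, and `within_window` are hypothetical names.

```python
from datetime import date, timedelta

# Retention windows from the slide; months and years are approximated in days.
RETENTION = {
    "telecom": timedelta(days=18 * 30),    # about 18 months (approximation)
    "retail": timedelta(weeks=65),         # 65 weeks
    "insurance": timedelta(days=7 * 365),  # about 7 years (approximation)
}


def cutoff(industry, today):
    """Oldest record date still kept for this industry."""
    return today - RETENTION[industry]


def within_window(industry, record_date, today):
    """True if a record dated `record_date` is still inside the retention window."""
    return record_date >= cutoff(industry, today)


today = date(2015, 4, 19)
for industry in RETENTION:
    print(industry, "keeps data back to", cutoff(industry, today))
```

Records older than the cutoff are the ones a warehouse may archive or drop, which is why the slide concludes that a DWH is not a complete repository of data.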
24. Data Warehouse: How is it Different?
How much history?
Economic value of data vs. storage cost
Is a Data Warehouse a complete repository of data?