Data Warehousing
Data Warehousing, Mining and Web Tools
Ch Anwar ul Hassan (Lecturer)
Department of Computer Science and Software
Engineering
Capital University of Sciences & Technology, Islamabad
Pakistan
anwarchaudary@gmail.com
Slide 2
• So far we have concentrated on OLTP (online
transaction processing) systems
– range in size from megabytes to terabytes
– high transaction throughput
• Decision makers require access to all data
wherever it is located
– current data
– historical data
OLTP Systems
Slide 3
• Holds current data
• Stores detailed data
• Data is dynamic
• Repetitive processing
• High level of transaction throughput
• Predictable pattern of usage
• Transaction driven
• Application-oriented
• Supports day-to-day decisions
• Serves large number of clerical/operational users
OLTP Systems
Slide 4
• ‘A data warehouse is a
– subject-oriented,
– integrated,
– time-variant and
– non-volatile
• collection of data in support of management’s
decision-making process’ (Inmon 1993)
Data Warehouse Definition
Slide 5
• Holds historical data
• Stores detailed, lightly and highly summarised
data
• Data is largely static
• Ad-hoc, unstructured and heuristic processing
• Medium/low level of transaction throughput
• Unpredictable pattern of usage
• Analysis driven
• Subject-oriented
• Supports strategic decisions
• Serves a relatively low number of managerial users
Data Warehousing Systems
Slide 6
• Potential high returns on investment
– 401% return on investment (over three years) for 90%
of companies in 1996
• Competitive advantage
– data can reveal previously unknown, unavailable and
untapped information
• Increased productivity of corporate decision-
makers
– integration allows more substantive, accurate and
consistent analysis
Benefits
Slide 7
[Figure: data warehouse architecture. Operational data sources
(mainframe operational n/w, h/w data; departmental RDBMS data;
private data; external data) feed the load manager. The warehouse
manager maintains meta-data, detailed data, lightly summarized data
and highly summarized data in the DBMS, with archive/backup. The
query manager serves the end-user access tools: reporting, query,
application development and EIS tools; OLAP tools; data-mining tools.]
Architecture
Slide 8
[Figure: information flows in a data warehouse. Operational data
sources 1 to n pass through the load manager into the DBMS (inflow).
The warehouse manager holds detailed, lightly summarized and highly
summarized data (upflow) and moves data to archive/backup (downflow).
The query manager delivers data to reporting, query, application
development and EIS tools, OLAP tools and data-mining tools (outflow).
Meta-flow covers the meta-data.]
Information Flows
Slide 9
• Five primary information flows
– Inflow - extraction, cleansing and loading of data from
source systems into warehouse
– Upflow - adding value to data in warehouse through
summarizing, packaging and distributing data
– Downflow - archiving and backing up data in
warehouse
– Outflow - making data available to end users
– Metaflow - managing the metadata
Information Flow Processes
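The inflow and upflow steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout (branch, month, rent), the sample values and the function names inflow and upflow are invented for the example and do not come from the lecture.

```python
from collections import defaultdict

# Hypothetical rows extracted from a source system (all values invented).
source_rows = [
    {"branch": " B003", "month": "2024-01", "rent": "650"},
    {"branch": "B003", "month": "2024-01", "rent": "720"},
    {"branch": "B005", "month": "2024-02", "rent": None},  # dirty row
    {"branch": "B005", "month": "2024-02", "rent": "900"},
]

def inflow(rows):
    """Extract, cleanse and load: drop rows with no rent, normalise types."""
    detailed = []
    for row in rows:
        if row["rent"] is None:
            continue  # cleansing: skip incomplete records
        detailed.append({
            "branch": row["branch"].strip(),
            "month": row["month"],
            "rent": float(row["rent"]),
        })
    return detailed

def upflow(detailed):
    """Summarise: total rent per (branch, month), then per branch."""
    lightly = defaultdict(float)
    for row in detailed:
        lightly[(row["branch"], row["month"])] += row["rent"]
    highly = defaultdict(float)
    for (branch, _month), total in lightly.items():
        highly[branch] += total
    return dict(lightly), dict(highly)

detailed = inflow(source_rows)
lightly, highly = upflow(detailed)
```

Here detailed plays the role of the detailed data, while lightly and highly stand in for the lightly and highly summarized layers.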
Slide 10
• Data must be designed to allow ad-hoc queries to be
answered within acceptable performance constraints
• Queries usually require access to factual data
generated by business transactions
– e.g. find the average number of properties rented out with
a monthly rent greater than £700 at each branch office
over the last six months
• Uses Dimensionality Modelling
Data Warehouse Design
Slide 11
• Similar to E-R modelling but with constraints
– composed of one fact table with a composite primary
key
– each dimension table has a simple primary key which
corresponds to exactly one foreign key in the fact table
– uses surrogate keys based on integer values
– can efficiently and easily support ad-hoc end-user
queries
Dimensionality Modelling
Slide 12
• The most common dimensional model
• A fact table surrounded by dimension tables
• Fact tables
– contain a foreign key (FK) for each dimension table
– large relative to dimension tables
– read-only
• Dimension tables
– reference data
– query performance can be speeded up by denormalising
into a single dimension table
Star Schemas
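A star schema and an ad-hoc query against it can be sketched with Python's built-in sqlite3 module. The table names (rental_fact, branch_dim, time_dim), the columns and the sample rows are assumptions made for illustration; the lecture's own Star Schema Example figure is not reproduced here. The query is a simplified version of the earlier rental example: the average number of properties rented per branch over a period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: simple integer surrogate keys.
CREATE TABLE branch_dim (branch_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE time_dim   (time_key   INTEGER PRIMARY KEY, month TEXT);
-- Fact table: one foreign key per dimension, composite primary key.
CREATE TABLE rental_fact (
    branch_key INTEGER REFERENCES branch_dim(branch_key),
    time_key   INTEGER REFERENCES time_dim(time_key),
    properties_rented INTEGER,
    PRIMARY KEY (branch_key, time_key)
);
""")
conn.executemany("INSERT INTO branch_dim VALUES (?, ?)",
                 [(1, "Glasgow"), (2, "London")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?)",
                 [(1, "2024-01"), (2, "2024-02")])
conn.executemany("INSERT INTO rental_fact VALUES (?, ?, ?)",
                 [(1, 1, 4), (1, 2, 6), (2, 1, 10), (2, 2, 20)])

# Ad-hoc query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
    SELECT b.city, AVG(f.properties_rented)
    FROM rental_fact AS f
    JOIN branch_dim AS b ON b.branch_key = f.branch_key
    JOIN time_dim   AS t ON t.time_key   = f.time_key
    WHERE t.month BETWEEN '2024-01' AND '2024-02'
    GROUP BY b.city
    ORDER BY b.city
""").fetchall()
```

Every join goes from the fact table straight to a dimension table, which is what keeps ad-hoc queries of this shape simple to write.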
Slide 13
E-R Model Example
Slide 14
Star Schema Example
Slide 15
• ‘The process of extracting valid, previously
unknown, comprehensible and actionable
information from large databases and using it to
make crucial business decisions’
– focus is to reveal information which is hidden or
unexpected
– patterns and relationships are identified by examining
the underlying rules and features of the data
– work from data up
– require large volumes of data
Data Mining
Slide 16
• Retail/Marketing
– Identifying buying patterns of customers
– Finding associations among customer
demographic characteristics
– Predicting response to mailing campaigns
– Market basket analysis
Example Data Mining Applications
Slide 17
• Banking
– Detecting patterns of fraudulent credit card use
– Identifying loyal customers
– Predicting customers likely to change their credit card
affiliation
– Determining credit card spending by customer groups
Example Data Mining Applications
Slide 18
• Predictive Modelling
– using observations to form a model of the important
characteristics of some phenomenon
• Techniques:
– Classification
– Value Prediction
Data Mining Techniques
Slide 19
[Figure: decision tree]
Customer renting property > 2 years?
– No: Rent property
– Yes: Customer age > 25 years?
   – No: Rent property
   – Yes: Buy property
Classification Example: Tree Induction
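Once a tree like this has been induced, applying it to a new record is a chain of tests. The sketch below hard-codes the two tests from the example (renting for more than 2 years, age over 25); it does not perform the induction itself, and the function and parameter names are hypothetical.

```python
def predict_tenure(years_renting: float, age: int) -> str:
    """Apply the example tree: return 'buy' or 'rent' for one customer."""
    if years_renting > 2:
        if age > 25:
            return "buy"   # long-term renter, older than 25
        return "rent"      # long-term renter, 25 or younger
    return "rent"          # has rented for 2 years or less

predictions = [
    predict_tenure(3, 30),
    predict_tenure(3, 20),
    predict_tenure(1, 40),
]
```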
Slide 20
• Database Segmentation:
– to partition a database into an unknown number of
segments (or clusters) of records which share a number
of properties
• Techniques:
– Demographic clustering
– Neural clustering
Data Mining Techniques
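To illustrate segmentation with a number of clusters that is not fixed in advance, here is a minimal sketch of a simple "leader" clustering pass. It is a stand-in chosen for brevity, not demographic or neural clustering, and the customer points (age, monthly spend) and the radius are invented.

```python
import math

def leader_clustering(points, radius):
    """Put each point in the first cluster whose leader lies within
    `radius`; if none does, the point starts a new cluster."""
    leaders = []
    clusters = []
    for p in points:
        for i, leader in enumerate(leaders):
            if math.dist(p, leader) <= radius:
                clusters[i].append(p)
                break
        else:
            # No existing leader is close enough: open a new segment.
            leaders.append(p)
            clusters.append([p])
    return clusters

# Hypothetical (age, monthly spend) records.
customers = [(25, 20), (27, 22), (60, 80), (62, 78), (26, 21)]
segments = leader_clustering(customers, radius=10)
```

The number of segments falls out of the data and the chosen radius rather than being supplied up front. The result depends on the order of the records, which is one reason practical tools use more robust methods.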
Slide 21
Database Segmentation: Scatterplot Example
Slide 22
• Link Analysis
– establish associations between individual records (or
sets of records) in a database
• e.g. ‘when a customer rents property for more than two years
and is more than 25 years old, then in 40% of cases, the
customer will buy the property’
– Techniques
– Association discovery
– Sequential pattern discovery
– Similar time sequence discovery
Data Mining Techniques
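The "40% of cases" figure in a rule like the one above is the rule's confidence: among records that satisfy the condition, the fraction that also satisfy the conclusion. A minimal sketch, assuming invented customer records and a hypothetical helper rule_confidence:

```python
def rule_confidence(records, antecedent, consequent):
    """Fraction of records matching `antecedent` that also match `consequent`."""
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return 0.0
    return sum(1 for r in matching if consequent(r)) / len(matching)

# Hypothetical customer records.
records = [
    {"years_renting": 3, "age": 30, "bought": True},
    {"years_renting": 4, "age": 41, "bought": True},
    {"years_renting": 5, "age": 28, "bought": False},
    {"years_renting": 3, "age": 35, "bought": False},
    {"years_renting": 6, "age": 50, "bought": False},
    {"years_renting": 1, "age": 45, "bought": True},   # fails the condition
    {"years_renting": 4, "age": 22, "bought": False},  # fails the condition
]

confidence = rule_confidence(
    records,
    antecedent=lambda r: r["years_renting"] > 2 and r["age"] > 25,
    consequent=lambda r: r["bought"],
)
```

With this sample, five records satisfy the condition and two of them bought, giving a confidence of 0.4.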
Slide 23
• Deviation Detection
– identify ‘outliers’, something which deviates from
some known expectation or norm
– Statistics
– Visualisation
Data Mining Techniques
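A simple statistical form of deviation detection flags values that lie far from the mean in units of standard deviation. The sketch below assumes a threshold of 2 standard deviations and invented monthly rent figures; both are illustrative choices, not values from the lecture.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical monthly rents; one value is far from the rest.
rents = [650, 700, 680, 720, 690, 710, 2500]
outliers = find_outliers(rents)
```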
Slide 24
Deviation Detection: Visualization Example
Slide 25
• Data mining needs a single, separate, clean,
integrated, self-consistent data source
• Data warehouse well equipped:
– populated with clean, consistent data
– contains multiple sources
– utilizes query capabilities
– capability to go back to data source
Mining and Warehousing
Slide 26
• Web-based systems are making it possible to access
data across an enterprise and among an enterprise's
business partners. Data warehousing technology is
taking advantage of the Web's access capabilities
Web Warehouses
Slide 27
• The ultimate data warehouse is the Internet
– contains data in numerous formats: relational,
object-oriented, semi-structured, unstructured ...
• It is impossible to store all this data in a warehouse
– imagine the storage required!
– see Internet Joke – http://www.w3schools.com
• So need an intermediary
Web Warehouses
Slide 28
• A meta-language that enables designers to create
their own customised tags to provide functionality
not available within HTML
• e.g.
<STAFF>
<NAME>
<FNAME>John</FNAME><LNAME>White</LNAME>
</NAME>
<SEX gender="M"/>
</STAFF>
XML
Slide 29
• Can define stylesheets to display XML database in
web pages
• Can write queries:
WHERE <STAFF>
<GENDER>$$</GENDER>
<NAME><FNAME>$F</FNAME><LNAME>$L</LNAME></NAME>
$$ = 'M'
CONSTRUCT <LNAME>$L</LNAME>
• To build a warehouse can develop a representation
of data models in XML
• Good as a common format for EDI
XML Tools
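The same kind of selection can be tried with Python's standard xml.etree.ElementTree module. This is a sketch under assumptions: the STAFFLIST wrapper element and the second staff record are invented so that the document has a single root and more than one entry, and gender is read from the SEX element's gender attribute as in the earlier STAFF example.

```python
import xml.etree.ElementTree as ET

# STAFFLIST and the second STAFF entry are hypothetical additions.
doc = """
<STAFFLIST>
  <STAFF>
    <NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME>
    <SEX gender="M"/>
  </STAFF>
  <STAFF>
    <NAME><FNAME>Ann</FNAME><LNAME>Beech</LNAME></NAME>
    <SEX gender="F"/>
  </STAFF>
</STAFFLIST>
"""

root = ET.fromstring(doc)

# Select the last names of staff whose SEX element has gender="M".
last_names = [
    staff.findtext("NAME/LNAME")
    for staff in root.findall("STAFF")
    if staff.find("SEX").get("gender") == "M"
]
```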

Intro to Data warehousing lecture 16

  • 1. Data Warehousing: Data Warehousing, Mining and Web Tools. Ch Anwar ul Hassan (Lecturer), Department of Computer Science and Software Engineering, Capital University of Sciences & Technology, Islamabad, Pakistan. anwarchaudary@gmail.com
  • 2. Slide 2 • So far we have concentrated on OLTP (online transaction processing) systems – range in size from megabytes to terabytes – high transaction throughput • Decision makers require access to all data wherever it is located – current data – historical data OLTP Systems
  • 3. Slide 3 • Holds current data • Stores detailed data • Data is dynamic • Repetitive processing • High level of transaction throughput • Predictable pattern of usage • Transaction driven • Application-oriented • Supports day-to-day decisions • Serves large number of clerical/operational users OLTP Systems
  • 4. Slide 4 • ‘A data warehouse is a – subject-oriented, – integrated, – time-variant and – non-volatile • collection of data in support of management’s decision-making process’(Inmon 1993) Data Warehouse Definition
  • 5. Slide 5 • Holds historical data • Stores detailed, lightly and highly summarised data • Data is largely static • Ad-hoc, unstructured and heuristic processing • Medium/low level of transaction throughput • Unpredictable pattern of usage • Analysis driven • Subject-oriented • Supports strategic decisions • Serves relatively low no. of managerial users Data Warehousing Systems
  • 6. Slide 6 • Potential high returns on investment – 401% return on investment (over three years) for 90% of companies in 1996 • Competitive advantage – data can reveal previously unknown, unavailable and untapped information • Increased productivity of corporate decision-makers – integration allows more substantive, accurate and consistent analysis Benefits
  • 7. Slide 7 [Diagram. Labelled components: operational data sources (mainframe operational n/w, h/w data; departmental RDBMS data; private data; external data); load manager, warehouse manager and query manager; DBMS holding meta-data, highly summarized data, lightly summarized data and detailed data; archive/backup; end-user access tools (reporting, query, application development and EIS tools; OLAP tools; data-mining tools)] Architecture
  • 8. Slide 8 [Diagram. Labelled components: operational data sources 1 to n; load manager, warehouse manager and query manager; DBMS holding meta-data, highly summarized data, lightly summarized data and detailed data; archive/backup; reporting, query, application development and EIS tools; OLAP tools; data-mining tools. Labelled flows: inflow, upflow, downflow, outflow and meta-flow] Information Flows
  • 9. Slide 9 • Five primary information flows – Inflow - extraction, cleansing and loading of data from source systems into warehouse – Upflow - adding value to data in warehouse through summarizing, packaging and distributing data – Downflow - archiving and backing up data in warehouse – Outflow - making data available to end users – Metaflow - managing the metadata Information Flow Processes
  • 10. Slide 10 • Data must be designed to allow ad-hoc queries to be answered with acceptable performance constraints • Queries usually require access to factual data generated by business transactions – e.g. find the average number of properties rented out with a monthly rent greater than £700 at each branch office over the last six months • Uses Dimensionality Modelling Data Warehouse Design
  • 11. Slide 11 • Similar to E-R modelling but with constraints – composed of one fact table with a composite primary key – dimension tables have a simple primary key which corresponds exactly to one foreign key in the fact table – uses surrogate keys based on integer values – Can efficiently and easily support ad-hoc end-user queries Dimensionality Modelling
  • 12. Slide 12 • The most common dimensional model • A fact table surrounded by dimension tables • Fact tables – contains FK for each dimension table – large relative to dimension tables – read-only • Dimension tables – reference data – query performance can be speeded up by denormalising into a single dimension table Star Schemas
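A minimal sketch of the star schema described on Slides 11 and 12, using Python's built-in sqlite3 module. The table and column names (branch_dim, month_dim, rental_fact and so on) are hypothetical and chosen only to fit the rental query quoted on Slide 10; they do not come from the slides. The sketch also assumes month keys are consecutive integers, so a "last six months" window can be expressed as a key range.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables."""
    conn.executescript("""
        CREATE TABLE branch_dim (
            branch_key INTEGER PRIMARY KEY,  -- simple surrogate integer key
            city       TEXT NOT NULL
        );
        CREATE TABLE month_dim (
            month_key  INTEGER PRIMARY KEY,  -- simple surrogate integer key
            year       INTEGER NOT NULL,
            month      INTEGER NOT NULL
        );
        CREATE TABLE rental_fact (
            branch_key   INTEGER NOT NULL REFERENCES branch_dim(branch_key),
            month_key    INTEGER NOT NULL REFERENCES month_dim(month_key),
            property_id  INTEGER NOT NULL,
            monthly_rent REAL NOT NULL,
            -- composite primary key, as required of the fact table
            PRIMARY KEY (branch_key, month_key, property_id)
        );
    """)


def avg_high_rent_rentals(conn, min_rent, first_month_key, last_month_key):
    """Average number of rentals per month above min_rent, per branch.

    Branches with no qualifying rentals in the window are omitted.
    """
    n_months = last_month_key - first_month_key + 1
    sql = """
        SELECT b.city, COUNT(*) * 1.0 / ? AS avg_per_month
        FROM rental_fact AS f
        JOIN branch_dim AS b ON b.branch_key = f.branch_key
        WHERE f.monthly_rent > ?
          AND f.month_key BETWEEN ? AND ?
        GROUP BY b.branch_key, b.city
        ORDER BY b.city
    """
    params = (n_months, min_rent, first_month_key, last_month_key)
    return conn.execute(sql, params).fetchall()
```

Each dimension table is joined to the fact table through a single integer key, which is what keeps ad-hoc queries such as this one short: one join per dimension and a filter on the fact measures.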
  • 15. Slide 15 • ‘The process of extracting valid, previously unknown, comprehensible and actionable information from large databases and using it to make crucial business decisions’ – focus is to reveal information which is hidden or unexpected – patterns and relationships are identified by examining the underlying rules and features of the data – work from data up – require large volumes of data Data Mining
  • 16. Slide 16 • Retail/Marketing – Identifying buying patterns of customers – Finding associations among customer demographic characteristics – Predicting response to mailing campaigns – Market basket analysis Example Data Mining Applications
  • 17. Slide 17 • Banking – Detecting patterns of fraudulent credit card use – Identifying loyal customers – Predicting customers likely to change their credit card affiliation – Determining credit card spending by customer groups Example Data Mining Applications
  • 18. Slide 18 • Predictive Modelling – using observations to form a model of the important characteristics of some phenomenon • Techniques: – Classification – Value Prediction Data Mining Techniques
  • 19. Slide 19 [Decision tree: Customer renting property > 2 years? If No: Rent property. If Yes: Customer age > 25 years? If No: Rent property. If Yes: Buy property] Classification Example: Tree Induction
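The tree on Slide 19 can be read as two nested tests. The sketch below assumes the branches are arranged so that only customers who have rented for more than two years and are older than 25 are predicted to buy, which matches the rule quoted on Slide 22. The function name and the returned labels are illustrative, not taken from the slides.

```python
def predict_rent_or_buy(years_renting, age):
    """Classify a customer by walking the two-level decision tree."""
    if years_renting > 2:          # root test: renting property > 2 years?
        if age > 25:               # second test: customer age > 25 years?
            return "buy property"
        return "rent property"
    return "rent property"
```

A tree induction algorithm would learn these split attributes and thresholds from historical records; here they are simply hard-coded from the slide.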
  • 20. Slide 20 • Database Segmentation: – to partition a database into an unknown number of segments (or clusters) of records which share a number of properties • Techniques: – Demographic clustering – Neural clustering Data Mining Techniques
  • 21. Slide 21 Database Segmentation: Scatterplot Example
  • 22. Slide 22 • Link Analysis – establish associations between individual records (or sets of records) in a database – e.g. ‘when a customer rents property for more than two years and is more than 25 years old, then in 40% of cases, the customer will buy the property’ • Techniques: – Association discovery – Sequential pattern discovery – Similar time sequence discovery Data Mining Techniques
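The "40% of cases" figure in the Slide 22 rule is what association discovery usually calls the confidence of a rule: among the records that satisfy the condition, the fraction that also satisfy the outcome. A small sketch of that calculation follows; the record layout (dictionaries with years_renting, age and bought keys) is a made-up assumption for illustration.

```python
def rule_confidence(records, antecedent, consequent):
    """Fraction of records matching `antecedent` that also match `consequent`.

    Returns None when no record matches the antecedent.
    """
    matching = [r for r in records if antecedent(r)]
    if not matching:
        return None
    hits = sum(1 for r in matching if consequent(r))
    return hits / len(matching)


def long_term_renter_over_25(record):
    """Antecedent of the Slide 22 rule."""
    return record["years_renting"] > 2 and record["age"] > 25


def bought_property(record):
    """Consequent of the Slide 22 rule."""
    return record["bought"]
```

With five qualifying customers of whom two bought, the function returns 0.4, i.e. the 40% quoted in the rule.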
  • 23. Slide 23 • Deviation Detection – identify ‘outliers’, something which deviates from some known expectation or norm – Statistics – Visualisation Data Mining Techniques
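One simple statistical way to flag outliers is a z-score test: measure how many standard deviations each value lies from the mean and report those beyond a threshold. This is only one possible technique, chosen here for brevity; the slide does not say which statistic is intended, and the threshold of 2.0 is an arbitrary assumption.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []                      # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / stdev > threshold]
```

For a list of monthly rents clustered around 500 with a single entry of 2000, only the 2000 entry is returned.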
  • 24. Slide 24 Deviation Detection: Visualization Example
  • 25. Slide 25 • Data mining needs single, separate, clean, integrated, self-consistent data source • Data warehouse well equipped: – populated with clean, consistent data – contains multiple sources – utilizes query capabilities – capability to go back to data source Mining and Warehousing
  • 26. Slide 26 • Web-based systems are making it possible to access data across an enterprise and among an enterprise's business partners. Data warehousing technology is taking advantage of the Web's access capabilities. Web Warehouses
  • 27. relational object-oriented semi-structured unstructured ... It is impossible to store all this data in a warehouse imagine the storage required! See Internet Joke – http://www.w3schools.com  So need an intermediary Slide 27 • The ultimate data warehouse is the Internet – contains data in numerous formats Web Warehouses
  • 28. Slide 28 • A meta-language that enables designers to create their own customised tags to provide functionality not available within HTML • e.g. <STAFF> <NAME> <FNAME>John</FNAME><LNAME>White</LNAME> </NAME> <SEX gender='M'/> </STAFF> XML
  • 29. Slide 29 • Can define stylesheets to display XML database in web pages • Can write queries: WHERE <STAFF> <GENDER>$$</GENDER> <NAME><FNAME>$F</FNAME><LNAME>$L</LNAME></NAME> $$ = 'M' CONSTRUCT <LNAME>$L</LNAME> • To build a warehouse can develop a representation of data models in XML • Good as a common format for EDI XML Tools
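The query on Slide 29 appears to select the last names of staff whose GENDER element equals 'M'. Under that reading, a rough equivalent can be sketched with Python's standard xml.etree.ElementTree module. The sample document, its STAFFLIST root element and the second staff member are invented for illustration; only John White and the STAFF, GENDER, NAME, FNAME and LNAME tags appear on the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data following the element layout used in the Slide 29 query.
SAMPLE_XML = """
<STAFFLIST>
  <STAFF>
    <GENDER>M</GENDER>
    <NAME><FNAME>John</FNAME><LNAME>White</LNAME></NAME>
  </STAFF>
  <STAFF>
    <GENDER>F</GENDER>
    <NAME><FNAME>Ann</FNAME><LNAME>Beech</LNAME></NAME>
  </STAFF>
</STAFFLIST>
"""


def last_names_by_gender(xml_text, gender):
    """Return the LNAME of every STAFF element whose GENDER text equals `gender`."""
    root = ET.fromstring(xml_text.strip())
    return [
        staff.findtext("NAME/LNAME")
        for staff in root.iter("STAFF")
        if staff.findtext("GENDER") == gender
    ]
```

Calling last_names_by_gender(SAMPLE_XML, "M") plays the role of the WHERE pattern plus the condition, and the returned list corresponds to the CONSTRUCT clause that emits each matching last name.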