Data warehousing and Data mining

Data Mining and
Data Warehousing Techniques
Presented to : Muhammad Faisal
Presented by:
Faizan Saleem
Pireh Pirzada
Ahmed Hassan
Muhammad Usman
BSE-4 | DATABASE MANAGEMENT SYSTEM

Topics
 Why do we need data warehouses and data mining?
 What are data warehouses and data mining?
 History of data warehouses and data mining
 Techniques of data warehouses and data mining

Why We Need Data Mining and Warehousing
Problem Scenario
Solution
Needs of Data Warehouses and Data Mining

Why Data Warehouse?
Necessity is the mother of invention

Problem Scenario 1
ABC Pvt Ltd is a company with branches in Karachi, Lahore, Peshawar and Islamabad.
The Sales Manager wants a quarterly sales report.
Each branch has a separate operational system.

[Diagram: the Karachi, Lahore, Peshawar and Islamabad branches of ABC Pvt Ltd.; the Sales Manager asks for sales per item type per branch for the first quarter.]

Solution for ABC Pvt Ltd.
 Extract sales information from each branch database and store it in a common repository at a single site.
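
The consolidation step can be sketched in a few lines of Python. This is only an illustration, not ABC Pvt Ltd.'s actual system: the sample rows, the field names and the helpers `load_warehouse` and `quarterly_sales` are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical extracts from each branch's operational system:
# each row is (sale_date, item_type, amount).
branch_extracts = {
    "Karachi": [(date(2014, 1, 15), "TV", 300.0), (date(2014, 2, 3), "Radio", 40.0)],
    "Lahore": [(date(2014, 3, 9), "TV", 280.0), (date(2014, 4, 1), "TV", 310.0)],
    "Peshawar": [(date(2014, 1, 20), "Radio", 45.0)],
    "Islamabad": [(date(2014, 2, 11), "TV", 295.0), (date(2014, 2, 12), "TV", 305.0)],
}


def load_warehouse(extracts):
    """Copy every branch's rows into one repository, tagging each row with its branch."""
    warehouse = []
    for branch, rows in extracts.items():
        for sale_date, item_type, amount in rows:
            warehouse.append(
                {"branch": branch, "date": sale_date, "item_type": item_type, "amount": amount}
            )
    return warehouse


def quarterly_sales(warehouse, year, quarter):
    """Total sales per (branch, item_type) for the given quarter."""
    first_month = 3 * (quarter - 1) + 1
    months = range(first_month, first_month + 3)
    totals = defaultdict(float)
    for row in warehouse:
        if row["date"].year == year and row["date"].month in months:
            totals[(row["branch"], row["item_type"])] += row["amount"]
    return dict(totals)


warehouse = load_warehouse(branch_extracts)
report = quarterly_sales(warehouse, 2014, 1)
for (branch, item_type), total in sorted(report.items()):
    print(f"{branch:10s} {item_type:6s} {total:8.2f}")
```

Because every row carries its branch name, the Sales Manager's report becomes a single pass over one repository instead of four separate queries against four systems.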

Solution for ABC Pvt Ltd.
[Diagram: the Karachi, Lahore, Peshawar and Islamabad systems feed a central data warehouse; the Sales Manager uses query and analysis tools on the warehouse to produce reports.]

Problem Scenario 2
A supermarket has a huge operational database. Whenever executives want a report, the OLTP system slows down and the data entry operators have to wait.

Problem
[Diagram: management's report queries and the data entry operators' transactions hit the same operational database, so the operators have to wait.]

Solutions for the Shopping Mart
 Extract the data needed for analysis from the operational database and store it in a warehouse.
 Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
 The warehouse will contain data with a historical perspective.

Solution
[Diagram: the data entry operators' transactions go to the operational database; data is extracted from it into the data warehouse, and the manager's reports run against the warehouse.]

Need for Data Warehousing
 Industry holds huge amounts of operational data.
 Knowledge workers want to turn this data into useful information.
 They use this information to support strategic decision making.

 A data warehouse is a platform for consolidated historical data for analysis.
 It stores good-quality data so that knowledge workers can make correct decisions.

 From a business perspective:
• It is the latest marketing weapon.
• It helps to keep customers by learning more about their needs.
• It is a valuable tool in today’s competitive, fast-evolving world.

Why Mine Data? Commercial Viewpoint
 Lots of data is being collected and warehoused
 Web data, e-commerce
 Purchases at department/ grocery stores
 Bank/credit card transactions
 Computers have become cheaper and more powerful
 Competitive Pressure is Strong
 Provide better, customized services for an edge (e.g.
in Customer Relationship Management)

Why Mine Data? Scientific Viewpoint
 Data collected and stored at enormous speeds
(GB/hour)
 Remote sensors on a satellite
 telescopes scanning the skies
 Microarrays generating gene expression data
 Scientific simulations generating terabytes of
data

What is Data Mining and Warehousing?
Definition of Data Warehouse
Data Warehouse Uses
Definition of Data Mining
Data Mining Uses
Data Warehousing Versus Data Mining
Examples

What is Data Warehousing?
Data warehousing is the process of centralizing or aggregating data from multiple sources into one common repository.
It is a process of transforming data into information and making it available to users in a timely enough manner to make a difference.
[Diagram: Data → Information]

Data Warehousing Uses
 Reporting and data analysis.
 Data warehouses store current as well as historical data and are used to create trend reports for senior management, such as annual and quarterly comparisons.

What is Data Mining?
Data mining is the process of discovering new information, in the form of patterns or rules, from vast amounts of data, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.

What is Data Mining?
 It extracts information from data and transforms it into an understandable structure.
 It uses past data to analyze the outcome of a particular problem or situation.

Data Mining Uses
 Companies use it to decide on marketing strategies for their products.
 They can use data to compare and contrast themselves with competitors.
 Data mining turns data into real-time analysis that can be used to:
 increase sales,
 promote a new product,
 or drop a product that does not add value to the company.

Data Mining Works with Warehouse Data
 Data warehousing provides the enterprise with a memory.
 Data mining provides the enterprise with intelligence.
Data Warehousing vs. Data Mining
Data Warehousing
 Occurs before any data mining process.
 Data warehousing is the process of compiling and organizing data into one common database.
Data Mining
 Relies on warehoused data to detect meaningful patterns.
 Data mining is the process of extracting meaningful data from that database.

Examples of Data Mining and Data Warehousing
 Credit card fraud detection.
 Collecting data on shoppers to find patterns in their shopping habits.
 A great example of data warehousing that everyone can relate to is Facebook, which gathers its users' data into one central repository.

History of Data Mining and Warehousing
Data Warehouse History
Data Mining History

History of the Data Warehouse
 1960s: General Mills and Dartmouth College, in a joint research project, develop the terms "dimensions" and "facts".
 1970s: ACNielsen and IRI provide dimensional data marts for retail sales.
 1970s: Bill Inmon begins to define and discuss the term "data warehouse".

 1975: Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL.

 1983: Teradata introduces a database management system specifically designed for decision support.
 1983: Martyn Richard Jones of Sperry Corporation defines the Sperry Information Center approach, which, while not a true data warehouse in the Inmon sense, contains many of the characteristics of data warehouse structures.

 1984: Metaphor Computer Systems releases the Data Interpretation System (DIS), a hardware/software package and GUI with which business users can create a database management and analytic system.

 1988: Barry Devlin and Paul Murphy publish an article in IBM Systems Journal in which they introduce the term "business data warehouse".
 1990: Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.
 1991: Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.

 1992: Bill Inmon publishes the book Building the Data Warehouse.
 1995: The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.

 1996: Ralph Kimball publishes the book The Data Warehouse Toolkit.
 2000: Daniel Linstedt releases the Data Vault, enabling real-time, auditable data warehouses.

Brief History of Data Mining
 The term "data mining" was introduced in the 1990s.
 Data mining can be traced back through three lines of work: classical statistics, artificial intelligence, and machine learning.
 Statistics is the foundation of most technologies on which data mining is built; its techniques are used to study data and data relationships.

 Artificial intelligence (AI), which is built on heuristics rather than statistics, attempts to apply human-thought-like processing to statistical problems. Some AI concepts were adopted in RDBMS query processors.

 Machine learning is the union of statistics and AI. It can be considered an evolution of AI, because it blends AI heuristics with advanced statistical analysis.

Data Mining Techniques
Tasks of data mining
Applications of data mining

Processes Used in Data Mining
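Data mining methods fall into two groups:
• Prediction methods: use some variables to predict unknown or future values of other variables.
• Description methods: find human-interpretable patterns that describe the data.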

How it works
 Data mining involves six common tasks
o Classification [Predictive]
o Clustering [Descriptive]
o Association Rule Discovery [Descriptive]
o Sequential Pattern Discovery [Descriptive]
o Regression [Predictive]
o Deviation Detection [Predictive]

Anomaly detection
 What is Anomaly Detection ?
 Types of Anomaly Detection:
• Unsupervised anomaly detection
• Supervised anomaly detection
• Semi-supervised anomaly detection
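
As an illustration of the unsupervised case, where no record is labelled as normal or abnormal in advance, here is a minimal z-score sketch in Python. The function name `zscore_anomalies`, the threshold of 2.0 and the sample transaction amounts are assumptions made for this example; real systems typically use more robust methods.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts; one of them is far from the rest.
amounts = [25.0, 30.0, 27.5, 22.0, 31.0, 28.0, 26.5, 950.0, 29.0, 24.0]
anomalies = zscore_anomalies(amounts)
print(anomalies)
```

A supervised detector would instead be trained on records already labelled as normal or anomalous, and a semi-supervised one on normal records only.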

Association rule learning
 What is association rule learning?
 Examples:
• In a supermarket
• Inventory management
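
For the supermarket case, a rule such as "customers who buy X also buy Y" is usually judged by its support and confidence. The following Python sketch computes both for item pairs. The baskets, the thresholds and the helper names (`support`, `confidence`, `pair_rules`) are hypothetical, and the brute-force pair search stands in for a real algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    both = count(set(antecedent) | set(consequent), transactions)
    return both / count(antecedent, transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for single-item rules lhs -> rhs that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            c = confidence({lhs}, {rhs}, transactions)
            if c >= min_confidence:
                rules.append((lhs, rhs, s, c))
    return rules


rules = pair_rules(transactions, min_support=0.6, min_confidence=0.8)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={c:.2f}")
```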

Classification
 What is it?
 Given a collection of records (the training set), find a model for the class attribute as a function of the values of the other attributes.
 Goal: previously unseen records should be assigned a class as accurately as possible.
 Example:
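
One possible illustration is a k-nearest-neighbour classifier in Python, which assigns an unseen record the majority class of its closest training records. The choice of k-NN, the attributes (income and debt, in thousands) and the class labels are assumptions made for this sketch, not taken from the slides.

```python
import math
from collections import Counter

# Hypothetical training set: ((annual_income_k, debt_k), class_label)
training_set = [
    ((60.0, 5.0), "low_risk"),
    ((75.0, 10.0), "low_risk"),
    ((80.0, 8.0), "low_risk"),
    ((30.0, 25.0), "high_risk"),
    ((25.0, 30.0), "high_risk"),
    ((35.0, 28.0), "high_risk"),
]


def knn_predict(training_set, record, k=3):
    """Predict the class of `record` by majority vote among its k nearest training records."""
    neighbours = sorted(training_set, key=lambda pair: math.dist(pair[0], record))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Classify two previously unseen records.
print(knn_predict(training_set, (70.0, 7.0)))
print(knn_predict(training_set, (28.0, 27.0)))
```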

Clustering
 What is it?
 Example:
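
A common way to form clusters is k-means, which groups points around the nearest of k centroids and then moves each centroid to the mean of its group. The Python sketch below is a minimal version under stated assumptions: the sample points, the fixed starting centroids and the function name `kmeans` are hypothetical, and a real implementation would also choose starting centroids and test for convergence.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute the centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
final_centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(final_centroids)
print(clusters)
```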

Sequential Pattern Discovery
 What is it?
 Example:
 In point-of-sale transaction sequences:
 Computer bookstore:
(Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies, Tcl_Tk)
 Athletic apparel store:
(Shoes) (Racket, Racketball) --> (Sports_Jacket)
(A B) (C) --> (D E)
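
To make the bookstore rule concrete, the Python sketch below checks whether a customer's ordered purchases contain a pattern and then measures the rule's confidence. The customer sequences, the extra item `Java_Basics` and the helper names are hypothetical. Each customer is a list of transactions in time order, and each transaction is a set of items.

```python
def contains(sequence, pattern):
    """True if each itemset of `pattern` is a subset of a later and later transaction in `sequence`."""
    position = 0
    for itemset in pattern:
        # Skip forward to the first remaining transaction that includes this itemset.
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1
    return True


def support_count(sequences, pattern):
    """Number of customer sequences that contain the pattern."""
    return sum(1 for seq in sequences if contains(seq, pattern))


def rule_confidence(sequences, antecedent, consequent):
    """Among customers matching the antecedent, the fraction who go on to match the consequent."""
    matched = support_count(sequences, antecedent)
    if matched == 0:
        return 0.0
    return support_count(sequences, antecedent + consequent) / matched


# Hypothetical purchase histories, one list of transactions per customer.
sequences = [
    [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Perl_for_dummies", "Tcl_Tk"}],
    [{"Intro_To_Visual_C"}, {"C++_Primer", "Java_Basics"}, {"Tcl_Tk"}],
    [{"C++_Primer"}, {"Intro_To_Visual_C"}],
    [{"Intro_To_Visual_C", "Java_Basics"}, {"C++_Primer"}, {"Java_Basics"},
     {"Perl_for_dummies", "Tcl_Tk", "Java_Basics"}],
]

antecedent = [{"Intro_To_Visual_C"}, {"C++_Primer"}]
consequent = [{"Perl_for_dummies", "Tcl_Tk"}]
print(support_count(sequences, antecedent))
print(rule_confidence(sequences, antecedent, consequent))
```

The third customer bought both books but in the opposite order, so that sequence does not count towards the antecedent.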

Regression
 What is it?
 Example:
 PageRank, as used by Google
• Page structure implicitly holds the importance of a page
• Important pages are linked to by important pages
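
The idea that important pages are linked to by important pages can be sketched as a simplified power iteration in Python. This is a textbook-style approximation and not Google's production algorithm; the four-page link graph, the damping factor of 0.85 and the iteration count are assumptions for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank: repeatedly pass each page's rank along its outgoing links."""
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 4))
```

Page C ends up with the highest rank because three pages link to it, and page A comes second because the highly ranked C links to it.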

Applications Of Data Mining
 Data Mining Applications in Sales/Marketing
 Data Mining Applications in Banking / Finance
 Data Mining Applications in Health Care and Insurance
 Data Mining Applications in Transportation
 Data Mining Applications in Medicine

Data Mining Applications in Sales/Marketing
 Enables businesses to understand the hidden patterns in historical purchasing transactions
 Market basket analysis
 Identifying customer behavior

Data Mining Applications in Banking / Finance
 Credit card fraud detection
 Identify customer loyalty
 Identify stock trading rules
 Identify users by method of payment/transaction

Data Mining Applications in Health Care and Insurance
 Claims analysis
 Forecasts of customers
 Detect risky customers
 Fraudulent behavior

Data Mining Applications in Transportation
 Determine the distribution schedules

Data Mining Applications in Medicine
 Characterize patient activities
 Identify the patterns

Data Warehousing Techniques
Star Schema
Elements
Example
Star Schema VS Snowflake Schema

Star Schema
 The star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions.
 A star schema is diagrammed by surrounding each fact with its associated dimensions.
 The resulting diagram resembles a star.
 Star schemas are optimized for querying large data sets and are used in data warehouses and data marts to support OLAP cubes, business intelligence and analytic applications, and queries.

Elements of star schema
 Dimension tables
 A dimension contains reference information about the fact, such as date, product, or customer.
 A dimension table is a denormalized, decoded and cleaned set of descriptive data elements.
 Geography dimension tables describe location data, such as country, state, or city.
 Employee dimension tables describe employees, such as salespeople.

 Fact tables
 A fact is an event that is counted or measured, such as a sale or a login.
 Fact tables contain foreign keys referencing dimension records.
 Fact tables contain either additive or semi-additive measures for analysis.

Example
 Each dimension table has a primary key on its Id column, relating to one of the columns (viewed as rows in the example schema) of the Fact_Sales table's three-column (compound) primary key (Date_Id, Store_Id, Product_Id).
 The non-primary key Units_Sold column of the fact table in this example represents a measure or metric that can be used in calculations and analysis.
 The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim_Date dimension).
 For example, the following query answers how many TV sets have been sold, for each brand and country, in 1997:

    SELECT P.Brand, S.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON F.Date_Id = D.Id
    INNER JOIN Dim_Store S ON F.Store_Id = S.Id
    INNER JOIN Dim_Product P ON F.Product_Id = P.Id
    WHERE D.Year = 1997 AND P.Product_Category = 'tv'
    GROUP BY P.Brand, S.Country
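
To try the query, the star schema can be built in an in-memory SQLite database from Python. The table and column names follow the example, but the `Full_Date` column, the column types, the sample rows and the added `ORDER BY` clause are assumptions made so that the sketch runs and prints rows in a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Date (Id INTEGER PRIMARY KEY, Full_Date TEXT, Year INTEGER);
CREATE TABLE Dim_Store (Id INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT);
-- The fact table holds one measure and a compound key of three foreign keys.
CREATE TABLE Fact_Sales (
    Date_Id INTEGER REFERENCES Dim_Date(Id),
    Store_Id INTEGER REFERENCES Dim_Store(Id),
    Product_Id INTEGER REFERENCES Dim_Product(Id),
    Units_Sold INTEGER,
    PRIMARY KEY (Date_Id, Store_Id, Product_Id)
);
-- Hypothetical sample rows.
INSERT INTO Dim_Date VALUES (1, '1997-03-01', 1997), (2, '1998-03-01', 1998);
INSERT INTO Dim_Store VALUES (1, 'Pakistan'), (2, 'Germany');
INSERT INTO Dim_Product VALUES (1, 'BrandA', 'tv'), (2, 'BrandB', 'tv'), (3, 'BrandA', 'radio');
INSERT INTO Fact_Sales VALUES
    (1, 1, 1, 10), (1, 2, 1, 4), (1, 1, 2, 7), (2, 1, 1, 99), (1, 1, 3, 50);
""")

query = """
SELECT P.Brand, S.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.Year = 1997 AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
ORDER BY P.Brand, S.Country
"""
rows = conn.execute(query).fetchall()
for brand, country, units in rows:
    print(brand, country, units)
```

The 1998 sale and the radio sale are filtered out by the WHERE clause, so only the three TV totals for 1997 remain.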

Star Schema vs. Snowflake Schema

Ease of maintenance/change:
  Snowflake schema: No redundancy, hence easier to maintain and change.
  Star schema: Has redundant data, hence harder to maintain and change.
Ease of use:
  Snowflake schema: More complex queries, hence harder to understand.
  Star schema: Less complex queries, hence easier to understand.
Query performance:
  Snowflake schema: More foreign keys, hence longer query execution time.
  Star schema: Fewer foreign keys, hence shorter query execution time.
Normalization:
  Snowflake schema: Has normalized tables.
  Star schema: Has denormalized tables.
Type of data warehouse:
  Snowflake schema: Good for a data warehouse core, to simplify complex relationships (many:many).
  Star schema: Good for data marts with simple relationships (1:1 or 1:many).
Joins:
  Snowflake schema: Higher number of joins.
  Star schema: Fewer joins.
Dimension table:
  Snowflake schema: May have more than one dimension table for each dimension.
  Star schema: Contains only a single dimension table for each dimension.
When to use:
  Snowflake schema: When the dimension table is relatively large, snowflaking is better because it reduces space.
  Star schema: When the dimension table contains fewer rows, a star schema is suitable.


Thank You for Your Attention
Any Questions?

Presented by
Engr.Faizan Saleem
Software Engineer
Bahria University Karachi Campus
faizansaleem2803@yahoo.com
www.facebook.com/faiz.saleem
