This document provides an overview of data warehousing and data mining. It discusses key topics such as the definition of a data warehouse, its characteristics and architectures. It also describes how data is stored and modeled in a data warehouse. The document then covers data mining, outlining its process and elements. It discusses advantages of both data warehousing and data mining such as enhanced business intelligence and improved decision making. Disadvantages like privacy and security issues are also presented.
1. PRESENTATION ON DATA WAREHOUSING AND DATA MINING
SUBMITTED TO:
MRS. MANISHA BHATNAGAR (HOD OF COMP. SCI DEPT)
MRS. HARKAWNALJEET KAUR (ASST. PROF OF COMP. SCIENCE DEPT)
SUBMITTED BY:
MCA-III, ROLL NO: 9
2. CONTENTS
DATA WAREHOUSE
CHARACTERISTICS OF DATA WAREHOUSE
ARCHITECTURE OF DATA WAREHOUSE
DATA STORING IN DATA WAREHOUSE
DATA WAREHOUSE DIMENSIONAL MODELLING
INSTALLING THE SERVICE MANAGER DATA WAREHOUSE SERVER
EXAMPLE OF DATA WAREHOUSE
ADVANTAGE OF DATA WAREHOUSE
DISADVANTAGE OF DATA WAREHOUSE
3. CONTENTS
DATA MINING
ELEMENTS OF DATA MINING
DATA MINING PROCESS
ARCHITECTURE OF DATA MINING
ADDING THE OPTION OF DATA MINING TO A DATABASE
ADVANTAGES OF DATA MINING
DISADVANTAGES OF DATA MINING
4. DATA WAREHOUSE
A data warehouse is a relational database that is designed for query
and analysis rather than for transaction processing. It usually
contains historical data derived from transaction data, but can
include data from other sources.
DEFINITION OF DATA WAREHOUSE
“A data warehouse is simply a single, complete, and consistent
store of data obtained from a variety of different sources and made
available to end users in a way they can understand and use it in a
business context.”
BARRY DEVLIN, IBM CONSULTANT
5. CHARACTERISTICS OF DATA
WAREHOUSING
Subject Oriented: Data are organized by subject rather than by
application. Data warehouses are designed to help you analyze data.
Integrated: Integration is closely related to subject orientation. Data
warehouses must put data from disparate sources into a consistent
format.
Nonvolatile: Once entered into the data warehouse, data should not
change.
Time Variant: A data warehouse maintains historical data, which is
used to analyze business or market trends and to support future
predictions.
6. Data Warehouse Architectures
Data warehouses and their architectures vary depending upon the
specifics of an organization's situation. Three common architectures are:
■ Data Warehouse Architecture (Basic)
■ Data Warehouse Architecture (with a Staging Area)
■ Data Warehouse Architecture (with a Staging Area and Data Marts)
7. Data Warehouse Architecture (Basic)
Figure shows a simple architecture for a data warehouse.
End users directly access data derived from several source
systems through the data warehouse.
8. Data Warehouse Architecture
(with a Staging Area)
You need to clean and process your operational data before putting it into
the warehouse.
You can do this programmatically, although most data warehouses use
a staging area instead.
A staging area simplifies building summaries and general warehouse
management.
Figure illustrates this typical architecture.
10. Data Warehouse Architecture
(with a Staging Area and Data Marts)
Although the architecture in Figure is quite common, you may want to
customize your warehouse's architecture for different groups within your
organization.
You can do this by adding data marts, which are systems designed for a
particular line of business.
Figure illustrates an example where purchasing, sales, and inventories
are separated. In this example, a financial analyst might want to analyze
historical data for purchases and sales.
12. DATA STORING IN DATA
WAREHOUSE
FACT TABLE:
The central table that contains the fact data. Fact tables hold the
measures, usually numeric, that are analyzed and examined.
DIMENSION TABLE:
Dimension tables store the information you normally use to constrain
queries.
13. Data Warehouse Dimensional Modelling
(Types of Schemas)
Four types of schemas are available in a data warehouse:
■ Star Schema
■ Snowflake Schema
■ Galaxy Schema
■ Fact Constellation Schema
14. Star Schema
A star schema is one in which a central fact table is surrounded by
denormalized dimension tables.
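As a minimal sketch of the star layout described above, the following uses Python's built-in sqlite3 module as a stand-in for a warehouse database. The table names (dim_product, dim_store, fact_sale), the columns, and the sample rows are all assumptions made up for illustration.

```python
import sqlite3

# In-memory database; every table, column, and row here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (prod_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, name TEXT, state TEXT, country TEXT);
CREATE TABLE fact_sale (
    number   INTEGER PRIMARY KEY,
    prod_id  INTEGER REFERENCES dim_product(prod_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    quantity INTEGER
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Acme'), (2, 'Notebook', 'Acme');
INSERT INTO dim_store VALUES (10, 'Downtown', 'Punjab', 'India'), (11, 'Mall', 'Delhi', 'India');
INSERT INTO fact_sale VALUES (100, 1, 10, 5), (101, 2, 10, 3), (102, 1, 11, 7);
""")


def quantity_by_state(connection):
    """Sum the fact measure (quantity), grouped by a dimension attribute (state)."""
    rows = connection.execute("""
        SELECT s.state, SUM(f.quantity)
        FROM fact_sale AS f
        JOIN dim_store AS s ON s.store_id = f.store_id
        GROUP BY s.state
        ORDER BY s.state
    """).fetchall()
    return dict(rows)


result = quantity_by_state(conn)
print(result)
```

Each dimension is one join away from the fact table, which is the property that gives the star schema its name.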
15. Snowflake Schema
A snowflake schema extends the star schema by normalizing its
dimensions into additional, related dimension tables.
16. Snowflake Schema
Sale fact table: Number, Prod_id, store_id, quantity
Store: id, Name, Geography_id
geography: Id, State, country
product: Prod_id, Brand_id, cost
Brand: Id, Brand
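The snowflake diagram on this slide can be sketched as follows, assuming it is read as five tables: a sale fact table, store and geography, and product and brand. The sketch uses Python's sqlite3 module as a stand-in database; the sample rows and the query are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized dimension chain: store -> geography, product -> brand.
CREATE TABLE geography (id INTEGER PRIMARY KEY, state TEXT, country TEXT);
CREATE TABLE store (id INTEGER PRIMARY KEY, name TEXT,
                    geography_id INTEGER REFERENCES geography(id));
CREATE TABLE brand (id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE product (prod_id INTEGER PRIMARY KEY,
                      brand_id INTEGER REFERENCES brand(id), cost REAL);
CREATE TABLE sale (number INTEGER PRIMARY KEY,
                   prod_id INTEGER REFERENCES product(prod_id),
                   store_id INTEGER REFERENCES store(id),
                   quantity INTEGER);
-- Sample rows are hypothetical.
INSERT INTO geography VALUES (1, 'Punjab', 'India'), (2, 'Delhi', 'India');
INSERT INTO store VALUES (10, 'Downtown', 1), (11, 'Mall', 2);
INSERT INTO brand VALUES (1, 'Acme');
INSERT INTO product VALUES (1, 1, 2.5), (2, 1, 4.0);
INSERT INTO sale VALUES (100, 1, 10, 5), (101, 2, 10, 3), (102, 1, 11, 7);
""")

# Reaching state or brand now takes two joins from the fact table instead of one.
cost_by_state_and_brand = conn.execute("""
    SELECT g.state, b.brand, SUM(s.quantity * p.cost)
    FROM sale AS s
    JOIN store AS st ON st.id = s.store_id
    JOIN geography AS g ON g.id = st.geography_id
    JOIN product AS p ON p.prod_id = s.prod_id
    JOIN brand AS b ON b.id = p.brand_id
    GROUP BY g.state, b.brand
    ORDER BY g.state
""").fetchall()
print(cost_by_state_and_brand)
```

The extra joins are the usual trade-off of snowflaking: less repeated data in the dimensions, at the price of longer join paths in queries.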
17. Galaxy Schema
A galaxy schema contains many fact tables that share some common
dimensions (conformed dimensions). It is also known as a Fact
Constellation Schema.
18. Galaxy Schema
Retailer: Retail_id, Name, city
supplier: Supplier_id, Name, country
Sale fact table: Number, Prod_id, Retail_id, quantity
Purchase fact table: Number, Prod_id, supplier_id, quantity
product: Prod_id, Type, cost
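The galaxy diagram can be sketched in the same way, assuming it shows two fact tables (sale and purchase) that share the product dimension, with retailer and supplier as their own dimensions. Python's sqlite3 module stands in for the warehouse; the sample rows and the query are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (prod_id INTEGER PRIMARY KEY, type TEXT, cost REAL);
CREATE TABLE retailer (retail_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE sale (number INTEGER PRIMARY KEY,
                   prod_id INTEGER REFERENCES product(prod_id),
                   retail_id INTEGER REFERENCES retailer(retail_id),
                   quantity INTEGER);
CREATE TABLE purchase (number INTEGER PRIMARY KEY,
                       prod_id INTEGER REFERENCES product(prod_id),
                       supplier_id INTEGER REFERENCES supplier(supplier_id),
                       quantity INTEGER);
-- Sample rows are hypothetical.
INSERT INTO product VALUES (1, 'pen', 2.5), (2, 'notebook', 4.0);
INSERT INTO retailer VALUES (1, 'City Books', 'Patiala'), (2, 'Campus Shop', 'Delhi');
INSERT INTO supplier VALUES (1, 'Paper Co', 'India');
INSERT INTO purchase VALUES (1, 1, 1, 20), (2, 2, 1, 10);
INSERT INTO sale VALUES (1, 1, 1, 5), (2, 1, 2, 7), (3, 2, 1, 3);
""")

# Each fact table is aggregated on its own and lined up through the shared
# product dimension, which avoids double counting from joining the facts directly.
bought_vs_sold = conn.execute("""
    SELECT p.type,
           (SELECT COALESCE(SUM(quantity), 0) FROM purchase WHERE prod_id = p.prod_id),
           (SELECT COALESCE(SUM(quantity), 0) FROM sale WHERE prod_id = p.prod_id)
    FROM product AS p
    ORDER BY p.prod_id
""").fetchall()
print(bought_vs_sold)
```

The shared product table is what the slide calls a conformed dimension: both fact tables can be compared because they use the same product keys.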
21. EXAMPLE OF DATA WAREHOUSE
ARCHITECTURE (PRODUCTION ENVIRONMENT)
We are considering implementing the following three-tier platform
which will allow us to scale horizontally in the future:
Our development environment consists of a server with 2 x Intel Xeon
2.8GHz Processors, 2GB of RAM and is running Windows 2000 –
Service Pack 4.
We are considering the following for the scaled roll-out of our
production environment.
A. Hardware
1. Server 1 - SAS® Data Server
- 4-way 64-bit 1.5 GHz Itanium 2 server
- 16 GB RAM
- 2 x 73 GB drives (RAID 1) for the OS
- 1 10/100/1Gb Cu Ethernet card
22. EXAMPLE OF DATA WAREHOUSE
ARCHITECTURE (PRODUCTION ENVIRONMENT)
- 1 Windows 2003 Enterprise Edition for Itanium
2. Mid-Tier (Web) Server
- 2-way 32-bit 3 GHz Xeon server
- 4 GB RAM
- 1 10/100/1Gb Cu Ethernet card
- 1 Windows 2003 Enterprise Edition for x86
3. SAN Drive Array (modular and can grow with the warehouse)
- 6 x 72 GB drives (RAID 5), 360 GB total, for SAS® and data
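The 360 GB figure for the SAN array follows from standard RAID arithmetic: RAID 5 gives up one drive's worth of space to parity, and RAID 1 mirrors a single drive. A small sketch of that calculation is shown below; the function name is an assumption, and the figures ignore formatting overhead.

```python
def usable_capacity_gb(raid_level, drive_count, drive_size_gb):
    """Return nominal usable capacity for the two RAID levels named on the slides."""
    if raid_level == 1:
        if drive_count < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        # Every drive holds the same mirrored data, so capacity is one drive.
        return drive_size_gb
    if raid_level == 5:
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        # One drive's worth of space is used for distributed parity.
        return (drive_count - 1) * drive_size_gb
    raise ValueError("only RAID 1 and RAID 5 are handled in this sketch")


san_capacity = usable_capacity_gb(5, 6, 72)  # SAN drive array
os_capacity = usable_capacity_gb(1, 2, 73)   # OS drives on Server 1
print(san_capacity, os_capacity)
```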
23. EXAMPLE OF DATA WAREHOUSE
ARCHITECTURE (PRODUCTION ENVIRONMENT)
B. Software
1. Server 1 - SAS® Data Server
- SAS® 9.1.3
- SAS® Metadata Server
- SAS® Workspace Server
- SAS® Stored Process Server
- Platform JobScheduler
2. Mid-Tier Server
- SAS® Web Report Studio
- SAS® Information Delivery Portal
- BEA WebLogic for future SAS® SPM Platform
- Xythos Web File System (WFS)
24. EXAMPLE OF DATA WAREHOUSE
ARCHITECTURE (PRODUCTION ENVIRONMENT)
3. Client-Tier Server
- SAS® Enterprise Guide
- SAS® Add-In for Microsoft Office
25. BENEFITS OF DATA WAREHOUSE
1. A Data Warehouse Delivers Enhanced Business Intelligence
Insights will be gained through improved information access.
Managers and executives will be freed from making their
decisions based on limited data and their own “gut feelings”.
Decisions that affect the strategy and operations of organizations
will be based upon credible facts and will be backed up with
evidence and actual organizational data.
2. A Data Warehouse Saves Time
Since business users can quickly access critical data from a
number of sources—all in one place—they can rapidly make
informed decisions on key initiatives. They won’t waste
precious time retrieving data from multiple sources.
26. BENEFITS OF DATA WAREHOUSE
3. A Data Warehouse Enhances Data Quality and
Consistency
A data warehouse implementation includes the conversion of
data from numerous source systems into a common
format, so you can have more confidence in the accuracy
of your data. Accurate data is the basis for strong
business decisions.
4. A Data Warehouse Provides Historical Intelligence
A data warehouse stores large amounts of historical data so
you can analyze different time periods and trends in order to
make future predictions. Such data typically cannot be
stored in a transactional database or used to generate
reports from a transactional system.
27. BENEFITS OF DATA WAREHOUSE
5. A Data Warehouse Generates a High ROI
Finally, the pièce de résistance: return on investment.
Companies that have implemented data warehouses and
complementary BI systems have generated more
revenue and saved more money than companies that
haven’t invested in BI systems and data warehouses.
28. DISADVANTAGES OF DATA
WAREHOUSE
•Long initial implementation time and associated high cost
•Adding new data sources is time-consuming and costly
•Limited flexibility of use and types of users: multiple
separate data marts are required for multiple uses and types of users
•Data is typically static and dated
•Difficult to accommodate changes in data types, ranges, and
data source schemas
29. DATA MINING
Data mining is the process of discovering hidden, previously unknown,
and usable information from a large amount of data. It is often
defined as finding hidden information in a database.
DEFINITION OF DATA MINING: -
“The efficient discovery of valuable non-obvious information from a
large collection of data.”
[BIGUS 96]
30. Elements of Data mining
•Extract, transform, and load transaction data onto the data warehouse
system.
•Store and manage the data in a multidimensional database system.
•Provide data access to business analysts and information technology
professionals.
•Analyze the data by application software.
•Present the data in a useful format, such as a graph or table.
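The five elements above can be sketched as a tiny pipeline. This is only an illustration: Python's sqlite3 module stands in for the multidimensional database system, and the raw records, table name, and function names are all assumptions.

```python
import sqlite3

# Hypothetical raw transaction records, as they might arrive from a source system.
RAW_TRANSACTIONS = [
    {"id": "1", "product": " pen ", "amount": "2.50"},
    {"id": "2", "product": "Notebook", "amount": "4.00"},
    {"id": "3", "product": "PEN", "amount": "2.50"},
]


def extract():
    """Extract: read records from the source."""
    return list(RAW_TRANSACTIONS)


def transform(records):
    """Transform: fix types and put product names into one consistent format."""
    return [(int(r["id"]), r["product"].strip().lower(), float(r["amount"])) for r in records]


def load(connection, rows):
    """Load and store: write the cleaned rows into the database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, product TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()


def analyze(connection):
    """Analyze: total sales amount per product."""
    return connection.execute(
        "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
    ).fetchall()


def present(rows):
    """Present: format the result as a simple text table."""
    return "\n".join(f"{product:<10} {total:>8.2f}" for product, total in rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
totals = analyze(conn)
print(present(totals))
```

Data access for analysts, the third element, corresponds here to anyone being able to run their own query against the sales table.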
33. DATA MINING PROCESS
DATA MINING:
This stage is concerned with the extraction of patterns
from the data. A pattern can be defined as follows: given a set of facts (data) F, a
language L, and some measure of certainty C, a pattern is a statement S
in L that describes relationships among a subset F_S of F with certainty
C.
INTERPRETATION AND EVALUATION:
The patterns identified by
the system are interpreted into knowledge, which can then be used to
support human decision making.
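The pattern definition above can be made concrete with a small sketch. Here the facts F are hypothetical shopping transactions, the statement S is a rule of the form "customers who buy X also buy Y", and the certainty C is taken to be the fraction of matching facts for which the rule holds (the confidence of an association rule). That choice of certainty measure, and all names and data, are assumptions for illustration.

```python
# F: a set of facts, here hypothetical shopping transactions.
FACTS = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]


def rule_certainty(facts, antecedent, consequent):
    """Evaluate the statement 'antecedent implies consequent' against the facts.

    Returns the subset of facts the statement is about and the certainty,
    measured as the share of that subset in which the consequent also appears.
    """
    matching = [fact for fact in facts if antecedent <= fact]
    if not matching:
        return matching, 0.0
    holds = sum(1 for fact in matching if consequent <= fact)
    return matching, holds / len(matching)


subset, certainty = rule_certainty(FACTS, {"bread"}, {"butter"})
print(len(subset), round(certainty, 2))
```

In this toy data the rule concerns three of the four facts and holds in two of them, so its certainty is two thirds.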
34. ARCHITECTURE OF DATA MINING
There are three tiers in the tight-coupling data mining architecture:
Data layer: The data layer can be a database and/or data warehouse system.
This layer is an interface for all data sources. Data mining results are
stored in the data layer so they can be presented to the end user as reports
or other kinds of visualization.
Application layer: The data mining application layer is used to retrieve
data from the database. Transformation routines can be performed here
to convert the data into the desired format.
Front-end layer: The front-end layer provides an intuitive, friendly user
interface for the end user to interact with the data mining system. Data mining
results are presented to the user in visual form in the front-end layer.
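The division of work between the three layers can be sketched roughly as follows. This is a toy illustration, not a real data mining system: the class and function names, the tables, and the "mining" step (a simple total per product) are all assumptions.

```python
import sqlite3


class DataLayer:
    """Data layer: holds the source data and stores the mining results."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE sales (product TEXT, quantity INTEGER)")
        self.conn.execute("CREATE TABLE mining_results (product TEXT, total INTEGER)")
        self.conn.executemany(
            "INSERT INTO sales VALUES (?, ?)",
            [("pen", 5), ("notebook", 3), ("pen", 7)],  # hypothetical rows
        )

    def fetch_sales(self):
        return self.conn.execute("SELECT product, quantity FROM sales").fetchall()

    def store_results(self, results):
        self.conn.executemany("INSERT INTO mining_results VALUES (?, ?)", results)

    def fetch_results(self):
        return self.conn.execute(
            "SELECT product, total FROM mining_results ORDER BY total DESC"
        ).fetchall()


def application_layer(data):
    """Application layer: retrieve data, transform it, and write results back."""
    totals = {}
    for product, quantity in data.fetch_sales():
        totals[product] = totals.get(product, 0) + quantity
    data.store_results(sorted(totals.items()))


def front_end_layer(data):
    """Front-end layer: present stored results as a simple text bar chart."""
    return "\n".join(f"{product}: {'#' * total}" for product, total in data.fetch_results())


data = DataLayer()
application_layer(data)
chart = front_end_layer(data)
print(chart)
```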
36. Adding the Data Mining Option to a
Database
Once you have installed the Oracle Database software, you can build
databases as needed. You might build a database without the Data
Mining option but later decide to add it.
37. Advantages of Data
Mining
Marketing / Retail
Data mining helps marketing companies build models based on
historical data to predict who will respond to a new marketing campaign,
such as direct mail or an online marketing campaign.
Data mining benefits retail companies in the same way
as marketing. Through market basket analysis, a store can
arrange its products so that customers can conveniently buy
frequently purchased products together.
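A very small form of market basket analysis is to count how often pairs of products appear in the same basket. The sketch below does only that; the basket data, the function name, and the minimum count threshold are assumptions, and real tools use more refined measures.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, one list of products per customer visit.
BASKETS = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]


def frequent_pairs(baskets, min_count=2):
    """Count product pairs bought together and keep those seen at least min_count times."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket, in a stable order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


pairs = frequent_pairs(BASKETS)
print(pairs)
```

A store could use a result like this to place the products in a frequent pair close to each other.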
38. Advantages of Data
Mining
Finance / Banking
Data mining gives financial institutions information about loans
and credit reporting. By building a model from the data of previous
customers with common characteristics, banks and financial institutions can
estimate which loans are good or bad and their risk level. In
addition, data mining can help banks detect fraudulent credit card
transactions and so help card owners avoid losses.
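The idea of estimating loan risk from previous customers with common characteristics can be illustrated with a deliberately naive sketch: group past loans by one characteristic and use each group's historical default rate as the risk estimate. The data, the single characteristic (employment type), and the 0.5 threshold are all assumptions; this is not a real credit scoring model.

```python
from collections import defaultdict

# Hypothetical history: (employment_type, loan_was_repaid).
LOAN_HISTORY = [
    ("salaried", True), ("salaried", True), ("salaried", True), ("salaried", False),
    ("self_employed", True), ("self_employed", False), ("self_employed", False),
]


def default_rates(history):
    """Return the share of unpaid loans for each group of similar customers."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for group, repaid in history:
        totals[group] += 1
        if not repaid:
            defaults[group] += 1
    return {group: defaults[group] / totals[group] for group in totals}


def classify(group, rates, threshold=0.5):
    """Label a new application 'bad' if its group's default rate exceeds the threshold."""
    return "bad" if rates[group] > threshold else "good"


rates = default_rates(LOAN_HISTORY)
print(rates["salaried"], classify("salaried", rates), classify("self_employed", rates))
```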
39. Advantages of Data
Mining
Manufacturing
By applying data mining to operational engineering data, manufacturers
can detect faulty equipment and determine optimal control parameters.
Governments
Data mining helps government agencies by digging through and analyzing records
of financial transactions to build patterns that can detect money
laundering or criminal activity.
40. Disadvantages of data mining
Privacy Issues
Concerns about personal privacy have increased
enormously in recent years, especially as the internet has boomed with social
networks.
Security issues
Security is a big issue. Businesses own information about their
employees and customers, including social security numbers, birthdays,
and payroll details. However, how properly this information is handled is still
in question. There have been many cases in which hackers accessed
and stole large amounts of customer data from big corporations such as Ford Motor
Credit Company, Sony, and others. With so much personal and financial
information available, credit card theft and identity theft have become a
big problem.
41. Disadvantages of data mining
Misuse of information/inaccurate information
Information collected through data mining for marketing or other
ethical purposes can be misused. It may be exploited by
unethical people or businesses to take advantage of vulnerable people or
to discriminate against a group of people.