Data Mining vs Data Warehouse

Data mining is the mining, or discovery, of new information in terms of patterns or rules
from vast amounts of data. To be useful, data mining must be carried out efficiently on large
files and databases. E.g.: using neural networks or other mathematical algorithms to mine and
analyze data. Extracting knowledge from data in this way can increase productivity and efficiency.
E.g.: social networks such as Facebook, LinkedIn and Twitter treat people as data and extract
that data as a valuable business resource.
Goals of Data Mining
• Prediction: Determine how certain attributes will behave in the future. For example,
how much sales volume a store will generate in a given period.
• Identification: Identify patterns in data. For example, newly wed couples tend to
spend more money buying furniture.
• Classification: Partition data into classes. For example, customers can be classified
into different categories with different shopping behavior.
E.g.: customers in a supermarket can be categorized into discount-seeking shoppers,
shoppers in a rush, loyal regular shoppers and infrequent shoppers.
• Optimization: Optimize the use of limited resources such as time, space, money or
materials. For example, how to best use advertising to maximize profits (sales).
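The prediction goal above can be illustrated with a minimal sketch. It assumes that a straight-line trend fitted by least squares is an acceptable model, which the notes do not state; the sales figures and the function names `fit_line` and `predict_next` are hypothetical and exist only for this example.

```python
# Hypothetical monthly sales volumes for one store (illustrative numbers only).
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]

def fit_line(ys):
    """Least-squares fit of y = intercept + slope * x for x = 0, 1, 2, ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_next(ys):
    """Extrapolate the fitted trend one period past the observed data."""
    intercept, slope = fit_line(ys)
    return intercept + slope * len(ys)

forecast = predict_next(monthly_sales)
print(f"Forecast for next month: {forecast:.1f}")
```

Real sales data is rarely this regular, so a production forecast would likely need seasonality and error estimates as well.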
Types of Knowledge Discovered during Data Mining
• Association rules: For example, when a male shopper buys a new car, he is likely to
buy a car CD.
• Classification hierarchies: For example, mutual funds may be classified into three
categories: growth, income and stable. In a banking application, customers applying for
a credit card can be classified as poor risk, fair risk and good risk.
• Sequence patterns: Sequence patterns are temporal associations. For example, if the
mortgage interest rate drops, then within a six-month period the sales of houses will
increase by a certain percentage.
• Patterns within time series: such as the behavior of stock price data over time.
• Detection of similarity, or segmentation (clustering): A population of events or
items can be partitioned into sets of similar elements. For example, health data may
indicate similarity among subgroups of people.
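An association rule such as "car => car CD" is usually judged by its support and confidence. These two measures are standard in data mining but are not named in these notes, so treat the sketch below as an illustration under that assumption; the transactions are made-up data.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"car", "car CD"},
    {"car", "car CD", "seat cover"},
    {"car"},
    {"car CD"},
    {"seat cover"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

rule_support = support({"car", "car CD"}, transactions)
rule_confidence = confidence({"car"}, {"car CD"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule is normally kept only when both values exceed thresholds chosen by the analyst.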
http://sumanastani.com.np

Applications of Data Mining
• Marketing
• Finance
• Manufacturing
• Health Care
Commercial Data Mining Tools
Intelligent Miner from IBM applies classification and association rules to detect rules and
patterns and make predictions.
Enterprise Miner from SAS applies decision trees, neural nets, clustering techniques, statistics
and association rules.
Many new tools have come onto the market in recent years, making data mining a very
active research and development area.
What is Data Warehousing?
Data warehousing is the electronic storage of a large amount of information by a business to
help in future decision making. Warehoused data must be stored in a manner that is secure,
reliable, easy to retrieve and easy to manage.
A data warehouse is a
• subject-oriented,
• integrated,
• time-varying,
• non-volatile
collection of data in support of the management's decision-making process.
A data warehouse is a centralized repository that stores data from multiple
information sources and transforms them into a common, multidimensional data
model for efficient querying and analysis.

DATA WAREHOUSE VS DATABASE
Database
1. Databases are collections of data organized in some way.
2. Used for Online Transaction Processing (OLTP), which includes insert, delete, update and
other queries, but can also be used for other purposes such as data warehousing. It records
the data entered by users, which becomes the history.
3. The tables and joins are complex since they are normalized (for an RDBMS). This is done to
reduce redundant data and to save storage space.
4. Database design: Entity-Relationship (ER) modeling techniques are used for RDBMS database
design.
5. Optimized for write operations.
6. Performance is low for analytical queries.
7. Data are volatile: they change frequently.
Data Warehouse
1. A data warehouse is a collection of data that facilitates reporting and analysis
for future decisions.
2. Used for Online Analytical Processing (OLAP). It reads the historical data so that
users can make business decisions.
3. The tables and joins are simple since they are denormalized. This is done to reduce the
response time for analytical queries.
4. Database design: data modeling techniques are used for the data warehouse design.
5. Optimized for read operations.
6. High performance for analytical queries.
7. Data are non-volatile: they change less often.
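The difference in point 3 of each list (normalized tables with joins versus denormalized tables without them) can be sketched with Python's built-in sqlite3 module. The table names, column names and rows are hypothetical, and a real warehouse would use a dedicated engine rather than an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized (OLTP-style) schema: each fact is stored once, so updates are cheap.
cur.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE product  (id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE orders   (id INTEGER PRIMARY KEY, customer_id INTEGER,
                       product_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'City A'), (2, 'City B');
INSERT INTO product  VALUES (1, 'Furniture'), (2, 'Grocery');
INSERT INTO orders   VALUES (1, 1, 1, 500.0), (2, 1, 2, 40.0),
                            (3, 2, 1, 300.0), (4, 2, 2, 60.0);
""")

# An analytical query on the normalized schema needs two joins.
normalized = cur.execute("""
    SELECT c.city, p.category, SUM(o.amount)
    FROM orders o
    JOIN customer c ON c.id = o.customer_id
    JOIN product  p ON p.id = o.product_id
    GROUP BY c.city, p.category
    ORDER BY c.city, p.category
""").fetchall()

# Denormalized (warehouse-style) table: city and category are repeated per row.
cur.execute("""
    CREATE TABLE sales_fact AS
    SELECT c.city AS city, p.category AS category, o.amount AS amount
    FROM orders o
    JOIN customer c ON c.id = o.customer_id
    JOIN product  p ON p.id = o.product_id
""")

# The same analysis now reads one table with no joins.
denormalized = cur.execute("""
    SELECT city, category, SUM(amount)
    FROM sales_fact
    GROUP BY city, category
    ORDER BY city, category
""").fetchall()

print(denormalized)
conn.close()
```

Both queries return the same totals; the denormalized table trades extra storage and harder updates for simpler, faster reads.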
Characteristics
Subject-oriented: A data warehouse can be used to analyze a particular subject area. For
example, “sales” can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example,
source A and source B may have different ways of identifying a product, but in a data
warehouse there will be only a single way of identifying a product.
Data from several sources is extracted and transformed in a consistent way. For
example, coding conventions are standardized: M = male, F = female.
Time-varying: Historical data is kept in a data warehouse. For example, one can retrieve data
from 3 months, 6 months, 12 months, or even further back. This
contrasts with a transaction system, where often only the most recent data is kept. For
example, a transaction system may hold only the most recent address of a customer, whereas a
data warehouse can hold all addresses associated with a customer.
Data are organized by various time periods (e.g. months).
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a
data warehouse should never be altered.
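The integrated, time-varying and non-volatile characteristics can be sketched together in a few lines. Everything here is an assumption made for illustration: the source codes in `GENDER_CODES` (other than the M/F standard from the notes), the `AddressHistory` class and the sample addresses are all hypothetical.

```python
from datetime import date

# Hypothetical source-specific gender codes mapped to the warehouse standard.
GENDER_CODES = {"m": "M", "male": "M", "1": "M",
                "f": "F", "female": "F", "0": "F"}

def standardize_gender(raw):
    """Integrated: translate a source-specific code into the warehouse code."""
    return GENDER_CODES[str(raw).strip().lower()]

class AddressHistory:
    """Time-varying and non-volatile: addresses are appended, never overwritten."""

    def __init__(self):
        self._rows = []  # (customer_id, address, valid_from)

    def record(self, customer_id, address, valid_from):
        self._rows.append((customer_id, address, valid_from))

    def all_addresses(self, customer_id):
        rows = [r for r in self._rows if r[0] == customer_id]
        return [address for _, address, _ in sorted(rows, key=lambda r: r[2])]

    def address_on(self, customer_id, day):
        """Return the address in effect on the given day, or None."""
        current = None
        for cid, address, valid_from in sorted(self._rows, key=lambda r: r[2]):
            if cid == customer_id and valid_from <= day:
                current = address
        return current

history = AddressHistory()
history.record(42, "12 Old Road", date(2020, 1, 1))
history.record(42, "7 New Street", date(2023, 6, 1))

print(standardize_gender("Male"), standardize_gender(0))
print(history.all_addresses(42))
print(history.address_on(42, date(2021, 3, 15)))
```

A transaction system would typically keep only the latest address, so the last query could not be answered there.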
Other characteristics:
1. Client-server architecture
2. Transparency
3. Flexible reporting
4. Multi-user support
Functions of a Data Warehouse (RDSSSD)
1. Roll up: Data are summarized with increasing generalization, e.g. weekly => monthly => annually.
2. Drill down: The complement (opposite) of roll up: increasing levels of detail are revealed.
3. Pivot: Cross tabulation (rotation) can be performed.
4. Slice and dice: Projection operations are performed on the dimensions.
5. Sorting: Data are sorted in some order (ascending/descending).
6. Selection: Data are available by value or range.
7. Derived (computed) attributes: Attributes are computed by operations on stored and derived
values.
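The seven functions above can be sketched on a tiny in-memory fact table. The rows and dimension names are hypothetical, and slice and dice are shown in their common OLAP sense of fixing one or more dimension values, which is an assumption about what the notes intend.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month, region, product, sales).
facts = [
    (2023, 1, "East", "Chair", 10),
    (2023, 1, "West", "Table", 20),
    (2023, 2, "East", "Table", 30),
    (2023, 2, "West", "Chair", 40),
    (2024, 1, "East", "Chair", 50),
]
YEAR, MONTH, REGION, PRODUCT, SALES = range(5)

def aggregate(rows, dims):
    """Sum sales grouped by the given dimension indexes."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[SALES]
    return dict(totals)

# Drill down: the finer, monthly view.
monthly = aggregate(facts, [YEAR, MONTH])
# Roll up: generalize month -> year.
yearly = aggregate(facts, [YEAR])

# Slice: fix one dimension to a single value.
east_only = [r for r in facts if r[REGION] == "East"]
# Dice: restrict several dimensions at once.
diced = [r for r in facts if r[YEAR] == 2023 and r[PRODUCT] == "Chair"]

# Pivot: cross-tabulate region (rows) against product (columns).
pivot = defaultdict(dict)
for (region, product), total in aggregate(facts, [REGION, PRODUCT]).items():
    pivot[region][product] = total

# Sorting, and selection by range.
by_sales_desc = sorted(facts, key=lambda r: r[SALES], reverse=True)
selected = [r for r in facts if 20 <= r[SALES] <= 40]

# Derived attribute: each row's share of total sales.
grand_total = sum(r[SALES] for r in facts)
shares = [round(r[SALES] / grand_total, 2) for r in facts]

print(yearly)
print(dict(pivot))
```

Roll up and drill down use the same `aggregate` helper and differ only in how many dimensions are kept.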
