Data mining

Priyabrata Satapathy
M.Tech 1st Year
SIS NO.-MCS12121

 What is data mining?

 Why is data mining needed?

 Data, Information, Knowledge.

 Data mining & KDD.

 Data Warehouses.

 Data Cleaning.

 Applications of Data mining.

Data mining (knowledge discovery in databases):

 Extraction of interesting information or patterns from data
in large databases.

 Knowledge discovery in databases (KDD) is the process of
identifying valid, useful, and ultimately understandable patterns
in data from large databases.

 Data mining is needed to provide tools for
discovering knowledge from data.

 Data mining turns a large collection of data
into knowledge.

•Data
Data are any facts, numbers, or text that can be
processed by a computer.
 Operational or transactional data, such as sales, cost,
inventory, payroll, and accounting data.

 Metadata: data about the data itself, such as the logical
database design or data dictionary definitions.

 Nonoperational data, such as industry sales, forecast
data, and macroeconomic data.

The patterns, associations, or relationships among
all this data can provide information.

 For example, analysis of retail point of sale
transaction data can yield information on which
products are selling and when.

•Information can be converted into knowledge
about historical patterns and future trends.

 For example, summary information on retail
supermarket sales can be analyzed in light of
promotional efforts to provide knowledge of
consumer buying behavior.

Data cleaning
Used to remove noise and inconsistent data.
Data integration
Where multiple data sources may be combined.
Data selection
Where data relevant to the analysis task are retrieved from
the database.
Data transformation
Where data are transformed or consolidated into forms
appropriate for mining, for example by performing summary
or aggregation operations.
Data mining
An essential process where intelligent methods are applied
in order to extract data patterns.
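
The steps above can be sketched as a small pipeline. This is only a minimal illustration under stated assumptions: the two store data sets, the field names (`product`, `amount`, `cashier`), and all function names are hypothetical, and the "mining" step is a trivial stand-in (finding the best-selling product) rather than a real mining algorithm.

```python
from collections import defaultdict

# Hypothetical point-of-sale records from two separate sources.
store_a = [
    {"product": "milk", "amount": 3.0, "cashier": "A1"},
    {"product": "bread", "amount": None, "cashier": "A2"},  # missing amount
    {"product": "milk", "amount": 2.5, "cashier": "A1"},
]
store_b = [
    {"product": "bread", "amount": 4.0, "cashier": "B1"},
    {"product": "eggs", "amount": 1.5, "cashier": "B2"},
]


def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [dict(r) for source in sources for r in source]


def select(records, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """Data transformation: summarize sales as a total per product."""
    totals = defaultdict(float)
    for r in records:
        totals[r["product"]] += r["amount"]
    return dict(totals)


def mine(totals):
    """Data mining (stand-in): report the product with the highest total."""
    return max(totals, key=totals.get)


cleaned = [clean(store_a), clean(store_b)]
integrated = integrate(*cleaned)
selected = select(integrated, ("product", "amount"))
totals = transform(selected)
top_product = mine(totals)
print(totals, top_product)
```

Each function maps to one step, so a step can be replaced (for instance, a different cleaning rule) without touching the others.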

A data warehouse is a repository of information
collected from multiple sources, stored under a
unified schema, and usually residing at a single site.

Data warehouses are constructed through a process of
data cleaning, data integration, data transformation,
data loading, and data refreshing.

Data that is to be analyzed by data mining
techniques can be incomplete, noisy, and
inconsistent.

Data cleaning routines attempt to fill in missing
values, smooth out noise while identifying
outliers, and correct inconsistencies in the data.

Missing values in the data can be handled by:
 Ignoring the tuple.

 Filling in the missing value manually.

 Using a global constant to fill in the missing values.

 Using the attribute mean or median to fill in the
missing value.
 Using the most probable value to fill in the missing value.
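
Several of these strategies can be illustrated with a short sketch. It assumes missing values are represented as `None` in a list of numbers; the function name `fill_missing`, its `strategy` parameter, and the sample data are hypothetical. Manual filling and the most-probable-value approach (which typically needs a predictive model) are not shown.

```python
from statistics import mean, median


def fill_missing(values, strategy="mean", constant=None):
    """Handle missing entries (None) in a list of numbers.

    strategy: "ignore" drops the missing entries, "constant" fills with
    the given constant, "mean" and "median" fill with that statistic
    computed from the known values.
    """
    known = [v for v in values if v is not None]
    if strategy == "ignore":
        return known
    if strategy == "constant":
        fill = constant
    elif strategy == "mean":
        fill = mean(known)
    elif strategy == "median":
        fill = median(known)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


incomes = [30, None, 50, 40, None, 100]  # hypothetical attribute values

ignored = fill_missing(incomes, "ignore")
with_constant = fill_missing(incomes, "constant", constant=0)
with_mean = fill_missing(incomes, "mean")
with_median = fill_missing(incomes, "median")
print(ignored, with_constant, with_mean, with_median)
```

Note that the mean (55) is pulled upward by the large value 100, while the median (45) is not, which is one reason the median is often preferred for skewed data.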

Noisy data is data that contains random errors or variance in measured values.
To handle noisy data:
 Binning:

Binning methods smooth a sorted data value by
consulting its neighborhood, that is, the values around it.
 Regression:

Data can be smoothed by regression, which fits the
data values to a function.
 Outlier analysis:

Outliers may be detected by clustering. Similar
values are organized into clusters, and values that fall
outside the clusters are considered outliers.
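
Two of these techniques can be sketched briefly. The first function smooths by bin means using equal-frequency bins. The second is a deliberately simple stand-in for clustering on one-dimensional data: sorted values are grouped wherever neighbors are close together, and values in very small groups are flagged as outliers. The function names, parameters (`bin_size`, `max_gap`, `min_cluster_size`), and sample data are assumptions made for illustration.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


def find_outliers_by_gap(values, max_gap, min_cluster_size):
    """Group sorted values into clusters: a new cluster starts whenever
    the gap to the previous value exceeds max_gap. Values in clusters
    smaller than min_cluster_size are returned as outliers."""
    ordered = sorted(values)
    if not ordered:
        return []
    clusters = [[ordered[0]]]
    for value in ordered[1:]:
        if value - clusters[-1][-1] <= max_gap:
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return [v for c in clusters if len(c) < min_cluster_size for v in c]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)
print(smoothed)  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]

readings = [10, 11, 12, 13, 50, 51, 52, 200]
outliers = find_outliers_by_gap(readings, max_gap=5, min_cluster_size=2)
print(outliers)  # [200]
```

In practice a proper clustering algorithm would replace the gap rule, but the idea is the same: values that do not belong to any sizeable cluster are treated as outliers.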

Data mining for Financial data analysis.

Data mining for Retail and Telecommunication
Industries.

Data mining for Science and Engineering.

Data mining and Recommender systems.
