Data mining refers to knowledge mining from large amounts of data; it is also known as "Knowledge Discovery from Data" (KDD).
Basic terms and notations are described in this presentation.
1. Data Mining
Ashis Kumar Chanda
Department of Computer Science and Engineering
University of Dhaka
2. Key concepts
What is Data mining
Why learn Data mining
Data type
Warehouse & OLAP
Data Cleaning, Integration
Associations, Item sets, Support, Confidence
3. Data Mining
Data mining refers to knowledge mining from large amounts of data.
It is also known as "Knowledge Discovery from Data" or KDD.
The target is to find hidden patterns.
4. Why learn data mining
We can't get every type of information through queries.
Queries do not support statistical analysis.
Moreover, we can apply artificial intelligence and find new patterns or structures.
Queries provide values, but data mining provides ideas that help in making (business) decisions.
Ex: Women who live in "Dhanmondi" and are older than 40 years most frequently buy "Jamdani Shari" at "Arong".
5. Data type
Tabular (transaction data): most commonly used
Spatial data (remote sensing data / encoded data)
Tree data (XML)
Graphs (WWW, bio-molecular)
Sequences (DNA, activity logs)
Text, multimedia data
6. Warehouse & OLAP
A warehouse is an archive of information gathered from multiple sources.
Suppose a banking database where each area has a data source that stores all transactions of that area, and every data source provides a clean/safe copy to the warehouse.
(Figure: data sources feeding the warehouse)
7. Warehouse & OLAP
There are several issues regarding a warehouse:
When and how to gather data
What schema/pattern to use
Data transformation & cleaning
How to update
"A warehouse is a collection of data marts," where a data mart is a store of data in a specialized pattern.
8. Warehouse & OLAP
OLAP: Online Analytical Processing
OLAP tools support interactive analysis of summary information.
OLAP permits an analyst to view different summaries of multidimensional data.
(Fig: Data cube, with dimensions such as item name)
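As a rough illustration of viewing different summaries of multidimensional data, here is a minimal pandas sketch (Python) that rolls hypothetical sales records up along one dimension and then along two; the column names and values are made up for illustration and are not from the slides:

    import pandas as pd

    # Hypothetical transaction-level sales data (item x location x quantity)
    sales = pd.DataFrame({
        "item":     ["Dress", "Dress", "Shirt", "Shirt"],
        "location": ["Dhanmondi", "Gulshan", "Dhanmondi", "Gulshan"],
        "quantity": [10, 5, 7, 3],
    })

    # One summary: total quantity per item
    by_item = sales.groupby("item")["quantity"].sum()

    # Another summary of the same data: item x location cross-tabulation,
    # i.e. a small two-dimensional slice of a data cube
    by_item_location = sales.pivot_table(index="item", columns="location",
                                         values="quantity", aggfunc="sum")

    print(by_item)
    print(by_item_location)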
9. Data cleaning
There may be missing data, duplicate data, and dirty data, so we need data cleaning.
Some methods (a small sketch follows the list):
Ignore the tuple (not effective unless the tuple contains many missing attributes)
Fill in missing values manually (time consuming)
Fill with a global value (like "unknown")
Use the attribute mean
Use the most probable value
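A minimal pandas sketch (Python) of three of the filling strategies listed above; the table, the column names, and the fill choices are hypothetical, purely for illustration:

    import numpy as np
    import pandas as pd

    # Hypothetical table with missing values in the "age" attribute
    df = pd.DataFrame({"customer": ["a", "b", "c", "d"],
                       "age": [25.0, np.nan, 40.0, np.nan]})

    drop_rows   = df.dropna(subset=["age"])             # ignore the tuple
    fill_global = df.fillna({"age": "unknown"})          # fill with a global value
    fill_mean   = df.fillna({"age": df["age"].mean()})   # fill with the attribute mean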
11. Associations & Item sets
Associations:
An association is a rule of the form "if X then Y".
It is denoted as X -> Y.
Example: if there is an exam, then I read.
Item sets:
For any rules, if X -> Y and Y -> X, then X and Y are called an item set.
Example:
People buying school books in January also buy notebooks.
People buying school notebooks in January also buy books.
12. Support & confidence
Support:
The proportion of transactions in the data set that contain the item set.
Confidence:
The conditional probability that an item appears in a transaction when another item appears.
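Written as formulas, in the same notation as the worked example on the next slide, for a rule A -> B over a transaction set D:

    support(A -> B)    = support_count(A ∪ B) / |D|
    confidence(A -> B) = support_count(A ∪ B) / support_count(A)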
13. Support & confidence
Support for {I1, I2}
= support_count(I1 ∪ I2) / |D|
= 4/9
Confidence for I1 → I2
= support_count(I1 ∪ I2) / support_count(I1)
= 4/6
14. Association rules
Here, support_count(A ∪ B) is the number of transactions containing the itemset A ∪ B, and support_count(A) is the number of transactions containing the itemset A.
Association rules can be generated as follows (a sketch of these two steps appears after the list):
1. For each frequent itemset l, generate all nonempty subsets of l.
2. For every nonempty subset s of l, output the rule "s → (l - s)" if support_count(l) / support_count(s) >= min_conf, where min_conf is the minimum confidence threshold.
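A minimal Python sketch of the two steps above; the support counts below are toy values (chosen to be consistent with the small example on slide 13, where |D| = 9, support_count(I1 ∪ I2) = 4 and support_count(I1) = 6), not a real dataset:

    from itertools import combinations

    # Toy support counts for a hypothetical frequent itemset l = {I1, I2, I5}
    support_count = {
        frozenset(["I1"]): 6, frozenset(["I2"]): 7, frozenset(["I5"]): 2,
        frozenset(["I1", "I2"]): 4, frozenset(["I1", "I5"]): 2,
        frozenset(["I2", "I5"]): 2, frozenset(["I1", "I2", "I5"]): 2,
    }

    def rules_from_itemset(l, min_conf):
        # Step 1: enumerate all nonempty proper subsets s of l.
        # Step 2: keep s -> (l - s) when support_count(l) / support_count(s) >= min_conf.
        l = frozenset(l)
        rules = []
        for k in range(1, len(l)):
            for s in map(frozenset, combinations(l, k)):
                conf = support_count[l] / support_count[s]
                if conf >= min_conf:
                    rules.append((set(s), set(l - s), conf))
        return rules

    for lhs, rhs, conf in rules_from_itemset({"I1", "I2", "I5"}, min_conf=0.7):
        print(lhs, "->", rhs, "confidence =", round(conf, 2))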
15. Summary
Basic topics: Data mining, data cleaning, warehouse, OLAP
Terms: Association, item set, support, confidence
16. References
- Data Mining: Concepts and Techniques, by J. Han & M. Kamber
- Database System Concepts, by Abraham Silberschatz, Korth, Sudarshan
- Lectures of Dr. S. Srinath, Indian Institute of Technology Madras, India