Data mining refers to mining knowledge from large amounts of data. It is also known as “Knowledge Discovery from Data” or KDD.
Basic terms and notation are described in this presentation.
Data Mining (Introduction)
1. Data Mining
Ashis Kumar Chanda
Department of Computer Science and Engineering
University of Dhaka
2. Key concepts
What is Data mining
Why learn Data mining
Data type
Warehouse & OLAP
Data Cleaning, Integration
Associations, Item sets, Support, Confidence
3. Data Mining
Data mining refers to mining knowledge from large amounts of data
Also known as “Knowledge Discovery from Data” or KDD
The goal is to find hidden patterns
4. Why learn data mining
A query cannot retrieve every type of information
Queries do not support statistical analysis
With data mining we can also apply artificial intelligence to find new patterns or structures
A query returns values, but data mining provides insights that help to make (business) decisions
Ex: Women who live in “Dhanmondi” and are older than 40 years most frequently buy “Jamdani Shari” at “Arong”
5. Data types
Tabular (transaction data): most commonly used
Spatial data (remote sensing data / encoded data)
Tree data (XML)
Graphs (WWW, bio-molecular)
Sequences (DNA, activity logs)
Text, multimedia data
6. Warehouse & OLAP
Fig: Warehouse and data sources
A warehouse is an archive of information gathered from multiple sources
Suppose a banking database where each area has a data source that stores all transactions of that area. Each data source provides a clean, safe copy to the warehouse
7. Warehouse & OLAP
There are several issues to decide when building a warehouse:
When and how to gather data
What schema/pattern to use
Data transformation & cleaning
How to update
“A warehouse is a collection of data marts”, where a data mart is a store of data in a specialized pattern
8. Warehouse & OLAP
OLAP: Online Analytical Processing
OLAP tools support interactive analysis of summary information
OLAP permits an analyst to view different summaries of multidimensional data
Fig: Data Cube (labels: Item name, Dress)
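The idea of viewing different summaries of the same multidimensional data can be sketched in a few lines of Python. This is only an illustration, not an OLAP tool: the sales facts, the dimension names (item, city, quarter) and the helper `summarize` are all hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical sales facts: (item name, city, quarter, units sold).
sales = [
    ("Dress", "Dhaka", "Q1", 10),
    ("Dress", "Dhaka", "Q2", 15),
    ("Dress", "Khulna", "Q1", 5),
    ("Shoes", "Dhaka", "Q1", 7),
    ("Shoes", "Khulna", "Q2", 3),
]

# Position of each dimension inside a fact tuple (assumed layout).
DIMENSIONS = {"item": 0, "city": 1, "quarter": 2}


def summarize(facts, dims):
    """Sum units sold per combination of the chosen dimensions.

    Dimensions that are not listed are aggregated away (a roll-up).
    """
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[DIMENSIONS[d]] for d in dims)
        totals[key] += fact[3]
    return dict(totals)


# Two different summaries of the same data.
by_item = summarize(sales, ["item"])
by_item_city = summarize(sales, ["item", "city"])

print(by_item)
print(by_item_city)
```

Each call to `summarize` corresponds to looking at the cube along a different set of axes; a real OLAP system would precompute or index such aggregates rather than rescan the facts.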
9. Data cleaning
The data may contain missing, duplicate, or dirty values, so we need data cleaning
Some methods:
Ignore the tuple (not effective unless the tuple has many missing attributes)
Fill in missing values manually (time consuming)
Fill with a global constant (like: unknown)
Use the attribute mean
Use the most probable value
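Three of the methods above can be sketched as follows. The rows, the attribute names and the helper functions are hypothetical, chosen only for illustration; `None` is assumed to mark a missing value. Filling with the most probable value would need a predictive model and is not shown.

```python
from statistics import mean

# Hypothetical tuples; None marks a missing value.
rows = [
    {"age": 25, "city": "Dhaka"},
    {"age": None, "city": "Dhaka"},
    {"age": 35, "city": None},
    {"age": None, "city": None},
]


def drop_incomplete(rows, max_missing=1):
    """Ignore tuples that have more than max_missing missing attributes."""
    return [r for r in rows if sum(v is None for v in r.values()) <= max_missing]


def fill_constant(rows, attr, value="unknown"):
    """Replace missing values of attr with a global constant."""
    return [{**r, attr: value if r[attr] is None else r[attr]} for r in rows]


def fill_mean(rows, attr):
    """Replace missing values of attr with the mean of the known values."""
    avg = mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: avg if r[attr] is None else r[attr]} for r in rows]


kept = drop_incomplete(rows)  # the last tuple has two missing attributes
cleaned = fill_mean(fill_constant(kept, "city"), "age")

for row in cleaned:
    print(row)
```

The order of the steps is a design choice in this sketch: dropping heavily incomplete tuples first keeps them from influencing the mean.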
11. Associations & Item sets
Associations:
An association is a rule of the form “if X then Y”
It is denoted as X → Y
Example: if there is an exam, then I read
Item sets:
If both X → Y and Y → X hold, then X and Y are called an item set
Example:
People buying school books in January also buy notebooks
People buying school notebooks in January also buy books
12. Support & confidence
Support:
The proportion of transactions in the data set that contain the itemset
Confidence:
The conditional probability that an item appears in a transaction given that another item appears in it
13. Support & confidence
Support for {I1, I2}
= support_count(I1 ∪ I2) / |D|
= 4/9
Confidence for I1 → I2
= support_count(I1 ∪ I2) / support_count(I1)
= 4/6
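The two calculations can be checked with a short Python sketch. The transaction table behind the numbers is not reproduced in this transcript, so the nine transactions below are an assumed data set, chosen so that I1 and I2 occur together in 4 transactions and I1 occurs in 6. The function names are illustrative.

```python
# Assumed data set D with |D| = 9 transactions (not taken from the slides).
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Proportion of transactions that contain itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support_count(A ∪ B) / support_count(A) for the rule A → B."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)


print(support({"I1", "I2"}, transactions))       # 4/9
print(confidence({"I1"}, {"I2"}, transactions))  # 4/6
```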
14. Association rules
confidence(A → B) = support_count(A ∪ B) / support_count(A),
where support_count(A ∪ B) is the number of transactions containing the itemset A ∪ B, and support_count(A) is the number of transactions containing the itemset A.
Association rules can be generated as follows:
1. For each frequent itemset l, generate all nonempty subsets of l.
2. For every nonempty subset s of l, output the rule “s → (l − s)” if support_count(l) / support_count(s) >= min_conf, where min_conf is the minimum confidence threshold.
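The two steps above can be sketched in Python for a single frequent itemset. The nine transactions, the chosen itemset {I1, I2, I5} and the threshold 0.7 are assumptions for illustration only. The sketch skips the subset s = l, because the rule would then have an empty right-hand side.

```python
from itertools import combinations

# Assumed data set of nine transactions (not taken from the slides).
transactions = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def generate_rules(frequent_itemset, transactions, min_conf):
    """Return (s, l - s, confidence) for every rule s → (l - s) meeting min_conf."""
    itemset = frozenset(frequent_itemset)
    count_l = support_count(itemset, transactions)
    rules = []
    # Sizes 1 .. len(l) - 1: nonempty subsets that leave a nonempty l - s.
    for size in range(1, len(itemset)):
        for subset in combinations(sorted(itemset), size):
            s = frozenset(subset)
            conf = count_l / support_count(s, transactions)
            if conf >= min_conf:
                rules.append((s, itemset - s, conf))
    return rules


rules = generate_rules({"I1", "I2", "I5"}, transactions, min_conf=0.7)
for lhs, rhs, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

Because every subset of a frequent itemset occurs at least as often as the itemset itself, `support_count(s, transactions)` is never zero here.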
15. Summary
Basic topics: Data mining, Data cleaning, Warehouse, OLAP
Terms: Association, Item set, Support, Confidence
16. References
- Data Mining: Concepts and Techniques, by J. Han & M. Kamber
- Database System Concepts, by Abraham Silberschatz, Korth, Sudarshan
- Lecture of Dr. S. Srinath, Institute of Technology at Madras, India