Data Warehousing
Lecture-31
Supervised vs. Unsupervised Learning
Virtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan101@yahoo.com
Data Structures in Data Mining
• Data matrix
– Table or database
– n records and m attributes
– n >> m

C1,1  C1,2  C1,3  …  C1,m
C2,1  C2,2  C2,3  …  C2,m
C3,1  C3,2  C3,3  …  C3,m
 .     .     .        .
Cn,1  Cn,2  Cn,3  …  Cn,m

• Similarity matrix
– Symmetric square matrix
– n x n or m x m

1     S1,2  S1,3  …  S1,n
S2,1  1     S2,3  …  S2,n
S3,1  S3,2  1     …  S3,n
 .     .     .        .
Sn,1  Sn,2  Sn,3  …  1
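
As an illustration of how the two structures relate, the sketch below builds an n x n similarity matrix from a small data matrix. The slide does not name a similarity measure, so 1 / (1 + Euclidean distance) is assumed here purely for illustration; the function name `similarity_matrix` and the sample records are likewise hypothetical.

```python
import math

def similarity_matrix(data):
    """Return the n x n record-to-record similarity matrix of a data matrix.

    Similarity is taken as 1 / (1 + Euclidean distance). This measure is an
    assumption for illustration; the slides do not fix a particular one.
    """
    n = len(data)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sim[i][j] = 1.0 / (1.0 + math.dist(data[i], data[j]))
    return sim

# Hypothetical data matrix: n = 4 records (rows), m = 2 attributes (columns).
data = [
    [1.0, 2.0],
    [1.0, 2.5],
    [8.0, 9.0],
    [8.5, 9.0],
]

S = similarity_matrix(data)
for row in S:
    print(["%.3f" % value for value in row])
```

With this measure the diagonal is 1 (each record is identical to itself) and S[i][j] equals S[j][i], matching the symmetric square form above. Comparing columns instead of rows would give the m x m variant.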
Main Types of Data Mining
Supervised: type and number of classes are known in advance
• Bayesian Modeling
• Decision Trees
• Neural Networks
• Etc.
Unsupervised: type and number of classes are NOT known in advance
• One-way Clustering
• Two-way Clustering
Clustering: Min-Max Distance
[Figure: scatter plot with axes Age and Salary (tick marks at 20, 40, 60); one point is labeled as an outlier]
• Inter-cluster distances are maximized
• Intra-cluster distances are minimized
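
A minimal sketch of the two quantities named above, using made-up (age, salary) points. The data, the helper names, and the choice of "mean pairwise distance" for intra-cluster spread and "distance between centroids" for inter-cluster separation are all assumptions for illustration; other definitions are possible.

```python
import math
from itertools import combinations

def centroid(points):
    """Component-wise mean of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def mean_intra_distance(points):
    """Average pairwise Euclidean distance between points of one cluster."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical (age, salary in thousands) points, already split into two clusters.
young = [(22, 20), (25, 24), (24, 21)]
older = [(55, 60), (58, 66), (60, 62)]

intra_young = mean_intra_distance(young)
intra_older = mean_intra_distance(older)
inter = math.dist(centroid(young), centroid(older))

print("intra-cluster (young):", round(intra_young, 2))
print("intra-cluster (older):", round(intra_older, 2))
print("inter-cluster:", round(inter, 2))
```

A good clustering in the sense of this slide keeps the intra-cluster values small relative to the inter-cluster value.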
How Does Clustering Work?
One-way Clustering Example
[Figure: two panels, INPUT and OUTPUT]
• Black spots are noise
• White spots are missing data
Data Mining Agriculture Data
[Figure: two panels, INPUT and clustered OUTPUT, with the clusters marked]
How Does Classification Work?
[Figure: unseen data is fed as inputs to the classifier (model), which answers "Which class?" with a classification output and a confidence level]
Classification Process (1): Model Construction
Training data (observations, measurements, etc.): relationship between shopping time and items bought

NAME    Time  Items  Gender
Moin    10    2      M
Munir   16    3      M
Meher   15    1      F
Javed   5     1      M
Mahin   20    1      F
Akram   20    4      M

Training Data → Classification Algorithms → Classifier (Model)

Classifier (model):
IF time/items >= 6
THEN gender = 'F'
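
To make the model-construction step concrete, the sketch below derives a threshold rule on time/items from the training table. The slide does not say which classification algorithm produced its rule, so this exhaustive threshold search (a one-level "decision stump") is an assumed stand-in, and `learn_threshold` is a hypothetical name.

```python
# Training table from the slide: (name, time, items, gender).
training = [
    ("Moin", 10, 2, "M"),
    ("Munir", 16, 3, "M"),
    ("Meher", 15, 1, "F"),
    ("Javed", 5, 1, "M"),
    ("Mahin", 20, 1, "F"),
    ("Akram", 20, 4, "M"),
]

def learn_threshold(records):
    """Search for t such that 'time/items >= t means F' fits the records best.

    Only the observed ratios are tried as candidate thresholds; this is an
    illustrative simplification, not the method used in the lecture.
    """
    candidates = sorted({time / items for _, time, items, _ in records})
    best_t, best_correct = None, -1
    for t in candidates:
        correct = sum(
            ("F" if time / items >= t else "M") == gender
            for _, time, items, gender in records
        )
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t, best_correct

threshold, correct = learn_threshold(training)
print(f"IF time/items >= {threshold} THEN gender = 'F'  ({correct}/{len(training)} correct)")
```

On this table the search settles on a threshold of 15, which classifies all six training records correctly. Any threshold above 16/3 and up to 15 separates the training data equally well, and the slide's value of 6 falls in that range.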
Classification Process (2): Use the Model in Prediction
Testing data

NAME    Time  Items  Gender
Tahir   20    1      M
Younas  11    2      M
Yasin   3     1      M

Unseen data: (Firdous, Time = 15, Items = 1)
Unseen Data → Classifier → Gender?
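
The prediction step can be sketched by applying the rule from the model-construction slide to the testing table and to the unseen record. The slide states only the 'F' branch of the rule; predicting 'M' otherwise is an assumption made here, and `classify` is a hypothetical name.

```python
def classify(time, items):
    """Rule from the slide: IF time/items >= 6 THEN gender = 'F'.

    Returning 'M' in all other cases is an assumption; the slide does not
    state an ELSE branch.
    """
    return "F" if time / items >= 6 else "M"

# Testing table from the slide: (name, time, items, actual gender).
testing = [
    ("Tahir", 20, 1, "M"),
    ("Younas", 11, 2, "M"),
    ("Yasin", 3, 1, "M"),
]

results = [(name, classify(time, items), actual) for name, time, items, actual in testing]
accuracy = sum(predicted == actual for _, predicted, actual in results) / len(results)

for name, predicted, actual in results:
    print(f"{name}: predicted {predicted}, actual {actual}")
print(f"accuracy on testing data: {accuracy:.2f}")

# Unseen record from the slide: (Firdous, Time = 15, Items = 1).
firdous_prediction = classify(15, 1)
print("Firdous:", firdous_prediction)
```

Under this reading the rule gets Younas and Yasin right and Tahir wrong (time/items = 20), and it predicts 'F' for Firdous. The testing data is what reveals that a rule fitting the training table perfectly can still misclassify new records.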
Clustering vs. Cluster Detection
Clustering vs. Cluster Detection: Example
[Figure: two panels, A and B]
The K-Means Clustering
The K-Means Clustering: Example
[Figure: four scatter-plot panels labeled A, B, C, and D, each with both axes running from 0 to 10]
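
Since the figure itself did not survive, here is a minimal sketch of the standard K-Means loop (assign each point to its nearest centroid, recompute centroids, repeat until nothing changes). The points are hypothetical values on a 0 to 10 grid, not the ones plotted in the slide, and seeding the centroids with the first k points is an assumption made to keep the example deterministic.

```python
import math

def kmeans(points, k, max_iter=100):
    """Basic K-Means. Initial centroids are the first k points (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids did not move
        centroids = new_centroids
    return centroids, clusters

# Hypothetical points on a 0..10 grid.
points = [(1, 1), (9, 9), (2, 1), (1, 2), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)

for center, members in zip(centroids, clusters):
    print(center, members)
```

The result depends on the initial centroids and on the chosen k; different seeds can lead to different final clusters.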
The K-Means Clustering: Comment
