MSc Dissertation 2013-14 MSc Course: KDD
Using diversity for dynamic optimisation
of data fusion ensembles
Student: Chris Ballard Supervisor: Dr Wang
School of Computing Sciences, August 2014
Aim and Objectives
This study evaluated ensemble methods
for the classification of heterogeneous data
sources. In this context, its aims were to
• Compare the accuracy of feature-level fusion and existing ensemble approaches
• Investigate how diversity can be used to optimise ensemble performance
• Explore the link between diversity and accuracy
Conclusions
• Feature-level fusion outperformed decision-level fusion
• Learn++ performed poorly with multiple datasets
• DES significantly improved the performance of Bagging when using decision-level fusion
• DES reduced performance when using feature-level fusion
• Ensemble accuracy showed a weak positive correlation with DES cluster diversity measured by CFD for some datasets; no relationship was found with MFD
Data
The study used 9 benchmark datasets from the UCI
Machine Learning Repository, partitioned into multiple
feature sets to simulate different data sources.
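The poster does not say how the features were divided, so the following is only a minimal sketch of one way to simulate several data sources from a single dataset. It assumes a random, disjoint, near-equal split; the names `partition_features` and `project` are hypothetical and the toy data is illustrative.

```python
import random

def partition_features(n_features, n_sources, seed=0):
    """Split feature indices 0..n_features-1 into disjoint, near-equal subsets."""
    if not 1 <= n_sources <= n_features:
        raise ValueError("n_sources must be between 1 and n_features")
    indices = list(range(n_features))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so subset sizes differ by at most one.
    return [sorted(indices[i::n_sources]) for i in range(n_sources)]

def project(rows, feature_subset):
    """Return the view of the data that one simulated source would see."""
    return [[row[j] for j in feature_subset] for row in rows]

# Toy data: 4 instances with 10 features each (illustrative values only).
rows = [[float(10 * r + c) for c in range(10)] for r in range(4)]
feature_sets = partition_features(n_features=10, n_sources=3)
views = [project(rows, fs) for fs in feature_sets]
```

Each entry of `views` then plays the role of one data source with its own feature set.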
System Design
• Base classifiers: CART decision trees for Bagging; decision stumps for Learn++
• Ensemble generation, DES and automated test rig implemented using Python 2.7.3 and scikit-learn
• Simulated annealing used to optimise sub-ensemble diversity
• Diversity measured using the CFD, MFD and EEC measures
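As a rough illustration of diversity-driven selection, the sketch below assumes that CFD stands for coincident failure diversity (the poster does not expand the acronym) and uses its usual definition: with p_n the proportion of instances on which exactly n of N classifiers fail, CFD = (1 / (1 - p_0)) * sum over n of ((N - n) / (N - 1)) * p_n, and 0 when no classifier ever fails. The annealing schedule, the swap move, the fixed sub-ensemble size and all function names are assumptions made for illustration, not the study's implementation. MFD and EEC are left out because the poster does not define them. The code is written for Python 3.

```python
import math
import random

def cfd(oracle):
    """Coincident failure diversity.

    oracle[i][j] is True when classifier i is correct on validation instance j.
    """
    n_clf, n_inst = len(oracle), len(oracle[0])
    if n_clf < 2:
        return 0.0
    # counts[n] = number of instances on which exactly n classifiers fail.
    counts = [0] * (n_clf + 1)
    for j in range(n_inst):
        counts[sum(1 for i in range(n_clf) if not oracle[i][j])] += 1
    if counts[0] == n_inst:
        return 0.0  # no failures at all: diversity is defined as 0
    weighted = sum((n_clf - n) / (n_clf - 1) * counts[n]
                   for n in range(1, n_clf + 1))
    return weighted / (n_inst - counts[0])

def anneal_subensemble(oracle, size, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a fixed-size sub-ensemble with high CFD (assumed schedule)."""
    rng = random.Random(seed)
    pool = list(range(len(oracle)))
    score = lambda members: cfd([oracle[i] for i in members])
    current = rng.sample(pool, size)
    cur_score = score(current)
    best, best_score = list(current), cur_score
    temp = t0
    for _ in range(steps):
        outside = [i for i in pool if i not in current]
        if not outside:
            break
        # Neighbour move: swap one member for a classifier outside the set.
        candidate = list(current)
        candidate[rng.randrange(size)] = rng.choice(outside)
        delta = score(candidate) - cur_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_score = candidate, cur_score + delta
            if cur_score > best_score:
                best, best_score = list(current), cur_score
        temp *= cooling  # geometric cooling
    return sorted(best), score(best)

# Toy oracle: 8 classifiers, 30 instances, each correct with probability 0.7.
rng = random.Random(1)
toy_oracle = [[rng.random() < 0.7 for _ in range(30)] for _ in range(8)]
members, diversity = anneal_subensemble(toy_oracle, size=4)
```

CFD reaches 1 when no two classifiers ever fail on the same instance and falls to 0 when all classifiers fail together, which is why it is a natural objective to maximise here.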
Experimental Results
• Performance of Learn++.DES and Bagging.DES compared to benchmark methods using Friedman and Nemenyi tests (Figure 2)
• Ensemble accuracy and CFD diversity plotted by dataset (Figure 3); Spearman correlation coefficients between accuracy and CFD, with significant values (p < 0.05) highlighted (Figure 4)
Figure 2 - Critical difference diagram - Learn++.DES and
Bagging.DES using decision-level fusion (CD=1.56, p=0.05)
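The rank-based comparison behind a critical difference diagram can be sketched as follows. The Friedman statistic and the Nemenyi critical difference, CD = q_alpha * sqrt(k(k + 1) / (6N)), are the standard formulas; the q_alpha values are the commonly tabulated ones for alpha = 0.05. The number of methods in Figure 2 is not stated on the poster: k = 4 is an assumption, chosen because with N = 9 datasets it gives a CD of about 1.56, consistent with the caption. The accuracy matrix is made up for illustration.

```python
import math

# Nemenyi critical values q_alpha at alpha = 0.05, indexed by the number of
# compared methods k (commonly tabulated values).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850}

def average_ranks(scores):
    """scores[d][m] = accuracy of method m on dataset d.

    Rank 1 is the most accurate method; tied methods share the mean rank.
    """
    k = len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda m: -row[m])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            mean_rank = (pos + end) / 2.0 + 1.0
            for m in order[pos:end + 1]:
                totals[m] += mean_rank
            pos = end + 1
    return [t / len(scores) for t in totals]

def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic computed from the average ranks."""
    k = len(avg_ranks)
    return (12.0 * n_datasets / (k * (k + 1))
            * (sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0))

def nemenyi_cd(k, n_datasets):
    """Two methods differ significantly if their average ranks differ by >= CD."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n_datasets))

# Illustrative accuracies: 3 datasets x 3 methods, not taken from the study.
toy_scores = [[0.90, 0.80, 0.70], [0.85, 0.80, 0.75], [0.95, 0.90, 0.60]]
toy_ranks = average_ranks(toy_scores)
toy_chi2 = friedman_statistic(toy_ranks, n_datasets=len(toy_scores))

# Assumed setting: k = 4 methods over the poster's 9 datasets.
cd = nemenyi_cd(k=4, n_datasets=9)
```

The Friedman test is applied first; the Nemenyi post-hoc comparison is only meaningful once it rejects the hypothesis that all methods rank equally.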
Methods
• Two algorithms, Bagging.DES and Learn++.DES, generate ensembles for each feature set:
  • Bagging and Learn++ are used to generate a pool of classifiers
  • Dynamic Ensemble Selection (DES) generates sub-ensembles with optimal diversity for local regions of feature space (Figure 1)
• DES compared to AdaBoost, Bagging and Learn++ using feature-level and decision-level fusion
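The difference between the two fusion strategies can be sketched in a few lines. This is a toy illustration only: a nearest-centroid classifier stands in for the study's base learners so that the example needs nothing beyond the standard library, majority voting is an assumed combination rule, and all names and data are hypothetical.

```python
from collections import Counter

def fit_centroids(rows, labels):
    """Stand-in base learner: one mean vector per class."""
    sums, counts = {}, Counter(labels)
    for row, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def predict(centroids, row):
    """Predict the class whose centroid is closest to the row."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(sorted(centroids), key=lambda y: sqdist(centroids[y]))

def feature_level_fusion(sources_train, labels, sources_test):
    """Concatenate every source into one feature vector, then train one model."""
    train = [sum(parts, []) for parts in zip(*sources_train)]
    test = [sum(parts, []) for parts in zip(*sources_test)]
    model = fit_centroids(train, labels)
    return [predict(model, row) for row in test]

def decision_level_fusion(sources_train, labels, sources_test):
    """Train one model per source, then combine their decisions by majority vote."""
    models = [fit_centroids(src, labels) for src in sources_train]
    fused = []
    for parts in zip(*sources_test):
        votes = Counter(predict(m, row) for m, row in zip(models, parts))
        fused.append(max(sorted(votes), key=lambda y: votes[y]))
    return fused

# Two simulated sources describing the same four training instances.
labels = [0, 0, 1, 1]
source_a = [[0.0], [0.2], [1.0], [1.2]]
source_b = [[5.0, 1.0], [5.2, 0.9], [9.0, 0.1], [9.1, 0.0]]
test_a = [[0.1], [1.1]]
test_b = [[5.1, 1.0], [9.0, 0.0]]

feature_pred = feature_level_fusion([source_a, source_b], labels, [test_a, test_b])
decision_pred = decision_level_fusion([source_a, source_b], labels, [test_a, test_b])
```

Feature-level fusion lets a single model see interactions across sources, while decision-level fusion keeps each source's model independent and merges only the outputs.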
Figure 1 – Dynamic Ensemble Selection in local clusters
[Diagram labels: N classifiers (generated using Bagging or Learn++); validation set; k-means clustering; cluster ck; cluster sub-ensemble Eck; test instance; dynamic ensemble selection; decision combination; ensemble decision]
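The flow suggested by Figure 1 can be sketched roughly as follows: cluster the validation set with k-means, attach a sub-ensemble to each cluster, route a test instance to its nearest cluster and combine that sub-ensemble's decisions. Two simplifications are assumed here and should not be read as the study's method: sub-ensembles are chosen by local accuracy as a stand-in for the diversity-optimised selection described in the poster, and decisions are combined by majority vote. Classifiers are modelled as plain callables, and every name is hypothetical.

```python
import random
from collections import Counter

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centroids, row):
    """Index of the centroid closest to the row."""
    return min(range(len(centroids)), key=lambda c: sqdist(centroids[c], row))

def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the list of centroids."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for row in rows:
            groups[nearest(centroids, row)].append(row)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_des(classifiers, val_rows, val_labels, k, size, seed=0):
    """Choose one sub-ensemble (classifier indices) per validation cluster."""
    centroids = kmeans(val_rows, k, seed=seed)
    assign = [nearest(centroids, row) for row in val_rows]
    sub_ensembles = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        def local_acc(m):
            return sum(classifiers[m](val_rows[i]) == val_labels[i] for i in idx)
        # Stand-in selection rule: keep the locally most accurate classifiers.
        ranked = sorted(range(len(classifiers)), key=lambda m: -local_acc(m))
        sub_ensembles.append(ranked[:size])
    return centroids, sub_ensembles

def des_predict(classifiers, centroids, sub_ensembles, row):
    """Route the instance to its cluster and majority-vote that sub-ensemble."""
    members = sub_ensembles[nearest(centroids, row)]
    votes = Counter(classifiers[m](row) for m in members)
    return max(sorted(votes), key=lambda y: votes[y])

# Toy pool of three classifiers and a validation set with two clear regions.
pool = [lambda r: 0, lambda r: 1, lambda r: int(r[0] > 5)]
val_rows = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0],
            [10.0], [10.2], [10.4], [10.6], [10.8], [11.0]]
val_labels = [0] * 6 + [1] * 6
centroids, subs = build_des(pool, val_rows, val_labels, k=2, size=2)
left = des_predict(pool, centroids, subs, [0.3])
right = des_predict(pool, centroids, subs, [10.5])
```

The point of selecting per cluster is that a classifier that is poor overall can still be useful in the region of feature space where it happens to be reliable.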
Figure 3 – Plots of ensemble accuracy and DES cluster CFD

Dataset          Correlation coefficient
Arrhythmia       0.694
Bio Deg          0.401
Heart Disease    0.253
Hill Valley      0.168
Ionosphere       0.199
LSVT             0.46
Spectf           0.006
WDBC             0.574
Figure 4 – Spearman rank-order correlation coefficient
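For reference, the Spearman rank-order coefficient is the Pearson correlation of the two rank vectors. The sketch below computes it with mean ranks for ties; it does not compute the p-value used for the significance threshold, and the input values are illustrative rather than taken from the study.

```python
def mean_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for i in order[pos:end + 1]:
            ranks[i] = (pos + end) / 2.0 + 1.0
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = mean_ranks(x), mean_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (sx * sy)

# Illustrative values only: ensemble accuracy and cluster diversity over 5 runs.
accuracy = [0.71, 0.74, 0.78, 0.80, 0.83]
diversity = [0.40, 0.52, 0.49, 0.61, 0.66]
rho = spearman(accuracy, diversity)
```

Because it works on ranks, the coefficient measures how monotonic the relationship between accuracy and diversity is, not how linear.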
