Finding number of groups
using a penalized internal
cluster quality index
Marica Manisera and Marika Vezzoli
University of Brescia, Italy
Modena, September 19, 2013
Introduction
Cluster analysis is an important tool to find groups in data
without the help of a response variable
(unsupervised learning)
The identification of the optimal number of groups is a
major challenge
Many authors handled this issue by
exploring several criteria
Aim
To propose a new method that automatically identifies
the optimal number of groups in a hierarchical cluster
algorithm
Starting from the idea of pruning, we propose to use a
penalized internal cluster quality index in order to
identify the best cut in the dendrogram, one that yields an
easily interpretable partition
Method
Starting from the n x p data matrix X with n subjects and p
quantitative variables, cluster analysis aims at
partitioning subjects into k clusters
Many criteria identify the optimal number k of groups on
the basis of the trade-off between a high inter-cluster
dissimilarity and a low intra-cluster dissimilarity, where
dissimilarity is usually defined starting from a chosen
(distance) function
Method
We focus on the Calinski and Harabasz (CH) index,
suitable for quantitative data, which measures the
internal cluster quality for a given k as
CH(k) = [BGSS / (k - 1)] / [WGSS / (n - k)]
WGSS (Within-Group Sum of Squares) summarizes the intra-cluster
dissimilarity and is given by trace(W), where W is a k x k matrix
whose generic element is the distance of the subjects belonging to
group h from the centroid c_t of group t
BGSS (Between-Group Sum of Squares) summarizes the inter-cluster
dissimilarity and is given by (trace(nΣ) - WGSS), where Σ is
the variance-covariance matrix of X
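As an illustration of the definitions above, the following is a minimal Python sketch (not the authors' code) that computes the index in its standard form, CH(k) = [BGSS / (k - 1)] / [WGSS / (n - k)]. The function name `ch_index` and the list-of-lists data layout are assumptions made for this example. It uses squared Euclidean distances and obtains BGSS as the total sum of squares minus WGSS, which corresponds to trace(nΣ) - WGSS when Σ is computed with divisor n.

```python
from math import fsum


def ch_index(X, labels):
    """Calinski-Harabasz index for the rows of X grouped by `labels`.

    X is a list of n rows, each a list of p numbers; labels has one entry per row.
    """
    n, p = len(X), len(X[0])
    groups = {}
    for row, g in zip(X, labels):
        groups.setdefault(g, []).append(row)
    k = len(groups)
    if not 2 <= k <= n - 1:
        raise ValueError("CH is defined only for 2 <= k <= n - 1")

    def centroid(rows):
        return [fsum(r[j] for r in rows) / len(rows) for j in range(p)]

    def sum_sq(rows, c):
        # Sum of squared Euclidean distances of the rows from the point c
        return fsum((r[j] - c[j]) ** 2 for r in rows for j in range(p))

    tss = sum_sq(X, centroid(X))  # total sum of squares
    wgss = fsum(sum_sq(rows, centroid(rows)) for rows in groups.values())
    bgss = tss - wgss
    # Note: raises ZeroDivisionError if every group is a single repeated point
    return (bgss / (k - 1)) / (wgss / (n - k))
```

For four points forming two tight pairs, `ch_index([[0, 0], [0, 1], [10, 0], [10, 1]], [0, 0, 1, 1])` has WGSS = 1 and BGSS = 100, so the index equals (100 / 1) / (1 / 2) = 200.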
Method
The best k is given by
k* = argmax_k CH(k)
Whenever CH increases as k increases, the optimal
partition is expected for k=n-1
However, this result is useless and does not
comply with the aim of a cluster analysis
Method
In order to identify an interpretable partition, k should be
reasonably small and this is commonly achieved by
subjective choices. In hierarchical clustering this
corresponds to a subjective cutting of the dendrogram
Method
In order to avoid such arbitrariness, we propose to
identify k* as:
k* = argmax_k Q(k|λ)
where Q(k|λ) = CH(k) - λk is obtained by introducing the penalty
λ ∈ ℜ+ on the number k of groups, in order to keep k*
reasonably small and find it automatically
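The penalized selection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `penalized_best_k`, the tie-breaking rule, and the CH values in `ch_values` are all invented for the example and do not come from the slides.

```python
def penalized_best_k(ch_by_k, lam=0.0):
    """Return the k that maximises Q(k | lam) = CH(k) - lam * k.

    ch_by_k maps each candidate k to its CH value.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    q = {k: ch - lam * k for k, ch in ch_by_k.items()}
    # Largest Q wins; ties go to the smaller k (an assumption, not from the slides)
    return min(q, key=lambda k: (-q[k], k))


# Hypothetical CH values that keep increasing with k (illustrative only)
ch_values = {2: 10.0, 3: 18.0, 4: 21.0, 5: 23.0, 6: 24.0}
for lam in (0.0, 2.5, 10.0):
    print(lam, penalized_best_k(ch_values, lam))
```

With these made-up values, the unpenalized criterion picks the largest k available (6), while λ = 2.5 moves the choice to k = 4 and λ = 10 to k = 2.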
Method
If 0 is included in the domain of λ, then for λ = 0 we have
Q(k|λ) = CH(k) and no penalization is imposed.
The larger the value of λ, the stronger the penalty (and
vice versa).
The effect of a fixed λ on k depends on the magnitude of
the chosen cluster quality index.
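One way to see why the magnitude of the index matters is to compare two adjacent values of k under Q(k|λ) = CH(k) - λk. The short derivation below is a worked illustration, not part of the original slides.

```latex
\begin{align*}
Q(k \mid \lambda) \ge Q(k+1 \mid \lambda)
  &\iff CH(k) - \lambda k \ge CH(k+1) - \lambda (k+1) \\
  &\iff \lambda \ge CH(k+1) - CH(k).
\end{align*}
```

So λ is compared with the increments of the index: k is preferred to k+1 only when the gain in CH from adding a group does not exceed λ. If the index is multiplied by a constant c > 0, λ has to be multiplied by c as well to select the same k*.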
Example
Data
We applied the proposed procedure to an artificially
generated data set described in Walesiak & Dudek (2012),
consisting of 5 interval-type variables on 75 subjects
clustered into 5 groups
Analysis
We performed a hierarchical cluster analysis
(hclust function in R with complete linkage)
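The analysis in the slides relies on R's hclust. For readers who want to see what complete linkage does, here is a naive Python stand-in written for this note, not the authors' code and not a port of hclust. The function name `complete_linkage_cut` is hypothetical, Euclidean distance is assumed, and the sketch merges clusters until k remain, which corresponds to cutting the complete-linkage dendrogram into k groups when there are no ties.

```python
from math import dist


def complete_linkage_cut(X, k):
    """Agglomerate the rows of X with complete linkage until k clusters remain.

    Returns one cluster label (0 .. k-1) per row. Cubic-time or worse: fine
    for small illustrative data, not for large n.
    """
    clusters = [[i] for i in range(len(X))]  # start from singletons
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between the two farthest members
                d = max(dist(X[i], X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * len(X)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels
```

Calling it for each k in a range of candidates gives the partitions on which an internal quality index can then be evaluated.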
Example
Results 1/2
Example
Results 2/2
Conclusions
Results show that the proposed procedure is able to reach
the objective of automatically identifying the best
number of clusters in a data set by taking into account the
interpretability of the resulting groups
Current research is being devoted to refining the
optimization algorithm, especially with reference to the
choice of λ
Conclusions
Simulation studies and the analysis of real data sets,
involving several internal cluster quality indices suitable
for different data types, could confirm the validity of our
proposal
A project funded
by the European Commission
Thank you for your attention!
manisera@eco.unibs.it
marika.vezzoli@med.unibs.it
info@syrtoproject.eu

Finding number of groups using a penalized internal cluster quality index - Marica Manisera, Marika Vezzoli. September 19, 2013

  • 1. Finding number of groups using a penalized internal cluster quality index Marica Manisera and Marika Vezzoli University of Brescia, Italy Modena, September 19, 2013
  • 2. Introduction. Cluster analysis is an important tool for finding groups in data without the help of a response variable (unsupervised learning). Identifying the optimal number of groups is a major challenge. Many authors have addressed this issue by exploring several criteria.
  • 3. Aim. To propose a new method that automatically identifies the optimal number of groups in a hierarchical clustering algorithm. Starting from the idea of pruning, we propose using a penalized internal cluster quality index to identify the best cut in the dendrogram, one that provides an easily interpretable partition.
  • 4. Method. Starting from the n x p data matrix X with n subjects and p quantitative variables, cluster analysis aims at partitioning the subjects into k clusters. Many criteria identify the optimal number k of groups on the basis of the trade-off between a high inter-cluster dissimilarity and a low intra-cluster dissimilarity, where dissimilarity is usually defined starting from a chosen (distance) function.
  • 5. Method. We focus on the Calinski and Harabasz (CH) index, suitable for quantitative data, which measures the internal cluster quality for a given k as CH(k) = [BGSS / (k - 1)] / [WGSS / (n - k)]. WGSS (Within-Group Sum of Squares) summarizes the intra-cluster dissimilarity and is given by trace(W), where W is a k x k matrix whose generic element is the distance of the subjects belonging to group h from the centroid c_t of group t. BGSS (Between-Group Sum of Squares) summarizes the inter-cluster dissimilarity and is given by (trace(nΣ) - WGSS), where Σ is the variance-covariance matrix of X.
  • 6. Method. The best k is given by k* = argmax_k CH(k). Whenever CH increases as k increases, the optimal partition is expected for k = n-1. However, this result is useless and does not comply with the aim of a cluster analysis.
  • 7. Method. In order to identify an interpretable partition, k should be reasonably small, and this is commonly achieved by subjective choices. In hierarchical clustering this corresponds to a subjective cutting of the dendrogram.
  • 8. Method. In order to avoid such arbitrariness, we propose to identify k* as k* = argmax_k Q(k|λ), where Q(k|λ) = CH(k) - λk is obtained by introducing the penalty λ ∈ ℝ+ on the number k of groups, in order to keep k* reasonably small and find it automatically.
  • 9. Method. If 0 is included in the domain of λ, then for λ = 0 we have Q(k|λ) = CH(k) and no penalization is imposed. The larger the value of λ, the stronger the penalty (and vice versa). The effect of a fixed λ on k depends on the magnitude of the chosen cluster quality index.
  • 10. Example. Data: We applied the proposed procedure to an artificially generated data set described in Walesiak & Dudek (2012), consisting of 5 interval-type variables on 75 subjects clustered into 5 groups. Analysis: We performed a hierarchical cluster analysis (hclust function in R with complete linkage).
  • 13. Conclusions. Results show that the proposed procedure is able to reach the objective of automatically identifying the best number of clusters in a data set by taking account of the interpretability of the resulting groups. Current research is being devoted to refining the optimization algorithm, especially with reference to the choice of λ.
  • 14. Conclusions. Simulation studies and the analysis of real data sets, involving several internal cluster quality indices suitable for different data types, could confirm the validity of our proposal.
  • 15. A project funded by the European Commission. Thank you for your attention! manisera@eco.unibs.it marika.vezzoli@med.unibs.it info@syrtoproject.eu
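
The penalized criterion described in slides 5 to 9 can be illustrated with a small sketch. This is not the authors' implementation (the slides mention only hclust in R); it is a standard-library Python illustration under stated assumptions. The function names (ch_index, best_k), the six one-dimensional toy points, the hand-written nested partitions (standing in for successive cuts of a dendrogram) and the value λ = 150 are all hypothetical choices made for illustration. The toy data are chosen so that CH increases with k, which is the situation the penalty is meant to handle.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ch_index(data: List[Point], labels: Sequence[int]) -> float:
    """CH(k) = [BGSS / (k - 1)] / [WGSS / (n - k)] for one partition."""
    n = len(data)
    groups: Dict[int, List[Point]] = {}
    for point, label in zip(data, labels):
        groups.setdefault(label, []).append(point)
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("CH is defined only for 1 < k < n")
    # WGSS: squared distances of subjects from their own group centroid.
    wgss = 0.0
    for members in groups.values():
        c = centroid(members)
        wgss += sum(sq_dist(p, c) for p in members)
    # Total sum of squares, i.e. trace(n * Sigma) with Sigma using divisor n.
    overall = centroid(data)
    total = sum(sq_dist(p, overall) for p in data)
    bgss = total - wgss
    if wgss == 0.0:  # degenerate case: every group is a single repeated point
        return float("inf")
    return (bgss / (k - 1)) / (wgss / (n - k))


def best_k(ch_by_k: Dict[int, float], lam: float) -> int:
    """Return the k maximizing Q(k | lam) = CH(k) - lam * k."""
    return max(ch_by_k, key=lambda k: ch_by_k[k] - lam * k)


# Hypothetical toy data: six subjects, one variable, shrinking gaps.
data = [(0.0,), (16.0,), (24.0,), (28.0,), (30.0,), (31.0,)]

# Hand-written nested partitions for k = 2..5 (assumed dendrogram cuts).
partitions = {
    2: [0, 1, 1, 1, 1, 1],
    3: [0, 1, 2, 2, 2, 2],
    4: [0, 1, 2, 3, 3, 3],
    5: [0, 1, 2, 3, 4, 4],
}

ch_by_k = {k: ch_index(data, labels) for k, labels in partitions.items()}

for lam in (0.0, 150.0):
    print(f"lambda={lam}: k* = {best_k(ch_by_k, lam)}")
```

On this toy input CH grows with k, so λ = 0 selects k = 5 = n-1, while λ = 150 selects k = 2. Because CH here ranges from about 15 to about 350, a much smaller λ would leave the choice unchanged, which reflects the remark in slide 9 that the effect of a fixed λ depends on the magnitude of the index.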