Modelling Population Heterogeneity: A simulation study
Michael A Ghebre
Department of Infection, Immunity and Inflammation; University of Leicester
Institute for Lung Health lab Meeting
Friday 11th October, 2013
Introduction and Objectives
Cluster and Factor analyses
Simulation study
Clustering Asthma and COPD
Discussion and Conclusion
On-going and Future work
Table of contents
1 Introduction and Objectives
2 Cluster and Factor analyses
3 Simulation study
Clustering on all measured variables
Clustering on Factor Scores
Clustering on Highest-loading variables
Clustering on Random sub-variables
4 Clustering Asthma and COPD
5 Discussion and Conclusion
6 On-going and Future work
Michael A Ghebre Modelling Population Heterogeneity: A simulation study
Introduction
We consider the problem of choosing variables for clustering
high-dimensional data.
Discovering and understanding relations in high-dimensional data is
problematic, especially when the underlying structure is unknown.
We compared several variable selection techniques for clustering
high-dimensional data with underlying structure (similar variable
profiles).
The K-means algorithm was used as the clustering technique
throughout.
Proposed variables
All measured variables (standard approach)
Factor Scores
Highest-loading variables
Random sub-variables
Combination of Highest-loading variables and Random sub-variables
Definition
K-means clustering algorithm
Factor analysis
K-means cluster Algorithm
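The distance is calculated using the squared Euclidean distance:
\[
D_{ij} = \sum_{k=1}^{p} (Y_{ki} - Y_{kj})^2
\]
where p is the number of variables and Y_{ki} is the value of variable k for observation i.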
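As a minimal sketch of how this distance drives the assignment step of K-means, the Python below computes the squared Euclidean distance between two observations and assigns an observation to its nearest cluster centre. The function names and the toy centres are illustrative assumptions, not taken from the slides.

```python
def squared_euclidean(y_i, y_j):
    """Squared Euclidean distance: sum over the p variables of (Y_ki - Y_kj)^2."""
    if len(y_i) != len(y_j):
        raise ValueError("both observations must have the same number of variables")
    return sum((a - b) ** 2 for a, b in zip(y_i, y_j))


def nearest_centre(observation, centres):
    """Index of the centre with the smallest squared Euclidean distance."""
    distances = [squared_euclidean(observation, c) for c in centres]
    return distances.index(min(distances))


# Hypothetical toy data: two cluster centres in p = 2 dimensions.
centres = [[0.0, 0.0], [10.0, 10.0]]
print(squared_euclidean([0.0, 0.0], [3.0, 4.0]))  # 3^2 + 4^2 = 25.0
print(nearest_centre([8.0, 9.0], centres))         # index 1: the second centre is closer
```

K-means alternates this assignment step with recomputing each centre as the mean of the observations assigned to it.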
Factor Analysis
Factor analysis was performed to reduce the high-dimensional variables to
a small number of independent factors and then to investigate the
underlying structure (relationships) among the variables.
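Mathematical Formula of Factor Analysis
\[
\begin{aligned}
Y_1 &= \beta_{11}F_1 + \beta_{12}F_2 + \cdots + \beta_{1k}F_k + \epsilon_1 \\
Y_2 &= \beta_{21}F_1 + \beta_{22}F_2 + \cdots + \beta_{2k}F_k + \epsilon_2 \\
&\;\;\vdots \\
Y_p &= \beta_{p1}F_1 + \beta_{p2}F_2 + \cdots + \beta_{pk}F_k + \epsilon_p
\end{aligned}
\]
Where:
* $Y_1, \dots, Y_p$ are observed variables; $F_1, \dots, F_k$ are latent variables; and $p > k$
* The $F_j$ are independent, with $E(F_j) = 0$ and $\mathrm{Var}(F_j) = 1$
* The $\epsilon_i$ (error terms) are independent, with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma_i^2$
* $\mathrm{Cor}(F_j, \epsilon_i) = 0$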
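Under the assumptions listed above (independent, unit-variance factors; independent errors that are uncorrelated with the factors), the model implies the following variance and covariance structure for the observed variables. The index names i, l and j, and the symbol ε for the error term, are notational assumptions made here for readability.

```latex
\begin{aligned}
\operatorname{Var}(Y_i) &= \sum_{j=1}^{k} \beta_{ij}^2 + \sigma_i^2, \\
\operatorname{Cov}(Y_i, Y_l) &= \sum_{j=1}^{k} \beta_{ij}\,\beta_{lj}, \qquad i \neq l .
\end{aligned}
```

The sum of squared loadings in the first line is the part of the variance of a variable that the common factors account for; the remainder is the error variance.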
Path diagram for 2 factors and 5 observed variables
Figure: Factors and observed variables
Factor Scores
After factor analysis, factor scores were estimated as follows:
\[
\text{FactorScores} = Z\,(S^{-1}\beta)
\]
Where: $Z$ is the matrix of standardised variables; $S^{-1}$ is the inverse of the correlation matrix of
the variables; and $\beta$ is the factor loading matrix.
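The formula above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the code used for the study: the function name, the toy loading matrix and the simulated data are all hypothetical, and in practice the loadings would come from a fitted factor model.

```python
import numpy as np


def regression_factor_scores(X, loadings):
    """Factor scores Z (S^-1 B): Z standardised data, S correlation matrix, B loadings."""
    X = np.asarray(X, dtype=float)
    B = np.asarray(loadings, dtype=float)
    # Standardise each variable to mean 0 and sample standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the observed variables.
    S = np.corrcoef(X, rowvar=False)
    # Solve S W = B instead of forming the inverse explicitly.
    W = np.linalg.solve(S, B)
    return Z @ W


# Hypothetical example: 200 observations, 5 variables, 2 factors.
rng = np.random.default_rng(0)
B_toy = np.array([[0.9, 0.0], [0.8, 0.1], [0.7, 0.2], [0.1, 0.9], [0.0, 0.8]])
F = rng.standard_normal((200, 2))
X = F @ B_toy.T + 0.3 * rng.standard_normal((200, 5))
scores = regression_factor_scores(X, B_toy)
print(scores.shape)  # one row per observation, one column per factor
```

Solving the linear system gives the same result as multiplying by the inverse correlation matrix, and is the numerically safer choice when variables are highly correlated.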
Performance comparison
Data were simulated to compare the performance of the techniques.
The technique that performed well was then applied to a real asthma and
COPD dataset.
Simulated dataset
1000 observations from four known clusters of high-dimensional data
(20 variables) with underlying structure were simulated.
Factor analysis was then performed on the high-dimensional data to
reduce them to a small number of factors.
A scree plot was used to decide how many factors to retain.
The variables with the highest loadings on each factor were
identified, and factor scores were extracted.
The cluster assignments produced by each clustering technique
were compared with the true cluster membership.
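The slides do not state the generating model, so the following is only a sketch of one way such a dataset could be simulated and prepared for a scree plot. The block structure of the loadings, the loading value 0.9, the noise scale 0.4, the cluster shift 3.0, the equal cluster sizes and the seed are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2013)  # arbitrary seed, for reproducibility only

n_clusters, n_per_cluster, n_factors, n_vars = 4, 250, 4, 20

# Assumed structure: variables come in blocks of five, each block driven by one factor.
loadings = np.zeros((n_vars, n_factors))
for j in range(n_factors):
    loadings[5 * j:5 * (j + 1), j] = 0.9

# Assumed cluster separation: cluster c has its mean shifted on factor c only.
factor_means = 3.0 * np.eye(n_clusters, n_factors)

true_labels = np.repeat(np.arange(n_clusters), n_per_cluster)
factors = factor_means[true_labels] + rng.standard_normal((true_labels.size, n_factors))
noise = 0.4 * rng.standard_normal((true_labels.size, n_vars))
X = factors @ loadings.T + noise  # 1000 observations x 20 variables

# Scree-plot input: eigenvalues of the correlation matrix, largest first.
eigenvalues = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
explained = 100 * eigenvalues / eigenvalues.sum()
print(X.shape)
print(np.round(explained[:6], 1))
```

Plotting `eigenvalues` against their rank gives the scree plot; the number of factors to retain is read off where the curve levels out.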
Description of the simulated dataset
K-means clustering on All Variables
All Variables
Clustering on all variables (the standard technique) assigned only 70% of
observations to the correct cluster.
Scree plot to retain number of factors
Four Factor scores across the clusters
K-means on Factor scores
K-means Clustering on Factor Scores
Factors  Explained variance (%)  Mis* (%)  Mis** (%)
2        70.1                    47.0      27.1
3        86.2                    25.0      11.3
4        93.8                     0.0       0.0
5        96.9                     0.0       0.0
6        98.3                     0.0       0.0
7        99.3                     0.0       0.0
Mis = misclassification: *unrotated; **varimax rotation
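K-means numbers its clusters arbitrarily, so a misclassification percentage can only be computed after matching cluster labels to the true groups. The slides do not say how this matching was done; the sketch below assumes the best one-to-one relabelling is used, and the function name and toy labels are hypothetical.

```python
from itertools import permutations


def misclassification_rate(true_labels, cluster_labels):
    """Smallest share of mismatches over all one-to-one relabellings of the clusters."""
    true_ids = sorted(set(true_labels))
    cluster_ids = sorted(set(cluster_labels))
    if len(cluster_ids) > len(true_ids):
        raise ValueError("more clusters than true groups; extend the matching first")
    best_correct = 0
    # Try every one-to-one mapping from cluster ids to true group ids.
    for perm in permutations(true_ids, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))
        correct = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best_correct = max(best_correct, correct)
    return 1 - best_correct / len(true_labels)


# Hypothetical toy labels: clusters 0 and 1 are swapped relative to the truth.
truth = [0, 0, 1, 1, 2, 2]
found = [1, 1, 0, 0, 2, 0]
print(f"{100 * misclassification_rate(truth, found):.1f}%")  # one of six is wrong
```

Trying all permutations is fine for four clusters (24 mappings); for many clusters an assignment algorithm would be the more practical choice.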
Factor loading
Variable  Factor 1  Factor 2  Factor 3  Factor 4  Explained variance
X1         0.90      0.13      0.22      0.08     0.88
X2         0.80      0.27      0.28      0.25     0.85
X3         0.97      0.05      0.08      0.04     0.95
X4         0.95      0.02     -0.01      0.02     0.91
X5         0.93      0.14      0.14      0.13     0.91
X6         0.22      0.18      0.92      0.12     0.95
X7         0.11      0.05      0.93     -0.04     0.88
X8         0.34      0.27      0.82      0.20     0.90
X9        -0.01     -0.06      0.68      0.63     0.87
X10       -0.04     -0.16      0.84      0.42     0.92
X11        0.27      0.31      0.31      0.81     0.91
X12        0.07      0.02      0.10      0.96     0.95
X13        0.42      0.43      0.44      0.44     0.73
X14        0.50      0.44      0.43      0.42     0.79
X15        0.41      0.40      0.52      0.41     0.76
X16        0.32      0.81      0.24      0.27     0.88
X17        0.16      0.92      0.11      0.12     0.89
X18       -0.03      0.98      0.06     -0.07     0.96
X19        0.10      0.93      0.11      0.15     0.91
X20        0.08      0.96      0.02     -0.01     0.92
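To illustrate how highest-loading variables can be read off a loading table like the one above, the sketch below picks, for each factor, the variable with the largest absolute loading. Selecting exactly one variable per factor is an assumption made here; the slides do not state how many were kept. The helper names are hypothetical, and the values are copied from the table.

```python
# Loadings copied from the table above: [Factor 1, Factor 2, Factor 3, Factor 4].
# Entries printed as "0. 44" etc. in the extracted table are read as 0.44 etc.
loadings = {
    "X1": [0.90, 0.13, 0.22, 0.08], "X2": [0.80, 0.27, 0.28, 0.25],
    "X3": [0.97, 0.05, 0.08, 0.04], "X4": [0.95, 0.02, -0.01, 0.02],
    "X5": [0.93, 0.14, 0.14, 0.13], "X6": [0.22, 0.18, 0.92, 0.12],
    "X7": [0.11, 0.05, 0.93, -0.04], "X8": [0.34, 0.27, 0.82, 0.20],
    "X9": [-0.01, -0.06, 0.68, 0.63], "X10": [-0.04, -0.16, 0.84, 0.42],
    "X11": [0.27, 0.31, 0.31, 0.81], "X12": [0.07, 0.02, 0.10, 0.96],
    "X13": [0.42, 0.43, 0.44, 0.44], "X14": [0.50, 0.44, 0.43, 0.42],
    "X15": [0.41, 0.40, 0.52, 0.41], "X16": [0.32, 0.81, 0.24, 0.27],
    "X17": [0.16, 0.92, 0.11, 0.12], "X18": [-0.03, 0.98, 0.06, -0.07],
    "X19": [0.10, 0.93, 0.11, 0.15], "X20": [0.08, 0.96, 0.02, -0.01],
}


def highest_loading_variables(table):
    """For each factor, the variable with the largest absolute loading."""
    n_factors = len(next(iter(table.values())))
    return [max(table, key=lambda v: abs(table[v][f])) for f in range(n_factors)]


def communality(row):
    """Sum of squared loadings: variance of a variable explained by the factors."""
    return sum(b ** 2 for b in row)


print(highest_loading_variables(loadings))    # ['X3', 'X18', 'X7', 'X12']
print(round(communality(loadings["X1"]), 2))  # 0.88
```

The sum of squared loadings for X1 reproduces the 0.88 in the last column of the table; for other rows the match is approximate because the printed loadings are rounded.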
Highest-loading variables
Random sub-variables
Simulation results
Clustering on:
All measured variables: 70% of observations classified correctly
Factor scores or highest-loading variables: 100% classified correctly
Random sub-variables only: 44% classified correctly
Highest-loading plus random sub-variables: 71% classified correctly
(29% misclassified)
Clustering Asthma and COPD cytokines
Clustering techniques
K-means clustering on factor scores was applied to identify biological
subgroups of asthma and COPD.
Three biologically and clinically relevant subgroups were identified.
About 33% of asthma and 31% of COPD cases overlapped at the
biological level.
Asthma and COPD Biological Clusters
Discussion and Conclusion
K-means clustering on factor scores and on highest-loading variables
outperformed the other approaches.
The true subgroups can be missed if the observations are clustered
using the full set of variables.
If we suspect the underlying clusters differ with respect to only some of
the variables:
It is plausible to use the highest-loading variables technique (after
varimax rotation) to identify the clusters accurately, along with the
variables that contribute substantially to differentiating them.
This yields interpretable results, because we can determine which
variables are responsible for the differences between the clusters, and
fewer variables are required to assign a new observation to an
existing cluster.
On-going and Future work
Developing model-based techniques for simultaneous clustering and
variable selection
Applying this technique to identify asthma and COPD subgroups
using CT scans (small and large airways) and gene expression
Acknowledgement
Prof C. Brightling, Dr C. Newby, Prof P. Burton, & Prof J. Thompson

Modelling Population Heterogeneity- MichaelGhebre

  • 1. Modelling Population Heterogeneity: A simulation study Michael A Ghebre Department of Infection, Immunity and Inflammation; University of Leicester Institute for Lung Health lab Meeting Friday 11th October, 2013
  • 2. Table of contents: 1 Introduction and Objectives; 2 Cluster and Factor analyses; 3 Simulation study (Clustering on all measured variables, Clustering on Factor Scores, Clustering on Highest-loading variables, Clustering on Random sub-variables); 4 Clustering Asthma and COPD; 5 Discussion and Conclusion; 6 On-going and Future work
  • 6. Introduction: We consider the problem of choosing variables for clustering high-dimensional data. Discovering and understanding relations in high-dimensional data is difficult, especially when the underlying structure is unknown. We compared several variable-selection techniques for clustering high-dimensional data with underlying structure (similar variable profiles). The K-means algorithm was used as the clustering technique throughout.
  • 11. Proposed variables: All measured variables (standard approach); Factor Scores; Highest-loading variables; Random sub-variables; Combination of Highest-loading variables and Random sub-variables
  • 13. Definition: K-means clustering algorithm; Factor analysis
  • 14. K-means cluster Algorithm: The distance is calculated using the squared Euclidean distance: $D_{ij} = \sum_{k=1}^{p} (Y_{ki} - Y_{kj})^2$
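To make the distance on this slide concrete, here is a minimal pure-Python sketch of K-means built on the squared Euclidean distance. The names `squared_euclidean` and `kmeans`, the random choice of initial centroids, and the iteration cap are assumptions for illustration; the slides do not say which implementation or initialisation was used.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance: sum over variables of (a_k - b_k)^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain K-means (Lloyd-style iterations) on a list of equal-length tuples.

    Initial centroids are k distinct observations drawn at random
    (an assumption; other initialisations are possible).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each observation goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Usage on a tiny two-group example.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy, k=2)
```

Because every variable contributes to the sum, variables that carry no cluster information add noise to the distance, which is the motivation for the variable-selection strategies compared in this talk.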
  • 16. Factor Analysis: Factor analysis was performed to reduce the high-dimensional variables to a small number of independent factors and subsequently to investigate the underlying structure (relationships) among the variables.
  • 17. Factor Analysis: Mathematical Formula of Factor Analysis
    $Y_1 = \beta_{11}F_1 + \beta_{12}F_2 + \cdots + \beta_{1k}F_k + \varepsilon_1$
    $Y_2 = \beta_{21}F_1 + \beta_{22}F_2 + \cdots + \beta_{2k}F_k + \varepsilon_2$
    $\vdots$
    $Y_p = \beta_{p1}F_1 + \beta_{p2}F_2 + \cdots + \beta_{pk}F_k + \varepsilon_p$
    Where:
    * $Y_p$ are observed variables; $F_k$ are latent variables; and $p > k$
    * $F_k$ are independent, such that $E(F_k) = 0$ and $\mathrm{Var}(F_k) = 1$
    * $\varepsilon_p$ (error terms) are independent, such that $E(\varepsilon_p) = 0$ and $\mathrm{Var}(\varepsilon_p) = \sigma_p^2$
    * $\mathrm{Cor}(F_k, \varepsilon_p) = 0$
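The assumptions on this slide have a compact consequence that may help when reading the later slides. Writing the model in matrix form (the symbols $B$ for the $p \times k$ loading matrix and $\Psi$ for the diagonal error-variance matrix are notation introduced here, not taken from the slides), the covariance of the observed variables splits into a part explained by the factors and a variable-specific part:

```latex
\[
\mathbf{Y} = B\,\mathbf{F} + \boldsymbol{\varepsilon},
\qquad B = (\beta_{ij}) \in \mathbb{R}^{p \times k}
\]
\[
\operatorname{Cov}(\mathbf{Y})
  = B\,\operatorname{Cov}(\mathbf{F})\,B^{\top} + \operatorname{Cov}(\boldsymbol{\varepsilon})
  = B B^{\top} + \Psi,
\qquad \Psi = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_p^2)
\]
\[
\operatorname{Var}(Y_i) = \sum_{j=1}^{k} \beta_{ij}^2 + \sigma_i^2
\]
```

The second line uses $\mathrm{Var}(F_k) = 1$ with independent factors, so $\operatorname{Cov}(\mathbf{F})$ is the identity, and $\mathrm{Cor}(F_k, \varepsilon_p) = 0$, so the cross terms vanish.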
  • 20. Path diagram for 2 factors and 5 observed variables (Figure: Factors and observed variables)
  • 21. Factor Scores: After factor analysis, factor scores were estimated as follows: $\text{FactorScores} = Z(S^{-1}\beta)$, where $Z$ is the matrix of standardised variables, $S^{-1}$ is the inverse of the correlation matrix of the variables, and $\beta$ is the factor loading matrix.
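The factor-score formula on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `factor_scores` is hypothetical, the loading matrix is passed in rather than estimated, and the demo data are generated from made-up loadings. The slides do not say which software produced the scores.

```python
import numpy as np


def factor_scores(Y, loadings):
    """Compute Z (S^{-1} beta) for an (n x p) data matrix Y.

    Z is Y standardised column by column, S is the sample correlation
    matrix of the variables, and `loadings` is the (p x k) loading matrix.
    """
    Y = np.asarray(Y, dtype=float)
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    S = np.corrcoef(Y, rowvar=False)
    # Solve S W = loadings instead of forming the inverse explicitly.
    weights = np.linalg.solve(S, np.asarray(loadings, dtype=float))
    return Z @ weights


# Demo with hypothetical loadings: 5 observed variables, 2 factors.
rng = np.random.default_rng(0)
demo_loadings = np.array(
    [[0.8, 0.0], [0.7, 0.0], [0.6, 0.0], [0.0, 0.8], [0.0, 0.7]]
)
demo_factors = rng.normal(size=(200, 2))
# Error scale chosen so that each observed variable has unit variance.
demo_errors = rng.normal(size=(200, 5)) * np.sqrt(1 - (demo_loadings**2).sum(axis=1))
demo_Y = demo_factors @ demo_loadings.T + demo_errors
demo_scores = factor_scores(demo_Y, demo_loadings)  # shape (200, 2)
```

The result has one row per observation and one column per factor, so K-means can be run on it directly in place of the original variables.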
  • 24. Performance comparison: Data were simulated to compare the performance of the techniques. The technique that performed well was then applied to the real asthma and COPD dataset.
  • 25. Simulated dataset: 1000 observations from four known clusters of high-dimensional data (20 variables) with underlying structure were simulated. Factor analysis was then performed to reduce the high-dimensional data to low-dimensional factors. A scree plot was used to decide how many factors to retain. The variables with the highest loadings on each factor were identified, and factor scores were extracted. The performance of the clustering techniques was assessed by comparing the resulting clusters with the true cluster membership.
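A dataset of the shape described on this slide could be generated along the following lines. Only the sizes (1000 observations, four clusters, 20 variables) come from the slide. The generating mechanism is an assumption made for illustration: the number of factors (4), the loading value (0.8), the error scale (0.6), the cluster-specific factor means, and the function name `simulate_clustered_factor_data` are all hypothetical.

```python
import numpy as np


def simulate_clustered_factor_data(
    n_obs=1000, n_clusters=4, n_vars=20, n_factors=4, seed=0
):
    """Simulate clustered data whose variables share a factor structure.

    Returns the (n_obs x n_vars) data matrix, the true cluster labels,
    and the (n_vars x n_factors) loading matrix used to generate the data.
    """
    rng = np.random.default_rng(seed)
    cluster = rng.integers(0, n_clusters, size=n_obs)

    # Hypothetical simple structure: each variable loads on one factor only.
    loadings = np.zeros((n_vars, n_factors))
    for j in range(n_vars):
        loadings[j, j % n_factors] = 0.8

    # Hypothetical cluster differences: clusters differ in their factor means.
    factor_means = rng.normal(loc=0.0, scale=2.0, size=(n_clusters, n_factors))
    factors = factor_means[cluster] + rng.normal(size=(n_obs, n_factors))

    errors = rng.normal(scale=0.6, size=(n_obs, n_vars))
    Y = factors @ loadings.T + errors
    return Y, cluster, loadings


Y_sim, true_cluster, true_loadings = simulate_clustered_factor_data()
```

Keeping the true labels alongside the data is what makes the later comparison possible: each clustering strategy is run on `Y_sim` (or on variables or scores derived from it) and its output is checked against `true_cluster`.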
  • 31. Description of the simulated dataset
  • 34. K-means clustering on All Variables: Clustering on all variables (the standard technique) classified only 70% of observations correctly.
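A percentage like the one on this slide requires matching cluster labels to the true groups, because K-means numbers its clusters arbitrarily. The slides do not state how the agreement was computed. The sketch below assumes one common choice: try every one-to-one relabelling of the predicted clusters and report the best proportion of matches. The function name `cluster_agreement` is hypothetical.

```python
from itertools import permutations


def cluster_agreement(true_labels, predicted_labels):
    """Proportion of observations correctly assigned under the best
    one-to-one mapping from predicted cluster ids to true cluster ids.

    Exhaustive over permutations, so only suitable for a small number
    of clusters (for example, four).
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    true_ids = sorted(set(true_labels))
    pred_ids = sorted(set(predicted_labels))
    if len(pred_ids) > len(true_ids):
        raise ValueError("more predicted clusters than true clusters")

    best_hits = 0
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(
            mapping[p] == t for t, p in zip(true_labels, predicted_labels)
        )
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Same partition under different label names counts as full agreement.
same_partition = cluster_agreement([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1])
```

With four clusters there are only 24 relabellings to try, so the exhaustive search is cheap.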
  • 37. Scree plot to retain number of factors
  • 39. Four Factor scores across the clusters
  • 40. K-means clustering on factor scores
      Factors   Explained variance (%)   Mis* (%)   Mis** (%)
      2         70.1                     47.0       27.1
      3         86.2                     25.0       11.3
      4         93.8                      0.0        0.0
      5         96.9                      0.0        0.0
      6         98.3                      0.0        0.0
      7         99.3                      0.0        0.0
      Mis = misclassification; * unrotated; ** varimax rotation
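The slide reports misclassification percentages without saying how they were computed. One common approach is sketched below as an assumption: because cluster numbers are arbitrary, each cluster is matched to a true group under the relabelling that gives the most agreement. The function name and the example labels are hypothetical, and the sketch assumes there are no more clusters than true groups.

```python
from itertools import permutations


def misclassification_rate(true_labels, cluster_labels):
    """Fraction misclassified under the best one-to-one relabelling of clusters."""
    clusters = sorted(set(cluster_labels))
    classes = sorted(set(true_labels))
    best_correct = 0
    # Try every one-to-one assignment of cluster ids to true group ids.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best_correct = max(best_correct, correct)
    return 1 - best_correct / len(true_labels)


# Hypothetical example: three true groups, cluster ids permuted, one observation misplaced.
true_groups = [0, 0, 1, 1, 2, 2]
found = [1, 1, 2, 2, 0, 2]
rate = misclassification_rate(true_groups, found)
print(f"{100 * rate:.1f}% misclassified")
```

Trying all permutations is only practical for a handful of clusters, which fits the small number of groups considered in a study like this one.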
  • 42. Factor loading
      Variable   Factor 1   Factor 2   Factor 3   Factor 4   Explained variance
      X1           0.90       0.13       0.22       0.08       0.88
      X2           0.80       0.27       0.28       0.25       0.85
      X3           0.97       0.05       0.08       0.04       0.95
      X4           0.95       0.02      -0.01       0.02       0.91
      X5           0.93       0.14       0.14       0.13       0.91
      X6           0.22       0.18       0.92       0.12       0.95
      X7           0.11       0.05       0.93      -0.04       0.88
      X8           0.34       0.27       0.82       0.20       0.90
      X9          -0.01      -0.06       0.68       0.63       0.87
      X10         -0.04      -0.16       0.84       0.42       0.92
      X11          0.27       0.31       0.31       0.81       0.91
      X12          0.07       0.02       0.10       0.96       0.95
      X13          0.42       0.43       0.44       0.44       0.73
      X14          0.50       0.44       0.43       0.42       0.79
      X15          0.41       0.40       0.52       0.41       0.76
      X16          0.32       0.81       0.24       0.27       0.88
      X17          0.16       0.92       0.11       0.12       0.89
      X18         -0.03       0.98       0.06      -0.07       0.96
      X19          0.10       0.93       0.11       0.15       0.91
      X20          0.08       0.96       0.02      -0.01       0.92
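The loading table can be read in two ways, sketched below in Python. First, the "Explained variance" column appears to be each variable's communality, that is, the sum of its squared loadings; the printed values are reproduced only approximately because the loadings are rounded. Second, highest-loading variables can be picked by thresholding the largest absolute loading. The 0.9 cut-off and the choice of rows are assumptions made for illustration; the slides do not state the selection rule that was used.

```python
# A few rows copied from the loading table (Factor 1..4).
loadings = {
    "X1": [0.90, 0.13, 0.22, 0.08],
    "X3": [0.97, 0.05, 0.08, 0.04],
    "X7": [0.11, 0.05, 0.93, -0.04],
    "X12": [0.07, 0.02, 0.10, 0.96],
    "X13": [0.42, 0.43, 0.44, 0.44],
    "X18": [-0.03, 0.98, 0.06, -0.07],
}

THRESHOLD = 0.9  # hypothetical cut-off, not taken from the slides


def communality(row):
    """Sum of squared loadings: share of the variable's variance explained by the factors."""
    return sum(v * v for v in row)


def dominant_factor(row):
    """1-based index of the factor with the largest absolute loading."""
    return max(range(len(row)), key=lambda j: abs(row[j])) + 1


# Keep variables that load strongly on one factor, recording which factor.
selected = {
    name: dominant_factor(row)
    for name, row in loadings.items()
    if max(abs(v) for v in row) >= THRESHOLD
}

print(selected)
print(round(communality(loadings["X1"]), 2))
```

X13 loads about equally on all four factors, so it is not selected under this rule, whereas X3 or X18 each point clearly to a single factor.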
  • 43. Highest-loading variables
  • 45. Random sub-variables
  • 46. Simulation results. Clustering on:
      - All measured variables: 70% classified correctly
      - Factor scores or highest-loading variables: 100% classified correctly
      - Other random variables: only 44% classified correctly!
      - Highest-loading plus random sub-variables: 71% classified correctly (29% misclassified)!
  • 49. Clustering Asthma and COPD cytokines
      - The clustering technique "K-means clustering on factor scores" was applied to identify asthma and COPD biological subgroups.
      - Three biologically and clinically relevant subgroups were identified.
      - About 33% of asthma and 31% of COPD subjects overlapped at the biological level!
  • 52. Asthma and COPD Biological Clusters
  • 54. Discussion and Conclusion
      - K-means clustering on factor scores and on highest-loading variables outperformed the other approaches!
      - The true subgroups will be missed if the observations are clustered using the full set of variables.
      - If we suspect that the underlying clusters differ with respect to only some of the variables, it is plausible to use the highest-loading variables technique (after varimax rotation) to identify the clusters accurately, together with the variables that contribute substantially to differentiating them.
      - This yields interpretable results: we can determine which variables are responsible for the differences between the clusters, and fewer variables are required to assign a new observation to an existing cluster.
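The last point, that fewer variables are needed to place a new observation, can be illustrated with a nearest-centroid rule restricted to the selected variables. This is a sketch under assumptions: the slides do not say how new observations would be assigned, and the centroid values, cluster names and variable choice below are hypothetical.

```python
import math


def assign_to_cluster(observation, centroids, variables):
    """Return the name of the cluster whose centroid is nearest on the chosen variables."""

    def distance(centroid):
        # Euclidean distance computed only over the selected variables.
        return math.sqrt(sum((observation[v] - centroid[v]) ** 2 for v in variables))

    return min(centroids, key=lambda name: distance(centroids[name]))


# Hypothetical centroids, stored only for two selected highest-loading variables.
centroids = {
    "cluster A": {"X3": 0.0, "X18": 0.0},
    "cluster B": {"X3": 3.0, "X18": 0.0},
    "cluster C": {"X3": 0.0, "X18": 3.0},
}

# The new observation may carry other measurements (here X13); they are ignored.
new_observation = {"X3": 2.6, "X18": 0.4, "X13": 1.2}
assigned = assign_to_cluster(new_observation, centroids, variables=["X3", "X18"])
print(assigned)
```

Only the selected variables need to be measured for a new subject, which is the practical benefit the slide describes.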
  • 59. On-going and Future work
      - Developing model-based techniques for simultaneous clustering and variable selection
      - Applying this technique to identify asthma and COPD subgroups using CT scans (small and large airways) and gene expression
  • 60. References
      1 Lubke GH and Muthen B (2005). Investigating Population Heterogeneity With Factor Mixture Models.
      2 Tabachnick BG and Fidell LS (2007). Using Multivariate Statistics. 5th ed.
      3 Everitt B (2011). An Introduction to Applied Multivariate Analysis with R.
      4 Hancock GR and Samuelsen KM (2008). Advances in Latent Variable Mixture Models.
      5 McLachlan GJ and Peel D (2000). Finite Mixture Models. New York: Wiley.
      6 Witten DM and Tibshirani R (2010). A framework for feature selection in clustering.
  • 61. Acknowledgement: Prof C. Brightling, Dr C. Newby, Prof P. Burton, & Prof J. Thompson