Modelling Population Heterogeneity- MichaelGhebre
1. Modelling Population Heterogeneity: A simulation
study
Michael A Ghebre
Department of Infection, Immunity and Inflammation; University of Leicester
Institute for Lung Health lab Meeting
Friday 11th October, 2013
2. Introduction and Objectives
Cluster and Factor analyses
Simulation study
Clustering Asthma and COPD
Discussion and Conclusion
On-going and Future work
Table of contents
1 Introduction and Objectives
2 Cluster and Factor analyses
3 Simulation study
Clustering on all measured variables
Clustering on Factor Scores
Clustering on Highest-loading variables
Clustering on Random sub-variables
4 Clustering Asthma and COPD
5 Discussion and Conclusion
6 On-going and Future work
Michael A Ghebre Modelling Population Heterogeneity: A simulation study
Introduction
We consider the problem of choosing variables for clustering
high-dimensional data.
Discovering and understanding relations in high-dimensional data is
problematic, especially when the underlying structure is unknown.
We compared several variable selection techniques for clustering
high-dimensional data with underlying structure (similar variable
profiles).
The K-means algorithm was used as the clustering technique
throughout.
Proposed variables
All measured variables (standard approach)
Factor Scores
Highest-loading variables
Random sub-variables
Combination of Highest-loading variables and Random sub-variables
Definition
K-means clustering algorithm
Factor analysis
K-means cluster Algorithm
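The distance is calculated using the squared Euclidean distance:
$$D_{ij} = \sum_{k=1}^{p} (Y_{ki} - Y_{kj})^2$$
where $Y_{ki}$ is the value of variable $k$ for observation $i$ and $p$ is the number of variables.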
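A minimal sketch of K-means built on the squared Euclidean distance above may help make the definition concrete. It assumes NumPy and stores observations as rows and variables as columns; the function names `squared_euclidean` and `kmeans`, the random initialisation, and the stopping rule are illustrative choices, not details taken from the slides.

```python
import numpy as np


def squared_euclidean(a, b):
    """Return D[i, j] = sum_k (a[i, k] - b[j, k])**2 for all row pairs."""
    diff = a[:, None, :] - b[None, :, :]
    return (diff ** 2).sum(axis=2)


def kmeans(Y, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd-style K-means; Y has one observation per row."""
    rng = np.random.default_rng(seed)
    # Assumption: start from randomly chosen observations.
    centres = Y[rng.choice(len(Y), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign every observation to its nearest centre.
        labels = squared_euclidean(Y, centres).argmin(axis=1)
        # Recompute each centre as the mean of its members
        # (an empty cluster keeps its previous centre).
        new_centres = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    # Final assignment against the final centres.
    labels = squared_euclidean(Y, centres).argmin(axis=1)
    return labels, centres


# Toy data: two tight, well-separated groups of five observations each.
Y_demo = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
labels_demo, centres_demo = kmeans(Y_demo, n_clusters=2)
```

Because each assignment step only compares distances, the squared distance gives the same partition as the unsquared one while avoiding the square root.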
Factor Analysis
Factor analysis was performed to reduce the high-dimensional variables to
a low number of independent factors and subsequently investigate the
underlying structure (relationship) among the variables.
Mathematical Formula of Factor Analysis
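$$
\begin{aligned}
Y_1 &= \beta_{11}F_1 + \beta_{12}F_2 + \cdots + \beta_{1k}F_k + \epsilon_1 \\
Y_2 &= \beta_{21}F_1 + \beta_{22}F_2 + \cdots + \beta_{2k}F_k + \epsilon_2 \\
&\;\;\vdots \\
Y_p &= \beta_{p1}F_1 + \beta_{p2}F_2 + \cdots + \beta_{pk}F_k + \epsilon_p
\end{aligned}
$$
Where:
* $Y_p$ are observed variables; $F_k$ are latent variables; and $p > k$
* $F_k$ are independent, such that $E(F_k) = 0$ and $\mathrm{Var}(F_k) = 1$
* $\epsilon_p$ (error terms) are independent, such that $E(\epsilon_p) = 0$ and $\mathrm{Var}(\epsilon_p) = \sigma_p^2$
* $\mathrm{Cor}(F_k, \epsilon_p) = 0$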
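Under the assumptions listed above, the covariance of the observed variables is the loading matrix times its transpose plus the diagonal matrix of error variances. The sketch below checks this numerically with NumPy. The loading values, error standard deviations, the Gaussian draws, and the name `simulate_factor_model` are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_factor_model(n_obs, loadings, error_sd, seed=0):
    """Draw Y = F @ loadings.T + eps with independent factors and errors."""
    rng = np.random.default_rng(seed)
    p, k = loadings.shape
    # Assumption: Gaussian factors with mean 0 and variance 1.
    F = rng.standard_normal((n_obs, k))
    # Assumption: Gaussian errors, independent of F, variance error_sd**2.
    eps = rng.standard_normal((n_obs, p)) * error_sd
    return F @ loadings.T + eps, F


# Hypothetical loadings: 5 observed variables, 2 factors (p > k).
loadings = np.array([
    [0.9, 0.0],
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.8],
    [0.0, 0.6],
])
error_sd = np.array([0.4, 0.5, 0.6, 0.5, 0.7])

Y, F = simulate_factor_model(20000, loadings, error_sd)

# Covariance implied by the model versus the covariance seen in the sample.
implied_cov = loadings @ loadings.T + np.diag(error_sd ** 2)
sample_cov = np.cov(Y, rowvar=False)
max_gap = np.abs(sample_cov - implied_cov).max()
```

With a large sample, `max_gap` should be small, which is what makes it possible to recover the loadings from the observed correlations alone.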
Path diagram for 2 factors and 5 observed variables
Figure: Factors and observed variables
Factor Scores
After factor analysis, factor scores were estimated as follows:
$$\text{FactorScores} = Z\,(S^{-1}\beta)$$
Where: $Z$ is the matrix of standardised variables; $S^{-1}$ is the inverse of the correlation matrix of the variables; and $\beta$ is the factor loading matrix.
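The factor score formula can be sketched in a few lines of NumPy. The example below is an illustration under stated assumptions: the data are simulated, the true loadings stand in for estimated ones, and `regression_factor_scores` is a hypothetical helper name. Solving the linear system avoids forming the inverse of the correlation matrix explicitly.

```python
import numpy as np


def regression_factor_scores(X, loadings):
    """Compute Z @ (S^-1 @ loadings) for a data matrix X (rows = observations)."""
    # Z: standardised variables (mean 0, standard deviation 1 per column).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # S: correlation matrix of the observed variables.
    S = np.corrcoef(X, rowvar=False)
    # Solve S @ weights = loadings instead of inverting S.
    weights = np.linalg.solve(S, loadings)
    return Z @ weights


# Simulated example; the loadings are hypothetical and taken as known here.
rng = np.random.default_rng(0)
loadings = np.array([
    [0.9, 0.0],
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.8],
    [0.0, 0.6],
])
n_obs = 5000
F = rng.standard_normal((n_obs, 2))
# Error standard deviations chosen so each variable has unit variance.
error_sd = np.sqrt(1.0 - (loadings ** 2).sum(axis=1))
X = F @ loadings.T + rng.standard_normal((n_obs, 5)) * error_sd

scores = regression_factor_scores(X, loadings)
# How closely each estimated score tracks the factor that generated it.
corr_with_truth = [np.corrcoef(scores[:, j], F[:, j])[0, 1] for j in range(2)]
```

In practice the loadings would come from a fitted factor analysis rather than being known in advance.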
Performance comparison
Data were simulated to compare the performance of the techniques.
The technique that performed well was then applied to a real asthma and
COPD dataset.
Simulated dataset
1000 observations from four known clusters of high-dimensional data
(20 variables) with underlying structure were simulated.
Factor analysis was then performed on the high-dimensional data to
reduce it to a small number of factors.
A scree plot was used to decide how many factors to retain.
The variables with the highest loadings on each factor were
identified, and factor scores were extracted.
The cluster assignments produced by each clustering technique
were compared with the true cluster membership.
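The set-up above can be sketched as follows. The slides give only the sample size, the number of clusters, and the number of variables, so the generating design here is entirely hypothetical: four factors, each loading 0.8 on five variables, with cluster c shifted on factor c. Because cluster labels from K-means are arbitrary, the misclassification rate is taken as the minimum over all relabellings. The names `simulate_clustered_data` and `misclassification_rate` are illustrative.

```python
from itertools import permutations

import numpy as np


def simulate_clustered_data(n_per_cluster=250, n_clusters=4, n_variables=20,
                            shift=3.0, seed=0):
    """Hypothetical design: one factor per cluster, each cluster shifted on its own factor."""
    rng = np.random.default_rng(seed)
    vars_per_factor = n_variables // n_clusters
    loadings = np.zeros((n_variables, n_clusters))
    for f in range(n_clusters):
        loadings[f * vars_per_factor:(f + 1) * vars_per_factor, f] = 0.8
    labels = np.repeat(np.arange(n_clusters), n_per_cluster)
    F = rng.standard_normal((labels.size, n_clusters))
    F[np.arange(labels.size), labels] += shift
    # Error sd 0.6 gives unit within-cluster variance (0.8**2 + 0.6**2 = 1).
    eps = rng.standard_normal((labels.size, n_variables)) * 0.6
    return F @ loadings.T + eps, labels


def misclassification_rate(true_labels, predicted_labels, n_clusters):
    """Smallest share of mismatches over all relabellings of the predicted clusters."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    best = 1.0
    for perm in permutations(range(n_clusters)):
        relabelled = np.array(perm)[predicted_labels]
        best = min(best, float(np.mean(relabelled != true_labels)))
    return best


Y, true_labels = simulate_clustered_data()

# A perfect clustering with different label names has zero misclassification.
renamed = (true_labels + 1) % 4
rate_perfect = misclassification_rate(true_labels, renamed, 4)

# Moving 100 observations into the wrong cluster gives a rate of 0.1.
noisy = renamed.copy()
noisy[:100] = (noisy[:100] + 1) % 4
rate_noisy = misclassification_rate(true_labels, noisy, 4)
```

This sketch is not expected to reproduce the percentages reported in the talk, since the actual simulation design is not given in the slides.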
Description of the simulated dataset
K-means clustering on All Variables
All Variables
Clustering on all variables (the standard technique) assigned only 70%
of observations to the correct cluster.
Scree plot to retain number of factors
Four Factor scores across the clusters
K-means on Factor scores
K-means Clustering on Factor Scores
Factors   Explained variance (%)   Misclassification* (%)   Misclassification** (%)
2         70.1                     47.0                     27.1
3         86.2                     25.0                     11.3
4         93.8                      0.0                      0.0
5         96.9                      0.0                      0.0
6         98.3                      0.0                      0.0
7         99.3                      0.0                      0.0
*Unrotated; **Varimax rotation
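The explained-variance column increases with the number of factors, so it appears to be cumulative. One common way to obtain such figures, and the heights plotted in a scree plot, is from the eigenvalues of the correlation matrix. The sketch below assumes that approach; the slides do not state the extraction method used, and `cumulative_explained_variance` is a hypothetical name.

```python
import numpy as np


def cumulative_explained_variance(X):
    """Cumulative % of total variance carried by the leading eigenvalues
    of the correlation matrix of X (rows = observations)."""
    S = np.corrcoef(X, rowvar=False)
    # eigvalsh returns ascending eigenvalues; reverse to largest-first,
    # which is also the order used on the x-axis of a scree plot.
    eigenvalues = np.linalg.eigvalsh(S)[::-1]
    return 100.0 * np.cumsum(eigenvalues) / eigenvalues.sum()


# Illustrative data only: 200 observations on 6 variables.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 6))
cumulative = cumulative_explained_variance(X_demo)
```

Entry `cumulative[m - 1]` is the percentage attributed to the first `m` components, so the point where the curve flattens suggests how many factors to retain.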
Highest-loading variables
Random sub-variables
Simulation results
Clustering on:
All measured variables: 70% classified correctly
Factor scores or highest-loading variables: 100% classified correctly
Other random variables: only 44% classified correctly
Highest-loading variables combined with random sub-variables: 71%
classified correctly (29% misclassified)
Clustering Asthma and COPD cytokines
Clustering techniques
K-means clustering on factor scores was applied to identify biological
subgroups of asthma and COPD.
Three biologically and clinically relevant subgroups were identified.
About 33% of asthma cases and 31% of COPD cases overlapped at the
biological level.
Asthma and COPD Biological Clusters
Discussion and Conclusion
K-means clustering on factor scores and on highest-loading variables
outperformed the other approaches.
The true subgroups may be missed if the observations are clustered
using the full set of variables (only 70% were classified correctly in the simulation).
If we suspect that the underlying clusters differ with respect to only some of
the variables:
It is plausible to use the highest-loading variables technique (after
varimax rotation) to identify the clusters accurately, together with the variables
that contribute substantially to differentiating the clusters.
This also yields interpretable results: we can determine which
variables are responsible for the differences between the clusters, and
fewer variables are required to assign a new observation to an
existing cluster.
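The last two points can be illustrated with a short sketch: pick the variable with the largest absolute loading on each factor, then assign a new observation to the nearest cluster centre using only those variables. The loading matrix, variable names, and centroids below are made up, the loadings are assumed to have been rotated already, and both function names are hypothetical.

```python
import numpy as np


def highest_loading_variables(loadings, names):
    """For each factor (column), return the variable with the largest |loading|."""
    rows = np.abs(loadings).argmax(axis=0)
    return [names[i] for i in rows]


def assign_to_cluster(new_observation, centroids):
    """Index of the centroid closest in squared Euclidean distance."""
    distances = ((centroids - new_observation) ** 2).sum(axis=1)
    return int(distances.argmin())


# Hypothetical rotated loadings: 5 variables, 2 factors.
names = ["v1", "v2", "v3", "v4", "v5"]
rotated_loadings = np.array([
    [0.9, 0.1],
    [0.8, 0.0],
    [0.7, 0.1],
    [0.1, -0.8],
    [0.0, 0.6],
])
selected = highest_loading_variables(rotated_loadings, names)

# Hypothetical cluster centres measured on the two selected variables only.
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
cluster_of_new = assign_to_cluster(np.array([4.0, 6.0]), centroids)
```

Only the selected variables need to be measured on a new observation, which is the practical gain the conclusion points to.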
On-going and Future work
Developing model-based techniques for simultaneous clustering and
variable selection
Applying this technique to identify asthma and COPD subgroups
using CT scans (small and large airways) and gene expression
Acknowledgement
Prof C. Brightling, Dr C. Newby, Prof P. Burton, & Prof J. Thompson