Using CART to Unravel Clusters for the Testing of Interactions in Asthma Databases



Published in Technology, Health & Medicine
  • 1. Using CART to Unravel Clusters for the Testing of Interactions in Asthma Databases
    Ben Trzaskoma, Sr Statistical Scientist, Genentech, Inc
  • 2. Introduction
    • Ultimate Purpose: Determine subgroups with greatest “Drug advantage” (versus Placebo)
    • Variable clustering
      − Used to identify groups of variables
      − Created composite scores summarizing groups of variables
      − Determined “Drug benefit” for each composite score
    • Patient (case) clustering
      − Used to identify subsets of patients
      − Determined “Drug benefit” for each subset of patients
  • 3. Clinical Trial Dataset
    • Phase IIIb multicenter, randomized, double-blind, placebo-controlled study
    • Original purpose of the study:
      − Evaluate efficacy and safety of Asthma Drug
      − Subjects have moderate to severe asthma
      − Asthma is inadequately controlled with ICS and LABA
    • 850 patients age 12-75 from about 150 sites were enrolled and followed for 48 weeks
      − Half given Drug + ICS + LABA
      − Half given placebo + ICS + LABA
  • 4. Methods
    • Two approaches:
      • Variable Clustering (Proc Varclus in SAS)
        • Related to factor analysis
        • Looks for relationships among variables
      • Patient (Case) Clustering (Proc Cluster in SAS)
        • Looks for groups of “similar” cases
  • 5. Methods: Patient Clustering
    • Patient (case) clustering with SAS CLUSTER procedure
      − Identify groups of similar patients
      − Variables with most variability are most important
      − The resulting clusters allow for the analysis of patient groups with different characteristics
  • 6. Methods: Patient Clustering
    • Patient clustering details
      − The SAS CLUSTER procedure is an agglomerative clustering technique
        – Starts with one cluster per patient and iteratively groups the two nearest clusters until there is only one cluster with all patients in it.
        – Based on a set of key variables
        – Ward’s method selected. This method uses the minimum variance to determine which two clusters should be the next to cluster together. It tends to maximize the corresponding ANOVA.
        – Selected stopping point with moderate number of clusters (selection is somewhat arbitrary; see dendrogram)
        – The squared multiple correlation, R-squared, is the proportion of variance accounted for by the clusters and is used to assess the goodness of a particular cluster solution.
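The agglomerative procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the SAS CLUSTER implementation: the function names and the four toy points are hypothetical, and the merge cost is taken to be the increase in within-cluster sum of squares, which is the usual reading of Ward's minimum-variance rule.

```python
from itertools import combinations

def centroid(points):
    """Mean of a list of equal-length numeric vectors."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def sum_sq(points):
    """Sum of squared deviations of every point from its cluster mean."""
    c = centroid(points)
    return sum((p[d] - c[d]) ** 2 for p in points for d in range(len(c)))

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    return sum_sq(a + b) - sum_sq(a) - sum_sq(b)

def ward_clusters(points, n_clusters):
    """Start with one cluster per case; merge the cheapest pair until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

def r_squared(points, clusters):
    """Proportion of total variance accounted for by the cluster solution."""
    return 1 - sum(sum_sq(c) for c in clusters) / sum_sq(points)

# Hypothetical toy data (two variables per case), not from the trial.
toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
solution = ward_clusters(toy, 2)
print(solution, r_squared(toy, solution))
```

The brute-force pair search is quadratic per merge, which is fine for a sketch; production implementations update a distance matrix instead of recomputing sums of squares.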
  • 7. Methods: Patient Clustering
    Ward’s Method
    Ward proposes that at any stage of an analysis the loss of information which results from the grouping of individuals into clusters can be measured by the total sum of squared deviations of every point from the mean of the cluster to which it belongs.
    The distance between a group k and a group (ij) formed by the fusion of i and j:
    $$d_{k(ij)} = \alpha_i d_{ki} + \alpha_j d_{kj} + \beta d_{ij}$$
    where $d_{ij}$ is the distance between groups i and j,
    $$\alpha_i = \frac{n_k + n_i}{n_k + n_i + n_j}, \qquad \alpha_j = \frac{n_k + n_j}{n_k + n_i + n_j}, \qquad \beta = \frac{-n_k}{n_k + n_i + n_j}$$
    and $n_i$ is the number of cases in group i.
    Everitt B, Cluster Analysis, 1977
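The update formula on this slide can be checked numerically. The sketch below assumes the distance between two groups is the increase in sum of squares caused by merging them (a squared-Euclidean quantity); the slide does not state which distance is used, so that choice, the function names, and the one-dimensional toy groups are all assumptions.

```python
def ward_distance(a, b):
    """Increase in sum of squares when 1-D groups a and b merge:
    n_a * n_b / (n_a + n_b) * (mean_a - mean_b)^2."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return len(a) * len(b) / (len(a) + len(b)) * (mean_a - mean_b) ** 2

def lance_williams_ward(d_ki, d_kj, d_ij, n_k, n_i, n_j):
    """Distance from group k to the fused group (ij), using the slide's coefficients."""
    total = n_k + n_i + n_j
    alpha_i = (n_k + n_i) / total
    alpha_j = (n_k + n_j) / total
    beta = -n_k / total
    return alpha_i * d_ki + alpha_j * d_kj + beta * d_ij

# Hypothetical toy groups on a single variable.
i, j, k = [0.0], [2.0], [5.0]
updated = lance_williams_ward(ward_distance(k, i), ward_distance(k, j),
                              ward_distance(i, j), len(k), len(i), len(j))
direct = ward_distance(k, i + j)
print(updated, direct)  # the two values agree up to floating-point rounding
```

The point of the recurrence is that the distance to a newly fused group follows from distances already in hand, so the raw data need not be revisited after each merge.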
  • 8. Patient Clusters from Clinical Trial Dataset
    Data file: variables used to cluster
    • Demographics data file: Age, Sex, Race, BMI
    • Spirometry data file: FEV1, FVC variables
    • Allergic History data file: Duration, Onset, Skin tests
    • AQLQ data file: Symptoms, Activity, Smoking
  • 9. Patient Clusters
    • Used CART to better understand patient clusters
      – Used to uncover hidden structure in complex data to predict our 7 clusters
      – 10-fold cross validation used to build the CART model
        – We set the variable indicating the 7 clusters as the target
      – Allowed CART to help us describe and ultimately name the 7 clusters via the nodes in the final tree
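For readers unfamiliar with 10-fold cross validation, the fold assignment can be sketched as follows. This is an illustration of the general technique only: the function name, the seed, and the round-robin assignment are assumptions, and the slide does not describe how the CART software forms its folds.

```python
import random

def kfold_indices(n_cases, n_folds=10, seed=0):
    """Yield (train, test) index lists; each case is held out exactly once."""
    order = list(range(n_cases))
    random.Random(seed).shuffle(order)                   # random order of cases
    folds = [order[f::n_folds] for f in range(n_folds)]  # round-robin into folds
    for f in range(n_folds):
        test = folds[f]
        train = [i for g in range(n_folds) if g != f for i in folds[g]]
        yield train, test

# 850 is the enrollment figure from slide 3; everything else here is illustrative.
for train, test in kfold_indices(850):
    pass  # fit a tree on `train`, score it on `test`
```

Each model is fit on nine tenths of the cases and scored on the remaining tenth, so every case contributes to exactly one held-out error estimate.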
  • 10. Patient Clusters
    • CART Method Details
      – The target is the Cluster assignment variable; 30 predictors were included
      – Cluster was treated as categorical, so a classification tree was used
      – The single-variable splitting criterion was Gini
      – Each predictor was given equal weight
      – No priors, constraints, or penalties were defined
      – Default of 10-fold cross validation was used
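The Gini criterion named on this slide can be illustrated with a short calculation. The sketch below uses the textbook definition of Gini impurity and a single numeric predictor; the function names, the midpoint candidate thresholds, and the toy ages and labels are hypothetical and are not taken from the trial or from the CART software.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gain(values, labels, threshold):
    """Reduction in impurity from splitting at value <= threshold vs > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return the best one.
    Assumes at least two distinct values."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: split_gain(values, labels, t))

# Hypothetical toy data: one predictor and a two-class target.
ages = [30, 35, 40, 50, 55, 60]
cluster = ["A", "A", "A", "B", "B", "B"]
t = best_threshold(ages, cluster)
print(t, split_gain(ages, cluster, t))  # 45.0 0.5
```

A tree grower repeats this search over every predictor at every node and keeps the split with the largest impurity reduction.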
  • 11. Patient Clusters: CART Tree Page 1
  • 12. Patient Clusters: CART Tree Page 2
  • 13. Patient Clusters: CART Tree Node Descriptions
    1. ICS Use = Low/Medium
    2. ICS Use High and ICU/Intubated
    3. ICS Use High and Not ICU/Intubated and On Women’s Hormone Therapy
    4. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Black
    5. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Not Black and Age < 44.5 and Post-Bronchodilator % Predicted FVC <= .72
    6. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Not Black and Age < 44.5 and Post-Bronchodilator % Predicted FVC > .72 and Activity Score <= 3.59
    7. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Not Black and Age < 44.5 and Post-Bronchodilator % Predicted FVC > .72 and Activity Score > 3.59
    8. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Not Black and Age > 44.5 and Post-Bronchodilator % Predicted FEV1 <= 61.88
    9. ICS Use High and Not ICU/Intubated and Not On Women’s Hormone Therapy and Not Black and Age > 44.5 and Post-Bronchodilator % Predicted FEV1 > 61.88
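The nine node descriptions on slide 13 read as one chain of nested rules, which can be written as a small function. The thresholds come from the slide; the dictionary keys are hypothetical names chosen for illustration, and an age of exactly 44.5 (which the slide does not address) is sent to the older branch as an assumption.

```python
def cart_node(p):
    """Return the terminal node (1-9) for a patient record `p`.
    Keys are illustrative names, not the trial's variable names."""
    if p["ics_use"] != "high":                  # low or medium ICS use
        return 1
    if p["icu_or_intubated"]:
        return 2
    if p["womens_hormone_therapy"]:
        return 3
    if p["black"]:
        return 4
    if p["age"] < 44.5:                         # younger branch
        if p["post_bd_fvc_pct_pred"] <= 0.72:   # FVC expressed as a fraction, as on the slide
            return 5
        return 6 if p["activity_score"] <= 3.59 else 7
    # Older branch; age == 44.5 lands here by assumption.
    return 8 if p["post_bd_fev1_pct_pred"] <= 61.88 else 9

# Hypothetical patient record for illustration.
example = {
    "ics_use": "high", "icu_or_intubated": False,
    "womens_hormone_therapy": False, "black": False, "age": 30,
    "post_bd_fvc_pct_pred": 0.80, "activity_score": 4.0,
    "post_bd_fev1_pct_pred": 70.0,
}
print(cart_node(example))  # 7
```

Written this way, it is easy to see that the first four splits peel off single-characteristic groups and only the remaining patients are divided by age and lung function.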
  • 14. Patient Clusters
    • CART output and node descriptions helped to name the 7 clusters
      – (1) Older/Poor Lung Function
      – (2) Younger/Good Lung Function/Good Activity
      – (3) Older/Moderate Lung Function
      – (4) High Women’s Hormone Therapy
      – (5) Race - Black
      – (6) High ICS Use
      – (7) High ICU/Intubation
  • 15. Results: FEV1* and Exacerbation Advantage for the Seven Cluster Solution
  • 16. Limitations
    • Potential Issues with Case Clustering
      – Additional variables could have been included
      – CART is one way to describe the cluster splits, but not the only way
  • 17. Conclusions
    – CART helped us describe the clusters in a clinically meaningful way
    – Some groups of patients respond better to Active Drug over placebo than other groups
    – Patients in cluster 4 (High Women’s Hormone Therapy) and cluster 2 (Younger/Good Lung Function/Good Activity) responded better than other patient clusters