Ama ieee-rpg


Published in: Education, Technology

  1. LOW/HIGH-RISK DIABETES GROUP SEGMENTATION USING α-TREES
     AMA-IEEE Medical Technology Conference 2011
     Anurekha Ramakrishnan (1), Yubin Park (2), Joydeep Ghosh (2)
     (1) Dept. of Statistics and Scientific Computation
     (2) Dept. of Electrical and Computer Engineering
     The University of Texas at Austin
  2. Barriers to Machine Learning Adoption in Healthcare
     Class imbalance
       - Target ratios are often extremely skewed.
     Mismatch with performance metrics
       - Misclassification rates may not be relevant.
       - Asymmetric costs are involved.
       - 'Sensitivity/Specificity' or 'Lift' should be part of the learning goals.
     Interpretation of results
       - Simple AND/OR rules (in natural language) are desirable.
     We suggest a possible solution to these problems using:
       - Modified α-Trees
       - Disjunctive combination of rules
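To make the metric mismatch concrete, here is a minimal sketch in Python. The labels are made-up toy data (10 positives out of 100), not BRFSS results, and the function name `summarize` is hypothetical. It shows that a classifier that never flags anyone can reach 90% accuracy while having zero sensitivity and zero lift.

```python
def summarize(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and lift for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    base_rate = (tp + fn) / n                       # positive rate in the whole population
    precision = tp / (tp + fp) if tp + fp else 0.0  # positive rate inside the flagged group
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # Lift: how many times higher the rate is in the flagged group than overall.
        "lift": precision / base_rate if base_rate else 0.0,
    }

# Toy data (assumption): 10 positives followed by 90 negatives.
y_true = [1] * 10 + [0] * 90

# Classifier A never flags anyone.
trivial = summarize(y_true, [0] * 100)

# Classifier B flags 20 people, 8 of whom are true positives.
flagged = summarize(y_true, [1] * 8 + [0] * 2 + [1] * 12 + [0] * 78)

print(trivial)
print(flagged)
```

In this toy setting, classifier B has lower accuracy than classifier A but a lift of about 4, which is the kind of quantity the slide argues should be part of the learning goal.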
  3. Objectives
     Other requirements:
       - Interpretable segmentation: AND/OR rules in natural language.
       - Extensive coverage using simple rules.
     Note: These objectives differ from traditional machine learning objectives. They are based on observations of many failed medical decision support systems.
  4. BRFSS Dataset
     Behavioral Risk Factor Surveillance System
     URL:
       - The largest telephone survey, running since 1984.
       - Tracks health conditions and risk behaviors in the United States.
       - Contains information on a variety of diseases, e.g. diabetes, hypertension, cancer, asthma, HIV, etc.
       - More than 400,000 records per year.
       - Many states use BRFSS data to support health-related legislative efforts.
  5. α-Tree [1]
       - A decision tree algorithm (like CART or C4.5).
       - Decision criterion: α-divergence.
       - Generalizes C4.5.
       - Robust performance in class-imbalance settings.
       - The modified α-Tree stops growing a branch once a low/high-risk group is obtained.
       - Different α values result in different decision rules.
       - Decision trees provide greedy (sub-optimal) solutions.
       - By disjunctively combining the solutions from different α-Trees, we can approach a better solution.
     Python Code available (
     [1] Y. Park and J. Ghosh, “Compact Ensemble Trees for Imbalanced Data,” in 10th International Workshop on Multiple Classifier Systems, Italy, June 2011.
  6. 3-Phase Diagram
     Example: the high-risk group is defined as a group with a diabetes rate above 24%.
       - Twice the rate of the normal population.
     Rule 1: RFHYPE5 = 1 & AGE_G >= 5.0 & RFHLTH = 2 & BMI4CAT >= 2.0 (from α = 0.1)
     OR Rule 2: RFHYPE5 ≠ 1 & RFHLTH = 1 & BMI4CAT >= 2.9 & PNEUVAC3 = 1 (from α = 1.0)
     OR Rule 3: RFHYPE5 = 2 & RFHLTH ≠ 1 (from α = 1.5)
     OR …
     Combined, these rules extract the high-risk diabetes segments (> 24%).
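The disjunctive combination above can be sketched in a few lines of Python. The three rules are copied from the slide as written; the representation (a rule as a list of `(field, operator, value)` conditions), the function names, and the two example records are assumptions made for illustration only. A record belongs to the high-risk segment if it satisfies every condition of at least one rule.

```python
import operator

# Comparison operators that appear in the slide's rules.
OPS = {"=": operator.eq, "!=": operator.ne, ">=": operator.ge, "<": operator.lt}

# Each rule is an AND of conditions; the rule list as a whole is an OR.
RULES = [
    # Rule 1, from the tree grown with alpha = 0.1
    [("RFHYPE5", "=", 1), ("AGE_G", ">=", 5.0), ("RFHLTH", "=", 2), ("BMI4CAT", ">=", 2.0)],
    # Rule 2, from alpha = 1.0
    [("RFHYPE5", "!=", 1), ("RFHLTH", "=", 1), ("BMI4CAT", ">=", 2.9), ("PNEUVAC3", "=", 1)],
    # Rule 3, from alpha = 1.5
    [("RFHYPE5", "=", 2), ("RFHLTH", "!=", 1)],
]

def matches(rule, record):
    """True if the record satisfies every condition of one rule (AND)."""
    return all(OPS[op](record[field], value) for field, op, value in rule)

def in_high_risk_segment(record, rules=RULES):
    """True if the record satisfies at least one rule (OR)."""
    return any(matches(rule, record) for rule in rules)

def coverage(records, rules=RULES):
    """Fraction of records captured by the combined rules."""
    return sum(in_high_risk_segment(r, rules) for r in records) / len(records)

# Made-up records for illustration; these are not BRFSS rows.
person_a = {"RFHYPE5": 2, "RFHLTH": 2, "AGE_G": 3, "BMI4CAT": 1, "PNEUVAC3": 2}
person_b = {"RFHYPE5": 1, "RFHLTH": 1, "AGE_G": 2, "BMI4CAT": 1, "PNEUVAC3": 2}

print(in_high_risk_segment(person_a))  # matched by Rule 3
print(in_high_risk_segment(person_b))  # matched by no rule
print(coverage([person_a, person_b]))
```

Because the rules are combined with OR, adding a rule from another α-Tree can only keep or increase coverage, which is why combining trees helps reach the "extensive coverage" objective.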
  7. Example Tree Structure
     When α = 2.0, a total of five high-risk segmentation rules are extracted.
     Different α values result in different tree structures.
     [Figure: example tree; branches are labeled Yes / No.]
  8. Results for the Group with Twice the Diabetes Rate (High-risk)
     Resultant rules from α-Trees:
       - RFHYPE5 = 2 & RFHLTH ≠ 1
       - RFHYPE5 ≠ 2 & RFHLTH = 2 & RFCHOL = 2
       - …
     English translation:
       - Segment 1: They have high blood pressure and consider themselves unhealthy (including those who did not respond to this question).
       - Segment 2: They have high cholesterol and consider themselves unhealthy, but they do not have high blood pressure.
       - …
  9. Results for the Group with a Four-times Lower Diabetes Rate (Low-risk)
     Resultant rules from α-Trees:
       - RFHYPE5 ≠ 2 and RFHLTH ≠ 2 and PNEUVAC3 ≠ 1
       - RFHYPE5 = 1 and RFHLTH ≠ 2 and AGE_G < 5.0
       - …
     English translation:
       - Segment 1: They don’t have high blood pressure and consider themselves healthy. They had a pneumonia shot at least once in their lifetime.
       - Segment 2: They have high blood pressure, but consider themselves healthy and are under 50 years of age.
       - …
  10. Appendix A
      α-Divergence
      Special cases
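The formulas on this slide did not survive in the transcript. As a hedged illustration only, the sketch below uses one common parameterization of the α-divergence for discrete distributions, D_α(p‖q) = Σ [α p + (1 − α) q − p^α q^(1 − α)] / (α (1 − α)). This is an assumption and may differ from the exact definition on the slide. Under this form, α → 1 gives KL(p‖q), α → 0 gives KL(q‖p), and α = 2 gives half the Pearson χ² distance. The function names are hypothetical, and all probabilities are assumed to be strictly positive.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence for strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def alpha_divergence(p, q, alpha):
    """Alpha-divergence under the parameterization assumed in the lead-in."""
    if abs(alpha - 1.0) < 1e-9:   # limit alpha -> 1
        return kl(p, q)
    if abs(alpha) < 1e-9:         # limit alpha -> 0
        return kl(q, p)
    total = sum(
        alpha * pi + (1 - alpha) * qi - pi ** alpha * qi ** (1 - alpha)
        for pi, qi in zip(p, q)
    )
    return total / (alpha * (1 - alpha))

# Toy distributions (assumption): a skewed one against a uniform one.
p = [0.9, 0.1]
q = [0.5, 0.5]

for a in (0.5, 1.0, 2.0):
    print(a, alpha_divergence(p, q, a))

# At alpha = 2 the value equals half the Pearson chi-square distance.
half_chi2 = 0.5 * sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))
print(half_chi2)
```

The α = 1 case is the one that connects to information-gain style criteria, which is consistent with the earlier statement that the α-Tree generalizes C4.5.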
  11. Appendix B
      Modified α-Tree Algorithm
      Input: BRFSS (input data), α (parameter)
      Output: Low-risk group extraction rules
      Select the best feature, i.e. the one that maximizes the α-divergence criterion.
      If (no such feature exists)
         or (number of data points < cut-off size)
         or (this group is a low/high-risk group)
      then stop growing this branch.
      Else
         Segment the input data based on the best feature.
         Recursively run Modified α-Tree Algorithm(segmented data, α).
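The pseudocode above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' released code: splits are multi-way on categorical values (the slides' rules suggest the original also uses ≠ and threshold tests), the split score is taken to be the α-divergence between the joint feature/target distribution and the product of its marginals, α must be positive, and the names `split_score`, `grow`, `make_rows`, `HIGH_FACTOR` and `LOW_FACTOR` are hypothetical. The toy rows are made up; only the field names come from the slides.

```python
import math
from collections import Counter

HIGH_FACTOR = 2.0   # "twice higher" than the population rate
LOW_FACTOR = 0.25   # "four-times lower" than the population rate

def split_score(rows, feature, target, alpha):
    """Alpha-divergence between the joint p(x, y) and the product p(x) p(y) (assumed criterion)."""
    if alpha <= 0:
        raise ValueError("this sketch only supports alpha > 0")
    n = len(rows)
    joint = Counter((r[feature], r[target]) for r in rows)
    fx = Counter(r[feature] for r in rows)
    fy = Counter(r[target] for r in rows)
    score = 0.0
    for x in fx:
        for y in fy:
            p = joint[(x, y)] / n
            q = (fx[x] / n) * (fy[y] / n)
            if abs(alpha - 1.0) < 1e-9:   # KL limit, i.e. mutual information
                score += p * math.log(p / q) if p > 0 else 0.0
            else:
                score += (alpha * p + (1 - alpha) * q - p ** alpha * q ** (1 - alpha)) / (alpha * (1 - alpha))
    return score

def grow(rows, features, target, alpha, base_rate, min_size=30, path=()):
    """Return (rule, label, rate) for every branch that ends in a low/high-risk group."""
    if len(rows) < min_size:                  # too few data points: stop
        return []
    rate = sum(r[target] for r in rows) / len(rows)
    if rate >= HIGH_FACTOR * base_rate:       # high-risk group found: stop
        return [(path, "high", rate)]
    if rate <= LOW_FACTOR * base_rate:        # low-risk group found: stop
        return [(path, "low", rate)]
    scored = [(split_score(rows, f, target, alpha), f) for f in features]
    scored = [(s, f) for s, f in scored if s > 1e-12]
    if not scored:                            # no informative feature: stop
        return []
    _, best = max(scored)
    remaining = [f for f in features if f != best]
    found = []
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        found += grow(subset, remaining, target, alpha, base_rate, min_size, path + ((best, value),))
    return found

def make_rows(hype, hlth, n, n_diabetic):
    """Build n toy rows, the first n_diabetic of which are diabetic."""
    return [{"RFHYPE5": hype, "RFHLTH": hlth, "DIABETES": int(i < n_diabetic)} for i in range(n)]

# Made-up toy data: 200 rows, 28 diabetic (14% base rate).
rows = (make_rows(2, 2, 40, 16) + make_rows(2, 1, 40, 6)
        + make_rows(1, 2, 40, 5) + make_rows(1, 1, 80, 1))
base_rate = sum(r["DIABETES"] for r in rows) / len(rows)

segments = grow(rows, ["RFHYPE5", "RFHLTH"], "DIABETES", alpha=1.0, base_rate=base_rate)
for rule, label, rate in segments:
    print(label, " & ".join(f"{f} = {v}" for f, v in rule), round(rate, 4))
```

Running `grow` with several α values and taking the union of the returned rules gives the disjunctive combination described in the earlier slides.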