Automatic Heart Sound Recording Classification using a Nested Set of Ensemble Algorithms
Masun Nabhan Homsi, Natasha Medina, Miguel Hernandez, Philip Warrick*
Our Approach
Automatic Heart Sound Recording Classification using a Nested Set of Ensemble Algorithms, CINC2016, Physionet Challenge 2016
• Pre-processing Phase: Physionet 2016 datasets → resampling at 1000 Hz → segmentation → feature extraction
• Classification Phase: Cost-Sensitive Learning, LogitBoost, Random Forest
• Evaluation Phase: sensitivity (Se), specificity (Sp), MAcc
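The two basic evaluation measures can be sketched as follows. This is a minimal illustration rather than the authors' scoring code: the function name and the label strings "abnormal" and "normal" are assumptions, with "abnormal" treated as the positive class. MAcc is left out because its exact definition is not stated on the slides.

```python
def sensitivity_specificity(y_true, y_pred, positive="abnormal"):
    """Return (Se, Sp) for a binary problem.

    Se = TP / (TP + FN), Sp = TN / (TN + FP).
    The label used for the positive class is an assumption.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    se = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return se, sp


# Hypothetical labels, for illustration only.
y_true = ["abnormal", "abnormal", "normal", "normal", "normal"]
y_pred = ["abnormal", "normal", "normal", "normal", "abnormal"]
se, sp = sensitivity_specificity(y_true, y_pred)
```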
Pre-processing Phase
• Features Extracted for Classification
Qty: 20
Feature(s): m_RR, sd_RR, mean_IntS1, sd_IntS1, mean_IntS2, sd_IntS2, mean_IntSys, sd_IntSys, mean_IntDia, sd_IntDia, m_Ratio_SysRR, sd_Ratio_SysRR, m_Ratio_DiaRR, sd_Ratio_DiaRR, m_Ratio_SysDia, sd_Ratio_SysDia, m_Amp_SysS1, sd_Amp_SysS1, m_Amp_DiaS2 and sd_Amp_DiaS2 [1]
Per S1, S2, Dia, Sys: -
Domain: Time and statistical (Sample Entry set)
Pre-processing Phase
• Features Extracted for Classification
Qty  Feature(s)                Per S1, S2, Dia, Sys  Domain
1    Heart Rate (HR)           -                     Time
4    Zero Crossing Rate (ZCR)  ✓                     Time
4    Time Duration (TD)        ✓                     Time
4    Root Mean Square (RMS)    ✓                     Time
4    Total Power (TotPowT)     ✓                     Time
17   Total
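The per-state time-domain features can be sketched as below for a single segment (one S1, S2, systole or diastole interval). This is a minimal sketch under assumptions: the function names are hypothetical, total power is taken as the mean squared amplitude, and the slides do not give the exact formulas the authors used.

```python
import math


def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(x) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)


def root_mean_square(x):
    """Square root of the mean squared amplitude."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def total_power(x):
    """Mean squared amplitude (one common definition; an assumption here)."""
    return sum(v * v for v in x) / len(x)


def time_features(segment, fs=1000):
    """Time-domain features of one segment sampled at fs Hz."""
    return {
        "ZCR": zero_crossing_rate(segment),
        "TD": len(segment) / fs,  # duration in seconds
        "RMS": root_mean_square(segment),
        "TotPowT": total_power(segment),
    }


# Toy segment that alternates sign at every sample.
example = time_features([1.0, -1.0, 1.0, -1.0])
```

Applying `time_features` to each of the four heart-sound states yields the 4 values per feature listed in the table.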
Pre-processing Phase
• Features Extracted for Classification
Qty  Feature(s)             Per S1, S2, Dia, Sys  Domain
4    Total Power (TotPowF)  ✓                     Frequency
4    Bandwidth (BW)         ✓                     Frequency
4    Q-Factor (Qf)          ✓                     Frequency
12   Total

Qty  Feature(s)                Per S1, S2, Dia, Sys  Domain
4    Max                       ✓                     Statistical
4    Mean                      ✓                     Statistical
4    Variance                  ✓                     Statistical
4    Skewness                  ✓                     Statistical
4    Kurtosis                  ✓                     Statistical
4    Sample Entropy (SampEn)   ✓                     Statistical
4    Shannon Entropy (SE1)     ✓                     Statistical
28   Total
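Several of the statistical features can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: moments use the population (divide-by-n) convention, and Shannon entropy is estimated from a histogram whose bin count (8 by default) is an arbitrary choice.

```python
import math
from collections import Counter


def moments(x):
    """Return (mean, variance, skewness, kurtosis) using population formulas."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    std = math.sqrt(var)
    if std == 0:
        return mean, var, 0.0, 0.0
    skew = sum(((v - mean) / std) ** 3 for v in x) / n
    kurt = sum(((v - mean) / std) ** 4 for v in x) / n  # not excess kurtosis
    return mean, var, skew, kurt


def shannon_entropy(x, bins=8):
    """Shannon entropy (bits) of the amplitude histogram of x."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in x)
    n = len(x)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


mean, var, skew, kurt = moments([1.0, 2.0, 3.0, 4.0, 5.0])
entropy = shannon_entropy([0.0, 1.0, 2.0, 3.0], bins=4)
```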
Pre-processing Phase
• Features Extracted for Classification
Feature(s): Shannon Entropy (SE)
Qty  Feature(s)             Per S1, S2, Dia, Sys  Domain
6    SE2 (5-level wavelet)  -                     Statistical and Wavelet
24   SE3 (5-level wavelet)  ✓                     Statistical and Wavelet
24   SE4 (5-level wavelet)  ✓                     Statistical and Wavelet
54   Total
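A 5-level wavelet entropy can be sketched as below. Everything beyond "5-level wavelet" and "Shannon entropy" is an assumption: the Haar wavelet is used only because it is the simplest to write out, the entropy is taken over normalised coefficient energies, and reading the quantity 6 as five detail bands plus one approximation band is one plausible interpretation, not something the slides state.

```python
import math


def haar_step(x):
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    if len(x) % 2:  # pad odd-length input by repeating the last sample
        x = x + [x[-1]]
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_bands(x, levels=5):
    """Return [D1, ..., D_levels, A_levels] for the signal x."""
    bands, approx = [], list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def energy_entropy(coeffs):
    """Shannon entropy (bits) of the normalised squared coefficients."""
    energy = [c * c for c in coeffs]
    total = sum(energy)
    if total == 0:
        return 0.0
    return -sum((e / total) * math.log2(e / total) for e in energy if e > 0)


def wavelet_entropies(x, levels=5):
    """One entropy value per band: levels detail bands plus one approximation."""
    return [energy_entropy(band) for band in wavelet_bands(x, levels)]


signal = [float(i % 7) for i in range(64)]  # toy signal, 64 samples
entropies = wavelet_entropies(signal)
```

Under this reading, computing the six band entropies for each of the four heart-sound states would give 6 × 4 = 24 values, which matches the quantities listed for SE3 and SE4.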
Classification Phase
• Nested set of ensemble classifiers:
– Cost-Sensitive Classifier (CSC)
– LogitBoost (LB)
– Random Forest (RF)
• Trained and tested on both Physionet 2016 datasets (10-fold stratified cross-validation).
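Stratified 10-fold cross-validation keeps the class proportions roughly equal in every fold, which matters when one class is a minority. The sketch below shows one simple way to assign recordings to folds; the function name, the label strings and the 20/80 class split are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds, dealing each class out round-robin."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)  # randomise order within each class
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Hypothetical imbalanced label set.
labels = ["abnormal"] * 20 + ["normal"] * 80
folds = stratified_folds(labels, k=10)

# Each fold serves once as the test set; the other nine form the training set.
test_idx = folds[0]
train_idx = [i for fold in folds[1:] for i in fold]
```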
Classification Phase
Random Forest (RF):
• Meta-learning approach that uses multiple random decision trees as base learners and aggregates their outputs to compute the final ensemble prediction.
• Involves sampling of the input data with replacement (bootstrap).
An RF has three parameters that can affect its performance:
• Number of features (NF)
• Number of trees (NT)
• Maximum depth of tree (MDT)
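The bootstrap step and the final aggregation can be sketched as follows. Tree training itself is omitted; the helper names are hypothetical, and majority voting is assumed as the aggregation rule, which is the usual choice for classification forests but is not spelled out on the slide.

```python
import random
from collections import Counter


def bootstrap_sample(n, rng):
    """Draw n indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(predictions):
    """Aggregate the per-tree class predictions for one example."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(42)
in_bag, oob = bootstrap_sample(1000, rng)
oob_fraction = len(oob) / 1000  # roughly 1/e of the examples are left out

# Votes from three hypothetical trees for a single recording.
label = majority_vote(["normal", "abnormal", "abnormal"])
```

The examples left out of a tree's bootstrap sample are the ones used to estimate that tree's out-of-bag error.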
Classification Phase
• LogitBoost (LB) is a meta-learning algorithm used here for model optimization.
• It performs additive logistic regression, generating the individual models f_j one at a time.
• It maximizes the probability (likelihood) of the data with respect to the ensemble when each model f_j is determined by minimizing the squared error on the corresponding regression problem.
• With linear base models, the algorithm converges to the maximum likelihood linear logistic regression model.
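For reference, the standard two-class LogitBoost iteration (Friedman, Hastie and Tibshirani) can be written as below. The symbols y_i^*, w_i, z_i, F and J are notation introduced here, not taken from the slides; J is the number of boosting iterations.

```latex
\begin{align*}
&\text{Initialise: } F(x) = 0,\quad p(x_i) = \tfrac{1}{2},\quad y_i^* \in \{0, 1\} \\
&\text{For } j = 1, \dots, J: \\
&\qquad w_i = p(x_i)\,\bigl(1 - p(x_i)\bigr), \qquad
        z_i = \frac{y_i^* - p(x_i)}{w_i} \\
&\qquad f_j = \arg\min_{f} \sum_i w_i \,\bigl(z_i - f(x_i)\bigr)^2 \\
&\qquad F(x) \leftarrow F(x) + \tfrac{1}{2} f_j(x), \qquad
        p(x) = \frac{e^{F(x)}}{e^{F(x)} + e^{-F(x)}} \\
&\text{Predict: } \operatorname{sign}\bigl(F(x)\bigr)
\end{align*}
```

Each f_j is thus a weighted least-squares fit to the working response z_i, which is the "squared error on the corresponding regression problem" referred to above.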
Classification Phase
Cost-Sensitive Classifier (CSC)
• Meta-classifier that makes its base classifier (i.e., LB + RF) cost-sensitive.
• Misclassification penalties are associated with each outcome of the confusion matrix:

                  Predicted Positive  Predicted Negative
Actual Positive   0                   CFN
Actual Negative   CFP                 0
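One common way to apply such a cost matrix is to predict the class with the lowest expected cost, given the base classifier's estimated probability of the positive class. The sketch below illustrates that rule only; whether the authors used this rule or instance reweighting is not stated on the slides, and the function names and default costs (CFN = 8, CFP = 1) are illustrative assumptions.

```python
def expected_costs(p_positive, c_fn, c_fp):
    """Expected cost of each possible prediction.

    Predicting positive costs c_fp when the true class is negative.
    Predicting negative costs c_fn when the true class is positive.
    """
    cost_if_predict_positive = (1.0 - p_positive) * c_fp
    cost_if_predict_negative = p_positive * c_fn
    return cost_if_predict_positive, cost_if_predict_negative


def min_cost_decision(p_positive, c_fn=8.0, c_fp=1.0):
    """Pick the prediction with the smaller expected cost."""
    cost_pos, cost_neg = expected_costs(p_positive, c_fn, c_fp)
    return "positive" if cost_pos <= cost_neg else "negative"


# With CFN = 8 and CFP = 1 the decision threshold drops to 1/9.
decision_costly_fn = min_cost_decision(0.2)
decision_equal_cost = min_cost_decision(0.2, c_fn=1.0, c_fp=1.0)
```

Raising CFN relative to CFP lowers the probability threshold for a positive call to CFP / (CFP + CFN), trading specificity for sensitivity.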
Results and Discussions
Results of the Best Entry
• Se: 93.7%
• Sp: 87.3%
• MAcc: 88.6%
• LB-IT = 3
• CM:
  0 8
  1 0
[Figure: Out of Bag Error versus Iteration (1 to 10); labeled values 0.54, 0.46 and 0.57]
CFN
Iteration  OOB   Se (%)  Sp (%)  MAcc (%)
1          0.54  100     0       21.1
3          0.46  93.7    87.3    88.6
10         0.57  83.9    94.8    92.5
Results using 100% of the dataset
Sample Tree from Best Entry
• Tree root: Sample Entropy of Diastole
Conclusions
• Promising approach for classifying heart
sounds recorded in heterogeneous
environments.
• Detector performance strongly depends on
data quality.
• Does the small number of minority-class examples limit the ability:
– of the data to represent the subpopulations of the various abnormalities attributable to heart disease?
– of classifiers to learn adequately?
Future Work
• Employ pre-processing methods to determine the most discriminating features from our large set and to gain insight into developing better features.
Questions?