Automatic Heart Sound Recording Classification using a Nested Set of Ensemble Algorithms

Masun Nabhan Homsi, Natasha Medina, Miguel Hernandez, Philip Warrick*
Computing in Cardiology (CinC) 2016, PhysioNet Challenge 2016

Our Approach
• Input: PhysioNet 2016 datasets
• Pre-processing phase: resampling at 1000 Hz, segmentation, feature extraction
• Classification phase: Random Forest, LogitBoost and cost-sensitive learning
• Evaluation phase: sensitivity (Se), specificity (Sp) and MAcc
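The slides give only the target rate for resampling, not the method. As a hedged sketch, assuming input recorded at 2000 Hz (the rate at which the PhysioNet 2016 recordings are distributed), going to 1000 Hz is a decimation by 2: low-pass filter below the new Nyquist frequency, then keep every second sample. The Hamming-windowed sinc filter, its 101 taps and the 450 Hz cutoff are illustrative assumptions, not the authors' settings.

```python
import math


def lowpass_fir(cutoff_hz, fs, taps=101):
    """Hamming-windowed sinc low-pass FIR (taps should be odd)."""
    fc = cutoff_hz / fs                       # normalised cutoff, cycles per sample
    mid = taps // 2
    h = []
    for n in range(taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [c / total for c in h]             # normalise to unity gain at DC


def decimate(x, fs_in=2000, fs_out=1000, taps=101):
    """Low-pass filter below the new Nyquist rate, then keep every factor-th sample."""
    if fs_in % fs_out:
        raise ValueError("this sketch only handles integer decimation factors")
    factor = fs_in // fs_out
    h = lowpass_fir(0.45 * fs_out, fs_in, taps)   # assumed cutoff: 450 Hz for 1000 Hz output
    mid = taps // 2
    out = []
    for i in range(0, len(x), factor):        # evaluate the filter only where a sample is kept
        acc = 0.0
        for j, c in enumerate(h):
            idx = i + mid - j                 # centred convolution, zero-padded at the edges
            if 0 <= idx < len(x):
                acc += c * x[idx]
        out.append(acc)
    return out
```

In practice a library routine such as SciPy's `scipy.signal.resample_poly` would normally replace this hand-written filter.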

Pre-processing Phase
• Features Extracted for Classification
Qty | Feature(s) | Domain
20 | m_RR, sd_RR, mean_IntS1, sd_IntS1, mean_IntS2, sd_IntS2, mean_IntSys, sd_IntSys, mean_IntDia, sd_IntDia, m_Ratio_SysRR, sd_Ratio_SysRR, m_Ratio_DiaRR, sd_Ratio_DiaRR, m_Ratio_SysDia, sd_Ratio_SysDia, m_Amp_SysS1, sd_Amp_SysS1, m_Amp_DiaS2 and sd_Amp_DiaS2 [1] | Time and statistical (sample entry set)
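A minimal sketch of how the interval-based entries of the sample entry set [1] can be computed once each heart cycle has been segmented. The input format (one dict of S1, systole, S2 and diastole durations in ms per cycle) is hypothetical, each RR interval is assumed to be the sum of the four durations, ratios are kept as plain fractions, and the four amplitude features (m_Amp_SysS1, sd_Amp_SysS1, m_Amp_DiaS2, sd_Amp_DiaS2) are omitted because they need the raw signal.

```python
from statistics import mean, stdev

# Hypothetical state keys mapped to the feature-name stems in the table above.
STATES = {"s1": "IntS1", "s2": "IntS2", "sys": "IntSys", "dia": "IntDia"}


def interval_features(cycles):
    """cycles: list of dicts with durations (ms) for 's1', 'sys', 's2', 'dia'.

    Needs at least two cycles so that the standard deviations are defined.
    """
    # Assumption: RR interval taken as the sum of the four state durations.
    rr = [c["s1"] + c["sys"] + c["s2"] + c["dia"] for c in cycles]
    feats = {"m_RR": mean(rr), "sd_RR": stdev(rr)}

    for key, stem in STATES.items():
        durations = [c[key] for c in cycles]
        feats[f"mean_{stem}"] = mean(durations)
        feats[f"sd_{stem}"] = stdev(durations)

    ratios = {
        "SysRR": [c["sys"] / r for c, r in zip(cycles, rr)],
        "DiaRR": [c["dia"] / r for c, r in zip(cycles, rr)],
        "SysDia": [c["sys"] / c["dia"] for c in cycles],
    }
    for stem, values in ratios.items():
        feats[f"m_Ratio_{stem}"] = mean(values)
        feats[f"sd_Ratio_{stem}"] = stdev(values)
    return feats  # 16 of the 20 features
```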

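Qty | Feature(s) | Domain
1 | Heart Rate (HR) | Time
4 | Zero Crossing Rate (ZCR) | Time
4 | Time Duration (TD) | Time
4 | Root Mean Square (RMS) | Time
4 | Total Power (TotPowT) | Time
17 | Total |
Features with Qty 4 are computed once for each of the four heart-cycle states (S1, S2, Dia, Sys).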
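The slides list these time-domain features without formulas. The sketch below uses one plausible set of definitions, assumed rather than taken from the paper: ZCR as the fraction of adjacent sample pairs that change sign, TD as the segment length in seconds, RMS in the usual way, and TotPowT as the sum of squared samples. HR is derived from the mean RR interval. How the samples of one state are pooled across cycles (concatenation, per-cycle averaging, etc.) is left to the caller and is also an assumption.

```python
import math


def heart_rate(rr_ms):
    """Heart rate in beats per minute from a list of RR intervals in ms."""
    return 60000.0 / (sum(rr_ms) / len(rr_ms))


def time_features(segment, fs=1000):
    """ZCR, TD, RMS and TotPowT for one segment (e.g. the S1 samples)."""
    n = len(segment)
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a >= 0) != (b >= 0))
    energy = sum(x * x for x in segment)
    return {
        "ZCR": crossings / (n - 1),
        "TD": n / fs,                  # duration in seconds
        "RMS": math.sqrt(energy / n),
        "TotPowT": energy,             # assumption: total power as sum of squares
    }
```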

Qty | Feature(s) | Domain
4 | Total Power (TotPowF) | Frequency
4 | Bandwidth (BW) | Frequency
4 | Q-Factor (Qf) | Frequency
12 | Total |
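A hedged sketch of the frequency-domain features. The definitions are assumptions, since the slides do not give them: TotPowF as the summed one-sided power spectrum, BW as the contiguous half-power (-3 dB) band around the spectral peak, and Qf as peak frequency divided by BW. A direct O(n²) DFT keeps the example dependency-free; a real pipeline would use an FFT.

```python
import cmath
import math


def power_spectrum(x, fs):
    """One-sided power spectrum via a direct DFT (O(n^2), fine for short segments)."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def frequency_features(x, fs=1000):
    freqs, p = power_spectrum(x, fs)
    peak = max(range(len(p)), key=p.__getitem__)
    half = p[peak] / 2
    # Grow a contiguous band around the peak while power stays >= half the peak.
    lo = hi = peak
    while lo > 0 and p[lo - 1] >= half:
        lo -= 1
    while hi < len(p) - 1 and p[hi + 1] >= half:
        hi += 1
    bw = (hi - lo + 1) * fs / len(x)   # bandwidth in Hz, counted in whole bins
    return {"TotPowF": sum(p), "BW": bw, "Qf": freqs[peak] / bw}
```

For a pure 50 Hz tone sampled at 1000 Hz over 100 samples, all power falls in one 10 Hz bin, so BW = 10 Hz and Qf = 5.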
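
Qty | Feature(s) | Domain
4 | Max | Statistical
4 | Mean | Statistical
4 | Variance | Statistical
4 | Skewness | Statistical
4 | Kurtosis | Statistical
4 | Sample Entropy (SampEn) | Statistical
4 | Shannon Entropy (SE1) | Statistical
28 | Total |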
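A sketch of the statistical features for one segment. Moments are in population form and kurtosis is the non-excess kind. Sample entropy follows the usual definition, -ln(A/B) over template pairs with Chebyshev distance at most r; the defaults m = 2 and r = 0.2·SD are common choices, assumed here. For SE1 the slides do not say what distribution the entropy is taken over; the normalised-energy form below is an assumption.

```python
import math


def moments(x):
    """Max, mean, variance, skewness and (non-excess) kurtosis; x must not be constant."""
    n = len(x)
    mu = sum(x) / n
    m2 = sum((v - mu) ** 2 for v in x) / n
    m3 = sum((v - mu) ** 3 for v in x) / n
    m4 = sum((v - mu) ** 4 for v in x) / n
    return {"Max": max(x), "Mean": mu, "Variance": m2,
            "Skewness": m3 / m2 ** 1.5, "Kurtosis": m4 / m2 ** 2}


def sample_entropy(x, m=2, r=None):
    """SampEn = -ln(A/B): B counts template pairs of length m within tolerance r,
    A the same for length m + 1 (Chebyshev distance, self-matches excluded)."""
    n = len(x)
    if r is None:
        mu = sum(x) / n
        r = 0.2 * math.sqrt(sum((v - mu) ** 2 for v in x) / n)

    def matches(length):
        # Same n - m start positions for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    count += 1
        return count

    b, a = matches(m), matches(m + 1)
    return math.inf if a == 0 or b == 0 else -math.log(a / b)


def shannon_entropy(x):
    """Entropy (bits) of the normalised energy distribution p_i = x_i^2 / sum(x^2)."""
    energy = sum(v * v for v in x)
    if energy == 0:
        return 0.0
    return -sum((v * v / energy) * math.log2(v * v / energy) for v in x if v != 0)
```

A perfectly periodic signal such as `[1.0, 2.0, 3.0] * 10` gives SampEn = 0: every template that matches at length m still matches at length m + 1.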

Shannon Entropy (SE)
Qty | Feature(s) | Domain
6 | SE2 (5-level wavelet) | Statistical and wavelet
24 | SE3 (5-level wavelet) | Statistical and wavelet
24 | SE4 (5-level wavelet) | Statistical and wavelet
54 | Total |
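The slides give only counts for the SE2, SE3 and SE4 groups, not their definitions. A hedged sketch of the shared ingredient: a 5-level discrete wavelet decomposition yields 6 sub-bands (one approximation and five details), and a Shannon entropy can be computed on each. The Haar wavelet, implemented directly here, and the energy-based entropy are assumptions; how SE2, SE3 and SE4 differ cannot be recovered from the slides.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(x, levels=5):
    """Return [cA_L, cD_L, ..., cD_1] for an orthonormal Haar decomposition."""
    approx, details = list(x), []
    for _ in range(levels):
        if len(approx) % 2:
            approx = approx[:-1]           # drop a trailing sample so pairs line up
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return [approx] + details[::-1]


def shannon_entropy(c):
    """Entropy (bits) of the normalised energy distribution within one sub-band."""
    energy = sum(v * v for v in c)
    if energy == 0:
        return 0.0
    return -sum((v * v / energy) * math.log2(v * v / energy) for v in c if v != 0)


def wavelet_entropies(x, levels=5):
    """One entropy per sub-band: levels + 1 = 6 values for a 5-level decomposition."""
    return [shannon_entropy(band) for band in haar_dwt(x, levels)]
```

With PyWavelets installed, `pywt.wavedec(x, wavelet, level=5)` returns sub-bands in the same [cA5, cD5, ..., cD1] order and allows other wavelet families.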

Classification Phase
• Nested set of ensemble classifiers (outermost first):
– Cost-Sensitive Classifier (CSC)
– LogitBoost (LB)
– Random Forest (RF)
• Trained and tested on both PhysioNet 2016 datasets using 10-fold stratified cross-validation.
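A sketch of stratified 10-fold cross-validation: each class is shuffled and dealt round-robin across the folds, so every fold keeps roughly the dataset's class ratio, and each fold serves once as the test set. Labels follow the PhysioNet 2016 convention (1 abnormal, -1 normal); the seed and the dealing scheme are assumptions, and the authors' tool may split differently.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for j, idx in enumerate(members):
            folds[(offset + j) % k].append(idx)   # deal round-robin
        offset += len(members)                     # next class continues the rotation
    return folds


def cv_splits(labels, k=10, seed=0):
    """Yield (train, test) index lists, each fold serving once as the test set."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)
```

With 20 abnormal and 80 normal recordings, every fold receives exactly 2 abnormal and 8 normal ones.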

Random Forest (RF):
• Meta-learning approach that uses multiple randomized decision trees as base learners and aggregates their outputs into the final ensemble prediction.
• Each tree is trained on a bootstrap sample of the input data (sampling with replacement).
An RF has three parameters that can affect its performance:
• Number of features considered at each split (NF)
• Number of trees (NT)
• Maximum depth of each tree (MDT)
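A compact random-forest sketch showing where NF, NT and MDT enter (`nf`, `nt`, `mdt` below): each of the NT trees is grown on a bootstrap sample, considers NF randomly chosen features at every split and stops at depth MDT; rows left out of a tree's bootstrap sample provide an out-of-bag (OOB) error estimate. The Gini split criterion and all default values are assumptions, since the slides do not name the implementation used.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, nf, mdt, rng, depth=0):
    """Grow a randomised CART-style tree; leaves are class labels."""
    if depth == mdt or len(set(y)) == 1:
        return majority(y)
    best = None
    for f in rng.sample(range(len(X[0])), nf):       # NF random candidate features
        for thr in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:                                  # no usable split
        return majority(y)
    _, f, thr, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], nf, mdt, rng, depth + 1)

    return (f, thr, grow(left), grow(right))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def random_forest(X, y, nt=25, nf=1, mdt=3, seed=0):
    """Train NT trees on bootstrap samples; also return the out-of-bag error."""
    rng = random.Random(seed)
    n = len(X)
    trees, oob_votes = [], [Counter() for _ in range(n)]
    for _ in range(nt):
        boot = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        tree = build_tree([X[i] for i in boot], [y[i] for i in boot], nf, mdt, rng)
        trees.append(tree)
        for i in set(range(n)) - set(boot):           # rows this tree never saw
            oob_votes[i][predict_tree(tree, X[i])] += 1
    voted = [(i, v.most_common(1)[0][0]) for i, v in enumerate(oob_votes) if v]
    oob_error = sum(p != y[i] for i, p in voted) / len(voted) if voted else float("nan")
    return trees, oob_error


def predict_forest(trees, row):
    return majority([predict_tree(t, row) for t in trees])
```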

• LogitBoost (LB) is a boosting meta-learning algorithm, used here for model optimization.
• It performs additive logistic regression, generating the individual models fj one after another.
• It maximizes the probability (likelihood) of the data with respect to the ensemble if each model fj is determined by minimizing the squared error on the corresponding regression problem.
• If the fj are linear regression models, the algorithm converges to the maximum-likelihood linear logistic regression model.
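A sketch of the two-class LogitBoost update (Friedman, Hastie and Tibshirani): at each iteration, compute p = e^F / (e^F + e^-F), weights w = p(1 - p) and working response z = (y* - p) / w, fit fj by weighted least squares, and add fj/2 to F. In the slides the base learner is a random forest; here a weighted regression stump is swapped in to keep the example short, and clipping z at ±4 is an assumed numerical safeguard.

```python
import math


def fit_stump(X, z, w):
    """Weighted least-squares regression stump (swapped-in base learner)."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            sides = ([], [])                          # (left, right) lists of (z, w)
            for row, zi, wi in zip(X, z, w):
                sides[row[f] > thr].append((zi, wi))
            if not sides[1]:
                continue
            means, sse = [], 0.0
            for side in sides:
                sw = sum(wi for _, wi in side)
                m = sum(zi * wi for zi, wi in side) / sw
                means.append(m)
                sse += sum(wi * (zi - m) ** 2 for zi, wi in side)
            if best is None or sse < best[0]:
                best = (sse, f, thr, means[0], means[1])
    if best is None:                                  # every feature is constant
        c = sum(zi * wi for zi, wi in zip(z, w)) / sum(w)
        return lambda row: c
    _, f, thr, left, right = best
    return lambda row: right if row[f] > thr else left


def logitboost(X, y, iterations=3, z_max=4.0):
    """Two-class LogitBoost; y in {-1, 1}. Returns the list of fitted f_j."""
    ystar = [(yi + 1) / 2 for yi in y]               # map labels to {0, 1}
    F = [0.0] * len(X)
    models = []
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-2.0 * Fi)) for Fi in F]   # p = e^F / (e^F + e^-F)
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]          # observation weights
        z = [max(-z_max, min(z_max, (ys - pi) / wi))          # clipped working response
             for ys, pi, wi in zip(ystar, p, w)]
        fm = fit_stump(X, z, w)                                # weighted least squares
        models.append(fm)
        F = [Fi + 0.5 * fm(row) for Fi, row in zip(F, X)]
    return models


def logitboost_predict(models, row):
    return 1 if sum(0.5 * fm(row) for fm in models) > 0 else -1
```

`iterations=3` echoes the three LogitBoost iterations of the best entry, though with stumps instead of random forests as base models.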

Cost-Sensitive Classifier (CSC)
• Meta-classifier that makes its base classifier (here LB+RF) cost-sensitive.
• Assigns a misclassification penalty to each outcome of the confusion matrix (CFN: cost of a false negative; CFP: cost of a false positive):

Actual \ Predicted | Positive | Negative
Positive | 0 | CFN
Negative | CFP | 0
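Given a probability estimate p that a recording is positive, this cost matrix leads to a minimum-expected-cost decision: predict positive only when p·CFN > (1 - p)·CFP, that is, when p exceeds CFP / (CFP + CFN). The sketch shows that rule and, as an alternative, cost-based reweighting of training instances. The slides do not state which of the two the CSC used, and treating "positive" as the abnormal class (label 1) is an assumption. Read against the matrix layout above, the best entry's cost matrix [0 8; 1 0] would mean CFN = 8 and CFP = 1, a threshold of 1/9.

```python
def expected_costs(p_pos, cfn, cfp):
    """Expected cost of each prediction, given P(actual = positive) = p_pos."""
    return {
        "predict_positive": (1.0 - p_pos) * cfp,  # wrong only if actually negative
        "predict_negative": p_pos * cfn,          # wrong only if actually positive
    }


def min_cost_label(p_pos, cfn, cfp):
    """Return 1 (positive) or -1 (negative), whichever has the lower expected cost."""
    c = expected_costs(p_pos, cfn, cfp)
    return 1 if c["predict_positive"] < c["predict_negative"] else -1


def cost_weights(y, cfn, cfp):
    """Alternative: weight each training instance by the cost of misclassifying it."""
    return [cfn if yi == 1 else cfp for yi in y]
```

With CFN = 8 and CFP = 1, a recording with p = 0.2 is labelled abnormal, whereas equal costs would label it normal: the asymmetric matrix trades specificity for sensitivity.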

Results and Discussion

Results of the Best Entry
• Se: 93.7%
• Sp: 87.3%
• MAcc: 88.6%
• LogitBoost iterations (LB-IT): 3
• Cost matrix (CM): [0 8; 1 0]
[Figure: out-of-bag (OOB) error vs. iteration, iterations 1 to 10]
Results using 100% of the dataset
Iteration | OOB error | Se (%) | Sp (%) | MAcc (%)
1 | 0.54 | 100 | 0 | 21.1
3 | 0.46 | 93.7 | 87.3 | 88.6
10 | 0.57 | 83.9 | 94.8 | 92.5
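For reference, Se and Sp follow from the confusion-matrix counts as sketched below (only these two metrics are sketched). Labels assume 1 = abnormal (positive) and -1 = normal.

```python
def se_sp(y_true, y_pred, positive=1):
    """Sensitivity and specificity in percent; `positive` marks the abnormal class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)
```

A classifier that labels every recording abnormal gets Se = 100% and Sp = 0%, the pattern in the first row of the table.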

Sample Tree from Best Entry
• Tree root: Sample Entropy of Diastole

Conclusions
• Promising approach for classifying heart
sounds recorded in heterogeneous
environments.
• Detector performance strongly depends on
data quality.
• Open questions raised by the limited number of minority-class examples:
– Are they enough to represent the subpopulations of the various abnormalities attributable to heart disease?
– Are they enough for the classifiers to learn adequately?

Future Work
• Employ pre-processing methods to identify the most discriminative features in our large feature set and to gain insight into developing better features.

Questions?
