Towards diagnosis of rotator cuff tears in 3-D MRI using 3-D convolutional neural networks. Paper presented at the Workshop on Computational Biology at the International Conference on Machine Learning, Long Beach, USA, 2019.
TOWARDS DIAGNOSIS OF ROTATOR CUFF TEARS IN 3-D MRI USING 3-D CONVOLUTIONAL NEURAL NETWORKS
CENTER FOR BIOTECH DATA SCIENCE | IDLAB
Mijung Kim, Ho-Min Park, Jae Yoon Kim, Sofie Van Hoecke, and Wesley De Neve
Contact info: Mijung Kim (mijung.kim@ugent.be) | Webpage: https://users.ugent.be/~mijkim
Thirty-sixth International Conference on Machine Learning, Long Beach, USA, 2019
Do you feel pain when:
• At rest and at night, particularly if lying on the
affected shoulder?
• Lifting or lowering your arms?
You might have rotator cuff tears in your shoulder.
Do you want to know what a rotator cuff tear is?
Yes
• Note: this is a toy test for detecting rotator cuff tears.
• Source: American Academy of Orthopaedic Surgeons (AAOS)
• https://orthoinfo.aaos.org/en/diseases--conditions/rotator-cuff-tears/
Fig. 1. Anatomy of a shoulder (image courtesy: Wikipedia)
Rotator cuff tears (RCTs) are a major cause of musculoskeletal
shoulder pain. They are prevalent among middle-aged and older
adults. Fig. 1 illustrates the rotator cuff anatomy. By severity,
tears are classified as partial-thickness or full-thickness tears.
Do you want to learn more about a first model for computer-aided
diagnosis of RCTs using 3-D convolutional neural networks?
Yes
Motivation: An increasing number of computer-aided diagnosis (CAD) models based on
convolutional neural networks (CNNs) are available for brain, lung, and eye diseases.
However, no such model is available for the diagnosis of rotator cuff tears. Moreover,
the use of 3-D CNNs for CAD is not yet well understood.
Challenges: small datasets, imbalanced class distributions, and the 3-D nature of the data.
Our Approach: We apply a 3-D CNN to magnetic resonance imaging (MRI) scans of the
shoulder to classify rotator cuffs as normal, partially torn, or fully torn.
Fig. 2. Augmented input clips of 16 images are used to fine-tune a 3-D CNN consisting of eight 3-D
convolutional layers. The layers within the dashed bounding box were pre-trained on the UCF101 video dataset;
the resulting weights were then transferred and fine-tuned on our 3-D shoulder MRI dataset. The fine-tuned 3-D
CNN determines whether an unseen input clip contains slices showing a torn tendon.
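The caption does not name the architecture, but eight 3-D convolutional layers, 16-image clips and UCF101 pre-training are consistent with the C3D layout. The sketch below assumes that layout (3 × 3 × 3 convolutions with padding 1, C3D-style max pooling, and 3-channel 112 × 112 inputs, i.e. grayscale slices replicated to fit pre-trained weights) and traces the tensor shape through the network. All layer settings and names here are assumptions for illustration, not details reported on the poster.

```python
from typing import List, Optional, Tuple

Triple = Tuple[int, int, int]
Layer = Tuple[str, Triple, Triple, Triple, Optional[int]]


def out_size(n: int, kernel: int, stride: int, pad: int) -> int:
    """Length of one axis after a conv/pool window, using floor rounding."""
    return (n + 2 * pad - kernel) // stride + 1


def conv(name: str, out_channels: int) -> Layer:
    # 3x3x3 kernel, stride 1, padding 1: depth, height and width are preserved
    return (name, (3, 3, 3), (1, 1, 1), (1, 1, 1), out_channels)


def pool(name: str, kernel: Triple) -> Layer:
    # Max pooling with stride equal to the kernel and no padding; channels unchanged
    return (name, kernel, kernel, (0, 0, 0), None)


# Assumed C3D-style stack: eight conv layers, five pooling layers
LAYERS: List[Layer] = [
    conv("conv1a", 64), pool("pool1", (1, 2, 2)),  # pool1 keeps the clip depth
    conv("conv2a", 128), pool("pool2", (2, 2, 2)),
    conv("conv3a", 256), conv("conv3b", 256), pool("pool3", (2, 2, 2)),
    conv("conv4a", 512), conv("conv4b", 512), pool("pool4", (2, 2, 2)),
    conv("conv5a", 512), conv("conv5b", 512), pool("pool5", (2, 2, 2)),
]


def trace_shapes(in_channels: int = 3, clip_len: int = 16,
                 height: int = 112, width: int = 112) -> List[Tuple[str, Tuple[int, ...]]]:
    """Return (layer name, (channels, depth, height, width)) after every layer."""
    shape: Tuple[int, ...] = (in_channels, clip_len, height, width)
    trace = [("input", shape)]
    for name, kernel, stride, pad, out_channels in LAYERS:
        dims = tuple(out_size(n, k, s, p)
                     for n, k, s, p in zip(shape[1:], kernel, stride, pad))
        channels = shape[0] if out_channels is None else out_channels
        shape = (channels,) + dims
        trace.append((name, shape))
    return trace


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:7s} {shape}")
```

Under these assumptions the last pooling layer outputs 512 × 1 × 3 × 3 values per clip; different pooling settings (padding or ceil rounding) would change that final spatial size.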
Table 1. Our dataset: 2,447 shoulder MRI scans (examinations)
Statistic | Training | Validation | Testing
Total number of examinations (%) | 1,963 (100) | 242 (100) | 242 (100)
- Normal examinations (%) | 1,308 (66.6) | 160 (66.1) | 160 (66.1)
- Partial-thickness tear examinations (%) | 125 (6.4) | 16 (6.6) | 16 (6.6)
- Full-thickness tear examinations (%) | 530 (27.0) | 66 (27.3) | 66 (27.3)
Total number of patients | 1,847 | 231 | 228
- Female patients (%) | 942 (51) | 115 (49) | 134 (58)
- Mean patient age, years (SD) | 56 (14.8) | 57 (14.9) | 56 (14.6)
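The class percentages in Table 1 follow directly from the examination counts. A minimal sketch (counts copied from the table; function names are illustrative) recomputes them per split and over the pooled 2,447 examinations, where partial-thickness tears make up about 6.4% of the data. The near-identical class proportions across splits are consistent with a stratified split, although the poster does not state how the split was made.

```python
from typing import Dict

# Examination counts per class, copied from Table 1
COUNTS: Dict[str, Dict[str, int]] = {
    "training":   {"normal": 1308, "partial": 125, "full": 530},
    "validation": {"normal": 160, "partial": 16, "full": 66},
    "testing":    {"normal": 160, "partial": 16, "full": 66},
}


def class_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each class in percent, rounded to one decimal as in Table 1."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def pooled_counts(splits: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum the per-class counts over all splits."""
    pooled: Dict[str, int] = {}
    for counts in splits.values():
        for label, n in counts.items():
            pooled[label] = pooled.get(label, 0) + n
    return pooled


if __name__ == "__main__":
    for split, counts in COUNTS.items():
        print(split, sum(counts.values()), class_percentages(counts))
    overall = pooled_counts(COUNTS)
    print("all", sum(overall.values()), class_percentages(overall))
```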
Experimental Details
Table 2. Experimental results
Model | Accuracy | Precision | Recall | F1 score | Macro-AUC | Micro-AUC
Logistic Regression | 0.72 | 0.67 | 0.72 | 0.67 | 0.59 | 0.79
AdaBoost | 0.66 | 0.44 | 0.66 | 0.53 | 0.50 | 0.75
K-Nearest Neighbors | 0.67 | 0.61 | 0.67 | 0.63 | 0.56 | 0.76
Decision Tree | 0.68 | 0.63 | 0.68 | 0.65 | 0.60 | 0.76
Random Forest | 0.73 | 0.72 | 0.73 | 0.66 | 0.58 | 0.80
Multilayer Perceptron | 0.71 | 0.66 | 0.71 | 0.66 | 0.58 | 0.79
Gaussian Naive Bayes | 0.52 | 0.67 | 0.52 | 0.56 | 0.61 | 0.64
Quadratic Discriminant Analysis | 0.57 | 0.55 | 0.57 | 0.56 | 0.54 | 0.68
Gaussian Process | 0.61 | 0.60 | 0.61 | 0.60 | 0.57 | 0.71
XGBoost | 0.74 | 0.69 | 0.74 | 0.69 | 0.61 | 0.80
Our approach | 0.87 | 0.81 | 0.87 | 0.84 | 0.87 | 0.96
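The poster does not say how precision, recall and F1 were averaged over the three classes. In every row of Table 2, recall equals accuracy, which is exactly what support-weighted averaging produces: weighted recall is the sum over classes of (n_c / N) * (TP_c / n_c), which equals (sum of TP_c) / N, i.e. accuracy. A minimal sketch under that assumption, computing the metrics from a confusion matrix (the function name and the zero-precision convention for never-predicted classes are assumptions):

```python
from typing import Dict, List


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall and F1.

    cm[i][j] counts examples of true class i predicted as class j.
    A class that is never predicted gets precision 0 (assumed convention).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    support = [sum(cm[i]) for i in range(k)]                         # true examples per class
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predictions per class
    scores = {"accuracy": sum(cm[i][i] for i in range(k)) / total,
              "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for c in range(k):
        tp = cm[c][c]
        p = tp / predicted[c] if predicted[c] else 0.0
        r = tp / support[c] if support[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support[c] / total  # weight each class by its share of the data
        scores["precision"] += weight * p
        scores["recall"] += weight * r
        scores["f1"] += weight * f
    return scores


if __name__ == "__main__":
    # Test set from Table 1 (rows/cols: normal, partial, full);
    # hypothetical classifier that predicts "normal" for every examination
    always_normal = [[160, 0, 0], [16, 0, 0], [66, 0, 0]]
    print({name: round(v, 2) for name, v in weighted_metrics(always_normal).items()})
```

This hypothetical majority-class classifier scores 0.66 / 0.44 / 0.66 / 0.53, matching the first four AdaBoost columns to two decimals. That is consistent with, though not proof of, AdaBoost predicting mostly the majority class.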
Are your shoulders healthy?
The shoulder MRI scans were collected at Chung-Ang
University Hospital, Seoul, Korea. Detailed
statistics are summarized in Table 1.
As shown in Table 2, our approach achieves an overall
diagnostic accuracy of 0.87, substantially higher than that
of any of the other machine learning approaches we
implemented (the best baseline, XGBoost, reaches 0.74).
Our approach also achieves the highest macro- and
micro-AUC scores (0.87 and 0.96, respectively). Fig. 3
shows the per-class AUC scores.
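The poster reports macro- and micro-AUC without defining them. Under the common one-vs-rest convention, macro-AUC averages the per-class AUCs with equal weight, while micro-AUC pools every (examination, class) score into one binary problem, so frequent classes dominate it. A minimal pure-Python sketch under that assumption (function names and toy data are illustrative):

```python
from typing import List, Sequence, Tuple


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation); O(P * N) pairs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true: Sequence[int],
                    probs: Sequence[Sequence[float]]) -> Tuple[List[float], float, float]:
    """One-vs-rest per-class AUCs, their unweighted mean (macro) and the pooled AUC (micro)."""
    k = len(probs[0])
    per_class = [binary_auc([int(y == c) for y in y_true], [row[c] for row in probs])
                 for c in range(k)]
    macro = sum(per_class) / k  # every class counts equally, however rare
    # Micro: treat every (example, class) pair as one binary decision
    flat_labels = [int(y == c) for y in y_true for c in range(k)]
    flat_scores = [row[c] for row in probs for c in range(k)]
    micro = binary_auc(flat_labels, flat_scores)
    return per_class, macro, micro


if __name__ == "__main__":
    # Toy data (hypothetical): classes 0 = normal, 1 = partial, 2 = full
    y = [0, 0, 1, 2]
    p = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.5, 0.2, 0.3], [0.2, 0.2, 0.6]]
    print(macro_micro_auc(y, p))
```

On the toy data the poorly ranked class-1 example drags macro-AUC down to about 0.83, while micro-AUC stays at 0.875. One plausible reading of the gap between 0.87 and 0.96 for our approach is the same effect, with partial-thickness tears as the weak minority class.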
Note: the class imbalance problem remains unsolved:
partial-thickness tears represent only about 6.4% of all
2,447 examinations (6.6% of the test set).
Fig. 3. AUC scores of each class
Fig. 4(a) shows that precision and recall are high for the
normal and full-thickness tear classes.
Fig. 4(b) shows that all partial-thickness tears were
misclassified, a problem we will address in future work.
Fig. 4. Performance metrics: (a) precision-recall curve; (b) confusion matrix.
Future Work
We plan to explore a wider range of 3-D CNN architectures, with the goal of further improving diagnostic
performance. We also need to assess the generalizability of our models by testing them on external datasets.
Finally, by overcoming the class imbalance problem, we expect to be able to move from coarse- to fine-grained
classification.
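The poster leaves the imbalance problem open. One common remedy, not one the poster reports using, is to weight the training loss by inverse class frequency. A minimal sketch computing such weights from the training counts in Table 1, using n_samples / (n_classes * n_c), the same formula scikit-learn uses for class_weight="balanced" (the function name is illustrative):

```python
from typing import Dict

# Training-set examination counts per class, copied from Table 1
TRAIN_COUNTS: Dict[str, int] = {"normal": 1308, "partial": 125, "full": 530}


def balanced_class_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Inverse-frequency weights n_samples / (n_classes * n_c).

    After weighting, every class contributes the same total weight
    (n_samples / n_classes), so the 125 partial-thickness examinations
    carry as much weight as the 1,308 normal ones.
    """
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


if __name__ == "__main__":
    for label, w in balanced_class_weights(TRAIN_COUNTS).items():
        print(f"{label:8s} {w:.2f}")
```

With these counts the weights come out at roughly 0.50 (normal), 5.23 (partial-thickness) and 1.23 (full-thickness). In PyTorch, for example, they could be passed, ordered by class index, as the `weight` tensor of `torch.nn.CrossEntropyLoss`.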