CBMS 2017 presentation on multi-label modality classification for figures in biomedical literature. It presents three multi-label approaches to classifying biomedical figures from PubMed Central, and MEDIEVAL, a web application for searching PMC figures and filtering them by modality.
Multi-Label Modality Classification for Figures in Biomedical Literature
1. Multi-Label Modality Classification for Figures in Biomedical Literature
Athanasios Lagopoulos, PhD Student
School of Informatics, Aristotle University of Thessaloniki
lathanag@csd.auth.gr
Anestis Fachantidis (afa@csd.auth.gr), Grigorios Tsoumakas (greg@csd.auth.gr)
30th International Symposium on Computer-Based Medical Systems - IEEE CBMS 2017
2. The Case
PubMed Central (PMC):
• >4 million figures available
• Great source of information for biomedical research, education and clinical decision-making
Lack of associated meta-data
Inaccessible information
6. Our Approach: Multi-Label Classification
No use of figure separation algorithm
Three different multi-label learning approaches:
• Simple
• Standard
• Extended
10. Model Training
Feature Extraction from JPEG
• BVLC model - Caffe [1]
• Deep learning (1.2 million images)
• 4096 visual features/figure
Linear Support Vector Machines (SVMs)
• scikit-learn [2]
• One-vs-Rest transformation (multiple binaries)
[1] http://caffe.berkeleyvision.org/
[2] http://scikit-learn.org/
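The One-vs-Rest setup on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' training code: the feature matrix is random noise standing in for the 4096 CNN features per figure, and the modality names and the way label sets are assigned are invented for the example.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-in for the extracted features: 4096 values per figure.
# Random numbers are used here only so the sketch runs on its own.
n_figures, n_features = 60, 4096
X = rng.normal(size=(n_figures, n_features))

# Hypothetical modality labels; a compound figure may carry several.
label_pool = ["radiology", "microscopy", "chart"]
label_sets = []
for i in range(n_figures):
    labels = {label_pool[i % 3]}
    if i % 2 == 0:  # every second figure gets a second label
        labels.add(label_pool[(i + 1) % 3])
    label_sets.append(sorted(labels))

# Turn the label sets into a binary indicator matrix (one column per label).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)

# One-vs-Rest: one independent linear SVM is trained per label column.
clf = OneVsRestClassifier(LinearSVC(C=1.0))
clf.fit(X, Y)

# Predict indicator rows for a few figures and map them back to label names.
pred = clf.predict(X[:5])
predicted_sets = mlb.inverse_transform(pred)
```

Each column of `Y` becomes its own binary problem, so a figure can receive zero, one or several modalities at prediction time.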
11. ImageCLEF 2016 dataset
20,985 Figures
1,568 Compound
No simple figures with categories
Extracted subfigures as simple
Split 40% / 60% (compound / subfigures) to resemble the distribution of figures in PMC
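As a rough illustration of the arithmetic behind such a mix, the sketch below computes how many subfigures would be needed for a given compound share. It assumes that every compound figure is kept and that the 40% / 60% ratio describes the final dataset composition; the slides do not state either, and the function name is made up for this example.

```python
def subfigures_needed(n_compound: int, compound_pct: int) -> int:
    """Number of subfigures so that compound figures make up compound_pct % of the set."""
    if not 0 < compound_pct < 100:
        raise ValueError("compound_pct must be between 1 and 99")
    return round(n_compound * (100 - compound_pct) / compound_pct)


# Assumption: all 1,568 compound figures are used and they form 40% of the data.
n_compound = 1568
n_subfigures = subfigures_needed(n_compound, 40)
total = n_compound + n_subfigures
```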
13. The System
Web app
Weekly updates from PMC
Extended multi-label approach
Easy search & filtering by modality
Built with Apache Solr & AngularJS
Available @ atypon.csd.auth.gr/medieval/
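Filtering by modality on a Solr backend could look roughly like the sketch below, which only builds a query URL. The `q`, `fq`, `rows` and `wt` parameters are standard Solr select parameters, but the host, the core name `figures` and the field name `modality` are hypothetical; the actual MEDIEVAL schema is not given in the slides.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, modalities, rows=10):
    """Build a Solr select URL: free-text query plus one filter query per modality."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    for modality in modalities:
        # Hypothetical field name; each fq narrows the result set further.
        params.append(("fq", f'modality:"{modality}"'))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name, used only for illustration.
url = build_solr_query("http://localhost:8983/solr/figures", "heart", ["radiology"])
```

Passing several modalities adds several `fq` clauses, which Solr intersects, so only figures carrying all of the selected modalities would be returned.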