Multi-Label Modality Classification for Figures in Biomedical Literature


CBMS 2017 presentation on multi-label modality classification for figures in biomedical literature. It presents three multi-label approaches to classifying biomedical figures from PubMed Central, along with MEDIEVAL, a web application for searching PMC figures and filtering them by modality.

Multi-Label Modality Classification for Figures in Biomedical Literature
Athanasios Lagopoulos, PhD Student
School of Informatics, Aristotle University of Thessaloniki
lathanag@csd.auth.gr
Anestis Fachantidis (afa@csd.auth.gr), Grigorios Tsoumakas (greg@csd.auth.gr)
30th International Symposium on Computer-Based Medical Systems - IEEE CBMS 2017

The Case
PubMed Central (PMC):
• >4 million figures available
• Great source of information for biomedical research, education and clinical decision-making.
Lack of associated metadata leaves this information inaccessible

30 different modalities/categories, as proposed by ImageCLEF

Simple Vs Compound
40% are compound figures (multi-panel format)
Simple (60%) Compound (40%)

The Standard Approach
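[Diagram: Compound Figure Detection routes each figure. A simple figure goes directly to the Multi-class Model; a compound figure passes through the Figure Separation Algorithm, and the resulting compound subfigures go to the Multi-class Model.]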
Figure Separation is not perfect (~85%)
Figure Isolation - Information Loss

Our Approach: Multi-Label Classification
No use of figure separation algorithm
Three different multi-label learning approaches:
• Simple
• Standard
• Extended

Simple multi-label approach
[Diagram: a single Multi-label Model, used in both Training and Prediction for Compound and Simple figures.]

Standard multi-label approach
[Diagram with Training and Prediction panels. In each panel, Compound Figure Detection splits figures into Simple and Compound, and the panel contains a Multi-class Model and a Multi-label Model.]
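One plausible reading of the prediction panel is a routing step: the compound figure detector decides which model handles each figure. The sketch below illustrates that reading only. The function names, the toy detector, and the modality labels are hypothetical placeholders, not the authors' code or the ImageCLEF category codes.

```python
from typing import Callable, List, Sequence

Figure = Sequence[float]  # hypothetical: a figure represented by its feature vector


def predict_modalities(
    figure: Figure,
    is_compound: Callable[[Figure], bool],
    multi_class_model: Callable[[Figure], str],
    multi_label_model: Callable[[Figure], List[str]],
) -> List[str]:
    """Route a figure to the model that matches its detected type."""
    if is_compound(figure):
        # Compound figures may show several modalities, so they get a label set.
        return multi_label_model(figure)
    # Simple figures get exactly one modality, wrapped in a list for a uniform API.
    return [multi_class_model(figure)]


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_detector(figure: Figure) -> bool:
    return sum(figure) > 1.0


def toy_multi_class(figure: Figure) -> str:
    return "x-ray"


def toy_multi_label(figure: Figure) -> List[str]:
    return ["x-ray", "light-microscopy"]


simple_result = predict_modalities([0.2, 0.3], toy_detector, toy_multi_class, toy_multi_label)
compound_result = predict_modalities([0.9, 0.8], toy_detector, toy_multi_class, toy_multi_label)
print(simple_result, compound_result)
```

Returning a list in both branches keeps downstream code (indexing, filtering) independent of which model produced the labels.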

Extended multi-label approach
[Diagram with Training and Prediction panels. In each panel, Compound Figure Detection splits figures into Simple and Compound, and the panel contains a Multi-class Model and a Multi-label Model.]

Model Training
Feature Extraction from JPEG
• BVLC model - Caffe [1]
• Deep learning (1.2 million images)
• 4096 visual features/figure
Linear Support Vector Machines (SVMs)
• scikit-learn [2]
• One-vs-Rest transformation (multiple binary classifiers)
[1] http://caffe.berkeleyvision.org/
[2] http://scikit-learn.org/
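A minimal sketch of the One-vs-Rest step with scikit-learn, assuming the 4096 visual features have already been extracted. The random feature matrix, the three labels, and the hyperparameters are illustrative assumptions; the slides do not state the settings actually used.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_figures, n_features, n_labels = 60, 4096, 3

# Stand-in for the extracted visual features (one 4096-dimensional row per figure).
X = rng.normal(size=(n_figures, n_features))
# Binary indicator matrix: Y[i, j] == 1 means figure i carries modality j.
Y = (X[:, :n_labels] > 0).astype(int)

# One-vs-Rest trains one independent binary linear SVM per modality.
clf = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=5000))
clf.fit(X, Y)

predicted = clf.predict(X)  # indicator matrix with the same shape as Y
print(predicted.shape, len(clf.estimators_))
```

Because each label has its own binary classifier, a figure can receive zero, one, or several modalities, which is what makes the transformation usable for compound figures.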

ImageCLEF 2016 dataset
20,985 figures
1,568 compound
No simple figures with categories
Extracted subfigures as simple
Split 40% - 60% (compound - subfigures) in order to resemble the distribution of PMC
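As a worked example of the 40% - 60% proportion, the sketch below computes how many subfigures would be needed alongside the 1,568 compound figures. This is only one way to read the split; the helper name is hypothetical, and the slides do not say how many subfigures were actually sampled.

```python
def subfigures_for_ratio(n_compound: int, compound_share: float) -> int:
    """Number of simple figures needed so compound figures form `compound_share` of the set."""
    if not 0.0 < compound_share < 1.0:
        raise ValueError("compound_share must be strictly between 0 and 1")
    total = n_compound / compound_share
    return round(total - n_compound)


n_compound = 1568
n_simple = subfigures_for_ratio(n_compound, 0.40)
share = n_compound / (n_compound + n_simple)
print(n_simple, share)
```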

Results
Approach               F1-Macro   F1-Micro   F1-Samples
Standard               0.3569     0.7786     0.7912
Simple multi-label     0.3139     0.7581     0.7215
Standard multi-label   0.3270     0.7667     0.7726
Extended multi-label   0.3309     0.7666     0.7728
Perfect (100%) figure separation
Compound Figure Detection: 88.83% (Balanced Accuracy)
Multi-class model: 79.54% (F1-micro)
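The three F1 variants in the table differ only in how counts are averaged. The sketch below computes them in plain Python on a tiny made-up indicator matrix; the data is illustrative and unrelated to the reported results.

```python
from typing import List, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def counts(true: List[int], pred: List[int]) -> Tuple[int, int, int]:
    """True positives, false positives, false negatives for two 0/1 vectors."""
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    return tp, fp, fn


def f1_scores(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float, float]:
    n_labels = len(y_true[0])
    # Macro: one F1 per label (column), then an unweighted mean over labels.
    per_label = [
        f1(*counts([row[j] for row in y_true], [row[j] for row in y_pred]))
        for j in range(n_labels)
    ]
    macro = sum(per_label) / n_labels
    # Micro: pool the counts of every (figure, label) cell, then one F1.
    flat_true = [v for row in y_true for v in row]
    flat_pred = [v for row in y_pred for v in row]
    micro = f1(*counts(flat_true, flat_pred))
    # Samples: one F1 per figure (row), then a mean over figures.
    per_sample = [f1(*counts(t, p)) for t, p in zip(y_true, y_pred)]
    samples = sum(per_sample) / len(per_sample)
    return macro, micro, samples


y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 1]]
macro, micro, samples = f1_scores(y_true, y_pred)
print(round(macro, 4), round(micro, 4), round(samples, 4))
```

Macro averaging weights every label equally, so a single poorly predicted label (the second column here) pulls it down further than it pulls down the micro average.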

The System
Web app
Weekly updates from PMC
Extended multi-label approach
Easy search & filtering by modality
Built with Apache Solr & AngularJS
Available @ atypon.csd.auth.gr/medieval/
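Filtering by modality maps naturally onto Solr filter queries (`fq`). The sketch below only builds a request URL for Solr's standard `/select` handler. The host, the core name `figures`, and the field name `modality` are assumptions for illustration; the actual MEDIEVAL schema is not given in the slides.

```python
from typing import List
from urllib.parse import urlencode


def build_solr_query(base_url: str, text: str, modalities: List[str], rows: int = 10) -> str:
    """Build a /select URL with a free-text query and one filter query per modality."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    for modality in modalities:
        # Each fq narrows the result set further; `modality` is a hypothetical field.
        params.append(("fq", f'modality:"{modality}"'))
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical local Solr core, used only to show the resulting URL.
url = build_solr_query("http://localhost:8983/solr/figures", "lung tumor", ["x-ray"])
print(url)
```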

Conclusion
No information loss
Model redundancy
Promising results
Web application

Future Work
Backend:
• Textual features + Visual features
Frontend:
• User feedback, crowdsourcing
• Active learning

THANK YOU!
Questions?
intelligence.csd.auth.gr
atypon.csd.auth.gr/medieval
Partially funded by Atypon Systems Inc.
A. Lagopoulos, A. Fachantidis, G. Tsoumakas
{lathanag,afa,greg}@csd.auth.gr
