SlideShare a Scribd company logo
Multi-Label Modality
Classification for
Figures in Biomedical
Literature
Athanasios Lagopoulos, PhD Student
School of Informatics, Aristotle University of Thessaloniki
lathanag@csd.auth.gr
Anestis Fachantidis (afa@csd.auth.gr), Grigorios Tsoumakas (greg@csd.auth.gr)
30th International Symposium
on Computer-Based Medical
Systems - IEEE CBMS 2017
The Case
PubMed Central (PMC):
• >4 million figures available
• Great source of information for biomedical
research, education and clinical decision.
Lack of associated meta-data
Inaccessible information
30 different modalities/categories
as proposed by ImageCLEF
Simple Vs Compound
40% are compound figures (multi-panel format)
Simple (60%) Compound (40%)
The Standard Approach
Compound
Figure
Detection
Multi-class
Model
Simple Figure
Compound Subfigures
Figure Separation
Algorithm
Figure Separation is not perfect (~85%)
Figure Isolation - Information Loss
Our Approach: Multi-Label Classification
No use of figure separation algorithm
Three different multi-label learning approaches:
• Simple
• Standard
• Extended
Simple multi-label approach
Multi-label
model
Compound Simple
Training
Prediction
Standard multi-label approach
Compound
Figure
Detection
Simple
Compound
Multi-class
Model
Multi-label
Model
Compound
Figure
Detection
Multi-class
Model
Multi-label
Model
Compound Simple
PredictionTraining
Extended multi-label approach
Compound
Figure
Detection
Simple
Compound
Multi-class
Model
Multi-label
Model
Compound
Figure
Detection
Multi-class
Model
Multi-label
Model
Compound Simple
PredictionTraining
Model Training
Feature Extraction from JPEG
• BVLC model - Caffe1
• Deep learning (1.2 million images)
• 4096 visual features/figure
Linear Support Vector Machines (SVMs)
• scikit-learn2
• One-vs-Rest transformation (multiple binaries)
1
http://caffe.berkeleyvision.org/
2
http://scikit-learn.org/
ImageCLEF 2016 dataset
20.985 Figures
1.568 Compound
No simple figures with categories
Extracted subfigures as simple
Split 40% - 60% (compound –
subfigures) in order to resemble
the distribution of PMC
Results
Approach F1-Macro F1-Micro F1-Samples
Standard 0.3569 0.7786 0.7912
Simple multi-label 0.3139 0.7581 0.7215
Standard multi-label 0.3270 0.7667 0.7726
Extended multi-label 0.3309 0.7666 0.7728
Perfect (100%)
figure separation
Compound Figure Detection: 88,83% (Balanced Accuracy)
Multi-class model: 79,54% (F1-micro)
The System
Web app
Weekly updates from PMC
Extended multi-label approach
Easy search & filtering by modality
Build with Apache Solr & AngularJS
Available @
atypon.csd.auth.gr/medieval/
Conclusion
No information loss
Model redundancy
Promising results
Web application
Future Work
Backend:
• Textual features + Visual features
Frontend:
• User feedback, crowdsourcing
• Active learning
THANK YOU!
intelligence.csd.auth.gr
Questions?
atypon.csd.auth.gr/medieval
Partially funded by
A. Lagopoulos, A. Fachantidis, G. Tsoumakas
Multi-Label Modality Classification for
Figures in Biomedical Literature
{lathanag,afa,greg}@csd.auth.gr
Atypon Systems Inc.

More Related Content

Similar to Multi-Label Modality Classification for Figures in Biomedical Literature

A Framework for Statistical Simulation of Physiological Responses (SSPR).
A Framework for Statistical Simulation of Physiological Responses (SSPR).A Framework for Statistical Simulation of Physiological Responses (SSPR).
A Framework for Statistical Simulation of Physiological Responses (SSPR).
Waqas Tariq
 
Integrative Networks Centric Bioinformatics
Integrative Networks Centric BioinformaticsIntegrative Networks Centric Bioinformatics
Integrative Networks Centric Bioinformatics
Natalio Krasnogor
 
NTU-2019
NTU-2019NTU-2019
Session ii g2 overview chemical modeling mmc
Session ii g2 overview chemical modeling mmcSession ii g2 overview chemical modeling mmc
Session ii g2 overview chemical modeling mmc
USD Bioinformatics
 
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
Seattle DAML meetup
 
Developing a Replicable Methodology for Automated Identification of Emerging ...
Developing a Replicable Methodology for Automated Identification of Emerging ...Developing a Replicable Methodology for Automated Identification of Emerging ...
Developing a Replicable Methodology for Automated Identification of Emerging ...
University of Michigan Taubman Health Sciences Library
 
Top 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdfTop 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdf
Jetender Sharma
 
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
Waqas Tariq
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
Paul Agapow
 
Large scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biologyLarge scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biology
Maté Ongenaert
 
Machine learning to solve bioinformatics problems
Machine learning to solve bioinformatics problemsMachine learning to solve bioinformatics problems
Machine learning to solve bioinformatics problems
JunaidAKG
 
Ijetr021252
Ijetr021252Ijetr021252
Challenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical researchChallenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical research
FranciscoJAzuajeG
 
The Power of Topology - Colleen Farrelly - WiDS Miami 2018
The Power of Topology - Colleen Farrelly - WiDS Miami 2018The Power of Topology - Colleen Farrelly - WiDS Miami 2018
The Power of Topology - Colleen Farrelly - WiDS Miami 2018
Catalina Arango
 
Women in Data Science 2018 Slides--Small Samples, Subgroups, and Topology
Women in Data Science 2018 Slides--Small Samples, Subgroups, and TopologyWomen in Data Science 2018 Slides--Small Samples, Subgroups, and Topology
Women in Data Science 2018 Slides--Small Samples, Subgroups, and Topology
Colleen Farrelly
 
Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management
inscit2006
 
A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...
ijcsit
 
Theory and Practice of Integrating Machine Learning and Conventional Statisti...
Theory and Practice of Integrating Machine Learning and Conventional Statisti...Theory and Practice of Integrating Machine Learning and Conventional Statisti...
Theory and Practice of Integrating Machine Learning and Conventional Statisti...
University of Malaya
 
Real life application of statistics in engineering
Real life application of statistics in engineeringReal life application of statistics in engineering
Real life application of statistics in engineering
JannatulFerdous160
 
Clinical Data and AI
Clinical Data and AIClinical Data and AI
Clinical Data and AI
Stefano Paluello
 

Similar to Multi-Label Modality Classification for Figures in Biomedical Literature (20)

A Framework for Statistical Simulation of Physiological Responses (SSPR).
A Framework for Statistical Simulation of Physiological Responses (SSPR).A Framework for Statistical Simulation of Physiological Responses (SSPR).
A Framework for Statistical Simulation of Physiological Responses (SSPR).
 
Integrative Networks Centric Bioinformatics
Integrative Networks Centric BioinformaticsIntegrative Networks Centric Bioinformatics
Integrative Networks Centric Bioinformatics
 
NTU-2019
NTU-2019NTU-2019
NTU-2019
 
Session ii g2 overview chemical modeling mmc
Session ii g2 overview chemical modeling mmcSession ii g2 overview chemical modeling mmc
Session ii g2 overview chemical modeling mmc
 
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
Machine Learning in Biology and Why It Doesn't Make Sense - Theo Knijnenburg,...
 
Developing a Replicable Methodology for Automated Identification of Emerging ...
Developing a Replicable Methodology for Automated Identification of Emerging ...Developing a Replicable Methodology for Automated Identification of Emerging ...
Developing a Replicable Methodology for Automated Identification of Emerging ...
 
Top 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdfTop 50 ML Ques & Ans.pdf
Top 50 ML Ques & Ans.pdf
 
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
 
Big Data & ML for Clinical Data
Big Data & ML for Clinical DataBig Data & ML for Clinical Data
Big Data & ML for Clinical Data
 
Large scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biologyLarge scale machine learning challenges for systems biology
Large scale machine learning challenges for systems biology
 
Machine learning to solve bioinformatics problems
Machine learning to solve bioinformatics problemsMachine learning to solve bioinformatics problems
Machine learning to solve bioinformatics problems
 
Ijetr021252
Ijetr021252Ijetr021252
Ijetr021252
 
Challenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical researchChallenges and opportunities for machine learning in biomedical research
Challenges and opportunities for machine learning in biomedical research
 
The Power of Topology - Colleen Farrelly - WiDS Miami 2018
The Power of Topology - Colleen Farrelly - WiDS Miami 2018The Power of Topology - Colleen Farrelly - WiDS Miami 2018
The Power of Topology - Colleen Farrelly - WiDS Miami 2018
 
Women in Data Science 2018 Slides--Small Samples, Subgroups, and Topology
Women in Data Science 2018 Slides--Small Samples, Subgroups, and TopologyWomen in Data Science 2018 Slides--Small Samples, Subgroups, and Topology
Women in Data Science 2018 Slides--Small Samples, Subgroups, and Topology
 
Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management Evolution of Knowledge Discovery and Management
Evolution of Knowledge Discovery and Management
 
A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...A preliminary survey on optimized multiobjective metaheuristic methods for da...
A preliminary survey on optimized multiobjective metaheuristic methods for da...
 
Theory and Practice of Integrating Machine Learning and Conventional Statisti...
Theory and Practice of Integrating Machine Learning and Conventional Statisti...Theory and Practice of Integrating Machine Learning and Conventional Statisti...
Theory and Practice of Integrating Machine Learning and Conventional Statisti...
 
Real life application of statistics in engineering
Real life application of statistics in engineeringReal life application of statistics in engineering
Real life application of statistics in engineering
 
Clinical Data and AI
Clinical Data and AIClinical Data and AI
Clinical Data and AI
 

Recently uploaded

Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Marlon Dumas
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
agdhot
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
sapna sharmap11
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
mukulupadhayay1
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
zsafxbf
 
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
PsychoTech Services
 
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service LucknowCall Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
hiju9823
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
GeorgiiSteshenko
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
Timothy Spann
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
Vineet
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
uevausa
 
一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理
ugydym
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
Vietnam Cotton & Spinning Association
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
Timothy Spann
 
Senior Engineering Sample EM DOE - Sheet1.pdf
Senior Engineering Sample EM DOE  - Sheet1.pdfSenior Engineering Sample EM DOE  - Sheet1.pdf
Senior Engineering Sample EM DOE - Sheet1.pdf
Vineet
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
nhutnguyen355078
 
SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
newdirectionconsulta
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
ArshadAyub49
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
keesa2
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
Alireza Kamrani
 

Recently uploaded (20)

Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
Discovering Digital Process Twins for What-if Analysis: a Process Mining Appr...
 
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
一比一原版加拿大麦吉尔大学毕业证(mcgill毕业证书)如何办理
 
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
Call Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call GirlCall Girls Hyderabad  (india) ☎️ +91-7426014248 Hyderabad  Call Girl
Call Girls Hyderabad (india) ☎️ +91-7426014248 Hyderabad Call Girl
 
Q4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slideQ4FY24 Investor-Presentation.pdf bank slide
Q4FY24 Investor-Presentation.pdf bank slide
 
一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理一比一原版莱斯大学毕业证(rice毕业证)如何办理
一比一原版莱斯大学毕业证(rice毕业证)如何办理
 
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
Interview Methods - Marital and Family Therapy and Counselling - Psychology S...
 
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service LucknowCall Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
Call Girls Lucknow 0000000000 Independent Call Girl Service Lucknow
 
Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)Telemetry Solution for Gaming (AWS Summit'24)
Telemetry Solution for Gaming (AWS Summit'24)
 
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
06-20-2024-AI Camp Meetup-Unstructured Data and Vector Databases
 
Data Scientist Machine Learning Profiles .pdf
Data Scientist Machine Learning  Profiles .pdfData Scientist Machine Learning  Profiles .pdf
Data Scientist Machine Learning Profiles .pdf
 
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
一比一原版加拿大渥太华大学毕业证(uottawa毕业证书)如何办理
 
一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理一比一原版南昆士兰大学毕业证如何办理
一比一原版南昆士兰大学毕业证如何办理
 
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
[VCOSA] Monthly Report - Cotton & Yarn Statistics March 2024
 
06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus06-18-2024-Princeton Meetup-Introduction to Milvus
06-18-2024-Princeton Meetup-Introduction to Milvus
 
Senior Engineering Sample EM DOE - Sheet1.pdf
Senior Engineering Sample EM DOE  - Sheet1.pdfSenior Engineering Sample EM DOE  - Sheet1.pdf
Senior Engineering Sample EM DOE - Sheet1.pdf
 
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdfOverview IFM June 2024 Consumer Confidence INDEX Report.pdf
Overview IFM June 2024 Consumer Confidence INDEX Report.pdf
 
SAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content DocumentSAP BW4HANA Implementagtion Content Document
SAP BW4HANA Implementagtion Content Document
 
Sid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.pptSid Sigma educational and problem solving power point- Six Sigma.ppt
Sid Sigma educational and problem solving power point- Six Sigma.ppt
 
一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理一比一原版悉尼大学毕业证如何办理
一比一原版悉尼大学毕业证如何办理
 
How To Control IO Usage using Resource Manager
How To Control IO Usage using Resource ManagerHow To Control IO Usage using Resource Manager
How To Control IO Usage using Resource Manager
 

Multi-Label Modality Classification for Figures in Biomedical Literature