Studying Public Medical Images from the Open Access
Literature and Social Networks for Model Training and
Knowledge Extraction
Vincent Andrearczyk
HES-SO, Switzerland
MMM 2020, 08.01.2020
Henning Müller, Vincent Andrearczyk, Oscar Jimenez, Anjani Dhrangadhariya,
Roger Schaer, and Manfredo Atzori
Motivation
• Deep learning has been a driving force for
improving many applications of image analysis
• Complex networks require large amounts of
training data
- Data diversity is important for generalizability
• Most medical data sets have strong class
imbalances (rare diseases)
- Rare diseases require data from multiple centers
making the organization complex
• Many resources that include images have become
available in the past few years
- PubMed Central, TCIA, social networks, etc.
Objectives of this article
• Summarize existing approaches that harvest
public data
– Focusing on PubMed Central and social networks
• Highlight advantages and difficulties in exploiting
the data
– (+) Very diverse data
– (+) Rare cases are oversampled
– (-) Much pre-processing and filtering is required
• Develop next steps required to fully use the data
PubMed Central
• Repository with the biomedical open access
literature, including images as files, etc.
– 3-4 images per article,
– increasing number of articles
Methodology for finding articles
• Analysis of tasks of ImageCLEF and work done
on these tasks using data from ImageCLEF
– Over the past 12 years
– Data filtering steps were derived from this analysis
• Use of Google scholar to add references
– Terms “medical image classification”, “publicly
accessible resources”, “medical literature”,
“machine learning” were combined
• Dynamically growing data sets were favored
• Journal papers were cited in preference to
conference publications
Image retrieval
• Allows searching for images with text
– Or semantic terms such as UMLS or MeSH
• Content-based image retrieval
Demner-Fushman, et al. (2012), Journal of Computing Science and Engineering
Structuring the visual content
• Define types of images to make the literature
images classifiable
– Extremely large variety in most categories
– Many sub-categories are possible
– Categories with clinical relevance
are most important
– Allows removing noise
– Compound figures
are separately treated
[ImageCLEF 2013]
Challenges in the data
• Look-alikes
– Much strange content that needs to be removed
• Compound figures cannot easily be classified,
as they may contain aspects of several classes
– Cutting them into subfigures makes content
accessible
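One simple way to cut a compound figure into subfigures is to look for blank gutters between panels. The sketch below is only an illustration of that idea, not the separation method used in the cited work: it assumes a grayscale image given as a list of pixel rows, panels arranged side by side, and white gutters. The function name and the `white` and `min_gap` thresholds are hypothetical.

```python
def split_on_white_columns(image, white=250, min_gap=2):
    """Split a grayscale image (list of rows of 0-255 ints) into panels.

    A panel boundary is a run of at least min_gap columns in which every
    pixel is >= white. Returns (start, end) column ranges, end exclusive.
    Both thresholds are illustrative assumptions.
    """
    width = len(image[0])
    # A column is blank when all of its pixels are near-white.
    is_blank = [all(row[x] >= white for row in image) for x in range(width)]

    panels = []
    start = None  # first column of the panel currently being scanned
    gap = 0       # length of the current run of blank columns
    for x in range(width):
        if is_blank[x]:
            gap += 1
            if start is not None and gap >= min_gap:
                # The gutter is wide enough: close the panel before it.
                panels.append((start, x - gap + 1))
                start = None
        else:
            if start is None:
                start = x
            gap = 0
    if start is not None:
        # Close the last panel, trimming any trailing blank columns.
        panels.append((start, width - gap))
    return panels


# Toy figure: two 3-column panels separated by a 2-column white gutter.
row = [0, 0, 0, 255, 255, 10, 10, 10]
panels = split_on_white_columns([row, row])
print(panels)
```

Each returned range can then be cropped and passed to the figure-type classifier again. Real compound figures also have horizontal gutters, non-white backgrounds, and panels without any gutter, so this heuristic covers only the easiest cases.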
Metadata available for PMC
• Text of the figure caption
– Relatively specific but often short
– Hard for compound figures that contain many parts
• Full text of the article
– Not specific to individual figures
– Location of the figure is available
• Article title and author-generated key words
• Global MeSH terms (Manually attached)
– Cover species and organs
• Not all is available for all articles (incomplete)
Tasks to make figures accessible
• Remove very small images and images with unusual
aspect ratios
• Classify figures into figure types
– Using image data and also text
– Remove non-relevant images, e.g. flowcharts
• Detect and cut compound figures into their parts
– Classify these into figure types again
• Filter human and animal tissue
• Filter specific organs of interest
• Find diseases or grading/staging
– Ground truth classes for machine learning
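The first task in the list above, discarding very small images and unusual aspect ratios, can be sketched as a simple predicate on image dimensions. This is a minimal sketch under stated assumptions: the function name, the file names, and the `min_side` and `max_aspect` thresholds are hypothetical and are not values from the presentation.

```python
def keep_figure(width, height, min_side=200, max_aspect=3.0):
    """Return True if a figure passes the size and aspect-ratio checks.

    min_side (pixels) and max_aspect are illustrative assumptions.
    """
    if width <= 0 or height <= 0:
        return False
    # Reject figures whose shorter side is too small to be useful.
    if min(width, height) < min_side:
        return False
    # Reject very elongated figures such as banners or color bars.
    aspect = max(width, height) / min(width, height)
    return aspect <= max_aspect


# Hypothetical (file name, width, height) records.
figures = [
    ("fig_a.jpg", 800, 600),
    ("fig_b.jpg", 64, 64),
    ("fig_c.jpg", 2000, 250),
]
kept = [name for name, w, h in figures if keep_figure(w, h)]
print(kept)
```

Running this cheap check first reduces the number of images that the more expensive classification and compound-figure steps have to process.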
Advantages of literature images
• Rare images are generally used for articles and
case descriptions
– Mostly extreme cases to share the knowledge
on them
– Creates critical mass for rare diseases
• Images are from many laboratories and thus
contain many image variations
– Increase generalizability of learned models
• Exponentially increasing content
Problems with filtered images
• Many images might be missed by automatic
filtering
• Ground truth is not always solid
• Images might not have clinical quality
– Grey level resolution
– No information on level/window setting
– Cropped images, arrows in images, other overlays
• Size of the images is often small for publications
• Scale of images is not known (can be detected)
Otalora et al. (2018) MICCAI 2018
An example of Twitter images
• Images and information posted by pathologists on
Twitter
• Create dataset of histopathology images
• Train machine learning algorithms
– identify stains (H&E, IHC ...)
– discriminate between different tissues
– predict malignant tumors
• Limitations:
– good results (AUROC 0.9) only for simple tasks: H&E
vs rest
Schaumberg et al. (2018), BioRxiv
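For readers unfamiliar with the AUROC figure quoted above, it can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition. The labels and scores are made-up toy values (1 standing for H&E, 0 for the rest), not results from the cited study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data only: 1 = H&E, 0 = any other stain.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1]
value = auroc(labels, scores)
print(round(value, 3))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; rank-based formulations give the same value more efficiently on large sets.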
Next steps
• Quickly increasing content offers many possibilities
– Automatic pipelines need to contain update
mechanisms based on latest imaging equipment
– Community efforts for data curation
• Distribute the class labels with confidence scores
via PMC
• Evaluate impact on machine learning tasks of
adding such diverse sources
Next steps
• We have been working on it!
– Mined out 32,486 light microscopy human rare
cancer images (Dhrangadhariya et al. (2020), SPIE 2020)
– Automatic generalizable filtering pipeline
(in preparation: Jimenez et al. (2020), Journal of the
American Medical Informatics Association)
– Benefits in deep learning clinical tasks … to come
Conclusions
• Images from public resources are complementary to
clinical images for machine learning
– Rare cases, much diversity
– Very large amount of data
• How can we obtain high-quality annotations with
limited effort (for example via active learning)?
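Active learning, mentioned above as one option, is often implemented with uncertainty sampling: the images on which the current model is least certain are sent to annotators first. The sketch below shows one common variant based on prediction entropy. It is an assumption about how this could be done, not a method described in the presentation, and the image identifiers and probabilities are invented.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the `budget` image ids with the most uncertain predictions.

    predictions maps an image id to the model's class probabilities.
    """
    ranked = sorted(predictions,
                    key=lambda image_id: entropy(predictions[image_id]),
                    reverse=True)
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled literature images.
predictions = {
    "img_1": [0.98, 0.01, 0.01],
    "img_2": [0.40, 0.35, 0.25],
    "img_3": [0.70, 0.20, 0.10],
    "img_4": [0.34, 0.33, 0.33],
}
to_label = select_for_annotation(predictions, budget=2)
print(to_label)
```

After the selected images are annotated, the model is retrained and the selection is repeated, so that annotation effort concentrates on the cases the model finds hardest.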
Contact
• More information can be found at
– http://medgift.hevs.ch/
– http://publications.hevs.ch
• Contact:
– vincent.andrearczyk@hevs.ch
– henning.mueller@hevs.ch

Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
how can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoinhow can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoin
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 

Studying Public Medical Images from Open Access Literature and Social Networks for Model Training and Knowledge Extraction

  • 1. Studying Public Medical Images from the Open Access Literature and Social Networks for Model Training and Knowledge Extraction Vincent Andrearczyk HES-SO, Switzerland MMM 2020, 08.01.2020 Henning Müller, Vincent Andrearczyk, Oscar Jimenez, Anjani Dhrangadhariya, Roger Schaer, and Manfredo Atzori
  • 2. Motivation • Deep learning has been a driving force for improving many applications of image analysis • Complex networks require large amounts of training data - Data diversity is important for generalizability • Most medical data sets have strong class imbalances (rare diseases) - Rare diseases require data from multiple centers making the organization complex • Many resources that include images have become available in the past few years - PubMed Central, TCIA, social networks, etc.
  • 3. Objectives of this article • Summarize existing approaches that harvest public data – Focusing on PubMed Central and social networks • Highlight advantages and difficulties in exploiting the data – (+) Very diverse data – (+) Rare cases are oversampled – (-) Much pre-treatment and filtering is required • Develop next steps required to fully use the data
  • 4. PubMed Central • Repository with the biomedical open access literature, including images as files, etc. – 3-4 images per article,
  • 5. PubMed Central • Repository with the biomedical open access literature, including images as files, etc. – 3-4 images per article, – increasing # articles
  • 6. Methodology for finding articles • Analysis of tasks of ImageCLEF and work done on these tasks using data from ImageCLEF – Over the past 12 years – Data filtering steps were taken from this work • Use of Google Scholar to add references – Terms “medical image classification”, “publicly accessible resources”, “medical literature”, “machine learning” were combined • Dynamically growing data sets were favored • Journal papers were preferred over conference publications
  • 7. Image retrieval • Allows searching for images with text – Or semantic terms such as UMLS or MeSH • Content-based image retrieval Demner-Fushman, et al. (2012), Journal of Computing Science and Engineering
  • 8. Structuring the visual content • Define types of images to make the literature images classifiable – Extremely large variety in most categories – Many sub-categories are possible – Categories with clinical relevance are most important – Allows removing noise – Compound figures are separately treated [ImageCLEF 2013]
  • 9. Challenges in the data • Look-alikes – Much strange content that needs to be removed
  • 10. Challenges in the data • Look-alikes – Much strange content that needs to be removed • Compound figures cannot easily be classified, as they may contain aspects of several classes – Cutting them into subfigures makes content accessible
  • 11. Metadata available for PMC • Text of the figure caption – Relatively specific but often short – Hard for compound figures that contain many parts • Full text of the article – Non-specific for individual figures – Location of the figure is available • Article title and author-generated keywords • Global MeSH terms (manually attached) – Cover species and organs • Not all is available for all articles (incomplete)
  • 12. Tasks to make figures accessible • Remove very small images and images with strange aspect ratios • Classify figures into figure types – Using image data and also text – Remove non-relevant images, e.g. flowcharts • Detect and cut compound figures into their parts – Classify these into figure types again • Filter human and animal tissue • Filter specific organs of interest • Find diseases or grading/staging – Ground truth classes for machine learning
  • 13. Advantages of literature images • Rare images are generally used for articles and case descriptions – Mostly extreme cases to share the knowledge on them – Creates critical mass for rare diseases • Images are from many laboratories and thus contain many image variations – Increase generalizability of learned models • Exponentially increasing content
  • 14. Problems with filtered images • Many images might be missed by automatic filtering • Ground truth is not always solid • Images might not have clinical quality – Grey level resolution – No information on level/window setting – Cropped images, arrows in images, other overlays • Size of the images is often small for publications • Scale of images is not known (can be detected) Otalora et al. (2018) MICCAI 2018
  • 15. An example of Twitter images • Images and information posted by pathologists on Twitter • Create dataset of histopathology images • Train machine learning algorithms – identify stains (H&E, IHC ...) – discriminate between different tissues – predict malignant tumors • Limitations: – good results (AUROC 0.9) only for simple tasks: H&E vs rest Schaumberg et al. (2018), bioRxiv
  • 16. Next steps • Quickly increasing content offers many possibilities – Automatic pipelines need to contain update mechanisms based on latest imaging equipment – Community efforts for data curation • Distribute the class labels with confidence scores via PMC • Evaluate impact on machine learning tasks of adding such diverse sources
  • 17. Next steps • We have been working on it! – Mined 32,486 light microscopy human rare cancer images Dhrangadhariya et al. (2020) SPIE 2020 – Automatic generalizable filtering pipeline In preparation: Jimenez et al. (2020) Journal of the American Medical Informatics Association – Benefits in deep learning clinical tasks … to come
  • 18. Conclusions • Images from public resources are complementary to clinical images for machine learning – Rare cases, much diversity – Very large amount of data • How can we obtain high quality annotations with limited effort (for example via active learning)?
  • 19. Contact • More information can be found at – http://medgift.hevs.ch/ – http://publications.hevs.ch • Contact: – vincent.andrearczyk@hevs.ch – henning.mueller@hevs.ch
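
The first task listed on slide 12 (removing very small images and strange aspect ratios) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the `FigureInfo` structure, the function names, and both thresholds (`MIN_SIDE_PX`, `MAX_ASPECT_RATIO`) are assumptions chosen for the example, and the slides do not state which values were actually used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FigureInfo:
    """Hypothetical minimal description of a figure file from an article."""
    name: str
    width: int   # pixels
    height: int  # pixels


# Assumed thresholds for illustration only; not values from the talk.
MIN_SIDE_PX = 100
MAX_ASPECT_RATIO = 4.0


def keep_figure(fig, min_side=MIN_SIDE_PX, max_ratio=MAX_ASPECT_RATIO):
    """Return True if the figure passes the size and aspect-ratio checks."""
    if fig.width <= 0 or fig.height <= 0:
        return False  # invalid or unreadable dimensions
    short_side = min(fig.width, fig.height)
    long_side = max(fig.width, fig.height)
    if short_side < min_side:
        return False  # too small to carry useful image content
    # Very elongated images are often banners, rulers, or colour bars.
    return long_side / short_side <= max_ratio


def filter_figures(figures):
    """Keep only the figures that pass keep_figure, preserving order."""
    return [fig for fig in figures if keep_figure(fig)]


if __name__ == "__main__":
    candidates = [
        FigureInfo("histology.jpg", 800, 600),
        FigureInfo("icon.png", 32, 32),
        FigureInfo("banner.png", 2000, 150),
    ]
    for fig in filter_figures(candidates):
        print(fig.name)
```

A cheap dimension check like this would typically run before the more expensive steps on slide 12 (figure-type classification, compound figure separation), so that obviously unusable files never reach the classifiers.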