Studying Public Medical Images from the Open Access
Literature and Social Networks for Model Training and
Knowledge Extraction
Vincent Andrearczyk
HES-SO, Switzerland
MMM 2020, 08.01.2020
Henning Müller, Vincent Andrearczyk, Oscar Jimenez, Anjani Dhrangadhariya,
Roger Schaer, and Manfredo Atzori
Motivation
• Deep learning has been a driving force for
improving many applications of image analysis
• Complex networks require large amounts of
training data
- Data diversity is important for generalizability
• Most medical data sets have strong class
imbalances (rare diseases)
- Rare diseases require data from multiple centers
making the organization complex
• Many resources that include images have become
available in the past few years
- PubMed Central, TCIA, social networks, etc.
Objectives of this article
• Summarize existing approaches that harvest
public data
– Focusing on PubMed Central and social networks
• Highlight advantages and difficulties in exploiting
the data
– (+) Very diverse data
– (+) Rare cases are oversampled
– (-) Much preprocessing and filtering is required
• Develop next steps required to fully use the data
PubMed Central
• Repository with the biomedical open access
literature, including images as files, etc.
– 3-4 images per article,
– Increasing number of articles
Methodology for finding articles
• Analysis of tasks of ImageCLEF and work done
on these tasks using data from ImageCLEF
– Over the past 12 years
– Data filtering steps were derived from this work
• Use of Google Scholar to add references
– Terms “medical image classification”, “publicly
accessible resources”, “medical literature”,
“machine learning” were combined
• Dynamically growing data sets were favored
• Journal papers were preferred over
conference publications
Image retrieval
• Allows searching for images with text
– Or semantic terms such as UMLS or MeSH
• Content-based image retrieval
Demner-Fushman, et al. (2012), Journal of Computing Science and Engineering
Structuring the visual content
• Define types of images to make the literature
images classifiable
– Extremely large variety in most categories
– Many sub-categories are possible
– Categories with clinical relevance
are most important
– Allows removing noise
– Compound figures
are separately treated
[ImageCLEF 2013]
Challenges in the data
• Look-alikes
– Much strange content that needs to be removed
• Compound figures cannot easily be classified,
as they may contain aspects of several classes
– Cutting them into subfigures makes content
accessible
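A minimal sketch of how a compound figure might be cut into subfigures, assuming the panels are separated by vertical bands of near-white pixels. The function name, the `white` threshold and the `min_gap` width are illustrative assumptions and are not taken from the slides; published separation methods are considerably more elaborate.

```python
def split_on_blank_columns(image, white=250, min_gap=2):
    """Return (start, end) column ranges of candidate subfigures.

    `image` is a list of rows of grey values (0-255); `end` is exclusive.
    A run of at least `min_gap` all-white columns counts as a separator.
    """
    if not image:
        return []
    n_cols = len(image[0])
    blank = [all(row[c] >= white for row in image) for c in range(n_cols)]

    # Collect runs of blank columns that are wide enough to separate panels.
    separators = []
    c = 0
    while c < n_cols:
        if blank[c]:
            start = c
            while c < n_cols and blank[c]:
                c += 1
            if c - start >= min_gap:
                separators.append((start, c))
        else:
            c += 1

    # Panels are the column ranges between consecutive separators.
    panels = []
    prev = 0
    for start, end in separators:
        if start > prev:
            panels.append((prev, start))
        prev = end
    if prev < n_cols:
        panels.append((prev, n_cols))
    return panels


# Toy figure: 3 dark columns, a 2-column white gap, then 4 dark columns.
toy_row = [0, 0, 0, 255, 255, 10, 10, 10, 10]
toy_image = [toy_row[:] for _ in range(3)]
panels = split_on_blank_columns(toy_image)
```

Each returned range can then be cropped and classified into a figure type on its own. Horizontal gaps could be handled the same way by applying the function to the transposed image.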
Metadata available for PMC
• Text of the figure caption
– Relatively specific but often short
– Hard for compound figures that contain many parts
• Full text of the article
– Non-specific for individual figures
– Location of the figure is available
• Article title and author-generated key words
• Global MeSH terms (manually attached)
– Cover species and organs
• Not all is available for all articles (incomplete)
Tasks to make figures accessible
• Remove very small images and images with
unusual aspect ratios
• Classify figures into figure types
– Using image data and also text
– Remove non-relevant images, e.g. flowcharts
• Detect and cut compound figures into their parts
– Classify these into figure types again
• Filter human and animal tissue
• Filter specific organs of interest
• Find diseases or grading/staging
– Ground truth classes for machine learning
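The first task in the list, removing very small images and unusual aspect ratios, can be sketched as a simple rule on image dimensions. The `Figure` record and the two thresholds below are hypothetical values chosen for illustration; the slides do not state the limits actually used.

```python
from dataclasses import dataclass


@dataclass
class Figure:
    name: str
    width: int   # pixels
    height: int  # pixels


# Hypothetical thresholds, for illustration only.
MIN_SIDE = 100     # drop figures whose shorter side is below this
MAX_ASPECT = 4.0   # drop figures more elongated than 4:1


def keep_figure(fig, min_side=MIN_SIDE, max_aspect=MAX_ASPECT):
    """Return True if the figure passes the size and aspect-ratio checks."""
    short, long_side = sorted((fig.width, fig.height))
    if short < min_side:
        return False
    return long_side / short <= max_aspect


figures = [
    Figure("ct_slice.jpg", 512, 512),
    Figure("icon.png", 32, 32),        # too small
    Figure("banner.jpg", 1200, 150),   # 8:1 aspect ratio
]
kept = [f.name for f in figures if keep_figure(f)]
```

Such a rule is cheap to run before any classifier, so the later steps (figure type, compound figure separation, organ and disease filtering) only see plausible candidates.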
Advantages of literature images
• Images of rare cases are commonly used in articles and
case descriptions
– Mostly extreme cases to share the knowledge
on them
– Creates critical mass for rare diseases
• Images are from many laboratories and thus
contain many image variations
– Increase generalizability of learned models
• Exponentially increasing content
Problems with filtered images
• Many images might be missed by automatic
filtering
• Ground truth is not always solid
• Images might not have clinical quality
– Grey level resolution
– No information on level/window setting
– Cropped images, arrows in images, other overlays
• Size of the images is often small for publications
• Scale of images is not known (can be detected)
Otalora et al. (2018) MICCAI 2018
An example of Twitter images
• Images and information posted by pathologists on
Twitter
• Create dataset of histopathology images
• Train machine learning algorithms
– identify stains (H&E, IHC ...)
– discriminate between different tissues
– predict malignant tumors
• Limitations:
– good results (AUROC 0.9) only for simple tasks: H&E
vs rest
Schaumberg et al. (2018), bioRxiv
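For readers unfamiliar with the AUROC figure quoted above, the sketch below computes it for a binary task such as H&E vs rest. It uses the pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The labels and scores are made-up toy values, not data from the cited study.

```python
def auroc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 1 = H&E, 0 = any other stain (invented values).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
```

The double loop is quadratic in the number of examples, which is fine for a demonstration; a rank-based implementation would be preferable for large test sets.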
Next steps
• Quickly increasing content offers many possibilities
– Automatic pipelines need to contain update
mechanisms based on latest imaging equipment
– Community efforts for data curation
• Distribute the class labels with confidence scores
via PMC
• Evaluate impact on machine learning tasks of
adding such diverse sources
Next steps
• We have been working on it!
– Mined 32,486 light microscopy images of rare human cancers
(Dhrangadhariya et al. (2020), SPIE 2020)
– Automatic generalizable filtering pipeline
(in preparation: Jimenez et al. (2020), Journal of the American Medical Informatics Association)
– Benefits in deep learning clinical tasks … to come
Conclusions
• Images from public resources are complementary to
clinical images for machine learning
– Rare cases, much diversity
– Very large amount of data
• How can we obtain high quality annotations with
limited effort (for example via active learning)?
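One common form of active learning is uncertainty sampling: the images the current model is least sure about are sent to a human annotator first. The sketch below illustrates the idea for a binary classifier; the function names, the image identifiers and the probabilities are all invented for illustration, and the slides do not specify which active learning strategy is intended.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a predicted probability p for the positive class."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_for_annotation(predictions, k):
    """Return the k image ids with the most uncertain predictions."""
    ranked = sorted(
        predictions,
        key=lambda name: binary_entropy(predictions[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for four unlabeled literature images.
predictions = {"fig_a": 0.98, "fig_b": 0.52, "fig_c": 0.10, "fig_d": 0.65}
to_label = select_for_annotation(predictions, 2)
```

After the selected images are annotated, the model is retrained and the selection is repeated, so annotation effort goes where it is expected to change the model most.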
Contact
• More information can be found at
– http://medgift.hevs.ch/
– http://publications.hevs.ch
• Contact:
– vincent.andrearczyk@hevs.ch
– henning.mueller@hevs.ch

Valeurs et management agileValeurs et management agile
Valeurs et management agile
 

Studying Public Medical Images from Open Access Literature and Social Networks for Model Training and Knowledge Extraction

  • 1. Studying Public Medical Images from the Open Access Literature and Social Networks for Model Training and Knowledge Extraction Vincent Andrearczyk HES-SO, Switzerland MMM 2020, 08.01.2020 Henning Müller, Vincent Andrearczyk, Oscar Jimenez, Anjani Dhrangadhariya, Roger Schaer, and Manfredo Atzori
  • 2. Motivation • Deep learning has been a driving force for improving many applications of image analysis • Complex networks require large amounts of training data - Data diversity is important for generalizability • Most medical data sets have strong class imbalances (rare diseases) - Rare diseases require data from multiple centers making the organization complex • Many resources that include images have become available in the past few years - PubMed Central, TCIA, social networks, etc.
  • 3. Objectives of this article • Summarize existing approaches that harvest public data – Focusing on PubMed Central and social networks • Highlight advantages and difficulties in exploiting the data – (+) Very diverse data – (+) Rare cases are oversampled – (-) Much pre-treatment and filtering is required • Develop next steps required to fully use the data
  • 5. PubMed Central • Repository with the biomedical open access literature, including images as files, etc. – 3-4 images per article, – increasing # articles
  • 6. Methodology for finding articles • Analysis of the ImageCLEF tasks and of work done on these tasks using ImageCLEF data – Over the past 12 years – Data filtering steps were taken from this work • Use of Google Scholar to add references – The terms “medical image classification”, “publicly accessible resources”, “medical literature”, and “machine learning” were combined • Dynamically growing data sets were favored • Journal papers were preferred over conference publications
  • 7. Image retrieval • Allows searching for images with text – Or with semantic terms such as UMLS or MeSH • Content-based image retrieval Demner-Fushman et al. (2012), Journal of Computing Science and Engineering
  • 8. Structuring the visual content • Define types of images to make the literature images classifiable – Extremely large variety in most categories – Many sub-categories are possible – Categories with clinical relevance are most important – Allows removing noise – Compound figures are separately treated [ImageCLEF 2013]
  • 10. Challenges in the data • Look-alikes – Much strange content that needs to be removed • Compound figures cannot easily be classified, as they may contain aspects of several classes – Cutting them into subfigures makes the content accessible
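Cutting a compound figure into subfigures can be illustrated with a very simple heuristic. The sketch below is not the method used by the authors or in ImageCLEF; it assumes panels sit side by side and are separated by bands of pure background, and the function name `split_on_blank_columns`, the `background` value and the `min_gap` threshold are all hypothetical.

```python
def split_on_blank_columns(image, background=255, min_gap=2):
    """Split a grayscale image (a list of pixel rows) into panels.

    A panel ends where at least `min_gap` consecutive columns contain only
    the background value. Returns (start, end) column ranges, end exclusive.
    """
    if not image:
        return []
    width = len(image[0])
    # A column is blank when every row holds the background value there.
    blank = [all(row[x] == background for row in image) for x in range(width)]
    panels = []
    start = None  # first column of the panel currently being scanned
    gap = 0       # length of the current run of blank columns
    for x in range(width):
        if not blank[x]:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The panel ended just before the blank run began.
                panels.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the last panel, dropping any trailing blank columns.
        panels.append((start, width - gap))
    return panels


# Toy figure: two dark panels separated by a three-column white band.
row = [0, 0, 0, 255, 255, 255, 0, 0, 0, 255]
toy_figure = [row[:] for _ in range(4)]
panel_ranges = split_on_blank_columns(toy_figure)
print(panel_ranges)
```

Real compound figures also have horizontal separators, non-white backgrounds and panels without any gap, so a practical separator needs more than this one-dimensional scan.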
  • 11. Metadata available for PMC • Text of the figure caption – Relatively specific but often short – Hard to use for compound figures that contain many parts • Full text of the article – Not specific to individual figures – Location of the figure is available • Article title and author-generated keywords • Global MeSH terms (manually attached) – Cover species and organs • Not all metadata is available for every article (incomplete)
  • 12. Tasks to make figures accessible • Remove very small images and images with unusual aspect ratios • Classify figures into figure types – Using image data and also text – Remove non-relevant images, e.g. flowcharts • Detect and cut compound figures into their parts – Classify these into figure types again • Filter human and animal tissue • Filter specific organs of interest • Find diseases or grading/staging – Ground truth classes for machine learning
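The first task in the list above, discarding very small images and images with unusual aspect ratios, could be sketched as follows. The thresholds `MIN_SIDE` and `MAX_ASPECT`, the `Figure` record and the file names are assumptions made for illustration; the slides do not state which values were used.

```python
from dataclasses import dataclass


@dataclass
class Figure:
    name: str
    width: int   # pixels
    height: int  # pixels


# Hypothetical thresholds, not taken from the slides.
MIN_SIDE = 100    # shortest allowed side, in pixels
MAX_ASPECT = 4.0  # largest allowed ratio of long side to short side


def keep_figure(fig, min_side=MIN_SIDE, max_aspect=MAX_ASPECT):
    """Return True if the figure is large enough and not too elongated."""
    if fig.width <= 0 or fig.height <= 0:
        return False
    short, long_ = sorted((fig.width, fig.height))
    if short < min_side:
        return False
    return long_ / short <= max_aspect


figures = [
    Figure("histology.jpg", 800, 600),  # kept
    Figure("icon.png", 32, 32),         # too small
    Figure("banner.png", 2000, 150),    # too elongated
]
kept = [f.name for f in figures if keep_figure(f)]
print(kept)
```

A cheap size check like this is a natural first step because it needs only the image dimensions, so it can run before any classifier is applied.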
  • 13. Advantages of literature images • Rare images are generally used for articles and case descriptions – Mostly extreme cases to share the knowledge on them – Creates critical mass for rare diseases • Images are from many laboratories and thus contain many image variations – Increase generalizability of learned models • Exponentially increasing content
  • 14. Problems with filtered images • Many images might be missed by automatic filtering • Ground truth is not always solid • Images might not have clinical quality – Grey level resolution – No information on level/window setting – Cropped images, arrows in images, other overlays • Size of the images is often small for publications • Scale of images is not known (can be detected) Otalora et al. (2018) MICCAI 2018
  • 15. An example of Twitter images • Images and information posted by pathologists on Twitter • Create a dataset of histopathology images • Train machine learning algorithms to – identify stains (H&E, IHC ...) – discriminate between different tissues – predict malignant tumors • Limitations: – good results (AUROC 0.9) only for simple tasks: H&E vs. rest Schaumberg et al. (2018), bioRxiv
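For readers unfamiliar with the AUROC figure quoted above, the sketch below computes it from its pairwise definition: the probability that a randomly chosen positive example (here, an H&E image) receives a higher score than a randomly chosen negative one. The labels and scores are made-up toy values, not data from Schaumberg et al., and the function name `auroc` is chosen for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve from its pairwise definition.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = H&E, 0 = any other stain; scores are hypothetical.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auroc = auroc(toy_labels, toy_scores)
print(round(toy_auroc, 3))
```

The quadratic loop is fine for a small illustration; on large datasets a rank-based implementation from an established library would be the usual choice.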
  • 16. Next steps • Quickly increasing content offers many possibilities – Automatic pipelines need to contain update mechanisms based on latest imaging equipment – Community efforts for data curation • Distribute the class labels with confidence scores via PMC • Evaluate impact on machine learning tasks of adding such diverse sources
  • 17. Next steps • We have been working on it! – Mined out 32,486 light microscopy images of rare human cancers: Dhrangadhariya et al. (2020), SPIE 2020 – Automatic generalizable filtering pipeline, in preparation: Jimenez et al. (2020), Journal of the American Medical Informatics Association – Benefits in deep learning clinical tasks … to come
  • 18. Conclusions • Images from public resources are complementary to clinical images for machine learning – Rare cases, much diversity – Very large amount of data • How can we obtain high-quality annotations with limited effort (for example via active learning)?
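One common form of active learning is uncertainty sampling: annotators are asked to label only the images on which the current model is least sure. The sketch below assumes a binary classifier that outputs one probability per image; the function name `most_uncertain`, the image identifiers and the probabilities are hypothetical, and the slides do not say which active learning strategy the authors have in mind.

```python
def most_uncertain(probabilities, k):
    """Return the ids of the k items whose predicted probability is closest to 0.5."""
    ranked = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return ranked[:k]


# Hypothetical model outputs: probability that each image shows a given class.
predicted = {"img_a": 0.98, "img_b": 0.52, "img_c": 0.10, "img_d": 0.41}
to_annotate = most_uncertain(predicted, 2)
print(to_annotate)
```

Images with confident predictions (here `img_a` and `img_c`) are skipped, so the limited annotation effort goes to the cases expected to change the model most.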
  • 19. Contact • More information can be found at – http://medgift.hevs.ch/ – http://publications.hevs.ch • Contact: – vincent.andrearczyk@hevs.ch – henning.mueller@hevs.ch