MACHINE LEARNING FOR UNDERSTANDING
BIOMEDICAL PUBLICATIONS
Grigorios Tsoumakas,
School of Informatics,
Aristotle University of Thessaloniki
with: A. Anagnostou, A. Fachantidis, A. Lagopoulos,
M. Laliotis, N. Markantonatos, Y. Papanikolaou, I. Vlahavas
ENSEMBLE METHODS
Pedro Domingos (2012). A few useful things to know about
machine learning. Commun. ACM 55(10), 78-87.
“LEARN MANY MODELS, NOT JUST ONE”
Anthony Goldbloom. Kaggle CEO. Oct 2015.
“As long as Kaggle has been around, it has almost
always been ensembles of decision trees that have
won competitions. It used to be random forest that
was the big winner, but over the last six months a new
algorithm called XGboost has cropped up, and it’s
winning practically every competition in the
structured data category.”
MULTI-LABEL LEARNING
X1     X2   …   Xp  |  Y1   Y2   …   Yq
0.12    1   …   12  |   0    1   …    1
2.34    9   …   -5  |   1    1   …    0   } m training examples
1.22    3   …   40  |   1    0   …    1
2.18    2   …    8  |   ?    ?   …    ?   } unknown instances
1.76    7   …   23  |   ?    ?   …    ?
(p input variables X1…Xp, q binary output variables Y1…Yq)
Binary Relevance (BR)
• Learns one binary model per label
• Ignores label dependencies (see the sketch below)
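A minimal sketch of Binary Relevance (an illustration of this write-up, not code from the talk), using scikit-learn's OneVsRestClassifier, which fits one independent binary model per column of the label matrix:

```python
# Binary Relevance sketch: one independent binary classifier per label.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X_train = np.array([[0.12, 1, 12], [2.34, 9, -5], [1.22, 3, 40]])  # m x p inputs
Y_train = np.array([[0, 1, 1], [1, 1, 0], [1, 0, 1]])              # m x q binary labels
X_new = np.array([[2.18, 2, 8], [1.76, 7, 23]])                    # unknown instances

# One LinearSVC per label; label dependencies are ignored by construction.
br = OneVsRestClassifier(LinearSVC()).fit(X_train, Y_train)
print(br.predict(X_new))  # a q-dimensional 0/1 prediction per instance
```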
MULTI-LABEL LEARNING FROM BIOLOGICAL DATA
Annotation of proteins with functions
 FunCat, 6 levels, 492 labels, ~9 on avg.
 GO, 14 levels, 3,997 labels, ~35 on avg.
Drug discovery (Johnson & Johnson)
 743,336 chemical compounds
 ~13m chemical structure features (sparse)
 5,069 biomolecular targets (e.g. proteins)
OUTLINE
1. Semantic indexing of biomedical literature
2. Article screening in systematic reviews
3. Modality classification of biomedical figures
4. PICO sentence identification
5. Funding information extraction
Literatum is Atypon’s online content hosting and management platform
Atypon is home to more than one-third of the world’s English-language
professional and scholarly journals — more than any other technology
company
Atypon’s clients include Elsevier, IEEE, MIT Press, Oxford University
Press, …
Atypon was acquired by John Wiley & Sons in 2016 for $120,000,000
OUTLINE
1. Semantic indexing of biomedical literature
 Y. Papanikolaou, G. Tsoumakas, M. Laliotis, N. Markantonatos, I. Vlahavas
(2017) Large-Scale Online Semantic Indexing of Biomedical Articles via an
Ensemble of Multi-Label Classification Models, Journal of Biomedical Semantics
2. Article screening in systematic reviews
3. Modality classification of biomedical figures
4. PICO sentence identification
5. Funding information extraction
CHALLENGE
[Figure: PubMed articles indexed per year, 1950–2013; the annual count climbs past 1 million]
PubMed abstracts
 12,834,585 (20Gb)
MeSH terms
 27,773, ~13 per abstract on avg.
Online test setting
 3 phases, 5 weeks per phase
 MeSH terms for 6k to 10k abstracts
requested within 21 hours
1 million docs / year ≅ 2,740 docs / day
LABEL FREQUENCY DISTRIBUTION
4.3 million abstracts
213 labels with a single example
1,680 labels with fewer than 10 examples
4 labels with more than 1 million examples
[Figure: histogram of label frequency vs. number of labels, heavily skewed towards rare labels]
PROGRESS 2013 – 2017
[Figure: micro F-measure of the AUTH, Fudan and NLM systems per year, 2013–2017, in the ~0.54 to ~0.68 range]
PRE-PROCESSING
Pipeline over the text of title and abstract:
1. parsing; tokenization, lowercasing, n-gram extraction
2. duplicate removal: 10,876,004 → 10,699,707 articles
3. journal filtering: 10,699,707 → 3,950,721 articles
4. vocabulary: unigrams and bigrams with frequency > 5
5. tf-idf computation, unit-length normalization → final feature vectors
The last 12,000 articles are withheld for evaluation.
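A minimal sketch of steps 4–5 with scikit-learn, assuming the >5 cut-off refers to document frequency (the actual implementation may differ):

```python
# Sketch of the feature pipeline: unigrams + bigrams, frequency cut-off,
# tf-idf weighting, unit-length (L2) normalization.
from sklearn.feature_extraction.text import TfidfVectorizer

def build_features(corpus):
    """corpus: list of title+abstract strings."""
    vectorizer = TfidfVectorizer(
        lowercase=True,
        ngram_range=(1, 2),  # unigrams and bigrams
        min_df=6,            # n-grams in more than 5 documents (assumed document frequency)
        norm="l2",           # unit-length normalization of each tf-idf vector
    )
    return vectorizer, vectorizer.fit_transform(corpus)
```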
LEARNING
Biomedical document → Label Ranker → Meta Labeler → predicted label set
Example: ranking of {kinetics, prognosis, liposomes, polymers} = (4, 1, 2, 3);
the Meta Labeler predicts 2 relevant labels, so the output is {prognosis, liposomes}
Label Ranker
- Any multi-label learning algorithm that can output a ranking of the labels
- We used a linear SVM per label and considered their unthresholded outputs
Meta Labeler
- Regression or (ordinal) classification using the original features or label scores/ranks
- We used linear SVM regression based on the original features
Tang, L., Rajan, S., Narayanan, V.K.: “Large scale multi-label classification via metalabeler”, Proc. 18th Int. Conf. on World Wide Web (WWW '09)
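A hedged sketch of this two-stage scheme (names and details are illustrative, not the production code):

```python
# MetaLabeler sketch (Tang et al., WWW '09): rank labels with one linear SVM per
# label, predict how many top-ranked labels to keep with SVM regression.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC, LinearSVR

def fit_metalabeler(X, Y):
    ranker = OneVsRestClassifier(LinearSVC()).fit(X, Y)   # one binary SVM per label
    k_model = LinearSVR().fit(X, Y.sum(axis=1))           # regress the label-set size on the features
    return ranker, k_model

def predict_metalabeler(ranker, k_model, X):
    scores = ranker.decision_function(X)                  # unthresholded per-label SVM outputs
    ks = np.clip(np.rint(k_model.predict(X)), 1, scores.shape[1]).astype(int)
    Y_pred = np.zeros_like(scores, dtype=int)
    for i, k in enumerate(ks):
        Y_pred[i, np.argsort(scores[i])[::-1][:k]] = 1    # keep the k top-ranked labels
    return Y_pred
```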
TIME AND SPACE
Hardware
 4 10-core processors at 2.26 GHz, 1 TB RAM and 2.4 TB storage (6 x 600 GB SAS 10k disks in RAID 5)
Parallel learning/use of binary SVMs
 With 40 threads, training & saving takes 36 h
 With 20 threads, loading & prediction takes 45 min
Serialization
 Storing the models in 2013 required 406 GB
 10x compression due to sparsity (L1-regularized models)
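A sketch of how the per-label training can be parallelized (joblib here is an assumption; the actual system may use a different mechanism):

```python
# Train the q binary SVMs in parallel; X, Y come from the pre-processing step,
# Y as an m x q label indicator matrix. 40 workers, as on the slide.
from joblib import Parallel, delayed
from sklearn.svm import LinearSVC

def train_label(X, y):
    return LinearSVC().fit(X, y)

def train_all(X, Y, n_jobs=40):
    return Parallel(n_jobs=n_jobs)(
        delayed(train_label)(X, Y[:, j]) for j in range(Y.shape[1])
    )
```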
ENSEMBLE APPROACHES: CLASSIFIER SELECTION
For each label, select the model that most improves that label's F-measure
 Jimeno-Yepes, A., Mork, J.G., Demner-Fushman, D., Aronson, A.R.: A one-size-fits-
all indexing method does not exist: Automatic selection based on meta-learning.
JCSE 6(2), 151-160 (2012)
 Not suitable for global, non-decomposable evaluation measures such as micro-F
Iteratively select the model that improves micro-F
 Fan, R.E., Lin, C.J.: A study on threshold selection for multi-label classification.
Technical report, National Taiwan University (2007)
 Can we trust the evaluation based on only a few positive samples?
MULE: MULTI-LABEL ENSEMBLE
1. Determine the globally best model ℎ∗ on a validation set
 The globally best model is determined on the positive samples of all labels
2. Determine for each label which model(s) would improve the global evaluation measure compared to ℎ∗
3. Compare the predictions of each of these models against the predictions of ℎ∗ using a McNemar test
4. If the null hypothesis (NH) is rejected for one or more models, select the one with the lowest p-value; otherwise keep ℎ∗
 Robust to the uncertainty caused by label rarity (a sketch of steps 3–4 follows)
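A hedged sketch of steps 3–4 for a single label, assuming the candidates have already passed step 2 (names are illustrative):

```python
# Per-label MULE selection: switch away from the globally best model h* only
# when a McNemar test says the candidate's predictions differ significantly.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

def select_for_label(y_true, pred_star, candidate_preds, alpha=0.05):
    """y_true, pred_star: 0/1 arrays for one label; candidate_preds: name -> 0/1 array."""
    best, best_p = "h*", 1.0
    for name, pred in candidate_preds.items():
        # 2x2 agreement table between h* and the candidate
        table = np.array([
            [np.sum((pred_star == y_true) & (pred == y_true)),
             np.sum((pred_star == y_true) & (pred != y_true))],
            [np.sum((pred_star != y_true) & (pred == y_true)),
             np.sum((pred_star != y_true) & (pred != y_true))],
        ])
        p = mcnemar(table, exact=True).pvalue   # NH: both models err equally often
        if p < alpha and p < best_p:
            best, best_p = name, p              # significant difference; lowest p-value wins
    return best                                 # default: keep the globally best model h*
```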
EMPIRICAL RESULTS IN BIOASQ
Model            Micro-F   Macro-F
Vanilla SVMs     0.5568    0.4789
Weighted SVMs    0.5665    0.5102
MetaLabeler      0.5855    0.5488
Labeled LDA      0.3698    0.3010

Ensemble          Micro-F         Macro-F
Improve F         0.5584 (all)    0.5339 (MetaLabeler, Weighted SVM)
Improve Micro-F   0.5867 (all)    -
MULE              0.5892 (all)    0.5492 (MetaLabeler, Labeled LDA)
2017 WORK
Improved version of Labeled LDA, better in both speed and accuracy
 Y. Papanikolaou, G. Tsoumakas, Subset Labeled LDA for Large-Scale
Multi-Label Classification, arXiv:1709.05480
Employing word2vec features
More ensembles
 Stacking-based, frequency-based
Deep learning models
 Deep MLP, CNN
OUTLINE
1. Semantic indexing of biomedical literature
2. Article screening in systematic reviews
 A. Anagnostou, A. Lagopoulos, G. Tsoumakas, I. Vlahavas (2017) Combining
Inter-Review Learning-to-Rank and Intra-Review Incremental Training for Title
and Abstract Screening in Systematic Reviews, eHealth Lab of the 8th
Conference and Labs of the Evaluation Forum (CLEF)
3. Modality classification of biomedical figures
4. PICO sentence identification
5. Funding information extraction
DIAGNOSTIC TEST ACCURACY (DTA) REVIEWS
Title
 Thromboelastography (TEG) and rotational thromboelastometry (ROTEM) for
trauma-induced coagulopathy in adult trauma patients with bleeding
Ovid MEDLINE query
1. (Thrombelastogra$ or Thromboelastogra$ or (thromb$ adj2 elastogra$) or TEG or haemoscope or haemonetics).mp
2. Thrombelastography/
3. (thromboelasto$ or thrombelasto$ or (thromb$ adj2 elastom$) or (rotational adj2 thrombelast) or ROTEM or "tem international").mp.
4. 1 or 2 or 3
5. exp animals/ not humans.sh.
6. 4 not 5
7. limit 6 to yr="1970 -Current"
THE DATA
Training data
 20 topics, retrieved articles and
relevance after abstract/content
screening
Test data
 30 topics and retrieved articles
[Figure: positive (Pos) and negative (Neg) article counts per training topic (1–20), log scale]
OUR APPROACH
Hybrid classification mechanism
Inter-topic model, learning title/abstract
relevance across different topics
Intra-topic model, learning title/abstract
relevance within a specific topic
INTER-TOPIC MODEL
Learning-to-rank binary classifier
Features assessing the similarity of the document (title, abstract)
with the topic (title, Ovid query), as sketched below:
 Common terms
 Levenshtein distance
 Cosine similarity
 BM25
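A hedged sketch of such features (BM25 omitted for brevity; function names are illustrative):

```python
# Document-topic similarity features for the learning-to-rank inter-topic model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def common_terms(doc, topic):
    return len(set(doc.lower().split()) & set(topic.lower().split()))

def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cosine_sim(doc, topic):
    tfidf = TfidfVectorizer().fit_transform([doc, topic])
    return cosine_similarity(tfidf[0], tfidf[1])[0, 0]

def features(doc, topic):
    return [common_terms(doc, topic), levenshtein(doc, topic), cosine_sim(doc, topic)]
```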
INTRA-TOPIC MODEL: INITIAL TRAINING
[Diagram: an initial intra-topic classifier is trained on TF-IDF representations of the first manually screened articles]
INTRA-TOPIC MODEL: ITERATIVE TRAINING
Parameters
 Initial size 𝑘 {5, 10}
 1st batch size {1}
 1st threshold {200, 300}
 2nd batch size {50, 100}
 2nd threshold {1000, 2000}
[Diagram: the intra-topic model is retrained on TF-IDF representations after each screened batch; a sketch of the loop follows]
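A hedged sketch of the incremental loop under the parameters above (the oracle stands for manual screening; details are assumptions, not the submitted system):

```python
# Intra-topic incremental screening: seed with k labeled articles, then alternate
# retraining and screening, with small batches early and larger batches later.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

def screen(docs, oracle, k=10, b1=1, t1=200, b2=100, t2=2000):
    """docs: list of texts; oracle(i) -> 0/1 relevance from manual screening.
    Assumes the first k screened articles contain both classes."""
    X = TfidfVectorizer().fit_transform(docs)
    labeled = list(range(k))                 # seed with the first k screened articles
    y = [oracle(i) for i in labeled]
    while len(labeled) < min(t2, len(docs)):
        clf = SVC(kernel="linear", probability=True).fit(X[labeled], y)
        rest = [i for i in range(len(docs)) if i not in labeled]
        order = np.argsort(-clf.predict_proba(X[rest])[:, 1])  # most relevant first
        batch = b1 if len(labeled) < t1 else b2                # batch size switches at t1
        for j in order[:batch]:
            labeled.append(rest[j])
            y.append(oracle(rest[j]))
    return labeled
```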
RESULTS
Inter-Topic: eXtreme Gradient Boosting (XGBoost)
Intra-topic: Support Vector Machine (SVM)
k    1st batch  1st thr.  2nd batch  2nd thr.  AP    Last  R@10  R@20  AUR
5    1          200       100        2000      0.30  2143  0.66  0.87  0.93
10   1          300       100        2000      0.29  2124  0.66  0.87  0.92
10   1          200       100        1000      0.29  2183  0.66  0.87  0.92
10   1          200       50         2000      0.29  2119  0.66  0.87  0.92
RESULTS
2nd among 14 teams, behind the University of Waterloo
FUTURE WORK
More features, semantic representations
 word2vec
 GloVe
 LDA
Under-sampling
CLEF eHealth 2018 will feature the same task again
OUTLINE
1. Semantic indexing of biomedical literature
2. Article screening in systematic reviews
3. Modality classification of biomedical figures
 A. Lagopoulos, A. Fachantidis, G. Tsoumakas (2017), Multi-Label Modality
Classification for Figures in Biomedical Literature, 30th IEEE International
Symposium on Computer-Based Medical Systems (CBMS)
4. PICO sentence identification
5. Funding information extraction
PUBMED CENTRAL (PMC)
More than 4 million figures available
A great source of information for biomedical research, education and clinical decision making
Lack of associated metadata impedes access to this information
30 different modalities
as proposed by ImageCLEF
SIMPLE VS COMPOUND
Simple (60%) Compound (40%)
THE STANDARD APPROACH
Compound Figure Detection routes each figure:
 simple figure → Multi-class Model
 compound figure → Figure Separation Algorithm → subfigures → Multi-class Model
Figure separation is not perfect (~85%)
Figure isolation ⇒ information loss
OUR APPROACH: MULTI-LABEL CLASSIFICATION
No use of a figure separation algorithm
Three different multi-label learning approaches:
• Simple
• Standard
• Extended
SIMPLE MULTI-LABEL APPROACH
Training: a single multi-label model is trained on all figures, compound and simple
Prediction: the multi-label model labels every figure directly
STANDARD MULTI-LABEL APPROACH
Training: Compound Figure Detection splits the training set; simple figures train a Multi-class Model, compound figures train a Multi-label Model
Prediction: Compound Figure Detection routes each figure to the matching model (multi-class for simple, multi-label for compound), as sketched below
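A hedged sketch of the prediction-time routing (models are assumed to be pre-trained, scikit-learn-style estimators):

```python
# Routing sketch for the standard multi-label approach at prediction time.
def classify_figure(x, compound_detector, multiclass_model, multilabel_model):
    """x: feature vector of one figure; models follow scikit-learn predict() conventions."""
    if compound_detector.predict([x])[0]:        # detector says: compound figure
        return multilabel_model.predict([x])[0]  # 0/1 vector over the 30 modalities
    return multiclass_model.predict([x])[0]      # single modality label
```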
EXTENDED MULTI-LABEL APPROACH
[Diagram: training and prediction pipelines of the extended approach; the components are the same as in the standard approach (Compound Figure Detection, Multi-class Model, Multi-label Model), with the multi-label model's training extended]
MODEL TRAINING
Feature extraction from JPEG
• BVLC reference model, Caffe [1]
• Deep learning model pre-trained on 1.2 million images
• 4,096 visual features per figure
Linear Support Vector Machines (SVMs)
• scikit-learn [2]
• One-vs-Rest transformation (multiple binary models)
[1] http://caffe.berkeleyvision.org/
[2] http://scikit-learn.org/
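A sketch of the classification stage over precomputed CNN features (the .npy file names are hypothetical; feature extraction itself is assumed to have been done with Caffe, as on the slide):

```python
# One-vs-Rest linear SVMs over 4096-d CNN features of biomedical figures.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X = np.load("figure_features.npy")  # hypothetical file: n_figures x 4096 CNN features
Y = np.load("figure_labels.npy")    # hypothetical file: n_figures x 30 modality indicators

clf = OneVsRestClassifier(LinearSVC(), n_jobs=-1).fit(X, Y)
print(clf.predict(X[:5]))           # 0/1 vector over the 30 ImageCLEF modalities
```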
IMAGECLEF 2016 DATASET
20,985 figures
1,568 compound
No simple figures with modality labels; extracted subfigures were used as simple figures
Split 40% / 60% (compound / subfigures) to follow the distribution of PMC
RESULTS
Approach F1-Macro F1-Micro F1-Samples
Standard (100% separation) 0.3569 0.7786 0.7912
Simple multi-label 0.3139 0.7581 0.7215
Standard multi-label 0.3270 0.7667 0.7726
Extended multi-label 0.3309 0.7666 0.7728
THE SYSTEM
Web app
Weekly updates from PMC
Extended multi-label approach
Easy search & filtering by modality
Built with Apache Solr & AngularJS
Available @
atypon.csd.auth.gr/medieval/
FUTURE WORK
Learning approach
 Textual representation based on caption and full text
Medieval system
 Crowdsourcing
 Active learning
 Gamification
OUTLINE
1. Semantic indexing of biomedical literature
2. Article screening in systematic reviews
3. Modality classification of biomedical figures
4. PICO sentence identification
5. Funding information extraction
PICO SENTENCE IDENTIFICATION
PICO: Population, Intervention, Comparison, Outcome
1st version, ~0.51 F-measure
 100 abstracts annotated by computer science PhD students
 Sentence representation (word2vec outperformed tf-idf), plus structural features
 MLPs and Gaussian Naive Bayes outperformed SVMs and XGBoost
2nd version
 120 more abstracts, to be annotated by medical experts
 Additional feature engineering
 Semi-supervised learning approaches
 Deep learning approaches revisited
OUTLINE
1. Semantic indexing of biomedical literature
2. Article screening in systematic reviews
3. Modality classification of biomedical figures
4. PICO sentence identification
5. Funding information extraction
NEW TASK IN BIOASQ 2017
Challenge tasks
 Full Grant extraction, as combination of Grant ID and Grant Agency
 Grant ID extraction, regardless of the corresponding Grant Agency
 Grant Agency extraction, regardless of the specific Grant ID
104 agencies, as considered in the indexing procedure of NLM
Timespan Articles Grant IDs Grant Agencies
Training set 2005 – 2013 63k 112k 128k
Dry run set 2013 – 2015 15k 26k 31k
Test set 2015 – 2017 23k - -
EVALUATION
Micro recall used as evaluation measure
 Up to 20 items per article
 Up to 4 unique Grant Agencies (without Grant ID info) per article
 Up to 2 unique Grant Agencies per Grant ID
Sample ground truth for an article
 { "pmid":"17082206", "pmcid":"1634735",
"grantList": [
{"agency":"Wellcome Trust" },
{"grantID":"BB/C51320X/1","agency":"Biotechnology and
Biological Sciences Research Council"}]},
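A minimal sketch of the micro recall computation (the data structures are assumptions of this write-up):

```python
# Micro recall over extracted grant items: the fraction of all ground-truth items
# (summed over articles) that the system returned.
def micro_recall(gold, pred):
    """gold, pred: dict pmid -> set of items, e.g. (grantID, agency) tuples."""
    hits = sum(len(gold[pmid] & pred.get(pmid, set())) for pmid in gold)
    total = sum(len(items) for items in gold.values())
    return hits / total if total else 0.0
```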
RESULTS
AUTH: a simple approach based on regular expressions (a toy version is sketched after the table)
Fudan: Regular expressions combined with machine learning
Grant ID Grant Agency Full Grant
Fudan 0.9705 0.9907 0.9526
AUTH 0.9498 0.9862 0.9412
DZG 0.9235 0.9122 0.8443
BioASQ 0.8167 0.8312 0.7174
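A toy illustration of the regex idea (this pattern is a guess at common grant-ID shapes, not the actual AUTH rules):

```python
# Grant-ID-like patterns, e.g. "BB/C51320X/1" (BBSRC) or "R01 GM123456" (NIH).
import re

GRANT_ID = re.compile(r"\b(?:[A-Z]{1,4}\d*/[A-Z0-9]+/\d+|[A-Z]\d{2}\s?[A-Z]{2}\d{5,6})\b")

text = "Supported by the Wellcome Trust and BBSRC grant BB/C51320X/1."
print(GRANT_ID.findall(text))   # ['BB/C51320X/1']
```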
WRAP UP
1. Semantic indexing of biomedical literature
2. Article screening in systematic reviews
3. Modality classification of biomedical figures
4. PICO sentence identification
5. Funding information extraction
MACHINE LEARNING FOR UNDERSTANDING
BIOMEDICAL PUBLICATIONS
Grigorios Tsoumakas,
School of Informatics,
Aristotle University of Thessaloniki
with: A. Anagnostou, A. Fachantidis, A. Lagopoulos,
M. Laliotis, N. Markantonatos, Y. Papanikolaou, I. Vlahavas