SlideShare a Scribd company logo
1 of 110
Lars Juhl Jensen
Pragmatic text mining
From literature to electronic health
records
why text mining?
data mining
guilt by association
structured data
unstructured text
biomedical literature
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
text corpus
comprehensive lexicon
synonyms
expansion rules
prefixes and suffixes
flexible matching
hyphens and spaces
“black list”
a
the formal way
benchmarking
manually annotated corpus
automatic tagging
compare
quality metrics
precision
recall
F-score
manually annotated corpus
use existing corpus
not new
make new corpus
hard work
natural language processing
part-of-speech tagging
semantic tagging
syntactic parsing
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of
[nxgene the cytochrome genes
[nxpg CYC1 and CYC7]]]
is controlled by
[nxpg HAP1]
the pragmatic way
benchmarking light
requires fewer calories
non-annotated corpus
automatic tagging
random sampling
manual inspection
precision
no recall
relative recall
compare methods
co-mentioning
within documents
within paragraphs
within sentences
weighted score
gold standard
associations good?
tagging good enough
unifying text & data
text mining
curated knowledge
experimental data
computational predictions
many sources
different identifiers
varying quality
common identifiers
gold standard
web resources
protein networks
string-db.org
chemical networks
stitch-db.org
subcellular localization
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org
disease associations
electronic health records
Jensen et al., Nature Reviews Genetics, 2012
structured data
Jensen et al., Nature Reviews Genetics, 2012
unstructured data
clinical narrative
Danish
busy doctors
typos
text mining
comprehensive dictionary
adverse drug reactions
drugs
Clozapine
Clozapine
clozapi
n
clossapi
n
klozapin
e
chlosapi
n
chlosapi
ne
chlozapi
n
chlozapi
ne
klossapi
n
closapin
e
klozapi
nklosapi
n
temporal correlations
hand-crafted rules
Eriksson et al., Drug Safety, 2014
Drug introduction Drug discontinuationAdverse event
Adverse eventNegative modifier Indication Pre-existing
condition
Adverse drug reaction Possible
adverse drug reaction
ADR of
additional drug
Eriksson et al., Drug Safety, 2014
Drug introduction Drug discontinuationAdverse eventIdentification start
Adverse eventNegative modifier Indication Pre-existing
condition
Adverse drug reaction Possible
adverse drug reaction
ADR of
additional drug
Eriksson et al., Drug Safety, 2014
Drug introduction Drug discontinuation
Adverse eventNegative modifier Indication Pre-existing
condition
Adverse drug reaction Possible
adverse drug reaction
Adverse event
ADR of
additional drug
Identification start
Eriksson et al., Drug Safety, 2014
Drug introduction Drug discontinuation
Adverse eventNegative modifier Indication Pre-existing
condition
Adverse drug reaction Possible
adverse drug reaction
Adverse event
ADR of
additional drug
Identification start
refind known ADRs
discover new ADRs
Acknowledgments
Molecular networks
Michael Kuhn
Damian Szklarczyk
Andrea
Franceschini
Milan Simonovic
Alexander Roth
Sune Frankild
Christian von
Mering
Peer Bork
Localization and disease
Sune Frankild
Alberto Santos
Janos Binder
Kalliopi Tsafou
Christian Stolte
Albert Palleja
Heiko Horn
Reinhardt Schneider
Sean O’ Donoghue
Medical text mining
Robert Eriksson
Peter Bjødstrup
Jensen
Sune Frankild
Thomas Werge
Søren Brunak

More Related Content

Viewers also liked

Viewers also liked (7)

Semantic Technologies for Healthcare
Semantic Technologies for HealthcareSemantic Technologies for Healthcare
Semantic Technologies for Healthcare
 
Nlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseasesNlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseases
 
Leveraging Text Classification Strategies for Clinical and Public Health Appl...
Leveraging Text Classification Strategies for Clinical and Public Health Appl...Leveraging Text Classification Strategies for Clinical and Public Health Appl...
Leveraging Text Classification Strategies for Clinical and Public Health Appl...
 
Text analytics and R - Open Question: is it a good match?
Text analytics and R - Open Question: is it a good match?Text analytics and R - Open Question: is it a good match?
Text analytics and R - Open Question: is it a good match?
 
Big Data Analytics Using Hadoop Cluster On Amazon EMR
Big Data Analytics Using Hadoop Cluster  On Amazon EMRBig Data Analytics Using Hadoop Cluster  On Amazon EMR
Big Data Analytics Using Hadoop Cluster On Amazon EMR
 
A short tutorial on r
A short tutorial on rA short tutorial on r
A short tutorial on r
 
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel AvivData science with spark on amazon EMR - Pop-up Loft Tel Aviv
Data science with spark on amazon EMR - Pop-up Loft Tel Aviv
 

Similar to Pragmatic text mining: From literature to electronic health records

Systems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological systemSystems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological system
Lars Juhl Jensen
 

Similar to Pragmatic text mining: From literature to electronic health records (20)

The pragmatic text miner - It's just another type of poorly standardized data
The pragmatic text miner - It's just another type of poorly standardized dataThe pragmatic text miner - It's just another type of poorly standardized data
The pragmatic text miner - It's just another type of poorly standardized data
 
The pragmatic text miner: It’s just another type of poorly standardized data
The pragmatic text miner: It’s just another type of poorly standardized dataThe pragmatic text miner: It’s just another type of poorly standardized data
The pragmatic text miner: It’s just another type of poorly standardized data
 
The pragmatic text miner: It’s just another type of poorly standardized data
The pragmatic text miner: It’s just another type of poorly standardized dataThe pragmatic text miner: It’s just another type of poorly standardized data
The pragmatic text miner: It’s just another type of poorly standardized data
 
Prediction of protein networks through data integration
Prediction of protein networks through data integrationPrediction of protein networks through data integration
Prediction of protein networks through data integration
 
Large-scale integration of data and text
Large-scale integration of data and textLarge-scale integration of data and text
Large-scale integration of data and text
 
The pragmatic text miner: It's just another type of poorly standardized data
The pragmatic text miner: It's just another type of poorly standardized dataThe pragmatic text miner: It's just another type of poorly standardized data
The pragmatic text miner: It's just another type of poorly standardized data
 
Advanced bioinformatics of proteomics datasets
Advanced bioinformaticsof proteomics datasetsAdvanced bioinformaticsof proteomics datasets
Advanced bioinformatics of proteomics datasets
 
Biomedical literature mining
Biomedical literature miningBiomedical literature mining
Biomedical literature mining
 
The STRING database and related tools
The STRING database and related toolsThe STRING database and related tools
The STRING database and related tools
 
Text mining for organism and environment names
Text mining for organism and environment namesText mining for organism and environment names
Text mining for organism and environment names
 
Network integration of heterogeneous data
Network integration of heterogeneous dataNetwork integration of heterogeneous data
Network integration of heterogeneous data
 
Functional association networks - The STRING and STITCH web resources
Functional association networks - The STRING and STITCH web resourcesFunctional association networks - The STRING and STITCH web resources
Functional association networks - The STRING and STITCH web resources
 
Utilizing literature for biological discovery
Utilizing literature for biological discoveryUtilizing literature for biological discovery
Utilizing literature for biological discovery
 
The STRING database
The STRING databaseThe STRING database
The STRING database
 
The STRING database
The STRING databaseThe STRING database
The STRING database
 
STRING - Prediction of protein networks through integration of diverse large-...
STRING - Prediction of protein networks through integration of diverse large-...STRING - Prediction of protein networks through integration of diverse large-...
STRING - Prediction of protein networks through integration of diverse large-...
 
Biological literature mining - from information retrieval to biological disco...
Biological literature mining - from information retrieval to biological disco...Biological literature mining - from information retrieval to biological disco...
Biological literature mining - from information retrieval to biological disco...
 
Systems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological systemSystems biology: Bioinformatics on complete biological system
Systems biology: Bioinformatics on complete biological system
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and text
 
Introduction to STRING
Introduction to STRINGIntroduction to STRING
Introduction to STRING
 

More from Lars Juhl Jensen

More from Lars Juhl Jensen (20)

One tagger, many uses: Illustrating the power of dictionary-based named entit...
One tagger, many uses: Illustrating the power of dictionary-based named entit...One tagger, many uses: Illustrating the power of dictionary-based named entit...
One tagger, many uses: Illustrating the power of dictionary-based named entit...
 
One tagger, many uses: Simple text-mining strategies for biomedicine
One tagger, many uses: Simple text-mining strategies for biomedicineOne tagger, many uses: Simple text-mining strategies for biomedicine
One tagger, many uses: Simple text-mining strategies for biomedicine
 
Extract 2.0: Text-mining-assisted interactive annotation
Extract 2.0: Text-mining-assisted interactive annotationExtract 2.0: Text-mining-assisted interactive annotation
Extract 2.0: Text-mining-assisted interactive annotation
 
Network visualization: A crash course on using Cytoscape
Network visualization: A crash course on using CytoscapeNetwork visualization: A crash course on using Cytoscape
Network visualization: A crash course on using Cytoscape
 
STRING & STITCH : Network integration of heterogeneous data
STRING & STITCH: Network integration of heterogeneous dataSTRING & STITCH: Network integration of heterogeneous data
STRING & STITCH : Network integration of heterogeneous data
 
Biomedical text mining: Automatic processing of unstructured text
Biomedical text mining: Automatic processing of unstructured textBiomedical text mining: Automatic processing of unstructured text
Biomedical text mining: Automatic processing of unstructured text
 
Medical network analysis: Linking diseases and genes through data and text mi...
Medical network analysis: Linking diseases and genes through data and text mi...Medical network analysis: Linking diseases and genes through data and text mi...
Medical network analysis: Linking diseases and genes through data and text mi...
 
Network Biology: A crash course on STRING and Cytoscape
Network Biology: A crash course on STRING and CytoscapeNetwork Biology: A crash course on STRING and Cytoscape
Network Biology: A crash course on STRING and Cytoscape
 
Cellular networks
Cellular networksCellular networks
Cellular networks
 
Cellular Network Biology: Large-scale integration of data and text
Cellular Network Biology: Large-scale integration of data and textCellular Network Biology: Large-scale integration of data and text
Cellular Network Biology: Large-scale integration of data and text
 
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
Statistics on big biomedical data: Methods and pitfalls when analyzing high-t...
 
STRING & related databases: Large-scale integration of heterogeneous data
STRING & related databases: Large-scale integration of heterogeneous dataSTRING & related databases: Large-scale integration of heterogeneous data
STRING & related databases: Large-scale integration of heterogeneous data
 
Tagger: Rapid dictionary-based named entity recognition
Tagger: Rapid dictionary-based named entity recognitionTagger: Rapid dictionary-based named entity recognition
Tagger: Rapid dictionary-based named entity recognition
 
Network Biology: Large-scale integration of data and text
Network Biology: Large-scale integration of data and textNetwork Biology: Large-scale integration of data and text
Network Biology: Large-scale integration of data and text
 
Medical text mining: Linking diseases, drugs, and adverse reactions
Medical text mining: Linking diseases, drugs, and adverse reactionsMedical text mining: Linking diseases, drugs, and adverse reactions
Medical text mining: Linking diseases, drugs, and adverse reactions
 
Network biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and textNetwork biology: Large-scale integration of data and text
Network biology: Large-scale integration of data and text
 
Medical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactionsMedical data and text mining: Linking diseases, drugs, and adverse reactions
Medical data and text mining: Linking diseases, drugs, and adverse reactions
 
Cellular Network Biology
Cellular Network BiologyCellular Network Biology
Cellular Network Biology
 
Biomarker bioinformatics: Network-based candidate prioritization
Biomarker bioinformatics: Network-based candidate prioritizationBiomarker bioinformatics: Network-based candidate prioritization
Biomarker bioinformatics: Network-based candidate prioritization
 
The Art of Counting: Scoring and ranking co-occurrences in literature
The Art of Counting: Scoring and ranking co-occurrences in literatureThe Art of Counting: Scoring and ranking co-occurrences in literature
The Art of Counting: Scoring and ranking co-occurrences in literature
 

Recently uploaded

CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
Cherry
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
Cherry
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
Cherry
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
Cherry
 
Lipids: types, structure and important functions.
Lipids: types, structure and important functions.Lipids: types, structure and important functions.
Lipids: types, structure and important functions.
Cherry
 
PODOCARPUS...........................pptx
PODOCARPUS...........................pptxPODOCARPUS...........................pptx
PODOCARPUS...........................pptx
Cherry
 

Recently uploaded (20)

GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
GBSN - Biochemistry (Unit 2) Basic concept of organic chemistry
 
Understanding Partial Differential Equations: Types and Solution Methods
Understanding Partial Differential Equations: Types and Solution MethodsUnderstanding Partial Differential Equations: Types and Solution Methods
Understanding Partial Differential Equations: Types and Solution Methods
 
Terpineol and it's characterization pptx
Terpineol and it's characterization pptxTerpineol and it's characterization pptx
Terpineol and it's characterization pptx
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.Phenolics: types, biosynthesis and functions.
Phenolics: types, biosynthesis and functions.
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.Porella : features, morphology, anatomy, reproduction etc.
Porella : features, morphology, anatomy, reproduction etc.
 
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY // USES OF ANTIOBIOTICS TYPES OF ANTIB...
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY  // USES OF ANTIOBIOTICS TYPES OF ANTIB...ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY  // USES OF ANTIOBIOTICS TYPES OF ANTIB...
ABHISHEK ANTIBIOTICS PPT MICROBIOLOGY // USES OF ANTIOBIOTICS TYPES OF ANTIB...
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
Daily Lesson Log in Science 9 Fourth Quarter Physics
Daily Lesson Log in Science 9 Fourth Quarter PhysicsDaily Lesson Log in Science 9 Fourth Quarter Physics
Daily Lesson Log in Science 9 Fourth Quarter Physics
 
Efficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence accelerationEfficient spin-up of Earth System Models usingsequence acceleration
Efficient spin-up of Earth System Models usingsequence acceleration
 
Lipids: types, structure and important functions.
Lipids: types, structure and important functions.Lipids: types, structure and important functions.
Lipids: types, structure and important functions.
 
GBSN - Biochemistry (Unit 3) Metabolism
GBSN - Biochemistry (Unit 3) MetabolismGBSN - Biochemistry (Unit 3) Metabolism
GBSN - Biochemistry (Unit 3) Metabolism
 
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptx
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptxCONTRIBUTION OF PANCHANAN MAHESHWARI.pptx
CONTRIBUTION OF PANCHANAN MAHESHWARI.pptx
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
PODOCARPUS...........................pptx
PODOCARPUS...........................pptxPODOCARPUS...........................pptx
PODOCARPUS...........................pptx
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 

Pragmatic text mining: From literature to electronic health records