Lars Juhl Jensen
@larsjuhljensen
One tagger, many uses
Illustrating the power of dictionary-based named entity recognition
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
dictionary
genes / proteins
diseases
expansion rules
prefixes and suffixes
curated blacklist
SDS
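The dictionary, expansion rules, and blacklist above can be sketched in a few lines of Python. This is a minimal illustration, not the C++ tagger itself: the dictionary entries, the identifiers (GENE:1 and so on), the prefix and suffix lists, and the function names are all hypothetical, and only single-token names are handled. SDS is placed on the blacklist because the slides list it there.

```python
import re

# Hypothetical toy dictionary: lowercased name -> (entity type, identifier).
DICTIONARY = {
    "cdc2": ("gene", "GENE:1"),
    "sds": ("gene", "GENE:2"),
    "diabetes": ("disease", "DISEASE:1"),
}

# Names that are in the dictionary but should never be tagged.
BLACKLIST = {"sds"}

# Hypothetical expansion rules: affixes that may be stripped before lookup.
PREFIXES = ("anti-",)
SUFFIXES = ("s",)


def candidates(token):
    """Yield the token itself, then variants with one affix removed."""
    yield token
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) > len(prefix):
            yield token[len(prefix):]
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix):
            yield token[:-len(suffix)]


def tag(text):
    """Return (start, end, entity type, identifier) for each dictionary hit."""
    matches = []
    for m in re.finditer(r"[A-Za-z0-9][A-Za-z0-9\-]*", text):
        token = m.group().lower()
        if token in BLACKLIST:
            continue
        for candidate in candidates(token):
            if candidate in DICTIONARY and candidate not in BLACKLIST:
                entity_type, identifier = DICTIONARY[candidate]
                matches.append((m.start(), m.end(), entity_type, identifier))
                break
    return matches


if __name__ == "__main__":
    for hit in tag("Anti-CDC2 antibodies on SDS gels in diabetes"):
        print(hit)
```

In this sketch the blacklist is checked before any expansion, so a blacklisted name is skipped even though it has a dictionary entry.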
software
C++ tagger
>1000 abstracts / second
70–80% recall
80–90% precision
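For reference, recall and precision can be computed by comparing the tagger output against a hand-annotated gold standard. The sketch below assumes annotations are represented as sets of (start, end, identifier) tuples and counts only exact matches; the function name and the representation are illustrative assumptions, not how the figures above were obtained.

```python
def precision_recall(predicted, gold):
    """Compare two sets of (start, end, identifier) annotations.

    precision = correct predictions / all predictions
    recall    = correct predictions / all gold annotations
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example annotations, for illustration only.
    gold = {(0, 4, "G1"), (10, 14, "G2"), (20, 24, "D1"),
            (30, 34, "D2"), (40, 44, "G3")}
    predicted = {(0, 4, "G1"), (10, 14, "G2"), (20, 24, "D1"),
                 (50, 54, "G4")}
    print(precision_recall(predicted, gold))
```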
open source
bitbucket.org/larsjuhljensen/tagger/
Docker
hub.docker.com/r/larsjuhljensen/tagger/
web service
tagger.jensenlab.org
community resources
Extract
extract.jensenlab.org
STRING
string-db.org
DISEASES
diseases.jensenlab.org
Cytoscape
curated knowledge
experimental data
co-occurrence text mining
Medline abstracts
<1 km
15 million full-text articles
Westergaard et al., BioRxiv, 2017
~50% more associations
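Co-occurrence text mining can be illustrated with a small counting sketch. It assumes each document has already been reduced to the set of entity identifiers the tagger found in it, and it scores a pair by the ratio of observed to expected co-occurrences. That ratio is a simplification chosen for illustration; it is not the scoring scheme used by STRING or DISEASES, and the identifiers are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(docs):
    """Score entity pairs by observed / expected co-occurrence.

    docs: list of sets, each holding the identifiers tagged in one document.
    """
    n_docs = len(docs)
    single = Counter()
    pair = Counter()
    for entities in docs:
        single.update(entities)
        # Sort so that each unordered pair is counted under one key.
        pair.update(combinations(sorted(entities), 2))

    scores = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n_docs
        scores[(a, b)] = observed / expected
    return scores


if __name__ == "__main__":
    docs = [{"G1", "D1"}, {"G1", "D1"}, {"G2"}, {"G1", "G2"}]
    for names, score in sorted(cooccurrence_scores(docs).items()):
        print(names, round(score, 3))
```

A score above 1 means the pair is mentioned together more often than expected if the two entities were independent.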
electronic health records
Jensen et al., Nature Reviews Genetics, 2012
in Danish
dictionary
drugs
adverse events
in Danish
named entity recognition
temporal correlations
[Figure: patient timeline from drug introduction to drug discontinuation, with identification start marked. Event labels: negative modifier, indication, pre-existing condition, adverse drug reaction, possible adverse drug reaction, adverse event, ADR of additional drug]
Eriksson et al., Drug Safety, 2014
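The temporal reasoning can be sketched as a rule that compares the date of a tagged event with the drug exposure window. This is a simplified reading of the labels on the slide, not the published procedure: the function name, the category strings, and the exact rules are assumptions made for illustration.

```python
from datetime import date


def classify_event(event_day, drug_start, drug_stop, negated=False):
    """Classify a tagged event relative to one drug exposure window.

    Assumed rules (illustrative only):
      - mentions with a negative modifier are ignored,
      - events before drug introduction count as pre-existing conditions,
      - events during exposure count as possible adverse drug reactions.
    """
    if negated:
        return "negated mention"
    if event_day < drug_start:
        return "pre-existing condition"
    if event_day <= drug_stop:
        return "possible adverse drug reaction"
    return "after discontinuation"


if __name__ == "__main__":
    start, stop = date(2010, 3, 1), date(2010, 3, 20)
    print(classify_event(date(2010, 2, 25), start, stop))
    print(classify_event(date(2010, 3, 5), start, stop))
    print(classify_event(date(2010, 3, 5), start, stop, negated=True))
```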
find novel associations
summary
broadly applicable
keep it simple
free tools
Acknowledgments
Evangelos Pafilis
Sune Pletscher-Frankild
Nadezhda Doncheva
Damian Szklarczyk
Michael Kuhn
Robert Eriksson
John “Scooter” Morris
Tudor Oprea
Christian von Mering
Peer Bork
Christos Arvanitidis
Søren Brunak