Large-scale integration of data and text
Lars Juhl Jensen
cellular network biology
association networks
guilt by association
molecular networks
proteins
string-db.org
small molecules
stitch-db.org
subcellular localization
compartments.jensenlab.org
tissue expression
tissues.jensenlab.org
disease associations
usage statistics
heavily used
especially in the US
data integration
heterogeneous data
curated knowledge
experimental data
computational predictions
many databases
different formats
different identifiers
variable quality
not comparable
hard work
common identifiers
quality scores
score calibration
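
The slides name score calibration without saying how it is done. As a minimal sketch, assuming calibration means mapping each source's raw quality scores onto the fraction of scored pairs that a benchmark set confirms, the idea could look like the following. The function names, the equal-count binning, and the toy numbers are all hypothetical.

```python
from bisect import bisect_right


def fit_calibration(raw_scores, in_benchmark, bin_size):
    """Bin pairs by raw score and record the benchmark precision of each bin.

    raw_scores:   raw quality scores from one evidence source
    in_benchmark: parallel booleans, True if the pair is confirmed by an
                  assumed gold-standard set
    bin_size:     number of pairs per bin (equal-count binning, an assumption)
    """
    ranked = sorted(zip(raw_scores, in_benchmark))
    lower_bounds, precisions = [], []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        lower_bounds.append(chunk[0][0])
        precisions.append(sum(flag for _, flag in chunk) / len(chunk))
    return lower_bounds, precisions


def calibrated_score(raw, lower_bounds, precisions):
    """Look up the precision of the bin that a raw score falls into."""
    index = max(0, bisect_right(lower_bounds, raw) - 1)
    return precisions[index]


# Toy data, invented for illustration only.
raw = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
confirmed = [False, False, False, True, True, True]
bounds, precision = fit_calibration(raw, confirmed, bin_size=2)
print(calibrated_score(0.35, bounds, precision))
```

Once every source is expressed on the same benchmark-derived scale, scores from different databases become comparable. A real pipeline would likely fit a smooth monotonic curve rather than use raw bins.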
missing most of the data
text mining
>10 km
named entity recognition
comprehensive lexicon
cyclin dependent kinase 1
CDC2
orthographic variation
hCdc2
“black list”
SDS
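
The named entity recognition slides above (lexicon, orthographic variation, "black list") can be illustrated with a small dictionary-based tagger. This is only a sketch under stated assumptions, not the tagger behind the resources in the talk. The identifiers are made up, the normalization rules (ignore case, spaces, and hyphens) are one guess at what handling orthographic variation involves, and the leading "h" rule is an assumed way to match hCdc2.

```python
import re

# Hypothetical toy lexicon: name -> made-up identifier.
LEXICON = {
    "cyclin dependent kinase 1": "ID_0001",
    "CDC2": "ID_0001",
    "SDS": "ID_0002",
}
# Names that the lexicon contains but that should not be tagged.
BLACK_LIST = {"SDS"}


def normalize(name):
    """Ignore case, spaces, and hyphens (assumed orthographic variation rules)."""
    return re.sub(r"[\s\-]+", "", name).lower()


INDEX = {normalize(name): ident
         for name, ident in LEXICON.items() if name not in BLACK_LIST}
MAX_WORDS = max(len(name.split()) for name in LEXICON)


def lookup(candidate):
    key = normalize(candidate)
    if key in INDEX:
        return INDEX[key]
    # Assumed variant rule: a leading "h" prefix, as in hCdc2.
    if key.startswith("h") and key[1:] in INDEX:
        return INDEX[key[1:]]
    return None


def tag(text):
    """Return (start, end, matched_text, identifier), longest match first."""
    words = list(re.finditer(r"[A-Za-z0-9]+", text))
    hits = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            start, end = words[i].start(), words[i + n - 1].end()
            ident = lookup(text[start:end])
            if ident is not None:
                hits.append((start, end, text[start:end], ident))
                i += n
                break
        else:
            i += 1  # no name starts at this word
    return hits


for hit in tag("hCdc2 and cyclin-dependent kinase 1 were run on SDS gels"):
    print(hit)
```

In this toy sentence, hCdc2 and the hyphenated long name both resolve to the same identifier, while SDS is skipped because it is on the black list.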
co-mentioning
counting
within documents
within paragraphs
within sentences
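
Counting co-mentions at the three levels listed above can be sketched as follows. The paragraph and sentence splitting rules, the plain substring matching, and the example text are simplifying assumptions made for illustration. A real pipeline would count over the output of a named entity recognition step.

```python
import re
from collections import Counter
from itertools import combinations

LEVELS = ("document", "paragraph", "sentence")


def split_levels(document):
    """Yield (level, text) units: the document, its paragraphs, their sentences."""
    yield "document", document
    for paragraph in re.split(r"\n\s*\n", document):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        yield "paragraph", paragraph
        # Naive sentence splitter: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if sentence:
                yield "sentence", sentence


def count_comentions(documents, names):
    """Count, per level, the units in which each pair of names occurs together."""
    counts = {level: Counter() for level in LEVELS}
    for document in documents:
        for level, text in split_levels(document):
            lowered = text.lower()
            present = sorted(n for n in names if n.lower() in lowered)
            for pair in combinations(present, 2):
                counts[level][pair] += 1
    return counts


# Invented toy document with two paragraphs.
docs = ["CDC2 binds cyclin B. It is a kinase.\n\nWEE1 inhibits CDC2."]
counts = count_comentions(docs, ["CDC2", "cyclin B", "WEE1"])
for level in LEVELS:
    print(level, dict(counts[level]))
```

Here WEE1 and cyclin B are co-mentioned only at the document level, whereas CDC2 and WEE1 also share a paragraph and a sentence. Keeping the three counts separate allows closer co-mentions to be weighted more heavily when scoring.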
IDG-specific tasks
target classification
text mining
“protein studiedness”
probabilistic counting
resource integration
disease associations
tissue expression
subcellular localization
automation of updates
web services
remapping of identifiers
predictions for dark matter
network-based inference
questions?
