Quantifying the content of Biomedical Semantic
Resources as a core for Drug Discovery
Platforms
Ali Hasnain and Dietrich Rebholz-Schuhmann
May 2017
Agenda
• Introduction
• Motivation
• Ontologies
• Biomedical Ontologies
• Drugs and Chemical Compound Ontologies
• Upper level Ontologies
• Data Repositories/ Databases for Drug Discovery
• Gene, Gene Expression and Protein Databases
• Pathway databases
• Chemical and Structure Databases
• Disease Specific Databases for Prevention
• Literature databases
• Life Sciences Linked Open Data Cloud
• Linked Open Drug Data (LODD)
• Bio2RDF
• LinkedLifeData
• Related Work
• Conclusion
Introduction
• Biomedical data exists as ontologies, repositories, and
other open data resources, e.g. Life Sciences Linked
Open Data (LS-LOD), that are relevant to Drug
Discovery and Cancer Chemoprevention.
• The analysis gives an overview of which resources
have to be considered and how much data requires
integration, and it provides the opportunity to tailor
semantic solutions to specific needs in terms of size
and performance.
Motivation
• We live in a world of data
Linked Data for Cancer Chemoprevention
• Because Biomedical Data is heterogeneous and spread
across multiple sources
[Figure: hypothesis-generation funnel over Linked Data. Literature, in silico models, and database browsing feed hypothesis generation; stage labels: ~2000 small molecules, ~100 molecules, ~10 interesting pathways, ~5 molecules testable in the lab.]
Heterogeneous Data – Multiple Data Sources
• DrugBank, DailyMed, ChEBI, KEGG, Reactome, SIDER, BioPAX, Medicare
Biomedical Data Integration
[Figure: RDF graph joining two descriptions of the same gene product. NCBI side: nih:EGFR with nci:has_description "epidermal growth factor receptor", nih:organism "Homo sapiens", nih:sequence "CCCCGGCGCAGCGCGGCCGCAGCA…", and nih:interacts nih:EGF (which also carries nih:organism). Reactome side: rea:EGFR with rea:keyword rea:Membrane, rea:Receptor, rea:Transferase. The two resources are connected by a sameAs link.]
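The integration idea in the figure can be sketched in a few lines of plain Python. This is only an illustration, not the authors' implementation: the triples are transcribed from the figure labels (the assignment of edges to nih:EGFR is our reading of the figure, and the sequence literal is left out), and the helper name merge_same_as is hypothetical. Resources connected by sameAs are collapsed onto one canonical identifier so that their properties can be read together.

```python
from collections import defaultdict

# Triples transcribed from the figure labels (assumption: edge directions
# follow the usual subject-predicate-object reading of the drawing).
TRIPLES = [
    ("nih:EGFR", "nci:has_description", "epidermal growth factor receptor"),
    ("nih:EGFR", "nih:organism", "Homo sapiens"),
    ("nih:EGFR", "nih:interacts", "nih:EGF"),
    ("rea:EGFR", "rea:keyword", "rea:Membrane"),
    ("rea:EGFR", "rea:keyword", "rea:Receptor"),
    ("rea:EGFR", "rea:keyword", "rea:Transferase"),
    ("nih:EGFR", "sameAs", "rea:EGFR"),
]


def merge_same_as(triples):
    """Group properties of sameAs-linked resources under one canonical id."""
    canonical = {}

    def find(node):
        # Follow canonical links until a node that points to itself (or nothing).
        while canonical.get(node, node) != node:
            node = canonical[node]
        return node

    # First pass: build equivalence classes from the sameAs links.
    for subject, predicate, obj in triples:
        if predicate == "sameAs":
            a, b = find(subject), find(obj)
            if a != b:
                # The lexicographically smaller id becomes the canonical one.
                canonical[max(a, b)] = min(a, b)

    # Second pass: attach every other statement to the canonical subject.
    merged = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        if predicate != "sameAs":
            merged[find(subject)][predicate].add(obj)
    return merged


if __name__ == "__main__":
    for subject, properties in merge_same_as(TRIPLES).items():
        for predicate, objects in sorted(properties.items()):
            print(subject, predicate, sorted(objects))
```

After merging, the NCBI description and the Reactome keywords are available under a single subject, which is the practical benefit of the sameAs link.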
Ontologies
The ontologies fall into three main categories:
1. Biomedical ontologies are mainly used by biomedical
applications and define the basic biological structures
(e.g. genes, pathways).
2. Drugs and Chemical Compound Ontologies relate to
clinical drugs and their active ingredients.
3. Upper level ontologies describe general concepts that
many biomedical ontologies share.
Ontology spectrum by Jimeno-Yepes et al. [1]
[1]: Antonio Jimeno-Yepes, Ernesto Jiménez-Ruiz, Rafael Berlanga, and Dietrich Rebholz-Schuhmann. Use of shared lexical resources for efficient
ontological engineering. In Semantic Web Applications and Tools for Life Sciences Workshop (SWAT4LS). CEUR WS Proceedings, volume 435, pages 93–136, 2008
Biomedical Ontologies (selected)
• Advancing Clinico-Genomic Trials on Cancer (ACGT) Master Ontology (MO)
– data exchange in oncology, integration of clinical and molecular data
• Biological Pathway Exchange (BioPAX)
– metabolic, biochemical, transcription regulation, protein synthesis, signal transduction
pathways
• Experimental Factor Ontology (EFO)
– enhance and promote consistent annotation, automatic annotation to integrate external
data
• Gene Ontology (GO)
– for describing biological processes, molecular functions and cellular components of gene
products
• Medical Subject Headings (MeSH)
– hierarchical structure for indexing, cataloguing, and searching for biomedical/ health-related
data.
• Microarray Gene Expression Data Ontology (MGED)
– the biological sample, the treatment sample and the micro-array chip technology in the
experiment
• National Cancer Institute (NCI) Thesaurus
– integrates molecular and clinical cancer-related information to integrate, retrieve and relate
concepts
• Ontology for Biomedical Investigations (OBI)
– designs, protocols, instrumentation, materials, processes, data in biological & biomedical
investigations
Drugs and Chemical Compound Ontologies (selected)
• RxNorm
– standard names for clinical drugs (active drug ingredient, dosage
strength, physical form) and links
Generic and Upper Ontologies (selected)
• Basic Formal Ontology (BFO)
– formalise entities such as 3D enduring objects and comprehending
processes
• OBO Relation Ontology (RO)
– formal definitions of basic relations that cross-cut the biomedical domain
• Provenance Ontology (PROVO)
– provides classes, properties and restrictions for provenance information
Statistical overview of implementation details of
Ontologies (selected)
Ontology  Category    Year*  Topic                                 Implementation   Classes  Properties  Individuals  Depth
ACGT-MO   Biomedical  2008   Cancer                                OWL/CSV/RDF/XML  1769     260         61           18
BioPAX    Biomedical  2010   Pathways                              OWL/CSV/RDF/XML  68       96          0            4
EFO       Biomedical  2015   Experimental Factors                  OWL/CSV/RDF/XML  18596    35          0            14
GO        Biomedical  2016   Genomics and Proteomics               OWL/CSV/RDF/XML  4419     9           0            16
MeSH      Biomedical  2009   Health                                RDF/TTL/CSV      252375   38          0            15
MGED      Biomedical  2009   Microarray Experiment                 OWL/CSV/RDF/XML  233      121         698          8
NCIT      Biomedical  2007   Clinical care                         OWL/CSV/RDF/XML  118167   173         45715        16
OBI       Biomedical  2008   Experimental Data                     OWL/CSV/RDF/XML  2932     106         178          16
UMLS      Biomedical  1993   Biomedical/Health                     RDF              3221702  -           -            -
RxNorm    Drugs       1993   Clinical Drugs                        OWL/CSV/RDF/XML  118555   46          0            0
BFO       Generic     2003   Genuine Upper Ontology                OWL/CSV/RDF/XML  35       0           0            5
RO        Generic     2005   Relations used in all OBO ontologies  OWL/CSV/RDF/XML  -        -           -            -
PROVO     Generic     2012   PROV Data Model                       OWL/CSV/RDF/XML  30       50          4            3
*Statistics as of Aug 2016, as listed at BioPortal. Year specifies when the most recent version was
produced. “-” means information not available.
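One way to read the overview is to relate the number of classes to the number of properties, which hints at whether an ontology is mostly a large vocabulary or a relation-rich model. The following is a minimal sketch under stated assumptions: the counts are transcribed from the table, UMLS and RO are left out because their property counts are not available, and the ratio itself is our illustrative measure, not one used by the authors.

```python
# (classes, properties) per ontology, transcribed from the statistical overview.
# UMLS and RO are omitted because their property counts are not available.
ONTOLOGY_STATS = {
    "ACGT-MO": (1769, 260),
    "BioPAX": (68, 96),
    "EFO": (18596, 35),
    "GO": (4419, 9),
    "MeSH": (252375, 38),
    "MGED": (233, 121),
    "NCIT": (118167, 173),
    "OBI": (2932, 106),
    "RxNorm": (118555, 46),
    "BFO": (35, 0),
    "PROVO": (30, 50),
}


def classes_per_property(stats):
    """Return (ontology, ratio) pairs sorted from most to least class-heavy.

    Ontologies that list zero properties are skipped to avoid dividing by zero.
    """
    ratios = [
        (name, classes / properties)
        for name, (classes, properties) in stats.items()
        if properties > 0
    ]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in classes_per_property(ONTOLOGY_STATS):
        print(f"{name:8s} {ratio:10.1f}")
```

With these counts, MeSH and RxNorm come out as the most class-heavy resources, while BioPAX and PROVO define more properties than classes.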
Classes vs. Properties plot (Selected Ontologies)
[Figure: bar chart comparing the number of classes and properties for ACGT-MO, BioPAX, EFO, GO, MeSH, MGED, NCIT, OBI, UMLS, RxNorm, BFO, RO, and PROVO; the plotted values are those listed in the statistical overview of ontologies.]
Public Data Repositories for Drug Discovery
• The databases are separated into the following
categories:
– Gene, Gene Expression and Protein Databases for
gene and protein annotations as well as the expression
levels and related clinical data.
– Pathway Databases denoting the protein interactions and
the overall functional outcomes.
– Chemical and Structure Databases including Biological
Activities for the information related to drugs and other
chemicals including also toxicity observations and clinical
trials.
– Disease Specific Databases for Prevention which
deliver content specific to the prevention of cancer.
– Literature Databases
Gene, Gene Expression and Protein Databases
• GenBank
– over 65 B nucleotide bases in more than 61 M sequences
• ArrayExpress
– 65'060 experiments, 1'973'776 assays; annotated data for gene
expression from biological experiments
• Gene Expression Omnibus (GEO)
– 3'848 datasets; gene expression for specific studies
• Universal Protein Resource (UniProt)
– 63'686'057 sequences, 21'364'768'379 amino acids;
classifications, cross-references, annotation of proteins
• Protein Data Bank (PDB)
– 118'280 biological structures; evidence of experimentally
validated protein structures
• Protein Database
– 30'047 protein entries, 41'327 PPIs; translated coding regions
from GenBank, TPA, SwissProt, PIR, PRF, UniProt and PDB.
Pathway Databases
• Kyoto Encyclopedia of Genes and Genomes (KEGG)
– 432'883 pathway maps, 153'776 hierarchies; genome
sequencing and high-throughput experimental technologies
• Reactome
– 9'386 proteins and pathway data for signalling,
transcriptional regulation, translation, apoptosis and others
• WikiPathways
– 2'475 pathways; complements e.g. KEGG, Reactome,
Pathway Commons
• cPath: Pathway Database Software
– 31'698 pathways, 1'151'476 interactions; pathway
visualisation, analysis and modelling
Chemical and Structure Databases including
Biological Activities
• Chemical Compounds Database (Chembase)
– 150'000 pages; compounds, their physical and chemical properties, mass spectra
• Chemical Entities of Biological Interest (ChEBI)
– 48'296 compounds; natural and synthetic atom, molecule, ion, radical, conformer
• DrugBank
– 8,261 drugs, 4,164 targets, 243 enzymes, 118 transporters; drug (chemical,
pharmaceutical) and drug target (sequence, structure, pathway) data
• PubChem
– 89'124'401 compounds; compound neighbouring, sub/superstructure, bioactivity
data
• Aggregated Computational Toxicology Resource (ACToR)
– more than 500 public sources; environmental chemicals searchable by name and
structure
• ClinicalTrials
– 213'868 studies; offers information for locating clinical trials for diseases and
conditions
• TOXicology Data NETwork (TOXNET)
– toxicology, hazardous chemicals, environmental health and related areas
Disease Specific Databases for Prevention
• Colon Chemoprevention Agents Database (CCAD)
– 1,137 agents and literature data for colon chemoprevention in humans, rats,
mice
• Dietary Supplements Labels Database
– 5'000 brands of dietary supplements to compare label ingredients in different
brands. Links to other databases such as MedlinePlus and PubMed
• REPAIRtoire Database
– DNA damage links, pathways, proteins for DNA repair, diseases related to
mutations
Literature Databases
• PubMed
– journal citations, i.e. the primary source of information for biomedical researchers
• PubMed Dietary Supplement Subset
– dietary supplement literature including vitamin, mineral, botanical/herbal
supplements
Statistical overview of implementation details of
libraries and databases (selected)
Database        Category         Year*  Topic                              Implementation    Size/Stats
PubMed          Literature       1996   Biomedical Literature              WebBased/CSV      11 M journal citations
PDSS            Literature       1999   Citations of dietary supplement    WebBased          X
DSLD            Chemoprevention  2013   Ingredients of dietary supplement  WebBased          > 5000 selected brands
ClinicalTrials  Toxicity         2000   Clinical Trials                    WebBased          213,868 studies
TOXNET          Toxicity         1987   Toxicology Database                WebBased          X
ACToR           Compound         2008   Chemical Toxicity Data             WebBased          >500 public sources
DrugBank        Compound         2008   Drug Data                          WebBased/LOD      8206 drugs
ChEBI           Compound         X      Small Molecular entities           WebBased/LOD      48,296 compounds
PubChem         Compound         2004   Compound Structure                 WebBased/LOD      89,124,401 compounds
ChemSpider      Chemical         2007   Compound Structure                 WebBased          >40 million structures
KEGG            Pathway          1995   Genomic, Chemical, systemic        WebBased/LOD      432883 pathway maps
Reactome        Pathway          2003   Pathways                           WebBased          9386 proteins
WikiPathways    Pathway          2007   Biological pathways                WebBased          2475 pathways
cPath           Pathway          2005   Biological pathways                Desktop/WebBased  31698 pathways
UniProt         Protein          2002   Protein Sequence                   WebBased/LOD      63686057 sequences
PDB             Protein          1971   3D structural data of Proteins     WebBased/LOD      30,047 protein entries
*Statistics as of Aug 2016. Year specifies when the most recent version was produced. “X” means
information not available.
Life Sciences Linked Open Data Cloud
• Linked biomedical datasets relevant in a Cancer Chemoprevention
and drug discovery scenario:
– Linked Open Drug Data (LODD)
• Set of linked datasets relevant to Drug Discovery that includes data
from several datasets including DrugBank, LinkedCT, DailyMed,
Diseasome, SIDER, STITCH, Medicare, RxNorm, ClinicalTrials.gov,
NCBI Entrez Gene and OMIM.
– Bio2RDF
• An open-source project that uses Semantic Web technologies to build
and provide the largest network of Linked Data for the Life Sciences.
Contains multiple linked biological databases, including pathway
databases such as KEGG, PDB and several NCBI databases.
– LinkedLifeData
• A semantic data integration platform for the biomedical domain
containing 5 billion RDF statements from various sources including
UniProt, PubMed, EntrezGene and 20 more.
The Linked Open Data Cloud
“Life sciences will drive adoption of the Semantic Web, just as high-energy physics
drove the early Web.”
- Sir Tim Berners-Lee, 2005
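[Figure: the Linked Open Data cloud, labelled with four life-science entity types: Proteins, Molecules, Genes, Diseases.]
Meaningful Biomedical Correlation
[Figure: the four entity types (Proteins, Molecules, Genes, Diseases) mapped to the classes :Protein, :Molecule, :Gene, and :Disease, surrounded by the datasets that describe them: UniProt, PDB, Pfam, PROSITE, ProDom, UniRef, UniParc, DailyMed, DrugBank, ChEMBL, PubChem, KEGG, Gene Ontology, GeneID, Affymetrix, HomoloGene, MGI, Diseasome, SIDER.]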
Statistical overview of datasets involved in LS-LOD,
Bio2RDF and LLD (selected)
Dataset     Category  Year*  Topic                           Size/Coverage
DrugBank    LODD      2010   Drugs                           766920 triples, 4800 drugs
LinkedCT    LODD      X      Clinical Trials                 25 M triples, 106000 trials
DailyMed    LODD      2010   Drugs                           1604983 triples, >36K products
DBpedia     LODD      2009   Drugs/Diseases/Proteins         218M triples, 2300 drugs, 2200 proteins
Diseasome   LODD      2010   Diseases/Genes                  91182 triples, 2600 genes
SIDER       LODD      2010   Diseases/Side Effects           192515 triples, 63K effects, 1737 genes
STITCH      LODD      2010   Chemicals/Proteins              7.5 M chemicals, 0.5 M proteins
ChEMBL      LODD      2010   Assay/Proteins/Organisms        130 M triples
Affymetrix  Bio2RDF   2014   Microarrays                     8694237 triples, 6679943 entities
BioModels   Bio2RDF   2014   Biological/mathematical models  2380009 triples, 188308 entities
BioPortal   Bio2RDF   2014   Biological/biomedical entities  19920395 triples, 2199594 entities
KEGG        Bio2RDF   2014   Genes                           50197150 triples, 6533307 entities
PharmGKB    Bio2RDF   2014   Genotypes/Phenotypes            278049209 triples, 25325504 entities
PubMed      Bio2RDF   2014   Citations                       5005343905 triples, 412593720 entities
Taxonomy    Bio2RDF   2014   Taxonomy                        21310356 triples, 1147211 entities
LLD         LLD       2014   Drugs, Chromosomes              10192641644 statements
*Statistics as of Aug 2016 (source DataHub). Year specifies when the most recent version was
produced. “X” means information not available.
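The sizes in this overview can be turned into two simple planning figures: how many triples describe an average entity, and what share of the total load each dataset contributes. The sketch below uses only the Bio2RDF rows of the table, because they report both triples and entities; the two measures and the function names are our illustrative assumptions, not part of the original analysis.

```python
# (triples, unique entities) for the Bio2RDF rows of the overview table.
BIO2RDF_STATS = {
    "Affymetrix": (8694237, 6679943),
    "BioModels": (2380009, 188308),
    "BioPortal": (19920395, 2199594),
    "KEGG": (50197150, 6533307),
    "PharmGKB": (278049209, 25325504),
    "PubMed": (5005343905, 412593720),
    "Taxonomy": (21310356, 1147211),
}


def triples_per_entity(stats):
    """Average number of triples that describe one entity, per dataset."""
    return {name: triples / entities for name, (triples, entities) in stats.items()}


def share_of_triples(stats):
    """Fraction of all listed triples that each dataset contributes."""
    total = sum(triples for triples, _ in stats.values())
    return {name: triples / total for name, (triples, _) in stats.items()}


if __name__ == "__main__":
    density = triples_per_entity(BIO2RDF_STATS)
    share = share_of_triples(BIO2RDF_STATS)
    for name in sorted(BIO2RDF_STATS, key=share.get, reverse=True):
        print(f"{name:10s} {density[name]:6.2f} triples/entity  {share[name]:6.1%} of triples")
```

On these numbers PubMed alone accounts for the bulk of the listed triples, so a platform that does not need citation data could be sized very differently from one that does.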
Triples vs. Unique Entities (selected LS-LOD datasets)
[Figure: bar chart of the number of triples and the number of unique entities per dataset. Plotted values:]
Dataset         # of triples  # of unique entities
affymetrix      86942371      6679943
biomodels       2380009       188380
bioportal       19920395      2199594
chembl          409942525     50061452
clinicaltrials  98835804      7337123
ctd             326720894     19768641
dbsnp           8801487       530538
drugbank        3672531       316950
genage          73048         6995
gendr           11663         1129
goa             97520151      5950074
hgnc            3628205       372136
homologene      7189769       869985
interpro        2323345       176579
iproclass       3306107223    364255265
irefindex       48781511      3110993
kegg            50197150      6533307
linkedspl       2174579       59776
lsr             55914         5032
mesh            7323864       305401
Related Work (selected)
• Zeginis et al. [2] proposed a “meet-in-the-middle” approach to developing
a semantic model for cancer chemoprevention: relevant data was
analysed bottom-up, starting from the domain, while a top-down
approach was used to collect ontologies, vocabularies and data models.
• Hasnain et al. [3] proposed the Linked Biomedical Dataspace, which
gives access to biomedical resources relevant for cancer chemoprevention,
with four components:
– a) knowledge extraction,
– b) link creation,
– c) query execution and
– d) knowledge publishing.
[2]: Zeginis, D., et al.: A collaborative methodology for developing a semantic model for interlinking Cancer
Chemoprevention linked-data sources. Semantic Web (2013)
[3]: Hasnain, A., et al.: Linked Biomedical Dataspace: Lessons Learned integrating Data for Drug Discovery. In:
International Semantic Web Conference (In-Use Track), October 2014
Conclusion
• In this paper we introduce and classify different tiers of biomedical data
relevant to the Cancer Chemoprevention and Drug Discovery domains.
• These tiers comprise ontologies, databases and Life Sciences Linked Open Data in
the Healthcare, Life Sciences and Biomedical domains.
• We classify ontologies into three main classes:
– i) Biomedical Ontologies (e.g. EFO, OBI, GO),
– ii) Drugs and Chemical Compound Ontologies (e.g. RxNorm) and
– iii) Generic and Upper Ontologies (e.g. BFO, RO, PROVO).
• Similarly, we categorise libraries and databases into five categories:
– (i) Gene, Gene Expression and Protein Databases,
– (ii) Pathway Databases,
– (iii) Chemical and Structure Databases including Biological Activities,
– (iv) Disease Specific Databases for Prevention, and
– (v) Literature Databases.
Thank you

More Related Content

What's hot

Role of bioinformatics in drug designing
Role of bioinformatics in drug designingRole of bioinformatics in drug designing
Role of bioinformatics in drug designingW Roseybala Devi
 
INBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria LópezINBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria LópezINBIOMEDvision
 
IUPHAR/MMV Guide to Malaria Pharmacology - BioMalPar XV
IUPHAR/MMV Guide to Malaria Pharmacology - BioMalPar XVIUPHAR/MMV Guide to Malaria Pharmacology - BioMalPar XV
IUPHAR/MMV Guide to Malaria Pharmacology - BioMalPar XVGuide to PHARMACOLOGY
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...David Peyruc
 
Applicationsofbioinformaticsindrugdiscoveryandprocess
ApplicationsofbioinformaticsindrugdiscoveryandprocessApplicationsofbioinformaticsindrugdiscoveryandprocess
Applicationsofbioinformaticsindrugdiscoveryandprocessjaidev53ster
 
Integrative Everything, Deep Learning and Streaming Data
Integrative Everything, Deep Learning and Streaming DataIntegrative Everything, Deep Learning and Streaming Data
Integrative Everything, Deep Learning and Streaming DataJoel Saltz
 
United States Patent Application Publication
United States Patent Application PublicationUnited States Patent Application Publication
United States Patent Application PublicationCây thuốc Việt
 
Genomics and proteomics in drug discovery and development
Genomics and proteomics in drug discovery and developmentGenomics and proteomics in drug discovery and development
Genomics and proteomics in drug discovery and developmentSuchittaU
 
Drug Repurposing- Amazing Opportunity
Drug Repurposing- Amazing Opportunity Drug Repurposing- Amazing Opportunity
Drug Repurposing- Amazing Opportunity Dr Seema Kohli
 
Bioinformatics in drug discovery
Bioinformatics in drug discoveryBioinformatics in drug discovery
Bioinformatics in drug discoveryKAUSHAL SAHU
 
Pathway studiosymposium lorenzi
Pathway studiosymposium lorenziPathway studiosymposium lorenzi
Pathway studiosymposium lorenziAnn-Marie Roche
 
Drug discovery and development
Drug discovery and developmentDrug discovery and development
Drug discovery and developmentrahul_pharma
 

What's hot (20)

ISO 20428 Intro
ISO 20428 IntroISO 20428 Intro
ISO 20428 Intro
 
IUPHAR/BPS Guide to Pharmacology
IUPHAR/BPS Guide to PharmacologyIUPHAR/BPS Guide to Pharmacology
IUPHAR/BPS Guide to Pharmacology
 
Thesis Defence 05-26-2016
Thesis Defence 05-26-2016Thesis Defence 05-26-2016
Thesis Defence 05-26-2016
 
Role of bioinformatics in drug designing
Role of bioinformatics in drug designingRole of bioinformatics in drug designing
Role of bioinformatics in drug designing
 
Project Hippocrates
Project HippocratesProject Hippocrates
Project Hippocrates
 
MURI Summer
MURI SummerMURI Summer
MURI Summer
 
INBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria LópezINBIOMEDvision Workshop at MIE 2011. Victoria López
INBIOMEDvision Workshop at MIE 2011. Victoria López
 
IUPHAR/MMV Guide to Malaria Pharmacology - BioMalPar XV
IUPHAR/MMV Guide to Malaria Pharmacology - BioMalPar XVIUPHAR/MMV Guide to Malaria Pharmacology - BioMalPar XV
IUPHAR/MMV Guide to Malaria Pharmacology - BioMalPar XV
 
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
tranSMART Community Meeting 5-7 Nov 13 - Session 3: tranSMART a Data Warehous...
 
Applicationsofbioinformaticsindrugdiscoveryandprocess
ApplicationsofbioinformaticsindrugdiscoveryandprocessApplicationsofbioinformaticsindrugdiscoveryandprocess
Applicationsofbioinformaticsindrugdiscoveryandprocess
 
Integrative Everything, Deep Learning and Streaming Data
Integrative Everything, Deep Learning and Streaming DataIntegrative Everything, Deep Learning and Streaming Data
Integrative Everything, Deep Learning and Streaming Data
 
BCATSfinal
BCATSfinalBCATSfinal
BCATSfinal
 
United States Patent Application Publication
United States Patent Application PublicationUnited States Patent Application Publication
United States Patent Application Publication
 
Genomics and proteomics in drug discovery and development
Genomics and proteomics in drug discovery and developmentGenomics and proteomics in drug discovery and development
Genomics and proteomics in drug discovery and development
 
Drug Repurposing- Amazing Opportunity
Drug Repurposing- Amazing Opportunity Drug Repurposing- Amazing Opportunity
Drug Repurposing- Amazing Opportunity
 
Drug repurposing
Drug repurposingDrug repurposing
Drug repurposing
 
Bioinformatics in drug discovery
Bioinformatics in drug discoveryBioinformatics in drug discovery
Bioinformatics in drug discovery
 
Pathway studiosymposium lorenzi
Pathway studiosymposium lorenziPathway studiosymposium lorenzi
Pathway studiosymposium lorenzi
 
In silico repositioning of approved drugs for rare and neglected diseases
In silico repositioning of approved drugs for rare and neglected diseases In silico repositioning of approved drugs for rare and neglected diseases
In silico repositioning of approved drugs for rare and neglected diseases
 
Drug discovery and development
Drug discovery and developmentDrug discovery and development
Drug discovery and development
 

Similar to Quantifying Biomedical Semantic Resources for Drug Discovery Platforms

Enzymes as drug targets: curated pharmacological information in the 'Guide to...
Enzymes as drug targets: curated pharmacological information in the 'Guide to...Enzymes as drug targets: curated pharmacological information in the 'Guide to...
Enzymes as drug targets: curated pharmacological information in the 'Guide to...Guide to PHARMACOLOGY
 
Big Data Analytics in the Health Domain
Big Data Analytics in the Health DomainBig Data Analytics in the Health Domain
Big Data Analytics in the Health DomainBigData_Europe
 
Development of Tohoku Medical Megabank Integrated Database ”dbTMM”
Development of Tohoku Medical Megabank Integrated Database ”dbTMM”Development of Tohoku Medical Megabank Integrated Database ”dbTMM”
Development of Tohoku Medical Megabank Integrated Database ”dbTMM”ogishima
 
Ontologies for Semantic Normalization of Immunological Data
Ontologies for Semantic Normalization of Immunological DataOntologies for Semantic Normalization of Immunological Data
Ontologies for Semantic Normalization of Immunological DataYannick Pouliot
 
Databases pathways of genomics and proteomics
Databases pathways of genomics and proteomics Databases pathways of genomics and proteomics
Databases pathways of genomics and proteomics Sachin Kumar
 
Guide to PHARMACOLOGY: a web-Based Compendium for Research and Education
Guide to PHARMACOLOGY: a web-Based Compendium for Research and EducationGuide to PHARMACOLOGY: a web-Based Compendium for Research and Education
Guide to PHARMACOLOGY: a web-Based Compendium for Research and EducationChris Southan
 
Predicting Drug Candidates Safety : the Role and Usage of Knowledge Bases
Predicting Drug Candidates Safety : the Role and Usage of Knowledge BasesPredicting Drug Candidates Safety : the Role and Usage of Knowledge Bases
Predicting Drug Candidates Safety : the Role and Usage of Knowledge BasesAureus Sciences
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europeopen_phacts
 
Ontology-Driven Clinical Intelligence: A Path from the Biobank to Cross-Disea...
Ontology-Driven Clinical Intelligence: A Path from the Biobank to Cross-Disea...Ontology-Driven Clinical Intelligence: A Path from the Biobank to Cross-Disea...
Ontology-Driven Clinical Intelligence: A Path from the Biobank to Cross-Disea...Remedy Informatics
 
iOMICS Clinical & Omnia
iOMICS Clinical & OmniaiOMICS Clinical & Omnia
iOMICS Clinical & OmniaInterpretOmics
 
Wim de Grave: Big Data in life sciences
Wim de Grave:  Big Data in life sciencesWim de Grave:  Big Data in life sciences
Wim de Grave: Big Data in life sciencesFlávio Codeço Coelho
 
Reg Sci Lecture Dec 2016
Reg Sci Lecture Dec 2016Reg Sci Lecture Dec 2016
Reg Sci Lecture Dec 2016Rick Silva
 
Introduction to bioinformatics.pptx
Introduction to bioinformatics.pptxIntroduction to bioinformatics.pptx
Introduction to bioinformatics.pptxMortezaGhandadi1
 
INFORMATICS 2.pptx
INFORMATICS 2.pptxINFORMATICS 2.pptx
INFORMATICS 2.pptxOramadevi1
 
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...Vaticle
 
Bioinformatics Introduction and Use of BLAST Tool
Bioinformatics Introduction and Use of BLAST ToolBioinformatics Introduction and Use of BLAST Tool
Bioinformatics Introduction and Use of BLAST ToolJesminBinti
 

Similar to Quantifying Biomedical Semantic Resources for Drug Discovery Platforms (20)

Enzymes as drug targets: curated pharmacological information in the 'Guide to...
Enzymes as drug targets: curated pharmacological information in the 'Guide to...Enzymes as drug targets: curated pharmacological information in the 'Guide to...
Enzymes as drug targets: curated pharmacological information in the 'Guide to...
 
Big Data Analytics in the Health Domain
Big Data Analytics in the Health DomainBig Data Analytics in the Health Domain
Big Data Analytics in the Health Domain
 
Online Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery SystemsOnline Resources to Support Open Drug Discovery Systems
Online Resources to Support Open Drug Discovery Systems
 
Development of Tohoku Medical Megabank Integrated Database ”dbTMM”
Development of Tohoku Medical Megabank Integrated Database ”dbTMM”Development of Tohoku Medical Megabank Integrated Database ”dbTMM”
Development of Tohoku Medical Megabank Integrated Database ”dbTMM”
 
Ontologies for Semantic Normalization of Immunological Data
Ontologies for Semantic Normalization of Immunological DataOntologies for Semantic Normalization of Immunological Data
Ontologies for Semantic Normalization of Immunological Data
 
Databases pathways of genomics and proteomics
Databases pathways of genomics and proteomics Databases pathways of genomics and proteomics
Databases pathways of genomics and proteomics
 
Guide to PHARMACOLOGY: a web-Based Compendium for Research and Education
Guide to PHARMACOLOGY: a web-Based Compendium for Research and EducationGuide to PHARMACOLOGY: a web-Based Compendium for Research and Education
Guide to PHARMACOLOGY: a web-Based Compendium for Research and Education
 
Predicting Drug Candidates Safety : the Role and Usage of Knowledge Bases
Predicting Drug Candidates Safety : the Role and Usage of Knowledge BasesPredicting Drug Candidates Safety : the Role and Usage of Knowledge Bases
Predicting Drug Candidates Safety : the Role and Usage of Knowledge Bases
 
2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe2011-10-11 Open PHACTS at BioIT World Europe
2011-10-11 Open PHACTS at BioIT World Europe
 
Ontology-Driven Clinical Intelligence: A Path from the Biobank to Cross-Disea...
Ontology-Driven Clinical Intelligence: A Path from the Biobank to Cross-Disea...Ontology-Driven Clinical Intelligence: A Path from the Biobank to Cross-Disea...
Ontology-Driven Clinical Intelligence: A Path from the Biobank to Cross-Disea...
 
Bioinformatics .pptx
Bioinformatics .pptxBioinformatics .pptx
Bioinformatics .pptx
 
iOMICS Clinical & Omnia
iOMICS Clinical & OmniaiOMICS Clinical & Omnia
iOMICS Clinical & Omnia
 
Wim de Grave: Big Data in life sciences
Wim de Grave:  Big Data in life sciencesWim de Grave:  Big Data in life sciences
Wim de Grave: Big Data in life sciences
 
Reg Sci Lecture Dec 2016
Reg Sci Lecture Dec 2016Reg Sci Lecture Dec 2016
Reg Sci Lecture Dec 2016
 
Introduction to bioinformatics.pptx
Introduction to bioinformatics.pptxIntroduction to bioinformatics.pptx
Introduction to bioinformatics.pptx
 
INFORMATICS 2.pptx
INFORMATICS 2.pptxINFORMATICS 2.pptx
INFORMATICS 2.pptx
 
INFORMATICS 2.pptx
INFORMATICS 2.pptxINFORMATICS 2.pptx
INFORMATICS 2.pptx
 
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 
Bioinformatics Introduction and Use of BLAST Tool
Bioinformatics Introduction and Use of BLAST ToolBioinformatics Introduction and Use of BLAST Tool
Bioinformatics Introduction and Use of BLAST Tool
 

More from Syed Muhammad Ali Hasnain

SHARP: Harmonizing cross-workflow Provenance
SHARP: Harmonizing cross-workflow ProvenanceSHARP: Harmonizing cross-workflow Provenance
SHARP: Harmonizing cross-workflow ProvenanceSyed Muhammad Ali Hasnain
 
SHARP: Harmonizing Galaxy and Taverna workflow provenance
SHARP: Harmonizing Galaxy and Taverna workflow provenanceSHARP: Harmonizing Galaxy and Taverna workflow provenance
SHARP: Harmonizing Galaxy and Taverna workflow provenanceSyed Muhammad Ali Hasnain
 
Exploiting Cognitive Computing and Frame Semantic Features for Biomedical Doc...
Exploiting Cognitive Computing and Frame Semantic Features for Biomedical Doc...Exploiting Cognitive Computing and Frame Semantic Features for Biomedical Doc...
Exploiting Cognitive Computing and Frame Semantic Features for Biomedical Doc...Syed Muhammad Ali Hasnain
 
An Approach for Discovering and Exploring Semantic Relationships between Genes
An Approach for Discovering and Exploring Semantic Relationships between GenesAn Approach for Discovering and Exploring Semantic Relationships between Genes
An Approach for Discovering and Exploring Semantic Relationships between GenesSyed Muhammad Ali Hasnain
 
Federated Query Formulation and Processing through BioFed
Federated Query Formulation and Processing through BioFedFederated Query Formulation and Processing through BioFed
Federated Query Formulation and Processing through BioFedSyed Muhammad Ali Hasnain
 
Processing Life Science Data at Scale - using Semantic Web Technologies
Processing Life Science Data at Scale - using Semantic Web TechnologiesProcessing Life Science Data at Scale - using Semantic Web Technologies
Processing Life Science Data at Scale - using Semantic Web TechnologiesSyed Muhammad Ali Hasnain
 
A Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data CloudA Provenance assisted Roadmap for Life Sciences Linked Open Data Cloud
A Provenance assisted Roadmap for Life Sciences Linked Open Data CloudSyed Muhammad Ali Hasnain
 
Improving discovery in Life Sciences Linked Open Data Cloud
Improving discovery in Life Sciences Linked Open Data CloudImproving discovery in Life Sciences Linked Open Data Cloud
Improving discovery in Life Sciences Linked Open Data CloudSyed Muhammad Ali Hasnain
 
Knowledge Processing with Big Data and Semantic Web Technologies
Knowledge Processing with Big Data and  Semantic Web TechnologiesKnowledge Processing with Big Data and  Semantic Web Technologies
Knowledge Processing with Big Data and Semantic Web TechnologiesSyed Muhammad Ali Hasnain
 
FedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and ExecutionFedViz: A Visual Interface for SPARQL Queries Formulation and Execution
FedViz: A Visual Interface for SPARQL Queries Formulation and ExecutionSyed Muhammad Ali Hasnain
 

Quantifying Biomedical Semantic Resources for Drug Discovery Platforms

  • 1. Quantifying the content of Biomedical Semantic Resources as a core for Drug Discovery Platforms. Ali Hasnain and Dietrich Rebholz-Schuhmann, May 2017
  • 2. Agenda
      • Introduction
      • Motivation
      • Ontologies
          – Biomedical Ontologies
          – Drugs and Chemical Compound Ontologies
          – Upper level Ontologies
      • Data Repositories/Databases for Drug Discovery
          – Gene, Gene Expression and Protein Databases
          – Pathway databases
          – Chemical and Structure Databases
          – Disease Specific Databases for Prevention
          – Literature databases
      • Life Sciences Linked Open Data Cloud
          – Linked Open Drug Data (LODD)
          – Bio2RDF
          – LinkedLifeData
      • Related Work
      • Conclusion
  • 3. Introduction
      • Biomedical data exists as ontologies, repositories, and other open data resources, e.g. Life Science Linked Open Data (LS-LOD), that are relevant in the context of Drug Discovery and Cancer Chemoprevention.
      • The analysis gives an overview of which resources have to be considered and what amount of data requires integration, and provides the opportunity to tailor semantic solutions to specific needs in terms of size and performance.
  • 4. Motivation: We live in a world of data
  • 5. Linked Data for Cancer Chemoprevention
      • Because Biomedical Data is heterogeneous and spread across multiple sources
      [Figure: hypothesis generation with Linked Data. Labels: Literature, In silico models, Browse databases; ~2000 small molecules, ~100 molecules, ~10 interesting pathways, ~5 molecules testable in the lab.]
  • 6. Heterogeneous Data – Multiple Data Sources: DrugBank, DailyMed, ChEBI, KEGG, Reactome, SIDER, BioPAX, Medicare
  • 7. Biomedical Data Integration
      [Figure: RDF graph that joins an NCBI description and a Reactome description of EGFR through a sameAs link.
       NCBI labels: nih:EGFR, nih:EGF, "epidermal growth factor receptor", Homo sapiens, sequence CCCCGGCGCAGCGCGGCCGCAGCA GCCTCCGCCCCCCGCACGGTGTGA GCGCCCGACGCGGCCGAGGCGG …; predicates nci:has_description, nih:sequence, nih:organism, nih:interacts.
       Reactome labels: rea:EGFR, rea:Membrane, rea:Receptor, rea:Transferase; predicate rea:keyword.]
  • 8. Ontologies. These ontologies fall into three main categories:
      1. Biomedical ontologies are mainly used by biomedical applications and define the basic biological structures (e.g. genes, pathways).
      2. Drugs and Chemical Compound Ontologies relate to clinical drugs and their active ingredients.
      3. Upper level ontologies describe general concepts that many biomedical ontologies share.
  • 9. Ontology spectrum by Jimeno et al. [1]
      [1]: Antonio Jimeno-Yepes, Ernesto Jiménez-Ruiz, Rafael Berlanga, and Dietrich Rebholz-Schuhmann. Use of shared lexical resources for efficient ontological engineering. In Semantic Web Applications and Tools for Life Sciences Workshop (SWAT4LS). CEUR WS Proceedings, volume 435, pages 93–136, 2008.
  • 10. Biomedical Ontologies (selected)
      • Advancing Clinico-Genomic Trials on Cancer (ACGT) Master Ontology (MO) – data exchange in oncology, integration of clinical and molecular data
      • Biological Pathway Exchange (BioPAX) – metabolic, biochemical, transcription regulation, protein synthesis, signal transduction pathways
      • Experimental Factor Ontology (EFO) – enhance and promote consistent annotation, automatic annotation to integrate external data
      • Gene Ontology (GO) – for describing biological processes, molecular functions and cellular components of gene products
      • Medical Subject Headings (MeSH) – hierarchical structure for indexing, cataloguing, and searching for biomedical/health-related data
      • Microarray Gene Expression Data Ontology (MGED) – the biological sample, the treatment sample and the microarray chip technology in the experiment
      • National Cancer Institute (NCI) Thesaurus – integrates molecular and clinical cancer-related information to integrate, retrieve and relate concepts
      • Ontology for Biomedical Investigations (OBI) – designs, protocols, instrumentation, materials, processes, data in biological & biomedical investigations
  • 11. Drugs and Chemical Compound Ontologies (selected)
      • RxNorm – standard names for clinical drugs (active drug ingredient, dosage strength, physical form) and links
    Generic and Upper Ontologies (selected)
      • Basic Formal Ontology (BFO) – formalise entities such as 3D enduring objects and comprehending processes
      • OBO Relation Ontology (RO) – formal definitions of basic relations that cross-cut the biomedical domain
      • Provenance Ontology (PROVO) – provides classes, properties and restrictions for provenance information
  • 12. Statistical overview of implementation details of Ontologies (selected)
      Ontology  Category    Year*  Topic                                 Implementation   Classes  Properties  Individuals  Depth
      ACGT-MO   Biomedical  2008   Cancer                                OWL/CVC/RDF/XML  1769     260         61           18
      BioPAX    Biomedical  2010   Pathways                              OWL/CVC/RDF/XML  68       96          0            4
      EFO       Biomedical  2015   Experimental Factors                  OWL/CVC/RDF/XML  18596    35          0            14
      GO        Biomedical  2016   Genomics and Proteomic                OWL/CVC/RDF/XML  4419     9           0            16
      MeSH      Biomedical  2009   Health                                RDF/TTL/CSV      252375   38          0            15
      MGED      Biomedical  2009   Microarray Experiment                 OWL/CVC/RDF/XML  233      121         698          8
      NCIT      Biomedical  2007   Clinical care                         OWL/CVC/RDF/XML  118167   173         45715        16
      OBI       Biomedical  2008   Experimental Data                     OWL/CVC/RDF/XML  2932     106         178          16
      UMLS      Biomedical  1993   Biomedical/Health                     RDF              3221702  -           -            -
      RxNorm    Drugs       1993   Clinical Drugs                        OWL/CVC/RDF/XML  118555   46          0            0
      BFO       Generic     2003   Genuine Upper Ontology                OWL/CVC/RDF/XML  35       0           0            5
      RO        Generic     2005   Relations used in all OBO ontologies  OWL/CVC/RDF/XML  -        -           -            -
      PROVO     Generic     2012   PROV Data Model                       OWL/CVC/RDF/XML  30       50          4            3
      *Statistics as of Aug 2016, as listed at BioPortal. Year specifies when the most recent version was produced. “-” means information not available.
  • 13. Classes vs. Properties plot (Selected Ontologies)
      [Figure: bar chart of the number of classes and the number of properties for ACGT-MO, BioPAX, EFO, GO, MeSH, MGED, NCIT, OBI, UMLS, RxNorm, BFO, RO and PROVO.]
  • 14. Public Data Repositories for Drug Discovery
      • The databases are separated into the following categories:
          – Gene, Gene Expression and Protein Databases for gene and protein annotations as well as the expression levels and related clinical data.
          – Pathway Databases denoting the protein interactions and the overall functional outcomes.
          – Chemical and Structure Databases including Biological Activities for the information related to drugs and other chemicals, including toxicity observations and clinical trials.
          – Disease Specific Databases for Prevention, which deliver content specific to the prevention of cancer.
          – Literature Databases
  • 15. Gene, Gene Expression and Protein Databases
      • GenBank – over 65 B nucleotide bases in more than 61 M sequences
      • ArrayExpress – 65060 experiments, 1'973'776 assays; annotated data for gene expression from biological experiments
      • Gene Expression Omnibus (GEO) – 3'848 datasets; gene expression for specific studies
      • Universal Protein Resource (UniProt) – 63'686'057 sequences, 21'364'768'379 amino acids; classifications, cross-references, annotation of proteins
      • Protein Data Bank (PDB) – 118280 biological structures; evidence of experimentally validated protein structures
      • Protein Database – 30'047 protein entries, 41'327 PPIs; translated coding regions from GenBank, TPA, SwissProt, PIR, PRF, UniProt and PDB
  • 16. Pathway Databases
      • Kyoto Encyclopedia of Genes and Genomes (KEGG) – 432'883 pathway maps, 153'776 hierarchies; genome sequencing and high-throughput experimental technologies
      • Reactome – 9'386 proteins and pathway data for signalling, transcriptional regulation, translation, apoptosis, other
      • Wikipathways – 2'475 pathways; complementing e.g. KEGG, Reactome, Pathway Commons
      • cPath: Pathway Database Software – 31'698 pathways, 1'151'476 interactions; pathway visualisation, analysis and modelling
  • 17. Chemical and Structure Databases including Biological Activities
      • Chemical Compounds Database (Chembase) – 150'000 pages; compounds, their physical and chemical properties, mass spectra
      • Chemical Entities of Biological Interest (ChEBI) – 48'296 compounds; natural and synthetic atom, molecule, ion, radical, conformer
      • DrugBank – 8,261 drugs, 4,164 targets, 243 enzymes, 118 transporters; drug (chemical, pharmaceutical), drug target (sequence, structure, pathway)
      • PubChem – 89'124'401 compounds; compound neighbouring, sub/superstructure, bioactivity data
      • Aggregated Computational Toxicology Resource (ACToR) – more than 500 public sources; environmental chemicals searchable by name and structure
      • ClinicalTrials – 213'868 studies; offers information for locating clinical trials for diseases and conditions
      • TOXicology Data NETwork (TOXNET) – toxicology, hazardous chemicals, environmental health and related areas
  • 18. Disease Specific Databases for Prevention
      • Colon Chemoprevention Agents Database (CCAD) – 1,137 agents and literature data for colon chemoprevention in humans, rats, mice
      • Dietary Supplements Labels Database – 5'000 brands of dietary supplements, to compare label ingredients in different brands; links to other databases such as MedlinePlus and PubMed
      • REPAIRtoire Database – DNA damage links, pathways, proteins for DNA repair, diseases related to mutations
    Literature Databases
      • PubMed – journal citations, i.e. the primary source of information for biomedical researchers
      • PubMed Dietary Supplement Subset – dietary supplement literature including vitamin, mineral, botanical/herbal supplements
  • 19. Statistical overview of implementation details of libraries and databases (selected)
      Database        Category         Year*  Topic                              Implementation     Size/Stats
      PubMed          Literature       1996   Biomedical Literature              WebBased/CSV       11 M journal citations
      PDSS            Literature       1999   Citations of dietary supplement    WebBased           X
      DSLD            Chemoprevention  2013   Ingredients of dietary supplement  WebBased           >5000 selected brands
      ClinicalTrials  Toxicity         2000   Clinical Trials                    WebBased           213,868 studies
      TOXNET          Toxicity         1987   Toxicology Database                WebBased           X
      ACToR           Compound         2008   Chemical Toxicity Data             WebBased           >500 public sources
      DrugBank        Compound         2008   Drug Data                          WebBased/LOD       8206 drugs
      ChEBI           Compound         X      Small Molecular entities           WebBased/LOD       48,296 compounds
      PubChem         Compound         2004   Compound Structure                 WebBased/LOD       89,124,401 compounds
      ChemSpider      Chemical         2007   Compound Structure                 WebBased           >40 million structures
      KEGG            Pathway          1995   Genomic, Chemical, systemic        WebBased/LOD       432883 pathway maps
      Reactome        Pathway          2003   Pathways                           WebBased           9386 proteins
      Wikipathway     Pathway          2007   Biological pathways                WebBased           2475 pathways
      cPath           Pathway          2005   Biological pathways                Desktop/WebBased   31698 pathways
      Uniprot         Protein          2002   Protein Sequence                   WebBased/LOD       63686057 sequences
      PDB             Protein          1971   3D structural data of Proteins     WebBased/LOD       30,047 protein
      *Statistics as of Aug 2016. Year specifies when the most recent version was produced. “X” means information not available.
  • 20. Life Sciences Linked Open Data Cloud
      • Linked biomedical datasets relevant in a Cancer Chemoprevention and drug discovery scenario:
          – Linked Open Drug Data (LODD): a set of linked datasets relevant to Drug Discovery, including Drugbank, LinkedCT, DailyMed, Diseasome, SIDER, STITCH, Medicare, RxNorm, ClinicalTrials.gov, NCBI Entrez Gene and OMIM.
          – Bio2RDF: an open-source project that uses Semantic Web technologies to build and provide the largest network of Linked Data for the Life Sciences. Contains multiple linked biological databases, including pathway databases such as KEGG, PDB and several NCBI databases.
          – LinkedLifeData: a semantic data integration platform for the biomedical domain containing 5 billion RDF statements from various sources including UniProt, PubMed, EntrezGene and 20 more.
  • 21. The Linked Open Data Cloud
      “Life sciences will drive adoption of the Semantic Web, just as high-energy physics drove the early Web.” - Sir Tim Berners-Lee, 2005
      [Figure: Linked Open Data cloud highlighting Proteins, Molecules, Genes, Diseases.]
  • 22. Meaningful Biomedical Correlation
      [Figure: datasets arranged around the classes :Protein, :Molecule, :Gene and :Disease (Proteins, Molecules, Genes, Diseases). Datasets shown: UniProt, PDB, Pfam, PROSITE, ProDom, UniRef, UniParc, DailyMed, DrugBank, ChEMBL, PubChem, KEGG, Gene Ontology, GeneID, Affymetrix, HomoloGene, MGI, Diseasome, SIDER.]
  • 23. Statistical overview of datasets involved in LS-LOD, Bio2RDF and LLD (selected)
      Dataset     Category  Year*  Topic                            Size/Coverage
      Drugbank    LODD      2010   Drugs                            766920 triples, 4800 drugs
      LinkedCT    LODD      X      Clinical Trials                  25 M triples, 106000 trials
      DailyMed    LODD      2010   Drugs                            1604983 triples, >36K products
      DBpedia     LODD      2009   Drugs/Diseases/Proteins          218 M triples, 2300 drugs, 2200 proteins
      Diseasome   LODD      2010   Diseases/Genes                   91182 triples, 2600 genes
      SIDER       LODD      2010   Diseases/Side Effects            192515 triples, 63K effects, 1737 genes
      STITCH      LODD      2010   Chemicals/Proteins               7.5 M chemicals, 0.5 M proteins
      ChEMBL      LODD      2010   Assay/Proteins/Organisms         130 M triples
      Affymetrix  Bio2RDF   2014   Microarrays                      8694237 triples, 6679943 entities
      BioModels   Bio2RDF   2014   Biological/mathematical models   2380009 triples, 188308 entities
      BioPortal   Bio2RDF   2014   Biological/biomedical entities   19920395 triples, 2199594 entities
      KEGG        Bio2RDF   2014   Genes                            50197150 triples, 6533307 entities
      PharmGKB    Bio2RDF   2014   Genotypes/Phenotypes             278049209 triples, 25325504 entities
      PubMed      Bio2RDF   2014   Citations                        5005343905 triples, 412593720 entities
      Taxonomy    Bio2RDF   2014   Taxonomy                         21310356 triples, 1147211 entities
      LLD         LLD       2014   Drugs, Chromosomes               10192641644 statements
      *Statistics as of Aug 2016 (source: DataHub). Year specifies when the most recent version was produced. “X” means information not available.
  • 24. Triples vs. Unique Entities (selected LS-LOD datasets)
      [Figure: bar chart of # of triples and # of unique entities per dataset. Values as printed on the chart:]
      Dataset           # of triples  # of unique entities
      [affymetrix]      86942371      6679943
      [biomodels]       2380009       188380
      [bioportal]       19920395      2199594
      [chembl]          409942525     50061452
      [clinicaltrials]  98835804      7337123
      [ctd]             326720894     19768641
      [dbsnp]           8801487       530538
      [drugbank]        3672531       316950
      [genage]          73048         6995
      [gendr]           11663         1129
      [goa]             97520151      5950074
      [hgnc]            3628205       372136
      [homologene]      7189769       869985
      [interpro]        2323345       176579
      [iproclass]       3306107223    364255265
      [irefindex]       48781511      3110993
      [kegg]            50197150      6533307
      [linkedspl]       2174579       59776
      [lsr]             55914         5032
      [mesh]            7323864       305401
  • 25. Related Work (selected)
      • Zeginis et al. [2] proposed a “meet-in-the-middle” approach to develop the semantic model relevant for cancer chemoprevention. Relevant data was analysed bottom-up, starting from the domain, whereas a top-down approach was used to collect ontologies, vocabularies and data models.
      • Hasnain et al. [3] proposed the Linked Biomedical Dataspace (to access and use biomedical resources relevant for cancer chemoprevention) with four components:
          – a) knowledge extraction,
          – b) link creation,
          – c) query execution and
          – d) knowledge publishing.
      [2]: Zeginis, D., et al.: A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources. Semantic Web (2013)
      [3]: Hasnain, A., et al.: Linked Biomedical Dataspace: Lessons Learned integrating Data for Drug Discovery. In: International Semantic Web Conference (In-Use Track), October 2014 (2014)
  • 26. Conclusion
      • In this paper we introduce and classify different tiers of biomedical data relevant to the Cancer Chemoprevention and Drug Discovery domain.
      • This involves ontologies, databases and Life Science Linked Open Data in the Healthcare, Life Sciences and Biomedical domain.
      • We classify ontologies into three main classes:
          – i) Biomedical Ontologies (e.g. EFO, OBI, GO),
          – ii) Drugs and Chemical Compound Ontologies (e.g. RxNorm) and
          – iii) Generic and Upper Ontologies (e.g. BFO, RO, PROV).
      • Similarly, we categorise libraries and databases into five categories:
          – (i) Gene, Gene Expression and Protein Databases,
          – (ii) Pathway databases,
          – (iii) Chemical and Structure Databases including Biological Activities,
          – (iv) Disease Specific Databases for Prevention, and
          – (v) Literature databases.
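
The introduction argues that knowing the size of each resource helps tailor semantic solutions. As a rough illustration, the Bio2RDF rows of the slide 23 table can be compared by average triples per unique entity. This is a minimal Python sketch: the ratio is an assumed density indicator chosen for illustration, not a measure defined in the slides, and the names `BIO2RDF` and `triples_per_entity` are hypothetical.

```python
# (number of triples, number of unique entities), copied from the
# Bio2RDF rows of the slide 23 table.
BIO2RDF = {
    "Affymetrix": (8694237, 6679943),
    "BioModels": (2380009, 188308),
    "BioPortal": (19920395, 2199594),
    "KEGG": (50197150, 6533307),
    "PharmGKB": (278049209, 25325504),
    "PubMed": (5005343905, 412593720),
    "Taxonomy": (21310356, 1147211),
}


def triples_per_entity(stats):
    """Average number of triples per unique entity for each dataset."""
    return {name: triples / entities for name, (triples, entities) in stats.items()}


ratios = triples_per_entity(BIO2RDF)
total_triples = sum(triples for triples, _ in BIO2RDF.values())

# Print datasets from densest to sparsest description per entity.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<12}{ratio:8.2f}")
print("total triples:", total_triples)
```

A higher ratio only says that entities carry more statements on average; it does not measure data quality or link coverage.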

Editor's Notes

  1. We live in a world of data!
  2. Link to next slide: Linked Data facilitates the assembly of complex queries and workflows.
  3. To discover which links our datasets could have to other data sources, we explored what types of data are published in the Linked Open Data cloud. What we found was a lot of messy data: looking at 8 datasets containing molecular data, their descriptions are very different. ChEBI calls molecules compounds, DrugBank calls them drugs, and DailyMed calls them drugs as well but uses a different identifier. Link: how do we start linking all of these datasets so that they can be made available in a unified query interface?
  4. EGFR: Epidermal growth factor receptor
  5. Biomedical Ontologies: Advancing Clinico-Genomic Trials on Cancer (ACGT) Master Ontology (MO); Biological Pathway Exchange (BioPAX); Experimental Factor Ontology (EFO); Gene Ontology (GO); Medical Subject Headings (MeSH); Microarray Gene Expression Data Ontology (MGED); National Cancer Institute (NCI) Thesaurus; Ontology for Biomedical Investigations (OBI); Unified Medical Language System (UMLS).
     Drugs and Chemical Compound Ontologies: RxNorm.
     Generic and Upper Ontologies: Basic Formal Ontology (BFO); OBO Relation Ontology (RO); Provenance Ontology (PROVO).
  6. Literature Databases: PubMed; PubMed Dietary Supplement Subset.
     Natural Sources of Chemoprevention Agents Databases: Dietary Supplements Labels Database.
     Toxicity and Efficacy Databases: ClinicalTrials; TOXicology Data NETwork (TOXNET).
     Biological Activity of Compounds Databases: Aggregated Computational Toxicology Resource (ACToR); DrugBank; Chemical Entities of Biological Interest (ChEBI); PubChem; REPAIRtoire Database.
     Gene Expression Databases: Cancer Gene Expression Database (CGED); ArrayExpress; Gene Expression Omnibus (GEO).
     Gene and DNA Databases: GenBank.
     Chemical and Physical Structure Databases: ChemSpider; Chemical Compounds Database (Chembase); Sigma-Aldrich; ChemDB.
     Disease Specific Compound Databases: Colon Chemoprevention Agents Database.
     Pathway Databases: Kyoto Encyclopedia of Genes and Genomes (KEGG); Reactome; Wikipathways; cPath: Pathway Database Software.
     Protein Databases: Universal Protein Resource (UniProt); Protein Data Bank (PDB); Protein Database.
  7. M: When data is catalogued, we can discover new links by cross-referencing with existing datasets. Once we identify these concepts, how do we actually query them together?
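
Notes 3 and 7 ask how linked datasets can be queried together. The following is a minimal Python sketch of the idea behind the EGFR figure on slide 7: triples from two sources are merged by rewriting identifiers that are connected through sameAs. The edges are one plausible reading of the figure labels and are an assumption, the sequence triple is left out for brevity, and the helper names `canonical`, `merge_same_as` and `describe` are hypothetical.

```python
# Triples are (subject, predicate, object) tuples. The edges are assumed
# from the labels on slide 7, not taken from a real NCBI or Reactome dump.
NCBI = [
    ("nih:EGFR", "nci:has_description", "epidermal growth factor receptor"),
    ("nih:EGFR", "nih:organism", "Homo sapiens"),
    ("nih:EGFR", "nih:interacts", "nih:EGF"),
    ("nih:EGF", "nih:organism", "Homo sapiens"),
]
REACTOME = [
    ("rea:EGFR", "rea:keyword", "rea:Membrane"),
    ("rea:EGFR", "rea:keyword", "rea:Receptor"),
    ("rea:EGFR", "rea:keyword", "rea:Transferase"),
]
# Each pair states that the two identifiers denote the same thing.
SAME_AS = [("nih:EGFR", "rea:EGFR")]


def canonical(term, same_as):
    """Map a term to the left-hand identifier of its sameAs pair, if any."""
    for left, right in same_as:
        if term == right:
            return left
    return term


def merge_same_as(sources, same_as):
    """Union all triples, rewriting subjects and objects to canonical identifiers."""
    merged = set()
    for triples in sources:
        for s, p, o in triples:
            merged.add((canonical(s, same_as), p, canonical(o, same_as)))
    return merged


def describe(graph, subject):
    """Return the sorted (predicate, object) pairs known for one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


graph = merge_same_as([NCBI, REACTOME], SAME_AS)
for predicate, obj in describe(graph, "nih:EGFR"):
    print(predicate, obj)
```

After the merge, a single lookup on nih:EGFR returns both the NCBI statements and the Reactome keywords. The sketch handles only direct sameAs pairs; chains of equivalences would need a transitive closure, and a real deployment would more likely use an RDF store and SPARQL.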