DisGeNET: A discovery platform for the dynamical exploration of human disease... (Núria Queralt Rosinach)
The document describes DisGeNET, a discovery platform for exploring gene-disease associations through integrated data sources including expert-curated databases and text mining of biomedical literature. DisGeNET contains over 17,000 genes and 14,000 diseases with 429,000 associations, integrating information from various sources through normalization. It provides tools for users to explore gene-disease relationships and supports biomedical research.
Making it Easier, Possibly Even Pleasant, to Author Rich Experimental Metadata (Michel Dumontier)
Biomedical researchers will remain stymied in their ability to take full advantage of the Big Data revolution if they can never find the datasets that they need to analyze, if there is lack of clarity about what particular datasets contain, and if data are insufficiently described.
CEDAR, an NIH BD2K Center of Excellence, aims to develop methods and tools to vastly ease the burden of authoring good experimental metadata, and to maximally use this information to zero in on datasets of interest.
The DisGeNET R package (disgenet2r) allows users to query and analyze gene-disease association data from the DisGeNET knowledge platform within the R environment. It provides functions to retrieve associations between genes and diseases, variants and diseases, and disease-disease relationships, and enables visualization of results as networks and heatmaps. The package integrates with other R/Bioconductor packages and can be used for knowledge discovery and hypothesis generation to study gene-disease relationships and support precision medicine.
dkNET is a portal that provides researchers access to diverse research resources and tools to improve rigor and reproducibility. It supports the use of Research Resource Identifiers (RRIDs) to properly identify research tools in publications. dkNET also provides Resource Reports that give detailed information on tools, their usage metrics, ratings, and alerts about potential issues. Additionally, it offers services like Reproducibility Reports and a Hypothesis Center to facilitate rigorous research.
Powering Scientific Discovery with the Semantic Web (VanBUG 2014) (Michel Dumontier)
The document discusses how the semantic web can help power scientific discovery. It proposes building a massive network of interconnected data and software using web standards to 1) generate and test hypotheses by discovering associations in the data, 2) gather evidence to support or dispute hypotheses, and 3) contribute new knowledge back to the global network. This network, called the semantic web, treats data as a web of facts that can be shared and queried using semantic web standards. The document provides examples of how linked open data in the life sciences is being created and used via semantic web technologies to integrate data from multiple sources and answer complex queries.
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval.
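As an illustration of the style of description such a specification enables, a minimal dataset record in RDF (Turtle) might look like the fragment below. The IRI and property values are invented for this sketch; the specification itself defines the full set of required and recommended properties.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix pav:  <http://purl.org/pav/> .

# Hypothetical dataset description in the DCAT/Dublin Core style
<http://example.org/dataset/chem-db> a dcat:Dataset ;
    dct:title       "Example Chemistry Dataset"@en ;
    dct:description "An illustrative dataset description."@en ;
    dct:publisher   <http://example.org/org/example-lab> ;
    dct:license     <http://creativecommons.org/licenses/by/4.0/> ;
    pav:version     "1.0" .
```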
Citing data in research articles: principles, implementation, challenges - an... (FAIRDOM)
Prepared and presented by Jo McEntyre (EMBL-EBI) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
Making Data FAIR (Findable, Accessible, Interoperable, Reusable) (Tom Plasterer)
What to do About FAIR…
In the experience of most pharma professionals, FAIR remains fairly abstract, bordering on inconclusive. This session will outline specific case studies – real problems with real data, and address opportunities and real concerns.
Why making data Findable, Accessible, Interoperable and Reusable is important.
Talk presented at the Data Driven Drug Development (D4) conference on March 20th, 2019.
The document outlines plans to transition the cBioPortal cancer genomics platform to an open source model with coordinated development between Memorial Sloan Kettering Cancer Center, Dana-Farber Cancer Institute, and Princess Margaret Cancer Centre. It discusses expanding usage, new features, funding options, and establishing an advisory committee. The goal is to build a sustainable open source community through collaborative development, additional funding, and engagement with users and potential contributors.
Semantic web technologies offer a potential mechanism for the representation and integration of thousands of biomedical databases. Many of these databases offer cross-references to other data sources, but these are generally incomplete and prone to error. In this paper, we conduct an empirical analysis of the link structure of life science Linked Data, obtained from the Bio2RDF project. Three different link graphs for datasets, entities and terms are characterized by degree, connectivity, and clustering metrics, and their correlation is measured as well. Furthermore, we utilize the symmetry and transitivity of entity links to build a benchmark and evaluate several popular entity matching approaches. Our findings indicate that the life science data network can help find hidden links, can be used to validate links, and may offer a mechanism to integrate a wider set of resources to support biomedical knowledge discovery.
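The symmetry-and-transitivity idea above can be sketched in a few lines: if two sources cross-reference each other in both directions, the link is strongly supported, and chains of links suggest candidate links that are missing from the data. The identifiers below are made up for illustration and are not drawn from Bio2RDF.

```python
# Toy illustration of using symmetry and transitivity of cross-reference
# links to validate existing links and surface candidate missing ones.
# All identifiers here are hypothetical.

links = {
    ("a:1", "b:7"), ("b:7", "a:1"),   # asserted in both directions
    ("b:7", "c:3"),                   # one-way link
    ("a:2", "b:9"),
}

def symmetric_pairs(links):
    """Links asserted in both directions: strong evidence both are valid."""
    return {(x, y) for (x, y) in links if (y, x) in links}

def transitive_candidates(links):
    """x->y and y->z suggest x->z; candidates absent from the data flag gaps."""
    out = set()
    for (x, y) in links:
        for (y2, z) in links:
            if y == y2 and x != z:
                out.add((x, z))
    return out - links

sym = symmetric_pairs(links)            # both halves of the a:1/b:7 pair
missing = transitive_candidates(links)  # the inferred link ("a:1", "c:3")
```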
This document discusses opportunities for using the open source cBioPortal platform in a commercial setting. It summarizes The Hyve's experiences supporting cBioPortal for the Center for Translational Molecular Medicine's TraIT project. The Hyve provides professional support for open source bioinformatics software like cBioPortal through software development, data services, consultancy, and hosting. For translation projects, The Hyve employs a phased approach including definition, pilot, implementation, and evaluation phases to implement cBioPortal and demonstrate its capabilities for data integration and analysis.
Generating Biomedical Hypotheses Using Semantic Web Technologies (Michel Dumontier)
With its focus on investigating the nature and basis for the sustained existence of living systems, modern biology has always been a fertile, if not challenging, domain for formal knowledge representation and automated reasoning. Over the past 15 years, hundreds of projects have developed or leveraged ontologies for entity recognition and relation extraction, semantic annotation, data integration, query answering, consistency checking, association mining and other forms of knowledge discovery. In this talk, I will discuss our efforts to build a rich foundational network of ontology-annotated linked data, discover significant biological associations across these data using a set of partially overlapping ontologies, and identify new avenues for drug discovery by applying measures of semantic similarity over phenotypic descriptions. As the portfolio of Semantic Web technologies continues to mature in terms of functionality, scalability and an understanding of how to maximize their value, increasing numbers of biomedical researchers will be strategically poised to pursue increasingly sophisticated KR projects aimed at improving our overall understanding of the capability and behavior of biological systems.
Beacon Network: A System for Global Genomic Data Sharing (Miro Cupak)
The Beacon Network provides a system for global genomic data sharing by allowing users to query a network of genetic data sources to determine if a particular genetic variant or mutation exists in their databases. It began as a web service called Beacon that responds with "yes" or "no" to questions about genetic mutations. The Beacon Network expands this by distributing queries across multiple beacons and aggregating the results. It currently includes over 25 genomic organizations with access to over 2 million samples and 2 billion genetic variants, serving hundreds of thousands of queries from users around the world. The goal is to facilitate discovery of new links between genetic data and health conditions.
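The yes/no query model described above can be sketched in a few lines. This is a minimal illustration of the protocol idea only, using made-up in-memory beacons rather than the real Beacon web API, whose actual request and response formats are defined by its specification.

```python
# Minimal sketch of the Beacon idea: each "beacon" answers yes/no to
# "do you hold this variant?", and a network distributes one query
# across all beacons and aggregates the answers.
# Beacon contents below are invented for illustration.

def make_beacon(variants):
    """A beacon is just a membership test over a private variant set."""
    held = set(variants)
    return lambda variant: variant in held

def query_network(beacons, variant):
    """Fan one query out to every beacon and collect the yes/no answers."""
    return {name: beacon(variant) for name, beacon in beacons.items()}

# Variants keyed as (chromosome, position, reference base, alternate base)
beacons = {
    "org-a": make_beacon({("13", 32315474, "G", "A"), ("1", 100, "T", "C")}),
    "org-b": make_beacon({("13", 32315474, "G", "A")}),
    "org-c": make_beacon({("2", 200, "A", "G")}),
}

result = query_network(beacons, ("13", 32315474, "G", "A"))
# org-a and org-b answer yes; org-c answers no
```

Because each beacon reveals only presence or absence, not the underlying records, institutions can participate without exposing their raw genomic data.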
Beacon Network: A System for Global Genomic Data Sharing (Miro Cupak)
A global federated network called the Beacon Network was created to share genomic data in order to drive discoveries and applications in medicine. The Beacon Network allows users to query beacons from various genomic organizations to discover if they have information about specific genetic mutations. It translates queries and intelligently distributes them across over 60 beacons from 25 organizations. Since launching, the Beacon Network has served over 400,000 queries from users in over 100 countries, resulting in over 2 million queries to participants of the network.
Beacon: A Protocol for Federated Discovery and Sharing of Genomic Data (Miro Cupak)
The document summarizes Beacon, a protocol for federated discovery and sharing of genomic data across institutions. It allows institutions to share whether they have information on specific genetic mutations through a standardized web service API. Over 25 genomic organizations, representing over 160 datasets, participate in the Beacon Network, which searches across participating beacons, aggregates the results, and has served over 2 million queries.
dkNET Webinar: Creating and Sustaining a FAIR Biomedical Data Ecosystem 10/09... (dkNET)
Abstract
In this presentation, Susan Gregurick, Ph.D., Associate Director of Data Science and Director, Office of Data Science Strategy at the National Institutes of Health, will share the NIH’s vision for a modernized, integrated FAIR biomedical data ecosystem and the strategic roadmap that NIH is following to achieve this vision. Dr. Gregurick will highlight projects being implemented by team members across the NIH’s 27 institutes and centers and will discuss ways that industry, academia, and other communities can help NIH enable a FAIR data ecosystem. Finally, she will weave in how this strategy is being leveraged to address the COVID-19 pandemic.
Presenter: Susan Gregurick, Ph.D., Associate Director of Data Science and Director, Office of Data Science Strategy at the National Institutes of Health
dkNET Webinar Information: https://dknet.org/about/webinar
dkNET Webinar: "The Microphysiology Systems Database (MPS-Db): A Platform For... (dkNET)
This document discusses harnessing quantitative systems pharmacology to deliver personalized medicine through the Microphysiology Systems Database (MPS-Db). The MPS-Db is presented as a solution for aggregating, analyzing, sharing, and modeling in vitro data from microphysiology systems (MPS) to accelerate drug development and implementation of personalized medicine approaches. Key features and capabilities of the MPS-Db are described, including supporting various model types, integrating with other databases, and performing data analysis, reproducibility evaluation, and computational modeling. Commercial and non-profit versions are discussed.
Reproducible and citable data and models: an introduction. (FAIRDOM)
Prepared and presented by Carole Goble (University of Manchester), Wolfgang Mueller (HITS), Dagmar Waltemath (University of Rostock), at the Reproducible and Citable Data and Models Workshop, Warnemünde, Germany. September 14th - 16th 2015.
2015-02-10 The Open PHACTS Discovery Platform: Semantic Data Integration for ... (open_phacts)
The Open PHACTS Discovery Platform integrates multiple biomedical data resources into a single open access point using semantic web technology. It is guided by business questions from pharmaceutical companies to integrate data from sources like ChEMBL, DrugBank, UniProt, and more. The platform is run as a public-private partnership through 2021 to support drug discovery.
FAIR data and model management for systems biology. (FAIRDOM)
Written and presented by Carole Goble (University of Manchester) as part of Intelligent Systems for Molecular Biology (ISMB), Dublin. July 10th - 14th 2015.
FAIR Data and Model Management for Systems Biology (and SOPs too!) (Carole Goble)
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
FAIR Data and model management for Systems Biology
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Yes, data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. And the multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Data and model management for the Systems Biology community is a multi-faceted challenge, including: the development and adoption of appropriate community standards (and the navigation of the standards maze); the sustaining of international public archives capable of servicing quantitative biology; and the development of the necessary tools and know-how for researchers within their own institutes so that they can steward their assets in a sustainable, coherent and credited manner while minimizing burden and maximising personal benefit.
The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has grown out of several efforts in European programmes (the SysMO and EraSysAPP ERA-Nets and the ISBE ESFRI) and national initiatives (de.NBI, the German Virtual Liver Network, SystemsX, UK SynBio centres). It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges multi-scale biology presents.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
This document introduces FAIRDOM, a consortium that provides a platform and services to help researchers organize, manage, share, and preserve research outputs according to FAIR principles. FAIRDOM has been in operation for 10 years and has over 50 installations supporting over 118 projects. It provides tools and services to help researchers collaborate better and integrate their data, models, publications and other research objects. FAIRDOM also works with other organizations and infrastructure providers to support broader research initiatives.
Written and presented by Carole Goble (University of Manchester) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
This document summarizes the BioAssay Research Database (BARD), a public database developed to provide access to bioassay data from the NIH Molecular Libraries Program (MLP). BARD has curated and migrated data from over 600 MLP projects, standardizing the metadata using a controlled vocabulary. This allows for systematic cross-assay analysis. BARD supports data depositors, data miners accessing and querying the database, and software developers building new tools using the BARD application programming interface.
FAIRy stories: tales from building the FAIR Research Commons (Carole Goble)
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Title: FAIRy stories: tales from building the FAIR Research Commons
Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any kind of Research Object is a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVel), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
COMBINE 2019, EU-STANDS4PM, Heidelberg, Germany 18 July 2019
FAIR: Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any other kind of Research Object one can think of, is now a mantra; a method; a meme; a myth; a mystery. FAIR is about supporting and tracking the flow and availability of data across research organisations and the portability and sustainability of processing methods to enable transparent and reproducible results. All this is within the context of a bottom up society of collaborating (or burdened?) scientists, a top down collective of compliance-focused funders and policy makers and an in-the-middle posse of e-infrastructure providers.
Making the FAIR principles a reality is tricky. They are aspirations, not standards. They are multi-dimensional and dependent on context such as the sensitivity and availability of the data and methods. We already see a jungle of projects, initiatives and programmes wrestling with the challenges. FAIR efforts have particularly focused on the “last mile” – “FAIRifying” destination community archive repositories and measuring their “compliance” to FAIR metrics (or less controversially “indicators”). But what about FAIR at the first mile, at source, and how do we help Alice and Bob with their (secure) data management? If we tackle the FAIR first and last mile, what about the FAIR middle? What about FAIR beyond just data – like exchanging and reusing pipelines for precision medicine?
Since 2008 the FAIRDOM collaboration [1] has worked on FAIR asset management and the development of a FAIR asset Commons for multi-partner researcher projects [2], initially in the Systems Biology field. Since 2016 we have been working with the BioCompute Object Partnership [3] on standardising computational records of HTS precision medicine pipelines.
So, using our FAIRDOM and BioCompute Object binoculars let’s go on a FAIR safari! Let’s peruse the ecosystem, observe the different herds and reflect on where we are for FAIR personalised medicine.
References
[1] http://www.fair-dom.org
[2] http://www.fairdomhub.org
[3] http://www.biocomputeobject.org
How are we Faring with FAIR? (and what FAIR is not) (Carole Goble)
Keynote presented at the workshop FAIRe Data Infrastructures, 15 October 2020
https://www.gmds.de/aktivitaeten/medizinische-informatik/projektgruppenseiten/faire-dateninfrastrukturen-fuer-die-biomedizinische-informatik/workshop-2020/
Remarkably it was only in 2016 that the ‘FAIR Guiding Principles for scientific data management and stewardship’ appeared in Scientific Data. The paper was intended to launch a dialogue within the research and policy communities: to start a journey to wider accessibility and reusability of data and prepare for automation-readiness by supporting findability, accessibility, interoperability and reusability for machines. Many of the authors (including myself) came from biomedical and associated communities. The paper succeeded in its aim, at least at the policy, enterprise and professional data infrastructure level. Whether FAIR has impacted the researcher at the bench or bedside is open to doubt. It certainly inspired a great deal of activity, many projects, a lot of positioning of interests and raised awareness. COVID has injected impetus and urgency to the FAIR cause (good) and also highlighted its politicisation (not so good).
In this talk I’ll make some personal reflections on how we are faring with FAIR: as one of the original principles authors; as a participant in many current FAIR initiatives (particularly in the biomedical sector and for research objects other than data) and as a veteran of FAIR before we had the principles.
- The document discusses Harry Hochheiser's research in translational bioinformatics and challenges in data sharing.
- It describes the FaceBase project, which aims to compile biological data related to craniofacial development across multiple organisms and datasets.
- Effective data sharing is challenging due to the diversity of data types and projects involved; metadata and ontologies could help but have not been fully leveraged.
tranSMART Community Meeting 5-7 Nov 13 - Session 1: Translational Drug Disco... (David Peyruc)
This document summarizes Andy Plump's presentation on translational drug discovery at Sanofi. It discusses two pillars of Sanofi's strategy: translational medicine and open innovation. Translational medicine focuses on human genetics, biology and disease to select targets and design clinical trials, moving from patients to research and back. Four success stories are highlighted: PCSK9 for heart disease, TrkA for pain, p53 for cancer, and glycolipids for Gaucher's disease. The presentation emphasizes applying lessons from human genetics and biology throughout the drug development process.
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
FAIR Data and model management for Systems Biology
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is, the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Yes, data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post-publication. And the multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Data and model management for the Systems Biology community is a multi-faceted challenge, including: the development and adoption of appropriate community standards (and the navigation of the standards maze); the sustaining of international public archives capable of servicing quantitative biology; and the development of the necessary tools and know-how for researchers within their own institutes so that they can steward their assets in a sustainable, coherent and credited manner while minimising burden and maximising personal benefit.
The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has grown out of several efforts in European programmes (SysMO and EraSysAPP ERANets and the ISBE ESRFI) and national initiatives (de.NBI, German Virtual Liver Network, SystemsX, UK SynBio centres). It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges multi-scale biology presents.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
This document introduces FAIRDOM, a consortium that provides a platform and services to help researchers organize, manage, share, and preserve research outputs according to FAIR principles. FAIRDOM has been in operation for 10 years and has over 50 installations supporting over 118 projects. It provides tools and services to help researchers collaborate better and integrate their data, models, publications and other research objects. FAIRDOM also works with other organizations and infrastructure providers to support broader research initiatives.
Written and presented by Carole Goble (University of Manchester) as part of the Reproducible and Citable Data and Models Workshop in Warnemünde, Germany. September 14th - 16th 2015.
This document summarizes the BioAssay Research Database (BARD), a public database developed to provide access to bioassay data from the NIH Molecular Libraries Program (MLP). BARD has curated and migrated data from over 600 MLP projects, standardizing the metadata using a controlled vocabulary. This allows for systematic cross-assay analysis. BARD supports data depositors, data miners accessing and querying the database, and software developers building new tools using the BARD application programming interface.
FAIRy stories: tales from building the FAIR Research CommonsCarole Goble
Plenary Lecture Presented at INCF Neuroinformatics 2019 https://www.neuroinformatics2019.org
Title: FAIRy stories: tales from building the FAIR Research Commons
Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any kind of Research Object are a mantra; a method; a meme; a myth; a mystery. For the past 15 years I have been working on FAIR in a range of projects and initiatives in the Life Sciences as we try to build the FAIR Research Commons. Some are top-down like the European Research Infrastructures ELIXIR, ISBE and IBISBA, and the NIH Data Commons. Some are bottom-up, supporting FAIR for investigator-led projects (FAIRDOM), biodiversity analytics (BioVel), and FAIR drug discovery (Open PHACTS, FAIRplus). Some have become movements, like Bioschemas, the Common Workflow Language and Research Objects. Others focus on cross-cutting approaches in reproducibility, computational workflows, metadata representation and scholarly sharing & publication. In this talk I will relate a series of FAIRy tales. Some of them are Grimm. There are villains and heroes. Some have happy endings; all have morals.
COMBINE 2019, EU-STANDS4PM, Heidelberg, Germany 18 July 2019
FAIR: Findable, Accessible, Interoperable, Reusable. The “FAIR Principles” for research data, software, computational workflows, scripts, or any other kind of Research Object one can think of, are now a mantra; a method; a meme; a myth; a mystery. FAIR is about supporting and tracking the flow and availability of data across research organisations and the portability and sustainability of processing methods to enable transparent and reproducible results. All this is within the context of a bottom-up society of collaborating (or burdened?) scientists, a top-down collective of compliance-focused funders and policy makers, and an in-the-middle posse of e-infrastructure providers.
Making the FAIR principles a reality is tricky. They are aspirations not standards. They are multi-dimensional and dependent on context such as the sensitivity and availability of the data and methods. We already see a jungle of projects, initiatives and programmes wrestling with the challenges. FAIR efforts have particularly focused on the “last mile” – “FAIRifying” destination community archive repositories and measuring their “compliance” to FAIR metrics (or less controversially “indicators”). But what about FAIR at the first mile, at source and how do we help Alice and Bob with their (secure) data management? If we tackle the FAIR first and last mile, what about the FAIR middle? What about FAIR beyond just data – like exchanging and reusing pipelines for precision medicine?
Since 2008 the FAIRDOM collaboration [1] has worked on FAIR asset management and the development of a FAIR asset Commons for multi-partner researcher projects [2], initially in the Systems Biology field. Since 2016 we have been working with the BioCompute Object Partnership [3] on standardising computational records of HTS precision medicine pipelines.
So, using our FAIRDOM and BioCompute Object binoculars, let’s go on a FAIR safari! Let’s peruse the ecosystem, observe the different herds and reflect on where we are with FAIR personalised medicine.
References
[1] http://www.fair-dom.org
[2] http://www.fairdomhub.org
[3] http://www.biocomputeobject.org
How are we Faring with FAIR? (and what FAIR is not)Carole Goble
Keynote presented at the workshop FAIRe Data Infrastructures, 15 October 2020
https://www.gmds.de/aktivitaeten/medizinische-informatik/projektgruppenseiten/faire-dateninfrastrukturen-fuer-die-biomedizinische-informatik/workshop-2020/
Remarkably it was only in 2016 that the ‘FAIR Guiding Principles for scientific data management and stewardship’ appeared in Scientific Data. The paper was intended to launch a dialogue within the research and policy communities: to start a journey to wider accessibility and reusability of data and prepare for automation-readiness by supporting findability, accessibility, interoperability and reusability for machines. Many of the authors (including myself) came from biomedical and associated communities. The paper succeeded in its aim, at least at the policy, enterprise and professional data infrastructure level. Whether FAIR has impacted the researcher at the bench or bedside is open to doubt. It certainly inspired a great deal of activity, many projects, a lot of positioning of interests and raised awareness. COVID has injected impetus and urgency to the FAIR cause (good) and also highlighted its politicisation (not so good).
In this talk I’ll make some personal reflections on how we are faring with FAIR: as one of the original principles authors; as a participant in many current FAIR initiatives (particularly in the biomedical sector and for research objects other than data) and as a veteran of FAIR before we had the principles.
- The document discusses Harry Hochheiser's research in translational bioinformatics and challenges in data sharing.
- It describes the FaceBase project, which aims to compile biological data related to craniofacial development across multiple organisms and datasets.
- Effective data sharing is challenging due to the diversity of data types and projects involved; metadata and ontologies could help but have not been fully leveraged.
tranSMART Community Meeting 5-7 Nov 13 - Session 1: Translational Drug Disco...David Peyruc
This document summarizes Andy Plump's presentation on translational drug discovery at Sanofi. It discusses two pillars of Sanofi's strategy: translational medicine and open innovation. Translational medicine focuses on human genetics, biology and disease to select targets and design clinical trials, moving from patients to research and back. Four success stories are highlighted: PCSK9 for heart disease, TrkA for pain, P53 for cancer, and glycolipids for Gaucher's disease. The presentation emphasizes applying lessons from human genetics and biology throughout the drug development process.
Establishing validity, reproducibility, and utility of highly scalable geneti...Human Variome Project
Background: New technologies and increased competition have improved, and will continue to improve, the cost-effectiveness of genetic testing, making genetic analysis more accessible to medical practices worldwide. However, challenges remain in establishing the validity of such tests. Moreover, many patients harbor rare or novel variants, and classification is likely to remain a bottleneck in broader deployment of genetic medicine.
The Controlled Natural Language of Randall Munroe’s Thing Explainer Tobias Kuhn
It is rare that texts or entire books written in a Controlled Natural Language (CNL) become very popular, but exactly this has happened with a book that has been published last year. Randall Munroe's Thing Explainer uses only the 1'000 most often used words of the English language together with drawn pictures to explain complicated things such as nuclear reactors, jet engines, the solar system, and dishwashers. This restricted language is a very interesting new case for the CNL community. I describe here its place in the context of existing approaches on Controlled Natural Languages, and I provide a first analysis from a scientific perspective, covering the word production rules and word distributions.
This document summarizes a presentation on new sources of big data for precision medicine. It discusses how new data sources like genomics, the human microbiome, epigenomics, and the exposome are generating large amounts of data. It then covers the evolution of precision medicine from concepts like personalized medicine and how strategic initiatives in the UK and US are supporting precision medicine research through funding programs and projects like the Cancer Genome Atlas, eMERGE, and exposome studies. The presentation raises the question of whether we are ready for precision medicine given these new data sources and research efforts.
CINECA webinar slides: Open science through fair health data networks dream o...CINECAProject
Since the FAIR data principles were published in 2016, many organizations including science funders and governments have adopted these principles to promote and foster true open science collaborations. However, to define a vision and create a video of a Personal Health Train that leverages worldwide FAIR health data in a federated manner is one step. To actually make this happen at scale and be able to show new scientific and medical insights for it is quite another!
In this webinar, we will dive into the basics of FAIR health data, but also take stock of the current situation in health data networks: after a year of frantic research and collaborations and many open datasets and hackathons on COVID-19, has the situation actually improved? Are we sharing health data on a global scale to improve medical practice, or is quality medical data still only accessible to researchers with the right credentials and deep pockets?
This webinar is part of the “How FAIR are you” webinar series and hackathon, which aim at increasing and facilitating the uptake of FAIR approaches into software, training materials and cohort data, to facilitate responsible and ethical data and resource sharing and implementation of federated applications for data analysis.
The CINECA webinar series aims to discuss ways to address common challenges and share best practices in the field of cohort data analysis, as well as distribute CINECA project results. All CINECA webinars include an audience Q&A session during which attendees can ask questions and make suggestions. Please note that all webinars are recorded and available for later viewing.
This webinar took place on 21st January 2021 and is part of the CINECA webinar series.
For previous and upcoming CINECA webinars see:
https://www.cineca-project.eu/webinars
The Human Variome Database in Australia in 2014 - Graham TaylorHuman Variome Project
There are a number of genetics and genomics initiatives underway in Australia, including the Australian node of the Human Variome Project (HVPA), as well as many active research collaborations including familial cancer, endocrine disease, and developmental delay. Most of these projects work with disease-specific databases on a research basis, with the risk that such archives may be ephemeral. HVPA is the only database that is directly integrated with accredited clinical reporting of variants. As such it is designed to capture variants that have passed scrutiny as diagnostically robust, and have therefore already been curated by qualified staff. Registered users access the HVPA database via a secure Internet portal.
I will describe three recent developments of the HVPA database and portal: the upgraded search interface, linkage to other datasets via BioGrid using hash-based de-identified case matching, and the introduction of a genome wide database using LOVD3. Finally I will discuss the future direction of the HVPA and the questions of utility, quality control and sustainability of genetic variation databases.
Search interface
The search interface has to provide useful tools for clinicians and lab scientists so that the HVPA project offers them direct benefits and incentivises them to participate. Following a request for feedback from users, a series of improvements were implemented, initially on a demonstration server and then on the live server following review by the Steering Committee. The highest priorities were for more information about the number of times particular variants were recorded, the ability to search by range, and the ability to filter by pathogenicity. There was also interest in enabling direct uploading of VCF files and the automated calculation of pathogenicity scores. Many of these features are now implemented and examples will be presented.
Linkage to other datasets
We have implemented the hash key algorithm and work is in progress with BioGrid to link variation data to clinical data sets.
Genome wide database
We have established an HVPA LOVD3 database and are working with the Human Genetics Society of Australasia on a pilot study to sequence the exomes of two trios and review the data using this database.
Building a Network of Interoperable and Independently Produced Linked and Ope...Michel Dumontier
Over 15 years ago, Sir Tim Berners-Lee proclaimed an exciting new future involving intelligent agents operating over smarter data in order to perform complex tasks at the behest of their human controllers. At the heart of this vision lies an uneasy alliance between tedious formal knowledge representations and powerful analytics over big, but often messy data. Bio2RDF, our decade-old open source project to create Linked Data for the life sciences, has weaved emergent Semantic Web technologies such as ontologies and Linked Data to generate FAIR - Findable, Accessible, Interoperable, and Reusable - data in the form of billions of machine accessible statements for use in downstream biomedical discovery.
This revolution in data publication has been strengthened by action from global bioinformatics institutions such as the NCBI, NCBO, EBI, and DBCLS. Notably, NCBI's PubChem has successfully coupled large scale data integration with community-based standards to offer a remarkable biochemical knowledge resource amenable to data hungry discovery tools. Yet, in the face of increasing pressure from researchers, funders, and publishers, will these approaches be sufficient for growing and maintaining a comprehensive knowledge graph that is inclusive of all biomedical research?
IRIDA: Canada’s federated platform for genomic epidemiology William Hsiao
This document summarizes the IRIDA platform, a federated genomic epidemiology platform for Canada. IRIDA aims to (1) build a user-friendly analysis platform to process genomic data, (2) enable more efficient information sharing between public health agencies, and (3) standardize inconsistent information representation through the use of ontologies. The platform is a partnership between various public health and academic institutions to bridge gaps between genomic research and applications in public health outbreak investigations.
IRIDA: Canada’s federated platform for genomic epidemiology, ABPHM 2015 WHsiaoIRIDA_community
This document summarizes the IRIDA platform, a federated genomic epidemiology platform for Canada. IRIDA aims to bridge gaps between advances in genomic epidemiology and real-time application in public health. It is developing solutions such as building a user-friendly analysis platform, implementing security and role-based sharing of genomic data, and using ontologies to standardize inconsistent information representation and address the complexity of genomic data interpretation. The IRIDA platform is in beta testing and plans continued development and training workshops.
Tutorial on the DisGeNET Discovery Platform, with special focus on its use in the Semantic Web, showing how to retrieve DisGeNET data and integrate it with other RDF linked datasets.
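Retrieving DisGeNET data from the Semantic Web typically means sending a SPARQL query to the platform's public endpoint. The sketch below, in Python's standard library only, builds such a query and shows how one would POST it; the endpoint URL and the SIO property used for gene-disease associations are assumptions taken from DisGeNET's SIO-based RDF schema, so the real vocabulary should be checked against the DisGeNET RDF documentation before use. The network call is defined but not executed here.

```python
# Illustrative sketch of querying DisGeNET's SPARQL endpoint from Python.
# Both the endpoint URL and the property IRI below are assumptions;
# consult the DisGeNET RDF documentation for the authoritative schema.
import json
import urllib.parse
import urllib.request

DISGENET_SPARQL = "http://rdf.disgenet.org/sparql/"  # assumed endpoint

def gda_query(gene_symbol, limit=10):
    """Build a SPARQL query for gene-disease associations of one gene.
    sio:SIO_000628 ('refers to') stands in for DisGeNET's actual
    association properties; treat it as a placeholder."""
    return f"""
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?gda ?disease
WHERE {{
  ?gda sio:SIO_000628 ?gene, ?disease .
  ?gene rdfs:label "{gene_symbol}" .
}}
LIMIT {limit}
"""

def run_query(query, endpoint=DISGENET_SPARQL):
    """POST the query and parse SPARQL JSON results (requires network access)."""
    data = urllib.parse.urlencode({"query": query, "format": "json"}).encode()
    with urllib.request.urlopen(urllib.request.Request(endpoint, data)) as resp:
        return json.load(resp)

print(gda_query("BRCA1").strip())
```

Because the results come back as standard SPARQL JSON, the same `run_query` helper works unchanged against any other linked dataset's endpoint, which is precisely what makes cross-dataset integration practical.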
HANDI Summit 18 - Introducing HANDI-HOPD - Ewan DavisHANDI HEALTH
NHS England hosted the HANDI-HOPD Summit in London on the 18th September. This was attended by an invited audience of around 40 people to discuss plans to take the HANDI-HOPD platform forward to the NHS England Open Source Open Day on the 26th of November in Newcastle-Upon-Tyne, where it will be launched as the Platform for NHS Code4Health.
HANDI-HOPD, the HANDI Open Platform Demonstrator, provides an experimental platform to demonstrate how emerging open standards and APIs can harness the transformational power of the Internet to support digital health and care.
Ewan Davis introduced the HOPD, described where it fitted in the global development of open health platforms, what had already been deployed, and the plans for its development.
The document discusses the challenges of implementing electronic health records (EHR) in Slovenia and the benefits of using an openEHR approach. It describes how Slovenia created a Smart Healthcare & Wellbeing Cluster to deliver value through an open data platform based on openEHR standards. This has resulted in a vendor-neutral clinical data repository being used at a children's hospital in Slovenia and as part of the national health interoperability backbone. The openEHR approach is now also being used for EHR systems in Moscow, Russia.
Public Laboratory LOINC Workshop and Committee Meeting documents the origins and growth of LOINC as a universal standard for clinical observations and laboratory results. It discusses how LOINC provides a common language for information exchange and how its open model has led to widespread international adoption and translations. Large healthcare organizations around the world have implemented LOINC to facilitate interoperability across hundreds of systems.
Strategies for the integration of information (IPI_ConfEX)Ben Gardner
The document discusses approaches to integrating internal and external data across pharmaceutical research. It describes utilizing a data warehousing strategy through a Research Information Factory (RIF) to create a single global repository for research data. However, integrating external data from various sources poses additional challenges. Tools like PharmaMatrix provide a pre-indexed mine of scientific literature linking drug targets to indications, but result sets can be large. The document suggests that Web 2.0 technologies like wikis, blogs and tagging could help turn integrated information into knowledge by enabling collaboration and sharing. Industry-wide data standards and common ontologies would also help facilitate external data integration.
The document provides an overview of linked data, including what it is, how to make linked data, and what can be done with linked data. It discusses several existing linked datasets for health care and life sciences like EntrezGene, UniProt, KEGG Pathway, and DrugBank. It describes how these datasets are published as linked data using HTTP URIs and RDF, and how they are interlinked. It also presents an example SPARQL query run across multiple linked datasets and the results. The document aims to promote the benefits of linked open drug data for applications like drug discovery.
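A core mechanic behind the linked datasets described above is HTTP content negotiation: the same URI that serves a human-readable page can serve machine-readable RDF when the client asks for it. The stdlib sketch below shows how such a request is built; the Bio2RDF-style URI is illustrative, and actually dereferencing it requires network access, so only the request construction is exercised here.

```python
# Sketch of Linked Data access via HTTP content negotiation: the Accept
# header asks the server for RDF (Turtle) instead of HTML. The URI below
# is illustrative; fetch_rdf needs network access to run.
import urllib.request

RDF_MIME = "text/turtle"

def rdf_request(uri):
    """Build a request whose Accept header asks the server for RDF, not HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_MIME})

def fetch_rdf(uri):
    """Dereference a Linked Data URI and return the RDF document as text."""
    with urllib.request.urlopen(rdf_request(uri)) as resp:
        return resp.read().decode("utf-8", "replace")

req = rdf_request("http://bio2rdf.org/drugbank:DB00001")
print(req.get_header("Accept"))  # → text/turtle
```

The returned Turtle can then be loaded into a triple store and queried with SPARQL, including queries that join entities across several of the interlinked datasets mentioned above.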
Toward F.A.I.R. Pharma. PhUSE Linked Data Initiatives Past and PresentTim Williams
Abstract:
In recent years, the PhUSE organization has supported several Linked Data initiatives. The CDISC Foundational Standards as RDF is an early example of one such initiative. The results are available on the CDISC website. Subsequent proof of concept projects enjoyed marginal success at a time when pharma’s familiarity with the technology was still very limited. A recent surge in interest in F.A.I.R. data and Knowledge Graphs has sparked renewed interest in Linked Data within PhUSE and the industry at large. The recently completed “Clinical Trials Data as RDF (CTDasRDF)” spawned a new project, “Going Translational With Linked Data (GoTWLD).” GoTWLD extends the project scope of its predecessor beyond SDTM into the non-clinical domain.
Educational initiatives at PhUSE include an introductory, interactive workshop at the annual European conference (EU-Connect) and at the US Computational Science Symposium (CSS). A side-project of GoTWLD is investigating the potential use of URIs as study identifiers to promote adoption of Linked Data. Challenges remain, including the need for demonstrable return on investment and the development of user-friendly, intuitive interfaces for graph data. These challenges can be overcome if pharmaceutical companies cooperate in the pre-competitive space.
Presented at Semantics@Roche, Basel 2019-04-04
Medical innovation calls for new collaboration models that bring together government, academia and industry.
Barriers to research and ultimate commercialization will be lowered by bringing best practices from industry and academic settings.
The Hippocrates platform facilitates early drug development, extending from basic research to drug invention and commercialization, significantly saving time and money.
The platform is designed in such a way as to facilitate collaboration amongst stakeholders, as well as to take advantage of the vast resources currently available on the web to generate and aggregate content based on the needs of the end-user's research.
The document discusses healthcare informatics and big data in healthcare. It provides an introduction to healthcare informatics, the advantages and disciplines involved. It then discusses big data in healthcare, including the sources and types of healthcare data, challenges in big data analytics, and conceptual architectures. Tools for big data analytics are also outlined, including Hadoop, Pig, Hive and others. Finally, it provides an example case study of a systematic review on the effectiveness of mobile health technology interventions.
Pistoia Alliance US Conference 2015 - 1.3.2 New member introductions - DNAnexusPistoia Alliance
The document discusses The Global Network For Genomic Medicine, which provides a cloud-based platform for large-scale genomic analysis and collaboration. The platform allows various groups like academic centers, developers, and regulators to securely share genomic data, reference data, and analysis pipelines. It has supported massive projects like analyzing over 14,000 individual genomes from the Baylor College of Medicine and hosting thousands of datasets from the ENCODE project.
This document discusses openEHR, an open specification for health information modeling that supports an open digital care ecosystem. OpenEHR allows clinical data to remain fully interoperable and queryable across systems and technologies through archetypes and templates defined by clinicians. It provides a standards-based approach using normal technical specifications to define how clinical content and health data are represented separately from programming languages or databases. This enables apps and systems to integrate detailed clinical models directly without proprietary constraints.
Open science and medical evidence generation - Kees van Bochove - The HyveKees van Bochove
Presentation about open science, the FAIR principles, and medical evidence generation with the OHDSI COVID-19 study-a-thon as an example. I've used variations on this deck in a couple of classroom and online courses for PhD and master students early 2020.
SCOPE Summit - Applying the OMOP data model & OHDSI software to national Euro...Kees van Bochove
Talk from Kees van Bochove, The Hyve at SCOPE Summit, Real World Data track, Jan 26, 2017, Miami
A large open source initiative for standardisation and epidemiological analysis for real world data is OHDSI: Observational Health Data Sciences and Informatics. OHDSI leverages the OMOP common data model for observational data, and provides data analysis tools for a broad range of use cases. This talk will explain OMOP and OHDSI with case study IMI EMIF, in which health data from over 50 million patients from 13 national and regional European registries is brought together.
DisGeNET: a discovery platform to support translational research and drug discovery
1. DisGeNET-RDF: a GDA Linked Open Data resource
Janet Piñero, Núria Queralt-Rosinach, Àlex Bravo, Ferran Sanz and Laura I. Furlong
Integrative Biomedical Informatics Group, Research Programme on Biomedical Informatics; Hospital del Mar Medical Research Institute; Pompeu Fabra University
Acknowledgements
The authors thank the Open PHACTS partners, Michel Dumontier and the OpenLink staff for their input, collaboration and help.
Funding: We received support from ISCIII-FEDER (PI13/00082, CP10/00524) and from the IMI-JU under grant agreements nº 115002 (eTOX), nº 115191 (Open PHACTS), nº 115372 (EMIF) and nº 115735 (iPiE), resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution, and from the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (Elixir-Excelerate). The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB).
DisGeNET: Disease-Gene NETwork of relations for discovery
[Poster diagram: from DATA to DISCOVERY through a KNOWLEDGE BASE and TOOLS FOR EXPLORATION AND ANALYSIS]
Motivation: A better understanding of the genetic component of human diseases and of disease mechanisms is needed for translational research and for drug discovery and development.
Challenge: A major bottleneck for knowledge discovery on the genetic component of diseases is that the information is fragmented. The vast amount of biomedical information on genotype-phenotype relations is distributed across several databases, is represented and annotated with different data models, vocabularies and standards, and is domain- and technology-specific, all of which hampers its access, integration, analysis and interpretation.
Approach: The DisGeNET Discovery Platform1 collects and integrates the available information on gene-disease associations (GDAs), covering the whole spectrum of human diseases and using standards for their annotation and representation.
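As an illustration of the integration step just described, the following Python sketch merges GDA records from two source feeds after normalizing genes to NCBI Gene IDs and diseases to UMLS CUIs. The record layout, source names and identifiers are assumptions for illustration, not the actual DisGeNET pipeline.

```python
# Illustrative sketch (not the actual DisGeNET pipeline): integrating
# gene-disease associations (GDAs) from several sources by keying them
# on normalized identifiers (NCBI Gene ID, UMLS CUI).

def integrate_gdas(sources):
    """Merge per-source GDA lists into one association table.

    Each record is assumed to already carry normalized identifiers:
    an NCBI Gene ID ("geneId") and a UMLS CUI ("diseaseId").
    """
    integrated = {}
    for source_name, records in sources.items():
        for rec in records:
            key = (rec["geneId"], rec["diseaseId"])  # (NCBI Gene ID, UMLS CUI)
            entry = integrated.setdefault(key, {"sources": set(), "pmids": set()})
            entry["sources"].add(source_name)
            entry["pmids"].update(rec.get("pmids", []))
    return integrated

# Hypothetical input records from two sources (gene 4023, disease CUI
# C0028754 and the PubMed IDs are placeholders for illustration)
sources = {
    "SourceA": [{"geneId": 4023, "diseaseId": "C0028754", "pmids": [15917206]}],
    "SourceB": [{"geneId": 4023, "diseaseId": "C0028754", "pmids": [18245778]}],
}
gdas = integrate_gdas(sources)
```

The same (gene, disease) pair reported by both sources collapses into one association that accumulates the supporting sources and publications, which is the essence of the integration approach.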
DisGeNET in the LOD cloud for translational research
• DisGeNET plus external multidomain sources in the LOD.
• Interlinked to other biomedical databases to answer scientific questions that require interrogating cross-domain resources.
• Aims to support the development of bioinformatic Semantic Web applications that extract key knowledge on the molecular mechanisms of diseases.
Implementation: The platform is composed of a knowledge base and a set of tools for data analysis and interpretation.
[Poster graphic: platform qualities (evidence-based discovery, interoperability, metadata, databases & literature standards, integration, open access, discoverability, community use, large-scale extraction and integration, digital publication, sharing and linking) and user profiles (researcher, clinician, curator, bioinformatician & developer). http://www.disgenet.org/]
Usage stats (Aug 2014 - Aug 2015):
• 12,040 users, 22,696 sessions
• 14,494 downloads
• DisGeNET used in >20 publications, cited in >60 articles
• Other projects: PubAnnotation, OpenLifeData
Registered in: biosharing, OMICtools, NeuroLex, Datahub
Present in the Semantic Web:
• URIs / RDF / nanopublications
• Machine-processable
• Semantic integration
• Links to the Linked Open Data (LOD) cloud
• Data analysis across domains
SEMANTIC WEB
Example question: What is the tissue expression pattern of the genes associated with Obesity?
• Large-scale integration across domains
• 17,181 genes (PANTHER class annotations)
• 14,610 diseases (MeSH class annotations): 60% complex, 36% rare/Mendelian, and 4% infectious diseases
• Mappings to other disease terminologies: DO 19, MSH 58, OMIM 38, NCI 33, ORDO 13, ICD9 12
TRACK OF EVIDENCE
S = W_CURATED + W_PREDICTED + W_LITERATURE
• Provenance (PubMed ID, source)
• DisGeNET score (evidence)
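The score line above combines curated, predicted and literature-derived evidence. A minimal sketch of such a capped, additive score follows; the weights and caps are hypothetical placeholders, not DisGeNET's published scoring parameters.

```python
# Illustrative sketch of an evidence score of the form
# S = W_curated + W_predicted + W_literature, combining curated,
# predicted and literature evidence for one gene-disease association.
# All weights and caps below are hypothetical placeholders.

def gda_score(n_curated_sources, n_predicted_sources, n_publications,
              w_max_curated=0.6, w_max_predicted=0.2, w_max_literature=0.2):
    """Each evidence channel contributes up to a capped maximum weight."""
    w_curated = w_max_curated if n_curated_sources > 0 else 0.0
    w_predicted = w_max_predicted if n_predicted_sources > 0 else 0.0
    # The literature weight grows with the number of supporting papers
    # and saturates at its cap (assumed behaviour, for illustration).
    w_literature = min(n_publications * 0.02, w_max_literature)
    return w_curated + w_predicted + w_literature

score = gda_score(n_curated_sources=2, n_predicted_sources=0, n_publications=5)
```

Capping each channel keeps any single evidence type, however abundant, from dominating the total, which mirrors the intent of an additive evidence track.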
ACCESS
• Web: http://www.disgenet.org/
• RDF: http://rdf.disgenet.org/
• SPARQL: http://rdf.disgenet.org/sparql/
• Open PHACTS API: https://dev.openphacts.org
• License: Open Database License (http://opendatacommons.org/licenses/odbl/1.0/)
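Programmatic access to the SPARQL endpoint can be sketched as a plain HTTP GET. The query below only approximates the SIO-based schema (sio:SIO_000628, "refers to", is an assumption to check against the published schema), and the endpoint may require adjustment before use.

```python
from urllib.parse import urlencode

# Sketch of programmatic access to the DisGeNET SPARQL endpoint.
# The query is illustrative; property names follow the SIO-based
# schema only approximately.

ENDPOINT = "http://rdf.disgenet.org/sparql/"

QUERY = """
PREFIX sio: <http://semanticscience.org/resource/>
SELECT ?gda ?gene ?disease
WHERE {
  ?gda sio:SIO_000628 ?gene, ?disease .   # "refers to" (assumed property)
}
LIMIT 10
"""

def sparql_request_url(endpoint, query):
    """Build the GET URL for a SPARQL query, asking for JSON results."""
    params = urlencode({"query": query,
                        "format": "application/sparql-results+json"})
    return endpoint + "?" + params

url = sparql_request_url(ENDPOINT, QUERY)
```

The resulting URL can be fetched with any HTTP client; building it as a string first makes the request easy to log, share and embed in workflows, in line with the programmatic-access goals listed below.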
AVAILABILITY
Downloads:
• Tab-separated plain text
• SQLite
• RDF
• Trusty nanopublications
Interfaces (for different user profiles):
• Web interface
• SPARQL endpoint / Linked Data browser
• Open PHACTS Discovery Platform
• Nanopublication network
• disgenet2r R package
Metadata:
• data-item description
• dataset description
Programmatic access:
• Automatic analysis
• Higher speed
• Reduced error
• Shareable results
• Embedding in workflows
REPRODUCIBILITY
• Several formats and models
• Transparency and validation
SOURCES
• Databases & literature, including recent findings
• 429,111 gene-disease associations, each with a sentence description
NORMALIZATION / HARMONIZATION
• Gene: NCBI Gene ID
• Disease: UMLS CUIs
• DisGeNET association type ontology
SYNTACTIC AND SEMANTIC INTEROPERABILITY: common IDs and ontologies
• 11 common ontologies in RDF2
• Nanopublications3
STANDARDIZATION as digital objects
• DisGeNET association type ontology
• Semanticscience Integrated Ontology (SIO)4
• Normalized identification scheme: http://rdf.disgenet.org/resource/gda/ + ID
LOD cloud (http://lod-cloud.net/; Aug 2014)
• 4,962,315 RDF links to RDF datasets in the LOD
• https://datahub.io/dataset/disgenet (more statistics)
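Under these conventions, a single GDA can be rendered as a small Turtle snippet. The class and property choices (sio:SIO_000983 for a gene-disease association, sio:SIO_000628 for "refers to") and all identifiers below are assumptions for illustration and should be checked against the published DisGeNET-RDF schema.

```python
# Sketch of one GDA in Turtle, following the SIO-based schema and the
# normalized identification scheme http://rdf.disgenet.org/resource/gda/ + ID.
# SIO term choices and the example identifiers are assumptions.

TTL_TEMPLATE = """\
@prefix sio: <http://semanticscience.org/resource/> .
@prefix ncbigene: <http://identifiers.org/ncbigene/> .
@prefix umls: <http://linkedlifedata.com/resource/umls/id/> .

<http://rdf.disgenet.org/resource/gda/{gda_id}>
    a sio:SIO_000983 ;                    # gene-disease association (assumed class)
    sio:SIO_000628 ncbigene:{gene_id} ,   # refers to the gene
                   umls:{cui} .           # refers to the disease
"""

def gda_turtle(gda_id, gene_id, cui):
    """Render one GDA as a small Turtle snippet."""
    return TTL_TEMPLATE.format(gda_id=gda_id, gene_id=gene_id, cui=cui)

# Hypothetical GDA identifier, gene ID and disease CUI
ttl = gda_turtle("GDA0001", 4023, "C0028754")
```

Minting one dereferenceable URI per association is what lets the GDA itself, and not only the gene or the disease, carry provenance, score and links in the LOD cloud.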
LOD CLOUD RDFIZATION
METADATA
• Dataset description (Open PHACTS + )
• Linkset descriptions (Open PHACTS + )
• Follows the Open PHACTS guidelines
RDF SCHEMA
• Dereferenceable URIs (primary or )
• SIO
• OWL
INTERLINKING
• NCBI Gene ID, PANTHER Classification
• UMLS CUIs, MeSH Classification
• Data providers
• Disease annotation in the Open PHACTS Discovery Platform5
• OMIM included
• > 20,000,000 triples
• Linkset providers: > 70,000 linksets
FUTURE
New data:
• Disease-phenotype associations (HPO)
• New use cases
• New API calls
Score:
• Add to API calls
EXPLORER: KNIME
More SPARQL queries: http://www.disgenet.org/web/DisGeNET/menu/rdf#sparql-queries-2
MAPPINGS TO OTHER DISEASE TERMINOLOGIES
[Schema diagram: DRUG, TARGET, PATHWAY, DISEASE, PHENOTYPE and GENE entities linked through the GDA, annotated with EVIDENCE, SNP and SCORE]
Gene-disease association as an entity:
• Data item
• Dataset
VoID example: <disease> void:inDataset dgn-void:disease-dataset .
API workflows: http://www.myexperiment.org/groups/1125.html
References
1. Piñero, J., Queralt-Rosinach, N., Bravo, À., Deu-Pons, J., Bauer-Mehren, A., Baron, M., … Furlong, L. I. (2015). DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database, 2015, bav028.
2. Queralt-Rosinach, N., Piñero, J., Bravo, À., Sanz, F., & Furlong, L. I. (2015). DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases (submitted).
3. Queralt-Rosinach, N., Kuhn, T., Chichester, C., Dumontier, M., Sanz, F., & Furlong, L. I. (2015). Publishing DisGeNET as nanopublications. Semantic Web Journal (to appear), 1-10.
4. Dumontier, M., Baker, C. J., Baran, J., Callahan, A., Chepelev, L., Cruz-Toledo, J., … Hoehndorf, R. (2014). The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics, 5(1).
5. Gray, A. J. G., Groth, P., Loizou, A., Askjaer, S., Brenninkmeijer, C., Burger, K., … Williams, A. J. (2014). Applying linked data approaches to pharmacology: architectural decisions and implementation. Semantic Web. IOS Press. doi:10.3233/SW-2012-0088
• GDAs described by SIO
Open PHACTS API: https://dev.openphacts.org/disease/getTargets
VoID description: http://rdf.disgenet.org/void-v3.0.0.ttl
Example question: Which compounds target proteins associated with Parkinson's disease or Alzheimer's disease?
DisGeNET in the Open PHACTS Discovery Platform for drug discovery and development
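A call to the /disease/getTargets method mentioned above can be sketched as URL construction. The parameter names (uri, app_id, app_key, _format) follow general Open PHACTS API conventions and are assumptions here; the credentials and disease URI are placeholders.

```python
from urllib.parse import urlencode

# Sketch of calling the Open PHACTS API method the poster mentions
# (/disease/getTargets). Parameter names are assumed from Open PHACTS
# API conventions; credentials are placeholders.

BASE = "https://dev.openphacts.org"

def disease_get_targets_url(disease_uri, app_id, app_key):
    """Build the request URL for /disease/getTargets (JSON results)."""
    params = urlencode({
        "uri": disease_uri,   # the disease concept URI to query
        "app_id": app_id,
        "app_key": app_key,
        "_format": "json",
    })
    return f"{BASE}/disease/getTargets?{params}"

# Hypothetical UMLS-based disease URI used purely for illustration
url = disease_get_targets_url(
    "http://linkedlifedata.com/resource/umls/id/C0030567",
    app_id="YOUR_APP_ID", app_key="YOUR_APP_KEY")
```

Answering the cross-domain question above then amounts to issuing one such request per disease and intersecting the returned targets with compound-target data on the platform.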