Open Targets, identifying targets for drug development in the treatment of diseases.

Open Targets: integrating genetics and genomics for disease biology and translational medicine

Aims
• What is the Open Targets Partnership?
• How to navigate the Open Targets Platform?
• Are there other Open Targets tools?
• Where do I get help?

Drug discovery: some challenges
Lengthy, costly, low success rate, high attrition rates
Source: PhRMA adaptation based on Tufts CSDD & School of Medicine, and FDA

Public databases for drug discovery
• EMBL-EBI (European Bioinformatics Institute)
• Elsewhere

Fit everything together
• Time consuming
• Possible lack of resources or expertise
• …

“I wish I did not have to go to all those different places to get the information I’m after.”
“I know. If only we had a one-stop shop with as much data as possible, plus new analyses and links to the original source for my own assessment.”
“A resource that is comprehensive, trustworthy, up-to-date, sustainable, easy-to-use and free.”
Open Targets is all you need!

Our Vision
A partnership to transform drug discovery through the systematic identification and prioritisation of targets
https://www.opentargets.org
2014 2016 2017 2018

Virtuous cycle in Open Targets
www.opentargets.org/projects
[Diagram: experimental projects (data generation) and bioinformatic projects (integration of public data) run concurrently and feed therapeutic hypotheses, surfaced at targetvalidation.org]

Open Targets generates data
www.opentargets.org/science/
• EMBL-EBI and Sanger Institute
• > 1,000 cancer cell lines + drug sensitivity data
• RNASeq, CRISPR screens
• Sanger Institute and GSK
• Genome wide knockouts in gut epithelium
• Organoids, metagenomics
• Alzheimer’s and Parkinson’s
• CRISPR screens, iPS cells
• Sanger Institute, Biogen, Gurdon

Open Targets integrates data*
• EMBL-EBI, Biogen, GSK
• Associations between targets and diseases
• Germline variants
• Somatic mutations
• Drug information
• RNA expression
• Animal models
• Text mining
* Publicly available resources
In addition to upcoming Open Targets experimental data

Open Targets Platform
https://www.targetvalidation.org
• Associations between targets and diseases…

Our targets → genes or proteins
• Ensembl Gene IDs e.g. ENSGXXXXXXXXXXX
• UniProt IDs e.g. P15056
• HGNC names e.g. DMD
• Also non-coding RNA genes
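As a rough illustration of the three identifier styles listed on this slide, the sketch below guesses which scheme a string follows. The function name and the fallback label are hypothetical; the Ensembl pattern assumes human gene IDs of the form ENSG plus 11 digits, and the UniProt pattern is the accession format published in the UniProt help pages. It is not how the Platform itself resolves identifiers.

```python
import re

# Ensembl human gene IDs: "ENSG" followed by 11 digits (assumption for this sketch).
ENSEMBL_GENE = re.compile(r"^ENSG\d{11}$")

# UniProt accession format (6 or 10 characters).
UNIPROT_ACC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def classify_target_id(identifier: str) -> str:
    """Guess which naming scheme a target identifier follows."""
    if ENSEMBL_GENE.match(identifier):
        return "ensembl_gene_id"
    if UNIPROT_ACC.match(identifier):
        return "uniprot_accession"
    # Hypothetical fallback: anything else is treated as a possible gene symbol.
    return "gene_symbol_candidate"


for example in ("ENSG00000141510", "P15056", "DMD"):
    print(example, "->", classify_target_id(example))
```

A pattern check like this only recognises the shape of an identifier; it cannot tell whether the identifier actually exists in Ensembl, UniProt or HGNC.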

Our diseases
• Modified version of Experimental Factor Ontology (EFO)
• Controlled vocabulary (Alzheimers versus Alzheimer’s)
• Hierarchy (relationships)
• Promotes consistency
• Increases the richness of annotation
• Allows for easier and automatic integration
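To make the controlled vocabulary and hierarchy points concrete, here is a minimal sketch of how spelling variants can collapse onto one term and how a term's ancestors can be walked. Everything in it is hypothetical: the IDs (EX_0001 and so on), labels and synonym table are placeholders invented for illustration and are not real EFO content.

```python
import re

# Hypothetical mini-vocabulary; these IDs are placeholders, not EFO terms.
TERMS = {
    "EX_0001": "nervous system disease",
    "EX_0002": "neurodegenerative disease",
    "EX_0003": "Alzheimer's disease",
}
# Child -> parent links give the hierarchy.
PARENT = {"EX_0003": "EX_0002", "EX_0002": "EX_0001"}
# Normalised spellings that should all resolve to the same term.
SYNONYMS = {
    "alzheimers": "EX_0003",
    "alzheimers disease": "EX_0003",
    "alzheimer disease": "EX_0003",
}


def normalise(label):
    """Map a free-text label to a term ID, ignoring case and punctuation."""
    key = re.sub(r"[^a-z0-9 ]", "", label.lower()).strip()
    return SYNONYMS.get(key)


def ancestors(term_id):
    """Return the chain of parent term IDs, nearest first."""
    chain = []
    while term_id in PARENT:
        term_id = PARENT[term_id]
        chain.append(term_id)
    return chain


print(normalise("Alzheimers"), normalise("Alzheimer’s"))
print([TERMS[t] for t in ancestors("EX_0003")])
```

Because both spellings resolve to one ID, evidence annotated with either label ends up on the same disease, and the parent links let evidence be rolled up to broader terms.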

Evidence for our T-D associations
https://docs.targetvalidation.org/data-sources/data-sources

Data sources grouped into data types
• Genetic associations: GWAS Catalog, EVA, PheWAS, G2P
• Somatic mutations: Cancer Gene Census, EVA
• Drugs
• Affected pathways
• Differential RNA expression: Expression Atlas
• Animal models: PhenoDigm
• Text mining: Europe PMC

How the data* flows
JSON summary document → Validator → Association score calculation → Target profile, Disease profile
* e.g. genetic variants from NHGRI-GWAS catalog

JSON summary document
* IDs (gene, disease, papers) + curation (e.g. manual) + evidence + source + stats for the score
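The sketch below shows what a summary document with the ingredients named on this slide might look like, together with a very small validator. The key names, the placeholder values and the validate function are all assumptions made for illustration; they do not reproduce the actual Open Targets JSON schema.

```python
import json

# Hypothetical key names, one per ingredient listed on the slide.
REQUIRED_KEYS = {
    "target_id", "disease_id", "literature",
    "curation", "evidence", "source", "score_stats",
}


def validate(document):
    """Return the sorted list of required keys missing from a document."""
    return sorted(REQUIRED_KEYS - document.keys())


example = {
    "target_id": "ENSG00000171862",   # gene ID used elsewhere in these slides
    "disease_id": "EFO_0000616",      # disease ID used elsewhere in these slides
    "literature": [],                 # paper IDs would go here
    "curation": "manual",
    "evidence": "placeholder description of the observation",
    "source": "placeholder data source name",
    "score_stats": {},                # statistics used for scoring would go here
}

print(json.dumps(example, indent=2))
print("missing keys:", validate(example))
```

Checking documents before scoring keeps malformed submissions out of the association score calculation.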

Association score
• Which targets have more evidence for an association?
• What is the relative weight of the evidence for different targets?
Statistical integration, aggregation and scoring
Four-tier scoring framework
https://docs.targetvalidation.org/getting-started/scoring
A) per evidence (e.g. one SNP from a GWAS paper)
B) per data source (e.g. SNPs from the GWAS catalog)
C) per data type (e.g. Genetic associations)
D) overall

Aggregating data → harmonic sum
• Score calculated at 4 levels: evidence, data source, data type, overall
• Score: 0 to 1 (max)
• Aggregation with the harmonic sum (ΣH): S1 + S2/2² + S3/3² + S4/4² + … + Si/i²
• Note: each data set has its own scoring and ranking scheme

Data type: data sources (weight factor)
• Genetic associations: EVA (1.0), UniProt (1.0), Gene2Phenotype (1.0), GWAS catalog (1.0), Genomics England (1.0), PheWAS catalog (1.0)
• Somatic mutations: Cancer Gene Census (1.0), EVA (somatic) (1.0), IntOGen (1.0)
• Drugs: ChEMBL (1.0)
• Affected pathways: Reactome (1.0)
• RNA expression: Expression Atlas (0.5)
• Text mining: Europe PMC (0.2)
• Animal models: PhenoDigm (0.2)

Factors affecting the relative strength of a piece of evidence
S = f * s * c
f, relative occurrence of a target-disease evidence
s, strength of the effect described by the evidence
c, confidence of the observation for the target-disease evidence
e.g. GWAS Catalog
f = sample size (cases versus controls)
s = predicted functional consequence (VEP)
c = p value reported in the paper
https://docs.targetvalidation.org/getting-started/scoring
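The product S = f * s * c can be sketched in a few lines. This example assumes each factor has already been rescaled to the range 0 to 1; how a sample size, a VEP consequence or a p value is mapped onto that range is not specified on the slide and is left out here. The function name is hypothetical.

```python
def evidence_score(f, s, c):
    """Combine frequency, strength and confidence into one evidence score.

    Assumes each factor is already scaled to [0, 1].
    """
    for name, value in (("f", f), ("s", s), ("c", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return f * s * c


# A piece of evidence that is strong on every axis keeps a high score...
print(evidence_score(1.0, 1.0, 1.0))
# ...while one weak factor pulls the whole score down.
print(evidence_score(1.0, 0.5, 0.5))
```

Because the factors are multiplied, a piece of evidence only scores highly when it is frequent, strong and confident at the same time.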

Ranking target-disease association
Association score: the overall score across all data types
• Based on the data sources
• Different weight applied:
genetic association = drugs = mutations = pathways > RNA expression > animal models = text mining
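A minimal sketch of how the weights could enter the overall score: each data type score is multiplied by its weight, the weighted scores are combined with the harmonic sum, and the result is capped at 1. The weight values follow the ordering above (1.0, 0.5, 0.2), but the exact point where weights are applied, the cap, and all names in the code are assumptions for illustration rather than the Platform's implementation.

```python
def harmonic_sum(scores):
    """S1 + S2/2^2 + S3/3^2 + ... with scores sorted in descending order."""
    ordered = sorted(scores, reverse=True)
    return sum(score / (rank ** 2) for rank, score in enumerate(ordered, start=1))


# Weights per data type, following the ordering stated on the slide.
DEFAULT_WEIGHTS = {
    "Genetic associations": 1.0,
    "Drugs": 1.0,
    "Somatic mutations": 1.0,
    "Affected pathways": 1.0,
    "RNA expression": 0.5,
    "Animal models": 0.2,
    "Text mining": 0.2,
}


def overall_score(datatype_scores, weights=DEFAULT_WEIGHTS):
    """Weight each data type score, aggregate, and cap at 1 (assumed cap)."""
    weighted = [weights[name] * score for name, score in datatype_scores.items()]
    return min(1.0, harmonic_sum(weighted))


scores = {"RNA expression": 0.8, "Animal models": 0.5}
print(overall_score(scores))

# Passing a different weight table changes the emphasis, e.g. on text mining.
boosted = dict(DEFAULT_WEIGHTS, **{"Text mining": 1.0})
print(overall_score({"Text mining": 1.0}), overall_score({"Text mining": 1.0}, boosted))
```

Keeping the weights in a plain dictionary makes it easy to explore how a ranking would shift under a different weighting.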

Demo 1: Disease centric workflow
https://www.targetvalidation.org/
• Which targets are associated with a disease?
• What is the evidence for the association between a target and a disease?
Pages 7 - 30

In addition to T-D associations
• Everything you wanted to know about…
… but were afraid to ask.
Disease profile page
Target profile page

Target profile page*
• Description, synonyms
• Drugs
• Pathways
• Protein interactions
• Protein structure
• RNA and protein baseline expression
• Expression Atlas
• Variants, isoforms and genomic context
• Mouse phenotypes
• Gene Ontology
• Gene tree
• Similar targets
• Bibliography
• Library/LINK
• Extra, extra, extra! Cancer hallmarks and cancer biomarkers
* e.g. http://www.targetvalidation.org/target/ENSG00000141510

Disease profile page*
• Classification
• Drugs
• Similar diseases
• Bibliography
• Open Targets Library/LINK
* e.g. http://www.targetvalidation.org/disease/Orphanet_262

Demo 2: Batch search
https://tinyurl.com/batch-video
We have a list of 26 possible targets for inflammatory bowel disease.
• Are these targets represented in other diseases?
• Which pathways are represented in this set of targets?

REST API calls: some examples*
https://api.opentargets.io/v3/platform/public/search?q=asthma
https://api.opentargets.io/v3/platform/public/search?q=EFO_0003767
https://api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000141867&disease=EFO_0000565&datatype=expression_atlas&size=100&format=json
https://api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000110324&direct=false&fields=is_direct&fields=disease.efo_info.label&size=100
* blog.opentargets.org/tag/api/

Breaking down the URLs
Breaking down the URLs
https://api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000163914&size=10000&fields=target.id&fields=disease.id
• Server: https://api.opentargets.io/v3/platform/
• Endpoint: public/association/filter
• Parameters: ?target=ENSG00000163914&size=10000&fields=target.id&fields=disease.id
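The same breakdown can be done programmatically with Python's standard library. This is a small sketch: the function name break_down and the server_path default are choices made for this example, not part of any Open Targets client.

```python
from urllib.parse import parse_qs, urlsplit

URL = (
    "https://api.opentargets.io/v3/platform/public/association/filter"
    "?target=ENSG00000163914&size=10000&fields=target.id&fields=disease.id"
)


def break_down(url, server_path="/v3/platform/"):
    """Split an API URL into server, endpoint and parameters."""
    parts = urlsplit(url)
    if not parts.path.startswith(server_path):
        raise ValueError("URL does not live under the expected server path")
    return {
        "server": f"{parts.scheme}://{parts.netloc}{server_path}",
        "endpoint": parts.path[len(server_path):],
        # parse_qs keeps repeated parameters (here: fields) as a list.
        "parameters": parse_qs(parts.query),
    }


pieces = break_down(URL)
for name, value in pieces.items():
    print(name, "=", value)
```

Note that the repeated fields parameter comes back as a list with two entries, which mirrors how the URL asks for two fields in the response.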

The documentation
http://api.opentargets.io/v3/platform/docs#
Private: methods used by the UI to serve external data. Subject to change without notice.

REST API: some use cases
http://api.opentargets.io/v3/platform/docs
• How to search
• How to get all diseases associated with a target
• How to get the association score for a target – disease pair
• How to get the evidence for a target – disease association

How to search
https://api.opentargets.io/v3/platform/public/search?q=PTEN

How to get all diseases associated with a target
http://api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000171862&direct=true

How to get the score for a target – disease pair
http://api.opentargets.io/v3/platform/public/association?id=ENSG00000171862-EFO_0000616

How to get the evidence for an association
http://api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000171862
http://api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000171862&disease=Orphanet_2563
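The four use cases differ only in endpoint and parameters, so the URLs can be assembled by one helper. The sketch below only builds the strings and makes no network call; build_url and BASE are names chosen for this example, and the https scheme is an assumption (the slides mix http and https).

```python
from urllib.parse import urlencode

# Base path taken from the example URLs on the slides.
BASE = "https://api.opentargets.io/v3/platform/public"


def build_url(endpoint, **params):
    """Join the base path, an endpoint and URL-encoded query parameters."""
    query = urlencode(params, doseq=True)
    return f"{BASE}/{endpoint}?{query}" if query else f"{BASE}/{endpoint}"


search_url = build_url("search", q="PTEN")
diseases_url = build_url("association/filter", target="ENSG00000171862", direct="true")
score_url = build_url("association", id="ENSG00000171862-EFO_0000616")
evidence_url = build_url(
    "evidence/filter", target="ENSG00000171862", disease="Orphanet_2563"
)

for url in (search_url, diseases_url, score_url, evidence_url):
    print(url)
```

Letting urlencode handle the query string avoids hand-escaping mistakes when a search term contains spaces or other special characters.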

Introduction: REST API webinar
https://youtu.be/KQbfhwpeEvc

How to run our REST endpoints
• Paste the URL in the location bar in a browser
• Use the terminal window (e.g. with the curl command)
• Use our free clients (i.e. Python* and R**)
• Call them from your own application/workflow
* http://opentargets.readthedocs.io/en/stable/index.html
** no longer supported by Open Targets; feel free to make PRs

Paste the URL in the location bar

Command line e.g. curl -X GET
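For calling an endpoint from your own workflow, the equivalent of a curl GET can be sketched with Python's standard library alone. The fetch_json helper and its opener parameter are hypothetical names for this example; the opener argument exists only so the function can be exercised without network access. Whether the v3 URLs on these slides still respond is not something this sketch can guarantee.

```python
import json
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen, timeout=30):
    """Send a GET request and decode the JSON body of the response.

    `opener` defaults to urllib's urlopen; pass a stand-in for offline use.
    """
    request = urllib.request.Request(
        url, headers={"Accept": "application/json"}, method="GET"
    )
    with opener(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Example URL from the slides; requires network access and a live server.
    url = "https://api.opentargets.io/v3/platform/public/search?q=PTEN"
    try:
        print(fetch_json(url))
    except OSError as error:
        print("request failed:", error)
```

The try/except around the call catches connection problems (urllib's URLError is a subclass of OSError), so a script using this helper fails with a readable message rather than a traceback.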

Python and R clients for the REST API
http://opentargets.readthedocs.io
No longer supported by Open Targets

Can we change the way the associations are scored? Perhaps to increase the weight on text mining data?
Yes, you can with the Open Targets Python client!

Open Targets toolkit: LINK
• LINK: Literature coNcept Knowledgebase
• Subject / predicate / object structured relations
From PubMed abstracts
Proof of Concept
Further development
http://link.opentargets.io/

Addressing text mining shortcomings
• Entities: genes, diseases, drugs
• Concepts extracted via NLP (Natural Language Processing)
• 28 M documents, 500 M relations
• http://blog.opentargets.org/link/

Open Targets toolkit: DoRothEA
• Candidate TF-drug interactions in cancer
• 1000 cancer cell lines
• 265 anti-cancer compounds
• 127 transcription factors
http://cancerres.aacrjournals.org/content/early/2017/12/09/0008-5472.CAN-17-1679
dorothea.opentargets.io

Example: Rapamycin
• ~ 1000 cancer cell lines
• 265 anti-cancer compounds
• 127 transcription factors

Open Targets Platform
• Resource of integrated multiomics data
• Added value (e.g. score) and links to original sources
• Graphical web interface: easy to use
April 2018 release: 21K targets, 9.7K diseases, 2.3 M associations, 6.1 M evidence

We support decision-making
• Which targets are associated with a disease?
• Are there FDA drugs for this association?
• Can I find out about the mechanisms of the disease?
• …

How to access the Platform
• Experimental projects (www.opentargets.org/projects): generate new evidence with CRISPR/Cas9, organoids and iPS cells (cellular models for disease)
• Core bioinformatics pipelines: integration of available data
• Main data store: Elasticsearch
• Web App* (AngularJS): https://www.targetvalidation.org
• REST API**: https://api.opentargets.io
• Public access: web interface, batch search tool, REST API, data dumps
* UI: first released in December 2015
** API: first released in April 2016

Our breakthrough paper
http://nar.oxfordjournals.org/content/45/D1/D985.long
http://www.narbreakthrough.com/

blog.opentargets.org/
@targetvalidate
support@targetvalidation.org
http://tinyurl.com/opentargets-in
Help!
https://tinyurl.com/opentargets-youtube
https://docs.targetvalidation.org/

Extra slides
Details on data sources to associate targets and diseases

Data sources: GWAS catalog
• Genome Wide Association Studies
• Array-based chips → genotyping 100,000 SNPs genome-wide

Data sources: UniProt
• Protein: sequence, annotation, function
• Manual curation of coding variants in patients
EMBL-EBI train online

Data sources: Gene2Phenotype
• Variants, genes, phenotypes in rare diseases
• Literature curation → consultant clinical geneticists in the UK

Data sources: PheWAS
• Phenome Wide Association Studies
• A variant associated with multiple phenotypes
• Clinical phenotypes derived from EMR-linked biobank BioVU
• ICD9 codes mapped to EFO

Data sources: GE PanelApp
• Aid clinical interpretation of genomes for the 100,000 Genomes Project
• We include ‘green genes’ from version 1+ and phenotypes

Data sources: EVA
• With ClinVar information for rare diseases
• Clinical significance: pathogenic, protective

Data sources: The Cancer Gene Census
• Genes with mutations causally implicated in cancer
• Gene associated with a cancer plus other cancers associated with that gene

Data sources: IntOGen
• Genes and somatic (driver) mutations, 28 cancer types
• Involvement in cancer biology
• Rubio-Perez et al. 2015

Data sources: ChEMBL
• Known drugs linked to a disease and a known target
• FDA approved for clinical trials or marketing

Data sources: Reactome
• Biochemical reactions and pathways
• Manual curation of pathways affected by mutations

Data sources: SLAPenrich
• 374 pathways curated and mapped to cancer hallmarks
• Divergence of the total number of cancer samples with genomic alterations
• Mutational burden and total exonic block length of genes

Data sources: PROGENy
• Comparison of pathway activities between normal and primary samples from The Cancer Genome Atlas
• Inferred from RNA-seq: 9,250 tumour and 741 normal samples
• EGFR, hypoxia, JAK.STAT, MAPK, NFkB, PI3K, TGFb, TNFa, Trail, VEGF, and p53

Data sources: Expression Atlas
• Baseline expression for human genes
- target profile page
• Differential mRNA expression (healthy versus diseased):
- target-disease associations

Data sources: Europe PMC
• Mining titles, abstracts, full text in research articles
• Target and disease co-occurrence in the same sentence
• Dictionary (not NLP)

Data sources: PhenoDigm
• Semantic approach to associate mouse models with diseases

Aggregating scores across the data
• Using a mathematical function, the harmonic sum*: S1 + S2/2² + S3/3² + … + Si/i²,
where S1, S2, ..., Si are the individual evidence scores sorted in descending order
* PMID: 19107201, PMID: 20118918
• Advantages:
A) account for replication
B) deflate the effect of large amounts of data e.g. text mining

Target-Disease Association Score
EuropePMC
(Text Mining)
UniProt
(Manual Curation)
ChEMBL
(Manual Curation)
Overall
VERY simplified diagram
