2. Aims
• What is the Open Targets Partnership?
• How to navigate the Open Targets Platform?
• Are there other Open Targets tools?
• Where do I get help?
3. Drug discovery: some challenges
Lengthy, costly, low success rate, high attrition rates
Source: PhRMA adaptation based on Tufts CSDD & School of Medicine, and FDA
4. Public databases for drug discovery
• EMBL-EBI (European Bioinformatics Institute)
• Elsewhere
6. I wish I did not have to go to all
those different places to get the
information I’m after.
I know. If we only had a one-stop
shop with as much data as possible,
plus new analyses and links to the
original source for my own
assessment.
A resource that is
comprehensive, trustworthy,
up-to-date, sustainable,
easy-to-use and free.
Open Targets is all you need!
7. Our Vision
A partnership to transform drug discovery
through the systematic identification and
prioritisation of targets
https://www.opentargets.org
2014 2016 2017 2018
9. Open Targets generates data
www.opentargets.org/science/
• EMBL-EBI and Sanger Institute
• > 1,000 cancer cell lines + drug sensitivity data
• RNASeq, CRISPR screens
• Sanger Institute and GSK
• Genome wide knockouts in gut epithelium
• Organoids, metagenomics
• Sanger Institute, Biogen, Gurdon
• Alzheimer’s and Parkinson’s
• CRISPR screens, iPS cells
10. Open Targets integrates data*
• EMBL-EBI, Biogen, GSK
• Associations between targets and diseases
• Germline variants
• Somatic mutations
• Drug information
• RNA expression
• Animal models
• Text mining
* Publicly available resources
In addition to upcoming Open Targets experimental data
12. Our targets: genes or proteins
• Ensembl Gene IDs e.g. ENSGXXXXXXXXXXX
• UniProt IDs e.g. P15056
• HGNC names e.g. DMD
• Also non-coding RNA genes
13. Our diseases
• Modified version of Experimental Factor Ontology (EFO)
• Controlled vocabulary (Alzheimers versus Alzheimer’s)
• Hierarchy (relationships)
• Promotes consistency
• Increases the richness of annotation
• Allows for easier and automatic integration
14. Evidence for our T-D associations
https://docs.targetvalidation.org/data-sources/data-sources
15. Data sources grouped into data types
• Genetic associations: EVA, GWAS Catalog, PheWAS, G2P
• Somatic mutations: Cancer Gene Census, EVA
• Drugs
• Affected pathways
• Differential RNA expression: Expression Atlas
• Animal models: PhenoDigm
• Text mining: Europe PMC
16. How the data* flows
Stages shown: JSON summary document, Validator, Association score calculation, Target profile, Disease profile
* e.g. genetic variants from NHGRI-GWAS catalog
18. Association score
Which targets have more
evidence for an association?
What is the relative weight of the
evidence for different targets?
19. Statistical integration, aggregation and scoring
Four-tier scoring framework
https://docs.targetvalidation.org/getting-started/scoring
A) per evidence (e.g. one SNP from a GWAS paper)
B) per data source (e.g. SNPs from the GWAS catalog)
C) per data type (e.g. Genetic associations)
D) overall
20. Aggregating data: harmonic sum
Score: 0 to 1 (max), calculated at 4 levels:
• Evidence
• Data source
• Data type
• Overall
Association score, aggregated with the harmonic sum (ΣH):
$\Sigma_H = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \frac{S_4}{4^2} + \dots + \frac{S_i}{i^2}$
Weight factor per data source, grouped by data type:
• Genetic associations: EVA (*1.0), UniProt (*1.0), Gene2Phenotype (*1.0), GWAS catalog (*1.0), Genomics England (*1.0), PheWAS catalog (*1.0)
• Somatic mutations: Cancer Gene Census (*1.0), EVA (somatic) (*1.0), IntOGen (*1.0)
• Drugs: ChEMBL (*1.0)
• Affected pathways: Reactome (*1.0)
• RNA expression: Expression Atlas (*0.5)
• Text mining: Europe PMC (*0.2)
• Animal models: PhenoDigm (*0.2)
Note: Each data set has its own scoring and ranking scheme
21. Factors affecting the relative strength of a piece of evidence
S = f * s * c
f, relative occurrence of a target-disease evidence
s, strength of the effect described by the evidence
c, confidence of the observation for the target-disease evidence
e.g. GWAS Catalog:
f = sample size (cases versus controls)
s = predicted functional consequence (VEP)
c = p value reported in the paper
https://docs.targetvalidation.org/getting-started/scoring
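A minimal sketch of the S = f * s * c product for a single piece of evidence. It assumes each factor has already been normalised to the 0 to 1 range (the slides only state that scores run from 0 to 1). The function name and the example values are hypothetical and are not taken from the Open Targets pipeline.

```python
def evidence_score(f: float, s: float, c: float) -> float:
    """Return S = f * s * c for one piece of evidence.

    Assumption: f, s and c are already normalised to [0, 1].
    """
    for name, value in (("f", f), ("s", s), ("c", c)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return f * s * c


# Hypothetical, already-normalised factors for one GWAS Catalog SNP.
example = evidence_score(f=0.8, s=0.5, c=1.0)
print(example)
```

Because the factors are multiplied, a low value for any one of them (for example a weak p value) pulls the whole evidence score down.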
22. Ranking target-disease association
Association score: the overall score across all data types
• Based on the data sources
• Different weights applied:
genetic association = drugs = mutations = pathways > RNA expression > animal models = text mining
23. Demo 1: Disease centric workflow
https://www.targetvalidation.org/
Which targets are associated with a disease?
What is the evidence for the association between a target and a disease?
Pages 7 - 30
24. In addition to T-D associations
• Everything you wanted to know about…
… but were afraid to ask.
Disease
profile page
Target profile
page
25. Target profile page*
• Drugs
• Pathways
• Protein interactions
• RNA and protein baseline expression
• Variants, isoforms and genomic context
• Mouse phenotypes
• Bibliography
• Description
• Synonyms
• Gene Ontology
• Protein structure
• Similar targets
• Expression Atlas
• Library/LINK
Extra, extra, extra!
• Cancer hallmarks and cancer biomarkers
• Gene tree
* e.g. http://www.targetvalidation.org/target/ENSG00000141510
28. Demo 2: Batch search
https://tinyurl.com/batch-video
We have a list of 26 possible targets for inflammatory bowel disease.
Are these targets represented in other diseases?
Which pathways are represented in this set of targets?
30. Breaking down the URLs
https://api.opentargets.io/v3/platform/public/association/filter?target=ENSG00000163914&size=10000&fields=target.id&fields=disease.id
Server: https://api.opentargets.io/v3/platform/
Endpoint: public/association/filter
Parameters: ?target=ENSG00000163914&size=10000&fields=target.id&fields=disease.id
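The same breakdown can be reproduced with Python's standard library. This is a sketch only: the `/v3/platform/` split point between server and endpoint follows the slide, and `break_down` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

URL = (
    "https://api.opentargets.io/v3/platform/public/association/filter"
    "?target=ENSG00000163914&size=10000&fields=target.id&fields=disease.id"
)

# Assumed split point between "server" and "endpoint", as drawn on the slide.
API_PREFIX = "/v3/platform/"


def break_down(url: str) -> dict:
    """Split a REST URL into server, endpoint and query parameters."""
    parts = urlsplit(url)
    if not parts.path.startswith(API_PREFIX):
        raise ValueError(f"path does not start with {API_PREFIX!r}")
    return {
        "server": f"{parts.scheme}://{parts.netloc}{API_PREFIX}",
        "endpoint": parts.path[len(API_PREFIX):],
        # parse_qs keeps repeated keys (here: fields) as lists.
        "parameters": parse_qs(parts.query),
    }


pieces = break_down(URL)
print(pieces["server"])
print(pieces["endpoint"])
print(pieces["parameters"])
```

Note that `fields` appears twice in the query string, so it is returned as a list with two values.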
32. How to search
http://api.opentargets.io/v3/platform/docs
REST API: some use cases
How to get all diseases
associated with a target
How to get the association score
for a target – disease pair
How to get the evidence for a
target – disease association
36. How to get the evidence for an association
http://api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000171862
http://api.opentargets.io/v3/platform/public/evidence/filter?target=ENSG00000171862&disease=Orphanet_2563
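The two evidence URLs above differ only in their query parameters, so they can be assembled programmatically. The sketch below only builds the URL strings and does not contact the server. `evidence_url` is a hypothetical helper, and the base address is copied from the slide.

```python
from typing import Optional
from urllib.parse import urlencode

# Base address of the evidence filter endpoint, as shown on the slide.
BASE = "http://api.opentargets.io/v3/platform/public/evidence/filter"


def evidence_url(target: str, disease: Optional[str] = None) -> str:
    """Build an evidence query for a target, optionally restricted to one disease."""
    params = [("target", target)]
    if disease is not None:
        params.append(("disease", disease))
    return f"{BASE}?{urlencode(params)}"


print(evidence_url("ENSG00000171862"))
print(evidence_url("ENSG00000171862", "Orphanet_2563"))
```

The resulting strings can be pasted into a browser or passed to any HTTP client, provided the v3 service is reachable.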
38. How to run our REST endpoints
• Paste the URL in the location bar in a browser
• Use the terminal window (e.g. with the curl command)
• Use our free clients (i.e. Python* and R**)
• Call them from your own application/workflow
* http://opentargets.readthedocs.io/en/stable/index.html
** no longer supported by Open Targets; feel free to make PRs
42. Python and R clients for the REST API
http://opentargets.readthedocs.io
No longer supported by
Open Targets
43. Can we change the way the
associations are scored? Perhaps
to increase the weight on text
mining data?
Yes, you can with the Open Targets Python client!
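To illustrate what re-weighting means, here is a standalone sketch that does not use the Open Targets Python client or its API. It applies per-data-type weights (the values implied by the scoring slides) and combines the weighted scores with a harmonic sum. The input scores are hypothetical, and capping the result at 1 is an assumption made to keep it in the 0 to 1 range.

```python
from typing import Dict, Iterable

# Data-type weights as implied by the scoring slides (1.0 unless stated).
DEFAULT_WEIGHTS: Dict[str, float] = {
    "genetic associations": 1.0,
    "somatic mutations": 1.0,
    "drugs": 1.0,
    "affected pathways": 1.0,
    "RNA expression": 0.5,
    "animal models": 0.2,
    "text mining": 0.2,
}


def harmonic_sum(scores: Iterable[float]) -> float:
    """S1 + S2/2^2 + S3/3^2 + ... with scores sorted in descending order."""
    ordered = sorted(scores, reverse=True)
    return sum(s / i ** 2 for i, s in enumerate(ordered, start=1))


def overall_score(datatype_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weight each data-type score, then aggregate with the harmonic sum."""
    weighted = [score * weights[name] for name, score in datatype_scores.items()]
    # Assumption: cap at 1 so the result stays in the 0-to-1 range.
    return min(1.0, harmonic_sum(weighted))


# Hypothetical per-data-type scores for one target-disease pair.
scores = {"genetic associations": 0.3, "text mining": 0.9}

default = overall_score(scores, DEFAULT_WEIGHTS)
boosted = overall_score(scores, {**DEFAULT_WEIGHTS, "text mining": 1.0})
print(default, boosted)
```

With the default weights the text mining score contributes little. Raising its weight to 1.0 makes it the dominant term, which is the kind of change the question on this slide asks about.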
45. Open Targets toolkit: LINK
• LINK: Literature coNcept Knowledgebase
• Subject / predicate / object structured relations
From PubMed abstracts
Proof of Concept
Further development
http://link.opentargets.io/
46. Addressing text mining shortcomings
• Entities: genes, diseases, drugs
• Concepts extracted via NLP
(Natural Language Processing)
• 28 M documents, 500 M relations
• http://blog.opentargets.org/link/
47. Open Targets toolkit: DoRothEA
• Candidate TF-drug interactions in cancer
• 1000 cancer cell lines
• 265 anti-cancer compounds
• 127 transcription factors
http://cancerres.aacrjournals.org/content/early/2017/12/09/0008-5472.CAN-17-1679
dorothea.opentargets.io
49. Open Targets Platform
• Resource of integrated multiomics data
• Added value (e.g. score) and links to original sources
• Graphical web interface: easy to use
April 2018 release: 21K targets, 9.7K diseases, 2.3 M associations, 6.1 M evidence
51. We support decision-making
Which targets are
associated with a
disease?
Are there FDA drugs
for this association?
…
Can I find out about the
mechanisms of the
disease?
52. How to access the Platform
www.opentargets.org/projects
• Experimental projects: generate new evidence with CRISPR/Cas9, organoids and iPS cells (cellular models for disease)
• Core bioinformatics pipelines: integration of available data
• Access: web interface, batch search tool, REST API, data dumps
• Main data store: Elasticsearch
• Public access: AngularJS web app*, REST API**
* UI first released in December 2015
** API first released in April 2016
https://www.targetvalidation.org
https://api.opentargets.io
56. Data sources: GWAS catalog
• Genome Wide Association Studies
• Array-based chips genotyping 100,000 SNPs genome-wide
57. Details on data sources to associate
targets and diseases
Extra slides
58. Data sources: UniProt
• Protein: sequence, annotation, function
• Manual curation of coding variants in patients
EMBL-EBI train online
59. Data sources: Gene2Phenotype
• Variants, genes, phenotypes in rare diseases
• Literature curation by consultant clinical geneticists in the UK
61. Data sources: PheWAS
• Phenome Wide Association Studies
• A variant associated with multiple phenotypes
• Clinical phenotypes derived from EMR-linked biobank BioVU
• ICD9 codes mapped to EFO
62. Data sources: GE PanelApp
• Aid clinical interpretation of genomes for the 100K project
• We include ‘green genes’ from version 1+ and phenotypes
63. Data sources: EVA
• With ClinVar information for rare diseases
• Clinical significance: pathogenic, protective
EMBL-EBI train online
64. Data sources: The Cancer Gene Census
• Genes with mutations causally implicated in cancer
• Gene associated with a cancer plus other cancers associated
with that gene
65. Data sources: IntOGen
• Genes and somatic (driver) mutations, 28 cancer types
• Involvement in cancer biology
• Rubio-Perez et al. 2015
66. Data sources: ChEMBL
• Known drugs linked to a disease and a known target
• FDA approved for clinical trials or marketing
EMBL-EBI train online
67. Data sources: Reactome
• Biochemical reactions and pathways
• Manual curation of pathways affected by mutations
EMBL-EBI train online
68. Data sources: SLAPenrich
• 374 pathways curated and mapped to cancer hallmarks
• Divergence of the total number of cancer samples with
genomic alterations
• Mutational burden and total exonic block length of genes
69. Data sources: PROGENy
• Comparison of pathway activities between normal and primary
samples from The Cancer Genome Atlas
• Inferred from RNA-seq: 9,250 tumour and 741 normal samples
• EGFR, hypoxia, JAK.STAT, MAPK, NFkB, PI3K, TGFb, TNFa,
Trail, VEGF, and p53
70. Data sources: Expression Atlas
• Baseline expression for human genes
- target profile page
• Differential mRNA expression (healthy versus diseased):
- target-disease associations
EMBL-EBI train online
71. Data sources: Europe PMC
• Mining titles, abstracts, full text in research articles
• Target and disease co-occurrence in the same sentence
• Dictionary (not NLP)
EMBL-EBI train online
73. Aggregating scores across the data
• Using a mathematical function, the harmonic sum*:
$\Sigma_H = S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \dots + \frac{S_i}{i^2}$
where S1, S2, ..., Si are the individual evidence scores sorted in descending order
* PMID: 19107201, PMID: 20118918
• Advantages:
A) account for replication
B) deflate the effect of large amounts of data e.g. text mining
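Both advantages can be seen in a few lines of Python. This is a sketch with made-up scores: the harmonic sum takes the scores sorted in descending order and divides the i-th score by i squared.

```python
from typing import Iterable


def harmonic_sum(scores: Iterable[float]) -> float:
    """S1 + S2/2^2 + S3/3^2 + ... with scores sorted in descending order."""
    ordered = sorted(scores, reverse=True)
    return sum(score / position ** 2 for position, score in enumerate(ordered, start=1))


# A) Replication: a second, independent piece of evidence raises the score.
single = harmonic_sum([0.5])
replicated = harmonic_sum([0.5, 0.5])

# B) Deflation: 1,000 weak pieces of evidence (e.g. text mining hits)
# cannot add up the way a plain sum would.
many_weak = harmonic_sum([0.1] * 1000)
plain_sum = sum([0.1] * 1000)

print(single, replicated)
print(many_weak, plain_sum)
```

Since the series 1/i^2 converges to pi^2/6 (about 1.645), any number of pieces of evidence scoring 0.1 stays below roughly 0.165, whereas their plain sum grows without bound.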
PhRMA (
Lengthy: drug discovery from start (idea) to finish (market) can take up to 20 years.
Costly: especially the clinical trials, when human subjects are tested (volunteers or people suffering from the condition).
Compounds drop out along the drug discovery journey: low success, high attrition.
If you work in early stages of drug discovery, which databases or resources do you rely on?
Some of the DBs out there.
Fit everything together to generate a therapeutic hypothesis.
Going through those resources takes time. My group may not have the allocated resources (computing) or expertise (bioinformatics, ML, NLP), and we are working on our own (in one specific lab), so it’s not multidisciplinary.
Can workshop attendees list others?
A shared vision: to create a partnership to transform drug discovery through systematic target identification and prioritisation.
From that idea, Open Targets (formerly CTTV) was started to combine the world-class functional genomics expertise at Sanger with the EBI’s excellence in bioinformatics and computational biology and the industry expertise of GSK, and now Biogen as well. The aim is to find out all that’s important about a target before starting the drug development pipeline.
The pipeline would thereby be filled with validated and valid targets with much improved odds of success.
Target ID and prioritisation knowledge cycle.
Our research focusses on data generation, data integration, and enabling data access.
Our research is connected, as we provide the tools to generate hypotheses and explore them experimentally.
We focus primarily on oncology, immunology (including IBD) and neurodegeneration, with some experiments in other areas or across diseases.
We chose EFO as our disease ontology. It is also cross-referenced against DO, OMIM, HP, MP and MONDO.
Evidence: sequence (SNPs, mutations, germline and somatic curated at the DNA or protein level), differential mRNA expression, sentences from research papers, drug information, pathways affected by pathogenic mutation, animal models (https://github.com/opentargets/json_schema/blob/master/doc/instructions.md)
Somatic mutations from cancer driver genes across 28 tumour types. We have imported data (2014.12) from 6,792 samples spanning 28 cancer types (Rubio-Perez et al.).