1. A Systematic Assessment of Human Druggable Target Genes Identifies Absent
Orthologues in Mouse and Rat
Mark Miller (1), Paul M. Bradley (2), Gordon S. Baxter (2), James E. Sidaway (3)
(1) Instem, Conshohocken, PA 19428, US
(2) Instem, Melbourn, Cambridge, UK
(3) Phenotox Ltd., Bollington, Macclesfield, Cheshire, SK10 4TG, UK
james.sidaway@phenotox.com
mark.miller@instem.com
Introduction
Accurate risk assessment can be undermined by the absence of an orthologous target gene in the discovery and regulatory toxicology
test species.
Here, we systematically analyzed 3,158 druggable human genes (DHGs) from the Drug Gene Interaction Database (DGIdb) for absent
orthologues in the main toxicology test species (mouse, rat and dog).
Methods
Selection of Druggable Human Genes (DHG) from DGIdb. DGIdb is a source of druggable genes that integrates 27 highly regarded drug,
gene or drug-gene interaction databases, such as DrugBank, the IUPHAR/BPS Guide to Pharmacology, and ChEMBL. Only DGIdb genes with one
or more drug interactions were included in the analysis. DGIdb provides private identifiers and symbol-like "names" for each gene, and
retains the external identifiers from 42 different namespaces, such as Ensembl, Entrez Gene and UniProt. The average gene had 30.9 external
identifiers, but no single identifier type was shared by all genes, so all genes were computationally mapped to HGNC numerical identifiers
via every available external identifier. Genes that mapped to multiple HGNC identifiers were disambiguated with a hybrid automated/manual
process. For example, DGIdb gene "MHS3" was mapped to multiple HGNC genes, and its interacting drugs consisted of amlodipine and 4 other
dihydropyridine calcium channel blockers. Since CACNG1, which encodes a calcium channel subunit, was one of the mappings, it was taken as
the intended DHG. In this way, 3,147 DHGs were identified in DGIdb.
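The mapping step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' workflow: the lookup table, the gene names and every identifier in it are hypothetical placeholders, and the manual review of ambiguous genes is only represented by setting them aside.

```python
# Hypothetical lookup from (namespace, external id) to an HGNC id.
# All ids are placeholders, not real accessions.
XREF_TO_HGNC = {
    ("ensembl", "ENSG_A"): "HGNC:A",
    ("entrez", "111"): "HGNC:A",
    ("uniprot", "P_B1"): "HGNC:B",
    ("entrez", "222"): "HGNC:C",
}

def map_to_hgnc(genes):
    """Map each gene to HGNC via every external id it carries.

    Returns (resolved, ambiguous, unmapped):
      resolved  - genes reaching exactly one HGNC id
      ambiguous - genes reaching several HGNC ids (left for review)
      unmapped  - genes reaching none
    """
    resolved, ambiguous, unmapped = {}, {}, []
    for name, xrefs in genes.items():
        hits = {XREF_TO_HGNC[x] for x in xrefs if x in XREF_TO_HGNC}
        if len(hits) == 1:
            resolved[name] = next(iter(hits))
        elif hits:
            ambiguous[name] = sorted(hits)
        else:
            unmapped.append(name)
    return resolved, ambiguous, unmapped

# Invented example genes, each with a list of external identifiers.
genes = {
    "gene1": [("ensembl", "ENSG_A"), ("entrez", "111")],
    "gene2": [("uniprot", "P_B1"), ("entrez", "222")],
    "gene3": [("ensembl", "ENSG_missing")],
}
resolved, ambiguous, unmapped = map_to_hgnc(genes)
```

Genes that land in `ambiguous` correspond to cases like "MHS3", where the interacting drugs were used to choose among the candidate HGNC genes.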
Although it is included in DGIdb, the supplementary table from Rask-Andersen et al. (2014) was independently parsed as a validation exercise, in
a manner similar to that described above. This resulted in the "identification" of 11 additional DHGs, which were already present in DGIdb
but excluded by our workflow because no interactions were present. The union of the DGIdb and Rask-Andersen (RA) conversions was 3,158
DHGs with HGNC identifiers.
The links that DGIdb provides to the supplementary table from Rask-Andersen et al. (2014) and to the IUPHAR/BPS Guide to Pharmacology were
used for categorical analyses (see Results).
Detection of Orthologues with metaPhOrs. metaPhOrs is an orthology resource that integrates 12
high-quality sources, including OrthoMCL, PhylomeDB, and seven Ensembl domains (vertebrates,
bacteria, fungi, etc.). metaPhOrs was selected over other integrated orthology resources because
of the easy access to confidence metrics such as the number of primary sources, the number
of supporting trees, and the consistency across those trees. We casually inspected the mouse
orthologues predicted for the DHGs, especially those with low consistency scores and low tree
counts. For example, the predicted mouse orthologue of human P2RY11 is Ppan (Suppressor of
SWI4 1 homolog), with only 3 PhylomeDB trees and a consistency score of 0.667. After viewing
plots of the various quality metrics, we arbitrarily rejected all metaPhOrs predictions with fewer than
3 trees or a consistency score (CS) below 0.9, discarding < 3.5% of the predictions. Opportunities for
more rigorous quality filtering are discussed below.
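As a rough illustration of the quality filter just described, the sketch below keeps a prediction only if it has at least 3 trees and a consistency score of at least 0.9. The record layout and the two "GENE_" rows are invented for the example; only the P2RY11/Ppan values and the two cutoffs come from the text.

```python
from typing import NamedTuple

MIN_TREES = 3   # predictions with fewer trees are rejected
MIN_CS = 0.9    # predictions with a lower consistency score are rejected

class Prediction(NamedTuple):
    human_gene: str
    mouse_gene: str
    trees: int
    consistency: float

def passes_quality_filter(p: Prediction) -> bool:
    """True if the prediction survives both cutoffs."""
    return p.trees >= MIN_TREES and p.consistency >= MIN_CS

predictions = [
    Prediction("P2RY11", "Ppan", 3, 0.667),     # values quoted in the text
    Prediction("GENE_X", "Gene_x", 250, 0.98),  # invented
    Prediction("GENE_Y", "Gene_y", 2, 1.0),     # invented
]
kept = [p for p in predictions if passes_quality_filter(p)]
```

Note that the Ppan prediction has exactly 3 trees, so under these cutoffs it is rejected by its consistency score alone.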
Checking metaPhOrs for False Negatives with BLAST. Casual inspection also revealed some
false negatives: DHGs for which a mouse orthologue was expected but not observed. Therefore,
we BLASTed the protein sequences for all DHGs against the UniProt proteomes for mouse, rat,
and dog. Since the BLAST was not done reciprocally between the species, the results should be
explicitly taken as similarities that are as good as, or better than, known orthologies.
In our initial method, if a druggable human protein had a BLAST hit against a mouse (or rat, or dog)
protein with greater than 80% identity and greater than 80% alignment coverage, then that gene was
removed from the no-orthologue list for that species, despite the absence of a metaPhOrs prediction.
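The initial cutoff rule can be sketched as below. The hit records are assumed to already carry percent identity and percent alignment coverage (how coverage was computed is not stated here), and the "GENE_" entries are invented; the UBB values are those quoted later in this section.

```python
MIN_IDENTITY = 80.0  # percent amino-acid identity, exclusive cutoff
MIN_COVERAGE = 80.0  # percent alignment coverage, exclusive cutoff

def rescued_by_blast(hits):
    """Return human genes with at least one hit above both cutoffs."""
    return {
        h["query"]
        for h in hits
        if h["identity"] > MIN_IDENTITY and h["coverage"] > MIN_COVERAGE
    }

# Human-versus-mouse hits; only the UBB row uses values from the text.
mouse_hits = [
    {"query": "UBB", "target": "Ubb", "identity": 99.6, "coverage": 100.0},
    {"query": "GENE_Z", "target": "Gene_z", "identity": 85.0, "coverage": 60.0},
    {"query": "GENE_W", "target": "Gene_w", "identity": 80.0, "coverage": 95.0},
]

# Genes with no metaPhOrs mouse orthologue (illustrative set).
no_mouse_orthologue = {"UBB", "GENE_Z", "GENE_W"}
no_mouse_orthologue -= rescued_by_blast(mouse_hits)
```

Because both comparisons are strict, a hit with exactly 80% identity (GENE_W here) does not rescue its gene.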
Subsequently, we used a support vector machine (SVM) learning approach to filter the BLAST results. As
the authority on orthology we used NCBI’s HomoloGene, which is more compact and convenient than
metaPhOrs, at the cost of not directly providing supporting phylogenetic trees or cross-source
consistency scores. After training with 30% of the BLAST data, the overall accuracy was found to
be 99.3%. Next, the trained algorithm was used to predict how “orthologue-like” the remaining
BLAST results were. Human proteins that had alignments to a mouse protein with an SVM score
> 0.95 were excluded from the no-orthologue set.
For example, metaPhOrs does not predict a mouse orthologue for human UBB (Polyubiquitin-B).
However, its protein aligns to mouse Ubb with 99.6% identity and 100% coverage. That gives the
BLAST result an SVM score of 0.98, and UBB was removed from the list of DHGs with no mouse
orthologues.
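The last step, applying the SVM score cutoff, might look like the sketch below. It assumes the SVM scores have already been computed for each BLAST alignment (the classifier itself is not reproduced here) and that a protein's best-scoring alignment decides the outcome, which is an assumption of this example. The "GENE_Q" rows are invented; the UBB score is the one quoted above.

```python
SVM_CUTOFF = 0.95  # scores above this count as orthologue-like

def best_scores(scored_hits):
    """Keep the highest-scoring (target, score) per human protein."""
    best = {}
    for human, target, score in scored_hits:
        if human not in best or score > best[human][1]:
            best[human] = (target, score)
    return best

scored_hits = [
    ("UBB", "Ubb", 0.98),          # score quoted in the text
    ("GENE_Q", "Gene_q1", 0.40),   # invented
    ("GENE_Q", "Gene_q2", 0.71),   # invented
]
best = best_scores(scored_hits)

# Start from the genes without a metaPhOrs mouse orthologue (illustrative)
# and drop those whose best alignment is orthologue-like.
no_mouse_orthologue = {"UBB", "GENE_Q"}
no_mouse_orthologue -= {h for h, (_, s) in best.items() if s > SVM_CUTOFF}
```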
Results
Discussion
We have identified druggable human genes that lack rodent orthologues. We attribute this ability to at least two
factors: 1) the availability of high-quality, integrated data sets, and 2) a computational workflow that performs quality
control after every step, instead of saving it for the very last step.
HTR3 forms C, D and E are established targets of the antiemetic dolasetron, yet they lack mouse, rat and dog orthologues.
A review of the discovery toxicology approaches that have been used for drugs like dolasetron might inform general
strategies for developing drugs against targets lacking rodent orthologues.
As best practice, the arbitrary filtering of metaPhOrs orthology predictions could be replaced with a machine learning
approach, like the one used with the BLAST results, and orthologue predictions from HomoloGene could be used in
addition to metaPhOrs.
The current workflow operates on an overly simplistic premise: the presence or absence of a gene in an animal's
genome determines whether that animal could serve as a model for target-mediated toxicity in humans. This workflow
can easily be expanded to consider the absence or presence of other genes in a toxicity pathway. Other approaches
will be required to determine if the target gene is expressed in the model animal, in the relevant tissue. Likewise, one
must determine whether the gene's product might be misfolded, or non-functional for some other reason.
[Figure: All metaPhOrs-predicted orthologues for DHGs, with quality filter. Axes: metaPhOrs consistency score (0.5 to 1.0) and total trees (log10 scale, 1 to 1,000). Annotated point: Ppan as an "orthologue" of P2RY11.]
[Figure: DHG BLAST, with initial arbitrary cutoffs and SVM boundary. Axes: sequence coverage and % AA identity. Additional scale: consistency with orthology.]
Our initial workflow, using the May 2014 release of metaPhOrs and the arbitrary BLAST cutoffs, identified 170 DHGs
lacking mouse orthologues, 172 DHGs lacking rat orthologues, and an overlap of 134. By switching to the January 2016
release of metaPhOrs and using the SVM-based BLAST filter, smaller, more conservative lists were defined. Manual inspection
showed that the majority of the genes "lost" in moving to the more conservative approach were false identifications,
i.e. DHGs that did in fact have orthologues in one or both species (data not shown).
According to the revised method, there are 41 especially challenging DHGs that have no orthologue in mouse, rat
or dog. There are also 22 DHGs that do not have a mouse or rat orthologue, but do have a dog orthologue. In these
cases, dogs may be beneficial for modelling on-target toxicities.
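The set analysis behind these counts can be sketched as follows. The function partitions genes by exactly which species lack an orthologue; the gene ids are placeholder integers chosen only so that the set sizes match the initial-approach totals quoted above (170 mouse, 172 rat, 134 shared).

```python
def venn_regions(sets):
    """Partition genes by the exact combination of species lacking them.

    `sets` maps species name -> set of genes with no orthologue there.
    Returns {frozenset(species): set of genes in exactly that region}.
    """
    regions = {}
    for gene in set().union(*sets.values()):
        key = frozenset(sp for sp, members in sets.items() if gene in members)
        regions.setdefault(key, set()).add(gene)
    return regions

# Placeholder ids sized to the initial-approach totals:
# 170 lacking a mouse orthologue, 172 lacking a rat orthologue, 134 shared.
mouse = set(range(0, 170))
rat = set(range(36, 208))

regions = venn_regions({"mouse": mouse, "rat": rat})
counts = {tuple(sorted(k)): len(v) for k, v in regions.items()}
```

With these totals, the arithmetic gives 170 - 134 = 36 genes lacking only a mouse orthologue and 172 - 134 = 38 lacking only a rat orthologue. The same function accepts a third `"dog"` set for the three-species analysis.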
Acknowledgements
DGIdb data were downloaded on 10 February 2016: Wagner AH, Coffman AC, Ainscough BJ, Spies
NC, Skidmore ZL, Campbell KM, Krysiak K, Pan D, McMichael JF, Eldred JM, Walker JR, Wilson RK, Mardis
ER, Griffith M, Griffith OL. DGIdb 2.0: mining clinically relevant drug-gene interactions. Nucleic Acids
Res. 2016;44(D1):D1036-44.
Rask-Andersen M, Masuram S, Schiöth HB. The druggable genome: Evaluation of drug targets in
clinical trials suggests major shifts in molecular class and indication. Annu Rev Pharmacol Toxicol.
2014;54:9-26.
We used the May 2014 and January 2016 releases of metaPhOrs: Pryszcz LP, Huerta-Cepas J,
Gabaldon T. MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic
evidence using a consistency-based confidence score. Nucleic Acids Res. 2011;39:e32.
Finally, the DHGs lacking mouse orthologues, especially those that are only present in the DGIdb set, tend to have
a smaller body of literature compared to those with mouse orthologues, as judged by links from Entrez Gene to
PubMed (Kruskal-Wallis rank sum p-value = 2.433e-09).
Gene set                                              Average # of citations per gene
Lacking mouse orthologue, present in DGIdb only       166.1
Lacking mouse orthologue, present in RA TDG & DGIdb   205.2
With mouse orthologue                                 293.7
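For readers unfamiliar with the Kruskal-Wallis rank sum test used for this comparison, the sketch below computes the H statistic (with tie correction) and its p-value for three groups. It is an illustration only: the per-gene citation counts are not given in this handout, so the numbers are invented toy values, and the software actually used for the reported p-value is not stated. The closed-form p-value relies on the chi-square approximation with 2 degrees of freedom, which is why the function is restricted to exactly three groups.

```python
import math
from itertools import groupby

def kruskal_wallis_3(*groups):
    """Kruskal-Wallis H and p-value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("closed-form p-value here assumes three groups")
    # Pool all values, remembering which group each came from.
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0] * 3
    tie_term = 0
    pos = 0  # number of values already ranked
    for _, run in groupby(pooled, key=lambda t: t[0]):
        run = list(run)
        t = len(run)
        avg_rank = pos + (t + 1) / 2  # mean of ranks pos+1 .. pos+t
        for _, gi in run:
            rank_sums[gi] += avg_rank
        tie_term += t**3 - t
        pos += t
    h = 12 / (n * (n + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n**3 - n)  # tie correction (1.0 when no ties)
    p = math.exp(-h / 2)  # chi-square survival function, 2 d.o.f.
    return h, p

# Invented toy citation counts for the three gene sets.
dgidb_only = [12, 40, 95]
ra_and_dgidb = [60, 150, 210]
with_orthologue = [180, 300, 420]
h, p = kruskal_wallis_3(dgidb_only, ra_and_dgidb, with_orthologue)
```

The test compares rank distributions rather than means, which suits citation counts because they are typically heavily skewed.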
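[Figure: Set analysis of Druggable Human Genes that Lack Orthologues in Mouse, Rat and/or Dog. Initial Approach (mouse, rat): 36 mouse only, 134 both, 38 rat only. Revised Approach (mouse, rat, dog): 41 lacking orthologues in all three species, 22 lacking mouse and rat but not dog orthologues; other region counts shown: 3, 13, 9, 0 and 11.]
[Figure: Novelty of Targets, According to Rask-Andersen et al. (2014)]
[Figure: Examples of DHGs Lacking Mouse, Rat or Dog Orthologues, Grouped by IUPHAR Pharmacological Class. "Ligand" is an additional, author-supplied class.]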