A Systematic Assessment of Human Druggable Target Genes Identifies Absent
Orthologues in Mouse and Rat
Mark Miller (1), Paul M. Bradley (2), Gordon S. Baxter (2), James E. Sidaway (3)
(1) Instem, Conshohocken, PA 19428, US
(2) Instem, Melbourn, Cambridge, UK
(3) Phenotox Ltd., Bollington, Macclesfield, Cheshire, SK10 4TG, UK
james.sidaway@phenotox.com
mark.miller@instem.com
Introduction
Accurate risk assessment can be undermined when the discovery and regulatory toxicology test species lack an orthologue
of the target gene.
Here, we systematically analyzed 3,158 druggable human genes (DHGs) from the Drug Gene Interaction Database (DGIdb) for absent
orthologues in the main toxicology test species (mouse, rat and dog).
Methods
Selection of Druggable Human Genes (DHGs) from DGIdb. DGIdb is a source of druggable genes that integrates 27 highly regarded drug,
gene or drug-gene interaction databases, such as DrugBank, the IUPHAR/BPS Guide to Pharmacology, and ChEMBL. Only DGIdb genes with one
or more drug interactions were included in the analysis. DGIdb assigns each gene a private identifier and a symbol-like "name", and
retains its external identifiers from 42 different namespaces (Ensembl, Entrez Gene, UniProt, etc.). The average gene had 30.9 external
identifiers, but no single identifier type was shared by all genes, so every gene was computationally mapped to HGNC numerical identifiers
via every available external identifier. Genes that mapped to multiple HGNC identifiers were disambiguated with a hybrid automated/manual
process. For example, DGIdb gene "MHS3" mapped to multiple HGNC genes, and its interacting drugs were amlodipine and 4 other
dihydropyridine calcium channel blockers. Since CACNG1, a calcium channel subunit gene, was one of the mappings, it was taken as the
intended DHG. In this way, 3,147 DHGs were identified in DGIdb.
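The mapping step can be illustrated with a minimal sketch. It assumes a lookup table from (namespace, external identifier) pairs to HGNC identifiers; the table contents, gene names and HGNC numbers below are invented placeholders, and the function names are hypothetical rather than the workflow's actual code.

```python
# Hypothetical lookup: (namespace, external identifier) -> HGNC identifier.
# All identifiers here are made-up placeholders, not real database entries.
XREF_TO_HGNC = {
    ("Ensembl", "ENSG_TOY_1"): "HGNC:0001",
    ("Entrez Gene", "1111"): "HGNC:0001",
    ("UniProt", "P_TOY_2"): "HGNC:0002",
}


def map_to_hgnc(external_ids, xref_to_hgnc=XREF_TO_HGNC):
    """Return the set of HGNC IDs reachable from any external identifier."""
    return {xref_to_hgnc[key] for key in external_ids if key in xref_to_hgnc}


def split_by_ambiguity(genes):
    """Partition genes into unambiguous, ambiguous and unmapped groups."""
    resolved, ambiguous, unmapped = {}, {}, []
    for name, external_ids in genes.items():
        hits = map_to_hgnc(external_ids)
        if len(hits) == 1:
            resolved[name] = next(iter(hits))
        elif hits:
            # More than one HGNC gene: set aside for disambiguation.
            ambiguous[name] = sorted(hits)
        else:
            unmapped.append(name)
    return resolved, ambiguous, unmapped


genes = {
    "GENE_A": [("Ensembl", "ENSG_TOY_1"), ("Entrez Gene", "1111")],
    "GENE_B": [("Entrez Gene", "1111"), ("UniProt", "P_TOY_2")],
    "GENE_C": [("Ensembl", "ENSG_UNKNOWN")],
}
resolved, ambiguous, unmapped = split_by_ambiguity(genes)
```

Genes that land in `ambiguous` correspond to cases like "MHS3", where additional evidence such as the interacting drugs is needed to pick the intended gene.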
Although it is already included in DGIdb, the supplementary table from Rask-Andersen et al. (2014) was independently parsed as a
validation exercise, in a manner similar to that described above. This resulted in the "identification" of 11 additional DHGs, which
were present in DGIdb but had been excluded by our workflow because they had no drug interactions. The union of the DGIdb and
Rask-Andersen (RA) conversions was 3,158 DHGs with HGNC identifiers.
The DGIdb-provided links to the supplementary table from Rask-Andersen et al. (2014) and to the IUPHAR/BPS Guide to Pharmacology were
used for categorical analyses (see Results).
Detection of Orthologues with metaPhOrs. metaPhOrs is an orthology resource that integrates 12
high-quality sources, including OrthoMCL, PhylomeDB, and seven Ensembl domains (vertebrates,
bacteria, fungi, etc.). metaPhOrs was selected over other integrated orthology resources because
it gives easy access to confidence metrics such as the number of primary sources, the number
of supporting trees, and the consistency across those trees. We informally inspected the mouse
orthologues predicted for the DHGs, especially those with low consistency scores (CS) and low tree
counts. For example, the predicted mouse orthologue of human P2RY11 is Ppan (Suppressor of
SWI4 1 homolog), with only 3 PhylomeDB trees and a consistency score of 0.667. After viewing
plots of the various quality metrics, we rejected all metaPhOrs predictions with fewer than
3 trees or a CS below 0.9. These cutoffs were chosen arbitrarily and discarded < 3.5% of the
predictions. Opportunities for more rigorous quality filtering are discussed below.
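The quality filter amounts to two threshold checks per prediction. The sketch below assumes each prediction has been reduced to a tree count and a consistency score; the `Prediction` record and the two toy genes are hypothetical, while the P2RY11/Ppan values are those quoted above.

```python
from dataclasses import dataclass

MIN_TREES = 3  # predictions with fewer supporting trees are rejected
MIN_CS = 0.9   # predictions with a lower consistency score are rejected


@dataclass(frozen=True)
class Prediction:
    human_gene: str
    mouse_gene: str
    trees: int
    consistency: float


def passes_quality_filter(p):
    """Keep a prediction only if it meets both cutoffs."""
    return p.trees >= MIN_TREES and p.consistency >= MIN_CS


predictions = [
    Prediction("P2RY11", "Ppan", trees=3, consistency=0.667),      # from the text
    Prediction("TOY_GENE_1", "Toy1", trees=40, consistency=0.98),  # invented
    Prediction("TOY_GENE_2", "Toy2", trees=2, consistency=1.0),    # invented
]
kept = [p for p in predictions if passes_quality_filter(p)]
```

Under these cutoffs the P2RY11/Ppan prediction is rejected on its consistency score, even though it meets the minimum tree count.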
Checking metaPhOrs for False Negatives with BLAST. Informal inspection also revealed some
false negatives: DHGs for which a mouse orthologue was expected but not predicted. Therefore,
we BLASTed the protein sequences for all DHGs against the UniProt proteomes for mouse, rat,
and dog. Since the BLAST searches were not run reciprocally between the species, the hits are not
orthology assertions in themselves. They should be read as sequence similarities that are as good
as, or better than, those of known orthologues.
In our initial method, if a druggable human protein had a BLAST hit against a mouse (or rat, or dog)
protein with greater than 80% identity and greater than 80% alignment coverage, then that gene was
removed from the no-orthologue list, despite the absence of a metaPhOrs prediction.
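The initial cutoff rule can be sketched as follows, assuming the BLAST hits have already been parsed into records with percent identity and percent coverage. The record layout, the function name and `TOY_GENE_3` are hypothetical; the UBB figures are the ones quoted later in this section.

```python
def rescued_by_blast(hits, min_identity=80.0, min_coverage=80.0):
    """Return human genes with at least one hit above both cutoffs."""
    return {
        h["human_gene"]
        for h in hits
        if h["identity"] > min_identity and h["coverage"] > min_coverage
    }


# Genes with no metaPhOrs mouse orthologue (toy list).
no_orthologue = {"UBB", "TOY_GENE_3"}

hits = [
    {"human_gene": "UBB", "target": "Ubb", "identity": 99.6, "coverage": 100.0},
    {"human_gene": "TOY_GENE_3", "target": "Toy3", "identity": 85.0, "coverage": 42.0},
]

# Remove every gene that has a sufficiently similar protein in the test species.
remaining = no_orthologue - rescued_by_blast(hits)
```

A hit must clear both cutoffs, so a highly identical but short alignment (as for the invented `TOY_GENE_3`) does not rescue a gene.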
Subsequently, we used a support vector machine (SVM) machine learning approach for filtering the
BLAST results. As an authority on orthology we used NCBI’s HomoloGene, which is more compact and
convenient than metaPhOrs, at the cost of not directly providing supporting phylogenetic trees or
cross-source consistency scores. After training with 30% of the BLAST data, the overall accuracy
was found to be 99.3%. Next, the trained algorithm was used to predict how “orthologue-like” the
remaining BLAST results were. Human proteins that had alignments to a mouse protein with an SVM
score > 0.95 were excluded from the no-orthologue set.
For example, metaPhOrs does not predict a mouse orthologue for human UBB (Polyubiquitin-B).
However, its protein aligns to mouse Ubb with 99.6% identity and 100% coverage. That gives the
BLAST result an SVM score of 0.98, and UBB was removed from the list of DHGs with no mouse
orthologues.
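To make the idea concrete, here is a rough, dependency-free sketch of a linear SVM trained on two features (identity and coverage). It is an illustration under stated assumptions, not the poster's implementation: the kernel, the feature set and the way the 0 to 1 "SVM score" was derived are not given, so this sketch reports the raw decision value instead. The training points are invented, and the hyperparameters (`lam`, `lr`, `epochs`) are arbitrary choices.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.05, epochs=20000):
    """Fit w and b by full-batch subgradient descent on the L2-regularised hinge loss."""
    n = len(samples)
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # inside the margin or misclassified
                for j, xj in enumerate(x):
                    grad_w[j] -= y * xj / n
                grad_b -= y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision_value(w, b, x):
    """Signed score: positive means orthologue-like."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Invented toy hits: (identity, coverage) as fractions.
# Label +1 = hit agrees with the orthology reference, -1 = it does not.
train_x = [(0.99, 1.00), (0.90, 0.95), (0.85, 0.90),
           (0.30, 0.40), (0.25, 0.80), (0.60, 0.30)]
train_y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(train_x, train_y)

ubb_like = decision_value(w, b, (0.996, 1.00))  # identity/coverage quoted for UBB
weak_hit = decision_value(w, b, (0.30, 0.40))
```

In practice one would hold out data to estimate accuracy, as described above, and would probably use an established SVM library rather than a hand-written optimiser.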
Results
Discussion
We have identified druggable human genes that lack rodent orthologues. We attribute this ability to at least two
factors: 1) the availability of high-quality, integrated data sets, and 2) a computational workflow that performs quality
control after every step, rather than deferring it to the very end.
HTR3 forms C, D and E are established targets of the antiemetic dolasetron, yet they lack mouse, rat and dog orthologues.
A review of the discovery toxicology approaches that have been used for drugs like dolasetron might inform general
strategies for developing drugs against targets lacking rodent orthologues.
As best practice, the arbitrary filtering of metaPhOrs orthology predictions could be replaced with a machine learning
approach, like the one used for the BLAST results, and orthologue predictions from HomoloGene could be used in
addition to metaPhOrs.
The current workflow operates on an overly simplistic premise: the presence or absence of a gene in an animal's
genome determines whether that animal could serve as a model for target-mediated toxicity in humans. This workflow
can easily be expanded to consider the absence or presence of other genes in a toxicity pathway. Other approaches
will be required to determine if the target gene is expressed in the model animal, in the relevant tissue. Likewise, one
must determine whether the gene's product might be misfolded, or non-functional for some other reason.
Figure: All metaPhOrs Predicted Orthologues for DHGs, with Quality Filter. Axes: metaPhOrs consistency score (0.5 to 1.0) and total trees (log10 scale, 1 to 1,000). Annotated point: Ppan as an “orthologue” of P2RY11.
Figure: DHG BLAST, with Initial Arbitrary Cutoffs and SVM Boundary. Axes: sequence coverage and % amino acid identity. Scale: consistency with orthology.
Our initial workflow, using the May 2014 release of metaPhOrs and the arbitrary BLAST cutoffs, identified 170 DHGs
lacking mouse orthologues and 172 DHGs lacking rat orthologues, with 134 DHGs lacking both. By switching to the January 2016
release of metaPhOrs and using the SVM-based BLAST filter, smaller, more conservative lists were defined. Manual inspection
showed that the majority of the genes "lost" in moving to the more conservative approach were false identifications,
i.e. DHGs that did in fact have orthologues in one or both species (data not shown).
According to the revised method, there are 41 especially challenging DHGs that have no orthologue in mouse, rat
or dog. There are also 22 DHGs that do not have a mouse or rat orthologue, but do have a dog orthologue. In these
cases, dogs may be beneficial for modelling on-target toxicities.
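The set analysis behind these counts can be sketched with plain Python sets. The gene names below are invented placeholders and `venn_regions` is a hypothetical helper. Only the 170, 172 and 134 figures come from the text.

```python
def venn_regions(lacking):
    """Group genes by the exact combination of species in which they lack an orthologue."""
    regions = {}
    all_genes = set().union(*lacking.values())
    for gene in all_genes:
        combo = frozenset(sp for sp, genes in lacking.items() if gene in genes)
        regions.setdefault(combo, set()).add(gene)
    return regions


# Toy per-species lists of DHGs with no orthologue.
lacking = {
    "mouse": {"G1", "G2", "G3"},
    "rat": {"G1", "G2", "G4"},
    "dog": {"G1", "G5"},
}
regions = venn_regions(lacking)

# No orthologue in any of the three species.
no_model_species = regions[frozenset({"mouse", "rat", "dog"})]
# No rodent orthologue, but a dog orthologue exists.
dog_may_help = regions[frozenset({"mouse", "rat"})]

# Initial approach, using the counts reported above.
mouse_only = 170 - 134
rat_only = 172 - 134
```

With the real gene lists, the size of each region would give the counts shown in the Venn diagrams, for example the 41 and 22 DHGs discussed above.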
Acknowledgements
DGIdb data were downloaded on 10 February 2016: Wagner AH, Coffman AC, Ainscough BJ, Spies
NC, Skidmore ZL, Campbell KM, Krysiak K, Pan D, McMichael JF, Eldred JM, Walker JR, Wilson RK, Mardis
ER, Griffith M, Griffith OL. DGIdb 2.0: mining clinically relevant drug-gene interactions. Nucleic Acids
Res. 2016;44(D1):D1036-44.
Rask-Andersen M, Masuram S, Schiöth HB. The druggable genome: Evaluation of drug targets in
clinical trials suggests major shifts in molecular class and indication. Annu Rev Pharmacol Toxicol.
2014;54:9-26.
We used the May 2014 and January 2016 releases of metaPhOrs: Pryszcz LP, Huerta-Cepas J,
Gabaldon T. MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic
evidence using a consistency-based confidence score. Nucleic Acids Res. 2011;39:e32.
Finally, the DHGs lacking mouse orthologues, especially those that are only present in the DGIdb set, tend to have
a smaller body of literature compared to those with mouse orthologues, as judged by links from Entrez Gene to
PubMed (Kruskal-Wallis rank sum p-value = 2.433e-09).
Gene Group                                              Average # of Citations per Gene
Lacking Mouse Orthologue, Present in DGIdb only         166.1
Lacking Mouse Orthologue, Present in RA TDG & DGIdb     205.2
With Mouse Orthologue                                   293.7
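For readers unfamiliar with the test, the Kruskal-Wallis statistic can be computed in a few lines. This is a minimal sketch with invented citation counts, not the analysis behind the reported p-value. It uses the chi-square approximation, which for three groups (2 degrees of freedom) has the closed form p = exp(-H/2).

```python
import math


def kruskal_wallis(groups):
    """Return (H, p) for exactly three groups, using the chi-square approximation."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value here assumes exactly 3 groups")
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    ranks, tie_term, i = {}, 0, 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j

    rank_sum_term = sum(sum(ranks[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)
    h /= 1 - tie_term / (n_total ** 3 - n_total)  # correction for ties
    p = math.exp(-h / 2)  # chi-square survival function, 2 degrees of freedom
    return h, p


# Invented citation counts per gene for three groups.
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h, p = kruskal_wallis(toy_groups)
```

The test compares rank distributions rather than means, which suits citation counts because they tend to be heavily skewed.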
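Figure: Set analysis of Druggable Human Genes that Lack Orthologues in Mouse, Rat and/or Dog.
Initial Approach (Venn diagram, rat and mouse): 38 rat only, 134 both, 36 mouse only.
Revised Approach (Venn diagram, mouse, rat and dog): region counts 3, 13, 9, 0, 41, 11 and 22, including 41 DHGs lacking orthologues in all three species and 22 lacking them in mouse and rat only.
Panel: Novelty of Targets, According to Rask-Andersen et al. (2014).
Panel: Examples of DHGs Lacking Mouse, Rat or Dog Orthologues, Grouped by IUPHAR Pharmacological Class. "Ligand" is an Additional, Author-Supplied Class.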
