SlideShare a Scribd company logo
1 of 1
Download to read offline
Disease-target associations from drug trial links
Proposed confidence metrics:
● nStudy : Study count for association.
● nStudyNewness : Study count weighted by newness of study (newer better).
● nStudyPhase : Study count weighted by phase of study (completed better).
● nPub : Study publications.
● nPubTypes : Study publications (results type better).
● nDiseaseMention : Disease mention count for disease-target association.
● nDrugMention : Drug mention count for disease-target association.
● nDrug : Drug count for disease-target association.
● nAssay : Assay count for drug-target association.
● nAssayPchembl : Assay count for drug-target association, weighted by pChembl.
1.2M associations; 164K unique disease-gene pairs. Examples:
Mining ClinicalTrials.gov via CTTI AACT for
drug target hypotheses
Jeremy Yang1
, Roger Sayle2
, Lars Juhl Jensen3
and Tudor Oprea1
1
University of New Mexico, Albuquerque, USA; 2
NextMove Scientific Software, Cambridge, UK; 3
Novo Nordisk Foundation Center for Protein Research, Copenhagen, DK
We mined ClinicalTrials.gov using the CTTI-AACT db from Duke University for
drugs/chemicals and diseases/conditions. Named entitiy recognition (NER) requires
specialized tools and expertly curated dictionaries for comprehensive and high quality
results, hence use of NextMove Leadmine for chemical NER and JensenLab
Tagger for disease NER. Study designs and outcomes can offer new and unique
drug target knowledge. Target hypotheses can be inferred indirectly via drugs and
diseases, a valuable source of evidence to Illuminate the Druggable Genome.
Drug trials produce new and rich experimental data
AACT accessed Dec 3, 2019.
Chemical NER with NextMove LeadMine
https://www.nextmovesoftware.com/
Intervention names and study descriptions mined by LeadMine v3.14.1. Drug trials
(interventional): 130740; drug names: 14969; SMILES: 4869. Many non-structures,
e.g. "placebo", "test product", "medication", "chemotherapy". Top drugs by total
mentions:
Compounds mapped to: PubChem via PUG REST API, SMILES exact search;
ChEMBL via REST API, InChIKey search; targets via ChEMBL bioassays.
Targets mapped to IDG-TCRD/Pharos via UniProt ID.
Compound-target mapping via PubChem & ChEMBL
Overview
IDG F2F Meeting - 11-12 Feb 2020 - Arlington, VA
This work supported by US National Institutes of Health (grants U54 CA189205 and U24 224370), Illuminating the Druggable Genome Knowledge Management Center (IDG KMC), and
by the Novo Nordisk Foundation (grant number NNF14CC0001).
Disease NER with JensenLab Tagger
https://github.com/larsjuhljensen/tagger
JensenLab diseases dictionary, on detailed descriptions.
Disease mention totals by merging to resolved Disease Ontology term (DOID).
Top diseases by total mentions:
doid N_mentions terms
DOID:162 31143 CANCER; CANcer; Cancer; Malignant Tumor; Malignant neoplasm; Malignant tumor; Primary Cancer; Primary cancer; cancer; ...
DOID:9351 18955 DIABETES; DIABETES MELLITUS; DIAbetes; DIabetes; Diabetes; Diabetes Mellitus; Diabetes mellitus; diabetes; diabetes Mellitus...
DOID:6713 18461 CVA; Cerebrovascular Accident; Cerebrovascular Disease; Cerebrovascular accident; Cerebrovascular disease; STROKE; STRokE;…
DOID:2030 13621 ANXIETY; Anxiety; Anxiety Disorder; Anxiety State; Anxiety disorder; Anxiety state; anxiety; anxiety disorder; anxiety state; ...
DOID:1612 11586 BREAST CANCER; BReast CAncer; BReast Cancer; Breast Cancer; Breast cancer; Breast tumor; Breast-cancer; Primary breast cancer;…
DOID:2841 10773 ASTHMA; Asthma; BHR; Bronchial hyper-reactivity; Bronchial hyperreactivity; EIA; Exercise-induced asthma; asthma; bronchial hyper re …
DOID:3083 10726 CHRONIC OBSTRUCTIVE PULMONARY DISEASE; COLD; COPD; COPd; Chronic Obstructive Lung Disease; Chronic Obstructive L…
DOID:9970 10193 OBESITY; OBesity; Obesity; obEsity; obe-sity; obesity
DOID:10763 9816 HBP; HTN; HYPERTENSION; High Blood Pressure; High blood pressure; High-blood pressure; Hypertension; Hypertensive disease; high…
Aggregate Analysis of ClinicalTrials.gov (AACT) db
from the Clinical Trials Transformation Initiative (CTTI)
Why not target NER?
Clinical trials not designed to communicate molecular mechanisms to research
scientists, but with focus on clinical efficacy and safety. In due diligence we performed
target NER, and compared with target NER on arbitrary non-biomedical text: tweets
from the Twitter API for #brexit (26 Nov 2019). We find 8.64 target entities per 1000
chars in the tweets (e.g. "TAX", "LIAR", "NHS", "DANGER", "insulin"), vs. 6.63 in the
clinical trials descriptions. While not proof this does support prior belief and target
inference via chemical NER.
nct_id drug_name cid disease_term doid gene_symbol uniprot idgTDL
NCT02600741 Fluphenazine 3372 schizophrenia DOID:5419 MC5R P33032 Tchem
NCT00008190 melphalan 460612 acute leukemia DOID:12603 MMP2 P08253 Tchem
NCT00003012 methotrexate 126941 breast cancer DOID:1612 KDM4E B2RXH2 Tchem
NCT01445522 ABT-888 11960529 lymphoma DOID:0060058 PARP12 Q9H0J9 Tdark
NCT03201250 Cabozantinib 25102847 multiple myeloma DOID:9538 ANKK1 Q8NFD2 Tbio
https://aact.ctti-clinicaltrials.org/
CDK_smi2img N_mentions names
2787 Abraxane; PACLITAXEL; Paclitaxel; Taxol; abraxane; paclitaxel; taxol
2654 CYCLOPHOSPHAMIDE; Ciclophosphamide; Cyclophosphamid; Cyclophosphamide;
ciclophosphamide; cyclophosphamide
2552 CISPLATIN; Cis Platinum; Cis-platinum; Cisplatin; Cisplatine; Cisplatinum; cis Platinum;
cis-platinum; cisplatin; cisplatine; cisplatinum

More Related Content

What's hot

Data Visualization in Biomedical Sciences: More than Meets the Eye
Data Visualization in Biomedical Sciences: More than Meets the EyeData Visualization in Biomedical Sciences: More than Meets the Eye
Data Visualization in Biomedical Sciences: More than Meets the EyeNils Gehlenborg
 
Warfarin vitamin k_patel_2008
Warfarin vitamin k_patel_2008Warfarin vitamin k_patel_2008
Warfarin vitamin k_patel_2008ainun endarwati
 
HCV research: Recent findings and future challenges
HCV research: Recent findings and future challengesHCV research: Recent findings and future challenges
HCV research: Recent findings and future challengesMax Moldovan
 
Citation practices and the construction of scientific fact--ECA-facts-preconf...
Citation practices and the construction of scientific fact--ECA-facts-preconf...Citation practices and the construction of scientific fact--ECA-facts-preconf...
Citation practices and the construction of scientific fact--ECA-facts-preconf...jodischneider
 
PHR 6604- Review of Study Types
PHR 6604- Review of Study TypesPHR 6604- Review of Study Types
PHR 6604- Review of Study TypesAmber Cann
 
Personalized medicine through wes and big data analytics
Personalized medicine through wes and big data analyticsPersonalized medicine through wes and big data analytics
Personalized medicine through wes and big data analyticsJunaidAKG
 
A common rejection module (CRM) for acute rejection across multiple organs
A common rejection module (CRM) for acute rejection across multiple organsA common rejection module (CRM) for acute rejection across multiple organs
A common rejection module (CRM) for acute rejection across multiple organsKevin Jaglinski
 
Using primary care databases to evaluate drug benefits and harms: are the res...
Using primary care databases to evaluate drug benefits and harms: are the res...Using primary care databases to evaluate drug benefits and harms: are the res...
Using primary care databases to evaluate drug benefits and harms: are the res...David Springate
 
What WikiCite can learn from biomedical citation networks--Wikicite2017--2017...
What WikiCite can learn from biomedical citation networks--Wikicite2017--2017...What WikiCite can learn from biomedical citation networks--Wikicite2017--2017...
What WikiCite can learn from biomedical citation networks--Wikicite2017--2017...jodischneider
 
ClinicalCodes.org: An online repository of clinical code lists for primary ca...
ClinicalCodes.org: An online repository of clinical code lists for primary ca...ClinicalCodes.org: An online repository of clinical code lists for primary ca...
ClinicalCodes.org: An online repository of clinical code lists for primary ca...David Springate
 
The case for Genomic Medicine, (Personalized, Individualized Medicine). Medic...
The case for Genomic Medicine, (Personalized, Individualized Medicine). Medic...The case for Genomic Medicine, (Personalized, Individualized Medicine). Medic...
The case for Genomic Medicine, (Personalized, Individualized Medicine). Medic...Mathura Shanmugasundaram PhD
 
Meg Ehm: Fueling a Genetics-Driven Drug Discovery Organization
Meg Ehm: Fueling a Genetics-Driven Drug Discovery OrganizationMeg Ehm: Fueling a Genetics-Driven Drug Discovery Organization
Meg Ehm: Fueling a Genetics-Driven Drug Discovery OrganizationTHL
 

What's hot (15)

Data Visualization in Biomedical Sciences: More than Meets the Eye
Data Visualization in Biomedical Sciences: More than Meets the EyeData Visualization in Biomedical Sciences: More than Meets the Eye
Data Visualization in Biomedical Sciences: More than Meets the Eye
 
Warfarin vitamin k_patel_2008
Warfarin vitamin k_patel_2008Warfarin vitamin k_patel_2008
Warfarin vitamin k_patel_2008
 
HCV research: Recent findings and future challenges
HCV research: Recent findings and future challengesHCV research: Recent findings and future challenges
HCV research: Recent findings and future challenges
 
Citation practices and the construction of scientific fact--ECA-facts-preconf...
Citation practices and the construction of scientific fact--ECA-facts-preconf...Citation practices and the construction of scientific fact--ECA-facts-preconf...
Citation practices and the construction of scientific fact--ECA-facts-preconf...
 
PHR 6604- Review of Study Types
PHR 6604- Review of Study TypesPHR 6604- Review of Study Types
PHR 6604- Review of Study Types
 
Personalized medicine through wes and big data analytics
Personalized medicine through wes and big data analyticsPersonalized medicine through wes and big data analytics
Personalized medicine through wes and big data analytics
 
A common rejection module (CRM) for acute rejection across multiple organs
A common rejection module (CRM) for acute rejection across multiple organsA common rejection module (CRM) for acute rejection across multiple organs
A common rejection module (CRM) for acute rejection across multiple organs
 
Priyakant_Author_IJTLD
Priyakant_Author_IJTLDPriyakant_Author_IJTLD
Priyakant_Author_IJTLD
 
Sara Gerke: "AI in Drug Discovery and Clinical Trials"
Sara Gerke: "AI in Drug Discovery and Clinical Trials"Sara Gerke: "AI in Drug Discovery and Clinical Trials"
Sara Gerke: "AI in Drug Discovery and Clinical Trials"
 
Using primary care databases to evaluate drug benefits and harms: are the res...
Using primary care databases to evaluate drug benefits and harms: are the res...Using primary care databases to evaluate drug benefits and harms: are the res...
Using primary care databases to evaluate drug benefits and harms: are the res...
 
What WikiCite can learn from biomedical citation networks--Wikicite2017--2017...
What WikiCite can learn from biomedical citation networks--Wikicite2017--2017...What WikiCite can learn from biomedical citation networks--Wikicite2017--2017...
What WikiCite can learn from biomedical citation networks--Wikicite2017--2017...
 
ClinicalCodes.org: An online repository of clinical code lists for primary ca...
ClinicalCodes.org: An online repository of clinical code lists for primary ca...ClinicalCodes.org: An online repository of clinical code lists for primary ca...
ClinicalCodes.org: An online repository of clinical code lists for primary ca...
 
Cardiology04 (1)
Cardiology04 (1)Cardiology04 (1)
Cardiology04 (1)
 
The case for Genomic Medicine, (Personalized, Individualized Medicine). Medic...
The case for Genomic Medicine, (Personalized, Individualized Medicine). Medic...The case for Genomic Medicine, (Personalized, Individualized Medicine). Medic...
The case for Genomic Medicine, (Personalized, Individualized Medicine). Medic...
 
Meg Ehm: Fueling a Genetics-Driven Drug Discovery Organization
Meg Ehm: Fueling a Genetics-Driven Drug Discovery OrganizationMeg Ehm: Fueling a Genetics-Driven Drug Discovery Organization
Meg Ehm: Fueling a Genetics-Driven Drug Discovery Organization
 

Similar to Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses

dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020dkNET
 
Prospective identification of drug safety signals
Prospective identification of drug safety signalsProspective identification of drug safety signals
Prospective identification of drug safety signalsIMSHealthRWES
 
Guide to Pharmacology Poster - ELIXIR All Hands 2020
Guide to Pharmacology Poster - ELIXIR All Hands 2020Guide to Pharmacology Poster - ELIXIR All Hands 2020
Guide to Pharmacology Poster - ELIXIR All Hands 2020Guide to PHARMACOLOGY
 
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...Vaticle
 
2013-04-17: The Promise, Current State, And Future of Personalized Medicine
2013-04-17: The Promise, Current State, And Future of Personalized Medicine2013-04-17: The Promise, Current State, And Future of Personalized Medicine
2013-04-17: The Promise, Current State, And Future of Personalized MedicineBaltimore Lean Startup
 
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Jeremy Yang
 
Indications discovery and drug repurposing
Indications discovery and drug repurposingIndications discovery and drug repurposing
Indications discovery and drug repurposingSean Ekins
 
IUPHAR Guide to IMMUNOPHARMACOLOGY Poster - Pharmacology 2016
IUPHAR Guide to IMMUNOPHARMACOLOGY Poster - Pharmacology 2016IUPHAR Guide to IMMUNOPHARMACOLOGY Poster - Pharmacology 2016
IUPHAR Guide to IMMUNOPHARMACOLOGY Poster - Pharmacology 2016Guide to PHARMACOLOGY
 
NPC-PD2 PPP collab-PLoS 2015
NPC-PD2 PPP collab-PLoS 2015NPC-PD2 PPP collab-PLoS 2015
NPC-PD2 PPP collab-PLoS 2015Sitta Sittampalam
 
Biomedicine & Pharmacotherapy
Biomedicine & PharmacotherapyBiomedicine & Pharmacotherapy
Biomedicine & PharmacotherapyTrustlife
 
Five emerging trends
Five emerging trends Five emerging trends
Five emerging trends priya arrora
 
NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth Alira Health
 
Evaluating the Medical Literature
Evaluating the Medical LiteratureEvaluating the Medical Literature
Evaluating the Medical LiteratureClista Clanton
 

Similar to Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses (20)

dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020
 
Prospective identification of drug safety signals
Prospective identification of drug safety signalsProspective identification of drug safety signals
Prospective identification of drug safety signals
 
Guide to Pharmacology Poster - ELIXIR All Hands 2020
Guide to Pharmacology Poster - ELIXIR All Hands 2020Guide to Pharmacology Poster - ELIXIR All Hands 2020
Guide to Pharmacology Poster - ELIXIR All Hands 2020
 
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...
Leveraging Publicly Accessible Clinical Trails Data Sharing, Dissemination an...
 
Systemic review and meta analysis of rapid flu testing
Systemic review and meta analysis of rapid flu testing Systemic review and meta analysis of rapid flu testing
Systemic review and meta analysis of rapid flu testing
 
2013-04-17: The Promise, Current State, And Future of Personalized Medicine
2013-04-17: The Promise, Current State, And Future of Personalized Medicine2013-04-17: The Promise, Current State, And Future of Personalized Medicine
2013-04-17: The Promise, Current State, And Future of Personalized Medicine
 
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear...
 
Indications discovery and drug repurposing
Indications discovery and drug repurposingIndications discovery and drug repurposing
Indications discovery and drug repurposing
 
IUPHAR Guide to IMMUNOPHARMACOLOGY Poster - Pharmacology 2016
IUPHAR Guide to IMMUNOPHARMACOLOGY Poster - Pharmacology 2016IUPHAR Guide to IMMUNOPHARMACOLOGY Poster - Pharmacology 2016
IUPHAR Guide to IMMUNOPHARMACOLOGY Poster - Pharmacology 2016
 
NPC-PD2 PPP collab-PLoS 2015
NPC-PD2 PPP collab-PLoS 2015NPC-PD2 PPP collab-PLoS 2015
NPC-PD2 PPP collab-PLoS 2015
 
Biomedicine & Pharmacotherapy
Biomedicine & PharmacotherapyBiomedicine & Pharmacotherapy
Biomedicine & Pharmacotherapy
 
Five emerging trends
Five emerging trends Five emerging trends
Five emerging trends
 
GtoImmuPdb_2017
GtoImmuPdb_2017GtoImmuPdb_2017
GtoImmuPdb_2017
 
NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth NGS for Infectious Disease Diagnostics: An Opportunity for Growth
NGS for Infectious Disease Diagnostics: An Opportunity for Growth
 
Homework
HomeworkHomework
Homework
 
IUPHAR Guide to IMMUNOPHARMACOLOGY
IUPHAR Guide to IMMUNOPHARMACOLOGYIUPHAR Guide to IMMUNOPHARMACOLOGY
IUPHAR Guide to IMMUNOPHARMACOLOGY
 
Evaluating the Medical Literature
Evaluating the Medical LiteratureEvaluating the Medical Literature
Evaluating the Medical Literature
 
In_silico_Modeling_Personalized_Therapy _Pulmonary_Artery_Hypertension.pdf
In_silico_Modeling_Personalized_Therapy _Pulmonary_Artery_Hypertension.pdfIn_silico_Modeling_Personalized_Therapy _Pulmonary_Artery_Hypertension.pdf
In_silico_Modeling_Personalized_Therapy _Pulmonary_Artery_Hypertension.pdf
 
Marsh pers strat-mednov2014
Marsh pers strat-mednov2014Marsh pers strat-mednov2014
Marsh pers strat-mednov2014
 
PMR Buzz
PMR BuzzPMR Buzz
PMR Buzz
 

More from Jeremy Yang

TIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS AnalyticsTIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS AnalyticsJeremy Yang
 
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizerDrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizerJeremy Yang
 
TIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST APITIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST APIJeremy Yang
 
Ex-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles ExplorerEx-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles ExplorerJeremy Yang
 
Open Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource posterOpen Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource posterJeremy Yang
 
Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)Jeremy Yang
 
Bibliological data science and drug discovery
Bibliological data science and drug discoveryBibliological data science and drug discovery
Bibliological data science and drug discoveryJeremy Yang
 
BioMISS: Language Diversity of Computing
BioMISS: Language Diversity of ComputingBioMISS: Language Diversity of Computing
BioMISS: Language Diversity of ComputingJeremy Yang
 
The Language Diversity of Computing
The Language Diversity of ComputingThe Language Diversity of Computing
The Language Diversity of ComputingJeremy Yang
 
RMSD: routine measure stirs doubts
RMSD: routine measure stirs doubtsRMSD: routine measure stirs doubts
RMSD: routine measure stirs doubtsJeremy Yang
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsJeremy Yang
 
Molecular scaffolds poster
Molecular scaffolds posterMolecular scaffolds poster
Molecular scaffolds posterJeremy Yang
 
Molecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discoveryMolecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discoveryJeremy Yang
 
The BADAPPLE promiscuity plugin for BARD
The BADAPPLE promiscuity plugin for BARDThe BADAPPLE promiscuity plugin for BARD
The BADAPPLE promiscuity plugin for BARDJeremy Yang
 
Cheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case StudiesCheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case StudiesJeremy Yang
 
How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...Jeremy Yang
 
UNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applicationsUNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applicationsJeremy Yang
 
Cyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingCyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingJeremy Yang
 
Promiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCNPromiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCNJeremy Yang
 

More from Jeremy Yang (19)

TIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS AnalyticsTIGA: Target Illumination GWAS Analytics
TIGA: Target Illumination GWAS Analytics
 
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizerDrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer
 
TIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST APITIN-X v2: modernized architecture with REST API
TIN-X v2: modernized architecture with REST API
 
Ex-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles ExplorerEx-files: Sex-Specific Gene Expression Profiles Explorer
Ex-files: Sex-Specific Gene Expression Profiles Explorer
 
Open Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource posterOpen Phenotypic Drug Discovery Resource poster
Open Phenotypic Drug Discovery Resource poster
 
Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)Badapple: promiscuity patterns from noisy evidence (poster)
Badapple: promiscuity patterns from noisy evidence (poster)
 
Bibliological data science and drug discovery
Bibliological data science and drug discoveryBibliological data science and drug discovery
Bibliological data science and drug discovery
 
BioMISS: Language Diversity of Computing
BioMISS: Language Diversity of ComputingBioMISS: Language Diversity of Computing
BioMISS: Language Diversity of Computing
 
The Language Diversity of Computing
The Language Diversity of ComputingThe Language Diversity of Computing
The Language Diversity of Computing
 
RMSD: routine measure stirs doubts
RMSD: routine measure stirs doubtsRMSD: routine measure stirs doubts
RMSD: routine measure stirs doubts
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformatics
 
Molecular scaffolds poster
Molecular scaffolds posterMolecular scaffolds poster
Molecular scaffolds poster
 
Molecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discoveryMolecular scaffolds are special and useful guides to discovery
Molecular scaffolds are special and useful guides to discovery
 
The BADAPPLE promiscuity plugin for BARD
The BADAPPLE promiscuity plugin for BARDThe BADAPPLE promiscuity plugin for BARD
The BADAPPLE promiscuity plugin for BARD
 
Cheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case StudiesCheminformatics Software Development: Case Studies
Cheminformatics Software Development: Case Studies
 
How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...How am I supposed to organize a protein database when I can't even organize m...
How am I supposed to organize a protein database when I can't even organize m...
 
UNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applicationsUNM Division of Biocomputing public web applications
UNM Division of Biocomputing public web applications
 
Cyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in BiocomputingCyberinfrastructure Day 2010: Applications in Biocomputing
Cyberinfrastructure Day 2010: Applications in Biocomputing
 
Promiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCNPromiscuous patterns and perils in PubChem and the MLSCN
Promiscuous patterns and perils in PubChem and the MLSCN
 

Recently uploaded

Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsOrtegaSyrineMay
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Silpa
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfSumit Kumar yadav
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxSilpa
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsbassianu17
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body Areesha Ahmad
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxSilpa
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Silpa
 

Recently uploaded (20)

Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Chemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdfChemistry 5th semester paper 1st Notes.pdf
Chemistry 5th semester paper 1st Notes.pdf
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptx
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditions
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.Cyathodium bryophyte: morphology, anatomy, reproduction etc.
Cyathodium bryophyte: morphology, anatomy, reproduction etc.
 

Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses

  • 1. Disease-target associations from drug trial links Proposed confidence metrics: ● nStudy : Study count for association. ● nStudyNewness : Study count weighted by newness of study (newer better). ● nStudyPhase : Study count weighted by phase of study (completed better). ● nPub : Study publications. ● nPubTypes : Study publications (results type better). ● nDiseaseMention : Disease mention count for disease-target association. ● nDrugMention : Drug mention count for disease-target association. ● nDrug : Drug count for disease-target association. ● nAssay : Assay count for drug-target association. ● nAssayPchembl : Assay count for drug-target association, weighted by pChembl. 1.2M associations; 164K unique disease-gene pairs. Examples: Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses Jeremy Yang1 , Roger Sayle2 , Lars Juhl Jensen3 and Tudor Oprea1 1 University of New Mexico, Albuquerque, USA; 2 NextMove Scientific Software, Cambridge, UK; 3 Novo Nordisk Foundation Center for Protein Research, Copenhagen, DK We mined ClinicalTrials.gov using the CTTI-AACT db from Duke University for drugs/chemicals and diseases/conditions. Named entitiy recognition (NER) requires specialized tools and expertly curated dictionaries for comprehensive and high quality results, hence use of NextMove Leadmine for chemical NER and JensenLab Tagger for disease NER. Study designs and outcomes can offer new and unique drug target knowledge. Target hypotheses can be inferred indirectly via drugs and diseases, a valuable source of evidence to Illuminate the Druggable Genome. Drug trials produce new and rich experimental data AACT accessed Dec 3, 2019. Chemical NER with NextMove LeadMine https://www.nextmovesoftware.com/ Intervention names and study descriptions mined by LeadMine v3.14.1. Drug trials (interventional): 130740; drug names: 14969; SMILES: 4869. Many non-structures, e.g. "placebo", "test product", "medication", "chemotherapy". Top drugs by total mentions: Compounds mapped to: PubChem via PUG REST API, SMILES exact search; ChEMBL via REST API, InChIKey search; targets via ChEMBL bioassays. Targets mapped to IDG-TCRD/Pharos via UniProt ID. Compound-target mapping via PubChem & ChEMBL Overview IDG F2F Meeting - 11-12 Feb 2020 - Arlington, VA This work supported by US National Institutes of Health (grants U54 CA189205 and U24 224370), Illuminating the Druggable Genome Knowledge Management Center (IDG KMC), and by the Novo Nordisk Foundation (grant number NNF14CC0001). Disease NER with JensenLab Tagger https://github.com/larsjuhljensen/tagger JensenLab diseases dictionary, on detailed descriptions. Disease mention totals by merging to resolved Disease Ontology term (DOID). Top diseases by total mentions: doid N_mentions terms DOID:162 31143 CANCER; CANcer; Cancer; Malignant Tumor; Malignant neoplasm; Malignant tumor; Primary Cancer; Primary cancer; cancer; ... DOID:9351 18955 DIABETES; DIABETES MELLITUS; DIAbetes; DIabetes; Diabetes; Diabetes Mellitus; Diabetes mellitus; diabetes; diabetes Mellitus... DOID:6713 18461 CVA; Cerebrovascular Accident; Cerebrovascular Disease; Cerebrovascular accident; Cerebrovascular disease; STROKE; STRokE;… DOID:2030 13621 ANXIETY; Anxiety; Anxiety Disorder; Anxiety State; Anxiety disorder; Anxiety state; anxiety; anxiety disorder; anxiety state; ... DOID:1612 11586 BREAST CANCER; BReast CAncer; BReast Cancer; Breast Cancer; Breast cancer; Breast tumor; Breast-cancer; Primary breast cancer;… DOID:2841 10773 ASTHMA; Asthma; BHR; Bronchial hyper-reactivity; Bronchial hyperreactivity; EIA; Exercise-induced asthma; asthma; bronchial hyper re … DOID:3083 10726 CHRONIC OBSTRUCTIVE PULMONARY DISEASE; COLD; COPD; COPd; Chronic Obstructive Lung Disease; Chronic Obstructive L… DOID:9970 10193 OBESITY; OBesity; Obesity; obEsity; obe-sity; obesity DOID:10763 9816 HBP; HTN; HYPERTENSION; High Blood Pressure; High blood pressure; High-blood pressure; Hypertension; Hypertensive disease; high… Aggregate Analysis of ClinicalTrials.gov (AACT) db from the Clinical Trials Transformation Initiative (CTTI) Why not target NER? Clinical trials not designed to communicate molecular mechanisms to research scientists, but with focus on clinical efficacy and safety. In due diligence we performed target NER, and compared with target NER on arbitrary non-biomedical text: tweets from the Twitter API for #brexit (26 Nov 2019). We find 8.64 target entities per 1000 chars in the tweets (e.g. "TAX", "LIAR", "NHS", "DANGER", "insulin"), vs. 6.63 in the clinical trials descriptions. While not proof this does support prior belief and target inference via chemical NER. nct_id drug_name cid disease_term doid gene_symbol uniprot idgTDL NCT02600741 Fluphenazine 3372 schizophrenia DOID:5419 MC5R P33032 Tchem NCT00008190 melphalan 460612 acute leukemia DOID:12603 MMP2 P08253 Tchem NCT00003012 methotrexate 126941 breast cancer DOID:1612 KDM4E B2RXH2 Tchem NCT01445522 ABT-888 11960529 lymphoma DOID:0060058 PARP12 Q9H0J9 Tdark NCT03201250 Cabozantinib 25102847 multiple myeloma DOID:9538 ANKK1 Q8NFD2 Tbio https://aact.ctti-clinicaltrials.org/ CDK_smi2img N_mentions names 2787 Abraxane; PACLITAXEL; Paclitaxel; Taxol; abraxane; paclitaxel; taxol 2654 CYCLOPHOSPHAMIDE; Ciclophosphamide; Cyclophosphamid; Cyclophosphamide; ciclophosphamide; cyclophosphamide 2552 CISPLATIN; Cis Platinum; Cis-platinum; Cisplatin; Cisplatine; Cisplatinum; cis Platinum; cis-platinum; cisplatin; cisplatine; cisplatinum