Digging out Structures for Repurposing: Non-competitive Intelligence

Prepared as visitor seminar for PubChem, April 2013



Usage Rights

CC Attribution License

  • IUPAC in abstract converted by MeSH but not transferred to PubChem. Chemicalize.org used for conversion, matched patent sources. Therefore structure is there but code synonym is not. No one's responsibility to submit the code-to-struc

Presentation Transcript

  • Digging out Structures for Repurposing: Non-competitive Intelligence PubChem Seminar April 2013 Christopher Southan, TW2Informatics, Göteborg, Sweden [1]
  • Dr Christopher Southan, Ph.D., M.Sc., B.Sc. [2]
    • TW2Informatics: http://www.cdsouthan.info/Consult/CDS_cons.htm
    • Mobile: +46(0)702-530710
    • Skype: cdsouthan
    • Email: cdsouthan@hotmail.com
    • Twitter: http://twitter.com/#!/cdsouthan
    • Blog: http://cdsouthan.blogspot.com/
    • LinkedIn: http://www.linkedin.com/in/cdsouthan
    • Publications: http://www.citeulike.org/user/cdsouthan/order/year,,/publications
    • Presentations: http://www.slideshare.net/cdsouthan
  • Outline [3]
    • Trawling for repurposing-relevant data
    • Code name statistics and name > structure triage
    • The NCATS/MRC challenge
    • Story of JNJ-39393406
    • Scaling up code name hunting and cross-mapping
    • Code names in clinical trials, MeSH, PubChem
    • Story of PF-04457845
    • Trials, MeSH and PubChem code name intersects
    • Conclusions
  • Intelligence: trawling compound information [4]
    Competitive:
    – Directed towards commercially positioning and/or repurposing own portfolio
    – Major big pharma activity
    – Mixed commercial/public sources
    – Internal specialists
    – Typically a closed activity (i.e. little open “best practice”)
    – Typically therapeutic area aligned
    Non-competitive:
    – Directed towards repositioning any compound
    – Collaborative approaches to IP holders (but new IP possible)
    – Can utilise public resources alone
    – Different domain expert entry points
    – Predominantly an open activity (e.g. OSDD)
    – Can be hypothesis-neutral
  • Structures: connecting to repurposing-relevant data [5]
    • Code names and synonyms
    • Resolving these to structures
    • Database entries
    • BioAssay results
    • Target/pathway links
    • In vitro & in vivo research papers
    • Clinical trial results and papers
    • Patents for analogues and SAR
    • Comparative in vivo data
    • Mendelian and GWAS disease links
    • Expression data for compounds
    • In silico modeling (including rare diseases or NTDs)
    • Vendor similarity matches
  • Code names: 2-15 year information hole (Pharmaprojects 2009-10 figures) [6]
  • Drugs, code names, INN/USANs and structures: few congruent hard numbers [7]
    • Pharmaprojects (2013) drug profiles ~ 50,000
    • Thomson Reuters Cortellis (2012) drug monographs = 41,889
    • Pharmaprojects (via ProQuest, 2012) records ~ 35,000
    • Thomson Reuters Partnering (2011 structures, PMID: 22024215) = 17,901
    • Pharmaprojects (2003 structures) = 14,000
    • ChEMBL USANs (2013) = 10,568
    • PubChem (2013) “USAN [synonym] OR INN [synonym]” = 9,890
    • Pharmaprojects (2010 in development, no structure count) = 9,737
    • GVKBIO Clinical Candidate structures (2008, PMID: 20298516) = 8,864
    • Pharmaprojects (2010 review, no structures) Phase 1+2+3 = 3,828
  • Code names: major repurposing potential, but... [8]
    • ~ 95% of the 30K are/will become “parked” or “abandoned”
    • Can be repurposed in silico at least
    • Obvious hierarchy: leads > development > clinical trials > INN > approved
    • Problems
      – New code names < 50% - 70% blinded (i.e. no structures)
      – Some older code names never un-blinded
      – Code naming practices independent and completely ad hoc
      – Publications, conference reports, clinical trials entries, press releases and portfolio listings linked to “blinded” code names (no structures)
      – Even for public declarations (e.g. papers) data linked into “the system” (e.g. synonym mapping) is patchy
      – Code originators do not provenance public database entries
      – Data supporting non-progression decisions rarely disclosed
      – http://chembl.blogspot.se/p/research-code-stems.html lists 100s of codes
  • Code name-to-structure mapping triage [9]
    Dig out the code names:
    – PubChem Substance
    – PubChem Compound
    – PubMed/MeSH
    – Google Scholar
    – Google Images
    – Google open (filtered)
    Name/image > struc:
    – chemicalize.org, OPSIN, Chemical Identifier Resolver, sketchers, OSRA
    – Cross-checks: SMILES/SDF/InChI strings in PubChem and ChemSpider; InChIKey in Google; SureChemOpen patent search; Clinicaltrials.gov; synonym trawling
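The first triage step, checking whether a code name already resolves to a structure in PubChem, can be sketched against PubChem's PUG REST name lookup. This is a minimal sketch rather than the workflow used for the talk: the helper names `name_to_cids_url` and `lookup_cids` are illustrative, and the code assumes the public endpoint layout `.../rest/pug/compound/name/<name>/cids/JSON`.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Base of PubChem's PUG REST service (assumed unchanged; check the PubChem docs).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def name_to_cids_url(code_name):
    """Build a PUG REST URL asking PubChem for CIDs matching a name or synonym."""
    quoted = urllib.parse.quote(code_name, safe="")
    return f"{PUG_REST}/compound/name/{quoted}/cids/JSON"


def lookup_cids(code_name, timeout=10.0):
    """Return matching CIDs, or an empty list if PubChem does not know the name."""
    try:
        with urllib.request.urlopen(name_to_cids_url(code_name), timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # PubChem answers 404 when the name is not found
            return []
        raise
    return payload.get("IdentifierList", {}).get("CID", [])
```

An empty list from `lookup_cids` corresponds to the "PubChem -ve" state described in the slides; a non-empty list still needs the cross-checks listed above, because a synonym may have been attached by a vendor without provenance.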
  • The NCATS/MRC industry-sponsored repurposing exercise: the joy of code lists [10]
  • NCATS/MRC repurposing candidates http://cdsouthan.blogspot.se/2012/09/mrc-22-vs-ncats-58-repurposing-lists.html [11]
  • NCATS/MRC: summary statistics (PMID 23159359) [12]
    • 70 code names – no structures
    • 18 INNs & 4 codes-only in PubChem
    • 24 strucs “dug out” but PubChem -ve
    • 24 codes remain blinded
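A quick arithmetic check of the slide's counts, assuming the four categories partition the 70 code names (they do sum to 70) and that every PubChem-positive entry counts as having a structure. The variable names are illustrative.

```python
# Counts as stated on the slide (PMID 23159359).
total_codes = 70
inn_in_pubchem = 18          # INNs found in PubChem
code_only_in_pubchem = 4     # found in PubChem under the code name only
dug_out_not_in_pubchem = 24  # structure found elsewhere, PubChem -ve
still_blinded = 24           # no structure found

categories = [inn_in_pubchem, code_only_in_pubchem, dug_out_not_in_pubchem, still_blinded]
assert sum(categories) == total_codes  # the four groups account for all 70 codes


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


with_structure = inn_in_pubchem + code_only_in_pubchem + dug_out_not_in_pubchem
print(f"{with_structure}/{total_codes} = {pct(with_structure, total_codes)}% resolved to a structure")
print(f"{still_blinded}/{total_codes} = {pct(still_blinded, total_codes)}% still blinded")
```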
  • Sleuthing down a JNJ-39393406 structure: from darkness to twilight [13]
  • JNJ-39393406: NCATS documentation, PubChem -ve [14]
  • JNJ-39393406: ClinicalTrials.gov [15]
  • JNJ-39393406 in PubMed [16]
  • JNJ-39393406: open Google [17]
  • JNJ-39393406: Google Scholar (was) structure -ve [18]
  • JNJ-39393406 in Google Images: finally a mapping. But where did these two vendors get their mapping from? [19]
  • (Probable) JNJ-39393406 in PubChem: CID 1675566, patent-only sources and near-neighbours [20]
  • (Probable) JNJ-39393406: SureChemOpen patent match with corroborative data, PubChem SID 152835708 (cf. NCATS data) [21]
  • More JNJ-39393406 mystery: InChIKey in Google > ChemSpider > 3rd vendor [22]
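The "InChIKey in Google" cross-check amounts to an exact-phrase web search on the 27-character key. A minimal sketch, assuming the standard InChIKey layout (14 letters, 10 letters, 1 letter, hyphen-separated); the function names are illustrative, and the example key is aspirin's, used only as a well-known stand-in because the slides do not give the key for JNJ-39393406.

```python
import re
import urllib.parse

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey(text):
    """True if the text has the shape of an InChIKey (format check only)."""
    return INCHIKEY_RE.fullmatch(text.strip()) is not None


def google_exact_query(inchikey):
    """Build a Google search URL that looks for the key as an exact phrase."""
    key = inchikey.strip()
    if not is_inchikey(key):
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return "https://www.google.com/search?" + urllib.parse.urlencode({"q": f'"{key}"'})


# Aspirin's InChIKey, as a stand-in example.
ASPIRIN_KEY = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
print(google_exact_query(ASPIRIN_KEY))
```

Because the key is a fixed-length hash, an exact-phrase hit is a strong hint that two pages describe the same structure, which is what makes it useful for tracing vendor and ChemSpider entries back to a source.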
  • Not all JNJ-s are blinded: JNJ-40418677 has IUPAC in abstract but code still PubChem -ve. IUPAC name converted at chemicalize.org for PubChem mapping [23]
  • Scaling-up code name retrieval: wild card searches [24]
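Outside a search interface, the same wild-card idea can be approximated with a regular expression that pulls company-style codes out of free text. This is a rough sketch: the prefixes come from the queries used in this talk, while the minimum digit count, the optional trailing letter and the `PREFIX-DIGITS` normalisation are assumptions, since code naming is ad hoc and no published grammar exists.

```python
import re

# Prefixes from the talk's queries; digit count and suffix rule are assumptions.
CODE_RE = re.compile(r"\b(GSK|AZD|JNJ|PF)[- ]?(\d{4,}[A-Z]?)\b")


def extract_codes(text):
    """Return normalised PREFIX-DIGITS codes in order of first appearance."""
    seen = []
    for prefix, number in CODE_RE.findall(text):
        code = f"{prefix}-{number}"
        if code not in seen:  # collapse spelling variants such as JNJ39393406
            seen.append(code)
    return seen


sample = "JNJ-39393406, JNJ39393406, PF-04457845 and JNJ-28431754 (canagliflozin)"
print(extract_codes(sample))
```

Normalising the hyphen matters in practice because the same compound can be written with or without it in different sources.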
  • Phases & codes in Clinicaltrials.gov: thin on results [25]
    • Interventional studies = 115356, of which 7895 with results (7%)
    • Results | Interventional Studies | Phase 1, 2, 3 | Industry = 4477
    • Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 1004
    • Results | Interventional Studies | GSK* | Phase 1, 2, 3 | Industry = 122 (12%)
    • Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry = 1640
    • Results | Interventional Studies | GSK* OR AZD* OR JNJ* OR PF0* | Phase 1, 2, 3 | Industry = 185 (11%)
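The percentages on this slide can be recomputed from the quoted counts. A small sketch, using only the numbers given above; the dictionary labels are shorthand for the queries.

```python
def results_rate(with_results, total):
    """Percentage of registered studies that have posted results."""
    return 100 * with_results / total


# (studies with results, all studies) as quoted on the slide
QUERIES = {
    "All interventional studies": (7895, 115356),
    "GSK*, Phase 1-3, industry": (122, 1004),
    "GSK* OR AZD* OR JNJ* OR PF0*, Phase 1-3, industry": (185, 1640),
}

for label, (with_results, total) in QUERIES.items():
    print(f"{label}: {with_results}/{total} = {results_rate(with_results, total):.1f}%")
```

Rounded to whole numbers these reproduce the 7%, 12% and 11% shown on the slide.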
  • alltrials.net: public pressure > more results > more repurposing opportunities http://www.youtube.com/watch?v=lQ6YTU5kGXw&feature=youtu.be&t=28m39s [26]
  • Stemming code names in MeSH [27]
  • Code names in PubChem Compound (CIDs): CID:SID ratio 275:1039 [28]
  • Codes in PubChem: selected matches [29]
  • “GSK-” in ChEMBL : 61 [30]
  • Tracking PF-04457845 through the system [31]
  • PubMed intersects: finding PF-04457845 [32]
  • PF-04457845: PubMed [33]
  • PF-04457845: ClinicalTrials.gov [34]
  • PF-04457845: PubChem CID 24771824, Substance (SID) capture of activity, vendor and patent sources [35]
  • Wikipedia: links to other development compounds. But who put them in? [36]
  • PF-04457845: (almost) a total system success [37]
    • Declared efficacy failure > possible repurposing candidate
    • Selection of analogues and a probe, [18F]PF-9811 (CID 70679467)
    • The “system” did well because of good publishing practice (e.g. full text)
    • Code, structure, target, papers, trials and patents all connected
    • 5 mg for $275
    But:
    • Serendipitous finding (no “efficacy failure” or “study stopped” tags)
    • Lack of ClinicalTrials.gov <> PubMed links
    • BindingDB using deprecated ChEBI ID
    • PMID: 21505060 not yet in ChEMBL
    • No direct target or patent nos. in CID record because no DrugBank, SCRIPDB or IBM capture
    • Name variants: [18F]PF-9811 in PubChem, [(18)F]PF-9811 in PubMed, PF-9811-18F in Books
  • Looking at code name intersects in different parts of the system [38]
  • ClinicalTrials.gov JNJ* word cloud: JNJ-28431754 = canagliflozin = CID 24812758 [39]
  • Company Pipelines: GSK codes for 2012 [40]
  • GSK codes: PubChem vs. 2012 Pipeline [41]
  • Clinical Trials, PubChem, MeSH: GSK [42]
  • Clinical Trials, PubChem, MeSH: JNJ [43]
  • Clinical, PubChem, MeSH, & 2012 Pipeline: GSK [44]
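The Venn-style comparisons on the preceding slides reduce to set intersections over lists of normalised code names. The sketch below shows the bookkeeping only: the `CODE-n` entries are placeholders, not real codes or counts from the talk, and `intersect_counts` is an illustrative helper.

```python
from itertools import combinations


def intersect_counts(sources):
    """Count codes shared by every combination of two or more sources."""
    counts = {}
    names = sorted(sources)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            shared = set.intersection(*(sources[name] for name in combo))
            counts[" & ".join(combo)] = len(shared)
    return counts


# Placeholder code lists; real input would be codes harvested from each source.
sources = {
    "ClinicalTrials": {"CODE-1", "CODE-2", "CODE-3", "CODE-4"},
    "MeSH": {"CODE-2", "CODE-3", "CODE-5"},
    "PubChem": {"CODE-3", "CODE-4", "CODE-5", "CODE-6"},
}

for region, n in intersect_counts(sources).items():
    print(f"{region}: {n}")
```

Codes that appear in a trial registry but in neither MeSH nor PubChem are the ones worth digging for first, since they are public yet unmapped.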
  • Conclusions [45]
    • Stalled development candidates, designated by company codes, constitute a large potential repurposing information estate
    • Historical in vitro, pharmacological & clinical data linked to ~ 30K codes
    • But only 40-50% have structures assignable from open sources
    • An even smaller proportion have code names in PubChem
    • Public name > struc > data capture is ad hoc and needs improving
    • Repurposing-relevant relationships are not easy to dig out
    • Some “non-competitive intelligence” approaches are shown here
    • The big push for transparency and open access should improve disclosure, data capture, linkage and repurposing opportunities
    Happy hunting!
    TED Talk: Francis Collins: We need better drugs -- now http://www.ted.com/talks/francis_collins_we_need_better_drugs_now.html